Below are replication instructions for: Niehaus, Zhou, Cook, and Jun. “bizicount: Bivariate Zero-Inflated Count Copula Regression using R”, published in JSS v109/i01. The full replication takes roughly 1 hour on Debian GNU/Linux 6.6.15-amd64 using 7 Intel i7-10510U CPUs @ 1.80 GHz with 16 GB RAM. Alternatively, it takes 30 minutes on Windows 10 Pro using an Intel i7-12700K with 20 cores @ 4.5 GHz and 32 GB DDR4 RAM. Note that the exact numerical results may differ slightly depending on the platform; see below for details.
The Figures/ folder has the tables and figures from the replication. These will be replaced if users run the replication scripts.
All printed console output from the manuscript can be found in the v109i01-replication.txt file.
To re-run the replication materials, source the v109i01-replication.R script. If desired, the script can be changed to exclude the simulation results.
R packages
bizicount
dplyr
tidyr
ggplot2
copula
doParallel
doRNG
RhpcBLASctl
These can be installed manually; however, if the replication instructions found below are followed, they will be installed automatically. See the session information printed below, or the session_info.txt file, for exact versions of these packages.
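As a rough illustration of the manual route, the sketch below checks which of the packages listed above are already available on the local machine. It is not the contents of install_dependencies.R; the variable names are hypothetical, and the install.packages() call is left commented out so that nothing is installed unless you opt in. Note that installing from CRAN this way fetches current releases, not necessarily the exact versions used for the paper.

```r
# Packages named in this README.
required <- c("bizicount", "dplyr", "tidyr", "ggplot2",
              "copula", "doParallel", "doRNG", "RhpcBLASctl")

# TRUE for each package whose namespace can be loaded on this machine.
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

to_install <- required[!available]
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are available.")
}
```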
Platform and R package version dependencies
This replication was done on 64-bit Debian GNU/Linux with the default BLAS/LAPACK installation, using R version 4.3.3. Other platforms may use different C++ compiler versions, which could lead to differences in numerical results. Using different linear algebra libraries may also give different results, because they use different algorithms for solving linear systems and differ in numerical precision.
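To see how a local setup compares with the one described here, the following sketch prints the R version and the BLAS/LAPACK libraries that the running R session is linked against. It uses base R only; the name platform_info is hypothetical, and on some platforms the BLAS entry may be an empty string.

```r
# Report the R version and the linear algebra libraries in use.
platform_info <- c(
  r_version      = R.version.string,
  platform       = R.version$platform,
  blas           = unname(extSoftVersion()["BLAS"]),
  lapack         = La_library(),
  lapack_version = La_version()
)
print(platform_info)
```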
The versions of each package are found below:
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux trixie/sid
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Vienna
tzcode source: system (glibc)
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] RhpcBLASctl_0.23-42 doRNG_1.8.6 rngtools_1.5.2
[4] doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2
[7] copula_1.1-3 ggplot2_3.5.0 tidyr_1.3.1
[10] dplyr_1.1.4 bizicount_1.3.2
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 lattice_0.22-5
[4] digest_0.6.35 lme4_1.1-35.1 magrittr_2.0.3
[7] grid_4.3.3 mvtnorm_1.2-4 Matrix_1.6-5
[10] Formula_1.2-5 httr_1.4.7 purrr_1.0.2
[13] fansi_1.0.6 scales_1.3.0 stabledist_0.7-1
[16] pbivnorm_0.6.0 codetools_0.2-19 numDeriv_2016.8-1.1
[19] cli_3.6.2 rlang_1.1.3 pspline_1.0-19
[22] texreg_1.39.3 gsl_2.1-8 munsell_0.5.0
[25] splines_4.3.3 withr_3.0.0 tools_4.3.3
[28] nloptr_2.0.3 minqa_1.2.6 colorspace_2.1-0
[31] boot_1.3-30 vctrs_0.6.5 R6_2.5.1
[34] stats4_4.3.3 lifecycle_1.0.4 ADGofTest_0.3
[37] MASS_7.3-60.0.1 pcaPP_2.0-4 pkgconfig_2.0.3
[40] pillar_1.9.0 gtable_0.3.4 glue_1.7.0
[43] Rcpp_1.0.12 tibble_3.2.1 tidyselect_1.2.1
[46] nlme_3.1-164 DHARMa_0.4.6 compiler_4.3.3
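The sketch below is one way to compare locally installed versions of the attached packages against those in the session information above. The helper version_matches is hypothetical; it returns NA for packages that are not installed.

```r
# Versions of the main packages as listed in the session information above.
expected <- c(bizicount = "1.3.2", dplyr = "1.1.4", tidyr = "1.3.1",
              ggplot2 = "3.5.0", copula = "1.1-3", doParallel = "1.0.17",
              doRNG = "1.8.6", RhpcBLASctl = "0.23-42")

# TRUE/FALSE if the package is installed, NA if it is not.
version_matches <- function(pkg, listed) {
  found <- tryCatch(packageVersion(pkg), error = function(e) NULL)
  if (is.null(found)) NA else found == package_version(listed)
}

matches <- mapply(version_matches, names(expected), expected)
print(data.frame(package  = names(expected),
                 expected = unname(expected),
                 matches  = unname(matches)))
```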
v109i01-replication.R – Primary replication script.
install_dependencies.R – Installs the R dependencies. This script will be executed automatically if the replication instructions are followed.
montes_small.R – Script used for the simulation results presented in the appendix. If you are on Windows, the script will most likely result in a prompt asking for R to have network access. This is required for the parallel processing to work, as the processing is done over a PSOCK cluster on Windows.
plots_small.R – Script for producing plots from the Monte Carlo results.
empirical_replication.R – The script that generates the tables and figures in the main text and appendix, with the exception of the theoretical copula functions.
plots.R – Produces the plots for the theoretical copula functions that are found in Figure 1.
Figures/ – Folder containing all figures and tables produced by the above scripts, including those from the simulations and in the appendix. It also contains the console_output.txt file, which is the output of all scripts printed to a text file.
output_montes_small.RData – Data produced from running the simulations on our machine (overwritten if simulations are re-run).
Note: The Monte Carlo simulations can take about an hour to run. Because of this, users can easily set the run_simulations variable in the v109i01-replication.R script to FALSE. In that case, the output_montes_small.RData file has the results as produced for the paper.
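For readers unfamiliar with this kind of switch, the sketch below shows how a logical flag can gate a long-running step and fall back to saved results. It is only an illustration: the helper simulation_plan is hypothetical, and how v109i01-replication.R actually uses run_simulations may differ.

```r
# Hypothetical helper: decide what to do from the flag and the saved file.
simulation_plan <- function(run_simulations,
                            results_file = "output_montes_small.RData") {
  if (isTRUE(run_simulations)) {
    "rerun"
  } else if (file.exists(results_file)) {
    "load"
  } else {
    "missing"
  }
}

run_simulations <- FALSE
plan <- simulation_plan(run_simulations)

if (plan == "rerun") {
  message("The simulations would be re-run here (about an hour).")
} else if (plan == "load") {
  load("output_montes_small.RData")  # restores the saved simulation objects
} else {
  message("Saved results not found in the working directory.")
}
```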
If you choose to re-run the simulations and you are on Windows, there is a chance that you will be prompted for elevated network privileges. This is because we use a PSOCK cluster to run the simulations in parallel. Denying these privileges will cause the replication to fail.
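To make the PSOCK setup concrete, here is a minimal sketch of a socket cluster that uses all but one of the available cores. It relies only on the base parallel package as a stand-in; the replication itself lists doParallel and doRNG among its packages, and its actual cluster code may look different. The toy task and the seed are assumptions for illustration.

```r
library(parallel)

# All but one logical core, with a floor of one worker.
n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)

# PSOCK workers communicate over local sockets, which is why Windows
# may ask for network access when the cluster starts.
cl <- makeCluster(n_workers, type = "PSOCK")
results <- tryCatch({
  clusterSetRNGStream(cl, iseed = 2024)  # independent RNG stream per worker
  parLapply(cl, 1:4, function(i) mean(rnorm(100)))
}, finally = stopCluster(cl))            # always shut the workers down

print(unlist(results))
```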
Open the v109i01-replication.R script in RStudio.
Source the v109i01-replication.R script, either by pressing ctrl/cmd + shift + enter/return or by clicking Code --> Source with Echo at the top of RStudio. Alternatively, type source("v109i01-replication.R", echo = TRUE) into the console.
Output tables and figures will be in Figures/, and raw console output will be in v109i01-replication.txt.
Note: The simulations will use all but one of the available CPUs.
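For reference, the sketch below shows one generic way that source(..., echo = TRUE) output can be captured in a text file using sink(). Whether the replication script writes v109i01-replication.txt this way is not stated here; the toy script and temporary file names are assumptions for illustration.

```r
# Toy script standing in for the real replication script.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 1", "print(x)"), script)

# Divert console output to a log file while the script is sourced.
log_file <- tempfile(fileext = ".txt")
sink(log_file)
tryCatch(source(script, echo = TRUE), finally = sink())  # always restore output

log_lines <- readLines(log_file)
print(log_lines)
```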