Summary

Below are instructions for replicating Niehaus, Zhou, Cook, and Jun, “bizicount: Bivariate Zero-Inflated Count Copula Regression using R”, published in JSS v109/i01. The full replication takes roughly 1 hour on Debian GNU/Linux 6.6.15-amd64 using 7 logical CPUs of an Intel i7-10510U @ 1.80 GHz with 16 GB RAM. Alternatively, it takes about 30 minutes on Windows 10 Pro with an Intel i7-12700K (20 cores @ 4.5 GHz) and 32 GB DDR4 RAM. Note that the exact numerical results may differ slightly depending on the platform; see below for details.

Table of Contents

  1. Dependencies
  2. File descriptions
  3. Replicating the paper

Dependencies

R packages

The required R packages can be installed manually; however, if the replication instructions below are followed, they will be installed automatically. See the session information printed below, or the session_info.txt file, for the exact versions of these packages.

Platform and R package version dependencies

This replication was done on 64-bit Debian GNU/Linux with the default BLAS/LAPACK installation, using R version 4.3.3. Other platforms may use different C++ compiler versions, which could lead to differences in numerical results. Different linear algebra libraries may also give different results, because they use different algorithms for solving linear systems and differ in numerical precision.

The version of each package is listed below:

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux trixie/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Vienna
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RhpcBLASctl_0.23-42 doRNG_1.8.6         rngtools_1.5.2     
 [4] doParallel_1.0.17   iterators_1.0.14    foreach_1.5.2      
 [7] copula_1.1-3        ggplot2_3.5.0       tidyr_1.3.1        
[10] dplyr_1.1.4         bizicount_1.3.2    

loaded via a namespace (and not attached):
 [1] utf8_1.2.4          generics_0.1.3      lattice_0.22-5     
 [4] digest_0.6.35       lme4_1.1-35.1       magrittr_2.0.3     
 [7] grid_4.3.3          mvtnorm_1.2-4       Matrix_1.6-5       
[10] Formula_1.2-5       httr_1.4.7          purrr_1.0.2        
[13] fansi_1.0.6         scales_1.3.0        stabledist_0.7-1   
[16] pbivnorm_0.6.0      codetools_0.2-19    numDeriv_2016.8-1.1
[19] cli_3.6.2           rlang_1.1.3         pspline_1.0-19     
[22] texreg_1.39.3       gsl_2.1-8           munsell_0.5.0      
[25] splines_4.3.3       withr_3.0.0         tools_4.3.3        
[28] nloptr_2.0.3        minqa_1.2.6         colorspace_2.1-0   
[31] boot_1.3-30         vctrs_0.6.5         R6_2.5.1           
[34] stats4_4.3.3        lifecycle_1.0.4     ADGofTest_0.3      
[37] MASS_7.3-60.0.1     pcaPP_2.0-4         pkgconfig_2.0.3    
[40] pillar_1.9.0        gtable_0.3.4        glue_1.7.0         
[43] Rcpp_1.0.12         tibble_3.2.1        tidyselect_1.2.1   
[46] nlme_3.1-164        DHARMa_0.4.6        compiler_4.3.3     
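
To compare a local installation against the listing above, something along the following lines could be run before starting the replication. This is only a sketch: compare_versions() is a hypothetical helper that is not part of the replication materials, and the expected vector covers just a few of the attached packages (versions copied from the session information).

```r
# Hypothetical helper, not part of the replication materials: compare the
# installed package versions with the versions from the session information.
expected <- c(
  bizicount  = "1.3.2",
  copula     = "1.1-3",
  doRNG      = "1.8.6",
  doParallel = "1.0.17",
  foreach    = "1.5.2"
)

compare_versions <- function(expected) {
  pkgs <- names(expected)
  # NA marks packages that are not installed in the current library paths.
  installed <- vapply(pkgs, function(pkg) {
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  # Compare as version objects so that "1.1-3" and "1.1.3" count as equal.
  match <- vapply(seq_along(pkgs), function(i) {
    !is.na(installed[i]) &&
      package_version(installed[i]) == package_version(expected[i])
  }, logical(1))
  data.frame(
    package   = pkgs,
    expected  = unname(expected),
    installed = unname(installed),
    match     = match,
    row.names = NULL
  )
}

print(compare_versions(expected))

# R version and linear algebra libraries in use on this machine.
info <- sessionInfo()
cat("R:     ", R.version.string, "\n")
cat("BLAS:  ", info$BLAS, "\n")
cat("LAPACK:", info$LAPACK, "\n")
```

A mismatch does not necessarily prevent the replication from running, but it is one possible source of small numerical differences.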

File descriptions


Replicating the paper

Note: The Monte Carlo simulations can take about an hour to run. To skip them, set the run_simulations variable in the v109i01-replication.R script to FALSE. In that case, the results as produced for the paper are available in the output_montes_small.RData file.
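
The effect of the flag can be pictured roughly as follows. This is a sketch under assumptions, not the code from v109i01-replication.R: the flag and file names are taken from the note above, while load_stored_results() is a hypothetical helper introduced only for illustration.

```r
# Sketch only: the actual control flow lives in v109i01-replication.R.
run_simulations <- FALSE                        # flag named in the note above
results_file    <- "output_montes_small.RData"  # stored results from the paper

# Hypothetical helper: restores the stored results and returns the names of
# the restored objects, or NULL when the simulations should be re-run.
load_stored_results <- function(run_simulations, results_file,
                                envir = globalenv()) {
  if (run_simulations) {
    return(NULL)
  }
  if (!file.exists(results_file)) {
    stop("Stored results not found: ", results_file,
         " (is the working directory set to the script location?)")
  }
  # load() returns the names of the restored objects.
  load(results_file, envir = envir)
}

# Only attempt the load when the file is present in the working directory.
if (!run_simulations && file.exists(results_file)) {
  restored <- load_stored_results(run_simulations, results_file)
  print(restored)
}
```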

If you choose to re-run the simulations and you are on Windows, you may be prompted for elevated network privileges. This is because we use a PSOCK cluster to run the simulations in parallel. Denying these privileges will cause the replication to fail.

  1. Open the v109i01-replication.R script.
  2. Set working directory to the script location.
  3. Source the opened v109i01-replication.R script, either by pressing Ctrl/Cmd + Shift + Enter/Return or by clicking Code --> Source with Echo at the top of RStudio. Alternatively, type source("v109i01-replication.R", echo = TRUE) into the console.
  4. Output will be in Figures/, and raw console output will be in v109i01-replication.txt.

Note: The simulations will use all but one of the available CPUs.
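
For readers unfamiliar with PSOCK clusters, the following sketch shows the general pattern of starting one worker per CPU while leaving one CPU free. It is an illustration under assumptions, not the code from the replication script: it uses only the base parallel package rather than the doParallel/doRNG setup listed in the session information, and the seed and the toy simulation are arbitrary.

```r
library(parallel)  # ships with R

# Leave one CPU free, but never request fewer than one worker.
# detectCores() can return NA on some platforms, hence the fallback.
n_cores   <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# PSOCK workers are separate R processes that communicate with the main
# session over local sockets, which is what can trigger the Windows prompt.
cl <- makeCluster(n_workers, type = "PSOCK")

# Give each worker its own reproducible random number stream.
clusterSetRNGStream(cl, iseed = 123)  # arbitrary seed, for illustration only

# Toy stand-in for a Monte Carlo replicate: the mean of 100 normal draws.
sim_means <- parLapply(cl, 1:8, function(i) mean(rnorm(100)))

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(sim_means))
```

Because the workers are ordinary R processes, each one needs its own copy of the data it operates on, so memory use grows with the number of workers.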