Authors: Ranjit Lall and Thomas Robinson
We provide a single replication script (Code/code.R) that
substantively reproduces all results presented in the paper
(including a full imputation of the CCES data using MIDASpy).
This script takes approximately one hour to run. Generated figures
will be saved in the subdirectory Figures/Replication. To
facilitate comparison, Figures contains the figures
presented in the paper.
Note: due to the complex nature of our full tests, this is only a
substantive replication (in line with JSS guidelines). For a complete
replication, please run Code/full_code.R. This file has a
runtime of 1.4 days, most of which is spent on the hyperparameter
test.
All file paths in scripts are relative to the main replication folder.
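Because the paths are relative, the scripts need to be started from the replication folder itself. The sketch below shows one way to check this before launching a run. The helper name is_replication_root is hypothetical and not part of the replication materials, and the check only looks for Code/code.R and the Data directory mentioned in this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of the replication archive): succeeds only
# if the given directory (default: the current one) contains Code/code.R
# and a Data directory, as described in this README.
is_replication_root() {
    dir="${1:-.}"
    [ -f "$dir/Code/code.R" ] && [ -d "$dir/Data" ]
}

# Report whether the current directory looks like the replication folder.
if is_replication_root; then
    echo "OK: scripts can be run from here"
else
    echo "Not in the replication folder: cd there first"
fi
```

If the check passes, the script can then be started with, for example, Rscript Code/code.R (assuming Rscript is on your PATH).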
To aid replication, we include YAML files (in Data) that
initialize a conda environment with the correct
Python package dependencies.
Please ensure you have conda installed on your machine. Next, in a terminal window, navigate to this replication folder. Then, run the following at the command line:
conda env create -f Data/midas-env.yml
rMIDAS and MIDASpy are compatible with Apple's new ARM64 architecture. However, we recommend using the miniforge installer rather than anaconda or miniconda, as it offers better support for the ARM64 architecture.
Once you have installed miniforge, Apple Silicon users should navigate to this replication folder and run the following at the command line:
conda env create -f Data/midas-env-arm64.yml
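The choice between the two environment files can also be scripted. The following is a minimal sketch, not part of the replication materials: select_env_file is a hypothetical helper, and it assumes that every machine other than an Apple Silicon Mac (where uname reports Darwin and arm64) should use Data/midas-env.yml. It only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the environment file that matches the machine.
# The two arguments default to the values reported by uname; passing them
# explicitly is only useful for illustration or testing.
select_env_file() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        echo "Data/midas-env-arm64.yml"   # Apple Silicon
    else
        echo "Data/midas-env.yml"         # all other machines (assumption)
    fi
}

# Show the command that would create the environment on this machine.
echo "conda env create -f $(select_env_file)"
```

Run from the replication folder, the printed command can be copied into the terminal as is.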
We replicated this code with the available memory limited to 8GB. The script was also tested on a MacBook Pro with an Apple M1 Max chip using miniforge, and on an Ubuntu 22.04 Linux system.
The paper results generated from Code/full_code.R were
produced on an Amazon AWS EC2 server using a c6a.8xlarge instance with
64GB RAM and the Ubuntu 22.04 Server operating system.
Code/code.R: 58 minutes (single replication script - recommended)
Code/full_code.R: 1.4 days (full script for exact replication)
Code/py_example.py: 12 minutes 33 seconds (just the MIDASpy example)
As noted above, we provide a substantive replication script
(code.R) due to the lengthy runtime of the full replication
script (full_code.R). The two scripts differ in the
following ways:
In code.R, we specify a subset of categorical and
binary columns rather than using all such columns, as in
full_code.R. This reduces memory load.
code.R [...] rather than using all such columns, as in full_code.R.
In code.R, we do not run the hyperparameter or learning
rate tests in full_code.R to avoid lengthy runtimes.
Instead, we leave the code for these tests as comments and read in the
results of our original run to generate the figures.
[...] full_code.R, which matches the existing subsetted data in code.R.