Simulation Standards

Best practices for experiment design, inputs/outputs, and using Quest

Experiment design

When setting up a simulation experiment, be deliberate about the number of parameters to change and scenarios to run. A single large simulation rarely provides an immediate answer and is rarely set up correctly on the first try. Plan for multiple iterations that build on one another, so you gain confidence and understanding before building a larger simulation experiment.

  • Prepare a simulation plan (list of scenarios to run): which ones are the main simulation experiments you need to run in order to answer the research question? Which ones are out of scope?
    • You may find it helpful to create an Excel sheet with 1) the parameter names and relevant values to explore and 2) the combinations of parameter values that define each simulation experiment to run. An example is provided [HERE], and a minimal sketch of such a scenario grid is shown after this list. More advanced use also includes notes about the number of scenarios and the computational resources and time required.
  • Pilot before scaling up: Start simple and use a template scenario that has been validated, then add the intervention or feature of interest for one or a few settings before running all.
  • Technical feasibility before accurate predictions: It is OK to use placeholder parameters for test simulations while developing your code and scripts; however, do keep track of these and update them to the correct values as soon as initial testing is done.
    • In the pilot and test simulations, make sure to carefully investigate the input JSON and output files; see Reviewing input & output files below.
  • Give your simulation experiments meaningful names that can be versioned and tracked across iterations. For instance, test runs might include ‘test’ in their names, with a v0, v1, a date, or similar included in the experiment name or in the folder where your simulations are stored.
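
As referenced above, here is a minimal sketch of a scenario grid in Python. The parameter names (itn_coverage, smc_rounds) and values are hypothetical, chosen only to illustrate the structure of a simulation plan:

```python
from itertools import product

# 1) Parameter names and the relevant values to explore
parameters = {
    "itn_coverage": [0.4, 0.6, 0.8],  # hypothetical intervention coverage levels
    "smc_rounds": [0, 4],             # hypothetical: without/with the intervention
    "seed": list(range(5)),           # stochastic replicates per scenario
}

# 2) Each combination of parameter values defines one scenario to run
scenarios = [dict(zip(parameters, values)) for values in product(*parameters.values())]
print(f"{len(scenarios)} scenarios planned")

# A versioned, meaningful name makes the experiment easy to track across iterations
experiment_name = "itn_smc_sweep_test_v0"
```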

Some general best practices for scientific computing are described in PLOS Biology, in addition to what we specifically recommend for members of this team.

Reviewing input & output files

When designing new experiments, review the input and output files to make sure your simulations are doing what you think they are. It can be tricky to get everything set up correctly the first time, even for experienced EMOD users, so this review process will help you verify your setup before scaling up. Questions to check when investigating new simulation runs include:

  • Were the campaigns actually deployed, at the correct coverage and time?
  • How often were the campaigns deployed?
  • How does the simulated population change over time?
  • When running with burnin, was the burnin actually “picked up” successfully?
  • Small simulations allow for individual-level event reports: at what ages, and how often, did individuals receive an intervention or change properties? (See the sketch after this list.)
  • Look at the same metric (e.g., prevalence) not only aggregated over time, but also at finer resolution such as monthly.
  • Are the age bins correctly set up, extracted in the analyzer, and aggregated?
  • How do your plots compare to other known relationships?
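
The sketch below illustrates a few of these checks on a single simulation’s output folder, assuming the ReportEventRecorder.csv and InsetChart.json reports were requested. The column and channel names shown are typical EMOD defaults, but verify them against your own files:

```python
import json
import pandas as pd

# Were the campaigns actually deployed, when, and how often?
events = pd.read_csv("output/ReportEventRecorder.csv")
print(events.groupby("Event_Name")["Time"].agg(["count", "min", "max"]))

# At what ages did individuals receive an intervention?
# (EMOD event reports typically record Age in days.)
print((events["Age"] / 365).describe())

# How does the simulated population change over time?
with open("output/InsetChart.json") as f:
    channels = json.load(f)["Channels"]
print(channels["Statistical Population"]["Data"][:12])
```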

Using Quest

Quest is Northwestern’s high-performance computing (HPC) cluster, on which we run our EMOD simulations. Quest is a Linux-based HPC that uses the Slurm workload manager to schedule jobs among its users. Everyone will need to apply for access to the team’s Quest allocation, b1139, here. Once granted access, you will have 80 GB of space in your home directory and access to the team allocation, which has much more space. We recommend cloning GitHub repositories to your home directory but saving all outputs to an appropriate folder on the team allocation.
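
For instance, when configuring idmtools to submit simulations on Quest, the output location can be pointed at the team allocation. The sketch below assumes the idmtools Slurm platform is installed; the job_directory path is a placeholder, and the exact arguments depend on your idmtools version and project setup:

```python
from idmtools.core.platform_factory import Platform

# Sketch: send simulation outputs to the team allocation, not your home
# directory. The job_directory below is a placeholder path -- use your
# own project folder on b1139.
platform = Platform(
    "SLURM_LOCAL",
    job_directory="/projects/b1139/<your_project>/simulation_output",
    account="b1139",
    partition="b1139",   # assumed partition name; confirm for your allocation
    time="04:00:00",
)
```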

Resource Sharing

Because everyone on the team, as well as program participants, uses b1139, we need to be conscious of resource sharing. Please follow the best practices below so everyone can have a good experience using the cluster.

  • Be aware of how long you expect your jobs to run. If they will take a long time, it is considerate to run fewer simulations at once or to wait for times of low usage (such as evenings or weekends) to start them. You can enable email notifications for your submitted jobs using the #SBATCH --mail-type=ALL and #SBATCH --mail-user=<your_email@northwestern.edu> directives. Once submitted, you can also check job status in the terminal via squeue -u <username> or squeue -A b1139.
  • When running example exercises or testing out new projects, run simulations on the ‘b1139testnode’ partition, where you are less likely to be blocked by larger or longer jobs.
  • If you must run a “big job” with many simulations, discuss any urgent needs for the cluster first, as others may have time-sensitive projects. Submit fewer than 100 jobs at a time on b1139 to avoid “clogging” the cluster. It is easy to limit the number of jobs that run at one time with idmtools (see the sketch after this list): you can submit all your jobs at once, but only the specified maximum (“max_running_jobs”) will run at a time. As simulations finish, the next ones start automatically.
  • Debug your simulations with small pilots (see experiment design) to make sure your simulations do what you expect before scaling up.
  • To manage disk space, you can check the space used by typing homedu or checkproject b1139 in the terminal. Simulations should typically be stored in the respective project folders on b1139, both so they are accessible to other team members for troubleshooting and because storage limits on home directories are low. Be sure to remove old and/or failed simulations once they are no longer needed, as they can occupy a great deal of storage space.
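
As referenced above, here is a sketch of throttling concurrent jobs with the idmtools Slurm platform. The keyword is max_running_jobs in recent idmtools-platform-slurm releases, but verify it against your installed version:

```python
from idmtools.core.platform_factory import Platform

# Sketch: submit the whole experiment but cap how many jobs run at once.
platform = Platform(
    "SLURM_LOCAL",
    job_directory="/projects/b1139/<your_project>/simulation_output",  # placeholder path
    max_running_jobs=10,  # at most 10 simulations run at a time; the rest wait in the queue
)
```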