|simrisc(1)||simrisc breast cancer simulation program||simrisc(1)|
NAME¶simrisc - This program performs simulations in the context of breast cancer
SYNOPSIS¶simrisc [options] analyses
The analyses argument is the name of the file specifying the analyses to perform. See section ANALYSES for details.
DESCRIPTION¶Simrisc was originally designed around 2010 by Marcel Greuter at the University Medical Center Groningen, and thereafter modified in 2015 by Chris de Jonge.
Changes introduced in version 14.04.00¶
- Parameters affected by spread: true
Parameters that may vary are specified using triplets: value, spread and distribution. The spread value and the distribution name are optional, but they must either both be specified or both be omitted. If they are omitted, the value parameter does not vary, even when spread: true is specified;
- The Mammo, Tomo, and MRI modalities are provided with std.dev and distribution parameters for their Dose, M, Beta, Specificity, and Sensitivity parameters;
- When spread: true is specified the actually used and original parameter values are listed in a file, by default spread-$.txt, where $ is replaced by the loop iteration index. Use the option -s to specify a non-default filename (cf. simrisc(1));
- Age ranges no longer have trailing colons;
- The Case-specific data matrix defines an extra (18th) column, showing the results of the screening rounds for each simulated case;
- The order of the beir7 beta and eta parameters is reversed: eta is specified first, followed by beta. The spread and distribution parameters following beta apply to beta, and not to eta, which is a fixed value.
OPTIONS¶Short options are provided between parentheses, immediately following their long option equivalents. Several parameters specify the path-names of files produced by simrisc. If a path-name starts with a tilde character (~) then the tilde is replaced by the user’s home directory. An initial + is replaced by the program’s base directory (see option base). When an analysis uses multiple iterations then `$’ characters in filename specifications are replaced by the analysis’ iteration index.
All single-letter options referring to filesystem entries (directories, filenames) are capitalized, all other single-letter options are lowercase.
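The path-name rules above can be illustrated with a small Python sketch. It is not simrisc's own code: the function name expand_path and its parameters are hypothetical, and it only mirrors the three substitutions described in this section (leading ~, leading +, and $).

```python
import os
from typing import Optional


def expand_path(spec: str, base_dir: str = "./",
                iteration: Optional[int] = None) -> str:
    """Mimic the documented path-name substitutions (illustrative only)."""
    if spec.startswith("~"):
        # a leading tilde is replaced by the user's home directory
        spec = os.path.expanduser("~") + spec[1:]
    elif spec.startswith("+"):
        # a leading + is replaced by the base directory (option --base)
        spec = os.path.join(base_dir, spec[1:].lstrip("/"))
    if iteration is not None:
        # $ characters are replaced by the analysis' iteration index
        spec = spec.replace("$", str(iteration))
    return spec


print(expand_path("+/data-$.txt", base_dir="/tmp/sim", iteration=3))
```

With these assumptions the default data file specification, combined with a base directory of /tmp/sim and iteration index 3, becomes /tmp/sim/data-3.txt.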
- --base=basedir (-B)
the base directory where the output files will be written. By default ./. If basedir doesn’t exist it is created by the program. If the directory cannot be created an exception is thrown, terminating the program. The basedir specifications may specify relative or absolute directory locations;
- --config=path (-C)
the location of the configuration file. By default ~/.config/simrisc is used.
- --data=path (-D)
path name of the file to contain the data of the cases generated by the simulation (default: ’<base>/data-$.txt’). If a data file should not be written specify ! (mnemonic: the logical not operator, i.e., --data !). See section OUTPUT for a description of the generated data;
- --death-age=age (-a)
run one simulation using a specific natural death-age. This option also requires the specification of tumor-age, and is mutually exclusive with the case option;
- --err (-e)
before version 14.05.00 the Beir7 risk vector was computed using an incorrect algorithm. The error was fixed in simrisc 14.05.00. When it is required to use the erroneous algorithm, e.g., to compare the results of simulations with previously obtained results, option -e must be specified. However, option -e should not be used for real simulations;
- --help (-h)
shows help information and terminates;
- --last-case=nCases (-l)
perform simulations until nCases cases have been analyzed and only write the data for the final case to the data file. The rounds and sensitivity files contain the summarized results of all nCases analyzed cases;
- --one-analysis (-o)
the program’s arguments specify the parameters of a single analysis, rather than the name of an analyses-specification file. The program’s arguments are optional and are used to alter the parameter values as defined in the config file or to define label specifications. See section ANALYSES for details;
- --parameters=path (-P)
path name of the file showing the actually used parameter specifications. By default no parameter file is written;
- --rounds=path (-R)
path name of the file to contain the summary info of the simulation rounds (default: ’<base>/rounds-$.txt’). If a rounds file should not be written specify ! (i.e., --rounds !). See section OUTPUT for a description of the generated summary info;
- --spread=path (-s)
path name of the file to contain the actually used and original parameter values when spread: true is specified (default: ’<base>/spread-$.txt’). If this file should not be written specify ! (mnemonic: the logical not operator, i.e., --spread !). See section OUTPUT for a sample of its content;
- --sensitivity=path (-S)
path name of the file to contain the summary info of the simulation’s sensitivity data (default: ’<base>/sensitivity.txt’). If a sensitivity file should not be written specify ! (i.e., --sensitivity !). See section OUTPUT for a description of the produced sensitivity summary;
- --tumor-age=age (-t)
run one simulation using a specific tumor self-detect age. This option also requires the specification of death-age, and is mutually exclusive with the case option;
- --verbose (-V)
provides additional information while running;
- --version (-v)
shows simrisc’s version information and terminates;
ANALYSES¶Unless the one-analysis option is used the program’s first and only required argument is the name of a file providing the details of the analyses to perform. Those files are called analysis files and must be standard ascii text files. I.e., they only contain 7-bit ascii printable and white-space characters. The identifiers used in analysis files and in configuration files are interpreted case sensitively.
Parameter specifications starting with uppercase letters (like Analysis: and Scenario:) specify (sub)sections and contain no additional specifications. Specifications starting with lowercase letters (like ageGroup:) are followed by actual parameter values. For a complete overview refer to the simriscparams(7) man-page.
Analysis files may define multiple analyses. Each analysis specification must begin with a line containing
    Analysis:
At each Analysis: specification the program’s initial configuration is reset: the default option values are used unless redefined by explicitly provided command-line options. Explicitly provided command-line options cannot be altered by specifications in Analysis: sections and remain active while simrisc is running.
Following Analysis: lines the characteristics of the analysis are specified. These specifications may, in the following order:
- define label: lines (label: lines, when used, must immediately follow Analysis: lines). The text following label: lines is written at the top of the output files;
- alter default simrisc options;
- redefine parameter specifications of configuration files;
All specifications in Analysis: sections are optional. An Analysis: specification merely containing the line Analysis: defines an analysis using the explicitly specified command-line options or the default program options and using the parameter specifications provided in the configuration file.
Empty lines, trailing white-space, and all characters on lines starting at the hash-mark (#) are ignored and may be used anywhere in analysis files.
Lines not conforming to the above description result in error messages, causing simrisc to end.
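As an illustration of the layout rules described above, the following Python sketch splits the text of an analysis file into its Analysis: sections while dropping comments, trailing white-space and empty lines. It is not simrisc's parser: the function name split_analyses is hypothetical, and it does not validate the individual specifications.

```python
from typing import List


def split_analyses(text: str) -> List[List[str]]:
    """Return one list of specification lines per Analysis: section."""
    analyses: List[List[str]] = []
    for raw in text.splitlines():
        # ignore everything from the hash-mark on, and surrounding blanks
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue                      # empty lines are ignored
        if line == "Analysis:":
            analyses.append([])           # a new analysis starts here
        elif not analyses:
            raise ValueError(f"specification before first Analysis: {line!r}")
        else:
            analyses[-1].append(line)
    return analyses


sample = """\
# two analyses
Analysis:
label: first run
base: /tmp/      # output directory

Analysis:
Scenario:
cases: 1000
"""
print(split_analyses(sample))
```

The second Analysis: section in the sample starts from an empty list again, mirroring the reset of the configuration at each Analysis: line.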
When parameters of configuration file sections (cf. simriscparams(7)) are omitted from Analysis: sections then the parameters as specified in the configuration file are used.
When program options are specified their long option names must be used. E.g.:
    base: /tmp/
    last-case: 20
Command-line options always overrule options specified in analysis files.
Multiple analysis sections should not use identically named output files, as the output files are (re)written for each separate analysis.
Analysis sections are commonly used to alter the default specifications of the configuration file. E.g., the default number of iterations equals 1. By specifying
    iterations: 3
the analysis performs 3 iterations.
Parameters are either read from the configuration file or redefined in Analysis: sections. E.g., in the provided configuration file screening rounds use two-year intervals between the ages of 50 and 74. To use 5-year intervals between the ages of 50 and 65 instead, an Analysis: specification could contain, e.g.,
    Screening:
    round: 50 Mammo MRI
    round: 55 Mammo MRI
    round: 60 Mammo MRI
    round: 65 Mammo MRI
When the --one-analysis option is used parameters may be altered by providing comma-separated parameter specifications as program command-line arguments. E.g., to perform one analysis, writing the data file to /tmp/data, simulating 1000 cases, and using 20 as seed for the random number generator the command
    simrisc -D /tmp/data -o Scenario:, cases: 1000, seed: 20
can be used. Note that when using the one-analysis option parameter section names must precede parameter specifications. E.g., since the parameters cases and seed are defined in the `Scenario’ section (cf. simriscparams(7)) they must be preceded by the Scenario: specification.
Subsequent Analysis: specifications in analysis files use the program options as specified on the command line (or, if not specified, the default program options) and use the configuration file’s parameter specifications. So when an Analysis: specification modifies parameters, then subsequent Analysis: sections start from the unmodified option and parameter specifications.
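The shell splits the command-line specifications of the one-analysis example at blanks, not at commas. The Python sketch below shows one way to regroup such arguments into separate specifications. It is an assumption about the general idea only, not a description of how simrisc itself processes its arguments, and the function name split_specs is hypothetical.

```python
def split_specs(args):
    """Join the shell-split arguments and split them again at commas."""
    joined = " ".join(args)
    return [spec.strip() for spec in joined.split(",") if spec.strip()]


# the non-option arguments of the example command, as delivered by the shell
args = ["Scenario:,", "cases:", "1000,", "seed:", "20"]
print(split_specs(args))
```

The section name Scenario: ends up as the first specification, followed by the two parameter specifications that belong to it.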
When Analysis: sections specify base: directory locations then file specifications using an initial `+’ character to represent the base directory (like --data +/data.txt) use the base directory specified at the Analysis:’s base: specification.
OUTPUT¶The first lines of the generated files contain time stamps showing the date and time when the files were written and the used SimRisc version. Here is an example, following the RFC 2822 format for the timestamp:
    Thu, 08 Apr 2021 21:43:09 +0200 (SimRisc V. 14.04.00)
If label: lines are used then the time stamp is followed by the label specifications, which are then followed by an empty line. After this header the file’s specific data are shown.
The data in all files (except for the file listing the actually used parameters (option --parameters (-P))) are written using the standard comma-separated format (cf. RFC 4180). The initial lines contain table headings and column labels documenting the meanings of the various columns. Likewise there is a final line ending the tables.
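A time stamp line like the one in the example can be split into its date and version parts with Python's standard library. The sketch below is illustrative only: the regular expression is derived from the single example line shown in this section, and the function name parse_header is hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

# pattern inferred from the example header line, not from a formal specification
HEADER = re.compile(r"^(?P<stamp>.+?)\s+\(SimRisc V\. (?P<version>[\d.]+)\)$")


def parse_header(line):
    """Return (datetime, version string) for a simrisc time stamp line."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a time stamp line: {line!r}")
    # the time stamp itself follows the RFC 2822 date format
    return parsedate_to_datetime(match.group("stamp")), match.group("version")


when, version = parse_header(
    "Thu, 08 Apr 2021 21:43:09 +0200 (SimRisc V. 14.04.00)")
print(when.isoformat(), version)
```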
Data of simulated cases
For each simulated case the values of the following variables are written to file (one line of comma-separated values per simulated case):
- case: the (0-based) case-index;
- cause of death: either Natural or Tumor;
- death age: the case’s age of death;
- natural death age: the case’s natural age of death (if no tumor occurs);
- death status: a numeric index specifying how and at what stage the case died:
1: natural death in the pre-screening phase,
2: natural death in the screening phase,
3: natural death in the post-screening phase,
4: tumor caused death in the pre-screening phase,
5: tumor caused death in the screening phase,
6: tumor caused death in the post-screening phase;
- tumor present: Yes if the simulation resulted in a tumor, No if no tumor occurred;
- tumor detected: Yes if the tumor was detected, No if not;
- interval tumor: Yes if the tumor was an interval tumor, No if not;
- tumor diameter: the tumor’s diameter in mm when it was detected. 0.00 is shown if no tumor occurred. In the exceptional case where the simulation produced a tumor whose diameter exceeded 1000 mm the value 1001 is shown.
- tumor doubling days: the time (in days) it takes for the tumor to double its size;
- tumor preclinical period: the age at which the tumor is potentially detectable by mammographic screening;
- tumor onset age: the age at which the tumor first occurred;
- tumor self-detect age: the age at which the tumor was self-detected. This age is the result of the simulation, and may exceed the case’s actual death age (if so, the case’s data report that no tumor is present);
- tumor death age: the age at which the tumor caused or would have caused the case’s death. The simulation process uses ages ranging from 0 through 100. If the age at which the tumor causes the case’s death exceeds 100, then 100.00 is reported;
- costs: the case’s screening and (if applicable) treatment costs;
- self-detection indicator: 1 if the tumor was self-detected, 0 if not (also if there’s no tumor);
- detection round: 0-based round index at which the tumor was detected (or 1) if the tumor was self-detected, 0 if not (also if there’s no tumor).
- screening rounds: this column shows which screening rounds were attended by the simulated case, and if so whether false negative or false positive diagnoses were made. The following digits are used:
- 0: the case did not attend this screening round;
- 1: the case did attend this screening round;
- 2: the case did attend this screening round, resulting in a false negative diagnosis;
- 3: the case did attend this screening round, resulting in a false positive diagnosis.
There are as many digits as screening rounds. The leftmost digit refers to the first screening round, the rightmost digit to the last screening round. E.g., using 13 screening rounds the following indicators could be obtained:
    0011311110000
Using screening round indices (which are also used to refer to rounds in the rounds-$.txt files), this case did not attend screening rounds 0, 1, 9, 10, 11 and 12, and at round 4 a false positive diagnosis was obtained.
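The numeric codes of the death status and screening rounds columns can be decoded with simple lookup tables. The Python sketch below uses only the codes listed in this section. The names DEATH_STATUS, ROUND_CODES and rounds_with are hypothetical and not part of simrisc.

```python
# death status index -> (cause of death, phase), as listed above
DEATH_STATUS = {
    1: ("Natural", "pre-screening"),
    2: ("Natural", "screening"),
    3: ("Natural", "post-screening"),
    4: ("Tumor", "pre-screening"),
    5: ("Tumor", "screening"),
    6: ("Tumor", "post-screening"),
}

# meaning of the digits in the screening rounds column
ROUND_CODES = {
    "0": "not attended",
    "1": "attended",
    "2": "attended, false negative",
    "3": "attended, false positive",
}


def rounds_with(digits, code):
    """Return the 0-based indices of the rounds having the given digit."""
    return [index for index, digit in enumerate(digits) if digit == code]


indicators = "0011311110000"
print(rounds_with(indicators, "0"))   # rounds that were not attended
print(rounds_with(indicators, "3"))   # rounds with a false positive
print([ROUND_CODES[digit] for digit in indicators][:5])
```

For the sample indicators the first call lists rounds 0, 1, 9, 10, 11 and 12, and the second call lists round 4.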
Actually used spread-values
When spread: true is specified then by default the actually used and original parameter values are written to the file spread-$.txt, where $ is replaced by the loop’s iteration index. Here is a sample from the content of such a file, showing the values of modality Mammo’s Beta parameters:
    Beta:
    nr: 1 using -4.37906 (configured: -4.38)
    nr: 2 using 0.490436 (configured: 0.49)
    nr: 3 using -1.33749 (configured: -1.34)
    nr: 4 using -7.22025 (configured: -7.18)
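Entries like those in the sample can be extracted with a regular expression, e.g., to compare the used values with the configured ones. The Python sketch below is based solely on the sample shown here. The complete layout of spread files may differ, and the name parse_spread is hypothetical.

```python
import re

# one entry: "nr: <index> using <used value> (configured: <configured value>)"
ENTRY = re.compile(r"nr:\s*(\d+)\s+using\s+(\S+)\s+\(configured:\s*([^)\s]+)\)")


def parse_spread(text):
    """Return (nr, used, configured) tuples for all entries found in text."""
    return [(int(nr), float(used), float(configured))
            for nr, used, configured in ENTRY.findall(text)]


sample = """\
Beta:
nr: 1 using -4.37906 (configured: -4.38)
nr: 2 using 0.490436 (configured: 0.49)
nr: 3 using -1.33749 (configured: -1.34)
nr: 4 using -7.22025 (configured: -7.18)
"""
for nr, used, configured in parse_spread(sample):
    # show how far each used value deviates from its configured value
    print(nr, used, configured, round(used - configured, 5))
```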
FILES¶
- ~/.config/simrisc: the default location of the program’s configuration file;
- the simrisc distribution archive contains the default configuration file as simrisc-VERSION/stdconfig/simrisc, where VERSION is replaced by simrisc’s actual release version;
- when installing simrisc using Linux distribution archives (e.g., .deb files) the default configuration file is commonly available as /usr/share/doc/simrisc/simrisc.gz.
COPYRIGHT¶This is free software, distributed under the terms of the GNU General Public License (GPL).
AUTHOR¶Frank B. Brokken (email@example.com),