cnvkit_batch - Run the complete CNVkit pipeline on one or more BAM files.


usage: cnvkit batch [-h] [-m {hybrid,amplicon,wgs}]
                    [--segment-method {cbs,flasso,haar,none,hmm,hmm-tumor,hmm-germline}]
                    [-y] [-c] [--drop-low-coverage] [-p [PROCESSES]]
                    [--rscript-path PATH] [-n [FILES ...]] [-f FILENAME]
                    [-t FILENAME] [-a FILENAME] [--annotate FILENAME]
                    [--short-names] [--target-avg-size TARGET_AVG_SIZE]
                    [-g FILENAME] [--antitarget-avg-size ANTITARGET_AVG_SIZE]
                    [--antitarget-min-size ANTITARGET_MIN_SIZE]
                    [--output-reference FILENAME] [--cluster] [-r REFERENCE]
                    [-d DIRECTORY] [--scatter] [--diagram]
                    [bam_files ...]

positional arguments:

bam_files
    Mapped sequence reads (.bam)


options:

-h
    Show this help message and exit.
-m {hybrid,amplicon,wgs}
    Sequencing assay type: hybridization capture ('hybrid'), targeted amplicon sequencing ('amplicon'), or whole genome sequencing ('wgs'). Determines whether and how to use antitarget bins. [Default: hybrid]
--segment-method {cbs,flasso,haar,none,hmm,hmm-tumor,hmm-germline}
    Method used in the 'segment' step. [Default: cbs]
-y
    Use or assume a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
-c
    Get read depths by counting read midpoints within each bin. (An alternative algorithm.)
--drop-low-coverage
    Drop very-low-coverage bins before segmentation to avoid false-positive deletions in poor-quality tumor samples.
-p [PROCESSES]
    Number of subprocesses used to run the BAM files in parallel. Without an argument, use the maximum number of available CPUs. [Default: process each BAM in serial]
--rscript-path PATH
    Path to the Rscript executable to use for running R code. Use this option to specify a non-default R installation. [Default: Rscript]
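
The +1 and -1 values quoted for the male-reference option follow from log2 copy ratios. The sketch below works through that arithmetic, assuming one copy of chrX in a male genome and two in a female genome. The function name is illustrative and not part of CNVkit.

```python
import math


def expected_chrx_log2(sample_x_copies: int, reference_x_copies: int) -> float:
    """Expected chrX log2 copy ratio of a sample relative to a reference."""
    return math.log2(sample_x_copies / reference_x_copies)


# Male reference (one chrX copy), as assumed with -y:
female_vs_male_ref = expected_chrx_log2(2, 1)  # +1.0
male_vs_male_ref = expected_chrx_log2(1, 1)    # 0.0

# Female reference (two chrX copies), as assumed without -y:
male_vs_female_ref = expected_chrx_log2(1, 2)    # -1.0
female_vs_female_ref = expected_chrx_log2(2, 2)  # 0.0

print(female_vs_male_ref, male_vs_male_ref, male_vs_female_ref, female_vs_female_ref)
```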

To construct a new copy number reference:

-n [FILES ...]
    Normal samples (.bam) used to construct the pooled, paired, or flat reference. If this option is used but no filenames are given, a "flat" reference will be built. Otherwise, all filenames following this option will be used.
-f FILENAME
    Reference genome, FASTA format (e.g. UCSC hg19.fa).
-t FILENAME
    Target intervals (.bed or .list).
-a FILENAME
    Antitarget intervals (.bed or .list).
--annotate FILENAME
    Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar.
--short-names
    Reduce multi-accession bait labels to be short and consistent.
--target-avg-size TARGET_AVG_SIZE
    Average size of split target bins (results are approximate).
-g FILENAME
    Regions of accessible sequence on chromosomes (.bed), as output by the 'access' command.
--antitarget-avg-size ANTITARGET_AVG_SIZE
    Average size of antitarget bins (results are approximate).
--antitarget-min-size ANTITARGET_MIN_SIZE
    Minimum size of antitarget bins (smaller regions are dropped).
--output-reference FILENAME
    Output filename/path for the new reference file being created. (If given, ignores the -d/--output-dir option and writes the file to the given path. Otherwise, "reference.cnn" will be created in the current directory or the specified output directory.)
--cluster
    Calculate and use cluster-specific summary stats in the reference pool to normalize samples.
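
The options for building a new reference can be combined as in the sketch below. It only assembles and prints a command line; it does not run CNVkit. The helper name and all file names are hypothetical, and the program name is copied from the synopsis (an installation may expose it under another name, such as cnvkit.py). The BAM files to analyze are placed before -n so that they are not read as normal samples.

```python
import shlex


def build_new_reference_cmd(tumor_bams, normal_bams, fasta, targets,
                            output_reference=None, output_dir=None):
    """Assemble an argv list for 'cnvkit batch' that builds a new reference.

    An empty normal_bams list yields a bare '-n', which requests a flat
    reference according to the option description.
    """
    cmd = ["cnvkit", "batch", *tumor_bams]
    cmd += ["-n", *normal_bams]          # normals for a pooled or paired reference
    cmd += ["-f", fasta, "-t", targets]  # reference genome and target intervals
    if output_reference:
        cmd += ["--output-reference", output_reference]
    if output_dir:
        cmd += ["-d", output_dir]
    return cmd


# Pooled reference from two normal samples (hypothetical file names).
pooled = build_new_reference_cmd(
    ["tumor.bam"], ["normal1.bam", "normal2.bam"], "hg19.fa", "targets.bed",
    output_reference="my_reference.cnn", output_dir="results")
print(shlex.join(pooled))

# Flat reference: -n is given without any file names.
flat = build_new_reference_cmd(["tumor.bam"], [], "hg19.fa", "targets.bed")
print(shlex.join(flat))
```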

To reuse an existing reference:

-r REFERENCE
    Copy number reference file (.cnn).

Output options:

-d DIRECTORY
    Output directory.
--scatter
    Create a whole-genome copy ratio profile as a PDF scatter plot.
--diagram
    Create an ideogram of copy ratios on chromosomes as a PDF.
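
Once a reference exists, later runs can reuse it and request the plots described above. The sketch below assembles such a command line without running it. The file names and the process count are hypothetical, and the program name is copied from the synopsis.

```python
import shlex

# Samples to process against an existing reference (hypothetical file names).
bam_files = ["sample1.bam", "sample2.bam"]

reuse_cmd = [
    "cnvkit", "batch", *bam_files,
    "-r", "my_reference.cnn",  # existing copy number reference
    "-p", "4",                 # process up to four BAM files in parallel
    "-d", "results",           # output directory
    "--scatter",               # PDF scatter plot of the copy ratio profile
    "--diagram",               # PDF ideogram of copy ratios
]
print(shlex.join(reuse_cmd))
```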
September 2022 cnvkit batch 0.9.9