.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.TH SNIFFLES "1" "February 2022" "sniffles 2.0.2" "User Commands"
.SH NAME
sniffles \- structural variation caller using third-generation sequencing
.SH DESCRIPTION
usage: sniffles \fB\-\-input\fR SORTED_INPUT.bam [\-\-vcf OUTPUT.vcf] [\-\-snf MERGEABLE_OUTPUT.snf] [\-\-threads 4] [\-\-non\-germline]
.PP
Sniffles2: A fast structural variant (SV) caller for long\-read sequencing data
.IP
Version 2.0.2
.br
Contact: moritz.g.smolka@gmail.com
.IP
Usage example A \- Call SVs for a single sample:
.IP
sniffles \fB\-\-input\fR sorted_indexed_alignments.bam \fB\-\-vcf\fR output.vcf
.IP
\&... OR, with CRAM input and bgzipped+tabix indexed VCF output:
.IP
sniffles \fB\-\-input\fR sample.cram \fB\-\-vcf\fR output.vcf.gz
.IP
\&... OR, producing only a SNF file with SV candidates for later multi\-sample calling:
.IP
sniffles \fB\-\-input\fR sample1.bam \fB\-\-snf\fR sample1.snf
.IP
\&... OR, simultaneously producing a single\-sample VCF and SNF file for later multi\-sample calling:
.IP
sniffles \fB\-\-input\fR sample1.bam \fB\-\-vcf\fR sample1.vcf.gz \fB\-\-snf\fR sample1.snf
.IP
\&... OR, with additional options to specify tandem repeat annotations (for improved call accuracy), reference (for DEL sequences) and non\-germline mode for detecting rare SVs:
.IP
sniffles \fB\-\-input\fR sample1.bam \fB\-\-vcf\fR sample1.vcf.gz \fB\-\-tandem\-repeats\fR tandem_repeats.bed \fB\-\-reference\fR genome.fa \fB\-\-non\-germline\fR
.IP
Usage example B \- Multi\-sample calling:
.IP
Step 1. Create .snf for each sample: sniffles \fB\-\-input\fR sample1.bam \fB\-\-snf\fR sample1.snf
.br
Step 2. Combined calling: sniffles \fB\-\-input\fR sample1.snf sample2.snf ... sampleN.snf \fB\-\-vcf\fR multisample.vcf
.IP
\&... OR, using a .tsv file containing a list of .snf files, and custom sample ids in an optional second column (one sample per line):
.br
Step 2.
Combined calling: sniffles \fB\-\-input\fR snf_files_list.tsv \fB\-\-vcf\fR multisample.vcf
.IP
Usage example C \- Determine genotypes for a set of known SVs (force calling):
.IP
sniffles \fB\-\-input\fR sample.bam \fB\-\-genotype\-vcf\fR input_known_svs.vcf \fB\-\-vcf\fR output_genotypes.vcf
.IP
Use \fB\-\-help\fR for full parameter/usage information
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-\-version\fR
show program's version number and exit
.SS "Common parameters:"
.TP
\fB\-i\fR IN [IN ...], \fB\-\-input\fR IN [IN ...]
For single\-sample calling: A coordinate\-sorted and indexed .bam/.cram (BAM/CRAM format) file containing aligned reads. \- OR \- For multi\-sample calling: Multiple .snf files (generated before by running Sniffles2 for individual samples with \fB\-\-snf\fR) (default: None)
.TP
\fB\-v\fR OUT.vcf, \fB\-\-vcf\fR OUT.vcf
VCF output filename to write the called and refined SVs to. If the given filename ends with .gz, the VCF file will be automatically bgzipped and a .tbi index built for it. (default: None)
.TP
\fB\-\-snf\fR OUT.snf
Sniffles2 file (.snf) output filename to store candidates for later multi\-sample calling (default: None)
.TP
\fB\-\-reference\fR reference.fasta
(Optional) Reference sequence the reads were aligned against. To enable output of deletion SV sequences, this parameter must be set. (default: None)
.TP
\fB\-\-tandem\-repeats\fR IN.bed
(Optional) Input .bed file containing tandem repeat annotations for the reference genome.
(default: None)
.TP
\fB\-\-non\-germline\fR
Call non\-germline SVs (rare, somatic or mosaic SVs) (default: False)
.TP
\fB\-\-phase\fR
Determine phase for SV calls (requires the input alignments to be phased) (default: False)
.TP
\fB\-t\fR N, \fB\-\-threads\fR N
Number of parallel threads to use (speed\-up for multi\-core CPUs) (default: 4)
.SS "SV Filtering parameters:"
.TP
\fB\-\-minsupport\fR auto
Minimum number of supporting reads for a SV to be reported (default: automatically choose based on coverage) (default: auto)
.TP
\fB\-\-minsupport\-auto\-mult\fR 0.1/0.025
Coverage based minimum support multiplier for germline/non\-germline modes (only for auto minsupport) (default: None)
.TP
\fB\-\-minsvlen\fR N
Minimum SV length (in bp) (default: 35)
.TP
\fB\-\-minsvlen\-screen\-ratio\fR N
Minimum length for SV candidates (as fraction of \fB\-\-minsvlen\fR) (default: 0.95)
.TP
\fB\-\-mapq\fR N
Alignments with mapping quality lower than this value will be ignored (default: 25)
.TP
\fB\-\-no\-qc\fR
Output all SV candidates, disregarding quality control steps. (default: False)
.TP
\fB\-\-qc\-stdev\fR True
Apply filtering based on SV start position and length standard deviation (default: True)
.TP
\fB\-\-qc\-stdev\-abs\-max\fR N
Maximum standard deviation for SV length and size (in bp) (default: 500)
.TP
\fB\-\-qc\-strand\fR False
Apply filtering based on strand support of SV calls (default: False)
.TP
\fB\-\-qc\-coverage\fR N
Minimum surrounding region coverage of SV calls (default: 1)
.TP
\fB\-\-long\-ins\-length\fR 2500
Insertion SVs longer than this value are considered as hard to detect based on the aligner and read length and subjected to more sensitive filtering.
(default: 2500)
.TP
\fB\-\-long\-del\-length\fR 50000
Deletion SVs longer than this value are subjected to central coverage drop\-based filtering (Not applicable for \fB\-\-non\-germline\fR) (default: 50000)
.TP
\fB\-\-long\-del\-coverage\fR 0.66
Long deletions with central coverage (in relation to upstream/downstream coverage) higher than this value will be filtered (Not applicable for \fB\-\-non\-germline\fR) (default: 0.66)
.TP
\fB\-\-long\-dup\-length\fR 50000
Duplication SVs longer than this value are subjected to central coverage increase\-based filtering (Not applicable for \fB\-\-non\-germline\fR) (default: 50000)
.TP
\fB\-\-long\-dup\-coverage\fR 1.33
Long duplications with central coverage (in relation to upstream/downstream coverage) lower than this value will be filtered (Not applicable for \fB\-\-non\-germline\fR) (default: 1.33)
.TP
\fB\-\-max\-splits\-kb\fR N
Additional number of splits per kilobase read sequence allowed before reads are ignored (default: 0.1)
.TP
\fB\-\-max\-splits\-base\fR N
Base number of splits allowed before reads are ignored (in addition to \fB\-\-max\-splits\-kb\fR) (default: 3)
.TP
\fB\-\-min\-alignment\-length\fR N
Reads with alignments shorter than this length (in bp) will be ignored (default: 1000)
.TP
\fB\-\-phase\-conflict\-threshold\fR F
Maximum fraction of conflicting reads permitted for SV phase information to be labelled as PASS (only for \fB\-\-phase\fR) (default: 0.1)
.TP
\fB\-\-detect\-large\-ins\fR True
Infer insertions that are longer than most reads and therefore are spanned by few alignments only. (default: True)
.SS "SV Clustering parameters:"
.TP
\fB\-\-cluster\-binsize\fR N
Initial screening bin size in bp (default: 100)
.TP
\fB\-\-cluster\-r\fR R
Multiplier for SV start position standard deviation criterion in cluster merging (default: 2.5)
.TP
\fB\-\-cluster\-repeat\-h\fR H
Multiplier for mean SV length criterion for tandem repeat cluster merging (default: 1.5)
.TP
\fB\-\-cluster\-repeat\-h\-max\fR N
Max.\&
merging distance based on SV length criterion for tandem repeat cluster merging (default: 1000)
.TP
\fB\-\-cluster\-merge\-pos\fR N
Max. merging distance for insertions and deletions on the same read and cluster in non\-repeat regions (default: 150)
.TP
\fB\-\-cluster\-merge\-len\fR F
Max. size difference for merging SVs as fraction of SV length (default: 0.33)
.TP
\fB\-\-cluster\-merge\-bnd\fR N
Max. merging distance for breakend SV candidates. (default: 1500)
.SS "SV Genotyping parameters:"
.TP
\fB\-\-genotype\-ploidy\fR N
Sample ploidy (currently fixed at value 2) (default: 2)
.TP
\fB\-\-genotype\-error\fR N
Estimated false positive rate for leads (relating to total coverage) (default: 0.05)
.TP
\fB\-\-sample\-id\fR SAMPLE_ID
Custom ID for this sample, used for later multi\-sample calling (stored in .snf) (default: None)
.TP
\fB\-\-genotype\-vcf\fR IN.vcf
Determine the genotypes for all SVs in the given input .vcf file (forced calling). Re\-genotyped .vcf will be written to the output file specified with \fB\-\-vcf\fR. (default: None)
.SS "Multi-Sample Calling / Combine parameters:"
.TP
\fB\-\-combine\-high\-confidence\fR F
Minimum fraction of samples in which a SV needs to have individually passed QC for it to be reported in combined output (a value of zero will report all SVs that pass QC in at least one of the input samples) (default: 0.0)
.TP
\fB\-\-combine\-low\-confidence\fR F
Minimum fraction of samples in which a SV needs to be present (failed QC) for it to be reported in combined output (default: 0.2)
.TP
\fB\-\-combine\-low\-confidence\-abs\fR N
Minimum absolute number of samples in which a SV needs to be present (failed QC) for it to be reported in combined output (default: 3)
.TP
\fB\-\-combine\-null\-min\-coverage\fR N
Minimum coverage for a sample genotype to be reported as 0/0 (sample genotypes with coverage below this threshold at the SV location will be output as ./.)
(default: 5)
.TP
\fB\-\-combine\-match\fR N
Maximum deviation of multiple SV's start/end position for them to be combined across samples. Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter. (default: 500)
.TP
\fB\-\-combine\-consensus\fR
Output the consensus genotype of all samples (default: False)
.TP
\fB\-\-combine\-separate\-intra\fR
Disable combination of SVs within the same sample (default: False)
.TP
\fB\-\-combine\-output\-filtered\fR
Include low\-confidence / putative non\-germline SVs in multi\-calling (default: False)
.SS "SV Postprocessing, QC and output parameters:"
.TP
\fB\-\-output\-rnames\fR
Output names of all supporting reads for each SV in the RNAMEs info field (default: False)
.TP
\fB\-\-no\-consensus\fR
Disable consensus sequence generation for insertion SV calls (may improve performance) (default: False)
.TP
\fB\-\-no\-sort\fR
Do not sort output VCF by genomic coordinates (may slightly improve performance) (default: False)
.TP
\fB\-\-no\-progress\fR
Disable progress display (default: False)
.TP
\fB\-\-quiet\fR
Disable all logging, except errors (default: False)
.TP
\fB\-\-max\-del\-seq\-len\fR N
Maximum deletion sequence length to be output. Deletion SVs longer than this value will be written to the output as symbolic SVs. (default: 50000)
.TP
\fB\-\-symbolic\fR
Output all SVs as symbolic, including insertions and deletions, instead of reporting nucleotide sequences. (default: False)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.