
SNIFFLES(1) User Commands SNIFFLES(1)

NAME

sniffles - structural variation caller using third-generation sequencing

DESCRIPTION

usage: sniffles --input SORTED_INPUT.bam [--vcf OUTPUT.vcf] [--snf MERGEABLE_OUTPUT.snf] [--threads 4] [--non-germline]

Sniffles2: A fast structural variant (SV) caller for long-read sequencing data

Version 2.0.2
Contact: moritz.g.smolka@gmail.com
Usage example A - Call SVs for a single sample:
sniffles --input sorted_indexed_alignments.bam --vcf output.vcf
... OR, with CRAM input and bgzipped+tabix indexed VCF output:
sniffles --input sample.cram --vcf output.vcf.gz
... OR, producing only a SNF file with SV candidates for later multi-sample calling:
sniffles --input sample1.bam --snf sample1.snf
... OR, simultaneously producing a single-sample VCF and SNF file for later multi-sample calling:
sniffles --input sample1.bam --vcf sample1.vcf.gz --snf sample1.snf
... OR, with additional options to specify tandem repeat annotations (for improved call accuracy), reference (for DEL sequences) and non-germline mode for detecting rare SVs:
sniffles --input sample1.bam --vcf sample1.vcf.gz --tandem-repeats tandem_repeats.bed --reference genome.fa --non-germline
Usage example B - Multi-sample calling:
Step 1. Create .snf for each sample:
sniffles --input sample1.bam --snf sample1.snf
Step 2. Combined calling:
sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf
... OR, using a .tsv file containing a list of .snf files, and custom sample ids in an optional second column (one sample per line):
Step 2. Combined calling:
sniffles --input snf_files_list.tsv --vcf multisample.vcf
Usage example C - Determine genotypes for a set of known SVs (force calling):
sniffles --input sample.bam --genotype-vcf input_known_svs.vcf --vcf output_genotypes.vcf
Use --help for full parameter/usage information
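
The list file used in the second variant of Step 2 can be written by a short script. The following is a minimal sketch and not part of Sniffles2: it assumes the columns are tab-separated (as the .tsv extension suggests), and the helper name write_snf_list and the sample ids are hypothetical.

```python
import csv


def write_snf_list(entries, tsv_path):
    """Write one line per sample: SNF path, then an optional sample id.

    `entries` is a list of (snf_path, sample_id) pairs. Pass None as the
    sample id to omit the optional second column for that sample.
    Assumption: columns are separated by a single tab character.
    """
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for snf_path, sample_id in entries:
            row = [str(snf_path)]
            if sample_id is not None:
                row.append(str(sample_id))
            writer.writerow(row)
    return tsv_path
```

For example, write_snf_list([("sample1.snf", "patient_A"), ("sample2.snf", None)], "snf_files_list.tsv") produces a file that can then be passed to sniffles --input snf_files_list.tsv --vcf multisample.vcf.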

optional arguments:

-h, --help
show this help message and exit
--version
show program's version number and exit

Common parameters:

--input
For single-sample calling: A coordinate-sorted and indexed .bam/.cram (BAM/CRAM format) file containing aligned reads. - OR - For multi-sample calling: Multiple .snf files (generated beforehand by running Sniffles2 for individual samples with --snf) (default: None)
--vcf
VCF output filename to write the called and refined SVs to. If the given filename ends with .gz, the VCF file will be automatically bgzipped and a .tbi index built for it. (default: None)
--snf
Sniffles2 file (.snf) output filename to store candidates for later multi-sample calling (default: None)
--reference
(Optional) Reference sequence the reads were aligned against. To enable output of deletion SV sequences, this parameter must be set. (default: None)
--tandem-repeats
(Optional) Input .bed file containing tandem repeat annotations for the reference genome. (default: None)
--non-germline
Call non-germline SVs (rare, somatic or mosaic SVs) (default: False)
--phase
Determine phase for SV calls (requires the input alignments to be phased) (default: False)
--threads
Number of parallel threads to use (speed-up for multi-core CPUs) (default: 4)
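
When Sniffles2 is driven from a script, the common parameters can be assembled into an argument list. The following is a minimal sketch, not an interface provided by Sniffles2: the function name build_sniffles_command and its keyword names are hypothetical, and only flags that appear in the usage examples of this page are emitted.

```python
def build_sniffles_command(input_path, vcf=None, snf=None, reference=None,
                           tandem_repeats=None, non_germline=False,
                           threads=4):
    """Assemble an argument list for a single-sample sniffles run.

    Only options that were explicitly requested are added, so the
    program's own defaults apply to everything else.
    """
    cmd = ["sniffles", "--input", str(input_path)]
    if vcf is not None:
        # A filename ending in .gz makes sniffles bgzip and index the VCF.
        cmd += ["--vcf", str(vcf)]
    if snf is not None:
        cmd += ["--snf", str(snf)]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    if tandem_repeats is not None:
        cmd += ["--tandem-repeats", str(tandem_repeats)]
    if non_germline:
        cmd.append("--non-germline")
    cmd += ["--threads", str(threads)]
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True). Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.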

SV Filtering parameters:

Minimum number of supporting reads for an SV to be reported (default: auto, chosen automatically based on coverage)
Coverage based minimum support multiplier for germline/non-germline modes (only for auto minsupport) (default: None)
Minimum SV length (in bp) (default: 35)
Minimum length for SV candidates (as fraction of --minsvlen) (default: 0.95)
Alignments with mapping quality lower than this value will be ignored (default: 25)
Output all SV candidates, disregarding quality control steps. (default: False)
Apply filtering based on SV start position and length standard deviation (default: True)
Maximum standard deviation for SV start position and length (in bp) (default: 500)
Apply filtering based on strand support of SV calls (default: False)
Minimum surrounding region coverage of SV calls (default: 1)
Insertion SVs longer than this value are considered as hard to detect based on the aligner and read length and subjected to more sensitive filtering. (default: 2500)
Deletion SVs longer than this value are subjected to central coverage drop-based filtering (Not applicable for --non-germline) (default: 50000)
Long deletions with central coverage (in relation to upstream/downstream coverage) higher than this value will be filtered (Not applicable for --non-germline) (default: 0.66)
Duplication SVs longer than this value are subjected to central coverage increase-based filtering (Not applicable for --non-germline) (default: 50000)
Long duplications with central coverage (in relation to upstream/downstream coverage) lower than this value will be filtered (Not applicable for --non-germline) (default: 1.33)
Additional number of splits per kilobase read sequence allowed before reads are ignored (default: 0.1)
Base number of splits allowed before reads are ignored (in addition to --max-splits-kb) (default: 3)
Reads with alignments shorter than this length (in bp) will be ignored (default: 1000)
Maximum fraction of conflicting reads permitted for SV phase information to be labelled as PASS (only for --phase) (default: 0.1)
Infer insertions that are longer than most reads and therefore are spanned by few alignments only. (default: True)
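
Two of the filters above combine several values, and a worked example may make the arithmetic easier to follow. The following is a minimal sketch under stated assumptions, not the Sniffles2 implementation: the function names are hypothetical, the split limit is assumed to be the base allowance plus the per-kilobase allowance scaled by read length, and the flanking coverage is assumed to be the mean of the upstream and downstream coverage.

```python
def max_allowed_splits(read_length_bp, base_splits=3, splits_per_kb=0.1):
    """Split allowance for a read of the given length.

    Assumption: the base allowance and the per-kilobase allowance
    (--max-splits-kb) are simply added together.
    """
    return base_splits + splits_per_kb * (read_length_bp / 1000.0)


def long_deletion_is_filtered(central_cov, upstream_cov, downstream_cov,
                              max_ratio=0.66):
    """Return True if a long deletion would fail the coverage-drop check.

    Assumption: "upstream/downstream coverage" is taken as the mean of
    the two flanks. A real deletion should show a clear coverage drop in
    its centre, so a high central-to-flank ratio counts against the call.
    """
    flank_cov = (upstream_cov + downstream_cov) / 2.0
    if flank_cov == 0:
        # Assumption: without flanking coverage the ratio is undefined,
        # so this sketch does not filter the call.
        return False
    return central_cov / flank_cov > max_ratio
```

With the defaults, a 20,000 bp read is allowed 3 + 0.1 * 20 = 5 splits, and a long deletion with central coverage 25 between flanks of coverage 30 has a ratio of about 0.83, which exceeds 0.66 and is therefore filtered.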

SV Clustering parameters:

Initial screening bin size in bp (default: 100)
Multiplier for SV start position standard deviation criterion in cluster merging (default: 2.5)
Multiplier for mean SV length criterion for tandem repeat cluster merging (default: 1.5)
Max. merging distance based on SV length criterion for tandem repeat cluster merging (default: 1000)
Max. merging distance for insertions and deletions on the same read and cluster in non-repeat regions (default: 150)
Max. size difference for merging SVs as fraction of SV length (default: 0.33)
Max. merging distance for breakend SV candidates. (default: 1500)

SV Genotyping parameters:

Sample ploidy (currently fixed at value 2) (default: 2)
Estimated false positive rate for leads (relating to total coverage) (default: 0.05)
Custom ID for this sample, used for later multi-sample calling (stored in .snf) (default: None)
Determine the genotypes for all SVs in the given input .vcf file (forced calling). Re-genotyped .vcf will be written to the output file specified with --vcf. (default: None)

Multi-Sample Calling / Combine parameters:

Minimum fraction of samples in which an SV needs to have individually passed QC for it to be reported in combined output (a value of zero will report all SVs that pass QC in at least one of the input samples) (default: 0.0)
Minimum fraction of samples in which an SV needs to be present (even if it failed QC) for it to be reported in combined output (default: 0.2)
Minimum absolute number of samples in which an SV needs to be present (even if it failed QC) for it to be reported in combined output (default: 3)
Minimum coverage for a sample genotype to be reported as 0/0 (sample genotypes with coverage below this threshold at the SV location will be output as ./.) (default: 5)
Maximum deviation of multiple SVs' start/end positions for them to be combined across samples. Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter. (default: 500)
Output the consensus genotype of all samples (default: False)
Disable combination of SVs within the same sample (default: False)
Include low-confidence / putative non-germline SVs in multi-calling (default: False)
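
The max_dev formula and the 0/0 coverage threshold above can be checked with a few lines of arithmetic. The following is a minimal sketch of those two rules as they are stated on this page, not code taken from Sniffles2; the function names are hypothetical.

```python
import math


def max_position_deviation(sv_length_a, sv_length_b, multiplier=500):
    """max_dev = M * sqrt(min(SV_length_a, SV_length_b)), in bp."""
    return multiplier * math.sqrt(min(sv_length_a, sv_length_b))


def reference_genotype(coverage, min_coverage=5):
    """Genotype reported for a sample without evidence for the SV.

    Coverage below the threshold at the SV location is too low to call
    the reference genotype, so the sample is output as missing (./.).
    """
    return "0/0" if coverage >= min_coverage else "./."
```

With the default M of 500, two SVs of length 100 bp and 400 bp give max_dev = 500 * sqrt(100) = 5000 bp. Because the shorter SV determines the limit, small SVs must lie closer together than large ones to be combined.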

SV Postprocessing, QC and output parameters:

Output names of all supporting reads for each SV in the RNAMEs info field (default: False)
Disable consensus sequence generation for insertion SV calls (may improve performance) (default: False)
Do not sort output VCF by genomic coordinates (may slightly improve performance) (default: False)
Disable progress display (default: False)
Disable all logging, except errors (default: False)
Maximum deletion sequence length to be output. Deletion SVs longer than this value will be written to the output as symbolic SVs. (default: 50000)
Output all SVs as symbolic, including insertions and deletions, instead of reporting nucleotide sequences. (default: False)

AUTHOR


This manpage was written by Andreas Tille for the Debian distribution and
can be used for any other usage of the program.

February 2022 sniffles 2.0.2