BCFTOOLS(1)
NAME¶
bcftools - utilities for variant calling and manipulating VCFs and BCFs.
SYNOPSIS¶
bcftools [--version|--version-only] [--help] [COMMAND] [OPTIONS]
DESCRIPTION¶
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations. In general, whenever multiple VCFs are read simultaneously, they must be indexed and therefore also compressed.
BCFtools is designed to work on a stream. It regards an input file "-" as the standard input (stdin) and outputs to the standard output (stdout). Several commands can thus be combined with Unix pipes.
VERSION¶
This manual page was last updated 2016-04-18 14:18 BST and refers to bcftools git version 1.3-36-g47e811c+.
BCF1¶
The BCF1 format output by versions of samtools <= 0.1.19 is not compatible with this version of bcftools. To read BCF1 files one can use the view command from old versions of bcftools packaged with samtools versions <= 0.1.19 to convert to VCF, which can then be read by this version of bcftools.
samtools-0.1.19/bcftools/bcftools view file.bcf1 | bcftools view
VARIANT CALLING¶
See bcftools call for variant calling from the output of the samtools mpileup command. In versions of samtools <= 0.1.19 calling was done with bcftools view. Users are now required to choose between the old samtools calling model (-c/--consensus-caller) and the new multiallelic calling model (-m/--multiallelic-caller). The multiallelic calling model is recommended for most tasks.
LIST OF COMMANDS¶
For a full list of available commands, run bcftools without arguments. For a full list of available options, run bcftools COMMAND without arguments.
LIST OF SCRIPTS¶
Some helper scripts are bundled with the bcftools code.
COMMANDS AND OPTIONS¶
Common Options¶
The following options are common to many bcftools commands. See usage for specific commands to see if they apply.
FILE
-c, --collapse snps|indels|both|all|some|none|id
none
some
all
snps
indels
both
id
-f, --apply-filters LIST
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file FILE
-s, --samples [^]LIST
bcftools view -Ou -s sample1,sample2 file.vcf | bcftools query -f '%INFO/AC\t%INFO/AN\n'
-S, --samples-file FILE
    sample1    1
    sample2    2
    sample3    2
or
    sample1    M
    sample2    F
    sample3    F
or a .ped file (here is shown a minimum working example, the first column is ignored and the last indicates sex: 1=male, 2=female)
    ignored daughterA fatherA motherA 2
    ignored sonB fatherB motherB 1
-t, --targets [^]chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file [^]FILE
bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz
--threads INT
bcftools annotate [OPTIONS] FILE¶
Add or remove annotations.
-a, --annotations file
    # Sample annotation file with columns CHROM, POS, STRING_TAG, NUMERIC_TAG
    1  752566  SomeString       5
    1  798959  SomeOtherString  6
    # etc.
-c, --columns list
-e, --exclude EXPRESSION
-h, --header-lines file
    ##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">
    ##INFO=<ID=STRING_TAG,Number=1,Type=String,Description="Yet another header line">
-I, --set-id [+]FORMAT
bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' file.vcf
-i, --include EXPRESSION
-m, --mark-sites TAG
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
--rename-chrs file
-s, --samples [^]LIST
-S, --samples-file FILE
--threads INT
-x, --remove list
Examples:
# Remove three fields
bcftools annotate -x ID,INFO/DP,FORMAT/DP file.vcf.gz

# Remove all INFO fields and all FORMAT fields except for GT and PL
bcftools annotate -x INFO,^FORMAT/GT,FORMAT/PL file.vcf

# Add ID, QUAL and INFO/TAG, not replacing TAG if already present
bcftools annotate -a src.bcf -c ID,QUAL,+TAG dst.bcf

# Carry over all INFO and FORMAT annotations except FORMAT/GT
bcftools annotate -a src.bcf -c INFO,^FORMAT/GT dst.bcf

# Annotate from a tab-delimited file with six columns (the fifth is ignored),
# first indexing with tabix. The coordinates are 1-based.
tabix -s1 -b2 -e2 annots.tab.gz
bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,REF,ALT,-,TAG file.vcf

# Annotate from a tab-delimited file with regions (1-based coordinates, inclusive)
tabix -s1 -b2 -e3 annots.tab.gz
bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf

# Annotate from a bed file (0-based coordinates, half-open intervals)
bcftools annotate -a annots.bed.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf
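The last two examples differ only in their coordinate convention: tab-delimited region files are 1-based and inclusive, while BED files are 0-based and half-open. The following Python sketch shows the conversion between the two; the function names are hypothetical and are not part of bcftools.

```python
def bed_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open BED interval to 1-based, inclusive."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts: the exclusive 0-based end equals the
    # inclusive 1-based end.
    return start + 1, end


def one_based_to_bed(beg: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, inclusive interval to 0-based, half-open."""
    if beg < 1 or end < beg:
        raise ValueError("expected 1 <= beg <= end")
    return beg - 1, end


if __name__ == "__main__":
    # The first ten bases of a chromosome in both conventions.
    print(bed_to_one_based(0, 10))
    print(one_based_to_bed(1, 10))
```

Both conventions describe intervals of the same length; only the numbering of the first base changes.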
bcftools cnv [OPTIONS] FILE¶
Copy number variation caller, requires a VCF annotated with Illumina's B-allele frequency (BAF) and Log R Ratio intensity (LRR) values. The HMM considers the following copy number states: CN 2 (normal), 1 (single-copy loss), 0 (complete loss), 3 (single-copy gain).
General Options:
-c, --control-sample string
-f, --AF-file file
-o, --output-dir path
-p, --plot-threshold float
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --query-sample string
-t, --targets LIST
-T, --targets-file FILE
HMM Options:
-a, --aberrant float[,float]
-b, --BAF-weight float
-d, --BAF-dev float[,float]
-e, --err-prob float
-l, --LRR-weight float
-L, --LRR-smooth-win int
-O, --optimize float
-P, --same-prob float
-x, --xy-prob float
bcftools call [OPTIONS] FILE¶
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
File format options:
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
--ploidy ASSEMBLY[?]
--ploidy-file FILE
    X   1        60000      M  1
    X   2699521  154931043  M  1
    Y   1        59373566   M  1
    Y   1        59373566   F  0
    MT  1        16569      M  1
    MT  1        16569      F  1
    *   *        *          M  2
    *   *        *          F  2
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --samples LIST
-S, --samples-file FILE
-t, --targets LIST
-T, --targets-file FILE
--threads INT
Input/output options:
-A, --keep-alts
-f, --format-fields list
-g, --gvcf INT
-i, --insert-missed INT
-M, --keep-masked-ref
-V, --skip-variants snps|indels
-v, --variants-only
Consensus/variant calling options:
-c, --consensus-caller
-C, --constrain alleles|trio
alleles
trio
-m, --multiallelic-caller
-n, --novel-rate float[,...]
-p, --pval-threshold float
-P, --prior float
-t, --targets file|chr|chr:pos|chr:from-to|chr:from-[,...]
-X, --chromosome-X
-Y, --chromosome-Y
bcftools concat [OPTIONS] FILE1 FILE2 [...]¶
Concatenate or combine VCF/BCF files. All source files must have the same sample columns appearing in the same order. Can be used, for example, to concatenate chromosome VCFs into one VCF, or combine a SNP VCF and an indel VCF into one. The input files must be sorted by chr and position. The files must be given in the correct order to produce sorted VCF on output unless the -a, --allow-overlaps option is specified. With the --naive option, the files are concatenated without being recompressed, which is very fast but dangerous if the BCF headers differ.
-a, --allow-overlaps
-c, --compact-PS
-d, --rm-dups snps|indels|both|all|none
-D, --remove-duplicates
-f, --file-list FILE
-l, --ligate
--no-version
-n, --naive
-o, --output FILE
-O, --output-type b|u|z|v
-q, --min-PQ INT
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file FILE
--threads INT
bcftools consensus [OPTIONS] FILE¶
Create consensus sequence by applying VCF variants to a reference fasta file.
-f, --fasta-ref FILE
-H, --haplotype 1|2
-i, --iupac-codes
-m, --mask FILE
-o, --output FILE
-s, --sample NAME
Examples:
# Apply variants present in sample "NA001", output IUPAC codes for hets
bcftools consensus -i -s NA001 -f in.fa in.vcf.gz > out.fa

# Create consensus for one region. The fasta header lines are then expected
# in the form ">chr:from-to".
samtools faidx ref.fa 8:11870-11890 | bcftools consensus in.vcf.gz -o out.fa
bcftools convert [OPTIONS] FILE¶
VCF input options:
-e, --exclude EXPRESSION
-i, --include EXPRESSION
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file FILE
-s, --samples LIST
-S, --samples-file FILE
-t, --targets LIST
-T, --targets-file FILE
VCF output options:
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
--threads INT
GEN/SAMPLE conversion:
-G, --gensample2vcf prefix or gen-file,sample-file
-g, --gensample prefix or gen-file,sample-file
    .gen
    ----
    1:111485207_G_A 1:111485207_G_A 111485207 G A 0 1 0 0 1 0
    1:111494194_C_T 1:111494194_C_T 111494194 C T 0 1 0 0 0 1

    .sample
    -------
    ID_1 ID_2 missing
    0 0 0
    sample1 sample1 0
    sample2 sample2 0
--tag STRING
gVCF conversion:
--gvcf2vcf
-f, --fasta-ref file
HAPS/SAMPLE conversion:
--hapsample2vcf prefix or haps-file,sample-file
    .haps
    -----
    1:111485207_G_A rsID1 111485207 G A 0 1 0 0
    1:111494194_C_T rsID2 111494194 C T 0 1 0 0
    1:111495231_A_<DEL>_111495784 rsID3 111495231 A <DEL> 0 0 1 0
--hapsample prefix or haps-file,sample-file
--haploid2diploid
--vcf-ids
HAPS/LEGEND/SAMPLE conversion:
-H, --haplegendsample2vcf prefix or haps-file,legend-file,sample-file
-h, --haplegendsample prefix or haps-file,legend-file,sample-file
    .haps
    -----
    0 1 0 0 1 0
    0 1 0 0 0 1

    .legend
    -------
    id position a0 a1
    1:111485207_G_A 111485207 G A
    1:111494194_C_T 111494194 C T

    .sample
    -------
    sample population group sex
    sample1 sample1 sample1 2
    sample2 sample2 sample2 2
--haploid2diploid
--vcf-ids
TSV conversion:
--tsv2vcf file
-c, --columns list
-f, --fasta-ref file
-s, --samples LIST
-S, --samples-file FILE
Example:
# Convert 23andme results into VCF
bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -Oz -o out.vcf.gz
bcftools filter [OPTIONS] FILE¶
Apply fixed-threshold filters.
-e, --exclude EXPRESSION
-g, --SnpGap INT
    The SNPs at positions 1 and 7 are filtered, positions 0 and 8 are not:
             0123456789
        ref  .G.GT..G..
        del  .A.G-..A..
    Here the positions 1 and 6 are filtered, 0 and 7 are not:
             0123-456789
        ref  .G.G-..G..
        ins  .A.GT..A..
-G, --IndelGap INT
    The second indel is filtered:
             012345678901
        ref  .GT.GT..GT..
        del  .G-.G-..G-..
    And similarly here, the second is filtered:
             01 23 456 78
        ref  .A-.A-..A-..
        ins  .AT.AT..AT..
-i, --include EXPRESSION
-m, --mode [+x]
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --soft-filter STRING|+
-S, --set-GTs .|0
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
--threads INT
bcftools gtcheck [OPTIONS] [-g genotypes.vcf.gz] query.vcf.gz¶
Checks sample identity or, without -g, performs a multi-sample cross-check.
-a, --all-sites
-g, --genotypes genotypes.vcf.gz
-G, --GTs-only INT
-H, --homs-only
-p, --plot PREFIX
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --query-sample STRING
-S, --target-sample STRING
-t, --targets file
-T, --targets-file file
Output files format:
CN, Discordance
\sum_s { min_G { PL_a(G) + PL_b(G) } },
SM, Average Discordance
SM, Average Depth
SM, Average Number of sites
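The discordance expression in the list above sums, over all sites s, the smallest combined genotype likelihood PL_a(G) + PL_b(G) across genotypes G. The following Python sketch evaluates that sum. It assumes that each sample's PL values are given as one list per site and that both samples list genotypes in the same order; the function name and input layout are hypothetical and do not reflect the bcftools implementation.

```python
def discordance(pl_a, pl_b):
    """Sum over sites of min over genotypes G of PL_a(G) + PL_b(G).

    pl_a, pl_b: one list of phred-scaled genotype likelihoods per site,
    with genotypes in the same order for both samples (an assumption).
    """
    if len(pl_a) != len(pl_b):
        raise ValueError("both samples need the same number of sites")
    total = 0
    for site_a, site_b in zip(pl_a, pl_b):
        if len(site_a) != len(site_b):
            raise ValueError("genotype counts differ at a site")
        # The genotype that both samples jointly find most plausible
        # contributes the smallest combined PL.
        total += min(a + b for a, b in zip(site_a, site_b))
    return total


if __name__ == "__main__":
    sample_a = [[0, 30, 60], [40, 0, 50]]
    sample_b = [[0, 20, 70], [60, 10, 0]]
    print(discordance(sample_a, sample_b))
```

Samples whose most likely genotypes agree at every site score 0; larger values indicate that no single genotype fits both samples well.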
bcftools index [OPTIONS] <in.bcf>|<in.vcf.gz>¶
Creates index for bgzip compressed VCF/BCF files for random access. CSI (coordinate-sorted index) is created by default. The CSI format supports indexing of chromosomes up to length 2^31. TBI (tabix index) index files, which support chromosome lengths up to 2^29, can be created by using the -t/--tbi option or using the tabix program packaged with htslib. When loading an index file, bcftools will try the CSI first and then the TBI.
Indexing options:
-c, --csi
-f, --force
-m, --min-shift INT
-t, --tbi
Stats options:
-n, --nrecords
-s, --stats
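The chromosome-length limits quoted in the description of bcftools index (2^31 for CSI, 2^29 for TBI) determine which index format can be used for a given reference. The Python sketch below makes that check explicit. The helper name is hypothetical, and treating the limits as inclusive is an assumption.

```python
TBI_MAX_LENGTH = 2 ** 29  # 536,870,912 bases, as stated for TBI
CSI_MAX_LENGTH = 2 ** 31  # 2,147,483,648 bases, as stated for CSI


def usable_index_formats(longest_contig: int) -> list[str]:
    """Return the index formats able to cover the longest contig."""
    formats = []
    if longest_contig <= CSI_MAX_LENGTH:
        formats.append("csi")
    if longest_contig <= TBI_MAX_LENGTH:
        formats.append("tbi")
    return formats


if __name__ == "__main__":
    print(usable_index_formats(248_956_422))  # shorter than 2^29
    print(usable_index_formats(600_000_000))  # longer than 2^29
```

A contig of 600,000,000 bases exceeds 2^29, so only the default CSI index applies and -t/--tbi is not an option for such a file.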
bcftools isec [OPTIONS] A.vcf.gz B.vcf.gz [...]¶
Creates intersections, unions and complements of VCF files. Depending on the options, the program can output records from one (or more) files which have (or do not have) corresponding records with the same position in the other files.
-c, --collapse snps|indels|both|all|some|none
-C, --complement
-e, --exclude -|EXPRESSION
-f, --apply-filters LIST
-i, --include EXPRESSION
-n, --nfiles [+-=]INT|~BITMAP
-o, --output FILE
-O, --output-type b|u|z|v
-p, --prefix DIR
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
-w, --write LIST
Examples:
Create intersection and complements of two sets saving the output in dir/*
bcftools isec -p dir A.vcf.gz B.vcf.gz
Filter sites in A and B (but not in C) and create intersection
bcftools isec -e'MAF<0.01' -i'dbSNP=1' -e- A.vcf.gz B.vcf.gz C.vcf.gz -p dir
Extract and write records from A shared by both A and B using exact allele match
bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz
Extract records private to A or B comparing by position only
bcftools isec -p dir -n-1 -c all A.vcf.gz B.vcf.gz
Print a list of records which are present in A and B but not in C and D
bcftools isec -n~1100 -c all A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz
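The last example selects records by an exact presence pattern: in -n~1100 each digit corresponds to one input file in command-line order, 1 meaning the record is present and 0 meaning it is absent. A minimal Python sketch of that matching rule follows; the function is hypothetical and only illustrates the bitmap semantics implied by the example.

```python
def matches_bitmap(present: list[bool], bitmap: str) -> bool:
    """Check whether a record's presence across files equals the bitmap.

    present[i] is True when the record occurs in the i-th input file;
    bitmap is a string of '0' and '1' with one character per file.
    """
    if len(present) != len(bitmap):
        raise ValueError("bitmap needs one digit per input file")
    if set(bitmap) - {"0", "1"}:
        raise ValueError("bitmap may contain only 0 and 1")
    return all(p == (b == "1") for p, b in zip(present, bitmap))


if __name__ == "__main__":
    # Present in A and B, absent from C and D: selected by ~1100.
    print(matches_bitmap([True, True, False, False], "1100"))
    # Also present in C: not selected.
    print(matches_bitmap([True, True, True, False], "1100"))
```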
bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz [...]¶
Merge multiple VCF/BCF files from non-overlapping sample sets to create one multi-sample file. For example, when merging file A.vcf.gz containing samples S1, S2 and S3 and file B.vcf.gz containing samples S3 and S4, the output file will contain five samples named S1, S2, S3, 2:S3 and S4.
Note that it is the responsibility of the user to ensure that the sample names are unique across all files. If they are not, the program will exit with an error unless the option --force-samples is given. The sample names can also be given explicitly using the --print-header and --use-header options.
Note that only records from different files can be merged, never from the same file. For "vertical" merge take a look at bcftools norm instead.
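The sample naming in the A.vcf.gz/B.vcf.gz example can be sketched in Python as follows. The sketch assumes that a duplicated name is prefixed with the 1-based index of the file it comes from, which is inferred from the name 2:S3 in the example rather than taken from the bcftools source. The function name is hypothetical.

```python
def merged_sample_names(files: list[list[str]], force_samples: bool = False) -> list[str]:
    """Return the output sample list for files merged in the given order."""
    seen = set()
    merged = []
    for index, samples in enumerate(files, start=1):
        for name in samples:
            if name in seen:
                if not force_samples:
                    # Mirrors the documented default: exit with an error.
                    raise ValueError(f"duplicate sample name: {name}")
                # Assumed renaming scheme: "<file index>:<sample name>".
                name = f"{index}:{name}"
            seen.add(name)
            merged.append(name)
    return merged


if __name__ == "__main__":
    a = ["S1", "S2", "S3"]
    b = ["S3", "S4"]
    print(merged_sample_names([a, b], force_samples=True))
```

Without force_samples=True the same input raises an error, matching the default behaviour described above.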
--force-samples
--print-header
--use-header FILE
-f, --apply-filters LIST
-i, --info-rules -|TAG:METHOD[,...]
-l, --file-list FILE
-m, --merge snps|indels|both|all|none|id
    -m none   .. no new multiallelics, output multiple records instead
    -m snps   .. allow multiallelic SNP records
    -m indels .. allow multiallelic indel records
    -m both   .. both SNP and indel records can be multiallelic
    -m all    .. SNP records can be merged with indel records
    -m id     .. merge by ID
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
--threads INT
bcftools norm [OPTIONS] file.vcf.gz¶
Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows. Left-alignment and normalization will only be applied if the --fasta-ref option is supplied.
-c, --check-ref e|w|x|s
-d, --rm-dup snps|indels|both|all|none
-D, --remove-duplicates
-f, --fasta-ref FILE
-m, --multiallelics <-|+>[snps|indels|both|any]
--no-version
-N, --do-not-normalize
-o, --output FILE
-O, --output-type b|u|z|v
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --strict-filter
-t, --targets LIST
-T, --targets-file FILE
--threads INT
-w, --site-win INT
bcftools [plugin NAME|+NAME] [OPTIONS] FILE -- [PLUGIN OPTIONS]¶
VCF input options:
-e, --exclude EXPRESSION
-i, --include EXPRESSION
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
VCF output options:
--no-version
-o, --output FILE
-O, --output-type b|u|z|v
--threads INT
Plugin options:
-h, --help
-l, --list-plugins
By default, appropriate system directories are searched for installed plugins. You can override this by setting the BCFTOOLS_PLUGINS environment variable to a colon-separated list of directories to search. If BCFTOOLS_PLUGINS begins with a colon, ends with a colon, or contains adjacent colons, the system directories are also searched at that position in the list of directories.
If htslib is not installed systemwide, set the environment variable LD_LIBRARY_PATH (linux) or DYLD_LIBRARY_PATH (Mac OS X) to include the directory where libhts.so.1 is located.
-v, --verbose
-V, --version
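The BCFTOOLS_PLUGINS rule described under -l, --list-plugins (an empty entry created by a leading, trailing or doubled colon stands for the system directories) can be sketched in Python as follows. This is only a model of the documented behaviour, not the actual lookup code, and the system directory used in the demonstration is a hypothetical path.

```python
def plugin_search_path(env_value, system_dirs):
    """Expand a BCFTOOLS_PLUGINS value into an ordered directory list.

    env_value:   the variable's value, or None when it is unset
    system_dirs: the default plugin directories (hypothetical here)
    """
    if env_value is None:
        return list(system_dirs)
    path = []
    for entry in env_value.split(":"):
        if entry == "":
            # Leading, trailing or adjacent colons yield an empty entry,
            # which is replaced by the system directories at that position.
            path.extend(system_dirs)
        else:
            path.append(entry)
    return path


if __name__ == "__main__":
    system = ["/usr/lib/bcftools"]  # hypothetical system directory
    print(plugin_search_path("/home/me/plugins", system))
    print(plugin_search_path("/home/me/plugins:", system))
    print(plugin_search_path(":/home/me/plugins", system))
```

With a trailing colon the user directory is searched first and the system directories afterwards; with a leading colon the order is reversed.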
List of plugins coming with the distribution:
counts
dosage
fill-AN-AC
fix-ploidy
frameshifts
missing2ref
tag2tag
vcf2sex
Examples:
# List options common to all plugins
bcftools plugin

# List available plugins
bcftools plugin -l

# Run a plugin
bcftools plugin counts in.vcf

# Run a plugin using the abbreviated "+" notation
bcftools +counts in.vcf

# The input VCF can be streamed just like in other commands
cat in.vcf | bcftools +counts

# Print usage information of plugin "dosage"
bcftools +dosage -h

# Replace missing genotypes with 0/0
bcftools +missing2ref in.vcf

# Replace missing genotypes with 0|0
bcftools +missing2ref in.vcf -- -p
Plugins troubleshooting:
Things to check if your plugin does not show up in the bcftools plugin -l output:
Plugins API:
// Short description used by 'bcftools plugin -l'
const char *about(void);

// Longer description used by 'bcftools +name -h'
const char *usage(void);

// Called once at startup, allows initialization of local variables.
// Return 1 to suppress normal VCF/BCF header output, -1 on critical
// errors, 0 otherwise.
int init(int argc, char **argv, bcf_hdr_t *in_hdr, bcf_hdr_t *out_hdr);

// Called for each VCF record, return NULL to suppress the output
bcf1_t *process(bcf1_t *rec);

// Called after all lines have been processed to clean up
void destroy(void);
bcftools polysomy [OPTIONS] file.vcf.gz¶
Detect number of chromosomal copies in VCFs annotated with Illumina's B-allele frequency (BAF) values. Note that this command is not compiled in by default, see the section Optional Compilation with GSL in the INSTALL file for help.
General options:
-o, --output-dir path
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --sample string
-t, --targets LIST
-T, --targets-file FILE
-v, --verbose
Algorithm options:
-b, --peak-size float
-c, --cn-penalty float
-f, --fit-th float
-i, --include-aa
-m, --min-fraction float
-p, --peak-symmetry float
bcftools query [OPTIONS] file.vcf.gz [file.vcf.gz [...]]¶
Extracts fields from VCF or BCF files and outputs them in user-defined format.
-c, --collapse snps|indels|both|all|some|none
-e, --exclude EXPRESSION
-f, --format FORMAT
-H, --print-header
-i, --include EXPRESSION
-l, --list-samples
-o, --output FILE
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --samples LIST
-S, --samples-file FILE
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
-u, --allow-undef-tags
-v, --vcf-list FILE
Format:
    %CHROM          The CHROM column (similarly also other columns: POS, ID, REF, ALT, QUAL, FILTER)
    %INFO/TAG       Any tag in the INFO column
    %TYPE           Variant type (REF, SNP, MNP, INDEL, OTHER)
    %MASK           Indicates presence of the site in other files (with multiple files)
    %TAG{INT}       Curly brackets to subscript vectors (0-based)
    %FIRST_ALT      Alias for %ALT{0}
    []              The brackets loop over all samples
    %GT             Genotype (e.g. 0/1)
    %TGT            Translated genotype (e.g. C/A)
    %IUPACGT        Genotype translated to IUPAC ambiguity codes (e.g. M instead of C/A)
    %LINE           Prints the whole line
    %SAMPLE         Sample name
Examples:
bcftools query -f '%CHROM %POS %REF %ALT{0}\n' file.vcf.gz
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz
bcftools reheader [OPTIONS] file.vcf.gz¶
Modify header of VCF/BCF files, change sample names.
-h, --header FILE
-o, --output FILE
-s, --samples FILE
bcftools roh [OPTIONS] file.vcf.gz¶
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
The HMM model:
    Notation:
      D  = Data, AZ = autozygosity, HW = Hardy-Weinberg (non-autozygosity),
      f  = non-ref allele frequency

    Emission probabilities:
      oAZ = P_i(D|AZ) = (1-f)*P(D|RR) + f*P(D|AA)
      oHW = P_i(D|HW) = (1-f)^2 * P(D|RR) + f^2 * P(D|AA) + 2*f*(1-f)*P(D|RA)

    Transition probabilities:
      tAZ = P(AZ|HW)  .. from HW to AZ, the -a parameter
      tHW = P(HW|AZ)  .. from AZ to HW, the -H parameter

      ci  = P_i(C)    .. probability of cross-over at site i, from genetic map
      AZi = P_i(AZ)   .. probability of site i being AZ/non-AZ, scaled so that AZi+HWi = 1
      HWi = P_i(HW)

      P_{i+1}(AZ) = oAZ * max[(1 - tAZ * ci) * AZ{i-1} , tAZ * ci * (1-AZ{i-1})]
      P_{i+1}(HW) = oHW * max[(1 - tHW * ci) * (1-AZ{i-1}) , tHW * ci * AZ{i-1}]
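The two emission probabilities of the HMM can be evaluated directly from the formulas above. The following Python sketch does so for a single site; the function and argument names are hypothetical, and the genotype likelihoods P(D|RR), P(D|RA) and P(D|AA) are assumed to be supplied as plain probabilities.

```python
def emission_probabilities(f, p_rr, p_ra, p_aa):
    """Return (oAZ, oHW) for one site.

    f:    non-reference allele frequency
    p_rr: P(D|RR), p_ra: P(D|RA), p_aa: P(D|AA)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # Autozygous state: both haplotypes are identical by descent, so
    # only the two homozygous genotypes are possible.
    o_az = (1 - f) * p_rr + f * p_aa
    # Hardy-Weinberg state: genotype priors (1-f)^2, 2f(1-f) and f^2.
    o_hw = (1 - f) ** 2 * p_rr + f ** 2 * p_aa + 2 * f * (1 - f) * p_ra
    return o_az, o_hw


if __name__ == "__main__":
    # Data that can only be explained by a heterozygous genotype.
    print(emission_probabilities(0.5, 0.0, 1.0, 0.0))
```

For data that only a heterozygous genotype explains, oAZ is zero, which is why heterozygous calls interrupt a run of autozygosity in this model.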
General Options:
--AF-dflt FLOAT
--AF-tag TAG
--AF-file FILE
bcftools query -f'%CHROM\t%POS\t%REF,%ALT\t%INFO/TAG\n' file.vcf | bgzip -c > freqs.tab.gz
-e, --estimate-AF FILE
-G, --GTs-only FLOAT
-I, --skip-indels
-m, --genetic-map FILE
-M, --rec-rate FLOAT
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --sample name
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
HMM Options:
-a, --hw-to-az FLOAT
-H, --az-to-hw FLOAT
-V, --viterbi-training
bcftools stats [OPTIONS] A.vcf.gz [B.vcf.gz]¶
Parses VCF or BCF and produces text-file statistics which are suitable for machine processing and can be plotted using plot-vcfstats. When two files are given, the program generates separate stats for intersection and the complements. By default only sites are compared; -s/-S must be given to also include sample columns.
-1, --1st-allele-only
-c, --collapse snps|indels|both|all|some|none
-d, --depth INT,INT,INT
--debug
-e, --exclude EXPRESSION
-E, --exons file.gz
tabix -s1 -b2 -e3 file.gz
-f, --apply-filters LIST
-F, --fasta-ref ref.fa
-i, --include EXPRESSION
-I, --split-by-ID
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-s, --samples LIST
-S, --samples-file FILE
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
-u, --user-tstv <TAG[:min:max:n]>
-v, --verbose
bcftools view [OPTIONS] file.vcf.gz [REGION [...]]¶
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF. Former bcftools subset.
Output options
-G, --drop-genotypes
-h, --header-only
-H, --no-header
-l, --compression-level [0-9]
--no-version
-O, --output-type b|u|z|v
-o, --output-file FILE: output file name. If not present, the default is to print to standard output (stdout).
-r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
-R, --regions-file file
-t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
-T, --targets-file file
--threads INT
Subset options:
-a, --trim-alt-alleles
--force-samples
-I, --no-update
-s, --samples LIST
-S, --samples-file FILE
Filter options:
Note that filter options below dealing with counting the number of alleles will, for speed, first check for the values of AC and AN in the INFO column to avoid parsing all the genotype (FORMAT/GT) fields in the VCF. This means that a filter like --min-af 0.1 will be based on ‘AC/AN’, where AC and AN come from INFO/AC and INFO/AN if available, or from FORMAT/GT if not. It will not filter on another field like INFO/AF. The --include and --exclude filter expressions should instead be used to explicitly filter based on fields in the INFO column, e.g. --exclude 'AF<0.1'.
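The fallback order described in this note (INFO/AC and INFO/AN first, genotypes otherwise, INFO/AF never) can be sketched in Python as follows. The record layout, the function name and the choice to count all non-reference alleles are assumptions made for the illustration and do not reflect how bcftools stores records internally.

```python
def nonref_allele_frequency(info, genotypes):
    """Return AC/AN for the non-reference alleles, or None if AN is 0.

    info:      dict of INFO fields; 'AC' is a list with one count per ALT
    genotypes: one tuple of allele indices per sample, None for missing
    """
    if "AC" in info and "AN" in info:
        # Fast path: trust the counts already stored in INFO.
        ac, an = sum(info["AC"]), info["AN"]
    else:
        # Slow path: count called alleles in FORMAT/GT.
        called = [a for gt in genotypes for a in gt if a is not None]
        ac = sum(1 for a in called if a > 0)
        an = len(called)
    return ac / an if an else None


if __name__ == "__main__":
    # INFO/AF is present but ignored; the result comes from AC/AN.
    print(nonref_allele_frequency({"AC": [2], "AN": 20, "AF": [0.5]}, []))
    # No AC/AN: fall back to the genotypes (one sample is missing).
    print(nonref_allele_frequency({}, [(0, 1), (0, 0), (None, None)]))
```

In the first call the stored INFO/AF of 0.5 plays no role, which is the behaviour the note warns about.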
-c, --min-ac INT[:nref|:alt1|:minor|:major|:nonmajor]
-C, --max-ac INT[:nref|:alt1|:minor|:major|:nonmajor]
-e, --exclude EXPRESSION
-f, --apply-filters LIST
-g, --genotype [^][hom|het|miss]
-i, --include EXPRESSION
-k, --known
-m, --min-alleles INT
-M, --max-alleles INT
-n, --novel
-p, --phased
-P, --exclude-phased
-q, --min-af FLOAT[:nref|:alt1|:minor|:major|:nonmajor]
-Q, --max-af FLOAT[:nref|:alt1|:minor|:major|:nonmajor]
-u, --uncalled
-U, --exclude-uncalled
-v, --types snps|indels|mnps|other
-V, --exclude-types snps|indels|mnps|other
-x, --private
-X, --exclude-private
bcftools help [COMMAND] | bcftools --help [COMMAND]¶
Display a brief usage message listing the bcftools commands available. If the name of a command is also given, e.g., bcftools help view, the detailed usage message for that particular command is displayed.
bcftools [--version|-v]¶
Display the version numbers and copyright information for bcftools and the important libraries used by bcftools.
bcftools [--version-only]¶
Display the full bcftools version number in a machine-readable format.
EXPRESSIONS¶
These filtering expressions are accepted by annotate, filter, query and view commands.
Valid expressions may contain:
1, 1.0, 1e-4 "String" @file_name
+,*,-,/
== (same as =), >, >=, <=, <, !=
INFO/HAYSTACK ~ "needle"
(, )
&& (same as &), ||, |
INFO/DP or DP
FORMAT/DV, FMT/DV, or DV
FILTER, QUAL, ID, POS, REF, ALT[0]
FlagA=1 && FlagB=0
DP=".", DP!=".", ALT="."
GT="."
TYPE="indel" | TYPE="snp"
(DP4[0]+DP4[1])/(DP4[2]+DP4[3]) > 0.3
DP4[*] == 0
CSQ[*] ~ "missense_variant.*deleterious"
MAX, MIN, AVG, SUM, STRLEN, ABS
N_ALT, N_SAMPLES, AC, MAC, AF, MAF, AN
Notes:
Examples:
MIN(DV)>5
MIN(DV/DP)>0.3
MIN(DP)>10 & MIN(DV)>3
FMT/DP>10 & FMT/GQ>10 .. both conditions must be satisfied within one sample
FMT/DP>10 && FMT/GQ>10 .. the conditions can be satisfied in different samples
QUAL>10 | FMT/GQ>10 .. selects only GQ>10 samples
QUAL>10 || FMT/GQ>10 .. selects all samples at QUAL>10 sites
TYPE="snp" && QUAL>=10 && (DP4[2]+DP4[3] > 2)
MIN(DP)>35 && AVG(GQ)>50
ID=@file .. selects lines with ID present in the file
ID!=@~/file .. skip lines with ID present in the ~/file
MAF[0]<0.05 .. select rare variants at 5% cutoff
POS>=100 .. restrict your range query, e.g. 20:100-200 to strictly sites with POS in that range.
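The difference between & and && in the FMT/DP and FMT/GQ examples above can be modelled in Python as follows. The sketch assumes per-sample values are given as parallel lists; it illustrates the documented semantics only and is not the expression evaluator used by bcftools.

```python
def both_in_one_sample(dp, gq, min_dp=10, min_gq=10):
    """Model of 'FMT/DP>10 & FMT/GQ>10': one sample meets both conditions."""
    return any(d > min_dp and g > min_gq for d, g in zip(dp, gq))


def each_in_any_sample(dp, gq, min_dp=10, min_gq=10):
    """Model of 'FMT/DP>10 && FMT/GQ>10': conditions may be met by different samples."""
    return any(d > min_dp for d in dp) and any(g > min_gq for g in gq)


if __name__ == "__main__":
    # Sample 1 has high depth only, sample 2 has high quality only.
    dp = [20, 5]
    gq = [5, 20]
    print(both_in_one_sample(dp, gq))
    print(each_in_any_sample(dp, gq))
```

For this site the & form does not match because no single sample passes both thresholds, while the && form matches because each threshold is passed by some sample.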
Shell expansion:
Note that expressions must often be quoted because some characters have special meaning in the shell. In the following example the expression is enclosed in single quotes, so that the whole expression is passed to the program as intended:
bcftools view -i '%ID!="." & MAF[0]<0.01'
Please refer to the documentation of your shell for details.
SCRIPTS AND OPTIONS¶
plot-vcfstats [OPTIONS] file.vchk [...]¶
Script for processing output of bcftools stats. It can merge results from multiple outputs (useful when running the stats for each chromosome separately), plot graphs and create a PDF presentation.
-m, --merge
-p, --prefix PATH
-P, --no-PDF
-r, --rasterize
-s, --sample-names
-t, --title STRING
-T, --main-title STRING
PERFORMANCE¶
HTSlib was designed with BCF format in mind. When parsing VCF files, all records are internally converted into BCF representation. Simple operations, like removing a single column from a VCF file, can therefore be done much faster with standard UNIX commands, such as awk or cut. For the same reason, it is recommended to use BCF as input/output format whenever possible to avoid the large overhead of the VCF → BCF → VCF conversion.
BUGS¶
Please report any bugs you encounter on the github website: http://github.com/samtools/bcftools
AUTHORS¶
Heng Li from the Sanger Institute wrote the original C version of htslib, samtools and bcftools. Bob Handsaker from the Broad Institute implemented the BGZF library. Petr Danecek, Shane McCarthy and John Marshall are maintaining and further developing bcftools. Many other people contributed to the program and to the file format specifications, both directly and indirectly by providing patches, testing and reporting bugs. We thank them all.
RESOURCES¶
BCFtools GitHub website: http://github.com/samtools/bcftools
Samtools GitHub website: http://github.com/samtools/samtools
HTSlib GitHub website: http://github.com/samtools/htslib
File format specifications: http://samtools.github.io/hts-specs
BCFtools documentation: http://samtools.github.io/bcftools
BCFtools wiki page: https://github.com/samtools/bcftools/wiki
COPYING¶
The MIT/Expat License or GPL License, see the LICENSE document for details. Copyright (c) Genome Research Ltd.