.\" Manpage for QTLtools pca. .\" Contact halitongen@gmail.com to correct errors or typos. .TH QTLtools-pca 1 "06 May 2020" "QTLtools-v1.3" "Bioinformatics tools" .SH NAME QTLtools pca \- Conducts PCA .SH SYNOPSIS .B QTLtools pca \-\-vcf .IR [in.vcf | in.vcf.gz | in.bcf] .B | \-\-bed .IR in.bed.gz .B \-\-out .IR output.txt .I [OPTIONS] .SH DESCRIPTION This mode allows performing a Principal Component Analysis (PCA) either on molecular phenotype quantifications or genotype data. It is typically used (i) to detect outliers in the data, (ii) to detect stratification in the data or (iii) to build a covariate matrix before QTL mapping. QTLtools' PCA implementation utilizes singular value decomposition (SVD). When building a covariate matrix to account for technical covariates we recommend using \fB\-\-center\fR and \fB\-\-scale\fR. .SH OPTIONS .TP .B \-\-vcf [\fIin.vcf\fR|\fIin.bcf\fR|\fIin.vcf.gz\fR|\fIin.bed.gz\fB] Genotypes in VCF/BCF/BED format. REQUIRED unless \fB\-\-bed\fR. .TP .B \-\-bed \fIquantifications.bed.gz\fR Quantifications in BED format. REQUIRED unless \fB\-\-vcf\fR. .TP .B \-\-out \fIoutput_prefix\fR Output file prefix. REQUIRED. .TP .B \-\-center Center the variables (genotypes or phenotypes) by subtracting the mean from each value .TP .B \-\-scale Scale the variables (genotypes or phenotypes) by dividing each value by the standard deviation .TP .B \-\-region \fIchr:start-end\fR Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5 .TP .B \-\-exclude-chrs \fIstring\fR The chromosomes to exclude given as a space separated list. Only applies to \fB\-\-vcf\fR. DEFAULT="X Y M MT XY chrX chrY chrM chrMT chrXY" .TP .B \-\-maf \fIfloat\fR Exclude sites with minor allele frequency less than this. Only applies to \fB\-\-vcf\fR. DEFAULT=0.0 .TP .B \-\-distance \fIinteger\fR Only include sites separated with this many base pairs. Only applies to \fB\-\-vcf\fR. DEFAULT=0 .SH OUTPUT FILES .TP 1 .B .pca This file contains the principal components that were calculated. The names of the principal components, which is given in the first column, is composed of the output file prefix, whether the data was centered, whether the data was scaled, and the principal component number. .TP 1 .B .pca_stats This file contains the standard deviation of each principal component, and the variance and the cumulative variance explained by each PC. .SH EXAMPLES .IP o 2 Running pca on RNAseq quantifications to calculate technical covariates: .IP "" 2 QTLtools pca \-\-bed genes.50percent.chr22.bed.gz \-\-out genes.50percent.chr22 \-\-center \-\-scale .IP o 2 Running pca on genotypes to detect population stratification: .IP "" 2 QTLtools pca \-\-vcf genotypes.chr22.vcf.gz \-\-out genotypes.chr22 \-\-center \-\-scale --maf 0.05 --distance 5000 .SH SEE ALSO .IR QTLtools (1) .\".IR QTLtools-bamstat (1), .\".IR QTLtools-mbv (1), .\".IR QTLtools-pca (1), .\".IR QTLtools-correct (1), .\".IR QTLtools-cis (1), .\".IR QTLtools-trans (1), .\".IR QTLtools-fenrich (1), .\".IR QTLtools-fdensity (1), .\".IR QTLtools-rtc (1), .\".IR QTLtools-rtc-union (1), .\".IR QTLtools-extract (1), .\".IR QTLtools-quan (1), .\".IR QTLtools-rep (1), .\".IR QTLtools-gwas (1), .PP QTLtools website: .SH BUGS .IP o 2 Versions up to and including 1.2, suffer from a bug in reading missing genotypes in VCF/BCF files. This bug affects variants with a DS field in their genotype's FORMAT and have a missing genotype (DS fiels is .) in one of the samples, in which case genotypes for all the samples are set to missing, effectively removing this variant from the analyses. .PP Please submit bugs to .SH AUTHORS Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)