.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH SCOARY "1" "January 2019" "scoary 1.6.16" "User Commands"
.SH NAME
scoary \- pangenome\-wide association studies
.SH SYNOPSIS
scoary [\-h] [\-t TRAITS] [\-g GENES] [\-n NEWICKTREE] [\-s START_COL]
[\-\-delimiter DELIMITER] [\-r RESTRICT_TO] [\-o OUTDIR] [\-u]
[\-p P_VALUE_CUTOFF [P_VALUE_CUTOFF ...]]
[\-c [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]] [\-m MAX_HITS]
[\-\-include_input_columns GRABCOLS] [\-w] [\-\-no\-time] [\-e PERMUTE]
[\-\-no_pairwise] [\-\-collapse] [\-\-threads THREADS] [\-\-test]
[\-\-citation] [\-\-version]
.SH OPTIONS
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.SS "Input options:"
.TP
\fB\-t\fR TRAITS, \fB\-\-traits\fR TRAITS
Input trait table (comma\-separated values). Trait presence is indicated by 1, trait absence by 0. Strain names are assumed to be in the first column and trait names in the first row.
.TP
\fB\-g\fR GENES, \fB\-\-genes\fR GENES
Input gene presence/absence table (comma\-separated values) from Roary. Strain names must match those in the trait table.
.TP
\fB\-n\fR NEWICKTREE, \fB\-\-newicktree\fR NEWICKTREE
Supply a custom tree (Newick format) for phylogenetic analyses instead of calculating it internally.
.TP
\fB\-s\fR START_COL, \fB\-\-start_col\fR START_COL
Column in the gene presence/absence file at which the individual strain columns start (1\-based indexing). Default = 15.
.TP
\fB\-\-delimiter\fR DELIMITER
The delimiter between cells in the gene presence/absence and trait files, as well as in the output file.
.TP
\fB\-r\fR RESTRICT_TO, \fB\-\-restrict_to\fR RESTRICT_TO
Use if you only want to analyze a subset of your strains. Scoary will read the provided comma\-separated table of strains and restrict analyses to these strains.
.SS "Output options:"
.TP
\fB\-o\fR OUTDIR, \fB\-\-outdir\fR OUTDIR
Directory in which to place output files. Default = . (the current directory).
.TP
\fB\-u\fR, \fB\-\-upgma_tree\fR
Write the UPGMA tree calculated by Scoary to a Newick file.
.TP
\fB\-p\fR P_VALUE_CUTOFF [P_VALUE_CUTOFF ...], \fB\-\-p_value_cutoff\fR P_VALUE_CUTOFF [P_VALUE_CUTOFF ...]
P\-value cut\-off / alpha level. For Fisher's, Bonferroni, and Benjamini\-Hochberg tests, Scoary will not report genes with p\-values higher than this. For empirical p\-values, this is treated as a two\-tailed alpha level instead, i.e. 0.02 will filter out all genes except those in the lowest and highest 1 percent of the permuted distribution. Run with "\-p 1.0" to report all genes. Accepts standard form (e.g. 1E\-8). Provide a single value (applied to all criteria) or exactly as many values as correction criteria, in corresponding order (see the example under \fB\-\-correction\fR). Default = 0.05
.TP
\fB\-c\fR [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]], \fB\-\-correction\fR [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]
Apply the indicated filtration measure. Allowed values are I, B, BH, PW, EPW, P. I = individual (naive) p\-value. B = Bonferroni\-adjusted p\-value. BH = Benjamini\-Hochberg\-adjusted p\-value. PW = best (lowest) pairwise comparison. EPW = entire range of pairwise comparison p\-values. P = empirical p\-value from permutations. You can enter as many correction criteria as you like; they are paired, in order, with the p\-value cut\-offs you enter. For example, "\-c I EPW \fB\-p\fR 0.1 0.05" will apply the following cut\-offs: the naive p\-value must be lower than 0.1 AND the entire range of pairwise comparison p\-values must be below 0.05 for this gene. Note that the empirical p\-values should be interpreted at both tails. Therefore, running "\-c P \fB\-p\fR 0.05" will apply an alpha of 0.05 to the empirical (permuted) p\-values, i.e. it will filter out everything except the upper and lower 2.5 percent of the distribution. Default = individual p\-value (I).
.TP
\fB\-m\fR MAX_HITS, \fB\-\-max_hits\fR MAX_HITS
Maximum number of hits to report.
Scoary will only report the top MAX_HITS results per trait.
.TP
\fB\-\-include_input_columns\fR GRABCOLS
Grab columns from the input Roary file and put them in the output. Handles commas and ranges, e.g. \fB\-\-include_input_columns\fR 4,6,8,16\-23. The special keyword ALL will include all relevant input columns in the output.
.TP
\fB\-w\fR, \fB\-\-write_reduced\fR
Use with \fB\-r\fR if you want Scoary to create a new gene presence/absence file from your reduced set of isolates. Note: Columns 1\-14 (No. sequences, Avg group size nuc, etc.) in this file do not reflect the reduced dataset; they are taken from the full dataset.
.TP
\fB\-\-no\-time\fR
Name the output file TRAIT.results.csv instead of TRAIT_TIMESTAMP.csv. When used with the \fB\-w\fR argument, the reduced gene matrix is written as gene_presence_absence_reduced.csv rather than gene_presence_absence_reduced_TIMESTAMP.csv.
.SS "Analysis options:"
.TP
\fB\-e\fR PERMUTE, \fB\-\-permute\fR PERMUTE
Perform PERMUTE permutations of the significant results post\-analysis. Each permutation switches (permutes) the phenotype labels, and a new p\-value is calculated for this permuted dataset. After all permutations are completed, the results are sorted in ascending order, and the percentile of the original result in the permuted p\-value distribution is reported.
.TP
\fB\-\-no_pairwise\fR
Do not perform pairwise comparisons. In this mode, Scoary performs only population\-structure\-naive calculations (Fisher's test, odds ratios, etc.). This is useful for summary operations and exploring sets (genes unique to groups, intersections, etc.) but not for causal analyses.
.TP
\fB\-\-collapse\fR
Collapse correlated genes (genes that have identical distribution patterns in the sample) into merged units.
.SS "Misc options:"
.TP
\fB\-\-threads\fR THREADS
Number of threads to use. Default = 1
.TP
\fB\-\-test\fR
Run Scoary on the test set in exampledata, overriding all other parameters.
.TP
\fB\-\-citation\fR
Show citation information and exit.
.TP
\fB\-\-version\fR
Display Scoary version and exit.
.PP
by Ola Brynildsrud (olbb@fhi.no)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
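.SH "CORRECTION SKETCH"
The following minimal Python sketch illustrates how the B (Bonferroni) and BH (Benjamini\-Hochberg) criteria of \fB\-\-correction\fR, and the pairing of criteria with \fB\-\-p_value_cutoff\fR values, could be computed for a list of naive (I) p\-values. It uses the textbook Bonferroni and Benjamini\-Hochberg step\-up adjustments and only the Python standard library. The function names bonferroni, benjamini_hochberg and passes are hypothetical and are not Scoary's internal API; Scoary's own implementation may differ in details such as boundary handling.
.PP
.nf
```python
# Hypothetical sketch; not Scoary's internal code.


def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes(criteria, cutoffs, values):
    """Pair criteria with cut-offs the way -c/-p are described.

    values maps each criterion name to the gene's value for it, e.g.
    "EPW" to the largest of its pairwise-comparison p-values.
    """
    if len(cutoffs) == 1:
        cutoffs = cutoffs * len(criteria)  # one value applies to all
    if len(cutoffs) != len(criteria):
        raise ValueError("give one cut-off, or exactly one per criterion")
    # Every criterion must be below its own cut-off (logical AND).
    # Strict "<" follows the wording of the -c example; the exact
    # boundary handling in Scoary is an assumption here.
    return all(values[c] < cut for c, cut in zip(criteria, cutoffs))
```
.fi
.PP
For naive p\-values 0.01, 0.04, 0.03 and 0.2, this sketch gives Bonferroni values 0.04, 0.16, 0.12 and 0.8, and Benjamini\-Hochberg values of about 0.04, 0.053, 0.053 and 0.2. The call passes(["I", "EPW"], [0.1, 0.05], {"I": 0.08, "EPW": 0.03}) mirrors "\-c I EPW \-p 0.1 0.05" and returns True.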
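.SH "PERMUTATION SKETCH"
The following minimal Python sketch illustrates the label\-switching procedure described under \fB\-\-permute\fR and the two\-tailed reading of "\-c P \-p 0.05". It assumes a two\-sided Fisher's exact test as the per\-gene p\-value and a mid\-rank percentile to break ties; these choices, and the function names fisher_two_sided, table, empirical_percentile and in_tails, are illustrative assumptions rather than Scoary's actual implementation. Only the Python standard library (Python 3.8 or later, for math.comb) is used.
.PP
.nf
```python
# Hypothetical sketch; the test statistic and percentile rule are assumptions.
import random
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, total)


def table(gene, trait):
    """Rows: gene present/absent; columns: trait present/absent."""
    pairs = list(zip(gene, trait))
    a = sum(1 for g, t in pairs if g and t)
    b = sum(1 for g, t in pairs if g and not t)
    c = sum(1 for g, t in pairs if t and not g)
    return a, b, c, len(pairs) - a - b - c


def empirical_percentile(gene, trait, n_perm, seed=0):
    """Percentile of the observed p-value among label-switched p-values."""
    if n_perm < 1:
        raise ValueError("n_perm must be positive")
    rng = random.Random(seed)
    observed = fisher_two_sided(*table(gene, trait))
    labels = list(trait)
    below = ties = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # switch phenotype labels between strains
        p = fisher_two_sided(*table(gene, labels))
        if p < observed:
            below += 1
        elif p == observed:
            ties += 1
    # Mid-rank percentile, so ties do not push a gene into either tail.
    return (below + 0.5 * ties) / n_perm


def in_tails(percentile, alpha):
    """Two-tailed reading: keep only the lowest and highest alpha/2."""
    return percentile <= alpha / 2 or percentile >= 1 - alpha / 2
```
.fi
.PP
For a gene present in exactly the 10 trait\-positive strains out of 20, the observed p\-value is the smallest achievable for those margins, so its percentile lies in the lower tail and in_tails(percentile, 0.05) is true. A gene present in every strain yields the same p\-value (1.0) under every permutation, so its mid\-rank percentile is 0.5 and it is filtered out.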