.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
.TH SCOARY "1" "January 2019" "scoary 1.6.16" "User Commands"
.SH NAME
scoary \- pangenome-wide association studies
.SH SYNOPSIS
scoary [\-h] [\-t TRAITS] [\-g GENES] [\-n NEWICKTREE] [\-s START_COL]
[\-\-delimiter DELIMITER] [\-r RESTRICT_TO] [\-o OUTDIR] [\-u]
[\-p P_VALUE_CUTOFF [P_VALUE_CUTOFF ...]]
[\-c [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]] [\-m MAX_HITS]
[\-\-include_input_columns GRABCOLS] [\-w] [\-\-no\-time] [\-e PERMUTE]
[\-\-no_pairwise] [\-\-collapse] [\-\-threads THREADS] [\-\-test]
[\-\-citation] [\-\-version]
.SH OPTIONS
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.SS "Input options:"
.TP
\fB\-t\fR TRAITS, \fB\-\-traits\fR TRAITS
Input trait table (comma\-separated values). Trait
presence is indicated by 1, trait absence by 0.
Strain names are expected in the first column and trait
names in the first row.
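.IP
A hypothetical example of such a table, with two traits and three strains, can be written with a shell here\-document. The file name traits.csv, the strain and trait names, and the label "Name" in the top\-left cell are all illustrative assumptions:
.IP
.nf
cat > traits.csv << 'EOF'
Name,Trait_A,Trait_B
Strain_1,1,0
Strain_2,0,1
Strain_3,1,1
EOF
.fi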
.TP
\fB\-g\fR GENES, \fB\-\-genes\fR GENES
Input gene presence/absence table (comma\-separated
values) from Roary. Strain names must match
those in the trait table.
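.IP
A minimal invocation might look like the following. It assumes a trait table named traits.csv (a hypothetical name) and the gene_presence_absence.csv file produced by Roary, both in the current directory:
.IP
.nf
scoary \-t traits.csv \-g gene_presence_absence.csv
.fi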
.TP
\fB\-n\fR NEWICKTREE, \fB\-\-newicktree\fR NEWICKTREE
Supply a custom tree (Newick format) for phylogenetic
analyses instead of calculating it internally.
.TP
\fB\-s\fR START_COL, \fB\-\-start_col\fR START_COL
Column in the gene presence/absence file at which
the individual strain data begin (1\-based
indexing). Default = 15.
.TP
\fB\-\-delimiter\fR DELIMITER
The delimiter between cells in the gene
presence/absence and trait files, as well as the
output file.
.TP
\fB\-r\fR RESTRICT_TO, \fB\-\-restrict_to\fR RESTRICT_TO
Use if you only want to analyze a subset of your
strains. Scoary will read the provided comma\-separated
table of strains and restrict its analyses to these.
.SS "Output options:"
.TP
\fB\-o\fR OUTDIR, \fB\-\-outdir\fR OUTDIR
Directory to place output files in. Default = . (the
current directory).
.TP
\fB\-u\fR, \fB\-\-upgma_tree\fR
This flag will cause Scoary to write the calculated
UPGMA tree to a Newick file.
.TP
\fB\-p\fR P_VALUE_CUTOFF [P_VALUE_CUTOFF ...], \fB\-\-p_value_cutoff\fR P_VALUE_CUTOFF [P_VALUE_CUTOFF ...]
P\-value cut\-off / alpha level. For Fisher's exact test
and the Bonferroni and Benjamini\-Hochberg corrections,
Scoary will not report genes with p\-values higher than this.
For empirical p\-values, this is treated as a two\-tailed alpha
level instead, e.g. 0.02 will filter out all genes except
those in the lowest and highest percentile of the permuted
distribution. Run with "\-p 1.0" to report all genes. Accepts
scientific notation (e.g. 1E\-8). Provide either a single value
(applied to all correction criteria) or exactly as many values
as correction criteria, in corresponding order (see the
example under \fB\-\-correction\fR). Default = 0.05.
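.IP
For example, the following command (file names hypothetical) gives only one cut\-off for two correction criteria. The value 0.05 should therefore apply to both the Bonferroni and the Benjamini\-Hochberg adjusted p\-values:
.IP
.nf
scoary \-t traits.csv \-g gene_presence_absence.csv \-c B BH \-p 0.05
.fi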
.TP
\fB\-c\fR [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]], \fB\-\-correction\fR [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]
Apply the indicated filtration measure. Allowed values
are I, B, BH, PW, EPW and P: I = individual (naive)
p\-value; B = Bonferroni\-adjusted p\-value; BH =
Benjamini\-Hochberg\-adjusted p\-value; PW = best (lowest)
pairwise comparison; EPW = entire range of pairwise
comparison p\-values; P = empirical p\-value from
permutations. You can enter as many correction criteria
as you like. These are paired, in order, with the
p\-value cut\-offs you enter. For example, "\-c I EPW \-p
0.1 0.05" applies the following cut\-offs: the naive
p\-value must be below 0.1 AND the entire range of
pairwise comparison p\-values must be below 0.05 for the
gene to be reported. Note that the empirical p\-values should be
interpreted at both tails. Therefore, running "\-c P \-p
0.05" applies an alpha of 0.05 to the empirical
(permuted) p\-values, i.e. it filters out everything
except the upper and lower 2.5 percent of the
distribution. Default = I (individual p\-value).
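.IP
Written out as a full command line (file names hypothetical), the "\-c I EPW \-p 0.1 0.05" example reads:
.IP
.nf
scoary \-t traits.csv \-g gene_presence_absence.csv \-c I EPW \-p 0.1 0.05
.fi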
.TP
\fB\-m\fR MAX_HITS, \fB\-\-max_hits\fR MAX_HITS
Maximum number of hits to report. Scoary will only
report the top MAX_HITS results per trait.
.TP
\fB\-\-include_input_columns\fR GRABCOLS
Copy the specified columns from the input Roary file
into the output. Accepts comma\-separated column numbers
and ranges, e.g.
\fB\-\-include_input_columns\fR 4,6,8,16\-23. The special
keyword ALL includes all relevant input columns in
the output.
.TP
\fB\-w\fR, \fB\-\-write_reduced\fR
Use with \fB\-r\fR if you want Scoary to create a new gene
presence/absence file from your reduced set of
isolates. Note: columns 1\-14 (No. sequences, Avg group
size nuc, etc.) in this file do not reflect the reduced
dataset; they are taken from the full dataset.
.TP
\fB\-\-no\-time\fR
Name output files TRAIT.results.csv instead of
TRAIT_TIMESTAMP.csv. When used with the \fB\-w\fR argument,
the reduced gene matrix is written as
gene_presence_absence_reduced.csv rather than
gene_presence_absence_reduced_TIMESTAMP.csv.
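.IP
The following sketch restricts the analysis to a subset of strains. File names are hypothetical, and it assumes that strains_subset.csv lists the strains to keep in the format expected by \fB\-r\fR. Per the descriptions above, it should also write the reduced matrix as gene_presence_absence_reduced.csv:
.IP
.nf
scoary \-t traits.csv \-g gene_presence_absence.csv \-r strains_subset.csv \-w \-\-no\-time
.fi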
.SS "Analysis options:"
.TP
\fB\-e\fR PERMUTE, \fB\-\-permute\fR PERMUTE
Perform N permutations (N = PERMUTE) of the significant
results after the main analysis. Each permutation shuffles
(label\-switches) the phenotype among the strains, and a new
p\-value is calculated for the permuted dataset. After all N
permutations are completed, the permuted p\-values are sorted in
ascending order, and the percentile of the original
result within this permuted p\-value distribution is
reported.
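.IP
The following illustrative sketch requests permutations and filters on the empirical p\-value at a two\-tailed alpha of 0.05. This keeps only genes in the lower and upper 2.5 percent of the permuted distribution. File names and the permutation count of 1000 are arbitrary assumptions:
.IP
.nf
scoary \-t traits.csv \-g gene_presence_absence.csv \-e 1000 \-c P \-p 0.05
.fi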
.TP
\fB\-\-no_pairwise\fR
Do not perform pairwise comparisons. In this mode,
Scoary will perform population structure\-naive
calculations only (Fisher's exact test, odds ratios, etc.).
Useful for summary operations and exploring sets (genes
unique to groups, intersections, etc.), but not for causal analyses.
.TP
\fB\-\-collapse\fR
Collapse correlated genes (genes that have
identical distribution patterns in the sample) into
merged units.
.SS "Misc options:"
.TP
\fB\-\-threads\fR THREADS
Number of threads to use. Default = 1
.TP
\fB\-\-test\fR
Run Scoary on the test set in exampledata, overriding
all other parameters.
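.IP
For instance, to check that Scoary runs without preparing any input files of your own:
.IP
.nf
scoary \-\-test
.fi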
.TP
\fB\-\-citation\fR
Show citation information, and exit.
.TP
\fB\-\-version\fR
Display Scoary version, and exit.
.PP
by Ola Brynildsrud (olbb@fhi.no)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and may be used by others.