.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.48.5.
.TH UNIKMER "1" "August 2022" "unikmer 0.19.0" "User Commands"
.SH NAME
unikmer \- Toolkit for nucleic acid k-mer analysis
.SH DESCRIPTION
unikmer \- Toolkit for k\-mers with taxonomic information
.PP
unikmer is a toolkit for nucleic acid k\-mer analysis. It provides functions
including set operations on k\-mers, optionally with TaxIds but without count
information.
.PP
K\-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64' values
and serialized in binary files with the extension '.unik'.
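.PP
As a rough illustration of why k<=32 fits into a 'uint64', the sketch below
packs each base into 2 bits. The base order A=0, C=1, G=2, T=3 and the helper
names encode and decode are assumptions made for this example; they are not
taken from the unikmer source code.
.PP
.nf
```python
BASES = "ACGT"  # assumed code order: A=0, C=1, G=2, T=3

def encode(kmer):
    # Pack a k-mer (k <= 32) into one unsigned 64-bit integer, 2 bits per base.
    if not 0 < len(kmer) <= 32:
        raise ValueError("k must be in the range [1, 32]")
    code = 0
    for base in kmer.upper():
        code = (code << 2) | BASES.index(base)
    return code

def decode(code, k):
    # Unpack the k lowest 2-bit groups, last base first, then restore order.
    bases = []
    for _ in range(k):
        bases.append(BASES[code & 3])
        code >>= 2
    return "".join(reversed(bases))

print(encode("ACGT"))
print(decode(encode("ACGT"), 4))
```
.fi
.PP
Under these assumptions, "ACGT" maps to 27 (binary 00 01 10 11), and 32 bases
use exactly 64 bits, which is why longer k\-mers have to be hashed instead.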
.PP
TaxIds can be assigned when counting k\-mers from genome sequences,
and the LCA (Lowest Common Ancestor) is computed during set operations,
including union, intersection, set difference, and finding unique and
repeated k\-mers.
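.PP
The LCA of two TaxIds can be sketched as a walk up a child\-to\-parent table
such as the one stored in "nodes.dmp". The tree below is a toy example with
hypothetical TaxIds, and the functions lineage and lca are illustrative names,
not part of unikmer.
.PP
.nf
```python
# Toy child-to-parent table; as in NCBI dumps, the root (1) is its own parent.
PARENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}

def lineage(taxid, parent):
    # Collect the path from a TaxId up to the root.
    path = [taxid]
    node = taxid
    while parent[node] != node:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    # The first ancestor of b that is also an ancestor of a is the LCA.
    ancestors = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("TaxIds do not share a root")

print(lca(4, 5, PARENT))
print(lca(4, 3, PARENT))
```
.fi
.PP
In this toy tree, TaxIds 4 and 5 share the parent 2, while 4 and 3 only meet
at the root, so a k\-mer found under both 4 and 3 would be labeled with 1.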
.PP
Version: v0.19.0
.PP
Author: Wei Shen <shenwei356@gmail.com>
.PP
Documents  : https://bioinf.shenwei.me/unikmer
.br
Source code: https://github.com/shenwei356/unikmer
.PP
Dataset (optional):
.IP
Manipulating k\-mers with TaxIds requires taxonomy files, e.g., from the
NCBI Taxonomy database. Download
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
and extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp"
into ~/.unikmer/ or some other directory, which you can later specify
with the flag \fB\-\-data\-dir\fR or the environment variable UNIKMER_DB.
.IP
For GTDB, use 'taxonkit create\-taxdump' to create NCBI\-style
taxonomy dump files, or download from:
.IP
https://github.com/shenwei356/gtdb\-taxonomy
.IP
Note that TaxIds are represented as uint32 values and stored in 4 or
fewer bytes, so all TaxIds must be in the range [1, 4294967295].
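.IP
To see how a smaller maximum TaxId can save space, the sketch below computes
the smallest whole number of bytes that can hold every TaxId up to a given
limit. The function name taxid_bytes is hypothetical, and how unikmer itself
chooses the width is not described here.
.IP
.nf
```python
def taxid_bytes(max_taxid):
    # Smallest whole number of bytes able to hold every TaxId up to max_taxid.
    if not 1 <= max_taxid <= 4294967295:
        raise ValueError("TaxIds must be in the range [1, 4294967295]")
    return (max_taxid.bit_length() + 7) // 8

for limit in (255, 65535, 16777215, 4294967295):
    print(limit, taxid_bytes(limit))
```
.fi
.IP
With this rule, limits of 255, 65535, 16777215 and 4294967295 need 1, 2, 3
and 4 bytes respectively.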
.SS "Usage:"
.IP
unikmer [command]
.SS "Available Commands:"
.IP
.nf
autocompletion Generate shell autocompletion script (bash|zsh|fish|powershell)
common         Find k\-mers shared by most of multiple binary files
concat         Concatenate multiple binary files without removing duplicates
count          Generate k\-mers (sketch) from FASTA/Q sequences
decode         Decode encoded integer to k\-mer text
diff           Set difference of multiple binary files
dump           Convert plain k\-mer text to binary format
encode         Encode plain k\-mer text to integer
filter         Filter out low\-complexity k\-mers (experimental)
grep           Search k\-mers from binary files
head           Extract the first N k\-mers
info           Information of binary files
inter          Intersection of multiple binary files
locate         Locate k\-mers in genome
merge          Merge k\-mers from sorted chunk files
num            Quickly inspect number of k\-mers in binary files
rfilter        Filter k\-mers by taxonomic rank
sample         Sample k\-mers from binary files
sort           Sort k\-mers in binary files to reduce file size
split          Split k\-mers into sorted chunk files
tsplit         Split k\-mers according to taxid
union          Union of multiple binary files
uniqs          Mapping k\-mers back to genome and find unique subsequences
version        Print version information and check for update
view           Read and output binary format to plain text
.fi
.SS "Flags:"
.TP
\fB\-c\fR, \fB\-\-compact\fR
write compact binary file with little loss of speed
.TP
\fB\-\-compression\-level\fR int
compression level (default \fB\-1\fR)
.TP
\fB\-\-data\-dir\fR string
directory containing NCBI Taxonomy files, including nodes.dmp,
names.dmp, merged.dmp and delnodes.dmp (default "~/.unikmer")
.TP
\fB\-h\fR, \fB\-\-help\fR
help for unikmer
.TP
\fB\-I\fR, \fB\-\-ignore\-taxid\fR
ignore taxonomy information
.TP
\fB\-i\fR, \fB\-\-infile\-list\fR string
file listing input files (one file per line); if given, these are
appended to the files from the command\-line arguments
.TP
\fB\-\-max\-taxid\fR uint32
maximum TaxId; with a smaller value, less space is needed to store TaxIds.
The default value is 1<<32\-1, which is enough for NCBI Taxonomy TaxIds
(default 4294967295)
.TP
\fB\-C\fR, \fB\-\-no\-compress\fR
do not compress binary file (not recommended)
.TP
\fB\-\-nocheck\-file\fR
do not check binary files; use this with process substitution or named pipes
.TP
\fB\-j\fR, \fB\-\-threads\fR int
number of CPUs to use (default 4)
.TP
\fB\-\-verbose\fR
print verbose information
.PP
Use "unikmer [command] \fB\-\-help\fR" for more information about a command.