.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.TH UNIKMER "1" "August 2022" "unikmer 0.19.0" "User Commands"
.SH NAME
unikmer \- Toolkit for nucleic acid k\-mer analysis
.SH DESCRIPTION
unikmer \- Toolkit for k\-mers with taxonomic information
.PP
unikmer is a toolkit for nucleic acid k\-mer analysis. It provides functions including set operations on k\-mers, optionally with TaxIds but without count information.
.PP
K\-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64', and serialized in binary files with the extension '.unik'.
.PP
TaxIds can be assigned when counting k\-mers from genome sequences. The LCA (Lowest Common Ancestor) is computed during set operations, including computing union, intersection, set difference, and unique and repeated k\-mers.
.PP
Version: v0.19.0
.PP
Author: Wei Shen
.PP
Documents: https://bioinf.shenwei.me/unikmer
.br
Source code: https://github.com/shenwei356/unikmer
.PP
Dataset (optional):
.IP
Manipulating k\-mers with TaxIds requires taxonomy files, e.g., from the NCBI Taxonomy database. Extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp" from ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz into ~/.unikmer/ or some other directory. You can then refer to that directory with the flag \fB\-\-data\-dir\fR or the environment variable UNIKMER_DB.
.IP
For GTDB, use 'taxonkit create\-taxdump' to create NCBI\-style taxonomy dump files, or download them from:
.IP
https://github.com/shenwei356/gtdb\-taxonomy
.IP
Note that TaxIds are represented as uint32 and stored in 4 or fewer bytes, so all TaxIds must be in the range [1, 4294967295].
.SS "Usage:"
.IP
unikmer [command]
.SS "Available Commands:"
.TP
autocompletion
Generate shell autocompletion script (bash|zsh|fish|powershell)
.TP
common
Find k\-mers shared by most of the input binary files
.TP
concat
Concatenate multiple binary files without removing duplicates
.TP
count
Generate k\-mers (sketch) from FASTA/Q sequences
.TP
decode
Decode encoded integer to k\-mer text
.TP
diff
Set difference of multiple binary files
.TP
dump
Convert plain k\-mer text to binary format
.TP
encode
Encode plain k\-mer text to integer
.TP
filter
Filter out low\-complexity k\-mers (experimental)
.TP
grep
Search k\-mers from binary files
.TP
head
Extract the first N k\-mers
.TP
info
Show information about binary files
.TP
inter
Intersection of multiple binary files
.TP
locate
Locate k\-mers in genome
.TP
merge
Merge k\-mers from sorted chunk files
.TP
num
Quickly inspect the number of k\-mers in binary files
.TP
rfilter
Filter k\-mers by taxonomic rank
.TP
sample
Sample k\-mers from binary files
.TP
sort
Sort k\-mers in binary files to reduce file size
.TP
split
Split k\-mers into sorted chunk files
.TP
tsplit
Split k\-mers according to TaxId
.TP
union
Union of multiple binary files
.TP
uniqs
Map k\-mers back to genome and find unique subsequences
.TP
version
Print version information and check for update
.TP
view
Read and output binary format to plain text
.SS "Flags:"
.TP
\fB\-c\fR, \fB\-\-compact\fR
write compact binary file with little loss of speed
.TP
\fB\-\-compression\-level\fR int
compression level (default \fB\-1\fR)
.TP
\fB\-\-data\-dir\fR string
directory containing NCBI Taxonomy files, including nodes.dmp, names.dmp, merged.dmp and delnodes.dmp (default "~/.unikmer")
.TP
\fB\-h\fR, \fB\-\-help\fR
help for unikmer
.TP
\fB\-I\fR, \fB\-\-ignore\-taxid\fR
ignore taxonomy information
.TP
\fB\-i\fR, \fB\-\-infile\-list\fR string
file listing input files (one file per line); if given, they are appended to files from CLI arguments
.TP
\fB\-\-max\-taxid\fR uint32
maximum TaxId; smaller values let TaxIds be stored in less space. The default, 1<<32\-1, is enough for NCBI Taxonomy TaxIds (default 4294967295)
.TP
\fB\-C\fR, \fB\-\-no\-compress\fR
do not compress binary file (not recommended)
.TP
\fB\-\-nocheck\-file\fR
do not check binary files (useful with process substitution or named pipes)
.TP
\fB\-j\fR, \fB\-\-threads\fR int
number of CPUs to use (default 4)
.TP
\fB\-\-verbose\fR
print verbose information
.PP
Use "unikmer [command] \fB\-\-help\fR" for more information about a command.
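.PP
The two storage ideas described in this page are k\-mers encoded into 'uint64' (k<=32) and TaxIds stored in 4 or fewer bytes depending on \fB\-\-max\-taxid\fR. The minimal Python sketch below illustrates both. It is not taken from unikmer's source and does not reproduce the '.unik' file format. It assumes the common 2\-bit base mapping A=0, C=1, G=2, T=3, and the helper names encode_kmer and taxid_bytes are hypothetical.
.PP
.nf
```python
# Hypothetical sketch, not unikmer's implementation.
# Assumes the common 2-bit mapping A=0, C=1, G=2, T=3 for k <= 32.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (k <= 32) into one unsigned 64-bit value, 2 bits per base."""
    if not 0 < len(kmer) <= 32:
        raise ValueError("2-bit encoding into uint64 needs 1 <= k <= 32")
    code = 0
    for base in kmer.upper():
        # Shift previous bases left and append this base's 2-bit code;
        # raises KeyError for bases other than A, C, G, T.
        code = (code << 2) | BASE_CODE[base]
    return code


def taxid_bytes(max_taxid: int) -> int:
    """Smallest number of bytes (1 to 4) that can hold any TaxId <= max_taxid."""
    if not 1 <= max_taxid <= 4294967295:
        raise ValueError("TaxIds must lie in [1, 4294967295] (uint32)")
    return (max_taxid.bit_length() + 7) // 8


print(encode_kmer("ACGT"))      # prints 27 (binary 00011011)
print(taxid_bytes(4294967295))  # prints 4
print(taxid_bytes(2000000))     # prints 3
```
.fi
.PP
With the default \fB\-\-max\-taxid\fR of 4294967295, every TaxId needs 4 bytes in this sketch. A smaller maximum such as 2000000 fits in 3 bytes, which is the kind of saving the flag description refers to.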