.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.16.
.TH SEQKIT "1" "January 2021" "seqkit 0.15.0+ds" "User Commands"
.SH NAME
seqkit \- cross-platform and ultrafast toolkit for FASTA/Q file manipulation
.SH DESCRIPTION
SeqKit \- a cross\-platform and ultrafast toolkit for FASTA/Q file manipulation
.PP
Version: 0.15.0
.PP
Author: Wei Shen
.PP
Documents: http://bioinf.shenwei.me/seqkit
.br
Source code: https://github.com/shenwei356/seqkit
.br
Please cite: https://doi.org/10.1371/journal.pone.0163962
.SS "Usage:"
.IP
seqkit [command]
.SS "Available Commands:"
.TP
amplicon
retrieve amplicon (or specific region around it) via primer(s)
.TP
bam
monitoring and online histograms of BAM record features
.TP
common
find common sequences of multiple files by ID/name/sequence
.TP
concat
concatenate sequences with the same ID from multiple files
.TP
convert
convert FASTQ quality encoding between Sanger, Solexa and Illumina
.TP
duplicate
duplicate sequences N times
.TP
faidx
create a FASTA index file and extract subsequences
.TP
fish
look for short sequences in larger sequences using local alignment
.TP
fq2fa
convert FASTQ to FASTA
.TP
fx2tab
convert FASTA/Q to tabular format (with length/GC content/GC skew)
.TP
genautocomplete
generate shell autocompletion script
.TP
grep
search sequences by ID/name/sequence/sequence motifs, mismatch allowed
.TP
head
print the first N FASTA/Q records
.TP
help
help about any command
.TP
locate
locate subsequences/motifs, mismatch allowed
.TP
mutate
edit sequence (point mutation, insertion, deletion)
.TP
pair
match up paired\-end reads from two FASTQ files
.TP
range
print FASTA/Q records in a range (start:end)
.TP
rename
rename duplicated IDs
.TP
replace
replace name/sequence by regular expression
.TP
restart
reset start position for circular genomes
.TP
rmdup
remove duplicated sequences by ID/name/sequence
.TP
sample
sample sequences by number or proportion
.TP
sana
sanitize broken single\-line FASTQ files
.TP
scat
real\-time recursive concatenation and streaming of FASTX files
.TP
seq
transform sequences (reverse, complement, extract ID...)
.TP
shuffle
shuffle sequences
.TP
sliding
extract subsequences in sliding windows, circular genomes supported
.TP
sort
sort sequences by ID/name/sequence/length
.TP
split
split sequences into files by ID/sequence region/size/parts (mainly for FASTA)
.TP
split2
split sequences into files by size/parts (FASTA, PE/SE FASTQ)
.TP
stats
simple statistics of FASTA/Q files
.TP
subseq
get subsequences by region/GTF/BED, including flanking sequences
.TP
tab2fx
convert tabular format to FASTA/Q format
.TP
translate
translate DNA/RNA to protein sequence (supporting ambiguous bases)
.TP
version
print version information and check for update
.TP
watch
monitoring and online histograms of sequence features
.SS "Flags:"
.TP
\fB\-\-alphabet\-guess\-seq\-length\fR int
length of the sequence prefix of the first FASTA record from which seqkit guesses the sequence type (0 for the whole sequence) (default 10000)
.TP
\fB\-h\fR, \fB\-\-help\fR
help for seqkit
.TP
\fB\-\-id\-ncbi\fR
FASTA header is NCBI\-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...
.TP
\fB\-\-id\-regexp\fR string
regular expression for parsing ID (default "^(\e\eS+)\e\es?")
.TP
\fB\-\-infile\-list\fR string
file containing a list of input files (one per line); if given, they are appended to the files from CLI arguments
.TP
\fB\-w\fR, \fB\-\-line\-width\fR int
line width when outputting FASTA format (0 for no wrap) (default 60)
.TP
\fB\-o\fR, \fB\-\-out\-file\fR string
output file ("\-" for stdout, suffix .gz for gzipped output) (default "\-")
.TP
\fB\-\-quiet\fR
be quiet and do not show extra information
.TP
\fB\-t\fR, \fB\-\-seq\-type\fR string
sequence type (dna|rna|protein|unlimit|auto) (for auto, the type is detected automatically from the first sequence) (default "auto")
.TP
\fB\-j\fR, \fB\-\-threads\fR int
number of CPUs (default value: 1 for single\-CPU PCs, 2 for others; can also be set with the environment variable SEQKIT_THREADS) (default 2)
.PP
Use "seqkit [command] \fB\-\-help\fR" for more information about a command.
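.PP
To illustrate the \fB\-w\fR, \fB\-\-line\-width\fR flag, the Go sketch below wraps a FASTA sequence into lines of a fixed width, with 0 meaning no wrapping. It assumes wrapping simply splits the residues into consecutive fixed-size chunks; the helper \fIwrapSeq\fR is hypothetical and is not SeqKit's own code.
.nf
```go
package main

import "fmt"

// wrapSeq is a hypothetical helper (not part of SeqKit) that splits seq into
// lines of at most width residues. As with -w 0, a width of 0 disables
// wrapping; negative widths are treated the same way (an assumption).
func wrapSeq(seq string, width int) []string {
	if width <= 0 || len(seq) <= width {
		return []string{seq}
	}
	lines := make([]string, 0, (len(seq)+width-1)/width)
	for start := 0; start < len(seq); start += width {
		end := start + width
		if end > len(seq) {
			end = len(seq) // the last line may be shorter than width
		}
		lines = append(lines, seq[start:end])
	}
	return lines
}

func main() {
	header := ">seq1 example record"
	seq := "ACGTACGTACGTACGTACGT"

	// Print one FASTA record, wrapped at 8 residues per line.
	fmt.Println(header)
	for _, line := range wrapSeq(seq, 8) {
		fmt.Println(line)
	}
}
```
.fi
.PP
With the 20\-residue sequence above and a width of 8, the record is printed as the header line followed by lines of 8, 8 and 4 residues; with a width of 0 the whole sequence stays on one line.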
.SH AUTHOR
This manpage was written by Nilesh Patra for the Debian distribution and may be used by others.