.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.48.1.
.TH GMAP_BUILD "1" "February 2021" "gmap_build 2021-02-22+ds-1" "User Commands"
.SH NAME
gmap_build \- Tool for genome database creation for GMAP or GSNAP
.SH SYNOPSIS
.B gmap_build
[\fI\,options\/\fR...] \fI\,-d <genome> \/\fR[\fI\,-c <transcriptome> -T <transcript_fasta>\/\fR] \fI\,<genome_fasta_files>\/\fR
.SH DESCRIPTION
gmap_build: Builds a GMAP database for a genome, for use by GMAP or GSNAP.
Part of the GMAP package, version 2021\-02\-22.
.PP
You are free to name <genome> and <transcriptome> as you wish.  You
will use the same names when performing alignments subsequently using
GMAP or GSNAP.
.PP
Note: When adding transcriptome information to an existing genome
database, there is no need to specify the genome_fasta_files.
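.PP
As a minimal sketch, assuming hypothetical names (\fImygenome\fR,
\fImytranscriptome\fR) and hypothetical input files (\fIgenome.fa\fR,
\fItranscripts.fa\fR, \fIcdna.fa\fR), a genome database could be built,
used for alignment under the same name, and later extended with a
transcriptome without repeating the genome FASTA files:
.PP
.nf
$ gmap_build \-d mygenome genome.fa
$ gmap \-d mygenome cdna.fa
$ gmap_build \-d mygenome \-c mytranscriptome \-T transcripts.fa
.fi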
.SH OPTIONS
.TP
\fB\-D\fR, \fB\-\-dir\fR=\fI\,STRING\/\fR
Destination directory for installation (defaults to gmapdb
directory specified at configure time)
.TP
\fB\-d\fR, \fB\-\-genomedb\fR=\fI\,STRING\/\fR
Genome name (required)
.TP
\fB\-n\fR, \fB\-\-names\fR=\fI\,STRING\/\fR
Substitute names for contigs, provided in a file.
.IP
The file can have two formats:
.TP
1.
A file with one column per line, with each line
corresponding to a FASTA file, in the order given to
gmap_build.  The chromosome name for each FASTA file will
be replaced with the desired chromosome name in the file.
Every chromosome in the FASTA must have a corresponding line
in the file.  This is useful if you want to rename chromosomes
with a systematic numbering pattern.
.TP
2.
A file with two columns per line, separated by whitespace.
In each line, the original FASTA chromosome name
should be in column 1 and the desired chromosome name
in column 2.
.IP
The meaning of file format 2 depends on whether
\fB\-\-limit\-to\-names\fR is specified.  If so, the genome build will
be limited to those chromosomes in this file.  Otherwise,
all chromosomes in the FASTA file will be included,
but only those chromosomes in this file will be re\-named, which
provides an easy way to change just a few chromosome names.
.IP
This file can be combined with the \fB\-\-sort\fR=\fI\,names\/\fR option, in
which case the chromosomes are ordered as given in the file.  In
this case, every chromosome must be listed in the file, and
for chromosome names that should not be changed, column 2 can
be blank (or the same as column 1).  The option of a blank
column 2 is allowed only when specifying \fB\-\-sort\fR=\fI\,names\/\fR,
because otherwise, the program cannot distinguish between a
1\-column and 2\-column names file.
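.IP
As an illustration of format 2, assuming a hypothetical names file
\fInames.txt\fR and hypothetical original chromosome names, the following
would rename only the two listed chromosomes and keep all others unchanged:
.IP
.nf
$ cat names.txt
scaffold_1 chr1
scaffold_2 chr2
$ gmap_build \-d mygenome \-n names.txt genome.fa
.fi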
.TP
\fB\-L\fR, \fB\-\-limit\-to\-names\fR
Determines whether to limit the genome build to the lines listed
in the \fB\-\-names\fR file.  You can limit a genome build to certain
chromosomes with this option, plus a \fB\-\-names\fR file that either
renames chromosomes, or lists the same names in both columns for
the desired chromosomes.
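.IP
For example, assuming a hypothetical two\-column names file \fIkeep.txt\fR
that lists the same name in both columns, a build restricted to chr1 and
chrM without renaming might look like this:
.IP
.nf
$ cat keep.txt
chr1 chr1
chrM chrM
$ gmap_build \-d mygenome \-L \-n keep.txt genome.fa
.fi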
.TP
\fB\-k\fR, \fB\-\-kmer\fR=\fI\,INT\/\fR
k\-mer value for genomic index (allowed: 15 or less, default is 15)
.TP
\fB\-q\fR INT
Sampling interval for genome (allowed: 1\-3, default 3)
.TP
\fB\-s\fR, \fB\-\-sort\fR=\fI\,STRING\/\fR
Sort chromosomes using given method:
.br
none \- use chromosomes as found in FASTA file(s) (default)
.br
alpha \- sort chromosomes alphabetically (chr10 before chr2)
.br
numeric\-alpha \- chr1, chr1U, chr2, chrM, chrU, chrX, chrY
.br
chrom \- chr1, chr2, chrM, chrX, chrY, chr1U, chrU
.br
names \- sort chromosomes based on file provided to \fB\-\-names\fR flag
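.IP
A sketch of the names method, assuming a hypothetical file \fIorder.txt\fR
that lists every chromosome in the desired order.  Per the \fB\-\-names\fR
description, column 2 may be left blank for names that stay unchanged:
.IP
.nf
$ cat order.txt
chrM
chr1
chr2
$ gmap_build \-d mygenome \-\-sort=names \-n order.txt genome.fa
.fi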
.TP
\fB\-g\fR, \fB\-\-gunzip\fR
Treat input files as gzipped and gunzip each one before reading
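.IP
For instance, assuming hypothetical gzipped inputs \fIchr1.fa.gz\fR and
\fIchr2.fa.gz\fR:
.IP
.nf
$ gmap_build \-d mygenome \-g chr1.fa.gz chr2.fa.gz
.fi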
.TP
\fB\-E\fR, \fB\-\-fasta\-pipe\fR=\fI\,STRING\/\fR
Interpret argument as a command, instead of a list of FASTA files
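.IP
A hedged sketch, assuming the command is passed as the option value and
writes FASTA to standard output (the file name is hypothetical):
.IP
.nf
$ gmap_build \-d mygenome \-E "bzip2 \-dc genome.fa.bz2"
.fi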
.TP
\fB\-Q\fR, \fB\-\-fastq\fR
Files are in FASTQ format
.TP
\fB\-R\fR, \fB\-\-revcomp\fR
Reverse complement all contigs
.TP
\fB\-w\fR INT
Wait (sleep) this many seconds after each step (default 2)
.TP
\fB\-o\fR, \fB\-\-circular\fR=\fI\,STRING\/\fR
Circular chromosomes (either a comma\-separated list of chromosomes
or a filename containing circular chromosomes,
one per line).  If you use the \fB\-\-names\fR feature, then you
should use the substitute name of the chromosome, not the
original name, for this option.  (NOTE: This behavior is different
from previous versions, and starts with version 2020\-10\-20.)
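.IP
For example, assuming a hypothetical \fB\-\-names\fR file \fInames.txt\fR
that renames the mitochondrial contig to chrM, the substitute name is the
one to pass here:
.IP
.nf
$ gmap_build \-d mygenome \-n names.txt \-\-circular=chrM genome.fa
.fi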
.TP
\fB\-2\fR, \fB\-\-altscaffold\fR=\fI\,STRING\/\fR
File with alt scaffold info, listing alternate scaffolds,
one per line, tab\-delimited, with the following fields:
(1) alt_scaf_acc, (2) parent_name, (3) orientation,
(4) alt_scaf_start, (5) alt_scaf_stop, (6) parent_start, (7) parent_end.
.TP
\fB\-e\fR, \fB\-\-nmessages\fR=\fI\,INT\/\fR
Maximum number of messages (warnings, contig reports) to report (default 50)
.SS "Options for older genome formats:"
.TP
\fB\-M\fR, \fB\-\-mdflag\fR=\fI\,STRING\/\fR
Use MD file from NCBI for mapping contigs to
chromosomal coordinates
.TP
\fB\-C\fR, \fB\-\-contigs\-are\-mapped\fR
Find a chromosomal region in each FASTA header line.
Useful for contigs that have been mapped
to chromosomal coordinates.  Ignored if the \fB\-\-mdflag\fR is provided.
.SS "Options for transcriptome-guided alignment:"
.TP
\fB\-c\fR, \fB\-\-transcriptomedb\fR=\fI\,STRING\/\fR
Transcriptome name
.TP
\fB\-T\fR, \fB\-\-transcripts\fR=\fI\,FILE\/\fR
FASTA file containing transcripts (required if specifying
\fB\-\-transcriptomedb\fR)
.TP
\fB\-t\fR, \fB\-\-nthreads\fR=\fI\,INT\/\fR
Number of threads for GMAP alignment of transcripts to genome
(default 8)
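.IP
As a sketch with hypothetical names and files, a genome and transcriptome
could be built in one step, using 4 threads for the transcript alignment:
.IP
.nf
$ gmap_build \-d mygenome \-c mytranscriptome \-T transcripts.fa \-t 4 genome.fa
.fi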
.PP
Other tools of the GMAP suite are located in /usr/lib/gmap.