.\" Man page generated from reStructuredText.
.
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "MIRTOP" "1" "Feb 04, 2024" "0.3" "mirtop"
.SH NAME
mirtop \- mirtop Documentation
.SH LOGO COMPETITION
.sp
The logo competition (deadline 07/07/2018) is closed.
.sp
The selected logo is available at \fI\%https://github.com/miRTop/mirtop/tree/master/artwork\fP
.SH INSTALLATION
.sp
.SS bioconda
.sp
.EX
conda install mirtop \-c bioconda
.EE
.sp
.SS pypi
.sp
.EX
pip install mirtop
.EE
.sp
.SS Update to the develop version from pip
.sp
.EX
pip install \-\-upgrade \-\-no\-deps git+https://github.com/miRTop/mirtop.git#egg=mirtop
.EE
.sp
.SS Install the develop version
.sp
The best solution is to install conda to get an independent environment:
.sp
.EX
wget http://repo.continuum.io/miniconda/Miniconda\-latest\-Linux\-x86_64.sh
bash Miniconda\-latest\-Linux\-x86_64.sh \-b \-p ~/mirtop_env
export PATH=~/mirtop_env/bin:$PATH
conda install \-c bioconda bedtools samtools pip nose pysam pandas python\-dateutil pyyaml pybedtools biopython setuptools
git clone http://github.com/miRTop/mirtop
cd mirtop
git fetch origin dev
git checkout dev
python setup.py develop
.EE
.SH QUICK START
.sp
.SS Importer
.sp
\fBFrom BAM files to GFF3\fP
.sp
.EX
git clone https://github.com/miRTop/mirtop
cd mirtop/data
.EE
.sp
You can use the example data, in which the reads have been mapped to the precursor sequences:
.sp
.EX
mirtop gff \-\-sps hsa \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 \-o test_out sim_isomir.bam
.EE
.sp
\fBFrom seqbuster::miraligner files to GFF3\fP
.sp
miRNA annotation generated by the miraligner tool (\fI\%https://github.com/lpantano/seqbuster\fP):
.sp
.EX
mirtop gff \-\-format seqbuster \-\-sps hsa \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 \-o test_out examples/seqbuster/reads.mirna
.EE
.sp
\fBFrom sRNAbench files to GFF3\fP
.sp
miRNA annotation generated by the sRNAbench tool (\fI\%http://bioinfo2.ugr.es:8080/ceUGR/srnabench/\fP):
.sp
.EX
mirtop gff \-\-format srnabench \-\-sps hsa \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 \-o test_out examples/srnabench
.EE
.sp
\fBFrom PROST! files to GFF3\fP
.sp
miRNA annotation generated by the PROST! tool. Export the isomiRs tab from the Excel file to a tab\-delimited text file, then run:
.sp
.EX
mirtop gff \-\-format prost \-\-sps hsa \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 \-o test_out examples/prost/prost.example.txt
.EE
.sp
\fBFrom isomiR\-SEA files to GFF3\fP
.sp
miRNA annotation generated by the isomiR\-SEA tool:
.sp
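.EX
mirtop gff \-\-format isomirsea \-\-sps hsa \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 \-o test_out examples/isomir\-sea/tagMir\-all.gff
.EE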
.sp
.SS Operations
.sp
\fBValidator\fP
.sp
To validate your mirGFF3 file and make sure it follows the current format:
.sp
.EX
mirtop validate examples/gff/correct_file.gff
.EE
.sp
\fBGet statistics from GFF\fP
.sp
Get the number of isomiRs and miRNAs annotated in the GFF file, by isomiR category:
.sp
.EX
cd mirtop/data
mirtop stats \-o test_out examples/gff/correct_file.gff
.EE
.sp
\fBCompare GFF files with a reference\fP
.sp
Compare the sequences from two or more GFF files. The first one will be used as the reference data.
.sp
.EX
cd mirtop/data
mirtop compare \-o test_out examples/gff/correct_file.gff examples/gff/alternative.gff
.EE
.sp
\fBUpdate mirGFF3 files\fP
.sp
Update files written in older mirGFF3 versions to the current version:
.sp
.EX
cd mirtop/data
mirtop update \-o test_out_mirs examples/versions/version1.0.gff
.EE
.sp
.SS Export
.sp
\fBExport file to isomiRs format\fP
.sp
To be compatible with the isomiRs Bioconductor package (\fI\%https://bioconductor.org/packages/release/bioc/html/isomiRs.html\fP), use:
.sp
.EX
cd mirtop/data
mirtop export \-o test_out_mirs \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 examples/gff/correct_file.gff
.EE
.sp
\fBExport file to FASTA format\fP
.sp
.EX
cd mirtop/data
mirtop export \-o test_out_mirs \-\-format fasta \-d \-vd \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 examples/gff/correct_file.gff
.EE
.sp
\fBExport file to VCF format\fP
.sp
.EX
cd mirtop/data
mirtop export \-o test_out_mirs \-\-format vcf \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 examples/gff/correct_file.gff
.EE
.sp
\fBGet count file\fP
.sp
This file is useful for loading into R as a matrix. It contains minimal information about each sequence and one column of counts per sample:
.sp
.EX
cd mirtop/data
mirtop counts \-o test_out_mirs \-\-hairpin examples/annotate/hairpin.fa \-\-gtf examples/annotate/hsa.gff3 examples/synthetic/let7a\-5p.gtf
.EE
.SH OUTPUT
.sp
.SS GFF command
.sp
The \fImirtop gff\fP command generates a GFF3\-based format (mirGFF3) to capture miRNA variations. The output is explained at \fI\%https://github.com/miRTop/incubator/blob/master/format/definition.md\fP\&.
.sp
.SS Stats command
.sp
The \fImirtop stats\fP command generates a table with the following statistics for each isomiR type:
.INDENT 0.0
.IP \(bu 2
total counts
.IP \(bu 2
average counts
.IP \(bu 2
total sequences
.UNINDENT
.sp
It also generates a JSON file with the same information, which can be easily integrated with QC tools such as MultiQC (\fI\%https://multiqc.info/\fP).
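.sp
The three statistics listed above can be illustrated with a minimal Python sketch. It is not mirtop\(aqs implementation: the helper name \fIisomir_stats\fP and its input (one \fI(isomir_type, count)\fP pair per sequence and isomiR type) are hypothetical, the average is assumed to be taken per sequence, and the JSON printed here may not match the layout of the file written by \fImirtop stats\fP\&.
.sp
.EX
```python
import json
from collections import defaultdict


def isomir_stats(records):
    """Summarize counts per isomiR type (hypothetical helper, not mirtop's API).

    records: iterable of (isomir_type, count) pairs, one pair per sequence.
    """
    total_counts = defaultdict(int)
    total_sequences = defaultdict(int)
    for isomir_type, count in records:
        total_counts[isomir_type] += count
        total_sequences[isomir_type] += 1
    return {
        isomir_type: {
            "total_counts": total_counts[isomir_type],
            # assumption: average over the sequences of this isomiR type
            "average_counts": total_counts[isomir_type] / total_sequences[isomir_type],
            "total_sequences": total_sequences[isomir_type],
        }
        for isomir_type in total_counts
    }


stats = isomir_stats([("iso_5p", 10), ("iso_5p", 20), ("iso_snp", 5)])
print(json.dumps(stats, indent=2, sort_keys=True))
```
.EE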
.sp
.SS Compare command
.sp
The \fImirtop compare\fP command generates a tabular file describing the differences and similarities between the files. The first file on the command line is the reference, and each following file is compared to it. Each output line has the following information for each file:
.INDENT 0.0
.IP \(bu 2
sample
.IP \(bu 2
idu
.IP \(bu 2
seq
.IP \(bu 2
tag: \fIE\fP if not in reference, \fID\fP detected in both, \fIM\fP missing in target file
.IP \(bu 2
same_mirna: whether the sequence maps to the same miRNA in the reference and the target file
.IP \(bu 2
one column for each isomiR type with the following tags: \fIFP\fP (variation not in reference), \fITP\fP (variation in both), \fIFN\fP (variation not in target file)
.UNINDENT
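.sp
The tagging rules above can be expressed as a small Python sketch. It illustrates the documented semantics only and is not mirtop\(aqs code: the function name \fIcompare_to_reference\fP and the input shape (a dict mapping each sequence to a dict of the isomiR types found for it) are hypothetical, and this sketch counts a type as \fITP\fP whenever both files report it, without checking that the values agree.
.sp
.EX
```python
def compare_to_reference(reference, target):
    """Tag each sequence and isomiR type of target against reference.

    Hypothetical input shape: reference and target map a read sequence
    to a dict {isomir_type: value}; a type absent from the inner dict
    means no variation of that type.
    """
    rows = {}
    for seq in set(reference) | set(target):
        if seq not in reference:
            tag = "E"  # not in the reference
        elif seq not in target:
            tag = "M"  # missing in the target file
        else:
            tag = "D"  # detected in both
        ref_types = reference.get(seq, {})
        target_types = target.get(seq, {})
        per_type = {}
        for isomir_type in set(ref_types) | set(target_types):
            if isomir_type in ref_types and isomir_type in target_types:
                per_type[isomir_type] = "TP"  # variation in both
            elif isomir_type in target_types:
                per_type[isomir_type] = "FP"  # variation not in reference
            else:
                per_type[isomir_type] = "FN"  # variation not in target file
        rows[seq] = (tag, per_type)
    return rows
```
.EE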
.sp
.SS Counts command
.sp
The \fImirtop counts\fP command generates a tabular file with the following columns:
.INDENT 0.0
.IP \(bu 2
unique identifier
.IP \(bu 2
read sequence
.IP \(bu 2
miRNA name
.IP \(bu 2
Variant attribute from the GFF3 attribute column
.IP \(bu 2
One column for each isomiR type showing the exact variation
.IP \(bu 2
One column for each sample with the counts for that sequence
.UNINDENT
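.sp
Although the counts file is meant to be loaded into R, the same table can be read as a matrix in Python. This is a hypothetical sketch, not part of mirtop: it assumes a tab\-separated file whose first column is the unique identifier, that the caller knows the sample column names (for example from the GFF header), and that counts are integers.
.sp
.EX
```python
import csv


def read_count_matrix(lines, samples):
    """Return {identifier: [count for each sample]} from a counts table.

    Hypothetical helper. lines: an open file or any iterable of
    tab-separated text lines. samples: sample column names, in order.
    """
    reader = csv.DictReader(lines, dialect=csv.excel_tab)
    id_column = reader.fieldnames[0]  # assumption: first column is the ID
    return {row[id_column]: [int(row[name]) for name in samples] for row in reader}


# Usage with a real file (counts_path is the table written by mirtop counts):
# with open(counts_path, newline="") as handle:
#     matrix = read_count_matrix(handle, ["sample1", "sample2"])
```
.EE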
.sp
.SS Export command
.sp
The \fImirtop export\fP command generates the following file types from a mirGFF3 file:
.INDENT 0.0
.IP \(bu 2
isomiRs\-compatible files (\fI\%https://bioconductor.org/packages/release/bioc/html/isomiRs.html\fP)
.IP \(bu 2
FASTA files (\fI\%https://en.wikipedia.org/wiki/FASTA_format\fP)
.IP \(bu 2
VCF files (\fI\%https://samtools.github.io/hts\-specs/VCFv4.2.pdf\fP)
.UNINDENT
.SH STRUCTURE OF THE CODE
.INDENT 0.0
.IP \(bu 2
mirtop/bam
.INDENT 2.0
.IP \(bu 2
bam.py
.INDENT 2.0
.IP \(bu 2
\fIread_bam\fP: reads BAM files with pysam and stores the alignments in a key\-value object
.UNINDENT
.IP \(bu 2
filter.py
.INDENT 2.0
.IP \(bu 2
\fItune\fP: if the \fI\-\-clean\fP option is on, filters according to generic rules
.IP \(bu 2
\fIclean_hits\fP: gets the top hits
.UNINDENT
.UNINDENT
.IP \(bu 2
mirtop/gff
.INDENT 2.0
.IP \(bu 2
__init__.py: wraps the conversion process to GFF3
.IP \(bu 2
body.py
.INDENT 2.0
.IP \(bu 2
\fIcreate\fP: creates each line according to the established GFF format
.IP \(bu 2
\fIread_gff_line\fP: reads one line of the file (typically inside a for loop) and returns a key:value dictionary with one entry per column
.UNINDENT
.IP \(bu 2
header.py: generates the header and reads the header section
.IP \(bu 2
check.py: checks that the header and single lines are valid according to the GFF format (NOT IMPLEMENTED)
.IP \(bu 2
stats.py: GFF stats, counting the number of isomiRs and their total and average expression
.IP \(bu 2
query.py: accepts SQLite queries after option \fI\-q \(dq\(dq\fP
.IP \(bu 2
convert.py
.INDENT 2.0
.IP \(bu 2
\fIcreate_counts\fP: table of counts
.IP \(bu 2
allows filtering by attribute
.IP \(bu 2
allows collapsing by miRNA/isomiR type
.UNINDENT
.IP \(bu 2
filter.py: parses from query (NOT IMPLEMENTED)
.UNINDENT
.IP \(bu 2
mirtop/mirna
.INDENT 2.0
.IP \(bu 2
fasta.py
.INDENT 2.0
.IP \(bu 2
\fIread_precursor\fP: reads the precursor FASTA file into a key\-value object
.UNINDENT
.IP \(bu 2
realign.py
.INDENT 2.0
.IP \(bu 2
\fIhits\fP: class that defines hits
.IP \(bu 2
\fIisomir\fP: class that defines each sequence
.IP \(bu 2
\fIcigar_correction\fP: uses the CIGAR to build the sequence\-to\-miRNA alignment
.IP \(bu 2
\fIread_id\fP and \fImake_id\fP: shorter IDs for sequences
.IP \(bu 2
\fImake_cigar\fP: given an alignment, returns its CIGAR
.IP \(bu 2
\fIreverse_complement\fP: returns the reverse complement of a sequence
.IP \(bu 2
\fIalign\fP: uses biopython to align two sequences of the same size
.IP \(bu 2
\fIexpand_cigar\fP: expands a CIGAR, e.g. from 12M to MMMMMMMMMMMM
.IP \(bu 2
\fIcigar2snp\fP: converts a CIGAR code into a list of changes with position, reference and target nucleotides
.UNINDENT
.IP \(bu 2
mapper.py
.INDENT 2.0
.IP \(bu 2
\fIread_gtf\fP: maps genomic miRNA positions to precursor positions; it needs the genomic positions of both the miRNA and the precursor. Returns a dict like {mirna: [start, end]}
.UNINDENT
.IP \(bu 2
annotate.py
.INDENT 2.0
.IP \(bu 2
\fIannotate\fP: reads isomiRs and populates all isomiR\-related attributes
.UNINDENT
.UNINDENT
.IP \(bu 2
mirtop/importer
.INDENT 2.0
.IP \(bu 2
seqbuster.py
.IP \(bu 2
prost.py
.IP \(bu 2
srnabench.py
.IP \(bu 2
isomirsea.py
.UNINDENT
.IP \(bu 2
mirtop/exporter
.INDENT 2.0
.IP \(bu 2
isomirs.py: exports files matching the isomiRs BioC package (\fI\%https://github.com/lpantano/isomiRs\fP)
.UNINDENT
.IP \(bu 2
data/examples/
.INDENT 2.0
.IP \(bu 2
check GFF files: examples of correct, invalid and warning GFF files
.IP \(bu 2
check BAM file
.IP \(bu 2
check mapping from genome position to precursor position, with examples on the + and \- strands, using \fImirtop/mirna/mapper.read_gtf\fP
.IP \(bu 2
check clean option: sequences mapping to multiple precursors/miRNAs, keeping the best score, using \fImirtop/bam/filter.clean_hits\fP
.UNINDENT
.UNINDENT
.sp
To add new sub\-commands, modify the following:
.INDENT 0.0
.IP \(bu 2
mirtop/libs/parse.py
.INDENT 2.0
.IP \(bu 2
query: TODO
.IP \(bu 2
transform: TODO
.IP \(bu 2
create: TODO
.IP \(bu 2
check: TODO
.UNINDENT
.UNINDENT
.SH EXAMPLES OF CONTRIBUTIONS
.sp
.SS How to add a new sub\-command
.sp
\fBYou first need to clone and install the tool in develop mode (see the instructions to install the develop version above).\fP
.sp
Suppose you want to add a new operation to \fImirtop\fP that works with mirGFF3 files, similar to the \fIstats\fP command. In this example, a \fItest\fP function just reads the file and prints \fIHello GFF3\fP\&.
.INDENT 0.0
.IP \(bu 2
Create the folder \fImirtop/test\fP\&. Then create two empty files in it, named:
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fItest.py\fP
.IP \(bu 2
\fI__init__.py\fP
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Modify the \fItest.py\fP file with this content:
.UNINDENT
.sp
.EX
from mirtop.gff.body import read_gff_line

import mirtop.libs.logger as mylog
logger = mylog.getLogger(__name__)


def test(args):
    for fn in args.files:
        _test(fn)
        logger.info(\(dqHello GFF3: %s\(dq % fn)


def _test(fn):
    logger.debug(\(dqI am going to read this file: %s\(dq % fn)
    # read the GFF file line by line
    with open(fn) as handle:
        for line in handle:
            if line.startswith(\(dq#\(dq):
                continue  # skip header lines
            read_gff_line(line)
.EE
.INDENT 0.0
.IP \(bu 2
Choose a sub\-command name, in this case: \fItest\fP\&.
.IP \(bu 2
Add the argument\-parsing function at the end of \fI\%https://github.com/miRTop/mirtop/blob/dev/mirtop/libs/parse.py\fP, following the naming convention \fIadd_subparser_test\fP\&.
.UNINDENT
.sp
.EX
def add_subparser_test(subparsers):
    parser = subparsers.add_parser(\(dqtest\(dq, help=\(dqtest function\(dq)
    parser.add_argument(\(dqfiles\(dq, nargs=\(dq*\(dq, help=\(dqGFF/GTF files.\(dq)
    parser = _add_debug_option(parser)
    return parser
.EE
.INDENT 0.0
.IP \(bu 2
Add the new function to the \fIsub_cmds\fP dictionary in the \fIparse_cl\fP function:
.UNINDENT
.sp
.EX
sub_cmds = {\(dqgff\(dq: add_subparser_gff,
            \(dqstats\(dq: add_subparser_stats,
            \(dqcompare\(dq: add_subparser_compare,
            \(dqtarget\(dq: add_subparser_target,
            \(dqsimulator\(dq: add_subparser_simulator,
            \(dqcounts\(dq: add_subparser_counts,
            \(dqexport\(dq: add_subparser_export,
            \(dqtest\(dq: add_subparser_test
            }
.EE
.INDENT 0.0
.IP \(bu 2
To dispatch the new sub\-command name to its function, add another \fIelif\fP branch to the \fIcommand_line.py\fP file:
.UNINDENT
.sp
.EX
elif \(dqtest\(dq in kwargs:
    logger.info(\(dqRun test.\(dq)
    test(kwargs[\(dqargs\(dq])
.EE
.INDENT 0.0
.IP \(bu 2
Import the function linked to the new operation at the beginning of \fIcommand_line.py\fP\&. Assuming the \fItest\fP function is in \fImirtop/test/test.py\fP:
.UNINDENT
.sp
.EX
from mirtop.test.test import test
.EE
.sp
Try the new operation:
.sp
.EX
mirtop test data/examples/gff/correct_file.gff
.EE
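.sp
The steps above wire three pieces together: an \fIadd_subparser_test\fP function, the \fIsub_cmds\fP dictionary in \fIparse_cl\fP, and the dispatch in \fIcommand_line.py\fP\&. The self\-contained sketch below mimics that flow using only the standard \fIargparse\fP module. It is not mirtop\(aqs code: the layout of the returned \fIkwargs\fP dict (the chosen sub\-command name as a key next to \fI\(dqargs\(dq\fP) is an assumption inferred from the \fIelif \(dqtest\(dq in kwargs\fP check, and \fI_add_debug_option\fP is left out.
.sp
.EX
```python
import argparse


def add_subparser_test(subparsers):
    parser = subparsers.add_parser("test", help="test function")
    parser.add_argument("files", nargs="*", help="GFF/GTF files.")
    return parser


def parse_cl(in_args):
    parser = argparse.ArgumentParser(prog="mirtop")
    subparsers = parser.add_subparsers(dest="command")
    sub_cmds = {"test": add_subparser_test}
    for add_subparser in sub_cmds.values():
        add_subparser(subparsers)
    args = parser.parse_args(in_args)
    # Assumption (hypothetical layout): expose the sub-command as a key.
    return {args.command: True, "args": args}


def main(in_args):
    kwargs = parse_cl(in_args)
    if "test" in kwargs:
        # Stand-in for test(kwargs["args"]): report each input file.
        return ["Hello GFF3: %s" % fn for fn in kwargs["args"].files]
    return []


print(main(["test", "a.gff", "b.gff"]))
```
.EE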
.sp
.SS Add a unit test
.sp
\fBFor the internal function\fP
.sp
Add this code at the end of \fItest/test_functions.py\fP, inside \fIclass FunctionsTest(unittest.TestCase):\fP
.sp
.EX
    @attr(fn_test=True)
    def test_function_test(self):
        from mirtop.test.test import _test
        _test(\(dqdata/examples/gff/correct_file.gff\(dq)
.EE
.sp
\fBFor the sub\-command\fP
.sp
Add this code at the end of \fItest/test_function.py\fP, inside \fIclass AutomatedAnalysisTest(unittest.TestCase):\fP
.sp
.EX
    @attr(cmd_test=True)
    def test_run_test_command(self):
        \(dq\(dq\(dqRun test analysis\(dq\(dq\(dq
        with make_workdir():
            clcode = [\(dqmirtop\(dq,
                      \(dqtest\(dq,
                      \(dq../../data/examples/gff/correct_file.gff\(dq]
            print(\(dq\(dq)
            print(\(dq \(dq.join(clcode))
            subprocess.check_call(clcode)
.EE
.sp
\fBRun the tests\fP
.sp
\fBnose is required: pip install nose\fP
.sp
Run the function tests from the top\-level folder of the repository:
.sp
.EX
\&./run_test.sh fn_test
.EE
.sp
Run the command tests from the top\-level folder of the repository:
.sp
.EX
\&./run_test.sh cmd_test
.EE
.SH DOCUMENTATION FOR THE CODE
.SS bam
.sp
Read bam files
.INDENT 0.0
.TP
.B mirtop.bam.bam.read_bam(bam_fn, args, clean=True)
Read bam file and perform realignment of hits
.INDENT 7.0
.TP
.B Args:
\fIbam_fn\fP: a BAM file with alignments to the precursor
.INDENT 7.0
.TP
.B \fIprecursors\fP: dict with keys being precursor names and values
being sequences. Come from mirtop.mirna.fasta.read_precursor().
.UNINDENT
.sp
\fIclean\fP: Use mirtop.bam.filter.clean_hits() to remove lower\-scoring hits.
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIreads (dict)\fP:
keys are read_id and values are \fImirtop.realign.hits\fP
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.bam.filter.clean_hits(reads)
Select only best matches from a list of hits from the same read.
.INDENT 7.0
.TP
.B Args:
\fIreads\fP: dictionary as:
.sp
.EX
>>> {\(aqread_id\(aq: mirtop.realign.hits, ...}
.EE
.UNINDENT
.sp
Returns:
.INDENT 7.0
.INDENT 3.5
\fIreads\fP: same as the input, but with the best hits only.
.UNINDENT
.UNINDENT
.UNINDENT
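.sp
The idea behind \fIclean_hits\fP (keep only the best\-scoring hits of each read) can be sketched in a few lines. The real function works on \fImirtop.realign.hits\fP objects; in this hypothetical sketch each hit is simplified to a \fI(precursor, score)\fP tuple, all tied best hits are kept, and reads without hits are dropped.
.sp
.EX
```python
def clean_hits_sketch(reads):
    """Keep only the best-scoring hits of each read (hypothetical sketch).

    reads: {read_id: [(precursor, score), ...]}, a simplified stand-in
    for the hits objects used by mirtop.
    """
    best = {}
    for read_id, hits in reads.items():
        if not hits:
            continue  # nothing to choose from
        top_score = max(score for _, score in hits)
        best[read_id] = [hit for hit in hits if hit[1] == top_score]
    return best
```
.EE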
.INDENT 0.0
.TP
.B mirtop.bam.filter.tune(seq, precursor, start, cigar)
Realign the sequence to find the nucleotide changes
at the 5\(aq and 3\(aq ends and the nucleotide variations.
.INDENT 7.0
.TP
.B Args:
\fIseq (str)\fP: sequence of the read.
.sp
\fIprecursor (str)\fP: sequence of the precursor.
.sp
\fIstart (int)\fP: start position of sequence on the precursor, +1.
.sp
\fIcigar (str)\fP: similar to SAM CIGAR attribute.
.UNINDENT
.sp
Returns:
.INDENT 7.0
.INDENT 3.5
\fIlist\fP with:
.INDENT 0.0
.INDENT 3.5
subs (list): substitutions
.sp
add (list): nt added to the end
.sp
cigar (str): updated cigar
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS exporter
.sp
Read GFF files and output isomiRs compatible format
.INDENT 0.0
.TP
.B mirtop.exporter.isomirs.convert(args)
Main function to convert from GFF3 to isomiRs Bioc Package.
.sp
Reads a GFF file and produces an output file containing expression counts.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments parsed from command line with
\fImirtop.libs.parse.add_subparser_counts()\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIfile (file)\fP: with columns like:
UID miRNA Variant Sample1 Sample2 ... Sample N
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Read GFF files and output FASTA format
.INDENT 0.0
.TP
.B mirtop.exporter.fasta.convert(args)
Main function to convert from GFF3 to FASTA format.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs\fP: supported options for this sub\-command.
See \fImirtop.libs.parse.add_subparser_export()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.exporter.vcf.cigar_2_key(cigar, readseq, refseq, pos, var5p, var3p, parent_ini_pos, parent_end_pos, hairpin)
.INDENT 7.0
.TP
.B Args:
\(aqcigar(str)\(aq: compressed CIGAR representation of the alignment; counts of 1 are omitted.
\(aqreadseq(str)\(aq: the read sequence
\(aqrefseq(str)\(aq: the reference sequence
\(aqpos(str)\(aq: the current start position
\(aqvar5p(int)\(aq: extra nucleotides not in the reference miRNA (5p strand)
\(aqvar3p(int)\(aq: extra nucleotides not in the reference miRNA (3p strand)
\(aqparent_ini_pos(int)\(aq: the start position of the parent miRNA
\(aqparent_end_pos(int)\(aq: the last position of the parent miRNA
\(aqhairpin(str)\(aq: the string of the hairpin for all the miRNA
.TP
.B Returns:
\(aqkey_pos(str list)\(aq: a list with the positions of the variants.
\(aqkey_var(str list)\(aq: a list with the variant keys found.
\(aqref(str)\(aq: reference base(s).
\(aqalt(str)\(aq: altered base(s).
.UNINDENT
.UNINDENT
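.sp
The compressed CIGAR described above (a count of 1 is omitted) and the \fIexpand_cigar\fP helper listed in the code structure can be illustrated with a minimal sketch. The function names below are hypothetical and this is not mirtop\(aqs implementation; it simply treats every letter as one alignment operation.
.sp
.EX
```python
from itertools import groupby


def expand_compressed_cigar(cigar):
    """Expand a compressed CIGAR, e.g. "12M" to "MMMMMMMMMMMM" (hypothetical).

    A letter with no count in front of it stands for a single operation.
    """
    expanded = []
    count = ""
    for char in cigar:
        if char.isdigit():
            count += char  # accumulate multi-digit counts such as "12"
        else:
            expanded.append(char * int(count or "1"))
            count = ""
    return "".join(expanded)


def compress_cigar(expanded):
    """Run-length encode an expanded CIGAR, omitting counts of 1 (hypothetical)."""
    pieces = []
    for op, group in groupby(expanded):
        run = len(list(group))
        pieces.append((str(run) if run > 1 else "") + op)
    return "".join(pieces)
```
.EE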
.INDENT 0.0
.TP
.B mirtop.exporter.vcf.convert(args)
Main function to convert from GFF3 to VCF.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs\fP: supported options for this sub\-command.
See \fImirtop.libs.parse.add_subparser_export()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.exporter.vcf.create_vcf(mirgff3, precursor, gtf, vcffile)
.INDENT 7.0
.TP
.B Args:
\(aqmirgff3(str)\(aq: File with mirGFF3 format that will be converted
\(aqprecursor(str)\(aq: Fasta format sequences of all miRNA hairpins
\(aqgtf(str)\(aq: Genome coordinates
\(aqvcffile\(aq: name of the file to be saved
.TP
.B Returns:
Nothing is returned; instead, a VCF file is generated.
.UNINDENT
.UNINDENT
.SS gff
.sp
GFF reader and creator helpers
.INDENT 0.0
.TP
.B mirtop.gff.body.create(reads, database, sample, args, quiet=False)
See \fI\%https://github.com/miRTop/mirtop/issues/9\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.body.lift_to_genome(line, mapper)
.INDENT 7.0
.TP
.B Use the feature class defined in classgff.py
to map the precursor coordinates to genomic coordinates.
.TP
.B Args:
\fIline(str)\fP: string GFF line.
\fImapper(dict)\fP: dict with mirna\-precursor\-genomic coordinates from
.INDENT 7.0
.INDENT 3.5
mirna.mapper.read_gtf_to_mirna function.
.UNINDENT
.UNINDENT
.TP
.B Returns:
\fI(line)\fP: string with GFF line with updated chr, start, end, strand
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.body.paste_columns(line, sep=\(aq \(aq)
Create a GFF/GTF line from the dictionary returned by read_gff_line.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.body.read(fn, args)
Read GTF/GFF file and load into annotate, chrom counts, sample, line
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.body.read_gff_line(line)
Read GFF/GTF line and return dictionary with fields
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.body.read_variant(attrb, sep=\(aq \(aq)
Read string in variants attribute.
.INDENT 7.0
.TP
.B Args:
\fIattrb(str)\fP: string in Variant attribute.
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fI(gff_dict)\fP: dictionary with:
.sp
.EX
>>> {\(aqiso_3p\(aq: \-3, ...}
.EE
.UNINDENT
.UNINDENT
.UNINDENT
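.sp
Parsing a Variant string into a dictionary like the one shown above can be sketched in Python. The sketch assumes comma\-separated \fItype:value\fP entries (the layout shown in the \fIread_reference\fP documentation, e.g. \fIiso_snp:\-2\fP), converts integer values with \fIint\fP and keeps any other value as text. The function name is hypothetical, and the real \fIread_variant\fP (including its \fIsep\fP argument) may behave differently.
.sp
.EX
```python
def parse_variant_attribute(attrb):
    """Parse a Variant string such as "iso_5p:1,iso_3p:2" (hypothetical).

    Returns a dict like {"iso_5p": 1, "iso_3p": 2}.
    """
    variants = {}
    for item in attrb.split(","):
        if ":" not in item:
            continue  # entries without a value, e.g. "NA"
        isomir_type, value = item.strip().split(":", 1)
        try:
            variants[isomir_type] = int(value)
        except ValueError:
            variants[isomir_type] = value  # keep non-integer values as text
    return variants
```
.EE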
.INDENT 0.0
.TP
.B mirtop.gff.body.variant_with_nt(line, precursors, matures)
Return nucleotides changes for each variant type
using Variant attribute, precursor sequences and
mature position.
.UNINDENT
.sp
Compare multiple GFF files to a reference
.INDENT 0.0
.TP
.B mirtop.gff.compare.compare(args)
From a list of GFF files produce comparison with a reference set.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments parsed from command line with
\fImirtop.libs.parse.add_subparser_compare()\fP\&.
First file will be considered the reference set.
.UNINDENT
.TP
.B Returns:
\fI(out_file)\fP: comparison of the GFF files with the reference.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.compare.read_reference(fn)
Read GFF into UID:Variant
.INDENT 7.0
.TP
.B Args:
\fIfn (str)\fP: GFF file.
.TP
.B Returns:
\fIsrna (dict)\fP: dict with >>> {\(aqUID\(aq: \(aqiso_snp:\-2,...\(aq}
.UNINDENT
.UNINDENT
.sp
Helpers to define the header of the GFF file
.INDENT 0.0
.TP
.B mirtop.gff.header.create(samples, database, custom, filter=None)
Create header for GFF file.
.INDENT 7.0
.TP
.B Args:
\fIsamples (list)\fP: character list with names for samples
.sp
\fIdatabase (str)\fP: name of the database.
.sp
\fIcustom (str)\fP: extra lines.
.sp
\fIfilter (list)\fP: character list with filter definition.
.TP
.B Returns:
\fIheader (str)\fP: header string.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.header.read_samples(fn)
Read samples from the header of a GFF file.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: GFF file to read.
.TP
.B Returns:
\fI(list)\fP: character list with sample names.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.header.read_version(fn)
Extract mirGFF3 version
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.merge.merge(dts, samples)
For dictionary with sample as keys and values as lines
merge them into one GFF file.
.INDENT 7.0
.TP
.B Args:
\fIdts(dict)\fP: dictionary as >>> {\(aqfile\(aq: {\(aqmirna\(aq: {start: gff_list}}}.
gff_list has the format as defined in \fImirtop.gff.body.read()\fP\&.
.sp
\fIsamples(list)\fP: character list with sample names.
.TP
.B Returns:
\fImerged_lines (nested dicts)\fP: gff_list has the format as defined in \fImirtop.gff.body.read()\fP\&.
.UNINDENT
.UNINDENT
.sp
Produce stats from GFF3 format
.INDENT 0.0
.TP
.B mirtop.gff.stats.stats(args)
From a list of GFF files produce general isomiRs stats.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs (namedtuple)\fP: arguments parsed from command line with
\fImirtop.libs.parse.add_subparser_stats()\fP\&.
.UNINDENT
.TP
.B Returns:
\fI(stdout) or (out_file)\fP: GFF general stats.
.UNINDENT
.UNINDENT
.sp
Update gff3 files to newest version
.INDENT 0.0
.TP
.B mirtop.gff.update.convert(args)
Update previous GFF3 versions.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs (namedtuple)\fP: arguments parsed from command line with
\fImirtop.libs.parse.add_subparser_update()\fP\&.
.UNINDENT
.TP
.B Returns:
\fI(out_file)\fP: most updated GFF3 file.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.update.update_file(gff_file, new_gff_file)
Update a file from its mirGFF3 version to the current version.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.gff.validator.check_multiple(args)
Check GFF3 format.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIargs (namedtuple)\fP: arguments parsed from command line with
\fImirtop.libs.parse.add_subparser_validator()\fP\&.
.UNINDENT
.TP
.B Returns:
\fI(std_out)\fP: warnings or errors of the files showing issues with the format.
.UNINDENT
.UNINDENT
.SS importer
.sp
Read isomiR GFF files
.INDENT 0.0
.TP
.B mirtop.importer.isomirsea.cigar2variants(cigar, sequence, tag)
From cigar to Variants in GFF format
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.importer.isomirsea.header(fn)
Custom header for isomiR\-SEA importer.
.INDENT 7.0
.TP
.B Args:
\fIfn (str)\fP: file name with isomiR\-SEA GFF output
.TP
.B Returns:
\fI(str)\fP: isomiR\-SEA header string.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.importer.isomirsea.read_file(fn, args)
Read isomiR\-SEA file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with isomiR\-SEA output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIreads (nested dicts)\fP: gff_list has the format as
defined in \fImirtop.gff.body.read()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Read PROST! files
.INDENT 0.0
.TP
.B mirtop.importer.prost.header()
Custom header for PROST! importer.
.INDENT 7.0
.TP
.B Returns:
\fI(str)\fP: PROST! header string.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.importer.prost.read_file(fn, hairpins, database, mirna_gtf)
Read PROST! file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with PROST output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
\fIreads\fP: dictionary where keys are read_id and values are \fImirtop.realign.hits\fP
.UNINDENT
.UNINDENT
.sp
Read seqbuster files
.INDENT 0.0
.TP
.B mirtop.importer.seqbuster.header()
Custom header for seqbuster importer.
.INDENT 7.0
.TP
.B Returns:
\fI(str)\fP: seqbuster header string.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.importer.seqbuster.read_file(fn, args)
Read seqbuster file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with seqbuster output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
\fIreads\fP: dictionary where keys are read_id and values are \fImirtop.realign.hits\fP
.UNINDENT
.UNINDENT
.sp
Read sRNAbench files
.INDENT 0.0
.TP
.B mirtop.importer.srnabench.read_file(folder, args)
Read sRNAbench file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with sRNAbench output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIreads (nested dicts)\fP: gff_list has the format as
defined in \fImirtop.gff.body.read()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Read isomiR GFF files from optimir tool
.INDENT 0.0
.TP
.B mirtop.importer.optimir.read_file(fn, args)
Read OptimiR file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with OptimiR output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIreads (nested dicts)\fP: gff_list has the format as
defined in \fImirtop.gff.body.read()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Read Manatee files
.INDENT 0.0
.TP
.B mirtop.importer.manatee.read_file(fn, database, args)
Read Manatee file and convert to mirtop GFF format.
.INDENT 7.0
.TP
.B Args:
\fIfn(str)\fP: file name with Manatee output information.
.sp
\fIdatabase(str)\fP: database name.
.INDENT 7.0
.TP
.B \fIargs(namedtuple)\fP: arguments from command line.
See \fImirtop.libs.parse.add_subparser_gff()\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIreads (nested dicts)\fP: gff_list has the format as
defined in \fImirtop.gff.body.read()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.SS libs
.sp
Centralize running of external commands, providing logging and tracking. Integrated from bcbio package with some changes.
.INDENT 0.0
.TP
.B mirtop.libs.do.find_bash()
Find bash full path
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.do.find_cmd(cmd)
Find command in session
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.do.run(cmd, data=None, checks=None, region=None, log_error=True, log_stdout=False)
Run the provided command, logging details and checking for errors.
.UNINDENT
.sp
Helpers to work with fastq files
.INDENT 0.0
.TP
.B mirtop.libs.fastq.is_fastq(in_file)
.INDENT 7.0
.TP
.B Check whether a file is FASTQ, accepting
txt, fq and fastq extensions and understanding
gzip compression (.gzip and .gz)
(copied from bcbio)
.TP
.B Args:
\fIin_file(str)\fP: file name.
.TP
.B Returns:
\fI(boolean)\fP: True or False.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.fastq.open_fastq(in_file)
.INDENT 7.0
.TP
.B open a fastq file, using gzip if it is gzipped
(from bcbio package)
.TP
.B Args:
\fIin_file(str)\fP: file name.
.TP
.B Returns:
\fI(File)\fP: file handler.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.fastq.splitext_plus(fn)
.INDENT 7.0
.TP
.B Split on file extensions, allowing for zipped extensions.
(copied from bcbio)
.TP
.B Args:
\fIfn(str)\fP: file name.
.TP
.B Returns:
\fIbase, ext(str, str)\fP: basename and extension.
.UNINDENT
.UNINDENT
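.sp
The extension rules described for \fIis_fastq\fP, \fIopen_fastq\fP and \fIsplitext_plus\fP can be sketched with the standard library only. These are hypothetical re\-implementations for illustration (note the \fI_sketch\fP suffix), not the bcbio or mirtop code.
.sp
.EX
```python
import gzip
import os

ZIP_EXTS = (".gz", ".gzip")
FASTQ_EXTS = (".txt", ".fq", ".fastq")


def splitext_plus_sketch(fn):
    """Split a file name, keeping a compression suffix with its extension."""
    base, ext = os.path.splitext(fn)
    if ext in ZIP_EXTS:
        base, inner_ext = os.path.splitext(base)
        ext = inner_ext + ext  # e.g. ".fq" + ".gz"
    return base, ext


def is_fastq_sketch(in_file):
    """True if the (possibly gzipped) file has a FASTQ-like extension."""
    base, ext = os.path.splitext(in_file)
    if ext in ZIP_EXTS:
        ext = os.path.splitext(base)[1]  # look at the extension under .gz
    return ext in FASTQ_EXTS


def open_fastq_sketch(in_file):
    """Open a FASTQ file for reading as text, decompressing it if gzipped."""
    if in_file.endswith(ZIP_EXTS):
        return gzip.open(in_file, "rt")
    return open(in_file)
```
.EE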
.INDENT 0.0
.TP
.B mirtop.libs.parse.parse_cl(in_args)
Function to parse the subcommands arguments.
.UNINDENT
.sp
utils from \fI\%http://www.github.com/chapmanb/bcbio\-nextgen.git\fP
.INDENT 0.0
.TP
.B mirtop.libs.utils.chdir(p)
Temporarily change the directory using \fIwith\fP:
.sp
.EX
>>> with chdir(temporal):
        do_something()
.EE
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.utils.file_exists(fname)
Check if a file exists and is non\-empty.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.utils.safe_dirs(dirs)
Create folder if it does not exist
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.libs.utils.safe_remove(fn)
Remove file skipping
.UNINDENT
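.sp
A context manager like \fIchdir\fP and a check like \fIfile_exists\fP can be written with \fIcontextlib\fP and \fIos\fP, as in the sketch below. It follows the behavior described above but is a hypothetical re\-implementation, not the bcbio\-nextgen source.
.sp
.EX
```python
import contextlib
import os


@contextlib.contextmanager
def chdir(p):
    """Temporarily change the working directory to p (hypothetical sketch)."""
    cur_dir = os.getcwd()
    os.chdir(p)
    try:
        yield
    finally:
        os.chdir(cur_dir)  # always return to the original directory


def file_exists(fname):
    """Check whether a file exists and is non-empty (hypothetical sketch)."""
    return os.path.exists(fname) and os.path.getsize(fname) > 0
```
.EE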
.SS mirna
.sp
Read bam files
.INDENT 0.0
.TP
.B mirtop.mirna.annotate.annotate(reads, mature_ref, precursors, quiet=False)
Annotate isomiRs using coordinates, mismatches and realignment
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIreads(dicts of hits)\fP:
dict object that comes from \fImirtop.bam.bam.read_bam()\fP
.TP
.B \fImature_ref (dict of mirna positions)\fP:
dict object that comes from \fImirtop.mirna.read_mature()\fP
.TP
.B \fIprecursors dict object (key : fasta)\fP:
that comes from \fImirtop.mirna.fasta.read_precursor()\fP
.TP
.B \fIquiet(boolean)\fP:
verbosity state
.UNINDENT
.TP
.B Return:
.INDENT 7.0
.TP
.B \fIreads (dict)\fP:
dictionary where keys are read_id and
values are \fImirtop.realign.hits\fP
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Read precursor fasta file
.INDENT 0.0
.TP
.B mirtop.mirna.fasta.read_precursor(precursor, sps=None)
Load precursor file for that species
.INDENT 7.0
.TP
.B Args:
\fIprecursor(str)\fP: file name with fasta sequences
.INDENT 7.0
.TP
.B \fIsps(str)\fP: if any, select species to keep.
It\(aqll do a \fIheader_sequence.find(sps)\fP\&.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIhairpin(dict)\fP: keys are precursor names and
values are precursor sequences.
.UNINDENT
.UNINDENT
.UNINDENT
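.sp
A simplified sketch of the parsing step, assuming the precursor name is the first token of each FASTA header. Unlike \fIread_precursor\fP, the hypothetical \fIread_precursor_lines\fP takes an iterable of lines (for example an open file handle) rather than a file name, and it does no further processing of the sequences.
.sp
.EX
```python
def read_precursor_lines(lines, sps=None):
    # Hypothetical sketch: parse FASTA lines into {name: sequence}.
    hairpin = {}
    name = None
    keep = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first token after ">".
            name = line[1:].split()[0]
            # Mirror the documented header.find(sps) species filter.
            keep = sps is None or line.find(sps) > -1
            if keep:
                hairpin[name] = ""
        elif keep:
            # Multi-line records are concatenated.
            hairpin[name] += line
    return hairpin
```
.EE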
.sp
Read database information
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.get_primary_transcript(database)
Get the ID that identifies the primary transcript in the
GTF file with the miRNA and precursor coordinates, so that
BAM files with genomic coordinates can be parsed.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.guess_database(args)
Guess database name from GFF file.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
\fIdatabase(str)\fP: name of the database
.UNINDENT
.sp
TODO: make this generic for other databases.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.read_gtf_chr2mirna(gtf)
Load GTF file with precursor positions on genome.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIdb_mir(dict)\fP: dictionary with chromosomes as keys and
miRNAs with their genomic positions as values.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.read_gtf_to_mirna(gtf)
Load GTF file with precursor positions on genome.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fIdb_mir(dict)\fP: dictionary with miRNA names as keys and
genomic positions as values.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.read_gtf_to_precursor(gtf)
Load a GTF file with precursor positions on the genome and
return a dict with precursor names as keys and, as values,
a dict of mature miRNAs with their positions relative to
the precursor.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
\fImap_dict(dict)\fP:
.sp
.EX
>>> {\(aqparent\(aq: {mirna: [start, end]}}
.EE
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.read_gtf_to_precursor_mirbase(gtf, format=\(aqprecursor\(aq)
Load a GTF file with precursor positions on the genome and
return a dict with precursor names as keys and, as values,
a dict of mature miRNAs with their positions relative to
the precursor. For miRBase and similar GFF3 files.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
\fImap_dict(dict)\fP:
.sp
.EX
>>> {\(aqparent\(aq: {mirna: [start, end]}}
.EE
.UNINDENT
.UNINDENT
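.sp
A sketch of how such a map can be built from miRBase\-style GFF3 lines, where \fImiRNA\fP features point to their \fImiRNA_primary_transcript\fP through the \fIDerives_from\fP attribute. The helper names and the 0\-based relative\-position convention are assumptions, not necessarily what mirtop returns.
.sp
.EX
```python
TAB = chr(9)  # GFF3 columns are tab-separated


def parse_attrs(field):
    # "ID=x;Name=y" -> {"ID": "x", "Name": "y"}
    return dict(kv.split("=", 1) for kv in field.split(";") if "=" in kv)


def gff_to_precursor_sketch(lines):
    # Returns {precursor_name: {mirna_name: [start, end]}} (hypothetical).
    precursors = {}  # precursor ID -> (name, start, end, strand)
    matures = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip().split(TAB)
        feature, start, end, strand = cols[2], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attrs(cols[8])
        if feature == "miRNA_primary_transcript":
            precursors[attrs["ID"]] = (attrs["Name"], start, end, strand)
        elif feature == "miRNA":
            matures.append((attrs["Derives_from"], attrs["Name"], start, end))
    map_dict = {}
    for parent_id, mirna, start, end in matures:
        name, p_start, p_end, strand = precursors[parent_id]
        # Assumed convention: 0-based offsets from the precursor 5' end.
        if strand == "+":
            rel = [start - p_start, end - p_start]
        else:
            rel = [p_end - end, p_end - start]
        map_dict.setdefault(name, {})[mirna] = rel
    return map_dict
```
.EE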
.INDENT 0.0
.TP
.B mirtop.mirna.mapper.read_gtf_to_precursor_mirgenedb(gtf, format=\(aqprecursor\(aq)
Load a GTF file with precursor positions on the genome and
return a dict with precursor names as keys and, as values,
a dict of mature miRNAs with their positions relative to
the precursor. For MirGeneDB and similar GFF3 files.
.INDENT 7.0
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIgtf(str)\fP: file name with GFF miRNA genomic positions and
header lines.
.UNINDENT
.TP
.B Returns:
\fImap_dict(dict)\fP:
.sp
.EX
>>> {\(aqparent\(aq: {mirna: [start, end]}}
.EE
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.align(x, y, local=False)
Pairwise alignment between two sequences.
\fI\%https://medium.com/towards\-data\-science/pairwise\-sequence\-alignment\-using\-biopython\-d1a9d0ba861f\fP
.INDENT 7.0
.TP
.B Args:
\fIx(str)\fP: short sequence.
.sp
\fIy(str)\fP: long sequence.
.sp
\fIlocal(boolean)\fP: local or global alignment.
.TP
.B Returns:
\fIaligned_x(hit)\fP: alignment information, score and positions.
.UNINDENT
.UNINDENT
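.sp
The linked article uses Biopython. As a dependency\-free stand\-in, the sketch below computes only the best score with plain dynamic programming: Needleman\-Wunsch for global and Smith\-Waterman for local alignment. The scoring values and the name \fIalign_score\fP are hypothetical, and no traceback (aligned strings or positions) is returned.
.sp
.EX
```python
def align_score(x, y, local=False, match=1, mismatch=-1, gap=-1):
    # Best global (Needleman-Wunsch) or local (Smith-Waterman) score.
    rows, cols = len(x) + 1, len(y) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment: never drop below zero, track the best cell.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[rows - 1][cols - 1]
```
.EE
.sp
With these scores, aligning \fIACG\fP globally against \fIAACGT\fP costs two gaps, while the local score only counts the matching \fIACG\fP block.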
.INDENT 0.0
.TP
.B mirtop.mirna.realign.align_from_variants(sequence, mature, variants)
.INDENT 7.0
.TP
.B Given the read sequence,
the mature sequence from get_mature_sequence
and the Variant GFF annotation,
get a list of substitutions.
.TP
.B Args:
\fIsequence(str)\fP: read sequence.
.INDENT 7.0
.TP
.B \fImature(str)\fP: mature sequence from
\fImirtop.mirna.realign.get_mature_sequence()\fP\&.
.UNINDENT
.sp
\fIvariants(str)\fP: string from Variant attribute in GFF file.
.TP
.B Returns:
\fIsnp(list)\fP: [[pos, target, reference]]
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.cigar2snp(cigar, reference)
From a CIGAR string and a reference sequence,
detect mismatch positions and the reference and
target nucleotides.
.INDENT 7.0
.TP
.B Args:
\fIcigar(str)\fP: CIGAR string.
.sp
\fIreference(str)\fP: reference sequence.
.TP
.B Returns:
\fIsnp(list)\fP: position of mismatches (indels included) as:
.sp
.EX
>>> [pos, seq_nt, ref_nt]
.EE
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.cigar_correction(cigarLine, query, target)
Use the CIGAR string from a BAM file to define mismatches.
.INDENT 7.0
.TP
.B Args:
\fIcigarLine(str)\fP: CIGAR string from BAM file.
.sp
\fIquery(str)\fP: read sequence.
.sp
\fItarget(str)\fP: target sequence.
.TP
.B Returns:
\fI(list)\fP: [query_nts, target_nts]
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.expand_cigar(cigar)
Expand a short CIGAR string into its long version,
where each character corresponds to one nucleotide in the sequence.
.INDENT 7.0
.TP
.B Args:
\fIcigar(str)\fP: CIGAR string.
.sp
.EX
>>> 10MA3M
.EE
.TP
.B Returns:
\fIcigar_long(str)\fP: CIGAR long.
.sp
.EX
>>> MMMMMMMMMMAMMM
.EE
.UNINDENT
.UNINDENT
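.sp
A minimal sketch of the expansion, assuming that an operation written without a count (such as the mismatch \fIA\fP in \fI10MA3M\fP) covers one nucleotide; \fIexpand_cigar_sketch\fP is a hypothetical name.
.sp
.EX
```python
def expand_cigar_sketch(cigar):
    # Expand e.g. "10MA3M" into one character per nucleotide.
    out = []
    count = ""
    for ch in cigar:
        if ch.isdigit():
            # Accumulate multi-digit run lengths such as "10".
            count += ch
        else:
            # An operation without a number counts once.
            out.append(ch * int(count or 1))
            count = ""
    return "".join(out)
```
.EE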
.INDENT 0.0
.TP
.B mirtop.mirna.realign.get_mature_sequence(precursor, mature, exact=False, nt=5)
.INDENT 7.0
.TP
.B From the precursor sequence and mature positions,
get the mature sequence with +/\- \fInt\fP flanking nts.
.TP
.B Args:
\fIprecursor(str)\fP: long sequence.
.sp
\fImature(list)\fP: [start, end].
.sp
\fIexact(boolean)\fP: if True, do not add the flanking nts.
.sp
\fInt(int)\fP: number of flanking nts to add on each side.
.TP
.B Returns:
\fI(str)\fP: mature sequence.
.UNINDENT
.UNINDENT
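.sp
A sketch of the flank extraction, assuming \fImature\fP holds 0\-based inclusive positions and that flanks running past either end of the precursor are padded with \fI\-\fP characters; both conventions and the name \fIget_mature_sketch\fP are assumptions.
.sp
.EX
```python
def get_mature_sketch(precursor, mature, exact=False, nt=5):
    start, end = mature  # assumed 0-based, inclusive
    if exact:
        return precursor[start:end + 1]
    # Pad both ends so flanks never fall off the precursor.
    padded = "-" * nt + precursor + "-" * nt
    # In padded coordinates the mature starts at start + nt, so
    # start - nt + nt == start and end + nt + nt + 1 bound the slice.
    return padded[start:end + 1 + 2 * nt]
```
.EE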
.INDENT 0.0
.TP
.B class mirtop.mirna.realign.hits
Class with alignment information.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.is_sequence(seq)
Check whether the sequence is valid or not.
.INDENT 7.0
.TP
.B Args:
\fIseq(str)\fP: string acting as a sequence.
.TP
.B Returns:
\fIboolean\fP: whether or not it is a valid nucleotide sequence.
.UNINDENT
.UNINDENT
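.sp
A minimal sketch, assuming a valid sequence is non\-empty and uses only the letters A, C, G, T, U and N (case\-insensitive); the alphabet and the name \fIis_sequence_sketch\fP are assumptions.
.sp
.EX
```python
VALID_NTS = set("ACGTUN")  # assumed alphabet; adjust to your data


def is_sequence_sketch(seq):
    # Empty strings and anything with a non-nucleotide character are rejected.
    return bool(seq) and set(seq.upper()) <= VALID_NTS
```
.EE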
.INDENT 0.0
.TP
.B class mirtop.mirna.realign.isomir
Class to represent isomiRs information.
.INDENT 7.0
.TP
.B format(sep=\(aq\et\(aq)
Create tabular line from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B formatGFF()
Create Variant attribute.
.UNINDENT
.INDENT 7.0
.TP
.B format_id(sep=\(aq\et\(aq)
Create simple identifier from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B get_score(sc)
Get score from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B is_iso()
Define whether element is isomiR or not.
.UNINDENT
.INDENT 7.0
.TP
.B set_pos(start, l, strand=\(aq+\(aq)
Set end position
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.make_cigar(seq, mature)
Create a CIGAR string from the alignment
between a read and a reference sequence.
.INDENT 7.0
.TP
.B Args:
\fIseq(str)\fP: read sequence.
.sp
\fImature(str)\fP: short sequence.
.TP
.B Return:
\fIshort(str)\fP: CIGAR string.
.UNINDENT
.UNINDENT
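.sp
A sketch for two already aligned, equal\-length strings where \fI\-\fP marks a gap. The conventions are assumptions chosen to be compatible with the \fI10MA3M\fP example above: matches become \fIM\fP, a mismatch records the read nucleotide, a gap in the read is \fID\fP and a gap in the reference is \fII\fP; runs of \fIM\fP, \fII\fP and \fID\fP are prefixed with their length. \fImake_cigar_sketch\fP is a hypothetical name.
.sp
.EX
```python
def make_cigar_sketch(seq, mature):
    # Per-position operation for two equal-length aligned strings.
    ops = []
    for s, m in zip(seq, mature):
        if s == m:
            ops.append("M")
        elif s == "-":
            ops.append("D")  # assumed: gap in the read is a deletion
        elif m == "-":
            ops.append("I")  # assumed: gap in the reference is an insertion
        else:
            ops.append(s)  # assumed: mismatches record the read nucleotide
    # Compress runs of M/I/D into <count><op>; mismatch letters stay as-is.
    out = []
    i = 0
    while i < len(ops):
        j = i
        while j < len(ops) and ops[j] == ops[i]:
            j += 1
        if ops[i] in "MID":
            out.append("%d%s" % (j - i, ops[i]))
        else:
            out.append(ops[i] * (j - i))
        i = j
    return "".join(out)
```
.EE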
.INDENT 0.0
.TP
.B mirtop.mirna.realign.make_id(seq)
Create a unique identifier for the sequence from its nucleotides,
replacing each block of 5 nts with a unique code.
.sp
It uses the code from \fImirtop.mirna.keys()\fP\&.
.sp
Inspired by MINTplate: \fI\%https://cm.jefferson.edu/MINTbase\fP
\fI\%https://github.com/TJU\-CMC\-Org/MINTmap/tree/master/MINTplates\fP
.INDENT 7.0
.TP
.B Args:
\fIseq(str)\fP: nucleotides sequences.
.TP
.B Returns:
\fIidName(str)\fP: unique identifier for the sequence.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.read_id(idu)
Read a unique identifier for the sequence and
convert it back to nucleotides,
replacing each unique code with its 5 nts.
.sp
It uses the code from \fImirtop.mirna.keys()\fP\&.
.sp
Inspired by MINTplate: \fI\%https://cm.jefferson.edu/MINTbase\fP
\fI\%https://github.com/TJU\-CMC\-Org/MINTmap/tree/master/MINTplates\fP
.INDENT 7.0
.TP
.B Args:
\fIidu(str)\fP: unique identifier for the sequence.
.TP
.B Returns:
\fIseq(str)\fP: nucleotides sequences.
.UNINDENT
.UNINDENT
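.sp
The round trip between a sequence and its identifier can be sketched as follows. The code table from \fImirtop.mirna.keys()\fP is not reproduced here; this sketch swaps in a hypothetical scheme that writes each 5\-nt block as a base\-4 number in three hex digits and stores the sequence length in an \fIiso\-<length>\-<code>\fP layout, which is also an assumption.
.sp
.EX
```python
NT = "ACGT"


def make_id_sketch(seq):
    codes = []
    for i in range(0, len(seq), 5):
        # Read the 5-nt block (or the shorter last block) as a base-4 number.
        value = 0
        for nt in seq[i:i + 5]:
            value = value * 4 + NT.index(nt)
        codes.append(format(value, "03x"))  # 4**5 - 1 fits in 3 hex digits
    return "iso-%d-%s" % (len(seq), "".join(codes))


def read_id_sketch(idu):
    _, length, code = idu.split("-")
    length = int(length)
    seq = []
    for j, i in enumerate(range(0, length, 5)):
        # The stored length tells how many nts the last block holds.
        size = min(5, length - i)
        value = int(code[3 * j:3 * j + 3], 16)
        block = []
        for _ in range(size):
            block.append(NT[value % 4])
            value //= 4
        seq.append("".join(reversed(block)))
    return "".join(seq)
```
.EE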
.INDENT 0.0
.TP
.B mirtop.mirna.realign.reverse_complement(seq)
Get the reverse complement of a sequence.
.INDENT 7.0
.TP
.B Args:
\fIseq(str)\fP: sequence.
.sp
.EX
>>> GCAT
.EE
.TP
.B Returns:
\fI(str)\fP: reverse complement sequence:
.sp
.EX
>>> ATGC
.EE
.UNINDENT
.UNINDENT
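.sp
A minimal sketch using \fIstr.maketrans\fP, assuming DNA input and mapping N to itself; \fIreverse_complement_sketch\fP is a hypothetical name.
.sp
.EX
```python
# Complement table; N maps to itself (assumption for ambiguous bases).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(seq):
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]
```
.EE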
.INDENT 0.0
.TP
.B mirtop.mirna.realign.variant_to_3p(hairpin, pos, variant)
.INDENT 7.0
.TP
.B From a sequence and a start position, get the nts
+/\- indicated by iso_3p. \fIpos\fP is a 0\-based index.
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIhairpin(str)\fP: long sequence:
.sp
.EX
>>> AAATTTT
.EE
.UNINDENT
.sp
\fIpos(int)\fP: 0\-based position, for example 3.
.INDENT 7.0
.TP
.B \fIvariant(int)\fP: number of nts involved in the variant:
.sp
.EX
>>> \-1
.EE
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fI(str)\fP: nucleotide involved in the variant:
.sp
.EX
>>> A
.EE
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.variant_to_5p(hairpin, pos, variant)
.INDENT 7.0
.TP
.B From a sequence and a start position, get the nts
+/\- indicated by iso_5p. \fIpos\fP is a 0\-based index.
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIhairpin(str)\fP: long sequence:
.sp
.EX
>>> AAATTTT
.EE
.UNINDENT
.sp
\fIpos(int)\fP: 0\-based position, for example 3.
.INDENT 7.0
.TP
.B \fIvariant(int)\fP: number of nts involved in the variant:
.sp
.EX
>>> \-1
.EE
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fI(str)\fP: nucleotide involved in the variant:
.sp
.EX
>>> T
.EE
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.realign.variant_to_add(read, variant)
.INDENT 7.0
.TP
.B From a read sequence, get the nts added at the 3\(aq end
as indicated by iso_add, that is, the last \fIvariant\fP nts of the read.
.TP
.B Args:
.INDENT 7.0
.TP
.B \fIread(str)\fP: read sequence:
.sp
.EX
>>> AAATTTT
.EE
.UNINDENT
.INDENT 7.0
.TP
.B \fIvariant(int)\fP: number of nts involved in the variant:
.sp
.EX
>>> 2
.EE
.UNINDENT
.TP
.B Returns:
.INDENT 7.0
.TP
.B \fI(str)\fP: nucleotide involved in the variant:
.sp
.EX
>>> TT
.EE
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.snps.create_vcf(isomirs, matures, gtf, vcf_file=None)
Create vcf file of changes for all samples.
Positions supported by more than 3 isomiRs and more than 30% of reads
are marked PASS; all others are marked LOW.
.UNINDENT
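.sp
The PASS/LOW rule can be written as a small predicate. This is a sketch under the assumption that \(dq30% of reads\(dq means reads carrying the change out of all reads covering the position; \fIvcf_filter\fP is a hypothetical helper.
.sp
.EX
```python
def vcf_filter(n_isomirs, n_reads_with_change, n_reads_total):
    # Hypothetical helper applying the documented thresholds.
    if n_reads_total == 0:
        return "LOW"
    fraction = n_reads_with_change / n_reads_total
    # Both thresholds are strict: exactly 3 isomiRs or 30% is LOW.
    return "PASS" if n_isomirs > 3 and fraction > 0.30 else "LOW"
```
.EE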
.INDENT 0.0
.TP
.B mirtop.mirna.snps.liftover(pass_pos, matures)
Convert positions to precursor coordinates.
.UNINDENT
.INDENT 0.0
.TP
.B mirtop.mirna.snps.liftover_to_genome(pass_pos, gtf)
Lift positions over from precursor to genome coordinates.
.UNINDENT
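.sp
The two coordinate changes can be sketched as plain arithmetic, assuming 0\-based offsets within the mature and precursor sequences, 1\-based GFF coordinates for the precursor, and minus\-strand precursors counted from their end. These conventions and the helper names are assumptions.
.sp
.EX
```python
def liftover_sketch(pos, mature_start):
    # Mature-relative (0-based) position -> precursor-relative position.
    return mature_start + pos


def liftover_to_genome_sketch(pos, pre_start, pre_end, strand):
    # Precursor-relative (0-based) offset -> genomic coordinate, where
    # pre_start/pre_end are the precursor's GFF coordinates.
    if strand == "+":
        return pre_start + pos
    # On the minus strand the precursor 5' end is its genomic end.
    return pre_end - pos
```
.EE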
.INDENT 0.0
.TP
.B mirtop.mirna.snps.print_vcf(data)
Print a VCF line following the format rules.
.UNINDENT
.SS classes
.INDENT 0.0
.TP
.B class mirtop.mirna.realign.hits
Class with alignment information.
.UNINDENT
.INDENT 0.0
.TP
.B class mirtop.mirna.realign.isomir
Class to represent isomiRs information.
.INDENT 7.0
.TP
.B format(sep=\(aq\et\(aq)
Create tabular line from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B formatGFF()
Create Variant attribute.
.UNINDENT
.INDENT 7.0
.TP
.B format_id(sep=\(aq\et\(aq)
Create simple identifier from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B get_score(sc)
Get score from variant fields.
.UNINDENT
.INDENT 7.0
.TP
.B is_iso()
Define whether element is isomiR or not.
.UNINDENT
.INDENT 7.0
.TP
.B set_pos(start, l, strand=\(aq+\(aq)
Set end position
.UNINDENT
.UNINDENT
.SH AUTHOR
Lorena Pantano, Thomas Desvignes, Karen Eilbeck, Ioannis Vlachos, Bastian Fromm, Marc K. Halushka, Michael Hackenberg, Gianvito Urgese
.SH COPYRIGHT
2024, Lorena Pantano, Thomas Desvignes, Karen Eilbeck, Ioannis Vlachos, Bastian Fromm, Marc K. Halushka, Michael Hackenberg, Gianvito Urgese
.\" Generated by docutils manpage writer.
.