
PYCHOPPER(1) package documentation PYCHOPPER(1)

NAME

pychopper - package documentation

COMMAND LINE TOOLS

Command line tools

cdna_classifier

Tool to identify, orient and rescue full-length cDNA reads.

usage: cdna_classifier [-h] [-b primers] [-g phmm_file] [-c config_file]
                       [-k kit] [-q cutoff] [-Q min_qual] [-z min_len]
                       [-r report_pdf] [-u unclass_output]
                       [-l len_fail_output] [-w rescue_output]
                       [-S stats_output] [-K qc_fail_output] [-Y autotune_nr]
                       [-L autotune_samples] [-A scores_output] [-m method]
                       [-x rescue] [-p] [-t threads] [-B batch_size]
                       [-D read stats]
                       input_fastx output_fastx


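Positional Arguments

input_fastx
Input reads (FASTA or FASTQ).

output_fastx
Output file for full-length, oriented reads.
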
Named Arguments

-b primers
Primers FASTA file.

-g phmm_file
File with custom profile HMMs (None).

-c config_file
File to specify primer configurations for each direction (None).

-k kit
Use primer sequences from this kit (PCS109).

Default: "PCS109"

-q cutoff
Cutoff parameter (autotuned).

-Q min_qual
Minimum mean base quality (7.0).

Default: 7.0

-z min_len
Minimum segment length (50).

Default: 50

-r report_pdf
Report PDF (cdna_classifier_report.pdf).

Default: "cdna_classifier_report.pdf"

-u unclass_output
Write unclassified reads to this file.

-l len_fail_output
Write fragments failing the length filter to this file.

-w rescue_output
Write rescued reads to this file.

-S stats_output
Write statistics to this file.

Default: "cdna_classifier_report.tsv"

-K qc_fail_output
Write reads failing the mean quality filter to this file.

-Y autotune_nr
Approximate number of reads used for tuning the cutoff parameter (10000).

Default: 10000

-L autotune_samples
Number of samples taken when tuning the cutoff parameter (30).

Default: 30

-A scores_output
Write alignment scores to this BED file.

-m method
Detection method: phmm or edlib (phmm).

Default: "phmm"

-x rescue
Protocol-specific read rescue: DCS109 (None).

-p
Keep primers, but trim the rest.

Default: False

-t threads
Number of threads to use (8).

Default: 8

-B batch_size
Maximum number of reads processed in each batch (1000000).

Default: 1000000

-D read stats
Tab-separated file with per-read stats (None).
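When cdna_classifier is driven from a Python pipeline, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper (build_cdna_classifier_cmd is not part of pychopper) that uses only flags from the usage line above; the file names in the example are placeholders, and options not exposed by the helper keep the tool's defaults.

```python
from typing import List, Optional


def build_cdna_classifier_cmd(
    input_fastx: str,
    output_fastx: str,
    kit: str = "PCS109",
    threads: int = 8,
    min_qual: Optional[float] = None,
    unclass_output: Optional[str] = None,
    rescue_output: Optional[str] = None,
    keep_primers: bool = False,
) -> List[str]:
    """Assemble an argument list for cdna_classifier (hypothetical helper)."""
    cmd = ["cdna_classifier", "-k", kit, "-t", str(threads)]
    if min_qual is not None:
        cmd += ["-Q", str(min_qual)]  # minimum mean base quality
    if unclass_output is not None:
        cmd += ["-u", unclass_output]  # unclassified reads
    if rescue_output is not None:
        cmd += ["-w", rescue_output]  # rescued reads
    if keep_primers:
        cmd.append("-p")  # keep primers, trim the rest
    # Positional arguments come last, as in the usage line.
    cmd += [input_fastx, output_fastx]
    return cmd


if __name__ == "__main__":
    cmd = build_cdna_classifier_cmd("reads.fq", "full_length.fq", unclass_output="unclassified.fq")
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths that contain spaces.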

FULL API REFERENCE

pychopper

pychopper package

Subpackages

pychopper.phmm_data package

Module contents

pychopper.primer_data package

Module contents

pychopper.tests package

Submodules

pychopper.tests.test_detector module

Bases: unittest.case.TestCase

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.




pychopper.tests.test_regression_simple module

Bases: unittest.case.TestCase

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

Integration test.


Module contents

Submodules

pychopper.alignment_hits module

Process alignment hits by removing overlaps.
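The description does not say how overlapping hits are resolved. One plausible approach, shown here as an assumption rather than pychopper's actual code, is a greedy pass: keep the highest-scoring hit, then discard any hit whose query interval overlaps a hit already kept. The reduced Hit tuple and the primer names "P1" and "P2" are hypothetical.

```python
from collections import namedtuple
from typing import List

# Hypothetical reduced hit record; pychopper's own Hit tuple has more fields.
Hit = namedtuple("Hit", ["Ref", "QueryStart", "QueryEnd", "Score"])


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the half-open query intervals [start, end) of two hits intersect."""
    return a.QueryStart < b.QueryEnd and b.QueryStart < a.QueryEnd


def remove_overlaps(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hits, dropping any that overlap a kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.Score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Return the surviving hits in read order.
    return sorted(kept, key=lambda h: h.QueryStart)


if __name__ == "__main__":
    hits = [Hit("P1", 0, 30, 25.0), Hit("P2", 10, 40, 20.0), Hit("P2", 500, 530, 18.0)]
    print(remove_overlaps(hits))
```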

pychopper.chopper module

Segment reads based on alignment hits using dynamic programming. The algorithm is based on the rule that each primer alignment hit can be used only once. Because consecutive segments share a hit, including a segment means the next one has to be excluded.
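The "include a segment, exclude the next" rule turns segment selection into choosing non-adjacent items along the read with maximum total weight, which has a simple linear-time recurrence. The sketch below assumes every candidate segment (the stretch between hit i and hit i+1) already has a numeric score; how pychopper actually scores segments is not described here, and select_segments is a hypothetical name.

```python
from typing import List, Tuple


def select_segments(scores: List[float]) -> Tuple[float, List[int]]:
    """Pick non-adjacent segments that maximise the total score.

    scores[i] is the (assumed) score of the segment between hit i and hit i+1.
    Adjacent segments share a hit, so two neighbours are never both chosen.
    """
    n = len(scores)
    if n == 0:
        return 0.0, []
    # best[i] = best achievable total using only segments 0..i
    best = [0.0] * n
    best[0] = max(scores[0], 0.0)
    for i in range(1, n):
        take = scores[i] + (best[i - 2] if i >= 2 else 0.0)
        best[i] = max(best[i - 1], take)
    # Trace back: if best[i] improved on best[i-1], segment i was taken.
    chosen: List[int] = []
    i = n - 1
    while i >= 0:
        prev = best[i - 1] if i >= 1 else 0.0
        if best[i] > prev:
            chosen.append(i)
            i -= 2  # the neighbouring segment shares a hit, so skip it
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen


if __name__ == "__main__":
    print(select_segments([3.0, 2.0, 7.0, 1.0]))
```

For the scores [3, 2, 7, 1] the sketch selects segments 0 and 2 (total 10) rather than the adjacent pair 2 and 3.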



Convert segments to output reads with annotation.
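A minimal sketch of what converting a segment into an output read can involve: slice the read to the segment, reverse complement it when it lies on the minus strand, and record the flanking primers in the header. The field layouts match the Segment and Seq tuples listed under pychopper.common_structures, but the coordinate convention (0-based, end exclusive), the "+"/"-" strand values, and the header format are all assumptions, and segment_to_read is a hypothetical name.

```python
from collections import namedtuple

# Field layouts as listed for pychopper.common_structures.
Segment = namedtuple("Segment", ["Left", "Start", "End", "Right", "Strand", "Len"])
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def segment_to_read(read: Seq, seg: Segment) -> Seq:
    """Trim a read to a segment, orient it, and annotate the primers (hypothetical format)."""
    seq = read.Seq[seg.Start:seg.End]
    qual = read.Qual[seg.Start:seg.End] if read.Qual else None
    if seg.Strand == "-":
        # Reverse complement the bases and reverse the qualities to keep them aligned.
        seq = seq.translate(_COMP)[::-1]
        qual = qual[::-1] if qual else None
    new_id = "{}:{}|{}".format(seg.Start, seg.End, read.Id)
    annotation = "strand={} primers={},{}".format(seg.Strand, seg.Left, seg.Right)
    return Seq(Id=new_id, Name="{} {}".format(new_id, annotation), Seq=seq, Qual=qual)


if __name__ == "__main__":
    read = Seq("r1", "r1", "AAACCAGGGTTT", "ABCDEFGHIJKL")
    print(segment_to_read(read, Segment("P1", 3, 9, "P2", "-", 6)))
```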

pychopper.common_structures module

Bases: tuple

Create new instance of Hit(Ref, RefStart, RefEnd, Query, QueryStart, QueryEnd, Score)

  • Query -- Alias for field number 3
  • QueryEnd -- Alias for field number 5
  • QueryStart -- Alias for field number 4
  • Ref -- Alias for field number 0
  • RefEnd -- Alias for field number 2
  • RefStart -- Alias for field number 1
  • Score -- Alias for field number 6


Bases: tuple

Create new instance of Segment(Left, Start, End, Right, Strand, Len)

  • End -- Alias for field number 2
  • Left -- Alias for field number 0
  • Len -- Alias for field number 5
  • Right -- Alias for field number 3
  • Start -- Alias for field number 1
  • Strand -- Alias for field number 4


Bases: tuple

Create new instance of Seq(Id, Name, Seq, Qual)

  • Id -- Alias for field number 0
  • Name -- Alias for field number 1
  • Qual -- Alias for field number 3
  • Seq -- Alias for field number 2
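The "Bases: tuple" and "Alias for field number N" entries are what collections.namedtuple generates. Equivalent definitions can be reconstructed from the "Create new instance of ..." signatures above; this is a reconstruction for illustration, and the example values are made up.

```python
from collections import namedtuple

# Field order taken from the "Create new instance of ..." signatures.
Hit = namedtuple("Hit", ["Ref", "RefStart", "RefEnd", "Query", "QueryStart", "QueryEnd", "Score"])
Segment = namedtuple("Segment", ["Left", "Start", "End", "Right", "Strand", "Len"])
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

if __name__ == "__main__":
    h = Hit("P1", 0, 24, "read1", 10, 34, 31.5)
    # Named access and positional access refer to the same field.
    print(h.QueryStart, h[4])
```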


pychopper.edlib_backend module

Find alignment hits of all primers in all reads using the edlib/parasail backend.
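edlib is a C/C++ library with Python bindings. The plain-Python sketch below is a stand-in for illustration only (neither pychopper's code nor edlib's API). It shows the kind of search an edit-distance backend performs: a semi-global alignment in which unaligned bases at either end of the read cost nothing, similar in spirit to edlib's infix ("HW") mode. It runs in O(len(primer) x len(read)) time, which is fine for short primers; best_infix_hit is a hypothetical name.

```python
from typing import Tuple


def best_infix_hit(primer: str, read: str) -> Tuple[int, int, int]:
    """Return (edit_distance, start, end) of the best match of primer inside read.

    Leading and trailing read bases are free (infix alignment).
    Coordinates are 0-based with an exclusive end.
    """
    m, n = len(primer), len(read)
    # Row 0: an alignment may start at any read position at no cost.
    prev = [0] * (n + 1)
    prev_start = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev[j - 1] + (primer[i - 1] != read[j - 1])
            dele = prev[j] + 1    # primer base unmatched (gap in read)
            ins = cur[j - 1] + 1  # extra read base (gap in primer)
            best = min(sub, dele, ins)
            cur[j] = best
            # Carry the start position along the chosen path.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev, prev_start = cur, cur_start
    # Trailing read bases are free: take the cheapest end column.
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], prev_start[end], end


if __name__ == "__main__":
    print(best_infix_hit("ACGT", "GGACCTGG"))
```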

pychopper.hmmer_backend module

Find alignment hits of all primers in all reads using the pHMM/nhmmscan backend.

pychopper.parasail_backend module

Extract details of the first operation in a cigar string.


Process an alignment, extracting score, start and end.
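Extracting the first CIGAR operation usually amounts to reading a leading run length and an operation code. A minimal sketch, assuming the standard SAM CIGAR alphabet (which includes the "=" and "X" operations parasail emits); first_cigar_op is a hypothetical name, not the pychopper function.

```python
import re
from typing import Optional, Tuple

# One CIGAR operation: a run length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def first_cigar_op(cigar: str) -> Optional[Tuple[int, str]]:
    """Return (length, operation) of the first CIGAR operation, or None if absent."""
    match = _CIGAR_OP.match(cigar)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(first_cigar_op("3I12=1X4="))
```

A leading insertion or deletion is typically what shifts the reported start coordinate of an alignment, which is why the first operation is inspected separately.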


pychopper.report module

Bases: object

Class for plotting utilities on top of matplotlib. Plots are saved to the specified file through the PDF backend.

Parameters
  • self -- object.
  • pdf -- Output pdf.
Returns
The report object.
Return type
Report

Close PDF backend. Do not forget to call this at the end of your script or your output will be damaged!
Parameters
self -- object
Returns
None
Return type
object


Plot multiple pairs of data arrays.
Parameters
  • self -- object.
  • data_map -- A dictionary with labels as keys and tuples of data arrays (x, y) as values.
  • title -- Figure title.
  • xlab -- X axis label.
  • ylab -- Y axis label.
  • marker -- Marker passed to the plot function.
  • legend_loc -- Location of legend.
  • legend -- Plot legend if True.
  • vlines -- Dictionary with labels and positions of vertical lines to draw.
  • vlcolor -- Color of vertical lines drawn.
  • vlwidth -- Width of vertical lines drawn.
Returns
None
Return type
object


Plot a simple bar chart from an input dictionary.
Parameters
  • self -- object.
  • data_map -- A dictionary with labels as keys and data as values.
  • title -- Figure title.
  • xlab -- X axis label.
  • ylab -- Y axis label.
  • alpha -- Alpha (transparency) value.
  • xticks_rotation -- Rotation value for x tick labels.
  • auto_limit -- Set y axis limits automatically.
Returns
None
Return type
object


Plot histograms of multiple data arrays.
Parameters
  • self -- object.
  • data_map -- A dictionary with labels as keys and data arrays as values.
  • title -- Figure title.
  • xlab -- X axis label.
  • ylab -- Y axis label.
  • bins -- Number of bins.
  • alpha -- Transparency value for histograms.
  • legend_loc -- Location of legend.
  • legend -- Plot legend if True.
  • vlines -- Dictionary with labels and positions of vertical lines to draw.
Returns
None
Return type
object


Utility method to save and close a figure.


pychopper.seq_utils module

Return complement of base.

Performs the substitutions A<=>T, C<=>G, X=>X for both upper and lower case. The return value is identical to the argument for all other values.

Parameters
k -- A base.
Returns
Complement of base.
Return type
str
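The substitution rule above can be written as a lookup table with a pass-through default, and a reverse complement follows by complementing the bases in reverse order. A minimal sketch; base_complement and reverse_complement are hypothetical names used for illustration.

```python
# A<=>T, C<=>G, X=>X in both cases; anything else is returned unchanged.
_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "X": "X",
    "a": "t", "t": "a", "c": "g", "g": "c", "x": "x",
}


def base_complement(k: str) -> str:
    """Complement a single base; other characters pass through unchanged."""
    return _COMPLEMENT.get(k, k)


def reverse_complement(seq: str) -> str:
    """Reverse complement a sequence by complementing each base in reverse order."""
    return "".join(base_complement(b) for b in reversed(seq))


if __name__ == "__main__":
    print(reverse_complement("AACGTx"))
```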


Generate a list of error rates for qualities less than or equal to n.
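Phred quality Q corresponds to an error probability of 10^(-Q/10), so a precomputed list indexed by quality turns each quality-to-probability conversion into a lookup. A minimal sketch, assuming the list starts at Q0; error_rates is a hypothetical name.

```python
from typing import List


def error_rates(n: int) -> List[float]:
    """Error probability for each Phred quality 0..n (inclusive): 10 ** (-q / 10)."""
    return [10 ** (-q / 10.0) for q in range(n + 1)]


if __name__ == "__main__":
    table = error_rates(40)
    print(table[10], table[20])
```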

Load primers from a FASTA file.

Parse out runid from sequence description.
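Oxford Nanopore FASTQ headers carry key=value fields such as runid=... after the read identifier. A minimal regex sketch, assuming the run ID is the whitespace-delimited value following "runid="; get_runid and the example header are hypothetical.

```python
import re
from typing import Optional

# The run ID is the non-whitespace token following "runid=".
_RUNID = re.compile(r"runid=(\S+)")


def get_runid(description: str) -> Optional[str]:
    """Return the value of the runid=... field in a read description, or None."""
    match = _RUNID.search(description)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(get_runid("read1 runid=abc123 ch=7"))
```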

Calculate the average basecall quality of a read. The function receives the ASCII-encoded quality scores of a read and returns its average quality: it first converts the Phred scores to error probabilities, then averages the error probabilities, and finally converts the average back to the Phred scale.
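The three steps above can be sketched as follows, assuming the common Phred+33 ASCII encoding; mean_qual is a hypothetical name, and returning 0.0 for an empty string is an arbitrary choice made for this sketch. Averaging in probability space weights low-quality bases heavily: a Q10 base and a Q20 base average to about Q12.6, not Q15.

```python
import math


def mean_qual(qual: str, offset: int = 33) -> float:
    """Average Phred quality computed in probability space (Phred+33 assumed)."""
    if not qual:
        return 0.0  # arbitrary choice for an empty quality string
    # Step 1: Phred Q -> error probability P = 10^(-Q/10)
    probs = [10 ** (-(ord(c) - offset) / 10.0) for c in qual]
    # Step 2: average the error probabilities
    mean_p = sum(probs) / len(probs)
    # Step 3: convert back to the Phred scale, Q = -10 * log10(P)
    return -10.0 * math.log10(mean_p)


if __name__ == "__main__":
    print(mean_qual("+5"))  # '+' encodes Q10, '5' encodes Q20
```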

Return random floats in the half-open interval [0.0, 1.0). Alias for random_sample to ease forward-porting to the new random API.

This function is taken from https://github.com/lh3/readfq/blob/master/readfq.py. It parses large files much faster than Biopython.
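The general idea of a lightweight FASTA/FASTQ reader is to stream lines and yield (name, sequence, quality) tuples without building heavyweight record objects. The sketch below is a simplified illustration, not the readfq code itself: FASTA records may span several lines, but FASTQ records are assumed to use the common four-line layout, and the whole header line (without its marker) is returned as the name. read_fastx is a hypothetical name.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fastx(handle: TextIO) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (name, sequence, quality) tuples; quality is None for FASTA records."""
    lines = (line.rstrip("\r\n") for line in handle)
    name: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        if not line:
            continue
        if line.startswith(">"):
            # A new FASTA header closes the previous FASTA record.
            if name is not None:
                yield name, "".join(seq_parts), None
            name, seq_parts = line[1:], []
        elif line.startswith("@") and name is None:
            # FASTQ: header, sequence, '+' separator, quality (four-line layout assumed).
            seq = next(lines, "")
            next(lines, "")
            qual = next(lines, "")
            if len(qual) != len(seq):
                raise ValueError("truncated or malformed FASTQ record: " + line)
            yield line[1:], seq, qual
        else:
            seq_parts.append(line)
    if name is not None:
        yield name, "".join(seq_parts), None


if __name__ == "__main__":
    fq = io.StringIO("@r1 runid=x\nACGT\n+\nIIII\n")
    print(list(read_fastx(fq)))
```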


Reverse complement sequence record.

Return reverse complement of a string (base) sequence.
Parameters
seq -- Input sequence.
Returns
Reverse complement of input sequence.
Return type
str
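When a whole sequence record is reverse complemented, the quality string has to be reversed as well so that each quality value stays attached to its base. A minimal sketch using the Seq field layout from pychopper.common_structures; revcomp_seq and revcomp_record are hypothetical names, and the complement table follows the A<=>T, C<=>G, X=>X rule described for this module.

```python
from collections import namedtuple

# Field layout as listed for pychopper.common_structures.Seq.
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

# Upper- and lower-case complements; other characters pass through unchanged.
_COMP = str.maketrans("ACGTXacgtx", "TGCAXtgcax")


def revcomp_seq(seq: str) -> str:
    """Reverse complement a base string."""
    return seq.translate(_COMP)[::-1]


def revcomp_record(rec: Seq) -> Seq:
    """Reverse complement a record; qualities are reversed to stay aligned with bases."""
    qual = rec.Qual[::-1] if rec.Qual is not None else None
    return rec._replace(Seq=revcomp_seq(rec.Seq), Qual=qual)


if __name__ == "__main__":
    print(revcomp_record(Seq("r1", "r1", "AACG", "ABCD")))
```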


Write read to FASTQ file.
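A FASTQ record is four lines: "@" plus the name, the bases, a "+" separator, and one quality character per base. A minimal writer sketch with a length check; write_fastq is a hypothetical name, and its signature is not the pychopper function's.

```python
import io
from typing import TextIO


def write_fastq(handle: TextIO, name: str, seq: str, qual: str) -> None:
    """Write one record in the four-line FASTQ layout."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    handle.write("@{}\n{}\n+\n{}\n".format(name, seq, qual))


if __name__ == "__main__":
    buf = io.StringIO()
    write_fastq(buf, "r1", "ACGT", "IIII")
    print(buf.getvalue(), end="")
```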

pychopper.utils module







Module contents


AUTHOR

ONT Applications Group

COPYRIGHT

2020, Oxford Nanopore Technologies Ltd.

October 26, 2020 2.5.0