PYCHOPPER(1)

package documentation


NAME

pychopper - package documentation

COMMAND LINE TOOLS

cdna_classifier

Tool to identify, orient and rescue full-length cDNA reads.

usage: cdna_classifier [-h] [-b primers] [-g phmm_file] [-c config_file]
                       [-k kit] [-q cutoff] [-Q min_qual] [-z min_len]
                       [-r report_pdf] [-u unclass_output]
                       [-l len_fail_output] [-w rescue_output]
                       [-S stats_output] [-K qc_fail_output] [-Y autotune_nr]
                       [-L autotune_samples] [-A scores_output] [-m method]
                       [-x rescue] [-p] [-t threads] [-B batch_size]
                       [-D read stats]
                       input_fastx output_fastx

Positional Arguments

input_fastx: Input file.
output_fastx: Output file.

Named Arguments

-b: Primers fasta.
-g: File with custom profile HMMs (None).
-c: File to specify primer configurations for each direction (None).
-k: Use primer sequences from this kit (PCS109).
Default: "PCS109"
-q: Cutoff parameter (autotuned).
-Q: Minimum mean base quality (7.0).
Default: 7.0
-z: Minimum segment length (50).
Default: 50
-r: Report PDF (cdna_classifier_report.pdf).
Default: "cdna_classifier_report.pdf"
-u: Write unclassified reads to this file.
-l: Write fragments failing the length filter to this file.
-w: Write rescued reads to this file.
-S: Write statistics to this file.
Default: "cdna_classifier_report.tsv"
-K: Write reads failing the mean quality filter to this file.
-Y: Approximate number of reads used for tuning the cutoff parameter (10000).
Default: 10000
-L: Number of samples taken when tuning the cutoff parameter (30).
Default: 30
-A: Write alignment scores to this BED file.
-m: Detection method: phmm or edlib (phmm).
Default: "phmm"
-x: Protocol-specific read rescue: DCS109 (None).
-p: Keep primers, but trim the rest.
Default: False
-t: Number of threads to use (8).
Default: 8
-B: Maximum number of reads processed in each batch (1000000).
Default: 1000000
-D: Tab-separated file with per-read stats (None).
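
The options above can be combined into a single invocation. The following is a minimal sketch that only assembles the command line from Python; it does not run the tool. The helper name build_cdna_classifier_cmd and all file names are hypothetical, and only flags listed in the usage synopsis are used.

```python
import shlex


def build_cdna_classifier_cmd(input_fastx, output_fastx, report_pdf=None,
                              unclassified=None, rescued=None, stats=None,
                              threads=None):
    """Assemble an argument list for cdna_classifier (hypothetical helper)."""
    cmd = ["cdna_classifier"]
    # Each pair is (flag from the usage synopsis, value or None).
    optional = [
        ("-r", report_pdf),
        ("-u", unclassified),
        ("-w", rescued),
        ("-S", stats),
        ("-t", threads),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    # Positional arguments come last: input file, then output file.
    cmd += [input_fastx, output_fastx]
    return cmd


# Illustrative file names only.
cmd = build_cdna_classifier_cmd(
    "reads.fastq", "full_length.fastq",
    report_pdf="report.pdf", unclassified="unclassified.fastq",
    rescued="rescued.fastq", stats="stats.tsv", threads=4,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where cdna_classifier is installed.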

FULL API REFERENCE

pychopper

pychopper package

Subpackages

pychopper.phmm_data package

Module contents

pychopper.primer_data package

Module contents

pychopper.tests package

Submodules

pychopper.tests.test_detector module

class pychopper.tests.test_detector.TestDetector(methodName='runTest'): Bases: unittest.case.TestCase
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

testPairAlign()

testScoreCutoff()

pychopper.tests.test_regression_simple module

class pychopper.tests.test_regression_simple.TestIntegration(methodName='runTest'): Bases: unittest.case.TestCase
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

testIntegration(): Integration test.

Module contents

Submodules

pychopper.alignment_hits module

pychopper.alignment_hits.process_hits(hits, max_score): Process alignment hits by removing overlaps
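
One common way to remove overlaps among alignment hits is a greedy pass that keeps the best-scoring hit first and discards any later hit that overlaps one already kept. The sketch below illustrates that idea only; it is not taken from the pychopper source. It assumes that RefStart/RefEnd are half-open coordinates on the read, that a higher Score is better, and it ignores the max_score argument. The name drop_overlaps is hypothetical.

```python
from collections import namedtuple

# Same field order as pychopper.common_structures.Hit.
Hit = namedtuple(
    "Hit", ["Ref", "RefStart", "RefEnd", "Query", "QueryStart", "QueryEnd", "Score"]
)


def drop_overlaps(hits):
    """Greedy overlap removal (illustrative, not pychopper's own code)."""
    kept = []
    # Assumption: higher Score means a better hit.
    for hit in sorted(hits, key=lambda h: h.Score, reverse=True):
        # Keep the hit only if it is disjoint from every hit kept so far.
        if all(hit.RefEnd <= k.RefStart or hit.RefStart >= k.RefEnd for k in kept):
            kept.append(hit)
    # Return the survivors in read order.
    return sorted(kept, key=lambda h: h.RefStart)


# Made-up hits: a and b overlap, c stands alone.
a = Hit("read_1", 0, 20, "SSP", 0, 20, 30.0)
b = Hit("read_1", 10, 30, "VNP", 0, 20, 40.0)
c = Hit("read_1", 100, 120, "VNP", 0, 20, 25.0)
result = drop_overlaps([a, b, c])
print([(h.Query, h.RefStart, h.RefEnd) for h in result])
```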

pychopper.chopper module

pychopper.chopper.analyse_hits(hits, config): Segment reads based on alignment hits using dynamic programming. The algorithm is based on the rule that each primer alignment hit can be used only once. Hence if a segment is included, the next one has to be excluded.
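
The rule that an included segment excludes its neighbour can be read as a selection of non-adjacent candidates, which a short dynamic program solves. The following is a sketch under stated assumptions, not the implementation in analyse_hits: candidate i is taken to be the stretch between hit i and hit i + 1, each candidate carries a hypothetical numeric score (for example its length), and a score of 0 or less marks an unusable candidate. The name select_segments is hypothetical.

```python
def select_segments(scores):
    """Choose non-adjacent candidates that maximise the total score.

    Returns (best_total, list of chosen candidate indices in order).
    """
    n = len(scores)
    best = [0] * (n + 1)      # best[i]: optimum over the first i candidates
    take = [False] * (n + 1)  # take[i]: whether candidate i - 1 is used there
    for i in range(1, n + 1):
        skip = best[i - 1]
        # Using candidate i - 1 forbids candidate i - 2 (shared primer hit).
        use = scores[i - 1] + (best[i - 2] if i >= 2 else 0)
        if scores[i - 1] > 0 and use > skip:
            best[i], take[i] = use, True
        else:
            best[i] = skip
    # Walk back through the table to recover the chosen candidates.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(i - 1)
            i -= 2
        else:
            i -= 1
    return best[n], chosen[::-1]


# Three candidates between four hits; the outer two together beat the middle.
print(select_segments([300, 500, 400]))
```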

pychopper.chopper.chopper_edlib(reads, primers, config, max_ed, cutoff, pool, min_batch): Segment using the edlib/parasail backend

pychopper.chopper.chopper_phmm(reads, phmm_file, config, cutoff, threads, pool, min_batch): Segment using the profile HMM backend

pychopper.chopper.segments_to_reads(read, segments, keep_primers): Convert segments to output reads with annotation

pychopper.common_structures module

class pychopper.common_structures.Hit(Ref, RefStart, RefEnd, Query, QueryStart, QueryEnd, Score): Bases: tuple
Create new instance of Hit(Ref, RefStart, RefEnd, Query, QueryStart, QueryEnd, Score)

Query: Alias for field number 3

QueryEnd: Alias for field number 5

QueryStart: Alias for field number 4

Ref: Alias for field number 0

RefEnd: Alias for field number 2

RefStart: Alias for field number 1

Score: Alias for field number 6

class pychopper.common_structures.Segment(Left, Start, End, Right, Strand, Len): Bases: tuple
Create new instance of Segment(Left, Start, End, Right, Strand, Len)

End: Alias for field number 2

Left: Alias for field number 0

Len: Alias for field number 5

Right: Alias for field number 3

Start: Alias for field number 1

Strand: Alias for field number 4

class pychopper.common_structures.Seq(Id, Name, Seq, Qual): Bases: tuple
Create new instance of Seq(Id, Name, Seq, Qual)

Id: Alias for field number 0

Name: Alias for field number 1

Qual: Alias for field number 3

Seq: Alias for field number 2
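
The three classes above are plain named tuples, so each field can be read by name or by the field number given in its alias line. The sketch below rebuilds equivalent tuples with collections.namedtuple rather than importing pychopper; every value is made up for illustration.

```python
from collections import namedtuple

# Field order follows the constructor signatures documented above.
Hit = namedtuple(
    "Hit", ["Ref", "RefStart", "RefEnd", "Query", "QueryStart", "QueryEnd", "Score"]
)
Segment = namedtuple("Segment", ["Left", "Start", "End", "Right", "Strand", "Len"])
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

# Illustrative values only.
hit = Hit("read_1", 0, 22, "SSP", 0, 22, 40.0)
seg = Segment(Left=0, Start=22, End=950, Right=972, Strand="+", Len=928)
read = Seq(Id="read_1", Name="read_1 runid=abc", Seq="ACGT", Qual="IIII")

# Access by name and by position is equivalent.
print(hit.Score, hit[6])
print(seg.Strand, seg[4])
print(read.Seq, read[2])

# Named tuples are immutable; _replace returns a modified copy.
flipped = seg._replace(Strand="-")
print(flipped)
```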

pychopper.edlib_backend module

pychopper.edlib_backend.find_locations(reads, all_primers, max_ed, pool, min_batch): Find alignment hits of all primers in all reads using the edlib/parasail backend

pychopper.hmmer_backend module

pychopper.hmmer_backend.find_locations(reads, phmm_file, E, pool, min_batch): Find alignment hits of all primers in all reads using the pHMM/nhmmscan backend

pychopper.parasail_backend module

pychopper.parasail_backend.first_cigar(cigar): Extract details of the first operation in a cigar string.
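
A CIGAR string is a run of length/operation pairs such as 12S30=5X, so the first operation can be read off with a single anchored regular expression. The sketch below is an assumption about what extracting "details of the first operation" could look like; the name first_cigar_op and the (length, operation) return format are hypothetical and may differ from what first_cigar returns.

```python
import re

# One CIGAR element: a positive integer followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def first_cigar_op(cigar):
    """Return (length, operation) for the first element of a CIGAR string."""
    match = _CIGAR_OP.match(cigar)
    if match is None:
        raise ValueError("not a valid CIGAR string: %r" % cigar)
    return int(match.group(1)), match.group(2)


print(first_cigar_op("12S30=5X"))
```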

pychopper.parasail_backend.pair_align(reference, query, query_name, subs_mat, params): Perform pairwise local alignment using parasail-python

pychopper.parasail_backend.process_alignment(aln, query, query_name, aln_params): Process an alignment, extracting score, start and end.

pychopper.parasail_backend.refine_locations(read, all_primers, locations, aln_params={'gap_extend': 1, 'gap_open': 1, 'match': 1, 'mismatch': -2}, subs_mat=<parasail.bindings_v2.Matrix object>): Refine alignment edges based on local alignment

pychopper.report module

class pychopper.report.Report(pdf): Bases: object
Class for plotting utilities on top of matplotlib. Plots are saved in the specified file through the PDF backend.

Parameters

self -- object.
pdf -- Output pdf.

Returns: The report object.
Return type: Report

close(): Close PDF backend. Do not forget to call this at the end of your script or your output will be damaged!

Parameters: self -- object
Returns: None
Return type: object

plot_arrays(data_map, title='', xlab='', ylab='', marker='.', legend_loc='best', legend=True, vlines=None, vlcolor='green', vlwitdh=0.5): Plot multiple pairs of data arrays.

Parameters

self -- object.
data_map -- A dictionary with labels as keys and tuples of data arrays (x,y) as values.
title -- Figure title.
xlab -- X axis label.
ylab -- Y axis label.
marker -- Marker passed to the plot function.
legend_loc -- Location of legend.
legend -- Plot legend if True.
vlines -- Dictionary with labels and positions of vertical lines to draw.
vlcolor -- Color of vertical lines drawn.
vlwidth -- Width of vertical lines drawn.

Returns: None
Return type: object

plot_bars_simple(data_map, title='', xlab='', ylab='', alpha=0.6, xticks_rotation=0, auto_limit=False): Plot simple bar chart from input dictionary.

Parameters

self -- object.
data_map -- A dictionary with labels as keys and data as values.
title -- Figure title.
xlab -- X axis label.
ylab -- Y axis label.
alpha -- Alpha value.
xticks_rotation -- Rotation value for x tick labels.
auto_limit -- Set y axis limits automatically.

Returns: None
Return type: object

plot_histograms(data_map, title='', xlab='', ylab='', bins=50, alpha=0.7, legend_loc='best', legend=True, vlines=None): Plot histograms of multiple data arrays.

Parameters

self -- object.
data_map -- A dictionary with labels as keys and data arrays as values.
title -- Figure title.
xlab -- X axis label.
ylab -- Y axis label.
bins -- Number of bins.
alpha -- Transparency value for histograms.
legend_loc -- Location of legend.
legend -- Plot legend if True.
vlines -- Dictionary with labels and positions of vertical lines to draw.

Returns: None
Return type: object

save_close(): Utility method to save and close figure.

pychopper.seq_utils module

pychopper.seq_utils.base_complement(k): Return complement of base.
Performs the substitutions: A<=>T, C<=>G, X=>X for both upper and lower case. The return value is identical to the argument for all other values.

Parameters: k -- A base.
Returns: Complement of base.
Return type: str
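
The substitution rule described above maps naturally onto a dictionary lookup with a pass-through default. The following sketch reproduces the documented behaviour without importing pychopper; the name complement is hypothetical.

```python
# A<=>T, C<=>G, X=>X in upper case, plus the same pairs in lower case.
_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "X": "X"}
_COMPLEMENT = dict(_PAIRS)
_COMPLEMENT.update({k.lower(): v.lower() for k, v in _PAIRS.items()})


def complement(base):
    """Return the complement of a base; unknown symbols pass through unchanged."""
    return _COMPLEMENT.get(base, base)


# Applying it to a reversed sequence yields the reverse complement.
revcomp = "".join(complement(b) for b in reversed("ACGTn"))
print(revcomp)
```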

pychopper.seq_utils.errs_tab(n): Generate a list of error rates for qualities less than or equal to n.

pychopper.seq_utils.get_primers(primers): Load primers from fasta file

pychopper.seq_utils.get_runid(desc): Parse out runid from sequence description.

pychopper.seq_utils.mean_qual(quals, qround=False, tab=[1.0, 0.7943282347242815, 0.6309573444801932, ..., 1.584893192461111e-13]): Calculate the average basecall quality of a read. Receives the ASCII quality scores of a read and returns the average quality for that read: first convert the Phred scores to error probabilities, then calculate the average error probability, and finally convert that average back to the Phred scale. The default tab holds 129 precomputed error probabilities, 10^(-q/10) for q = 0 to 128.

pychopper.seq_utils.random(size=None): Return random floats in the half-open interval [0.0, 1.0). Alias for random_sample to ease forward-porting to the new random API.