|GENOME-MUSIC-PATH-SCAN(1p)||User Contributed Perl Documentation||GENOME-MUSIC-PATH-SCAN(1p)|
genome music path-scan
genome music path-scan - Find significantly mutated pathways in a cohort given a list of somatic mutations.
This document describes genome music path-scan version 0.04 (2018-07-05 at 09:17:13)
genome music path-scan --gene-covg-dir=? --bam-list=? --pathway-file=? --maf-file=? --output-file=? [--bmr=?] [--genes-to-ignore=?] [--min-mut-genes-per-path=?] [--skip-non-coding] [--skip-silent]
... music path-scan \
    --bam-list input_dir/bam_file_list \
    --gene-covg-dir output_dir/gene_covgs/ \
    --maf-file input_dir/myMAF.tsv \
    --output-file output_dir/sm_pathways \
    --pathway-file input_dir/pathway_dbs/KEGG.txt \
    --bmr 8.7E-07
- gene-covg-dir Text
- Directory containing per-gene coverage files (Created using music bmr calc-covg)
- bam-list Text
- Tab delimited list of BAM files [sample_name, normal_bam, tumor_bam] (See Description)
- pathway-file Text
- Tab-delimited file of pathway information (See Description)
- maf-file Text
- List of mutations using TCGA MAF specifications v2.3
- output-file Text
- Output file that will list the significant pathways and their p-values
- bmr Number
- Background mutation rate in the targeted regions
Default value '1e-06' if not specified
- genes-to-ignore Text
- Comma-delimited list of genes whose mutations should be ignored
- min-mut-genes-per-path Number
- Pathways with fewer mutated genes than this will be ignored
Default value '1' if not specified
- skip-non-coding Boolean
- Skip non-coding mutations from the provided MAF file
Default value 'true' if not specified
- noskip-non-coding Boolean
- Make skip-non-coding 'false'
- skip-silent Boolean
- Skip silent mutations from the provided MAF file
Default value 'true' if not specified
- noskip-silent Boolean
- Make skip-silent 'false'
Only the following four columns in the MAF are used. All other columns may be left blank.
Col 1: Hugo_Symbol (Need not be HUGO, but must match gene names used in the pathway file)
Col 2: Entrez_Gene_Id (A matching Entrez ID takes precedence over a gene-name match between the pathway file and the MAF)
Col 9: Variant_Classification
Col 16: Tumor_Sample_Barcode (Must match the sample name in the bam-list, or contain it as a substring)
The Entrez_Gene_Id can also be left blank (or set to 0), but providing it is highly recommended, in case genes are named differently in the pathway file and the MAF file.
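As a rough illustration of the column layout described above, the following Python sketch pulls the four used fields out of one tab-delimited MAF line. It is not part of MuSiC: the function name read_maf_fields and the sample values are hypothetical, and a blank or 0 Entrez ID is simply treated as unknown.

```python
def read_maf_fields(line):
    """Return the four MAF fields described above from one tab-delimited line."""
    cols = line.rstrip("\n").split("\t")
    entrez = cols[1].strip()
    return {
        "Hugo_Symbol": cols[0],                                     # Col 1
        "Entrez_Gene_Id": None if entrez in ("", "0") else entrez,  # Col 2
        "Variant_Classification": cols[8],                          # Col 9
        "Tumor_Sample_Barcode": cols[15],                           # Col 16
    }

# Hypothetical MAF line: 16 columns, with every unused column left blank.
example_cols = [""] * 16
example_cols[0] = "FASN"
example_cols[1] = "2194"
example_cols[8] = "Missense_Mutation"
example_cols[15] = "SAMPLE_01"
record = read_maf_fields("\t".join(example_cols) + "\n")
print(record)
```

Columns are addressed by position rather than by header name here, because the text above identifies them by column number.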
- The pathway-file is a tab-delimited file prepared from a pathway database (such as KEGG), with the columns: [path_id, path_name, class, gene_line, diseases, drugs, description]. The last three columns are optional (but are available on KEGG). The gene_line contains the "entrez_id:gene_name" of every gene involved in the pathway, with entries separated by a "|" symbol.
- For example, a line in the pathway-file would look like:
hsa00061 Fatty acid biosynthesis Lipid Metabolism 31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH
Ensure that the gene names and Entrez IDs used match those used in the MAF file. Entrez IDs are not mandatory (use 0 if the Entrez ID is unknown). However, if a gene name in the MAF does not match any gene name in this file, the Entrez IDs are used to find a match (unless the ID is 0).
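The pathway-file layout and the matching rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not MuSiC's own code: the helper names parse_pathway_line and gene_in_pathway are hypothetical, and the example assumes the four mandatory columns are separated by tabs.

```python
def parse_pathway_line(line):
    """Split one pathway-file line into its ID, name, class, and gene list."""
    fields = line.rstrip("\n").split("\t")
    path_id, path_name, path_class, gene_line = fields[:4]
    genes = []
    for entry in gene_line.split("|"):
        # Each entry has the form "entrez_id:gene_name".
        entrez_id, gene_name = entry.split(":", 1)
        genes.append((entrez_id, gene_name))
    return {"path_id": path_id, "path_name": path_name,
            "class": path_class, "genes": genes}

def gene_in_pathway(maf_name, maf_entrez, pathway):
    """Try a gene-name match first, then fall back to a non-zero Entrez ID."""
    if any(name == maf_name for _, name in pathway["genes"]):
        return True
    if maf_entrez in (None, "", "0"):
        return False  # an unknown Entrez ID can never produce a match
    return any(eid == maf_entrez for eid, _ in pathway["genes"])

example = ("hsa00061\tFatty acid biosynthesis\tLipid Metabolism\t"
           "31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH")
pathway = parse_pathway_line(example)
print(pathway["path_id"], len(pathway["genes"]))
print(gene_in_pathway("FASN", "0", pathway))      # name match
print(gene_in_pathway("FAS", "2194", pathway))    # Entrez ID fallback
```

Because the function only reports whether a gene belongs to the pathway, the order in which names and IDs are checked does not change its result; the order shown simply follows the paragraph above.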
- The bam-list is a file containing sample names and the normal/tumor BAM locations for each. Use the tab-delimited format [sample_name normal_bam tumor_bam], one sample per line. This tool only needs sample_name, so all other columns can be skipped. The sample_name must be the same as the tumor sample names used in the MAF file (16th column, with the header Tumor_Sample_Barcode).
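A minimal Python sketch of how sample names from the bam-list might be paired with Tumor_Sample_Barcode values is given below. The function names and sample values are hypothetical, and the substring fallback follows the note on MAF column 16 rather than any confirmed detail of the MuSiC implementation.

```python
def read_sample_names(lines):
    """Collect the first tab-delimited column (sample_name) of each non-empty line."""
    names = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        names.append(line.split("\t")[0])
    return names

def match_sample(barcode, sample_names):
    """Return the sample name equal to, or contained in, the given barcode."""
    for name in sample_names:
        if name == barcode:
            return name
    for name in sample_names:
        if name in barcode:  # the barcode contains the sample name as a substring
            return name
    return None

# Hypothetical bam-list content; the second line omits the BAM columns.
bam_list_lines = [
    "SAMPLE_01\t/path/normal_01.bam\t/path/tumor_01.bam\n",
    "SAMPLE_02\n",
]
sample_names = read_sample_names(bam_list_lines)
print(sample_names)
print(match_sample("TCGA-SAMPLE_02-T", sample_names))
```

Exact matches are checked before substring matches so that a barcode identical to one sample name is never assigned to a shorter name that happens to be contained in it.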
Michael Wendl, Ph.D.
This module uses reformatted copies of data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database:
* KEGG - http://www.genome.jp/kegg/