Just published in Genome Research: A new tool with a mellifluous name is now available for the analysis of large-scale cancer sequencing data sets. The Mutation Significance In Cancer (MuSiC) package is a suite of tools for identifying significant mutations, relationships, and correlations in a mutation dataset.
Mutation Analysis Tools
MuSiC Applications to Cancer Datasets
MuSiC includes a number of tools to help identify significant mutations and relationships in a cancer mutation dataset. These include:
| Tool | Description |
|---|---|
| SMG | Computation of background mutation rate in your dataset and identification of significantly mutated genes. |
| PathScan | Identification of significantly altered gene sets and/or pathways using KEGG or other databases. |
| Proximity | Search for mutations physically near one another at the DNA or protein level to identify mutation hotspots. |
| COSMIC-OMIM | Comparison of your mutations with those submitted to COSMIC and OMIM databases. |
| Mutation-Relation | Detection of co-occurring or mutually exclusive mutation relationships between genes. |
| Clinical Correlation | Correlation of gene mutation status with categorical or quantitative clinical data, such as tumor subtype. |

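
To make the Mutation-Relation idea concrete, here is a minimal sketch of how one might test a pair of genes for co-occurrence or mutual exclusivity across samples. It is a stand-in using a plain two-sided Fisher's exact test on a 2x2 table of sample counts, not necessarily the statistic MuSiC itself uses. The function names `fisher_exact_2x2` and `relation` are hypothetical and exist only for illustration.

```python
from math import comb


def fisher_exact_2x2(both, only_a, only_b, neither):
    """Two-sided Fisher's exact test on a 2x2 table of sample counts.

    both    = samples with gene A and gene B mutated
    only_a  = samples with only gene A mutated
    only_b  = samples with only gene B mutated
    neither = samples with neither gene mutated
    """
    a_mutated = both + only_a
    b_mutated = both + only_b
    n = both + only_a + only_b + neither

    def prob(k):
        # Hypergeometric probability that exactly k of the A-mutated
        # samples also carry a mutation in gene B.
        return comb(b_mutated, k) * comb(n - b_mutated, a_mutated - k) / comb(n, a_mutated)

    lowest = max(0, a_mutated + b_mutated - n)
    highest = min(a_mutated, b_mutated)
    p_observed = prob(both)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def relation(both, only_a, only_b, neither):
    """Label the direction of the association from the odds ratio."""
    if both * neither > only_a * only_b:
        return "co-occurring"
    if both * neither < only_a * only_b:
        return "mutually exclusive"
    return "no association"
```

In a real analysis you would run this for every gene pair of interest and then correct the resulting p-values for multiple testing.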
You might notice a logical progression of these tools. With a mutation dataset encompassing enough tumor samples, you can use these tools to:
- Determine the mutation rate for your tumor type
- Identify significantly mutated genes and pathways
- Look for recurrent events within your dataset or compared to COSMIC/OMIM
- Infer interactions from co-occurring or mutually exclusive mutations
- Highlight patterns of mutations in clinical strata
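The first two steps above can be sketched in a few lines. This is a deliberately simplified illustration, assuming a single background mutation rate (mutations per covered base) and a one-sided binomial test per gene. MuSiC's own SMG test is more sophisticated than this, and the function names and the numbers in the example are hypothetical.

```python
from math import exp, lgamma, log, log1p


def background_mutation_rate(total_mutations, total_covered_bases):
    """Mutations per covered base across all samples."""
    return total_mutations / total_covered_bases


def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space.

    Log space avoids overflow when n is in the millions.
    Assumes 0 < p < 1.
    """
    log_pmf = (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log1p(-p)
    )
    return exp(log_pmf)


def binomial_upper_tail(k, n, p):
    """P(X >= k): chance of seeing k or more mutations by background alone.

    Computed as 1 - P(X <= k - 1), so very small p-values (below roughly
    1e-15) lose precision. Fine for a sketch, not for production use.
    """
    lower_tail = sum(binomial_pmf(i, n, p) for i in range(k))
    return max(0.0, 1.0 - lower_tail)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    bmr = background_mutation_rate(30_000, 15_000_000_000)
    # A gene with 1,000,000 covered bases summed over all samples
    # and 12 observed mutations.
    print(binomial_upper_tail(12, 1_000_000, bmr))
```

A gene whose p-value survives multiple-testing correction would be a candidate significantly mutated gene. The covered-base counts are the reason the BAM files and target regions are needed as inputs.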
To use MuSiC’s complete set of features, you’ll need three basic inputs:
- A list of somatic mutations in TCGA’s MAF format (VCF support is planned). The better these calls, the more accurate your results will be. If you need help, look into our somatic mutation callers VarScan and SomaticSniper.
- A list of BAM files for tumor samples and their matched normals. These will be used to compute sequence coverage of gene exons.
- A set of target regions for the genes in your MAF file above. These should generally be the coordinates of the exons for the genes used to annotate your mutations.
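If you have never worked with a MAF file, it is just a tab-delimited table with one row per mutation. As a quick sanity check before running anything heavier, a short script like the following can count how many distinct samples carry a mutation in each gene. It is a minimal sketch that relies only on the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns. The function name `samples_mutated_per_gene` and the file path in the usage comment are hypothetical.

```python
import csv
from collections import defaultdict


def samples_mutated_per_gene(maf_lines):
    """Count distinct mutated samples per gene from MAF-formatted lines.

    maf_lines is any iterable of text lines, such as an open file handle.
    Lines starting with '#' (for example a version header) are skipped.
    """
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    samples_by_gene = defaultdict(set)
    for row in reader:
        # A set per gene means several mutations in one sample count once.
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}


# Usage, with a hypothetical file path:
# with open("my_mutations.maf", newline="") as handle:
#     counts = samples_mutated_per_gene(handle)
```

Counting samples rather than raw rows keeps a single hypermutated tumor from inflating a gene's apparent recurrence.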
I have personally used the MuSiC package to analyze large-scale cancer datasets, most recently the TCGA breast cancer dataset (>500 tumors) whose publication was just accepted by Nature. In my admittedly biased opinion, MuSiC has a nice modular design (you can run a single tool or all of them) and is user-friendly. If you’re analyzing a large-scale dataset, you can and should take advantage of parallelization across a compute cluster to speed up your analysis.
Even with >30,000 mutations from over 500 tumors, I was able to run the complete MuSiC package in less than a week. If you’re working with cancer datasets, this tool is worth a look.
Nathan D. Dees, Qunyuan Zhang, Cyriac Kandoth, Michael C. Wendl, William Schierding, Daniel C. Koboldt, Thomas B. Mooney, Matthew B. Callaway, David Dooling, Elaine R. Mardis, Richard K. Wilson, and Li Ding (2012). MuSiC: Identifying mutational significance in cancer genomes. Genome Research. DOI: 10.1101/gr.134635.111