Copy number aberrations (CNAs) are among the most prevalent genetic alterations in cancer cells. There is considerable interest in finding CNAs that affect the same chromosomal region in multiple tumor samples. Such recurrent CNAs (RCNAs) often point to key cancer genes; on chromosome 7, for example, the region containing the EGFR gene is frequently amplified.
Most approaches to RCNA identification work in two steps: first, call CNAs in each individual sample; second, perform cross-sample analysis to look for recurrence. Unfortunately, with large numbers of samples and increasingly dense genomic data, this two-step strategy carries a significant computational burden.
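To make the two-step idea concrete, here is a minimal Python sketch. It is a deliberate simplification, not any published caller: real tools segment each sample before calling CNAs, whereas this toy version applies fixed log-ratio cutoffs. The function name `two_step_recurrence` and the `gain` and `loss` thresholds are hypothetical, chosen only for illustration.

```python
import numpy as np

def two_step_recurrence(log_ratios, gain=0.3, loss=-0.3):
    """Toy two-step RCNA scan on a (samples x sites) array of log ratios.

    The gain/loss cutoffs are illustrative assumptions, not values
    taken from any published method.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)

    # Step 1: call CNAs in each sample independently (+1 gain, -1 loss, 0 neutral).
    calls = np.zeros(log_ratios.shape, dtype=int)
    calls[log_ratios >= gain] = 1
    calls[log_ratios <= loss] = -1

    # Step 2: cross-sample analysis, here just the fraction of samples
    # that carry a gain or a loss at each site.
    gain_freq = (calls == 1).mean(axis=0)
    loss_freq = (calls == -1).mean(axis=0)
    return gain_freq, loss_freq
```

Even in this stripped-down form, every sample has to be processed on its own before any recurrence can be measured, and that per-sample work is the part that grows expensive with dense arrays.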
Enter the Matrix: Correlation Matrix Diagonal Segmentation
Now online at Bioinformatics Early Access is a paper describing CMDS, a population-based method for detecting RCNA in cancer that was developed here at Washington University by Qunyuan Zhang and his colleagues.
CMDS uses raw intensity ratio data (from SNP arrays, CGH, etc.) and adopts a diagonal transformation strategy to identify RCNAs via between-chromosomal-site correlation. Not only does this reduce the computational burden of RCNA identification, but it increases the detection power as well.
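A rough way to picture the between-site correlation idea is the Python sketch below. It is my own illustration under stated assumptions, not the published CMDS code: for each site it averages the Pearson correlation (across samples) with its neighbors inside a small window, so only a band along the diagonal of the site-by-site correlation matrix is ever computed. The function name `diagonal_correlation_score`, the `block` window size, and the plain averaging are all hypothetical choices.

```python
import numpy as np

def diagonal_correlation_score(ratios, block=5):
    """Per-site mean correlation with neighboring sites.

    ratios: (samples x sites) array of raw intensity ratios.
    block:  number of neighbors considered on each side (an assumption).
    Assumes at least two sites.
    """
    ratios = np.asarray(ratios, dtype=float)
    n_samples, n_sites = ratios.shape

    # Standardize each site across samples so that dot products
    # divided by n_samples are Pearson correlations.
    centered = ratios - ratios.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # constant sites get zero correlation
    z = centered / std

    scores = np.zeros(n_sites)
    for i in range(n_sites):
        lo = max(0, i - block)
        hi = min(n_sites, i + block + 1)
        # Correlation of site i with every site in its window.
        corr = z[:, lo:hi].T @ z[:, i] / n_samples
        # Drop the self-correlation (always 1) before averaging.
        scores[i] = np.delete(corr, i - lo).mean()
    return scores
```

Sites inside a region that is gained or lost in many samples rise and fall together, so their scores sit well above the near-zero background. Because no per-sample calling is needed, the cost scales with the number of sites times the window size rather than with a full site-by-site matrix.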
Done in 13 Seconds
CMDS is fast, too. Qunyuan compared its execution time to that of AWS-STAC, SBS-STAC, and pREC-A on a dataset comprising 10,000 sites in 100 samples. The R version of CMDS finished in 13 seconds. The other algorithms took more than 300 times longer on the same dataset, a substantial performance gain for CMDS. There’s also a C version of CMDS that runs even faster.
Application to Real Data: Lung Cancer and Glioblastoma
To evaluate CMDS on real data, Qunyuan applied it to lung adenocarcinoma and glioblastoma (brain cancer) datasets that were generated as part of the Tumor Sequencing Project (TSP) and the Cancer Genome Atlas (TCGA), respectively. CMDS called 39 significant RCNA regions in lung cancer and 37 in brain cancer. All of the significant regions had been previously reported/validated; they included or were proximal to a number of well-known cancer genes including EGFR, CCND1, KRAS, MDM2, PDGFRA, and others.
When the two datasets were combined, a few key RCNA regions emerged that were shared by both cancers: amplification of EGFR, CDK4, and MDM2, and deletion of CDKN2A. This, to me, demonstrates one of the most powerful aspects of CMDS: its population-based approach can compare not only samples of the same cancer type, but also pools of samples across cancer types. It makes a great addition to our arsenal of cancer genomics tools at Washington University.
CMDS is implemented in both R and C; the programs are available from Qunyuan’s web site.
References
Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, Borecki IB, & Province MA (2009). CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics (Oxford, England) PMID: 20031968
My thanks to GenomeWeb, who featured this post in their Daily Scan.