Later this month, I’ll present our work on detecting somatic mutations using capture and Illumina sequencing at the Advances in Genome Biology and Technology meeting on Marco Island. Using an internally developed solution-phase capture technology (Washington University Capture, or WUCap), we selectively targeted coding regions of 6,000 genes in tumors and matched controls from 94 patients with ovarian cancer and sequenced them on the Illumina GAIIx.
Capture Somatic Mutation Detection Pipeline
My group developed a high-throughput, automated pipeline that identifies mutations and determines their somatic status (Germline, Somatic, or LOH) in large-scale capture datasets, using this one as our test case. Given BAM files for a tumor sample and its matched control, our pipeline does the following:
- Identifies variants (SNPs and indels) in each of the matched samples.
- Determines somatic status for each variant using probabilistic (glfSomatic) or statistical (VarScan) methods.
- Generates a list of putative somatic mutations.
- Removes known germline variants using dbSNP, the 1000 Genomes Project, and other sources.
- Annotates the filtered variants with gene structure and conservation information.
- Divides annotated variants into tiers according to predicted function class.
- Segregates the variants in each tier into high, moderate, and low confidence groups according to their supporting evidence.
The above is a simplified representation. In fact, the pipeline control module itself contains 28 sub-processes, and that number is still growing.
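To make the somatic status step more concrete, here is a minimal Python sketch of how a variant might be labeled Germline, Somatic, or LOH from read counts in the tumor and its matched control. It is an illustration only, not the code of glfSomatic, VarScan, or our pipeline: the function names, the variant-frequency cutoffs (0.08 and 0.75), and the significance threshold (0.05) are all assumptions chosen for the example. It calls a crude genotype in each sample, then uses a two-sided Fisher's exact test on the read counts to decide whether a difference between the two samples is likely to be real.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def call_genotype(ref_reads, var_reads, min_var_freq=0.08, hom_freq=0.75):
    """Crude genotype from read counts; both cutoffs are hypothetical."""
    depth = ref_reads + var_reads
    if depth == 0:
        return None  # no coverage, no call
    freq = var_reads / depth
    if freq < min_var_freq:
        return "ref"
    return "hom" if freq >= hom_freq else "het"


def somatic_status(normal_ref, normal_var, tumor_ref, tumor_var, alpha=0.05):
    """Label a site by comparing the tumor to its matched control."""
    normal_gt = call_genotype(normal_ref, normal_var)
    tumor_gt = call_genotype(tumor_ref, tumor_var)
    if normal_gt is None or tumor_gt is None:
        return "Unknown"
    if normal_gt == tumor_gt:
        return "Reference" if normal_gt == "ref" else "Germline"

    # Genotypes differ: ask whether the read counts differ significantly
    p_value = fisher_exact_two_sided(normal_ref, normal_var, tumor_ref, tumor_var)
    if p_value >= alpha:
        return "Germline"  # difference not significant; treat as shared variant
    if normal_gt == "ref":
        return "Somatic"   # variant present only in the tumor
    if normal_gt == "het":
        return "LOH"       # heterozygous control, homozygous or reference tumor
    return "Unknown"


if __name__ == "__main__":
    print(somatic_status(30, 0, 20, 10))   # control is reference, tumor carries the variant
    print(somatic_status(15, 15, 14, 16))  # heterozygous in both samples
    print(somatic_status(15, 15, 2, 28))   # heterozygous control, near-homozygous tumor
```

A real implementation would also have to account for base and mapping quality, strand bias, tumor purity, and normal contamination, which is part of why the actual pipeline has so many sub-processes.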
Application to TCGA-Ovarian Capture Data
When we applied our pipeline to TCGA Ovarian data, we predicted thousands of putative somatic mutations across the 94 patients. Manual review, additional filters, and validation efforts whittled that list down to just over 1,000 validated somatic mutations to date.
Our collaborators at the Broad Institute and Baylor College of Medicine are also sequencing TCGA Ovarian samples using their own capture methods. All three centers have exchanged datasets a couple of times now. We’ve applied our capture somatic variant detection pipeline to data from both other centers with promising results. I’m not sure if I’ll be able to show any of their data in my poster, but the results suggest that our approach is applicable to other capture methods and sequencing platforms.
For more, you’ll have to find my poster at Marco Island.
Wish I was going to Marco Island. Will you make your poster available digitally after the conference?
How much input DNA does the WUCap protocol require?
Keith, sure, I’ll make my poster available to you either here or on the VarScan web site.
It’s my understanding that our current WUCap protocol requires 1 microgram of input DNA. However, our Tech D group has been working on low-input library protocols for capture. [Correction] You can contact Vince Magrini (vmagrini (at) watson (dot) wustl (dot) edu) from that group.