AGBT: Focus on Cancer Genomics

February 26, 2010 by Dan Koboldt

As usual, the quality of the scientific presentations at this meeting has been outstanding. The weather, too, has improved at last.

There are too many talks to cover (or even attend) completely, but one area with a strong focus this year is cancer genomics. Yesterday during the plenary sessions, Stacey Gabriel of the Broad Institute of MIT and Harvard presented the sequencing of multiple myeloma (MM), a liquid tumor affecting 50,000 people in the US. Around 5,200 gigabases of sequence were generated across 26 tumor samples and matched controls, yielding ~30x average depth per genome. Their mutation detection pipeline achieved an admirable validation rate for somatic SNVs (95%). Short indels were more challenging (~50% validated), and candidate rearrangements even more so (30-50% validated). Even so, their study validated ~40 somatic mutations per tumor, implicating known MM genes (NRAS, KRAS, TP53) as well as novel ones (DIS3, FAM46C).
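As a rough sanity check on the depth figure, the arithmetic can be sketched in a few lines of Python. Two numbers here are assumptions rather than anything stated in the talk: that each tumor had exactly one matched control (52 genomes in total) and that the haploid human genome is about 3.1 Gb.

```python
# Back-of-the-envelope depth check for the myeloma dataset.
TOTAL_GIGABASES = 5200        # from the talk: ~5,200 Gb in total
TUMOR_SAMPLES = 26            # from the talk
GENOMES = TUMOR_SAMPLES * 2   # assumption: one matched control per tumor
GENOME_SIZE_GB = 3.1          # assumption: approximate haploid genome size

gb_per_genome = TOTAL_GIGABASES / GENOMES
average_depth = gb_per_genome / GENOME_SIZE_GB

print(f"{gb_per_genome:.0f} Gb per genome, ~{average_depth:.0f}x raw depth")
```

Under those assumptions the raw figure comes out near 32x, which is in line with the ~30x average reported.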

Elliott Margulies on Melanoma

Last night, there was a concurrent session devoted to cancer genomics. Elliott Margulies (NIH/NHGRI) led the lineup with his work sequencing the tumor genome and matched normal of a melanoma patient. Using the Illumina platform (2×100 bp), his group achieved 36x and 43x haploid coverage for tumor and normal, respectively, with ~99% of the genome covered by at least one read. Much of the talk was devoted to their analysis pipeline, summarized as:

Initial alignment of Illumina reads with ELAND
Partitioning the reads into “genome” bins of several kilobases
Local realignment with cross_match in highly parallelized fashion
SNV calling with their “Most Probable Genotype” (MPG) method
Removal of variants with any evidence in the germline (normal) sample, or already present in dbSNP
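Two of the steps above, the binning and the final filtering, are simple enough to sketch in Python. This is only an illustration and not the group's actual code: the function names, the tuple layouts, the in-memory inputs, and the 5 kb bin size (the talk said only "several kilobases") are all assumptions.

```python
from collections import defaultdict


def partition_reads(reads, bin_size=5000):
    """Group aligned reads into fixed-width genome bins.

    `reads` is an iterable of (chrom, start, read_id) tuples. The 5 kb
    default bin size is an assumption, not a value from the talk.
    """
    bins = defaultdict(list)
    for chrom, start, read_id in reads:
        bins[(chrom, start // bin_size)].append(read_id)
    return dict(bins)


def filter_tumor_specific(tumor_snvs, normal_alt_reads, dbsnp):
    """Keep SNVs with no supporting reads in the normal and no dbSNP entry.

    SNVs are (chrom, pos, ref, alt) tuples. `normal_alt_reads` maps an SNV
    to the number of normal-sample reads carrying the alt allele, and
    `dbsnp` is a set of known SNVs. Both are hypothetical in-memory inputs.
    """
    return [
        snv for snv in tumor_snvs
        if normal_alt_reads.get(snv, 0) == 0 and snv not in dbsnp
    ]


# Toy data: three reads and three candidate SNVs.
reads = [("chr1", 1200, "r1"), ("chr1", 4800, "r2"), ("chr1", 5100, "r3")]
bins = partition_reads(reads)

tumor_snvs = [
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),   # one supporting read in the normal
    ("chr2", 50, "C", "T"),    # already in dbSNP
]
normal_alt_reads = {("chr1", 200, "G", "C"): 1}
dbsnp = {("chr2", 50, "C", "T")}
novel = filter_tumor_specific(tumor_snvs, normal_alt_reads, dbsnp)

print(bins)
print(novel)
```

Binning by integer division keeps each bin independent, which is what makes the local realignment step easy to run in parallel. The filter is deliberately strict: a single supporting read in the normal is enough to discard a candidate.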

The 175,768 novel tumor-specific SNVs were classified as coding (807) or noncoding (174,961). Some 513 of the 807 coding variants were nonsynonymous. Of these, 101 were selected for validation; 84 yielded validation results, and 75 of those (89%) were confirmed as somatic coding mutations. Unsurprisingly, Dr. Margulies used his group’s expertise in comparative genomics to closely examine the noncoding variants as well. His group recently annotated “Chai” regions of the human genome, which bear evidence of evolutionary constraint that suggests functional relevance. Some 10,285 of the 174,961 fell within Chai regions, and among them were ~2,000 variants predicted to dramatically alter the local structure of DNA (suggesting regulatory changes).
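The proportions behind those counts are easy to lose track of, so here is a short Python check. It uses only the numbers quoted above; the variable names are mine.

```python
# Counts as reported in the talk.
coding, noncoding = 807, 174_961
nonsynonymous = 513
selected, with_results, confirmed = 101, 84, 75
in_chai_regions = 10_285

# The coding and noncoding counts should add up to the reported total.
total_snvs = coding + noncoding

# The 89% figure uses the 84 assays that returned a result as the
# denominator, not the 101 variants originally selected.
validation_rate = confirmed / with_results
nonsyn_fraction = nonsynonymous / coding
chai_fraction = in_chai_regions / noncoding

print(f"total SNVs: {total_snvs:,}")
print(f"validated: {validation_rate:.1%}")
print(f"nonsynonymous among coding: {nonsyn_fraction:.1%}")
print(f"noncoding in Chai regions: {chai_fraction:.1%}")
```

So roughly two thirds of the coding variants change an amino acid, and about 6% of the noncoding variants land in constrained sequence.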

Sequencing Pre- and Post-Treatment Lung Cancer

Ian Bosdet of BC Cancer Agency presented some very interesting work on mutational profiling of pre- and post-treatment lung cancer tumors. His group had the opportunity to participate in a clinical trial at BCCA in which carefully selected, treatment-naive NSCLC patients underwent a standard therapeutic program. First, each patient underwent a pre-treatment evaluation and biopsy. Next, they received erlotinib (an EGFR inhibitor) until the disease inevitably progressed. Then another biopsy was taken and sent for pathology review, as well as DNA/RNA extraction for sequencing. Transcriptome sequencing yielded some interesting findings. For example, one gene (IER5L or IER5C, it’s hard to read my own handwriting) was highly expressed in smokers who did not respond to treatment. A screen of unmapped transcript reads against viral genomes revealed the presence of Epstein-Barr Virus transcripts in one tumor that was later re-classified as EBV-positive lymphadenocarcinoma (?).

Mutational profiling for three patients was obtained via exome capture (Agilent) and sequencing of normal, pre-treatment tumor, and post-treatment tumor samples. Somatic mutations in PHACTR2 were seen only in pre-treatment samples. Mutations in a few genes (PRMT10, RanBP2) were found at both time points, but a few (YY1AP1, SNX9) were only present after treatment, suggesting a role for these genes in progressive disease.

AGBT 2010: First Impressions

February 25, 2010 by Dan Koboldt

Only in Florida: Jellyfish Aquarium

I’m in the midst of my first full day at Marco Island. More than any other meeting that I’ve attended, AGBT has a remarkable corporate presence. Life Technologies seems to be the biggest sponsor; you can’t look anywhere without seeing a banner that promotes the new SOLiD4 system. Apparently I’m doing a poor job at keeping up with SOLiD, as I’d only just heard about SOLiD3. I spoke to Richard Gibbs at a coffee break, and he mentioned that SOLiD4 is an upgrade, not a new machine. Must be nice.

Caliper Life Sciences, a maker of microfluidics equipment for next-generation sequencing, won favor with many attendees by hanging chocolate “chips” (mini bars) on the doorknob of every AGBT attendee’s hotel room to promote their recently launched LabChip XT. I learned of this company only a week or so ago, when my colleague Vince Magrini was named to their scientific advisory board.

PacBio Instrument Unveiled

Pacific Biosciences unveiled their coveted SMRT sequencing instrument last night in a small, invitation-only event in their suite. Sadly, I wasn’t invited, but I’m told the guest list was very exclusive. Most likely it was restricted to directors from the ten initial PacBio customers that were announced last week. Tonight, PacBio hosts a roundtable called Global Challenges, Genomic Solutions that will be moderated by Charlie Rose.

Other Players in the Field

This morning at breakfast, Agilent Technologies was trading SureSelect T-shirts for surveys that assessed respondents’ interest in exome capture, which (thus far) seems to be the recurrent hot topic at AGBT. Things have been quiet from some of the other large sponsors, including Illumina, Complete Genomics, Roche, and others. I’m sure that their hour of glory will come soon enough.

Marco Island Meeting Preview

February 22, 2010 by Dan Koboldt

The Advances in Genome Biology and Technology (AGBT) meeting begins this week at Marco Island. I’ll be there to present a poster on our somatic mutation detection pipeline, and also to learn about what’s to come in next-generation and next-next-generation sequencing.

Some of the companies are already ramping up. Last week PacBio announced the initial members of their partnership program to provide complete solutions for single molecule real-time sequencing. Microfluidics company Caliper Life Sciences formed a scientific advisory board for next-gen sequencing that included WashU’s own Vince Magrini. Other companies – Illumina, Complete Genomics, and RainDance Technologies, for example – are hosting workshops or other events at AGBT.

AGBT Sessions Not To Miss

Day 1 of the meeting will be very strong, with opening remarks from Len Pennacchio (JGI), Kelly Frazer (UCSD) on genomic enrichment, Mike Snyder (Stanford) on paired-ends for SVs/assembly, and Barbara Wold on ChIP-Seq. On Day 2, Stacey Gabriel of the Broad Institute will discuss applications of new sequencing technology to medical and cancer genetics. Carlos Bustamante of Stanford will present the complete genome sequencing and analysis of African-American and Mexican-American individuals. WashU’s David Wang will give a talk on metagenomic approaches to pathogen discovery.

Some friends of mine are giving talks later that evening. Jeff Reid (Baylor College of Medicine) has what looks to be a very interesting talk on miRNA precursor variants in schizophrenia. Daniel MacArthur, of Sanger and Genetic Future fame, will present “Loss-of-Function Mutations in Healthy Human Genomes,” likely based on his work with the 1000 Genomes Project.

Cancer Genomics and Sequencing

I’m very excited about an entire session devoted to cancer genomics. Elliott Margulies (NHGRI) will discuss the sequencing and analysis of a melanoma genome. In what may be the first application of single-molecule sequencing to cancer, the sequencing of Ewing’s Sarcoma on a HeliScope instrument will be presented by Timothy Triche of Childrens Hospital Los Angeles. Two speakers from BC Cancer Agency will discuss rearrangements in follicular lymphoma and capture/transcriptome sequencing in lung cancer.

Whole Genome Sequencing

There are to be big-picture sequencing talks as well. Genome center co-director Elaine Mardis will present “Single Molecule Sequencing to Detect and Characterize Somatic Mutations in Cancer Genomes.” Stan Nelson of UCLA will give a talk, presumably on his group’s recent publication – whole genome sequencing of a glioblastoma cell line on ABI SOLiD.

I’ll be there, and posting regular updates, as the latest and greatest in sequencing technologies unfolds at Marco Island.

Mutation Detection in Capture at AGBT

February 11, 2010 by Dan Koboldt

Later this month, I’ll present our work on detecting somatic mutations using capture and Illumina sequencing at the Advances in Genome Biology and Technology meeting on Marco Island. Using an internally developed solution-phase capture technology (Washington University Capture, or WUCap), we selectively targeted coding regions of 6,000 genes in tumors and matched controls from 94 patients with ovarian cancer and sequenced them on the Illumina GAIIx.

Capture Somatic Mutation Detection Pipeline

My group developed a high-throughput, automated pipeline that identifies mutations and determines their somatic status (Germline, Somatic, or LOH) in large-scale capture datasets, using this one as our test case. Given BAM files for a tumor sample and its matched control, our pipeline does the following:

Identifies variants (SNPs and indels) in each of the matched samples.
Determines somatic status for each variant using probabilistic (glfSomatic) or statistical (VarScan) methods.
Generates a list of putative somatic mutations.
Removes known germline variants using dbSNP, the 1000 Genomes Project, and other sources.
Annotates the filtered variants with gene structure and conservation information.
Divides annotated variants into tiers according to predicted function class.
Segregates the variants in each tier into high, moderate, and low confidence groups according to their supporting evidence.

The above is a simplified representation. In fact, the pipeline control module itself contains 28 sub-processes, and that number is still growing.
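To make the somatic-status step concrete, here is a minimal Python sketch that assigns Germline, Somatic, or LOH from tumor and normal read counts using a two-sided Fisher's exact test. It is a simplified stand-in, not the actual glfSomatic or VarScan logic: the function names, the 10% minimum variant frequency, the 75% heterozygosity cutoff, and the 0.05 significance threshold are all assumptions chosen for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def somatic_status(normal_ref, normal_var, tumor_ref, tumor_var,
                   min_var_freq=0.10, het_max_freq=0.75, alpha=0.05):
    """Classify one variant from its read counts (thresholds are assumptions)."""
    normal_depth = normal_ref + normal_var
    tumor_depth = tumor_ref + tumor_var
    if normal_depth == 0 or tumor_depth == 0:
        return "Unknown"

    normal_freq = normal_var / normal_depth
    tumor_freq = tumor_var / tumor_depth
    in_normal = normal_freq >= min_var_freq
    in_tumor = tumor_freq >= min_var_freq
    if not in_normal and not in_tumor:
        return "Reference"

    p_value = fisher_exact_two_sided(normal_ref, normal_var, tumor_ref, tumor_var)
    if p_value >= alpha:
        # No significant tumor/normal difference: treat as shared (germline).
        return "Germline"
    if in_tumor and not in_normal:
        return "Somatic"
    if in_normal and normal_freq < het_max_freq and (
            tumor_freq >= het_max_freq or not in_tumor):
        # Heterozygous in the normal, one allele lost in the tumor.
        return "LOH"
    return "Unknown"


print(somatic_status(30, 0, 20, 15))    # variant seen only in the tumor
print(somatic_status(15, 15, 16, 14))   # similar frequency in both samples
print(somatic_status(15, 15, 1, 29))    # heterozygous normal, near-homozygous tumor
```

The three calls print Somatic, Germline, and LOH respectively. A real pipeline would add filters for base quality, strand bias, and minimum depth before trusting any of these calls, which is part of why the sub-process count keeps growing.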

Application to TCGA-Ovarian Capture Data

When we applied our pipeline to TCGA Ovarian data, we predicted thousands of putative somatic mutations across the 94 patients. Manual review, additional filters, and validation efforts whittled that list down to just over 1,000 validated somatic mutations to date.

Our collaborators at the Broad Institute and Baylor College of Medicine are also sequencing TCGA Ovarian samples using their own capture methods. All three centers have exchanged datasets a couple of times now. We’ve applied our capture somatic variant detection pipeline to data from both other centers with promising results. I’m not sure if I’ll be able to show any of their data in my poster, but the results suggest that our approach is applicable to other capture methods and sequencing platforms.

For more, you’ll have to find my poster at Marco Island.
