Why We Sequence Cancer Genomes

April 2, 2010 by Dan Koboldt

A recent article on GenomeWeb profiled the XGen Congress meeting in San Diego, where researchers debated whether sequencing cancer genomes has clinical relevance. In a roundtable discussion, University of Washington’s Larry Loeb argued that cancer is too heterogeneous for sequencing to uncover the therapeutically relevant mutations. As an example, he pointed to AML1 – the first cancer genome, which was sequenced here – whose 8 somatic mutations were not found in a screen of 187 other AMLs.

Recurrent Mutations in AML2

I wasn’t at the roundtable discussion, so I don’t know if anyone mentioned AML2 – the second cancer genome which was published in the New England Journal of Medicine – in which four of 64 somatic mutations were recurrent in other AMLs. Two of the recurrent mutations (NRAS and NPM1) were previously known. The other two were novel, and particularly interesting: one implicated IDH1 (a gene mutated in brain cancer) for the first time in AML, and the other was a noncoding (i.e. not-in-a-gene) mutation in an evolutionarily conserved region. The latter discovery reinforces the idea that “exome” sequencing – which targets the coding regions of known genes – can miss important variation, but that’s a debate for another day.

IDH1 and Clinical Relevance

I mentioned IDH1, which was first implicated in glioblastoma in a study led by Bert Vogelstein’s group at Johns Hopkins. In a Cancer Cell publication this January, members of the Cancer Genome Atlas (TCGA) research consortium found that abnormalities in IDH1 and three other genes (PDGFRA, EGFR, and NF1) helped distinguish clinically relevant subtypes in glioblastoma. Mutations in IDH1 have since been identified in other cancer types, including acute myelogenous leukemia (AML).

IDH1 encodes isocitrate dehydrogenase 1, a cytosolic enzyme that converts isocitrate to α-ketoglutarate. The discovery of somatic mutations in IDH1 in human cancers has put a spotlight on this gene and its mitochondrial homolog, IDH2. Recent studies have shown that mutations in IDH1 and IDH2 not only disrupt the enzymes’ normal activity, but create a new one: the reduction of α-ketoglutarate to 2-hydroxyglutarate. Tumors with IDH1/IDH2 mutations show elevated levels of 2-hydroxyglutarate, suggesting that further studies of this “oncometabolite” may shed new light on cancer development and progression.

If that’s not clinically relevant, I don’t know what is.

References
Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, Getz G, Perou CM, Hayes DN, & Cancer Genome Atlas Research Network (2010). Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer cell, 17 (1), 98-110 PMID: 20129251

AGBT: Focus on Cancer Genomics

February 26, 2010 by Dan Koboldt

As usual, the quality of the scientific presentations at this meeting has been outstanding. The weather, too, has improved at last:

[Photo: p_00014]

There are too many talks to cover (or even attend) completely, but one area of interest with a strong focus this year is cancer genomics. Yesterday during plenary sessions, Stacey Gabriel of the Broad Institute of MIT and Harvard presented sequencing of multiple myeloma, a liquid tumor affecting 50,000 people in the US. Around 5,200 gigabases of sequence was generated across 26 tumor samples and matched controls, yielding ~30x average depth per genome. Their mutation detection pipeline achieved an admirable validation rate for somatic SNVs (95%). Short indels were more challenging (~50% validated), and candidate rearrangements even more so (30-50% validated). However, their study validated ~40 somatic mutations per tumor, implicating known MM genes (NRAS, KRAS, TP53) as well as novel ones (DIS3, FAM46C).
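As a quick sanity check on those depth figures, here is a back-of-the-envelope sketch. It assumes one matched control per tumor (52 genomes in total) and a haploid human genome of roughly 3.1 gigabases; neither number was stated in the talk, so treat both as my assumptions.

```python
# Rough average-depth estimate from total sequence yield.
# Assumptions (not from the talk): 26 tumors + 26 matched controls = 52 genomes,
# and a haploid human genome size of about 3.1 gigabases.

def average_depth(total_gigabases, n_genomes, genome_size_gb=3.1):
    """Return the average fold coverage per genome."""
    per_genome_gb = total_gigabases / n_genomes
    return per_genome_gb / genome_size_gb

depth = average_depth(5200, 26 * 2)
print(f"~{depth:.0f}x average depth per genome")
```

Under those assumptions the estimate lands in the low 30s, which is in line with the reported ~30x.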

Elliott Margulies on Melanoma

Last night, there was a concurrent session devoted to cancer genomics. Elliott Margulies (NIH/NHGRI) led the lineup with his work sequencing the tumor genome and matched normal of a melanoma patient. Using the Illumina platform (2×100 bp), his group achieved 36x and 43x haploid coverage for tumor and normal, respectively, with ~99% of the genome covered by at least one read. Much of the talk was devoted to their analysis pipeline, summarized as:

Initial alignment of Illumina reads with ELAND
Partitioning the reads into “genome” bins of several kilobases
Local realignment with cross_match in highly parallelized fashion
SNV calling with their “Most Probable Genotype” (MPG) method
Removal of variants with any evidence in the germline, or ones in dbSNP
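The final filtering step in the list above can be sketched roughly as follows. This is only an illustration of the idea, not the group's actual code: the `Variant` type, the `filter_somatic` function, and the toy coordinates are all hypothetical, and a real pipeline would look at read-level evidence in the normal rather than simple set membership.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    """A single-nucleotide variant; a hypothetical minimal representation."""
    chrom: str
    pos: int
    ref: str
    alt: str

def filter_somatic(tumor_calls, germline_evidence, dbsnp):
    """Keep tumor calls that have no germline evidence and are not in dbSNP."""
    return [v for v in tumor_calls
            if v not in germline_evidence and v not in dbsnp]

# Toy data with made-up positions, for illustration only.
tumor_calls = [
    Variant("chr1", 1000, "C", "T"),
    Variant("chr2", 2000, "G", "A"),
    Variant("chr3", 3000, "G", "T"),
]
germline_evidence = {Variant("chr2", 2000, "G", "A")}  # also seen in the normal
dbsnp = {Variant("chr3", 3000, "G", "T")}              # known polymorphism

somatic = filter_somatic(tumor_calls, germline_evidence, dbsnp)
print(somatic)
```

Using sets for the germline and dbSNP lookups keeps each membership test cheap, which matters when the candidate list runs to hundreds of thousands of calls.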

The 175,768 novel tumor-specific SNVs were classified as coding (807) or noncoding (174,961). Some 513 of 807 coding variants were nonsynonymous. Of these, 101 were selected for validation; 84 yielded validation results and 75 somatic coding mutations (89%) were confirmed. Unsurprisingly, Dr. Margulies used his group’s expertise in comparative genomics to closely examine the noncoding variants as well. His group recently annotated “Chai” regions of the human genome, which bear evidence of evolutionary constraint that suggests functional relevance. Some 10,285 of the 174,961 noncoding variants fell within Chai regions, and among them were ~2,000 variants predicted to dramatically alter the local structure of DNA (suggesting regulatory changes).
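For anyone who wants to check the arithmetic, the counts above reduce to a few percentages. The sketch below uses only the numbers quoted in this post; the helper name `pct` is mine.

```python
def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

coding, noncoding = 807, 174_961
total_snvs = coding + noncoding              # should match the 175,768 reported

nonsyn_rate = pct(513, coding)               # share of coding SNVs that are nonsynonymous
validation_rate = pct(75, 84)                # confirmed / assays with a result
chai_rate = pct(10_285, noncoding)           # noncoding SNVs inside Chai regions

print(total_snvs)
print(f"nonsynonymous: {nonsyn_rate:.1f}% of coding SNVs")
print(f"validated: {validation_rate:.1f}% of assayed mutations")
print(f"in Chai regions: {chai_rate:.1f}% of noncoding SNVs")
```

Note that the validation rate is computed over the 84 assays that returned a result, not the 101 originally selected.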

Sequencing Pre- and Post-Treatment Lung Cancer

Ian Bosdet of BC Cancer Agency presented some very interesting work on mutational profiling of pre- and post-treatment lung cancer tumors. His group had the opportunity to participate in a clinical trial at BCCA in which carefully selected, treatment-naive NSCLC patients underwent a standard therapeutic program. First, each patient underwent a pre-treatment evaluation and biopsy. Next, they received erlotinib (an EGFR inhibitor) until the disease inevitably progressed. Then, another biopsy was taken and sent for pathology review, as well as DNA/RNA extraction for sequencing. Transcriptome sequencing yielded some interesting findings. For example, one gene (IER5L or IER5C, it’s hard to read my own handwriting) was highly expressed in smokers who did not respond to treatment. A screen of unmapped transcript reads against viral genomes revealed the presence of Epstein-Barr Virus transcripts in one tumor that was later re-classified as EBV-positive lymphadenocarcinoma (?).

Mutational profiling for three patients was obtained via exome capture (Agilent) and sequencing of normal, pre-treatment tumor, and post-treatment tumor samples. Somatic mutations in PHACTR2 were seen only in pre-treatment samples. Mutations in a few genes (PRMT10, RanBP2) were found at both times, but a few (YY1AP1, SNX9) were only present after treatment, suggesting a role for these genes in progressive disease.
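The pre- versus post-treatment comparison boils down to set operations on the mutated genes at each time point. Here is a minimal sketch using only the genes named above; pooling them into one pre-treatment set and one post-treatment set is my simplification (the talk covered three patients), and `compare_timepoints` is a hypothetical helper.

```python
def compare_timepoints(pre, post):
    """Split mutated genes into pre-only, shared, and post-only groups."""
    return {
        "pre_only": pre - post,    # lost (or no longer detected) after treatment
        "shared": pre & post,      # present at both time points
        "post_only": post - pre,   # seen only after treatment
    }

# Genes as named in the talk, pooled across patients for illustration.
pre_treatment = {"PHACTR2", "PRMT10", "RanBP2"}
post_treatment = {"PRMT10", "RanBP2", "YY1AP1", "SNX9"}

groups = compare_timepoints(pre_treatment, post_treatment)
for label, genes in groups.items():
    print(label, sorted(genes))
```

In practice one would compare per patient and per mutation rather than per gene, since two different mutations in the same gene would otherwise be counted as shared.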

Sanger Adds Two Cancer Genomes

December 31, 2009 by Dan Koboldt

This week in Nature, investigators from Wellcome Trust Sanger Institute published the fourth and fifth complete cancer genomes. Interestingly, both are cancers in which the primary mutagen is known: malignant melanoma (UV light) and small-cell lung cancer (tobacco smoke). This seems to be important, because when I looked at the number of validated somatic coding mutations in each of the first five genomes, the latest two stood out.

[Figure: validated somatic coding mutations in each of the first five cancer genomes]

Granted, small-cell lung cancer and malignant melanoma differ in many ways from leukemia and breast cancer. Yet the increase of confirmed somatic coding variants in these two recent studies is striking.

Corrigendum: Not the First Catalogue of Somatic Mutations

Even so, the authors’ claim that they are “providing the first comprehensive catalogue of somatic mutations from an individual cancer” seems unjustified. Perhaps this is based on the idea that AML1 and LBC1 focused on coding variants only. Yet I point out that in AML2 we evaluated 282 noncoding somatic mutations and confirmed over 50. In the melanoma study, only 470 of the 33,345 newly found somatic mutations were sent through validation, and the method for selecting these was not made clear. At best, the “first comprehensive” claim is a semantic one; at worst, it’s just wrong.

That said, these are still landmark studies. Even with the emergence of next-generation sequencing, completing a cancer genome is a marvelous achievement. It requires substantial financial resources and technical expertise; we certainly knew that WTSI had these. But guiding the data through analysis and forming a cohesive story out of it is the real challenge. It requires persistence, intellect, and scientific rigor, but most of all it requires strong leadership. I congratulate our friends across the pond for showing that they have what it takes.

Illumina’s Fourth, ABI SOLiD’s First Cancer Genome

The melanoma study, like the first three cancer genomes, applied high-throughput sequencing on the Illumina platform (2×75 bp, in this case). In contrast, the SCLC study is the first cancer genome to be sequenced on a different platform – ABI SOLiD. While the read lengths for SOLiD were not impressive (2×25 bp), the specificity was: 77 of 79 somatic coding SNVs (97%) and 333 of 354 randomly chosen genome-wide variants (94%) were confirmed by PCR and traditional sequencing.

Unfortunately, SOLiD also comes with a sensitivity cost. Only 22 of 29 previously identified SNVs (76%) were called. Indels were a real problem – neither of the two previously known coding indels was detected, and the validation rate for predicted somatic indels was 25%.

Mutational Signatures Implicate UV Light and Tobacco

Intriguingly, in both studies the authors identified distinct mutational signatures of exposure to the long-suspected environmental risk factor – ultraviolet radiation (in malignant skin cancer) and tobacco smoke’s “cocktail of carcinogens” (in lung cancer). The substantial number of mutations in each genome made it possible to characterize these signatures with unprecedented statistical power.

The strength of both studies is the insight into the molecular mechanisms of DNA damage, repair, and mutation that could be inferred from such a powerful dataset. In melanoma, the most prevalent mutations were C->T and G->A transitions; the mutational spectrum and sequence context indicate that most of these are attributable to ultraviolet-light-induced DNA damage. In lung cancer, G->T transversions were the commonest substitution; these mutations have previously been linked to polycyclic aromatic hydrocarbons and acrolein in tobacco smoke.
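For readers less familiar with the terminology, the transition/transversion distinction and the strand equivalence of C->T and G->A can be sketched in a few lines. The function names here are mine, not anything from the papers.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def classify(ref, alt):
    """Label a substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Transitions stay within a chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); transversions cross between classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def opposite_strand(ref, alt):
    """Return the same substitution as read from the complementary strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]

print(classify("C", "T"))         # the dominant change in melanoma
print(classify("G", "T"))         # the commonest change in the lung cancer
print(opposite_strand("C", "T"))  # why C->T and G->A are reported together
```

Because a C->T change on one strand reads as G->A on the other, the two are really the same event seen from different strands, which is why they are grouped in the melanoma spectrum.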

Don’t Smoke. Wear Sunscreen.

Cancer, like many common, complex diseases, has many risk factors that come from the environment as well as from one’s DNA. Here, for perhaps the first time, we get a picture of the significant mutational burden associated with two lifestyle choices. Smoking is an obvious one – avoiding it dramatically decreases one’s risk of cancer and a host of other diseases. Now science can offer another definitive piece of advice. Everybody’s free, as Baz Luhrmann puts it, to wear sunscreen.

References
Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, Mudie LJ, Ning Z, Royce T, Schulz-Trieglaff OB, Spiridou A, Stebbings LA, Szajkowski L, Teague J, Williamson D, Chin L, Ross MT, Campbell PJ, Bentley DR, Futreal PA, & Stratton MR (2009). A comprehensive catalogue of somatic mutations from a human cancer genome. Nature PMID: 20016485
Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, Costa GL, Lee CC, Minna JD, Gazdar A, Birney E, Rhodes MD, McKernan KJ, Stratton MR, Futreal PA, & Campbell PJ (2009). A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature PMID: 20016488

Genetics of Human Longevity

August 19, 2009 by Dan Koboldt

A new study in PLoS ONE resequenced candidate genes in a cohort of the “healthy oldest-old” – individuals aged 85 or older who are healthy and have never been diagnosed with cancer, cardiovascular disease, Alzheimer’s, pulmonary disease, or diabetes. The idea is that these robust old-timers harbor genetic variants that reduce susceptibility to, or even protect against, the prevalent age-related disorders that tend to shorten lifespans. Demographic data suggest that less than 36% of the population of western nations will live to see 85, and only a third of these (12% overall) will do so while remaining in good health. Since longevity is highly heritable (~25%), it stands to reason that genetics plays a key role.

Tortoise Still Winning the Race

Intriguingly, despite the sum of human technological achievements – in agriculture, sanitation, medicine, etc. – our maximum observed lifespan (122 years) is not the longest on the planet even among animals. Indeed, the authors point out that rougheye rockfish, bowhead whales, red sea urchins, and Galapagos tortoises easily outlive us, with lifespans of 150-200 years. We can extend the lifespan of other animals – mice, by putting them on reduced-calorie diets, and C. elegans, by inhibiting expression of insulin/IGF receptor daf-2 – but can’t seem to change our own.

Rounding Up the Usual Suspects (Genes)

Next, the authors selected 24 candidate genes known to be involved in age-related processes. These included genes implicated in dietary restriction (SIRT1/3, UCP2/3, PPARG), autophagy (FRAP1, BECN1), stem cell activation (NOTCH1, DLL1), progeria syndromes (LMNA, ZMPSTE24, KL), tumor suppression (TP53, ING1, CDKN2A), and DNA methylation (TRDMT1, DNMT3A/B). Also included were the human homologs of several genes known to be differentially expressed in long-lived daf-2 mutant worms: IGF1R (growth factor receptor), SCD and APOB (lipid metabolism), and CRYAB and HSPB2 (heat shock proteins). Such an eclectic gene list allowed the authors to screen for variants across a wide range of gene functions and biological pathways that might contribute to longevity.

Ye Olde Candidate Gene Resequencing

Some 716 PCR amplicons were designed to isolate the exons, 5′ and 3′ UTRs, 1.5 kbp promoters, intron-exon junctions, and selected conserved noncoding sequences (CNSs) for each of the 24 genes. Altogether, ~360 kbp of DNA was sequenced bidirectionally, producing a grand total of ~35 million high-quality (phred > 20) bases.

Variant detection with phred/phrap/polyphred and Mutation Surveyor identified 935 sequence variants (848 SNPs and 87 small indels), of which 59% were previously known to dbSNP. Unsurprisingly, the majority of variants found mapped to introns or conserved noncoding regions. About 50 novel coding SNPs were identified, though the authors point out that they were far less common (average MAF 1.6%) than the 80 or so previously known coding SNPs (average MAF 19%).

Tag SNPs: Leveraging the HapMap Resource

Here the authors took a rather puzzling turn and sought to compile a set of longevity tag SNPs by combining their data with the findings of the International HapMap Project. Only 12% of the combined variant set was shared between HapMap and the resequencing dataset, but that’s hardly surprising – HapMap variants were selected on the basis of high frequency (MAF > 5%), whereas many of the novel variants identified in this study were rare (in coding regions, MAF=1.6%). Thus the SNP sets are very likely to complement one another.

The authors selected 682 tag SNPs representing 1,550 non-redundant variants from the combined datasets (using an LD threshold of r² > 0.8 for HapMap SNPs and r² = 1 for resequencing SNPs). These were utilized to genotype a larger cohort (493 healthy oldest-old and 439 random controls), but unfortunately, the data was not shown. How disappointing! It seems to me that if the authors had found any significant association between their tag SNPs and longevity, that would have been an important result.
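To make the tagging idea concrete, here is a minimal sketch of pairwise r² between biallelic SNPs and a simple greedy tag selection. I don't know which tagging software the authors used, so this is a generic illustration: the function names, the toy haplotypes, and the use of `>=` against the threshold are all my assumptions.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given 0/1 alleles on phased haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

def greedy_tags(haplotypes, threshold=0.8):
    """Repeatedly pick the SNP that captures the most still-untagged SNPs."""
    untagged = set(haplotypes)
    tags = []
    while untagged:
        def captured(snp):
            return {t for t in untagged
                    if r_squared(haplotypes[snp], haplotypes[t]) >= threshold}
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(untagged), key=lambda s: len(captured(s)))
        tags.append(best)
        untagged -= captured(best) | {best}
    return tags

# Toy haplotypes (8 chromosomes), invented for illustration.
haplotypes = {
    "snp1":      [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2":      [0, 0, 1, 1, 0, 1, 0, 1],  # perfectly correlated with snp1
    "snp3":      [0, 1, 0, 1, 0, 1, 0, 1],
    "singleton": [0, 0, 0, 0, 0, 0, 0, 1],  # seen on one chromosome only
}

tags = greedy_tags(haplotypes)
print(tags)
print(r_squared(haplotypes["snp1"], haplotypes["singleton"]))
```

In this toy set, snp1 tags snp2, but the singleton has low r² with everything else and has to be its own tag. That is the general reason rare variants are poorly captured by tag SNPs chosen from common variation.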

Common vs. Rare Variants: Is HapMap Enough?

One conclusion that was perhaps over-emphasized was that HapMap SNPs were inadequate to capture rare variation in the study population. Some 264 of the 935 variants identified by resequencing were singletons, i.e. present in just one individual, and only around 2.5% of these could be captured by HapMap tag SNPs using r-squared of 0.8. The authors conclude that “This shows that HapMap tagSNPs generally do not adequately represent, private re-sequencing SNPs. This analysis highlights a major challenge for genetic association studies. Using only HapMap SNPs, effects due to uncommon variants would often be missed.” Well, yes, but also, duh. HapMap was intended to represent common, and not rare, variation. Far more compelling would have been if the authors found rare variants actually associated with their phenotype of healthy aging. But alas…

The authors raise a fair point in that association studies cannot rely on the HapMap alone. To obtain the complete picture of genetic variation underlying a phenotype of interest requires a hybrid strategy that includes both common and rare variants. At some point this will require whole-genome resequencing of affected individuals, and for that, we’ll need something more than the 3730.

References
Halaschek-Wiener, J., Amirabbasi-Beik, M., Monfared, N., Pieczyk, M., Sailer, C., Kollar, A., Thomas, R., Agalaridis, G., Yamada, S., Oliveira, L., Collins, J., Meneilly, G., Marra, M., Madden, K., Le, N., Connors, J., & Brooks-Wilson, A. (2009). Genetic Variation in Healthy Oldest-Old. PLoS ONE, 4 (8) DOI: 10.1371/journal.pone.0006641
