Genetic Evolution of Secondary AML from MDS

March 14, 2012 by Dan Koboldt

Contents: Whole-genome Sequencing • Recurrent Mutations • Clonal Evolution • References
Myelodysplastic syndromes (MDS) are a group of disorders of ineffective blood production and the most common cause of acquired bone marrow failure in adults. One-third of cases go on to develop secondary AML (sAML), yet there remains uncertainty among patients, insurers, and funding agencies about whether the myelodysplastic syndromes are actually cancers. A study online today at the New England Journal of Medicine has characterized the genetic evolution from MDS to sAML using whole-genome sequencing.

Whole-genome Sequencing of sAML

Matthew J. Walter and colleagues of the Washington University School of Medicine performed whole-genome sequencing of tumor samples and matched normal DNA from seven patients with secondary AML. For each subject, hundreds of somatic mutations were genotyped in sAML and MDS-stage samples to characterize the clonal architecture of each tumor. Figure 1A from the paper demonstrates the resolution that can be obtained from deep resequencing of somatic mutations in both sAML and MDS samples:

Notice the five clusters (differently colored) representing five clonal populations. In yellow (cluster 1) are mutations present in virtually all cells of both the MDS and the sAML sample. In orange (cluster 2) are mutations present at low frequency in MDS but enriched in sAML. Three more clusters (red, purple, and black) along the y-axis represent mutations that were absent in the MDS sample but acquired during the progression to sAML. The patterns of these mutations suggest that sAML evolved from a clonal population of MDS cells that acquired new mutations along the way.
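To make the cluster logic concrete, here is a minimal sketch that sorts mutations into the broad patterns described above, based on their variant allele frequencies (VAFs) in the MDS and sAML samples. The function name, thresholds, and example values are illustrative assumptions, not the clustering method or data from the paper.

```python
def classify_mutation(mds_vaf, saml_vaf, absent_below=0.01, enrich_ratio=2.0):
    """Label a somatic mutation by how its VAF changes from MDS to sAML.

    VAFs are fractions between 0 and 1. Both thresholds are hypothetical
    values chosen for illustration only.
    """
    if saml_vaf < absent_below:
        return "not in sAML"
    if mds_vaf < absent_below:
        # Undetectable at the MDS stage, present in sAML
        return "acquired during progression"
    if saml_vaf >= enrich_ratio * mds_vaf:
        # Rare subclone in MDS that expanded in sAML
        return "enriched in sAML"
    return "founder (present at both stages)"


# Toy (MDS VAF, sAML VAF) pairs; these are made-up numbers
mutations = {
    "mut_a": (0.45, 0.47),
    "mut_b": (0.05, 0.40),
    "mut_c": (0.00, 0.30),
}
labels = {name: classify_mutation(*vafs) for name, vafs in mutations.items()}
for name, label in labels.items():
    print(name, label)
```

A real analysis would cluster the VAFs statistically rather than apply fixed cutoffs, but the sketch shows why plotting one stage against the other separates the clonal populations.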

Identification of Recurrently Mutated Genes

In the very near future, it may become feasible and cost-effective to perform whole-genome sequencing (WGS) on hundreds or thousands of tumors of a certain type to exhaustively identify recurrently mutated genes. Until then, WGS of a discovery cohort followed by extension screening in a larger cohort offers a powerful and cost-effective strategy. Two genes were already recurrently mutated in the 7 WGS cases: RUNX1, a known myeloid tumor suppressor, and UMODL1, for which mutations were recently reported in multiple myeloma and ovarian cancer. The authors extended their findings via targeted screening for additional coding mutations in 200 AML cases. This enabled the identification of 9 more recurrently mutated genes, for a total of 11.

Recurrently Mutated Genes in MDS and sAML

Gene	Mutation(s)
CDH23	1235insL
NPM1	W288fs
PTPN11	G60R
RUNX1	G170fs; del21q22.11
SMC3	e8-1 splice
STAG2	H738fs
TP53	V272M
U2AF1	S34F
UMODL1	T533P; V882M
WT1	D436E
ZSWIM4	P18A

Notably, four of the genes (CDH23, SMC3, UMODL1, and ZSWIM4) had not previously been implicated in MDS or AML. A specific codon (34) in U2AF1 harbored missense mutations in multiple AML tumors, suggesting a gain of function for the splicing factor encoded by that gene. The recurrent mutations in STAG2, a gene located on the X chromosome, were all protein-truncating mutations (nonsense or frameshift), suggesting that loss of function of this gene contributes to MDS and AML pathogenesis.
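"Recurrently mutated," as used in this section, simply means that a gene carries mutations in more than one patient. A minimal sketch of that tally follows, assuming mutation calls arrive as (sample, gene) pairs. The sample IDs, the placeholder gene `GENE_X`, and the two-sample cutoff are hypothetical and not taken from the study.

```python
from collections import defaultdict


def recurrently_mutated(calls, min_samples=2):
    """Return genes mutated in at least `min_samples` distinct samples."""
    samples_by_gene = defaultdict(set)
    for sample_id, gene in calls:
        # A set ensures several hits in one sample count only once
        samples_by_gene[gene].add(sample_id)
    return sorted(
        gene for gene, samples in samples_by_gene.items()
        if len(samples) >= min_samples
    )


# Toy input: made-up sample IDs, with GENE_X hit twice in the same sample
calls = [
    ("case1", "RUNX1"), ("case2", "RUNX1"),
    ("case1", "UMODL1"), ("case3", "UMODL1"),
    ("case2", "GENE_X"), ("case2", "GENE_X"),
]
recurrent = recurrently_mutated(calls)
print(recurrent)
```

Counting distinct samples rather than raw mutations is the important design choice: a gene with two mutations in one tumor is not recurrent across patients.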

Clonal Evolution: from MDS to AML

By characterizing mutations from secondary AML tumors in the MDS precursors for the same patient, the authors reconstructed the clonal architecture of the disease from early to advanced stages. The findings are summarized in Figure 2A:

In all 7 cases, the results suggest a linear model of clonal evolution, in which progression from MDS to sAML was characterized by persistence of a single founder clone (defined by ~200-700 mutations) and the outgrowth of at least one new subclone which contained dozens or hundreds of additional mutations. In other words, a single population of MDS cells underwent multiple rounds of mutation and selection, giving rise to multiple subpopulations present in full-blown secondary AML.

Please go read this fascinating study at the New England Journal of Medicine.

References

Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, Larson DE, McLellan MD, Dooling D, Abbott R, Fulton R, Magrini V, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Fan X, Grillot M, Witowski S, Heath S, Frater JL, Eades W, Tomasson M, Westervelt P, DiPersio JF, Link DC, Mardis ER, Ley TJ, Wilson RK, & Graubert TA (2012). Clonal architecture of secondary acute myeloid leukemia. New England Journal of Medicine.

Human Genetics Challenges in an Era of Cheap Sequencing

February 22, 2012 by Dan Koboldt

Next-generation sequencing promises to reach unprecedented levels of throughput this year, driving down the cost of sequencing dramatically. Somewhere between the GridION, the Ion Proton, and the HiSeq2500, we may see the first single-day, $1,000-per-genome technologies in 2012. Even so, a 90% reduction in sequencing cost this year will not magically solve all medical problems, even the ones that are clearly genetic. We are already reaching a point where getting enough sequencing coverage and finding the variants no longer present a significant problem. Instead, the field of human genetics faces three significant challenges as we enter an era of ultra-low-cost sequencing.
1. Obtaining Sufficient, Relevant, Consented Samples
2. Clinical Annotation of Genetic Variants
3. Interpretation of Complex Genomes

Obtaining Sufficient, Relevant, Consented Samples

Samples will become a major challenge: specifically, obtaining sufficient numbers of high-quality, accurately phenotyped, properly consented samples for sequencing. I know for a fact that many, many studies are not facing a bottleneck at sequencing capacity but at sample collection, consent, and banking. There are even internationally renowned cancer centers where banking tumor samples and patient blood samples is not a standard or required practice for oncologists. A sad reality is that, every day, people succumb to diseases such as cancer, metabolic syndromes, and heart disease where genetics undoubtedly plays a role. Those samples, if not banked, are lost to the world of science.

The good news is that there are many excellent cohorts out there. There are entire populations that have been catalogued, sampled, and followed up over the course of decades, with huge amounts of qualitative and quantitative clinical data. The commoditization of sequencing means that the proprietors of these cohorts will have their choice of sequencing providers. Informative samples, especially those from patients suffering from rare inherited disorders, will be in high demand. Tumor samples will be fought over by researchers, drug companies, and the treating physicians. In a world of cheap sequencing, samples are the new commodity.

Clinical Annotation of Genetic Variation

With long enough reads and sufficient coverage, finding mutations will no longer be a problem. The new challenge will be in assessing their functional significance and determining which have clinical relevance.

Imagine a breast cancer patient whose germline and tumor genomes have been sequenced to high depth. You have the full spectrum of germline/somatic mutations, copy number alterations, and structural variants. And you also have so many questions:

• Which of these mutations are drivers? Which are passengers?
• What do the variants say about diagnosis or prognosis?
• Are there any clinically actionable mutations?
• Have any been seen before in this tumor type, or other tumor types?
• Are there germline susceptibility variants that predisposed this patient to developing cancer?
• If so, should that be communicated back to the patient’s family? Can it?

It is certain that clinical annotation and risk assessment will be more costly and time-consuming than whole-genome sequencing.

Interpretation of Complex Genomes

Let’s face it, people, even with thousands of samples and accurate genotype information for millions of SNPs, we’re still struggling to suss out the genetic underpinnings of most common diseases. Just last week, I heard about the whole genome sequencing of a family quartet in which the two offspring, monozygotic twins, had a neurological phenotype of likely genetic origin. Yet even after numerous fancy variant-calling and filtering approaches were applied, the researchers were unable to pinpoint a cause. We’ll undoubtedly hear dozens of stories like these as large-scale efforts to determine the genetic basis of inherited diseases (e.g. Mendelian disorders) get under way this year. Yes, with sufficient samples, precise phenotyping, and comprehensive variant detection, we will have the statistical power to detect small-effect changes associated with a given phenotype. But that’s association, not causation. High-throughput functional assays may be required to determine if a certain variant is the actual cause.

Credit: National Geographic

When it comes to coding regions of the genome, we have a number of tools at our disposal to evaluate the consequences of an observed variant. RNA-seq can tell us if the gene is expressed, and if both alleles are represented. Computational algorithms can determine the likelihood that the change is damaging to the protein. High-throughput proteomics can even assess the level of protein in the cell. We can do a lot to investigate coding variants.

I wish I could say the same about noncoding variation. With the recent availability of exome sequencing, we’ve all had the luxury of cherry-picking variants in coding regions because these are less numerous and easier to interpret. But the simple reality is this: the vast majority of genetic variation in humans lies outside the exons of protein-coding genes. Anecdotal examples tell us that noncoding variation is quite capable of exerting influence on a phenotype, though the effect may be quite subtle. We have a lot more to learn about noncoding DNA, and we’ll need to study up in order to correctly annotate and interpret the vast catalogue of genetic variation in human genomes.

AGBT 2012 Last Day: Elephants in the Room

February 18, 2012 by Dan Koboldt

A comment from the current speaker (Vivian Cheung) inspired this post’s title, and it seems to me that the final day of AGBT 2012 has many elephants in the room. There’s the Roche hostile takeover, which has had relatively little chatter this week. I managed to meet Illumina CEO Jay Flatley after a talk yesterday (before Oxford’s announcement); he was polite and exuded nothing but cheerful, casual confidence.

The talks today have been spectacular. The morning session included Michel Georges on the genetic basis of “color-sidedness”, a coloration trait observed in Belgian blue cattle that was mapped to a duplication near the KIT gene in the bovine genome. Jesse Gray of Harvard Medical School presented his work on “steady-state” RNA-seq to decipher the kinetics of transcription and splicing. Patrick Schnable of Iowa State talked about gene loss during the domestication of modern maize from its ancestor, teosinte (tee oh sin tay), by Native Americans about 10,000 years ago.

Over the coffee break I met James Hadfield; he and Nick Loman are the creators of the next-generation sequencing maps, a visualization tool of NGS installations across the world.

After the break was one of my favorite talks, a survey of DNA methylation in hematopoietic stem cells, lymphoid cells, and myeloid cells given by Emily Hodges of CSHL. Then James Galagan walked us through systems biology approaches to study tuberculosis, whose pathogen has the unique ability to survive inside macrophages (in the face of hypoxia and even drug exposure) and does so by eating your cholesterol!

In the final session, chaired by Elaine Mardis, we heard about RNA-DNA differences in B-cells (Vivian Cheung) and streaming algorithms for RNA-Seq analysis from Lior Pachter, who related that someone had contacted him recently about processing 14 billion RNA-seq reads. That’s a lot.

The meeting is still abuzz with talk of Oxford Nanopore, which I think we can all agree is a disruptive technology, if the stock market is any indication.

My colleagues at Genomes Unzipped have a thorough take on this new technology, its promise, and what it could mean for the field. For my part, I remain cautious. I grew up in St. Louis, Missouri, the “show-me” state, and I’ll be convinced the moment I hold a minION in my hand.

AGBT 2012 Day 2: Cancer, Technology, and Oxford Nanopore

February 17, 2012 by Dan Koboldt

Room With A View (Credit: Todd Wylie)

It was a thrilling day on Marco Island, with many good talks and the so-called “big announcement.” I feel as though I can’t remember much that happened before 11:55 a.m. local time, but luckily, I take notes.

Keynote speaker Rick Myers gave a high-level talk on the genetics and epigenetics of human gene regulation. They’re using ChIP-Seq to study the “interactome” of proteins binding to DNA. “Alignment to the genome and calling the gene models,” Myers said, “is still the hardest part.” He went on to talk about some of the other techniques they’re using: transposon-hopping RNA-seq (Tn-RNA-Seq), which sounds intriguing, and reduced representation bisulfite sequencing. These, with ChIP-Seq, bring together the interactome, the methylome, and the transcriptome.

Tom Gingeras spoke about the complexities of transcription in the human genome. He noted that 80% of the genome is expressed in the form of a primary transcript, including 90.7% of exon bases, 79.3% of intron bases, and 35.5% of intergenic bases. In his lab, they see a huge number of unannotated, single-exon transcripts that are intergenic, predominantly polyA-minus, and antisense. One very interesting thing they’re doing is looking at differences in transcription in different subcompartments of the cell.

Exome In A Day on the PGM. Kind of.

Joseph Boland of NCI gave a much-anticipated talk on exome sequencing with the Ion Torrent PGM, a platform they’ve been working with for just over a year. It was ambitiously (and somewhat inaccurately) entitled “Exome In A Day.” They currently have six instruments, and have worked over the last year with the 314, 316, and 318 chips. Admittedly, the quick turnaround of the Ion Torrent PGM has tremendous appeal for their laboratory, for situations such as:

• Reviewers of a manuscript ask for more data, and the revision has a quick deadline
• One member of a family study fails exome capture and/or sequencing
• Downtime of the primary machine (Illumina HiSeq)
• Site visit “crises”

In fairness, the hybridization for current exome kits takes 3 days, and sample/library prep takes another day. So even with 4-6 hours of sequencing, “Exome in a day” is not quite accurate. But it’s certainly faster than 8-10 day runs on the Illumina, albeit with dramatically lower throughput.

The presenter described some standard testing of the platforms on a CEPH trio, with comparisons to whole-genome sequence data from Complete Genomics. Coverage of target regions and variant calling stats looked pretty good, with perhaps a slightly higher false positive rate in the PGM data. Next, they sequenced the proband of a melanoma-prone family. Here, whole-genome data from CGI detected 16 cancer-relevant mutations, whereas the PGM exome data yielded only 10. This is a bit of a sensitivity concern, likely suggesting that important regions of the exome were under-covered.
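To put a number on that sensitivity concern, here is a small worked calculation. It assumes, purely for illustration, that the 16 whole-genome calls serve as the reference set and that the 10 PGM calls are a subset of them; the talk as summarized here did not confirm either point.

```python
def relative_sensitivity(detected, reference_total):
    """Fraction of reference mutations that the test platform also found."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return detected / reference_total


# Counts from the talk: 10 mutations in PGM exome data vs. 16 in the CGI genome
sens = relative_sensitivity(10, 16)
print(f"Relative sensitivity: {sens:.1%}")
```

Under those assumptions, the PGM exome recovered a little under two-thirds of the cancer-relevant mutations seen in the whole-genome data.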

Cancer Genomics Highs and Lows

There appeared to be a strong showing of cancer talks on the evening agenda, and indeed some of these were impressive. David Smith of the Mayo Clinic presented some early analyses of a project looking at head and neck cancer. The well-known risk factors for these deadly cancers (50% 5-year survival for advanced disease) are smoking and alcohol use. Recently, however, infection with two HPV serotypes (16 and 18) has been recognized as a risk factor, particularly for younger patients. As many as 90% of non-smokers who present with squamous cell carcinoma are HPV-positive. The numbers in the current study are still rather small: 8 exome tumor-normal pairs, 24 methylome (Infinium) pairs, and 18 RNA-seq pairs. Strangely, mutation rates in HPV-negative smokers and HPV-positive never-smokers seemed similar, which is puzzling, but could be an artifact of small sample numbers.

Sam Levy gave a nice talk on de novo assembly followed by assembly-to-assembly mapping (amusingly abbreviated DAAM) for the detection of somatic mutations, particularly indels and complex events. This seemed quite promising, and the presenter’s casual use of the acronym (“The DAAM approach calls them as somatic mutations.”) kept it light. Compared to a mapping-based approach for indel detection (BWA+GATK), their method seems to do better at finding longer indels. One disadvantage, though, is that the de novo assembly generally requires some planning, in the form of multiple libraries of different insert sizes.

The Aftermath of Oxford Nanopore

The talk throughout the afternoon, unsurprisingly, centered upon the announcement of a USB-drive-sized nanopore sequencer to be released in 2012 by Oxford Nanopore. I give them credit for keeping this effectively “under wraps” prior to Clive Brown’s talk, which drummed up the appropriate surprise and excitement in the meeting. Precious few were aware of the plans for the pocket-sized sequencer; one of them was Keith Robison, who’d gotten a heads-up last night and had a post ready on Omics! Omics! Sadly, Massgenomics was not offered an exclusive. Possibly because the author was here at Marco Island, actively blogging/tweeting, and admittedly might not have been able to contain such a whopper of a secret.

As much as I’m impressed by the technological achievement, I find myself bearish on the market-shifting potential of this technology. It seems like large genome centers are unlikely to have a use for disposable sequencers, and might lean more toward the larger instrument. It’s the same technology, and they still have kinks to work out. Time will tell how much of a disruptive technology this turns out to be.
