A Guide for Deep Sequencing of Human Genomes

August 26, 2011 by Dan Koboldt

The incredible throughput of current second-generation sequencing platforms makes it possible to sequence a complete human genome to high coverage, with a single instrument run, in less than 2 weeks. As whole-genome sequencing becomes more routine, it is increasingly important to understand the accuracy of sequence-level analyses, such as SNP detection, and their relationship to overall sequence depth. Enter a recent study from the lab of Elliott Margulies at NHGRI. As part of the NIH Undiagnosed Diseases Program, the authors generated over 380 gigabases of sequence data from a blood sample of a male patient. This is an astonishing amount of sequence for one sample, roughly 126-fold theoretical redundancy genome-wide.
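The "126-fold" figure is simply total sequence yield divided by genome size. A minimal sketch, assuming a haploid human genome of about 3.05 Gb (the post doesn't say which genome length was used; `theoretical_redundancy` and `ASSUMED_GENOME_SIZE` are illustrative names):

```python
def theoretical_redundancy(total_bases: float, genome_size: float) -> float:
    """Fold redundancy = total sequenced bases / haploid genome size.

    Mapping, duplicate, and quality losses are ignored, which is why this
    "theoretical" value exceeds any mapped depth.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


ASSUMED_GENOME_SIZE = 3.05e9  # assumption, not stated in the post
print(round(theoretical_redundancy(384e9, ASSUMED_GENOME_SIZE)))
```

With 384 Gbp this prints 126; the losses it ignores are why the mapped depths in the table below are lower.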

Perhaps just as importantly, the dataset comprised four runs on two different but related platforms: the Illumina GAIIx, and the Illumina HiSeq2000. Here is a brief summary of the dataset.

Dataset	Total Gbp	Map Rate	Dup. Rate	Mapped Depth	% Genome Callable
GAIIx (14 lanes)	118	95.3%	3.9%	34.2x	88.82%
HiSeq A (8 lanes)	122	94.0%	13.7%	32.7x	90.99%
HiSeq B (8 lanes)	144	92.6%	8.7%	40.4x	93.10%
All (30 lanes)	384	93.9%	13.6%	102x	95.88%

With this impressive dataset in hand, the authors undertook a detailed examination of the technical aspects of sequence analysis (coverage uniformity, platform comparisons, genotyping accuracy, etc.) and sought to answer two questions:

Given a specific amount of sequencing data, what fraction of the genome is “callable”?
How many SNVs can be accurately identified?

The results, I think, will be critically important in the near future as whole-genome sequencing becomes routine and widely accessible to investigators.

Coverage Versus Callability

The authors correctly note that while many studies report “coverage” of genomes or exomes in terms of minimum depth achieved (1x, 5x, 10x, etc.), this metric alone is insufficient: it may not account for alignment and quality filters, or for the requirements of genotype-calling algorithms. A better approach might be to report the fraction of the genome/exome that is “callable”, meaning positions where genotypes can be determined at a specified confidence threshold once all filters are applied. This term is roughly equivalent to what the 1000 Genomes Project calls the “accessible” portion of the genome. In this study, the authors calculate callability by:

Starting with reads that pass the Illumina chastity filter
Further removing reads with <32 Q20 bases
Mapping reads to the reference sequence using BWA
Removing duplicates (using SAMtools rmdup)
Considering only bases with quality >= 20
Requiring a genotype probability score of at least 10
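The callability criteria above can be sketched as a cascade of simple checks. This is a hypothetical, simplified illustration, not the authors' pipeline: alignment (BWA), duplicate removal (SAMtools rmdup), and MPG genotyping are assumed to happen upstream, the `Read` record and all function names are invented for the example, and the MPG threshold is read as "at least 10".

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_Q20_BASES_PER_READ = 32  # reads with fewer Q20 bases are discarded
MIN_BASE_QUALITY = 20        # only bases with quality >= 20 count toward depth
MIN_MPG_SCORE = 10.0         # genotype confidence required to call a position


@dataclass
class Read:
    # Hypothetical, simplified read record. Alignment and duplicate
    # marking are assumed to have been done upstream.
    passed_chastity: bool
    base_qualities: List[int]
    is_duplicate: bool = False


def keep_read(read: Read) -> bool:
    """Read-level filters: chastity, >= 32 Q20 bases, not a duplicate."""
    if not read.passed_chastity or read.is_duplicate:
        return False
    q20_bases = sum(1 for q in read.base_qualities if q >= MIN_BASE_QUALITY)
    return q20_bases >= MIN_Q20_BASES_PER_READ


def q20_depth(base_qualities_at_position: Iterable[int]) -> int:
    """Depth at one position, counting only bases with quality >= 20."""
    return sum(1 for q in base_qualities_at_position if q >= MIN_BASE_QUALITY)


def callable_fraction(mpg_scores: Iterable[float], genome_size: int) -> float:
    """Fraction of the genome whose genotype call meets the MPG threshold."""
    n_callable = sum(1 for s in mpg_scores if s >= MIN_MPG_SCORE)
    return n_callable / genome_size
```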

The last metric refers to the score from the group’s Bayesian genotype calling algorithm, Most Probable Genotype (MPG). An MPG score of 10 is a log-scaled value indicating a 1/e^10 (that’s 1/22026) theoretical probability of being incorrect. By these criteria, 88.82% of the genome was callable in the GAIIx dataset (34.2x mapped depth) and 90.99% was callable in the HiSeq-A dataset (32.7x).
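Because the score is log-scaled, each additional point divides the theoretical error rate by e. A minimal sketch that follows the post's description, treating the MPG score as a natural-log confidence so that P(error) = e^(-score) (the function name is illustrative):

```python
import math


def mpg_error_probability(mpg_score: float) -> float:
    """Theoretical probability that a genotype call is wrong, assuming
    P(error) = e^(-score) as described for MPG scores."""
    return math.exp(-mpg_score)


for score in (10, 20):
    p = mpg_error_probability(score)
    print(f"MPG {score}: P(error) = {p:.2e} (about 1 in {round(1 / p):,})")
```

A score of 10 reproduces the 1-in-22,026 figure above; doubling the score to 20 squares that odds ratio rather than doubling it.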

You may notice that the GAIIx platform had more mapped bases but yielded lower callability than HiSeq-A, and wonder how this could be. It has long been observed that coverage is non-uniform across the genome: even under idealized random sampling it follows a Poisson distribution, and in practice it is further skewed by factors such as read length, region mappability, and GC content. Although the amount of sequence data was similar, the HiSeq platform achieved more uniform coverage than GAIIx, yielding more callable bases genome-wide.
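The Poisson model is the idealized baseline: if reads landed uniformly at random, the fraction of bases reaching a given depth would follow directly from the mean depth. A minimal sketch (the function name is illustrative) computes that expectation by summing Poisson probabilities iteratively, which avoids large factorials:

```python
import math


def poisson_fraction_at_least(mean_depth: float, min_depth: int) -> float:
    """Expected fraction of positions with depth >= min_depth if coverage
    were exactly Poisson(mean_depth), i.e. ideal random sampling."""
    if min_depth <= 0:
        return 1.0
    pmf = math.exp(-mean_depth)  # P(depth == 0)
    cdf_below = 0.0
    for k in range(min_depth):
        cdf_below += pmf             # add P(depth == k)
        pmf *= mean_depth / (k + 1)  # advance to P(depth == k + 1)
    return 1.0 - cdf_below


for threshold in (1, 10):
    print(threshold, poisson_fraction_at_least(34.2, threshold))
```

At 34.2x mean depth the Poisson model predicts that well over 99.99% of bases reach 10x, far above the 89.4% that GAIIx actually covered at 10x (reported below). That gap is what mappability and GC effects look like in practice.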

GAIIx vs HiSeq Coverage of the Genome and Exome

To enable some direct comparisons, the authors normalized the HiSeq2000 data into a set of equivalent size to the GAIIx dataset (34.2x average mapped depth), then assessed coverage of the genome as well as the exome (here defined as ~34 Mbp of non-redundant coding sequence from UCSC Known Genes). Here’s a plot of Q20 coverage for GAIIx and HiSeq, using values from Supp. Table 1.
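The post doesn't say exactly how the HiSeq data were normalized. One common approach, assumed here, is to thin the larger dataset by keeping each read independently with probability target/current depth. A minimal sketch (the function name is hypothetical):

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_to_depth(reads: Sequence[T], current_depth: float,
                       target_depth: float, seed: int = 0) -> List[T]:
    """Keep each read with probability target/current, so the expected
    depth of the result equals target_depth."""
    if target_depth >= current_depth:
        return list(reads)  # already at or below the target
    keep_prob = target_depth / current_depth
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [read for read in reads if rng.random() < keep_prob]
```

For example, thinning a 40.4x dataset to 34.2x would keep each read with probability 34.2/40.4 ≈ 0.85. Independent per-read sampling hits the target only on average; matching it exactly would require drawing a fixed number of reads instead.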

On both platforms, around 97% of the genome was covered by at least one read. At 10x coverage, however, GAIIx covers 89.4% of the genome whereas HiSeq covers 92.2%. These differences were even more pronounced in the exome, where GAIIx and HiSeq covered 67.4% and 76.2% of the exome at 10x, respectively. Because both datasets came from unbiased whole-genome sequencing (no exome capture), the authors attribute HiSeq’s superior coverage to better representation of high-GC sequences, which tend to be gene-dense.
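Coverage curves like these are built from per-base depths: for each threshold, count the positions at or above it. A minimal sketch, assuming you already have a list of Q20 depths for the positions of interest (whole genome, or only exome intervals); the function name is illustrative:

```python
from bisect import bisect_left
from typing import Dict, Iterable, Sequence


def fraction_covered(depths: Sequence[int],
                     thresholds: Iterable[int]) -> Dict[int, float]:
    """Map each threshold t to the fraction of positions with depth >= t."""
    if not depths:
        raise ValueError("depths must not be empty")
    ordered = sorted(depths)
    n = len(ordered)
    # bisect_left returns the number of positions with depth < t
    return {t: (n - bisect_left(ordered, t)) / n for t in thresholds}
```

Running it once on all positions and once on exome positions gives the genome and exome curves. Sorting once and using `bisect_left` avoids rescanning the depth list for every threshold.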

Filters for Accurate Genotype Calling

The authors next undertook a careful experiment to establish appropriate filters for SNV calling genome-wide. Pooling all Illumina data together, they generated two equal-sized datasets with an average mapped coverage of 50x by random read sampling. Next, they compared genotype calls at all bases that were “callable” with MPG score >=10. Among the 2.8 billion positions (98.3% of the genome) that met these criteria in both datasets, there were 46,580 discordant genotypes. Many of these, unsurprisingly, arose from sequence reads that were improperly aligned (misplaced, or locally mis-aligned). To address this, the authors removed reads with mapping quality <30 from both datasets. This mapping quality filter reduced the comparison set to 93.6% of the genome, but removed 81% of discordant calls.
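The split-sample comparison boils down to comparing two sets of genotype calls position by position. A minimal sketch under stated assumptions: each dataset's calls are a dict keyed by `(chromosome, position)` containing only positions already callable at MPG >= 10, genotypes are unphased two-letter strings, and the mapping-quality cutoff is applied to reads before genotyping. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, coordinate)

MIN_MAPQ = 30  # reads below this mapping quality are dropped before genotyping


def passes_mapq(mapq: int, min_mapq: int = MIN_MAPQ) -> bool:
    """Read-level mapping-quality filter."""
    return mapq >= min_mapq


def normalize(genotype: str) -> str:
    """Treat unphased genotypes such as 'GA' and 'AG' as identical."""
    return "".join(sorted(genotype))


def discordant_positions(calls_a: Dict[Position, str],
                         calls_b: Dict[Position, str]) -> List[Position]:
    """Positions called in both datasets whose genotypes disagree."""
    shared = calls_a.keys() & calls_b.keys()
    return sorted(pos for pos in shared
                  if normalize(calls_a[pos]) != normalize(calls_b[pos]))
```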

Among the 8,710 remaining discordant positions, the authors observed consistently lower MPG scores than were seen among concordant positions, particularly at high coverage sites. They made perhaps one of the most useful inferences of this study: that genotype accuracy can be improved by requiring higher probability scores at higher sequence depths. Basically, they required that, for a given position, the ratio of MPG score to Q20 coverage be at least 0.5. The confidence-by-depth filter removed 61.5% of discordant positions but reduced callability by just 0.02%.
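The confidence-by-depth rule is easy to express directly. A minimal sketch, assuming (as in the comparison above) that positions must also meet the baseline MPG >= 10 threshold; the function and constant names are illustrative:

```python
MIN_MPG_SCORE = 10.0         # baseline callability threshold
MIN_MPG_PER_Q20_DEPTH = 0.5  # confidence-by-depth ratio from the study


def passes_confidence_by_depth(mpg_score: float, q20_depth: int) -> bool:
    """Require MPG >= 10 and MPG / Q20 depth >= 0.5, so that deeper
    positions must reach proportionally higher genotype confidence."""
    if q20_depth <= 0:
        return False
    return (mpg_score >= MIN_MPG_SCORE
            and mpg_score / q20_depth >= MIN_MPG_PER_Q20_DEPTH)
```

In effect, a position at 20x Q20 depth passes with an MPG of 10, while one at 100x needs an MPG of at least 50, so deep positions with only middling confidence are dropped.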

Finally, the authors employed the widely used strategy of removing SNV calls within 10 bp of called indels. This indel-nearby filter removed 26% of the remaining discordant positions, while reducing callability by 0.43%. Thus, by applying three filters aimed at reducing false positives, the authors removed 96.4% of discordant positions and maintained callability across 93.13% of the genome.
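Filtering SNVs near indels is a range query; keeping indel coordinates sorted per chromosome turns each lookup into a binary search. A minimal sketch, assuming each indel is represented by a single coordinate (a simplification for multi-base deletions) and that "within 10 bp" means a distance of 10 or less; names are hypothetical:

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

WINDOW_BP = 10  # SNVs this close to a called indel are discarded


def near_indel(chrom: str, pos: int, indels: Dict[str, List[int]],
               window: int = WINDOW_BP) -> bool:
    """True if any indel on `chrom` lies within `window` bp of `pos`.

    `indels` maps chromosome -> sorted list of indel coordinates.
    """
    starts = indels.get(chrom, [])
    i = bisect_left(starts, pos - window)  # first indel at or after pos - window
    return i < len(starts) and starts[i] <= pos + window


def filter_snvs(snvs: List[Tuple[str, int]], indels: Dict[str, List[int]],
                window: int = WINDOW_BP) -> List[Tuple[str, int]]:
    """Keep only SNVs that are not near a called indel."""
    return [(c, p) for c, p in snvs if not near_indel(c, p, indels, window)]
```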

How Many Variants Can Be Detected?

The next experiment was quite interesting: the authors pooled all Illumina data, and progressively added reads to create datasets of 5x, 10x, 15x mapped coverage, all the way up to 100x. In each dataset, they applied their variant calling with all filters, then reported the number of SNVs that were identified. I’ve generated a plot of the number of SNVs called genome-wide by dataset:

At 30x, which might be considered a de facto standard, around 3 million variants were identified. Each new depth adds perhaps 10,000 variants, but at 50x the discovery power is nearly saturated (3.32 million, or 95% of the total). Very little is gained going from 50x to 105x, although, if the relationship between genes, GC content, and callability holds true, many of the additional variants could be coding variants. In summary, deep resequencing of a sample to 105-fold coverage tells us that a typical human genome contains around 3.5 million SNPs. That’s very close to estimates from the personal genomes that have already been published (~3.1 million to 4.1 million SNPs), which I find reassuring. It would be informative to see a similar experiment on a sample of African origin, where the number might be closer to 4.5 million.
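Where discovery "saturates" can be read off the depth series by finding the first depth whose SNV count reaches a chosen fraction of the deepest dataset's count. A minimal sketch; treating the deepest dataset as the "total" is an assumption, and the function name is illustrative:

```python
from typing import Dict, Optional


def saturation_depth(snvs_by_depth: Dict[int, int],
                     fraction: float = 0.95) -> Optional[int]:
    """Smallest depth whose SNV count reaches `fraction` of the count
    at the deepest dataset (used as a stand-in for the total)."""
    if not snvs_by_depth:
        return None
    total = snvs_by_depth[max(snvs_by_depth)]
    for depth in sorted(snvs_by_depth):
        if snvs_by_depth[depth] >= fraction * total:
            return depth
    return None
```

Using only the figures quoted above, 3.32 million SNVs at 50x out of roughly 3.5 million overall is 3.32/3.5 ≈ 0.95, which is why 50x looks like the saturation point.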

The Sweet Spot of Coverage and Callability

Based on these experiments and their callability calculations, the authors estimate that generating 50x mapped coverage (60x before read mapping/filtering are applied) renders ~95% of the genome and ~81% of the exome callable. Intriguingly, however, the authors note that they’d sequenced an unrelated sample using the latest HiSeq chemistry and basecalling software, achieving the same level of callability with just 35x mapped coverage. If anything, this emphasizes that, as the authors suggest, a “callability” metric is far more informative to report when describing the resequencing of human genomes.
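The 50x/60x pair implies that roughly 50/60 ≈ 83% of raw sequence survives mapping and filtering. A minimal planning sketch turns a target mapped depth into raw depth and raw yield; the 3.05 Gb genome size is an assumption, and the function name is hypothetical:

```python
from typing import Tuple


def raw_yield_needed(target_mapped_depth: float, retained_fraction: float,
                     genome_size: float) -> Tuple[float, float]:
    """Return (raw depth, raw bases) needed so that, after mapping and
    filtering keep `retained_fraction` of the data, the target mapped
    depth is reached."""
    if not 0 < retained_fraction <= 1:
        raise ValueError("retained_fraction must be in (0, 1]")
    raw_depth = target_mapped_depth / retained_fraction
    return raw_depth, raw_depth * genome_size


ASSUMED_GENOME_SIZE = 3.05e9  # assumption, not from the post
retained = 50 / 60            # 50x mapped from 60x raw, per the post
depth, bases = raw_yield_needed(50, retained, ASSUMED_GENOME_SIZE)
print(f"raw depth ~{depth:.0f}x, raw yield ~{bases / 1e9:.0f} Gbp")
```

With these inputs the sketch reports about 60x raw depth and roughly 183 Gbp of raw sequence. As the newer HiSeq result shows, the retained fraction depends on chemistry and software, so it is worth re-estimating for each setup.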

References
Ajay SS, Parker SC, Ozel Abaan H, Fuentes Fajardo KV, & Margulies EH (2011). Accurate and comprehensive sequencing of personal genomes. Genome Research. PMID: 21771779

NGS and the Hallmarks of Cancer

January 28, 2011 by Dan Koboldt

Massively parallel sequencing will be applied to hundreds or thousands of tumor genomes this year. Catalogues of somatic alterations in human cancers (e.g. COSMIC) will grow, perhaps as explosively as dbSNP did in the past decade. Perhaps more importantly, we will begin to see cases where whole-genome or whole-exome sequencing of a patient’s tumor guides his or her treatment. Bridging the gaps between mutation discovery, biological interpretation, and clinical action, however, will be a substantial challenge. Hence the theme of this month’s posts, cancer biology and pathology.

Just over a decade ago, Douglas Hanahan and Robert A. Weinberg published the landmark article “The Hallmarks of Cancer” in the journal Cell. At the time, nearly a quarter-century of rapid advances had revealed a wealth of knowledge about this deadly disease. Although more than 100 subtypes of cancer had been described, Hanahan and Weinberg proposed six principal cellular traits shared by virtually all forms of human cancer. Collectively, these essential alterations in cell physiology dictate tumor development and growth.

Credit: Hanahan and Weinberg, Cell (2000) 100:57-70

Each of these six acquired capabilities – evasion of apoptosis, self-sufficiency in growth signals, insensitivity to growth inhibition signals, limitless replicative potential, sustained angiogenesis, and tissue invasion/metastasis – represents the successful circumvention of inherent anticancer defense mechanisms of cells and tissues.

Genomic Instability and Driver Mutations

Most of these acquired capabilities arise from somatic alterations – mutations, structural events, and epigenetic changes – which presents something of a dilemma. Thanks to a swath of fastidious DNA monitoring and repair enzymes, mutation is a rare event, and altering the critical genes to successfully acquire each capability is inefficient. For a single cell to achieve all of them in the span of a human lifetime is, well, statistically improbable. The authors suggest a seventh principle, not a hallmark of cancer but a universally enabling characteristic, to explain the means by which these six biological endpoints are reached: genomic instability.

Mutations that cause genomic instability are likely critical, early events in tumorigenesis. Studies have shown that DNA repair genes (e.g. ATM, RAD51, CHEK1) and, more recently, components of DNA methylation pathways (DNMT3A/DNMT3B) are recurrently mutated in human cancers, suggesting an important functional role in disease development.

Hallmark 1: Self-sufficiency in Growth Signals

Autonomous growth signaling was the first hallmark to be defined by cancer researchers, due in part to the large number of oncogenes that modulate it. Three common molecular strategies are used by tumors to provide self-sufficient growth stimulation:

Modulation of extracellular growth signals, for example, the production of PDGF and TGF-alpha by glioblastomas and sarcomas, respectively.
Alteration of the trans-cellular signal transducers (surface receptors), e.g. the up-regulation of EGFR in stomach/brain/breast tumors and HER-2 in mammary tumors.
Deregulation of the intracellular signaling pathways linked to transmembrane receptors, such as the Ras/Raf/mitogen activated protein kinase (MAPK) cascade.

The authors suspected that growth signaling pathways suffer deregulation in all human tumors. Ten years later, we know that this is largely true. Large-scale sequencing efforts have revealed that mutations in Ras-family genes (KRAS, NRAS, HRAS, etc.) and MAP kinase genes are frequent events in human cancers. In breast cancer, for example, PI3 kinase genes are among the most highly mutated, suffering alterations in as many as 40% of tumors.

Hallmark 2: Insensitivity to Antigrowth Signals

In normal tissue, both soluble factors and matrix-embedded inhibitors cooperate to maintain homeostasis by blocking cell growth. Much like growth stimulation, these signals are transduced to cells via transmembrane receptors and then into complex intracellular circuits. At the molecular level, most or all anti-proliferative signals are funneled through retinoblastoma (Rb) related proteins. When hypophosphorylated, Rb sequesters and alters the functions of E2F transcription factors, which normally serve to activate a number of genes required for the transition from G1 into S phase.

The best documented modulator of Rb signaling is TGFB, a soluble signaling molecule that suppresses cell growth. TGFB prevents the phosphorylation that inactivates Rb, thereby blocking the advance through G1. Tumor cells disrupt TGFB signaling by a number of mechanisms, including downregulation or alteration of its cellular receptor, or mutation of the key transducer of TGFB signaling, Smad4. One way or another, the anti-growth circuit converging on Rb is disrupted in a vast majority of human malignancies, virtually “defining the concept” of tumor suppressor loss in cancer.

Hallmark 3: Evasion of Apoptosis

I won’t dwell much on this topic, since much of it was covered in my post on Cancer Versus the Immune System. Simply put, evasion of apoptosis is a hallmark of many and perhaps all human cancers.

Hallmark 4: Limitless Replication Potential

Most types of mammalian cells carry intrinsic programs that limit their replication. Once cells reach a certain number of divisions, they stop growing, or senesce. In culture, human fibroblasts can be forced to keep dividing beyond this point by disabling the p53 and pRb tumor suppressors. These cells eventually enter a crisis state characterized by karyotypic disarray and massive cell death. A small fraction of cells, however, continue to grow and divide without limit, a trait known as immortalization.

The limit for most normal cell types is 60-70 divisions, after which they enter senescence. Obviously tumor cells surpass this limit, managing to grow and progress even while undergoing massive apoptosis. One key rate-limiting mechanism in cell division is the length of chromosome telomeres, which decreases by 50 to 100 base pairs with each consecutive cell division. At some point, telomere loss introduces massive genetic instability, and crisis ensues.
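Taking both ranges in the paragraph above at face value, the cumulative telomere loss over a normal cell's replicative lifetime works out as follows (a back-of-the-envelope illustration, not a figure from the review):

```latex
\text{total loss} = (\text{divisions}) \times (\text{bp lost per division}):\quad
60 \times 50\ \text{bp} = 3{,}000\ \text{bp}
\quad\text{to}\quad
70 \times 100\ \text{bp} = 7{,}000\ \text{bp}
```

That is, roughly 3 to 7 kb of telomeric sequence is eroded before the division limit is reached.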

Many, if not all, tumor cells address this issue by up-regulating telomerase, the enzyme that extends telomeres. This maintenance mechanism is a key component for enabling limitless replicative potential in tumor cells.

Hallmark 5: Sustained Angiogenesis

Here’s something I didn’t know: virtually all cells must remain within 100 µm of a capillary blood vessel to get the oxygen and nutrients they need to survive. You’d think that because of this limitation, rapidly proliferating cells would have an intrinsic ability to induce angiogenesis (blood vessel growth). It turns out, not so much. The ability to stimulate angiogenesis is not inherent in normal cells or developing neoplasias, and represents an acquired capability that successful tumors must achieve.

Like the other biological processes discussed in this review, angiogenesis is encouraged or prevented by a complex network of signaling molecules. Soluble factors, cell surface receptors, integrins, and cell adhesion molecules all play a role in the counter-balancing of blood vessel growth. Vascular endothelial growth factor (VEGF) and fibroblast growth factors (FGF1/FGF2), for example, are molecules that stimulate angiogenesis initiation. Thrombospondin-1 is an important inhibitor of this process.

Tumor cells encourage blood vessel invasion/growth by up-regulating inducers and suppressing inhibitors, often at the level of transcription. Loss of TP53, for example, causes thrombospondin-1 levels to fall, thereby reducing the latter’s inhibitory potential. Similarly, Ras activation and loss of the VHL tumor suppressor induce an up-regulation of the gene encoding VEGF. More recently, we’ve come to realize that the transmembrane receptors for angiogenesis-stimulating molecules (VEGFR, FGFR) are commonly mutated in human cancers.

Hallmark 6: Tissue Invasion and Metastasis

Eventually, tumor cells venture out from the primary lesion to colonize and grow in other, often distant, parts of the body. Ultimately, it is these metastases that account for 90% of cancer deaths. Several families of proteins involved in tethering cells to their surrounding tissue are altered during this process. Perhaps the best-known of these are cell-cell adhesion molecules (CAMs) and integrins, which mediate cell-cell interactions and cell-matrix interactions, respectively. One example of the communication between cell and environment is offered by E-cadherin, which is expressed on the surface of epithelial cells. Bridging of E-cadherin receptors between adjacent cells transmits anti-growth signals into the cell via cytoplasmic β-catenin, which links E-cadherin to intracellular signaling circuits that include the Lef/Tcf transcription factors. A number of epithelial cancers block this pathway, either by mutational inactivation of the E-cadherin or β-catenin genes, transcriptional repression, or proteolysis of the extracellular E-cadherin domain.

Integrins have dozens of subtypes with distinct substrate preferences. Successful colonization by tumor cells at a distant site requires adaptation, which is often achieved through shifts in the spectrum of integrin alpha- and beta-subunits displayed by migrating cells. Support for this idea can be seen even in cell culture, where forcing expression of different integrin subunits can induce invasive and metastatic behavior. This aspect of tumor progression will be especially challenging to characterize, because there are large numbers of integrin genes and many, many unique heterodimeric receptors that can be generated by differential subunit expression.

Summary

It was clear even a decade ago that cells’ innate defenses against transformation and metastasis are diverse and complex, and that the processes by which tumor cells subvert those defenses are equally so. Nevertheless, Hanahan and Weinberg predicted that within 10-20 years of the review’s publication, identifying virtually all somatic lesions within a tumor would be a routine procedure, as would comprehensive gene expression analysis. With such knowledge in hand, it would be possible to definitively test whether all tumor types behave according to a set of common rules like the ones outlined above. We aren’t quite able to provide those answers just yet, but given the rapid advances of next-gen sequencing, that day is coming soon.

References
Hanahan D, & Weinberg RA (2000). The Hallmarks of Cancer. Cell, 100 (1), 57-70. DOI: 10.1016/S0092-8674(00)81683-9

Cancer Versus the Immune System

January 21, 2011 by Dan Koboldt

The human immune system is an incredible success story of evolution. It defends against a constant barrage of external threats – bacteria, viruses, and other pathogens – and, as I’ve recently learned, protects against an intrinsic threat: cancerous cells. In their review “Natural Innate and Adaptive Immunity to Cancer”, Vesely and colleagues draw from recent mouse models of cancer and human clinical data to describe how cells, effector molecules, and pathways of the immune system act to suppress and control tumor cells. It’s not all good news, however. Apparently, certain immune system pathways (e.g. inflammation) instead serve to promote tumor growth.

The Immune System Strikes: Senescence and Apoptosis

Cells already have an array of intrinsic defense mechanisms that halt the transformation process. Numerous cellular proteins detect DNA damage and induce senescence, a permanent change of state characterized by morphological and gene expression changes. The activation of oncogenes, too, can trigger senescence. In fact, the hijacking of Ras signaling to escape senescence and proliferate is a key requirement for cell transformation. Alternatively, cells that sense injury or loss of mitochondrial integrity may undergo programmed cell death (apoptosis). This process may also be initiated externally by the binding of tumor necrosis factor (TNF) family ligands, such as TNF, TNF-related apoptosis-inducing ligand (TRAIL), and Fas ligand (FasL), to their corresponding receptors. There are still other, non-apoptotic paths to cell death (necrosis, autophagy, mitotic catastrophe) that are gaining attention as barriers to transformation.

How the Immune System Prevents Cancer

The immune system has three key responsibilities when it comes to preventing cancer:

Suppression of viral infections, which when unchecked can induce certain kinds of tumors
Timely elimination of pathogens, to reduce the extent and duration of inflammation, which often promotes tumorigenesis
Immunosurveillance, in which transformed cells are identified and destroyed before they can establish malignancy.

The idea that the immune system might recognize and destroy tumor cells was conceived 50-100 years ago. This concept of “immunosurveillance” remained controversial, and saw little progress until the 1990s. Does this story sound familiar? It’s much like the story of cancer and metabolism, which also saw a long period of general ignorance before its “rediscovery” in the 1990s. Mice get the credit for rekindling interest in the immune system’s tumor suppressor potential: specifically, mice rendered immunocompromised by loss of interferon (IFN) signaling or T-cell function. Such animals were significantly more susceptible to sarcomas after exposure to methylcholanthrene (MCA), implicating a role for the immune system in preventing these tumors in healthy mice.

Over the last 10 years, work from many labs (including the authors’) has demonstrated how the immune system works to prevent outgrowth of many types of primary and transplanted tumors. The RAG2-knockout mouse, which lacks T-cells, B-cells, and natural killer T (NKT) cells, develops more spontaneous cancer lesions and is also more susceptible to MCA-induced sarcoma. Interestingly, a significant portion (40%) of tumors that develop in RAG2-knockout mice are rejected when transplanted to immunocompetent (wild-type) mice, demonstrating that normal immune system function successfully suppresses these cells. Sarcomas induced in wild-type mice (with MCA), however, grow unrestricted when transplanted to other mice. These observations suggest a dual role for the immune system: in wild-type mice, it protects against tumor development, but also edits the immunogenicity of developing tumors, allowing them to grow unimpeded when transplanted to healthy mice.

The Three E’s: Elimination, Equilibrium, and Escape

The authors have come to view immunoediting as a dynamic process with three distinct phases:

Credit: Strausberg, Genome Biol. (2005) 6:211

Elimination, when innate and adaptive immune cells work together to identify and destroy tumor cells before a malignancy can form.
Equilibrium, a phase when the immune system contains tumor outgrowth but does not eliminate transformed cells entirely.
Escape, in which tumor cells grow unrestricted by the immune system, and develop into clinically apparent disease.

Both elimination and equilibrium might be considered satisfactory clinical endpoints for a patient, because tumor cells are either destroyed entirely or held in check to prevent outgrowth of disease.

The transition from equilibrium to escape is facilitated, at least in part, by the micro-evolution of the tumor cells during equilibrium. Immune recognition and destruction exert selective pressure that favors less immunogenic tumor cells. Also aiding tumor escape is the breakdown of the immune system, either naturally (as a person ages) or as a direct result of immunosuppression (often induced by the tumor).

The Mouse Evidence: Knockout and Induced Tumors

Humans and mice have similar immune systems, with a largely overlapping repertoire of immune cells and effector molecules. Studies using mouse strains deficient for specific genes, and tumors induced by the carcinogens MCA (sarcoma) and DMBA/TPA (papilloma), have demonstrated that NK cells and cytotoxic lymphocytes (CTLs) suppress tumor initiation and growth in vivo. Interferon signaling and other immune effector pathways also play key roles in immunosurveillance, as demonstrated by the increased tumor susceptibility in mice lacking perforin, IFN-γ, IFNGR1, TRAIL, IL-12, TNF-α, and DNAM-1.

Numerous cytokine molecules and receptors have also been implicated in controlling induced tumors. Mice deficient in IL-12, for example, develop more papillomas than wild-type mice. Interestingly, mice lacking IL-23 or IL-17A are resistant to tumor development, suggesting a tumor-promoting role for these cytokines. Meanwhile, DMBA/TPA exposure in mice lacking the TRAIL receptor did not affect the number of induced tumors, but did increase the rate of metastasis to lymph nodes (compared to wild-type mice), indicating a role for TRAIL-R in suppressing metastasis.

Aging Studies and Spontaneous Tumor Development

The incidence of spontaneous tumors in normal mice is very low, possibly because they have long telomeres. Many immunodeficient strains fail to develop tumors even after two years of observation. Aging studies in knockout mice, however, have elucidated the roles of certain genes, effector molecules, and immune cells in the defense against spontaneous tumors. This is an elegant type of experiment that requires some patience; one simply removes specific components of the murine immune system and monitors the mice for spontaneous tumor development. One striking discovery highlighted in this review was the incidence of immunogenic B-cell lymphomas, which increases from 0-6% in wild-type mice to 40-60% in mice lacking perforin, a cytolytic protein used by NK cells and T-lymphocytes. Penetrance of lymphomas in these mice is even higher when they also lack MHC class I accessory molecules (B2M) or IFN-γ. These observations support the importance of “cytotoxic” immune cells in protecting against spontaneous tumors.

Aging experiments have also been performed in mice lacking specific immune cell types. RAG-2 knockout mice, for example, develop significantly more epithelial tumors (35% gastrointestinal, 15% lung), even when raised on broad-spectrum antibiotics in a pathogen-free facility. RAG-2 knockouts that also lack STAT1, a key player in type I/II interferon signaling, develop an earlier and broader spectrum of malignancy, including colon and mammary adenocarcinomas.

Loss of Equilibrium

The equilibrium phase, in which the immune system holds tumors in check but fails to eliminate them entirely, is an interesting phenomenon. Here we observe a dynamic balance between a powerful immune system response and a genetically heterogeneous population of tumor cells that can persist for a number of years. It has become clear that adaptive immunity, and not innate immunity, takes the lead in controlling tumor outgrowth. This has been demonstrated by experiments in which healthy mice are subjected to low levels of carcinogen exposure (which tends to induce few tumors) and later depleted for CD4+/CD8+ T-cells and/or IFN signaling. As many as 50% of apparently tumor-free mice develop sarcomas at the injection site upon this depletion, suggesting that micro-tumors were present but held in check by adaptive immunity. Granted, the tumors that arise after immunodepletion tend to be highly immunogenic; when transplanted to healthy mice, 40% are rejected by the competent immune response. In contrast, sarcomas obtained from mice that were not immunodepleted tend to grow progressively when transplanted.

The Human Evidence: Immunodeficiency and Immunosuppression

Although we have fewer experimental liberties with human subjects, clinical and epidemiological data have proven useful. Human patients with specific perforin mutations, for example, not only develop familial hemophagocytic lymphohistiocytosis as adults, but have recently been shown to also develop leukemia and lymphoma. Surveillance of human patients with AIDS has shown an increased frequency of several malignancies due to the immunodeficiency. Most often, these tumors are induced by pathogens, such as Epstein-Barr virus (lymphoma), herpesviruses (Kaposi’s sarcoma), and human papilloma virus (cervical cancer), that fail to be eliminated by the deficient immune system.

Intentional immunosuppression in the recipients of organ transplants can also increase the risk of cancer. Patients receiving kidney transplants, for example, exhibit a three-fold increase in overall cancer risk. Most of these cancers, too, are virus-associated tumors, though there’s also an increased risk for colon, lung, pancreas, and other non-infectious cancers. Renal transplant patients are a dramatic example; these individuals have a 200-fold (yes, two hundred) increased risk of non-melanoma skin cancers, highlighting the importance of immunosurveillance in tumors induced by exposure to UV radiation. Further, the duration of pharmacologically induced immunosuppression and incidence of cancer are positively correlated; that is, the longer the immune system is suppressed, the more likely a tumor will form. Taken together, these observations support the importance of immunosurveillance in preventing human cancers.

Further evidence of the immunity-cancer relationship, particularly the equilibrium phase, is offered by the occasional organ recipients who develop cancer that originated from the organ donor. I’m horrified to hear that this can happen, but it does. Often, the donors had died of other causes and bore no signs of clinically detectable disease, suggesting that their immune systems had held cancerous cells in check. The combination of a naive immune system and the immunosuppressive therapies required for successful engraftment allows these tumors to grow without restriction in the unfortunate recipient.

Miracles Happen: Spontaneous Tumor Regression

Perhaps the most compelling evidence for the anti-cancer role of the immune system is the spontaneous regression of melanoma tumors accompanied by T-cell clonal expansion. This phenomenon suggests the ability of CD4+ and CD8+ T-cells to identify tumor-specific antigens and destroy cancerous cells. As many as 100 tumor-associated antigens (TAAs) generate an antibody response in patient serum, though only 8 have been observed in multiple studies. This suggests that TAAs, much like somatic mutations, are largely unique to individual tumors. T-cell responses vary from antigen to antigen; for example, responses to MAGE family antigens are rare, whereas responses to melanocyte differentiation antigen (MART/Melan-A) are seen in >50% of healthy individuals.

More studies are needed here to catalogue TAAs and quantify their antigenicity across patient populations. This is also where high-throughput sequencing of tumor genomes might offer useful information. Knowledge of the full set of protein-coding mutations in a tumor might shed light on its immunogenic potential, or vice-versa, thereby leading to better-informed prognoses and treatment decisions.

Tumor-Infiltrating Lymphocytes and Disease Prognosis

Even without complete tumor regression, the presence and quality of tumor-infiltrating lymphocytes (TILs) – NK cells, T-cells, and NKT cells – is associated with a favorable prognosis for numerous tumor types. This correlation was first observed in melanoma, where patients with high CTL infiltration of their tumors survived longer. A “landmark” study in ovarian cancer found that 38% of patients with high TIL numbers survived longer than 5 years, compared to 4.5% of patients with low TIL numbers. Studies in colon and lung cancers have found that the type and density of TILs were a more powerful prognostic indicator than the clinical stage of the tumor.

There is, of course, a downside to TILs when they are macrophages or regulatory T cells: high numbers of these cells are associated with a poorer prognosis, possibly due to their immunosuppressive functions.

Inflammation and Tumor Development

Chronic inflammation can contribute to cancer by inducing genotoxic stress, cell proliferation, and angiogenesis, and even by enhancing tissue invasion. Even so, the tumor-promoting activities of inflammation and the tumor-suppressing actions of the immune system are not mutually exclusive. In the authors’ mouse model of MCA-induced sarcoma, for example, tumor development requires several inflammatory molecules (MyD88, IL-10, IL1B, and IL-23), but these factors also induce the host-protective immune response (IFN and T-cells) that destroys the tumors. In other primary carcinogen models, MyD88 and IL1B promote tumor development, but also facilitate the recognition of dying tumor cells that leads to anti-tumor immunity.

Inflammation also plays an important role in the transition from equilibrium to escape, when inflammatory and regulatory immune cells are recruited to the tumor and then subverted to dampen anti-tumor immunity, allowing cancer progression. Indeed, the authors suggest that the pro-inflammatory transcription factors NF-κB and STAT3 may be valuable therapeutic targets, whose inhibition may facilitate the transition from tumor-promoting inflammation to tumor-suppressing immunity.

References
Vesely MD, Kershaw MH, Schreiber RD, & Smyth MJ (2010). Natural Innate and Adaptive Immunity to Cancer. Annual Review of Immunology PMID: 21219185

Driver Mutations and Metastasis

November 30, 2010 by Dan Koboldt

Two recent papers used very different approaches to shed light on the genetic alterations underlying tumor growth and progression in human cancers. Peter Campbell and colleagues from the Wellcome Trust Sanger Institute employed Illumina paired-end sequencing to survey the landscape of structural variation in metastatic pancreatic cancer. Ivana Bozic and colleagues from Harvard University took a different approach – they constructed mathematical models of tumor progression via the accumulation of driver and passenger mutations. I happened to read both papers on a long airplane ride, and learned a great deal about mutations and metastasis in human cancers.

Pancreatic Cancer: Bad News

You learn a lot from the introduction sections of these papers, even if the Letter to Nature format keeps them short. I knew that pancreatic cancer had, in general, a poor prognosis. It turns out that the five-year mortality for this cancer is 97-98%, usually due to “widespread metastatic disease.” These tumors also appear to carry a heavy mutational load. A 2008 survey of 24 pancreatic cancers (by Bert Vogelstein’s group at Johns Hopkins) found that tumors had ~63 genetic alterations on average, the majority of which were point mutations. Copy number changes are also common in this cancer type. Frequently mutated genes include tumor suppressors (TP53, SMAD4, CDKN2A) as well as oncogenes (KRAS, MYC). Less was known about the patterns of structural variation in pancreatic cancer.

Detecting Rearrangements by Paired-End Sequencing

Peter Campbell’s group has developed a very nice strategy for identifying somatically acquired rearrangements by massively parallel paired-end sequencing on the Illumina platform. They’ve already applied it to the characterization of SVs in several cancer cell lines. In this study, they generated 50-150 million read pairs (2 x 37 bp) per patient, which, in their experience, enables detection of 50-60% of rearrangements in a sample. Across the 13 pancreatic tumors, they identified 381 somatic and 177 germline rearrangements across seven categories: amplicon, deletion, tandem duplication, inversion, fold-back inversion, interchromosomal (translocation), and “other” intrachromosomal.
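The sequencing yield implied by these numbers is easy to check, and it lines up with the ~4-10 Gbp per sample the post mentions in its wrap-up. This is only back-of-the-envelope arithmetic using the read counts and read length quoted above; the function name is ours.

```python
# Read-pair counts and read length as quoted in the post:
# 50-150 million pairs per patient, 2 x 37 bp reads.
READ_LENGTH_BP = 37
READS_PER_PAIR = 2


def sequence_yield_gbp(read_pairs: int) -> float:
    """Total sequenced bases, in gigabase pairs, for a given number of read pairs."""
    return read_pairs * READS_PER_PAIR * READ_LENGTH_BP / 1e9


low = sequence_yield_gbp(50_000_000)
high = sequence_yield_gbp(150_000_000)
print(f"{low:.1f}-{high:.1f} Gbp per patient")
```

That works out to roughly 3.7-11.1 Gbp, or about 1-4x sequence coverage of a ~3.1 Gbp haploid human genome, which is light by whole-genome standards.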

Many rearrangements corresponded with a change in copy number. In one metastasis, for example, numerous rearrangements (some inverted, some not) combine to amplify the KRAS oncogene.

Rearrangement/Amplification of KRAS (Credit: Nature).

Fold-back Inversions and Inter-Lesion Genetic Heterogeneity

One sixth of the rearrangements identified fell into a class the authors call “fold-back” inversions. These are genomic regions that are duplicated, but the two copies face in opposite directions from the breakpoint (as opposed to a tandem duplication). The authors suggest breakage-fusion-bridge cycles as the likely mechanism that creates such an event. Basically, a double-strand break that occurs during G0-G1 phase is replicated (in S phase), creating two duplicated end sequences. These are fused together by DNA repair processes, resulting in a sort of inverted duplication (fold-back inversion) with two centromeres. These “dicentric” chromosomes are unstable, and frequently initiate the amplification of oncogenes.
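To make the read-pair signatures concrete, here is a toy classifier that assigns a single discordant read pair to one of the categories above based on chromosome, strand, and span. It is a sketch under stated assumptions, not the authors’ pipeline: it assumes a standard forward-reverse (FR) paired-end library, and the `ReadPair` type and both thresholds are hypothetical values chosen for illustration. Under those assumptions, a same-strand pair with closely spaced ends is the pattern expected from a fold-back inversion, where a sequence is joined to its own nearby reverse complement.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
EXPECTED_MAX_INSERT = 500   # bp; +/- pairs spanning more than this suggest a deletion
FOLDBACK_MAX_SPAN = 5_000   # bp; same-strand pairs closer than this are fold-back candidates


@dataclass
class ReadPair:
    """Hypothetical record for one read pair's mapped positions and strands."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p: ReadPair) -> str:
    """Assign one read pair (FR library assumed) to a rearrangement class."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    # Order the two ends by position so the leftmost read comes first.
    (pos_l, str_l), (pos_r, str_r) = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    span = pos_r - pos_l
    if str_l == str_r:
        # Same-strand pairs indicate an inversion; a short span hints at a
        # fold-back (inverted duplication) junction.
        return "fold-back inversion" if span <= FOLDBACK_MAX_SPAN else "inversion"
    if str_l == "-" and str_r == "+":
        # Outward-facing pair: the signature of a tandem duplication junction.
        return "tandem duplication"
    if span > EXPECTED_MAX_INSERT:
        # Inward-facing but too far apart: sequence between them is missing.
        return "deletion"
    return "concordant"


print(classify_pair(ReadPair("chr12", 25_000_000, "+", "chr12", 25_001_200, "+")))
```

Real callers cluster several supporting pairs per event, compare against the matched normal, and need copy-number data to call amplicons; a single pair classified this way is only a hint.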

Each rearrangement was [laboriously] genotyped by PCR in both the index tumor sample and the matched normal control to verify its somatic status. PCR and capillary sequencing were also used to resolve breakpoints, and some 206 rearrangements were genotyped across multiple lesions (metastases) in the 10 patients for whom metastatic samples were available. There was considerable genetic heterogeneity among samples from the same patient. While the majority of rearrangements were present in all tumor samples but absent from the germline (omnipresent), several were present in some samples but not others (partially shared) or unique to the index tumor sample (private).
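The three sharing categories amount to a simple rule over the set of lesions in which PCR detected a rearrangement. A minimal sketch follows; the function and sample names are hypothetical, and it assumes, as in the study design described above, that every rearrangement was first discovered in the index sample, so a rearrangement seen in only one sample is private to that index sample.

```python
from typing import Set


def sharing_status(present_in: Set[str], tumor_samples: Set[str], in_germline: bool) -> str:
    """Classify one rearrangement by the tumor samples in which it was detected."""
    if in_germline:
        return "germline"          # present in the matched normal: not somatic
    if tumor_samples <= present_in:
        return "omnipresent"       # in every tumor sample, absent from the normal
    if len(present_in) == 1:
        return "private"           # only in the (index) sample where it was found
    return "partially shared"      # in some tumor samples but not others


# Hypothetical patient with a primary tumor and two metastases.
samples = {"primary", "met_liver", "met_lung"}
calls = {
    "rearr_A": {"primary", "met_liver", "met_lung"},
    "rearr_B": {"primary", "met_liver"},
    "rearr_C": {"primary"},
}
for name, found in calls.items():
    print(name, sharing_status(found, samples, in_germline=False))
```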

Telomere Loss and Breakage-Fusion-Bridge Cycles

Fold-back inversions were significantly more likely than other classes of rearrangement to be omnipresent, suggesting that they occur early during tumor progression, before cancer cells disseminate. Because breakage-fusion-bridge cycles are often initiated by telomere loss, the activity of telomerase to maintain telomeres may play a pivotal role in the development of pancreatic cancer. Other studies have shown that telomerase expression is low in early tumor stages, but markedly increased in the invasive tumor. The increased expression likely suppresses breakage-fusion-bridge cycles, which may help explain why fold-back inversions are more likely to occur earlier in the development of the disease.

Ongoing Evolution in Tumors and Mets

In several patients, the authors found rearrangements that were in the primary tumor and some metastases, but not all of them. The most likely explanation for such a pattern is that the metastases were “seeded” by different cells from the primary tumor. This is intriguing, because it suggests ongoing clonal evolution, in the primary tumor, among cells capable of initiating metastases. There were also rearrangements in some metastases that weren’t detected in the primary tumor, suggesting that secondary lesions, too, are undergoing clonal evolution.

Overall, the authors demonstrated that pancreatic cancers and their metastases show a substantial amount of genetic heterogeneity within the same patient. There’s certainly more to be done to get the full picture of genetic alterations in these tumors, but at just ~4-10 Gbp of data per sample, the scope and nature of what the authors have uncovered is pretty impressive.

Drivers and Passengers

The other paper (contributed by Bert Vogelstein to PNAS) took a theoretical approach to modeling the accumulation of driver and passenger mutations during tumor progression. In contrast to previous models that account for only 1-2 mutations, the authors develop a model in which mutations occur sequentially in tumor cells, with each new driver mutation conferring a slightly faster growth rate. This more closely reflects recently characterized solid tumors, which harbor 40-100 coding gene alterations, of which 5-15 are considered “driver” mutations.

Based on the assumption that any human cell contains 286 tumor suppressor genes and 91 oncogenes, the authors estimate that ~34,000 positions in the human genome could host a driver mutation. By this estimate, the driver mutation rate is approximately 3.4 × 10⁻⁵ per cell division. Under the authors’ assumption that each driver speeds tumor growth, drivers accumulate faster and faster, because the more drivers a cell has, the faster it divides. Not every driver mutation succeeds, however: a driver only reduces the probability that a cell will senesce or die; it does not eliminate that risk. The authors considered a mutation in a tumor suppressor gene to be the central rate-limiting factor, since the remaining working copy tends to be lost relatively quickly through large-scale LOH events.
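Two pieces of arithmetic help here. First, dividing the driver mutation rate by the number of driver positions gives the per-base point mutation rate the estimate corresponds to, about 10⁻⁹ per base per division. Second, the “faster and faster” effect can be sketched with a toy birth-death rule. The rule below is our illustrative assumption, not something stated in the post: a cell with j drivers dies with probability (1 − s)^j / 2 and otherwise divides in two, with a hypothetical per-driver advantage s = 0.004. Under that assumption, a cell with no drivers exactly replaces itself on average, and every extra driver nudges the expected number of offspring above 1.

```python
import math

DRIVER_POSITIONS = 34_000           # from the post
DRIVER_RATE_PER_DIVISION = 3.4e-5   # from the post

# Per-base point mutation rate implied by the two figures above (about 1e-9).
per_base_rate = DRIVER_RATE_PER_DIVISION / DRIVER_POSITIONS

# Hypothetical per-driver selective advantage, for illustration only.
S = 0.004


def death_probability(j: int, s: float = S) -> float:
    """Assumed chance that a cell carrying j drivers dies instead of dividing."""
    return (1 - s) ** j / 2


def expected_offspring(j: int, s: float = S) -> float:
    """Mean number of cells left by one cell with j drivers after one cycle."""
    return 2 * (1 - death_probability(j, s))


print(f"implied per-base rate: {per_base_rate:.1e}")
for j in range(6):
    print(j, round(expected_offspring(j), 5))
```

Because a larger, faster-growing clone undergoes more divisions per unit time, it also draws more chances at the next driver, which is the feedback loop the model describes.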

Six simulated patients were modeled and presented in this study. All of them started with one driver mutation. Strikingly, though all of the input values (mutation rate, division rate) were the same, there was enormous variation in the rates of tumor progression between simulated patients. Patient 1, for example, went 20 years before acquiring a second driver mutation, and the size of the tumor remained small (<5 g). In contrast, patient 6 had a secondary driver mutation in less than 5 years; by the end of the simulation, that tumor weighed hundreds of grams. While this model is undoubtedly an oversimplification, it does highlight the importance of, well, random chance. Given the large size of the human genome and the relatively small number of potential driver mutations, an individual’s fate hinges on stochastic processes. If you’re lucky, you go decades without picking up that crucial second hit. If you’re unlucky, you don’t.
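A much cruder toy than the authors’ model shows why identical inputs can produce very different outcomes. If each cell division independently yields a new driver with probability u = 3.4 × 10⁻⁵, the number of divisions until the first new driver follows a geometric distribution, whose standard deviation is about as large as its mean (roughly 1/u ≈ 29,400 divisions). The sketch below ignores clone growth, selection, and cell death entirely, and the seed and function name are arbitrary choices; it only illustrates how wide the spread is.

```python
import math
import random

U = 3.4e-5  # driver mutation rate per cell division, from the post


def divisions_until_new_driver(rng: random.Random, u: float = U) -> int:
    """Draw the number of divisions until one produces a new driver (geometric)."""
    # Inverse-transform sampling: P(W > k) = (1 - u)**k for k = 0, 1, 2, ...
    x = 1.0 - rng.random()  # in (0, 1], which avoids log(0)
    return max(1, math.ceil(math.log(x) / math.log1p(-u)))


# Six "patients" with identical parameters, differing only by chance.
rng = random.Random(2010)
waits = [divisions_until_new_driver(rng) for _ in range(6)]
print(waits)
```

For the same reason, the chance of getting through 1/u divisions with no new driver at all is (1 − u)^(1/u), or about e⁻¹ ≈ 0.37.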

Intuitively, this seems reasonable, given the anecdotal evidence of de novo cancers, which seem to strike somewhat randomly. Of course, the older you are, the more times your cells divide, and the better chance you have of picking up additional driver mutations. And environmental exposures (like smoking and radiation exposure) certainly have a role to play, because they increase cellular mutation rates. Even so, if you believe in the model, chance plays a significant role.

Here’s to hoping you’re one of the lucky ones.

References

Bozic I, Antal T, Ohtsuki H, Carter H, Kim D, Chen S, Karchin R, Kinzler KW, Vogelstein B, & Nowak MA (2010). Accumulation of driver and passenger mutations during tumor progression. Proceedings of the National Academy of Sciences of the United States of America, 107 (43), 18545-50 PMID: 20876136

Campbell PJ, Yachida S, Mudie LJ, Stephens PJ, Pleasance ED, Stebbings LA, Morsberger LA, Latimer C, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal SA, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Griffin CA, Burton J, Swerdlow H, Quail MA, Stratton MR, Iacobuzio-Donahue C, & Futreal PA (2010). The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature, 467 (7319), 1109-13 PMID: 20981101