
    Major Exome Platforms Compared

    October 26th, 2011

    A few recent studies have sought to compare commercial exome sequencing technologies. These kits, which selectively target coding regions for next-generation sequencing, have matured rapidly over the past couple of years. I like the recent study out of Michael Snyder’s lab (Stanford) the best. In it, the authors compared three major exome platforms – Agilent’s SureSelect Human All Exon (50 Mbp), Roche/Nimblegen’s SeqCap EZ v2.0, and Illumina’s TruSeq Exome Enrichment – to each other and to whole-genome sequencing (35x), all for a single individual.

    From Figure 1: Major exome platforms (Clark et al, Nat. Biotech., 2011)

    Differences in Target Space

    First off, a comparison of the declared exome targets for each platform.

    Credit: Clark et al, Nat. Biotech. 2011

    A large number of bases (29.45 Mbp), presumably the “meat” of the exome, are targeted by all three platforms. Individually, the platforms have 4-28 Mbp of unique target space. Agilent does better for Ensembl transcripts; Nimblegen has better coverage of miRNAs. These two platforms share more target space with each other than either does with the Illumina platform, primarily because Illumina goes after untranslated regions (UTRs). I can’t decide if this is an advantage or not. On one hand, it certainly appeals to the investigator interested in UTR variation. On the other, that’s a lot to sequence. Indeed, the authors note that 50 million 2×100 bp reads yield only 30x coverage on the Illumina platform, compared to 60x for Agilent and 68x for Nimblegen.

    Target Enrichment Efficiency and GC Content

    The authors performed exome capture and sequencing on a single sample – a healthy volunteer of European descent – using all three exome kits. Each exome library got one lane of 2×100 bp reads on the Illumina HiSeq 2000 (11 to 18 Gbp per library). BWA mapped 99% of these to the reference sequence, and some 10-15% were PCR duplicates. Overall targeting efficiency was measured using 80 million reads for each exome, and evaluating the fraction of bases covered at 10x, 20x, and 30x. The authors wrote “At all read counts and depth cut-offs, the Nimblegen platform enriched a higher percentage of its targeted bases than the other two platforms.” They attribute this efficiency to the higher-density, overlapping baits used by the Nimblegen platform.
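
    The "fraction of bases covered at 10x, 20x, and 30x" metric can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `breadth_of_coverage` and the toy depth values are assumptions, and a real analysis would read per-base depths for the target regions from an alignment file.

```python
from typing import Dict, Iterable, Sequence


def breadth_of_coverage(depths: Sequence[int],
                        thresholds: Iterable[int] = (10, 20, 30)) -> Dict[int, float]:
    """Return the fraction of targeted bases with depth >= each threshold."""
    if not depths:
        raise ValueError("no target bases supplied")
    total = len(depths)
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


# Toy per-base depths for a ten-base target (illustrative values only).
toy_depths = [0, 5, 10, 12, 19, 20, 25, 30, 31, 44]
coverage = breadth_of_coverage(toy_depths)
print(coverage)
```

    Comparing platforms at a fixed read count, as the authors did with 80 million reads, amounts to computing this dictionary once per platform over that platform's own target bases.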

    Unsurprisingly, all platforms demonstrated a marked reduction in coverage over high and low GC targets. At low GC (40% to 20%), however, the Agilent platform showed only a slight decrease in read depth, possibly due to fewer PCR cycles, longer baits, and/or the use of RNA probes that were unique to this platform.

    Detection of Single Nucleotide Variants (SNVs) and Small Indels

    Detection of small sequence variants, especially SNVs, is a major goal of exome sequencing. Using the normalized ~80m read sets, the authors performed SNV detection (using GATK) in each exome. All three platforms showed high concordance between SNV calls and high-density SNP array genotypes. The reference allele was slightly favored (0.53-0.55) at SNP positions, suggesting slight mapping bias against variant-containing reads. However, there were no biases toward or against specific substitution types. For all platforms, the SNV count increased as the coverage increased. This increase was not linear, however; at 30 million reads, over 95% of SNVs were detected. In shared regions, Nimblegen consistently captured the most SNVs and became saturated with the lowest number of reads.
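
    One simple way to quantify the reference bias mentioned above is to pool read counts at heterozygous SNP positions and compute the share of reads carrying the reference allele. This is only a sketch: pooling counts across sites is an assumption (the authors may have averaged per-site fractions instead), and the name `reference_allele_fraction` and the counts below are hypothetical.

```python
from typing import Iterable, Tuple


def reference_allele_fraction(sites: Iterable[Tuple[int, int]]) -> float:
    """Pooled fraction of reads supporting the reference allele.

    Each site is a (reference_read_count, variant_read_count) pair taken
    at a position known to be heterozygous.
    """
    ref_total = 0
    alt_total = 0
    for ref_reads, alt_reads in sites:
        ref_total += ref_reads
        alt_total += alt_reads
    if ref_total + alt_total == 0:
        raise ValueError("no reads at the supplied sites")
    return ref_total / (ref_total + alt_total)


# Illustrative counts only; an unbiased experiment would give about 0.50.
het_sites = [(11, 9), (16, 14), (27, 23)]
fraction = reference_allele_fraction(het_sites)
print(fraction)
```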

    Nimblegen also detected the most indels in shared and RefSeq regions, owing to more efficient capture and thus deeper coverage. At low read counts, Agilent detected more indels in shared regions, but at 50 million reads, Illumina surpassed Agilent (and, unsurprisingly, detected many more UTR indels). Most indels were 1bp in size, though the authors saw slight enrichment of indels in the 4bp and 8bp bins (consistent with human-primate genome comparisons), as well as the multiple-of-three enrichment expected due to selection against frameshift mutations.

    Comparison with Whole-Genome Sequencing

    A key strength of this study was that the authors also performed whole-genome sequencing to 35x mean coverage on the sample that was evaluated. WGS data had 98.5% concordance at heterozygous SNP positions as detected by SNP array. To simulate the multiplexed sequencing of 3 or 6 exome libraries per lane (GAIIx or HiSeq, respectively), the authors normalized exome datasets to 50 million reads apiece. In each exome-WGS comparison, the WGS dataset was restricted to regions targeted by that exome product. This step seems necessary for an apples-to-apples comparison, but I should note that it minimizes the strength of WGS, which provides relatively unbiased coverage across all coding regions. In other words, this restriction slightly favors the exome dataset by examining only regions that its platform was willing, and able, to target.

    From Figure 6: Overlap of SNVs called by Exome and WGS (Clark et al, Nat. Biotech 2011)

    The vast majority of SNVs in exome space were detected by both exome and WGS data, but there were some differences. Notably, the exome-specific and WGS-specific calls in each comparison tended to have (1) lower confidence scores, (2) higher proportions of novel-to-dbSNP variants, and  (3) better coverage in the detection platform. WGS-specific SNVs often had zero reads in the exome data (probably hybridization failure). In contrast, most exome-specific SNVs had coverage in WGS, though it tended to be lower.
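
    The comparison logic (restrict both call sets to the exome's target regions, then split into shared, exome-specific, and WGS-specific calls) can be sketched as follows. It is a toy, single-chromosome version under stated assumptions: targets are sorted, non-overlapping, half-open intervals, and the names `in_targets` and `compare_calls` are mine, not from the paper.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def in_targets(pos: int, targets: List[Interval]) -> bool:
    """True if pos lies inside one of the sorted, non-overlapping targets."""
    starts = [start for start, _ in targets]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and targets[i][0] <= pos < targets[i][1]


def compare_calls(exome: Set[int], wgs: Set[int],
                  targets: List[Interval]) -> Dict[str, Set[int]]:
    """Restrict both call sets to the targets, then partition them."""
    exome_in = {p for p in exome if in_targets(p, targets)}
    wgs_in = {p for p in wgs if in_targets(p, targets)}
    return {
        "shared": exome_in & wgs_in,
        "exome_only": exome_in - wgs_in,
        "wgs_only": wgs_in - exome_in,
    }


targets = [(100, 200), (300, 400)]
exome_calls = {150, 320, 390}
wgs_calls = {150, 250, 320, 399, 500}  # 250 and 500 fall outside the targets
overlap = compare_calls(exome_calls, wgs_calls, targets)
print(overlap)
```

    Note how the WGS calls at 250 and 500 simply vanish from the comparison, which is the sense in which restricting to target space favors the exome dataset.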

    It seems clear from this figure that the number of SNVs detected by exome and WGS is correlated to the “reach” of the exome platform. Illumina, which had the biggest target space and also went after UTRs, had the highest number of shared SNVs. Agilent had more than Nimblegen, but Nimblegen’s sensitivity for true positives in its target regions was much higher than that of the other two platforms.

    How to Choose an Exome

    The authors conclude that all three exome platforms are pretty good. Choosing among them probably depends on the goals, priorities, and budget of the investigator. For the cost-conscious, Nimblegen offers the most efficient enrichment of exons (and also of miRNAs). For the variant-hunters, Agilent provides a wider reach but requires a bit more sequence data. Illumina requires the most sequence data, but it alone surveys untranslated regions, which might appeal to some researchers.

    References

    Clark MJ, Chen R, Lam HY, Karczewski KJ, Chen R, Euskirchen G, Butte AJ, & Snyder M (2011). Performance comparison of exome DNA sequencing technologies. Nature biotechnology, 29 (10), 908-14 PMID: 21947028


    Whole-genome sequencing and clinical annotation

    September 16th, 2011

    Next-generation sequencing has immense transformative potential for medicine in the coming decade. Rapid, economical whole-genome sequencing can provide a wealth of information useful for diagnosis, treatment, and even prevention of disease. Very soon (if not already), generating whole-genome sequencing data will be routine. The challenges will lie in accurate variant calling, phasing, annotation, and clinical interpretation.

    A new study in PLoS Genetics reports the whole-genome sequencing and detailed genetic risk assessment of a family quartet with a history of familial thrombophilia. There’s a lot to like about this paper, but let me give you the highlights.

    • Construction of and alignment to an ethnicity-specific major allele reference sequence yielded improved alignment and more accurate genotyping, especially at disease-associated loci.
    • Mendelian inheritance state analysis in the family structure enabled identification and removal of >90% of variants arising from sequencing errors.
    • Per-trio phasing, inheritance state of adjacent variants, and population-level linkage disequilibrium data were integrated to provide long-range phased haplotypes.
    • By fine-mapping recombination events to sub-kilobase resolution, the authors were able to perform sequence-based human lymphocyte antigen (HLA) typing.
    • A curated database of genotype-phenotype correlations made it possible to construct comprehensive genetic risk profiles, including multigenic risk of inherited thrombophilia, common disease susceptibility, and pharmacogenomics.

    Advantages of an Ethnically-Concordant Reference Sequence

    The human reference sequence is a composite, assembled using pooled sequence data from about 20 individuals. Several groups have reported that the current reference harbors a number of biases – some alleles represented are the minority of those present in world populations, and insertions are better represented than deletions. Using SNP genotype data from the 1000 Genomes Project (~6-10m loci), the authors of this study developed three ethnicity-specific reference sequences for the CEU (Western Europe), YRI (Sub-Saharan Africa), and CHB/JPT (Han Chinese / Tokyo Japanese) populations. They did so by determining the major allele in each population, and swapping it in when the NCBI reference base differed. This resulted in ~1.6 million substitutions for each population reference:

    Credit: Dewey et al, PLoS Genetics 2011.

    There were almost 800,000 positions where the reference allele was not the major allele in all three populations. Thus, at roughly 10% of SNP positions examined, the NCBI reference sequence contained a minor allele relative to European, African, and Asian populations.
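
    The major-allele swap itself is conceptually simple. Here is a minimal sketch, assuming a reference held as a Python string, 0-based positions, and per-population allele frequencies in a dictionary; the function name `build_major_allele_reference` and the toy data are hypothetical, and this is not the authors' actual code.

```python
from typing import Dict, Tuple


def build_major_allele_reference(
        reference: str,
        allele_freqs: Dict[int, Dict[str, float]]) -> Tuple[str, int]:
    """Swap in the population's major allele wherever the reference differs.

    allele_freqs maps a 0-based position to {allele: frequency}.
    Returns the new sequence and the number of substitutions made.
    """
    bases = list(reference)
    substitutions = 0
    for pos, freqs in allele_freqs.items():
        major = max(freqs, key=freqs.get)  # most frequent allele at this site
        if bases[pos] != major:
            bases[pos] = major
            substitutions += 1
    return "".join(bases), substitutions


toy_reference = "ACGTAC"
toy_freqs = {
    1: {"C": 0.3, "T": 0.7},    # reference C is the minor allele: swap
    3: {"T": 0.9, "G": 0.1},    # reference T is already the major allele
    4: {"A": 0.45, "G": 0.55},  # reference A is the minor allele: swap
}
new_reference, n_swaps = build_major_allele_reference(toy_reference, toy_freqs)
print(new_reference, n_swaps)
```

    Because only single-base substitutions are made, coordinates stay identical to the original reference, which keeps existing annotations usable.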

    Self-reported ethnicity of the parents in the quartet was northern/western European, a claim largely confirmed by PCA analysis. The authors therefore aligned all genomes to the CEU major allele reference, resulting in a small increase (0.1%) in the fraction of reads mapped by BWA. This seems like a small fraction, but it works out to around 6 million reads across the four samples. Presumably, more reads were mapped because the population-matched reference reduces allele-specific mapping bias (ASMB) against non-reference bases. Next, the authors compared variants to an internally-curated database of genotype-phenotype correlations, identifying 9,389 correlated variants in the family quartet. This number would have been 10,396 if the NCBI reference were used, indicating that 10% of disease-associated markers are in fact major population alleles less likely to contribute to inter-individual variation in disease susceptibility.

    The ethnicity-matched reference also enabled a more accurate estimation of population mutation rate (7.8 × 10^-4). Using the NCBI reference, this rate was 9.2 × 10^-4, indicating that a standard reference sequence yields inflated population mutation rates.

    Mendelian Inheritance and Long-Range Haplotyping

    Whole-genome sequencing of a “nuclear” family (mother, father, son, daughter) has a number of advantages:

    • It enables comprehensive Mendelian inheritance analysis, to facilitate the removal of false-positive variants, isolate putative de novo mutations, and even identify regions of structural variation based on blocks of Mendelian inconsistencies.
    • Meiotic crossover sites can be comprehensively surveyed, in this case to sub-kilobase resolution.
    • Trio information (each child compared to both parents) helps to phase the variants, in other words, to determine which variants are on the paternal chromosome, and which are on the maternal chromosome. This is especially useful for identifying compound heterozygotes for recessive traits.
    • Paired with population linkage information from the HapMap and 1000 Genomes Project, this information can be used to infer long-range haplotypes. On chromosome 6, the authors used haplotype and population information to accurately determine HLA genotypes for every sample.
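
    The first point, Mendelian consistency checking, can be illustrated with a short sketch. It assumes unphased diploid genotypes written as pairs of alleles and ignores sex chromosomes and real de novo mutations; the function names are hypothetical and this is far simpler than the inheritance state analysis in the paper.

```python
from itertools import product
from typing import Iterable, Tuple

Genotype = Tuple[str, str]  # unphased pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype) -> bool:
    """True if the child could inherit one allele from each parent."""
    return any(sorted((m, f)) == sorted(child)
               for m, f in product(mother, father))


def quartet_is_consistent(mother: Genotype, father: Genotype,
                          children: Iterable[Genotype]) -> bool:
    """True if every child's genotype is compatible with both parents."""
    return all(is_mendelian_consistent(c, mother, father) for c in children)


mother = ("A", "A")
father = ("A", "G")
ok_site = quartet_is_consistent(mother, father, [("A", "G"), ("A", "A")])
bad_site = quartet_is_consistent(mother, father, [("A", "G"), ("G", "G")])
print(ok_site, bad_site)
```

    A site like the second one, where a child is G/G although the mother carries no G, is either a sequencing error, a de novo event, or (when such sites cluster) a hint of structural variation.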

    The family information also made possible this fascinating mosaic of chromosomal inheritance:

    Credit: Dewey et al, PLoS Genetics 2011.

    There are obviously key benefits to having sequence data for everyone in the family. In the future, when clinical sequencing is commonplace, don’t forget to bring your parents along.

    Synonymous But Not the Same

    One downstream analysis that I particularly enjoyed was that of synonymous coding variants. These variants are often ignored in studies of human genetics, despite a growing body of evidence that they can have translational effects via codon usage bias, mRNA stability, and splice site alteration. The authors developed an algorithm to evaluate these effects for 186 rare, novel synonymous SNPs found in the family. One of these, in the gene ATP6V0A4, is predicted to significantly affect mRNA secondary structure by disrupting a stable “tetraloop” – likely reducing mRNA stability. This is relevant because homozygous loss-of-function variants in this gene have been associated with distal renal tubular acidosis (a disease in which the kidneys don’t remove enough acid into the urine).

    Clinical Annotation and Interpretation

    The authors build on their previous work to comprehensively annotate clinically-relevant variants in all family members. There’s an extensive amount of work done here, much of it hinging on the authors’ internally-developed, hand-curated database of 16,400 SNPs associated with disease traits. An analysis of rare variants bolstered with evolutionary conservation data highlighted variants in two genes related to thrombophilia: one in the F5 gene, which encodes coagulation factor V (the protein altered in factor V Leiden), associated with increased risk for thrombophilia, and another in the MTHFR gene (love that gene symbol), which predisposes carriers to hyperhomocysteinemia.

    Looking ahead to the probable treatment of family members with blood-thinning medication, the authors next undertook a pharmacogenetic analysis. Perhaps the best-known example of pharmacogenetics is warfarin (coumadin), an oral anticoagulant given to patients at risk for stroke or deep vein thrombosis (DVT). Warfarin was the fifth-most prescribed drug in the U.S. the last time I checked, but it has a narrow therapeutic window. Too little, and it has no anticoagulant effect. Too much, and it can cause internal bleeding. Variants in a number of genes have been associated with warfarin dosing, but two are predominant: CYP2C9, the primary metabolizing enzyme for the drug, and VKORC1, the drug target. In this family, all four members were homozygous for the CYP2C9*1 allele, associated with normal dose, but heterozygous for VKORC1-1639, associated with “therapeutic prolongation” of warfarin response at low doses. Based on these genotypes and patient clinical data, the authors applied the International Warfarin Dosing Algorithm to determine the appropriate dose.

    All told, this is an interesting study that clearly involved a substantial amount of work (the pre-print PDF totaled more than 100 pages). Undoubtedly, many of the strategies presented here will be useful as whole-genome sequencing moves into the clinic.

    References

    Frederick E. Dewey, Rong Chen, Sergio P. Cordero, Kelly E. Ormond, Colleen Caleshu, Konrad J. Karczewski, Michelle Whirl-Carrillo, Matthew T. Wheeler, Joel T. Dudley, Jake K. Byrnes, Omar E. Cornejo, Joshua W. Knowles, Mark Woon, Katrin Sangkuhl, Li Gong, Madeleine P. Ball, Alexander W. Zaranek, Heidi L. Rehm, George M. Church, John S. West, Carlos D. Bustamante, Michael Snyder, Russ B. Altman, Teri E. Klein, Atul J. Butte, & Euan A. Ashley (2011). Phased whole genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genetics, 7 (9)


    A Guide for Deep Sequencing of Human Genomes

    August 26th, 2011

    The incredible throughput of current second-generation sequencing platforms makes it possible to sequence a complete human genome to high coverage, with a single instrument run, in less than 2 weeks. As whole-genome sequencing becomes more routine, it is increasingly important to understand the accuracy of sequence-level analyses, such as SNP detection, and its relationship to overall sequence depth. Enter a recent study from the lab of Elliott Margulies at NHGRI. As part of the NIH Undiagnosed Diseases Program, the authors generated over 380 gigabases of sequence data from the blood sample of a male patient. This is an astonishing amount of sequence for one sample, roughly 126-fold theoretical redundancy genome-wide.

    Perhaps just as importantly, the dataset comprised four runs on two different but related platforms: the Illumina GAIIx, and the Illumina HiSeq2000. Here is a brief summary of the dataset.

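    | Dataset           | Total Gbp | Map Rate | Dup. Rate | Mapped Depth | % Genome Callable |
    |-------------------|-----------|----------|-----------|--------------|-------------------|
    | GAIIx (14 lanes)  | 118       | 95.3%    | 3.9%      | 34.2x        | 88.82%            |
    | HiSeq A (8 lanes) | 122       | 94.0%    | 13.7%     | 32.7x        | 90.99%            |
    | HiSeq B (8 lanes) | 144       | 92.6%    | 8.7%      | 40.4x        | 93.10%            |
    | All (30 lanes)    | 384       | 93.9%    | 13.6%     | 102x         | 95.88%            |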

    With this impressive dataset in hand, the authors undertook a detailed examination of the technical aspects of sequence analysis (coverage uniformity, platform comparisons, genotyping accuracy, etc.) and sought to answer two questions:

    1. Given a specific amount of sequencing data, what fraction of the genome is “callable”?
    2. How many SNVs can be accurately identified?

    The results, I think, will be critically important in the near future as whole-genome sequencing becomes routine and widely accessible to investigators.

    Coverage Versus Callability

    The authors correctly note that while many studies report “coverage” of genomes or exomes in terms of minimum depth achieved (1x, 5x, 10x, etc.), this metric alone is insufficient. Namely, it may not include information about alignment and quality filters, as well as the requirements of genotype calling algorithms. A better approach might be to report the fraction of the genome/exome that is “callable” - where genotypes can be determined at a specified confidence threshold when all filters are applied. This term is roughly equivalent to what the 1000 Genomes Project calls the “accessible” portion of the genome. In this study, the authors calculate callability by:

    1. Starting with reads that pass the Illumina chastity filter
    2. Further removing reads with <32 Q20 bases
    3. Mapping reads to the reference sequence using BWA
    4. Removing duplicates (using SAMtools rmdup)
    5. Considering only bases with quality >= 20.
    6. Requiring a genotype probability score of 10.

    The last metric refers to the score from the group’s Bayesian genotype calling algorithm, Most Probable Genotype (MPG). An MPG score of 10 is a log-scaled value indicating a 1/e^10 (that’s 1/22026) theoretical probability of being incorrect. By these criteria, 88.82% of the genome was callable in the GAIIx dataset (34.2x mapped depth) and 90.99% was callable in the HiSeq-A dataset (32.7x).
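
    To make the score threshold concrete, here is a small sketch of the last step. It takes the description above at face value (error probability = 1/e^score) and treats a position as callable when it has a genotype call scoring at least 10; the function names and the toy scores are assumptions, and the read-level filters (steps 1 to 5) are presumed to have been applied already.

```python
import math
from typing import Optional, Sequence


def mpg_error_probability(score: float) -> float:
    """Theoretical probability that a genotype with this MPG score is wrong."""
    return math.exp(-score)


def callable_fraction(scores: Sequence[Optional[float]],
                      min_score: float = 10.0) -> float:
    """Fraction of positions with a genotype call at or above min_score.

    A score of None stands for a position with no call at all
    (for example, no surviving Q20 bases).
    """
    if not scores:
        raise ValueError("no positions supplied")
    n_callable = sum(1 for s in scores if s is not None and s >= min_score)
    return n_callable / len(scores)


odds_at_10 = round(1 / mpg_error_probability(10))
toy_scores = [12.0, 9.5, None, 30.1, 10.0]
toy_callable = callable_fraction(toy_scores)
print(odds_at_10, toy_callable)
```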

    You may notice that the GAIIx platform had more mapped bases but yielded a lower callability than HiSeq-A, and wonder, how could this be? It has long been observed that coverage is non-uniform across the genome and follows a Poisson distribution, influenced by factors such as read length, region mappability, and GC content. Although the amount of sequence data was similar, HiSeq platforms achieved a more uniform coverage than GAIIx, yielding more callable bases genome-wide.

    GAIIx vs HiSeq Coverage of the Genome and Exome

    To enable some direct comparisons, the authors normalized the HiSeq2000 data into a set of equivalent size to the GAIIx dataset (34.2x average mapped depth), then assessed coverage of the genome as well as the exome (here defined as ~34 Mbp of non-redundant coding sequence from the UCSC Known Genes). Here’s a plot of the Q20 coverage for GAIIx and HiSeq values from Supp. Table 1.

    On both platforms, around 97% of the genome was covered by at least one read. At 10x coverage, however, GAIIx covers 89.4% of the genome whereas HiSeq covers 92.2%. These differences were even more pronounced in the exome, where GAIIx and HiSeq covered 67.4% and 76.2% of the exome at 10x, respectively. Since both platforms performed unbiased whole-genome sequencing, the authors conclude that HiSeq’s superior coverage comes from a better representation of high-GC-content sequences, which tend to have higher gene density.

    Filters for Accurate Genotype Calling

    The authors next undertook a careful experiment to establish appropriate filters for SNV calling genome-wide. Pooling all Illumina data together, they generated two equal-sized datasets with an average mapped coverage of 50x by random read sampling. Next, they compared genotype calls at all bases that were “callable” with MPG score >=10. Among the 2.8 billion positions (98.3% of the genome) that met these criteria in both datasets, there were 46,580 discordant genotypes. Many of these, unsurprisingly, arose from sequence reads that were improperly aligned (misplaced, or locally mis-aligned). To address this, the authors removed reads with mapping quality <30 from both datasets. This mapping quality filter reduced the comparison set to 93.6% of the genome, but removed 81% of discordant calls.
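
    The split-and-compare idea is easy to sketch: call genotypes independently in two random subsamples and count positions where both are callable but the genotypes differ. The snippet below is a toy under stated assumptions (genotypes as strings keyed by position, hypothetical function name); the last line just re-derives the quoted 81% from the discordant counts reported in this post (46,580 before the mapping quality filter, 8,710 after).

```python
from typing import Dict, List


def discordant_positions(calls_a: Dict[int, str],
                         calls_b: Dict[int, str]) -> List[int]:
    """Positions called in both datasets whose genotypes disagree."""
    shared = calls_a.keys() & calls_b.keys()
    return sorted(pos for pos in shared if calls_a[pos] != calls_b[pos])


# Toy genotype calls from two random subsamples of the same reads.
subsample_a = {101: "AA", 102: "AG", 103: "GG", 105: "CT"}
subsample_b = {101: "AA", 102: "AA", 103: "GG", 104: "TT", 105: "CC"}
discordant = discordant_positions(subsample_a, subsample_b)

# Share of discordant calls removed by the mapping quality filter.
fraction_removed = 1 - 8710 / 46580
print(discordant, round(fraction_removed * 100))
```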

    Among the 8,710 remaining discordant positions, the authors observed consistently lower MPG scores than were seen among concordant positions, particularly at high coverage sites. They made perhaps one of the most useful inferences of this study: that genotype accuracy can be improved by requiring higher probability scores at higher sequence depths. Basically, they required that, for a given position, the ratio of MPG score to Q20 coverage be at least 0.5. The confidence-by-depth filter removed 61.5% of discordant positions but reduced callability by just 0.02%.
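
    The confidence-by-depth rule translates directly into code. This sketch assumes the ratio is computed exactly as described (MPG score divided by Q20 coverage, threshold 0.5); the function name and the treatment of zero-depth sites are my own choices.

```python
def passes_confidence_by_depth(mpg_score: float, q20_depth: int,
                               min_ratio: float = 0.5) -> bool:
    """Require the genotype score to scale with Q20 depth at the site."""
    if q20_depth <= 0:
        return False  # nothing to call from (assumed behavior)
    return mpg_score / q20_depth >= min_ratio


examples = [
    (10.0, 20),   # ratio 0.50: kept
    (10.0, 40),   # ratio 0.25: filtered despite meeting the score cutoff
    (60.0, 100),  # ratio 0.60: kept
    (30.0, 100),  # ratio 0.30: filtered
]
results = [passes_confidence_by_depth(score, depth) for score, depth in examples]
print(results)
```

    The second example shows the point of the filter: a score of 10 is acceptable at 20x but suspicious at 40x, where a true genotype should have accumulated much stronger evidence.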

    Finally, the authors employed the widely used strategy of removing SNV calls within 10 bp of called indels. This indel-nearby filter removed 26% of the remaining discordant positions, while reducing callability by 0.43%. Thus, by applying three filters aimed at reducing false positives, the authors removed 96.4% of discordant positions and maintained callability across 93.13% of the genome.
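
    A minimal version of the indel-nearby filter is sketched below. It makes simplifying assumptions that are mine, not the authors': a single chromosome, each indel represented by one position, and "within 10 bp" treated as inclusive.

```python
from bisect import bisect_left
from typing import Iterable, List


def filter_snvs_near_indels(snvs: Iterable[int], indels: Iterable[int],
                            window: int = 10) -> List[int]:
    """Keep only SNV positions farther than `window` bp from every indel."""
    indel_positions = sorted(indels)
    kept = []
    for pos in snvs:
        # First indel at or beyond the left edge of the window around pos.
        i = bisect_left(indel_positions, pos - window)
        near_indel = (i < len(indel_positions)
                      and indel_positions[i] <= pos + window)
        if not near_indel:
            kept.append(pos)
    return kept


snv_positions = [100, 115, 130, 200]
indel_sites = [110, 195]
kept_snvs = filter_snvs_near_indels(snv_positions, indel_sites)
print(kept_snvs)
```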

    How Many Variants Can Be Detected?

    The next experiment was quite interesting: the authors pooled all Illumina data, and progressively added reads to create datasets of 5x, 10x, 15x mapped coverage, all the way up to 100x. In each dataset, they applied their variant calling with all filters, then reported the number of SNVs that were identified. I’ve generated a plot of the number of SNVs called genome-wide by dataset:

    At 30x, which might be considered a de facto standard, around 3 million variants were identified. Each new depth adds perhaps 10,000 variants, but at 50x the discovery power is nearly saturated (3.32 million, or 95% of the total). Very little is gained going from 50x to 105x, although, if the relationship between genes, GC content, and callability holds true, many of these could be coding variants. In summary, deep resequencing of a sample to 105-fold coverage tells us that a typical human genome contains around 3.5 million SNPs. That’s very close to estimates from the personal genomes that have already been published (~3.1 m to 4.1 m SNPs), which I find reassuring. It would be informative to see a similar experiment on a sample of African origin, where the number might be closer to 4.5 million.

    The Sweet Spot of Coverage and Callability

    Based on these experiments and their callability calculations, the authors estimate that generating 50x mapped coverage (60x before read mapping/filtering are applied) renders ~95% of the genome and ~81% of the exome callable. Intriguingly, however, the authors note that they’d sequenced an unrelated sample using the latest HiSeq chemistry and basecalling software, achieving the same level of callability with just 35x mapped coverage. If anything, this emphasizes that, as the authors suggest, a “callability” metric is far more informative than raw coverage to report when describing the resequencing of human genomes.

     

    References
    Ajay SS, Parker SC, Ozel Abaan H, Fuentes Fajardo KV, & Margulies EH (2011). Accurate and comprehensive sequencing of personal genomes. Genome research PMID: 21771779


    NOTCH tumor suppression in HNSCC

    August 9th, 2011

    More than half a million new cases of head and neck squamous cell carcinoma (HNSCC) will occur in 2011, making it the 6th most common malignancy in the world. Two studies online at the journal Science survey the mutational landscape of this deadly cancer, which has a mortality rate of ~50%. They report frequent mutation of the NOTCH1 gene in HNSCC (11-15% of cases), and the patterns of these mutations suggest a tumor suppressive role. This observation is in stark contrast with many solid tumors and hematopoietic malignancies where Notch signaling is thought to play an oncogenic role. Moreover, it carries worrisome implications for Notch1 inhibitors, some of which have recently entered clinical trials.

    HNSCC Pathology

    Around 50,000 cases of HNSCC are diagnosed each year in the United States. The genomes of HNSCC bear many chromosomal aberrations, including amplifications targeting the CCND1 gene on chr11q13, and the epidermal growth factor receptor gene (EGFR) on chr7p11. Many of these tumors also exhibit genetic or epigenetic alterations of TP53 and CDKN2A, two well-known tumor suppressor genes. Tobacco and alcohol exposure are risk factors. More recently, HPV infection has emerged as a risk factor as well. Patients with HPV-associated tumors have a better chance of survival, indicating distinct biological features for this form of the disease.

    Exome Sequencing of HNSCC

    Stransky et al performed solution-phase capture and Illumina sequencing of 74 tumor-normal pairs, achieving 150-fold average depth of target regions and covering 87% of bases with at least 20 reads. They also performed SNP array-based copy number analyses. Common CCND1 amplifications, CDKN2A deletions, and rarer amplifications of MYC, EGFR, ERBB2, and CCNE1 suggested that their tumor set was genetically representative of HNSCC. Using exome data, the authors predicted ~130 coding mutations per tumor, of which 25% were synonymous changes.

    Agrawal et al examined 32 tumor-normal exome pairs: 17 by Illumina sequencing, and 15 by SOLiD sequencing. They achieved 77-fold (Illumina) and 44-fold (SOLiD) average depth of target regions, with 90-92.6% of target bases covered by at least 10 reads. Most of the tumors (30 of 32) came from pre-treatment patients, and all were selected for >60% tumor cellularity. This latter selection was an important one, as the Stransky et al study had sequenced but not reported 18 additional tumor-normal pairs due to extensive stromal admixture.

    HPV, TP53, and Tobacco Exposure

    Taken together, both studies found that HPV-associated HNSCC truly represents a distinct disease at the molecular level, with a lower mutation rate (2.28 per megabase compared to 4.83 per megabase) than HPV-negative tumors. Further, none of the HPV-associated tumors carried TP53 mutations, whereas the gene was mutated in 62-78% of HPV-negative tumors. Only 18% of HPV-negative tumors had mutations in bona fide oncogenes, which is not good news for the prospect of targeted therapies. Mutation rates in smokers were higher than those of non-smokers. While Agrawal et al reported no mutational signature of tobacco exposure in their study, this might have been due to the limited sample size (32 tumors), because Stransky et al (with 74 tumors) observed an excess of G to T transversions at non-CpG islands, consistent with carcinogen-induced mutations.

    Inactivating Mutations in Notch

    Both studies made an interesting observation: an excess of mutations in the NOTCH1 gene, many of which were protein-truncating alterations. Further, several tumors had lost both copies of NOTCH1, either by mutation or large-scale deletion. These observations suggest that NOTCH1 is being inactivated in HNSCC. In contrast, the Notch signaling pathway is up-regulated in numerous human cancers, particularly hematopoietic tumors. The newly-established tumor suppressive role of NOTCH has important implications for cancer therapies, as several NOTCH pathway inhibitors have entered clinical trials. One of these trials was recently halted, partly because of treatment-associated skin cancers.

    A number of other genes related to squamous cell differentiation proved to be mutated at significant frequencies, including NOTCH2, IRF6, TP63, RIPK4, CDH1, EZH2, DICER1, and MLL2. Other mutations affected genes involved in calcium-sensing (RIMS2 and PCLO) or apoptosis (CASP8, DDX3X). These gene sets, and the NOTCH-related genes in particular, suggest an important role for normal squamous cell developmental pathways in the formation of squamous cell carcinoma.

    References
    Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, Kryukov GV, Lawrence M, Sougnez C, McKenna A, Shefler E, Ramos AH, Stojanov P, Carter SL, Voet D, Cortés ML, Auclair D, Berger MF, Saksena G, Guiducci C, Onofrio R, Parkin M, Romkes M, Weissfeld JL, Seethala RR, Wang L, Rangel-Escareño C, Fernandez-Lopez JC, Hidalgo-Miranda A, Melendez-Zajgla J, Winckler W, Ardlie K, Gabriel SB, Meyerson M, Lander ES, Getz G, Golub TR, Garraway LA, & Grandis JR (2011). The Mutational Landscape of Head and Neck Squamous Cell Carcinoma. Science (New York, N.Y.) PMID: 21798893

    Agrawal N, Frederick MJ, Pickering CR, Bettegowda C, Chang K, Li RJ, Fakhry C, Xie TX, Zhang J, Wang J, Zhang N, El-Naggar AK, Jasser SA, Weinstein JN, Treviño L, Drummond JA, Muzny DM, Wu Y, Wood LD, Hruban RH, Westra WH, Koch WM, Califano JA, Gibbs RA, Sidransky D, Vogelstein B, Velculescu VE, Papadopoulos N, Wheeler DA, Kinzler KW, & Myers JN (2011). Exome Sequencing of Head and Neck Squamous Cell Carcinoma Reveals Inactivating Mutations in NOTCH1. Science (New York, N.Y.) PMID: 21798897
