Archives for April 2008

Lung Cancer: The Big Picture

April 10, 2008 by dkoboldt

Yesterday was our GSC-wide lab meeting, a quarterly event that crams 400+ people into an undersized auditorium. The guest speaker was Ramaswamy Govindan (MD), a medical oncologist from the Siteman Cancer Center. He gave a great 20-minute talk about lung cancer. He could easily have spoken to us for two hours on this topic, but alas, time is short. One topic he discussed that’s very germane to our work is the EGFR cancer pathway, which may account for 10% or more of lung cancers. Interestingly, Asians, women, and NON-smokers are far more likely to have EGFR mutations (there was a fourth risk factor that I didn’t have time to write down).

From what I understand, EGFR encodes an epidermal growth factor receptor that is expressed in normal tissues only during development. There is a drug called gefitinib that inhibits EGF receptors – a simple tablet that’s taken orally. Dr. Govindan showed some superior-view images of the chest cavity of a lung cancer patient before and after gefitinib therapy. The difference was amazing. Before treatment, one lung was completely cancerous; two years later (after treatment), the cancer was totally gone. It was a compelling example of what the future of cancer therapy might look like.

Speaking of which, the speaker went on to talk a bit about pharmacogenetics – the study of how genetics affect differential response to treatment among individuals. Evidently, classifying lung cancer patients by EGFR mutation status is extremely effective in predicting the outcome of chemotherapy. Patients with EGFR mutations respond well, while other patients see little or no benefit. Worse, some patients might have a toxic response to the drug. The ability to identify responders, non-responders, and toxic-responders by genotyping or gene expression profiling is perhaps one of the most important goals of cancer genetics.

Still Waiting for that ABI SOLiD Genome

April 8, 2008 by dkoboldt

One of the big announcements at this year’s AGBT was ABI’s sequencing of a complete human genome using the SOLiD system. It wasn’t just any genome, either – it was the genome of an African male of the Yoruba tribe in Nigeria (one of the HapMap samples). Perhaps I should be unsurprised that the press releases flew months ago but we’ve yet to see the peer-reviewed publication. Still, I’m eager to read the results of their project, as it will be the first complete genome sequencing of an individual from the African continent. Many studies have seen higher incidence and allele frequencies of SNPs in African samples, consistent with population bottlenecks during out-of-Africa expansions. In fact, a recent genome-wide survey of genetic variation in 51 populations showed that humans formed a chain of colonies as they migrated out of Africa tens of thousands of years ago. That article’s a very interesting read.

But back to ABI. Perusing the SOLiD web site, I did find a poster on the genome-wide variation detected from their not-yet-completed SOLiD sequencing. From it I took these key pieces of information. They sequenced both fragment and mate-pair libraries to a coverage of about 4.9X. The mate-pair libraries allowed them to detect ~22,000 insertions and ~45,000 deletions, nearly all of which were heterozygous. At ~4X coverage on chromosome 7, some 75% of the SNPs detected were already in dbSNP. In the ENCODE regions (which have been extensively characterized), 91% of the SNPs detected were in dbSNP. To me, the fraction of novel SNPs seems low, but if it remains constant, this study will almost certainly add more SNPs to public databases than the Watson and Venter efforts.

Helicos Resequences M13 Virus Genome

April 7, 2008 by dkoboldt

The April 4th issue of Science had an article by Helicos BioSciences in which they described the single-molecule DNA sequencing of a viral genome. I knew about Helicos because they came and gave a talk to our Genetics department describing their planned strategy to develop a method for single-molecule sequencing. As I recall, the talk was entirely theoretical as they didn’t have much experimental data to show. Clearly things have gone well for Helicos, since their article convincingly demonstrates the potential of single-molecule sequencing for high-throughput, low-cost sequencing.

Introduction: The Problems with PCR

Why bother with single molecule sequencing? The introduction briefly discussed three problems associated with PCR-based sequencing.

Bias in template representation. Due to thermodynamics and other factors I don’t understand well, PCR efficiency is directly affected by characteristics of the template. Shorter products, for example, amplify more efficiently than longer products.
Library preparation complications. PCR-based sequencing methods require a lot of templates, and preparation of the libraries can be “onerous and expensive in terms of DNA manipulation,” according to the article. I don’t do library prep myself, but this sounds reasonable.
Error incorporation. Here is something that I do know about. Any time you use PCR, there’s a chance that mis-incorporation at an early cycle will introduce (and then amplify) errors in the sequence. We’ve seen some problems with 454 and Solexa sequencing that may be attributed to this. The idea of taking PCR-induced errors out of sequence reads appeals to me very much.

Results: Sequencing-by-synthesis of the M13 Viral Genome

The authors report sequencing the ~7 kbp M13 genome with 100% coverage at an average depth of 150X. The read lengths averaged 23-27 bp, depending on the run and some post-processing; the authors claim to have performed runs with average read lengths of over 30 bp. According to alignment statistics in Table 1, there were 32,473 forward-orientation reads (relative to the reference) for an average coverage of 96X, and 34,109 reverse-orientation reads for an average coverage of 105X. Coverage in both orientations becomes important during their mutation-detection simulations.
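For a rough feel of how read counts translate into depth, the usual back-of-envelope estimate is (number of reads × mean read length) / genome length. The sketch below is mine, not the authors'; the function name is made up, and the 25 bp read length and 7,000 bp genome length are round-number assumptions taken from the ranges quoted above. It won't reproduce Table 1 exactly, since the aligned length of each read isn't given here.

```python
def mean_coverage(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Estimate average depth as total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * mean_read_length / genome_length


# Assumed inputs: read counts from the post, ~25 bp reads, ~7 kbp genome.
forward = mean_coverage(32_473, 25, 7_000)
reverse = mean_coverage(34_109, 25, 7_000)
print(f"forward: {forward:.1f}X, reverse: {reverse:.1f}X")
```

Because depth scales linearly with read length, an estimate like this moves a lot across the 23-27 bp range, so treat it as a ballpark only.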

Simulations of Mutation Detection

Because they sequenced the canonical strain of M13, there should be no sequence polymorphisms. So, to test the ability of this sequencing method to pick up mutations, the authors created “synthetic mutations” in the reference sequence and re-performed alignments. The synthetically-introduced mutations are picked up with an average sensitivity of ~98%. To me, this was the weaker part of the paper – mutations created in silico won’t accurately represent real variation, but at least it let the authors discuss analysis and refinement steps that led to improved mutation detection.
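To make the idea of "synthetic mutations" concrete, here is a minimal sketch of how one might plant substitutions in a reference sequence and then score a variant caller against them. This is my own illustration, not the authors' pipeline: the function names are hypothetical, it handles only single-base substitutions, and it assumes an uppercase A/C/G/T reference. The alignment and calling step in between is left out entirely.

```python
import random


def introduce_snps(reference: str, n_mutations: int, seed: int = 0):
    """Return a mutated copy of the reference plus a truth table.

    The truth table maps each 0-based mutated position to a tuple of
    (original base, substituted base).
    """
    rng = random.Random(seed)
    positions = rng.sample(range(len(reference)), n_mutations)
    bases = list(reference)
    truth = {}
    for pos in positions:
        ref_base = bases[pos]
        # Pick a substitute that differs from the original base.
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        bases[pos] = alt_base
        truth[pos] = (ref_base, alt_base)
    return "".join(bases), truth


def sensitivity(truth_positions, called_positions) -> float:
    """Fraction of planted mutations that appear among the called positions."""
    truth = set(truth_positions)
    if not truth:
        raise ValueError("no planted mutations to score against")
    return len(truth & set(called_positions)) / len(truth)


# Toy usage: plant 5 substitutions in a made-up 100 bp reference.
reference = "ACGT" * 25
mutated, truth = introduce_snps(reference, n_mutations=5, seed=1)
print(sorted(truth))
```

In a real test, the reads would be re-aligned to the mutated reference and the called positions passed to `sensitivity`; a result near 0.98 would correspond to the ~98% figure above.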

Discussion: Caveats and Future Directions

I don’t think Helicos is yet a threat to established next-generation platforms like Roche/454 and Illumina/Solexa. At 25 bp, the reads are too short to be useful in eukaryotes. Like 454, the Helicos platform has some difficulties with homopolymers, especially runs of cytosine residues. The authors readily admit that “large genomes, heterogeneous samples, and genomic structural variations will likely require longer reads, reduced homopolymer run through, and enhanced alignment tools.”

Yet this publication is an important proof-of-principle for the Helicos method. As far as single-molecule DNA sequencing goes, it looks like Helicos BioSciences is the one to beat.

Genome-Wide Association Failures

April 1, 2008 by dkoboldt

There was an interesting post over at GeneticFuture on why genome-wide association studies fail. It’s a good discussion of the many challenges that still face GWAS even in the era of high-throughput SNP genotyping.

It should be noted that there have been many successful genome-wide association studies, especially since the completion of the International HapMap Project (phases I/II). Last year saw high-profile publications of GWAS for coronary heart disease, breast cancer, celiac disease, type 1 diabetes and Crohn’s disease, just to name a few. deCODE Genetics performed a large-scale study on the genetics underlying exfoliation glaucoma, and found that individuals with two particular SNPs in the first exon of LOXL1 had a 100X greater chance of getting the disease.

Last June the Wellcome Trust Case Control Consortium published the largest study ever of the genetics behind common diseases. In a massive cohort of 17,000 samples, the researchers performed GWAS for diabetes, rheumatoid arthritis, cardiac disease, and other common, complex phenotypes. Perhaps the most exciting result of this study was the association of several genes that had never before been implicated in human disease.

Yet, as the GeneticFuture post pointed out, we rarely hear about the failure of genome-wide association studies to turn up such interesting discoveries. The complexities of small allelic effects, population structure, rare variants, and copy number variation may explain how such failures manifest in the realm of genetics. As for epigenetic factors and disease heterogeneity, well, these issues are out of our hands for the time being.

As far as SNPs go, I believe we’re getting very close to a complete catalog of variation that’s common in human populations. Genome-wide sequencing of two individual human genomes each found ~600,000 SNPs that are not already in dbSNP. At best they’d increase the number of known SNPs by ~10%. At ~10-11 million SNPs, dbSNP is mostly complete in my opinion. We still have a long way to go, though, in cataloging copy number variation.
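The ~10% figure is simple arithmetic, sketched below. The function name is my own, and the calculation assumes the best case in which none of the novel SNPs from the two genomes overlap, so it gives an upper bound rather than a measured value.

```python
def max_relative_increase(novel_per_genome: int, n_genomes: int, catalog_size: int) -> float:
    """Upper bound on catalog growth, assuming no novel SNP is shared between genomes."""
    return novel_per_genome * n_genomes / catalog_size


# ~600,000 novel SNPs from each of two genomes, against ~10-11 million known SNPs.
for catalog_size in (10_000_000, 11_000_000):
    increase = max_relative_increase(600_000, 2, catalog_size)
    print(f"dbSNP at {catalog_size:,}: at most {increase:.1%} more SNPs")
```

Any overlap between the two genomes' novel SNPs would pull the real increase below this bound.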

Another challenge not mentioned in the GeneticFuture post, but nevertheless important, is the fact that a substantial fraction of the genetic variation underlying complex disease occurs outside the coding regions of known genes. It’s time to look beyond nonsynonymous coding SNPs, people. But that’s a post for another day.
