There was an interesting post over at Genetic Future on why genome-wide association studies fail. It’s a good discussion of the many challenges that still face GWAS even in the era of high-throughput SNP genotyping.
It should be noted that there have been many successful genome-wide association studies, especially since the completion of the International HapMap Project (phases I/II). Last year saw high-profile publications of GWAS for coronary heart disease, breast cancer, celiac disease, type 1 diabetes and Crohn’s disease, just to name a few. deCODE Genetics performed a large-scale study on the genetics underlying exfoliation glaucoma, and found that individuals carrying two particular SNPs in the first exon of LOXL1 had a 100-fold greater chance of getting the disease.
Last June the Wellcome Trust Case Control Consortium published the largest study ever of the genetics behind common diseases. In a massive cohort of 17,000 samples, the researchers performed GWAS for diabetes, rheumatoid arthritis, cardiac disease, and other common, complex phenotypes. Perhaps the most exciting result of this study was the identification of several genes that had never before been implicated in human disease.
Yet, as the Genetic Future post pointed out, we rarely hear about the failure of genome-wide association studies to turn up such interesting discoveries. The complexities of small allelic effects, population structure, rare variants, and copy number variation may explain how such failures come about. As for epigenetic factors and disease heterogeneity, well, these issues are out of our hands for the time being.
As far as SNPs go, I believe we’re getting very close to a complete catalog of the variation that’s common in human populations. Genome-wide sequencing of two individual human genomes each found ~600,000 SNPs that are not already in dbSNP. Even if none of those novel SNPs were shared between the two genomes, together they’d increase the number of known SNPs by only ~10%. At ~10-11 million SNPs, dbSNP is mostly complete in my opinion. We still have a long way to go, though, in cataloging copy number variation.
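For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch in Python. The function name is made up for illustration, and it assumes the most generous case, where no novel SNP is shared between the two genomes, so the result is an upper bound rather than an estimate.

```python
def max_catalog_growth(novel_per_genome, n_genomes, catalog_size):
    """Upper bound on the fractional growth of a SNP catalog.

    Assumption (not from any dataset): every novel SNP is unique to
    the genome it was found in, so the counts simply add up.
    """
    return novel_per_genome * n_genomes / catalog_size


# Figures quoted in the post: ~600,000 novel SNPs per genome,
# two genomes, and a dbSNP of roughly 10-11 million SNPs.
for size in (10_000_000, 11_000_000):
    growth = max_catalog_growth(600_000, 2, size)
    print(f"dbSNP at {size:,} SNPs: at most {growth:.1%} growth")
```

Any overlap between the two sets of novel SNPs would push the real figure lower.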
Another challenge not mentioned in FutureMedicine, but nevertheless important, is the fact that a substantial fraction of the genetic variation underlying complex disease occurs outside the coding regions of known genes. It’s time to look beyond nonsynonymous coding SNPs, people. But that’s a post for another day.
Hey, promising new blog – I look forward to hearing more from you.
Quick point – it’s Genetic Future, not FutureMedicine!
Regarding the completeness of dbSNP: I think you’re probably right for common SNPs in Europe and East Asia, but there’s a lot of variation out there that is yet to be explored. Even in the well-surveyed populations there are plenty of rare variants (frequency between 0.1 and 5%) that are totally unknown, but could still be functionally important. In addition, most human populations are currently extremely poorly characterised at a genetic level – in Africa alone there’s a deep well of diversity that has barely been tapped by the surveys performed so far.
The 1000 Genomes Project will hack into some of this variation, but ultimately it will take much larger high-throughput sequencing projects in a wide variety of populations to completely catalogue functional human variation.
Regarding non-coding variation: this is an excellent point with some serious consequences for downstream analysis, although it doesn’t really explain the failure of GWAS to capture disease-causing variation (the major topic of my post). Modern SNP chips tag non-coding variation pretty well, and thus do at least identify non-coding regions associated with disease (even if we don’t know what they do!).