An essay published last week in Cell dismissed the findings of genome-wide association studies (GWAS) and questioned their value to the study of human disease. In their article Genetic Heterogeneity in Human Disease, McClellan and King argue that because common diseases exhibit a high degree of allelic, locus, and phenotypic heterogeneity, their causality “can almost never be resolved by large-scale association studies.” Instead, the authors believe that rare mutations underlie most of the disease-relevant genetic variation in humans, and as such, their causal relationships can only be uncovered by sequencing-based approaches. The article as a whole comes off as uninformed and misleading. Thankfully, genomics bloggers have taken the authors to task: p-ter at the Gene Expression blog explains how noncoding variants influence disease risk, and Kai Wang guest-posts at Genetic Future with a full-on criticism of the essay.
GWAS Overload
I am tempted to agree with McClellan and King in some respects, particularly in their concern that the myriad GWAS publications often fail to advance our understanding of many common diseases. And I do mean myriad. Once upon a time, Nature Genetics was my favorite journal for cutting-edge genetics and genomics discoveries. Take, for example, the summer of 2006 when three high-profile papers revealed the presence of extensive structural variation in the human genome. In recent months, however, I find myself underwhelmed by the content of this particular journal, as it seems saturated with GWAS, GWAS, and more GWAS.
In fact, when I looked at the ~70-80 research articles published in 2010 in Nat. Genetics, more than half (46) were association studies, or worse, meta-analyses of association studies. It’s like every investigator in the world with a disease cohort got hold of an Affy or Illumina SNP array. When I scan the titles each month in my RSS reader, my eyes begin to glaze over with each new title that reads “Common variants associated with…” or “Genome-wide association study identifies…” Unless you happen to be an investigator studying the phenotype or disease of interest, these cookie-cutter papers probably hold little interest for you.
That said, I took issue with much of what was written in the McClellan and King essay. Specifically:
- Their disparagement of the value of GWAS based upon the observation that most associations come from intergenic regions. As my colleagues in the blogosphere have pointed out, the aim of high-density SNP arrays is not to pinpoint the causal SNP; in fact, a high-frequency variant is more likely to be included than a rare nonsynonymous SNP simply because the former is more informative as a genetic marker.
- Their blanket dismissal of most GWAS findings as artifacts of “cryptic population stratification.” The authors suggest that although outliers based on population substructure may be excluded, “hypervariable polymorphisms remain vulnerable to stratification.” As Kai Wang points out in a guest post on Genetic Future, the methods to account for hidden population structure are well established in the GWAS community.
- Their apparent misunderstanding of how genome-wide association studies work. They write: “Had sickle cell anemia been investigated among affected individuals worldwide, the number of responsible mutations would be far greater and hence no one allele at any SNP would be consistently associated with the disease.” This is flat-out wrong. Although there are hundreds of known mutations in HBB (the gene that encodes the beta subunit of hemoglobin and, when mutated, causes sickle-cell anemia), most cases are caused by a single amino acid change (glutamic acid -> valine). Sickle-cell is autosomal recessive, so it’s rather preposterous to assume that a worldwide study would fail to associate the homozygous variant with the disease.
Common Disease, Common Variants
The authors seem convinced that the common disease, common variant theory no longer holds because (according to them) not many such variants have been found. Rather, McClellan and King believe that “the overall magnitude of human genetic variation, the high rate of de novo mutation, the range of mutational mechanisms that disrupt gene function, and the complexity of biological processes underlying pathophysiology all predict a substantial role for rare severe mutations in complex human diseases.” Do humans have a high rate of de novo mutation? That’s news to me.
Unfortunately, the difficulties of associating common variants with complex disease also apply to rare variants: namely, picking out causal relationships among complex networks of interactions between many genes and environmental factors. The observation that few such relationships have been elucidated, if true, does not mean that we are looking at the wrong variants. An important fact that seems to have been overlooked by the authors is that the vast majority of human genetic variation *is* shared. From the dozen or so individual genomes published so far, it is clear that perhaps 10% of variants are novel; as databases like dbSNP continue to grow, this will shrink even further. I am reluctant to believe that this small fraction of “rare” mutations accounts for the numerous prevalent human diseases.
A Time to Sequence
Strangely, the emphasis on rare variation seems to indicate that the authors would make a strong case for sequencing. Yet the issue does not even come to light until the last 3/4 of a page in a section entitled “A Time to Sequence – With an appreciation to Maynard Olson.” Surely, I thought, they’ll wow us with the capabilities of next-generation sequencing technologies and their promise for studying complex disease. Not so. Instead, the authors vaguely hint that “new sequencing technologies provide conceptual and practical advantages over current approaches (Olson, 1995).” Why are they citing a fifteen-year-old article to support the advantages of new sequencing technologies? Where are the citations of landmark sequencing/WGS papers? The only citation related to NGS that I see is McKernan 2009, and you know how I feel about that one.
This ending is unfortunate, because sequencing ultimately will provide us with many of the answers. I’m tired of seeing Yet-Another-GWAS that concludes with a table of loci and p-values, or at most, a list of genes. Comprehensive, convincing studies of genetic association should have a strong sequencing component, in which the regions implicated by genotyping are exhaustively sequenced to identify all putative causal variants. Such variants could then be analyzed computationally and experimentally to characterize their effects on gene structure or regulation. Thus, I find myself reluctantly agreeing with King and McClellan on this point: genetic association is not enough.
References
McClellan J, & King MC (2010). Genetic heterogeneity in human disease. Cell, 141 (2), 210-7 PMID: 20403315
Another thoughtful and well written post. I haven’t read the King piece yet, but have been following all the dust it has kicked up. It certainly sounds like there were flaws.
The tide certainly is turning against the entire concept of GWAS, and King is not alone in her disparagement, as I’m sure you are aware. You object to King’s dismissal of GWAS because many associated variants are intergenic. I think the idea that the SNPs are simply markers is well understood. You say, “…high-frequency variant is more likely to be included than a rare nonsynonymous SNP simply because the former is more informative as a genetic marker.” But THAT is the point–these intergenic “markers” have by and large failed to successfully mark anything! Intergenic means we really don’t know where to look for causation. Nearby? How near? I don’t see any mention of SP Dickson’s PLOS Biology article, which has a lot to say about this issue. I don’t know about “cryptic population stratification,” but perhaps we should substitute “synthetic associations” as a hipper GWAS flaw.
About sickle cell and “flat out wrong.” Take a look at the Dickson article. They examine SCD specifically and find strong associations with markers up to 2.5 Mb away from HBB. That sucks.
I think your disappointment in Nat Genetics (which I share) means you really do agree with King. At the heart of your post, you say that you just don’t believe that rare variants can explain common diseases. I would start to get comfortable with that idea. You defend the common variant hypothesis, but don’t tackle head on the primary motivation of the King piece: that we have found squat so far with GWAS. Keep looking? Ever hear the phrase, “throwing good money after bad?” The common variant hypothesis was born of the observation that we humans share so much of our sequence. But the size of the rare variant pool took everyone by surprise. And while millions upon millions of dollars have been spent trying to squeeze blood out of the common variant stone, we have not yet begun to look at rare variants. That is set to change.
Perhaps the reason King doesn’t reference “landmark sequencing/WGS” papers is that the bulk of these have so far primarily been technical achievements and not genetic advances. IDH1? Until we find a clinical utility for this, the jury will remain out. Sorry to have to tell you this, but many biomedical investigators outside of the small circle of HGRI grantees have not been impressed with these efforts.
Yes sequencing, we all agree on that. But how? The genomics community is obsessed with the technical side of sequencing and analysis, but what people like King and Weinberg are pointing out is that maybe we should be listening to the geneticists about what samples to look at. Eventually, we will need to trade the shotgun in for a scalpel.
Michael T,
Thanks for the thoughtful comments. Please don’t misinterpret this post as a denial of the importance of rare variation. I strongly support the idea that BOTH common and rare variants will ultimately affect complex phenotypes. Just last night, I read an interesting blurb in Nature by Kevin Mitchell concerning the shift away from common variants to rare variants in psychiatric genetics.
As for the findings of GWAS efforts to date, p-ter has another post at the Gene Expression blog about the insufficient awe of what we’ve found so far.
Although I agree with most of your post, I would like to temper the optimism somewhat regarding sequencing as a way to improve the study of genetic association. OK, sequencing will offer more fine-grained answers about the variants, but it’s no solution for assessing the functional implications of any variants found. In many cases I expect that it will raise more questions than answers, e.g. how to validate non-coding variants? The tedious functional validation of any of these variants will still have to be done and will probably not be part of the initial publications either, leaving us again with a table of loci and p-values.