An essay published last week in Cell dismissed the findings of genome-wide association studies (GWAS) and questioned their value to the study of human disease. In their article, Genetic Heterogeneity in Human Disease, McClellan and King argue that because common diseases exhibit a high degree of allelic, locus, and phenotypic heterogeneity, their causality “can almost never be resolved by large-scale association studies.” Instead, the authors believe that rare mutations underlie most of the disease-relevant genetic variation in humans, and as such, their causal relationships can only be uncovered by sequencing-based approaches. The article as a whole comes off as uninformed and misleading. Thankfully, genomics bloggers have taken the authors to task: p-ter at the Gene Expression blog explains how noncoding variants influence disease risk, and Kai Wang guest-posts at Genetic Future with a full-on criticism of the essay.
GWAS Overload
I am tempted to agree with McClellan and King in some respects, particularly in their concern that the myriad GWAS publications often fail to advance our understanding of many common diseases. And I do mean myriad. Once upon a time, Nature Genetics was my favorite journal for cutting-edge genetics and genomics discoveries. Take, for example, the summer of 2006, when three high-profile papers revealed the presence of extensive structural variation in the human genome. In recent months, however, I find myself underwhelmed by the content of this particular journal, as it seems saturated with GWAS, GWAS, and more GWAS.
In fact, when I looked at the ~70-80 research articles published in 2010 in Nat. Genetics, more than half (46) were association studies, or worse, meta-analyses of association studies. It’s like every investigator in the world with a disease cohort got hold of an Affy or Illumina SNP array. When I scan the titles each month in my RSS reader, my eyes begin to glaze over with each new title that reads “Common variants associated with…” or “Genome-wide association study identifies…” Unless you happen to be an investigator studying the phenotype or disease of interest, these cookie-cutter papers probably hold little interest for you.
That said, I took issue with much of what was written in the McClellan and King essay. Specifically:
- Their disparagement of the value of GWAS based upon the observation that most associations come from intergenic regions. As my colleagues in the blogosphere have pointed out, the aim of high-density SNP arrays is not to pinpoint the causal SNP; in fact, a high-frequency variant is more likely to be included on an array than a rare nonsynonymous SNP simply because the former is more informative as a genetic marker.
- Their blanket dismissal of most GWAS findings as artifacts of “cryptic population stratification.” The authors suggest that although outliers based on population substructure may be excluded, “hypervariable polymorphisms remain vulnerable to stratification.” As Kai Wang points out in a guest post on Genetic Future, the methods to account for hidden population structure are well established in the GWAS community.
- Their apparent misunderstanding of how genome-wide association studies work. They write: “Had sickle cell anemia been investigated among affected individuals worldwide, the number of responsible mutations would be far greater and hence no one allele at any SNP would be consistently associated with the disease.” This is flat-out wrong. Although there are hundreds of known mutations in HBB (the gene that encodes the beta-globin subunit of hemoglobin and, when mutated, causes sickle-cell anemia), most cases are caused by a single amino acid change (glutamic acid -> valine). Sickle-cell is autosomal recessive, so it’s rather preposterous to assume that a worldwide study would fail to associate the homozygous variant with the disease.
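To make the sickle-cell point concrete, here is a minimal sketch of a recessive-model association test on a 2x2 table. The counts are entirely hypothetical and chosen only for illustration (they are not from any study): I assume a cohort in which a sizable fraction of cases is explained by other mutations, yet most cases are still homozygous for one variant. The function names are my own, and the test is a plain Pearson chi-square rather than anything a real GWAS pipeline would stop at.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio(a, b, c, d):
    """Odds ratio with the Haldane-Anscombe correction (add 0.5 to each
    cell) so that a zero cell does not cause a division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# HYPOTHETICAL counts, for illustration only.
# Recessive model: "carrier" here means homozygous for the single variant.
cases_hom, cases_other = 700, 300          # 30% of cases due to other mutations
controls_hom, controls_other = 0, 1000     # unaffected controls

chi2 = chi_square_2x2(cases_hom, cases_other, controls_hom, controls_other)
p = p_value_1df(chi2)
oratio = odds_ratio(cases_hom, cases_other, controls_hom, controls_other)

print(f"chi-square = {chi2:.1f}, p = {p:.3g}, odds ratio = {oratio:.0f}")
```

Under these assumed numbers, the association at the one common causal variant remains overwhelming even though nearly a third of the hypothetical cases carry something else. Allelic heterogeneity dilutes the signal; it does not erase it.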
Common Disease, Common Variants
The authors seem convinced that the common disease, common variant theory no longer holds because (according to them) not many such variants have been found. Rather, McClellan and King believe that “the overall magnitude of human genetic variation, the high rate of de novo mutation, the range of mutational mechanisms that disrupt gene function, and the complexity of biological processes underlying pathophysiology all predict a substantial role for rare severe mutations in complex human diseases.” Do humans have a high rate of de novo mutation? That’s news to me.
Unfortunately, the difficulties of associating common variants with complex disease are also faced by rare variants: namely, picking out causal relationships among complex networks of interactions between many genes and environmental factors. The observation that few such relationships have been elucidated, if true, does not mean that we are looking at the wrong variants. An important fact that seems to have been overlooked by the authors is that the vast majority of human genetic variation *is* shared. From the dozen or so individual genomes published so far, it is clear that perhaps 10% of variants are novel; as databases like dbSNP continue to grow, this will shrink even further. I am reluctant to believe that this small fraction of “rare” mutations accounts for the numerous prevalent human diseases.
A Time to Sequence
Strangely, the emphasis on rare variation seems to indicate that the authors would make a strong case for sequencing. Yet the issue does not even come to light until the last 3/4 of a page in a section entitled “A Time to Sequence – With an appreciation to Maynard Olson.” Surely, I thought, they’ll wow us with the capabilities of next-generation sequencing technologies and their promise for studying complex disease. Not so. Instead, the authors vaguely hint that “new sequencing technologies provide conceptual and practical advantages over current approaches (Olson, 1995).” Why are they citing a fifteen-year-old article to support the advantages of new sequencing technologies? Where are the citations of landmark sequencing/WGS papers? The only citation related to NGS that I see is McKernan 2009, and you know how I feel about that one.
This ending is unfortunate, because sequencing ultimately will provide us with many of the answers. I’m tired of seeing Yet-Another-GWAS that concludes with a table of loci and p-values, or at most, a list of genes. Comprehensive, convincing studies of genetic association should have a strong sequencing component, in which the regions implicated by genotyping are exhaustively sequenced to identify all putative causal variants. Such variants could then be analyzed computationally and experimentally to characterize their effects on gene structure or regulation. Thus, I find myself reluctantly agreeing with King and McClellan on this point: genetic association is not enough.
References
McClellan J, & King MC (2010). Genetic heterogeneity in human disease. Cell, 141 (2), 210-7 PMID: 20403315