Rare Variant Studies of Common Disease

Not so long ago, there was a hope in the research community that common genetic variation, i.e. variants present at minor allele frequencies >5% in human populations, might explain most or all of the heritability of common complex disease. That would have been convenient, because such variants can be genotyped with precise, inexpensive, high-density SNP arrays in tens of thousands of samples.

Sadly, the human genome doesn’t play that way.

Genome-wide association studies have implicated hundreds (if not thousands) of new loci in common complex disease. Yet most of the identified variants had a very small effect on risk, and they collectively explained only a fraction of disease heritability. One possible explanation was that rare variants, which are largely untested by high-density SNP arrays, might account for some of that missing heritability. Yet large-scale sequencing studies of common complex disease have not been financially viable until very recently.

As we forge ahead with the Alzheimer’s Disease Sequencing Project, TopMed, CCDG, and other projects, it’s promising to see results like those in the common/rare variant association study recently published by the International AMD Genomics Consortium.

Age-related Macular Degeneration: A Common Disease

Age-related macular degeneration (AMD) is the leading cause of blindness, affecting about 10 million patients worldwide. It’s a progressive disease whose biological underpinnings are still not well understood, and therapeutic options are limited. Like most age-related diseases, this is a complex phenotype with numerous risk factors, but there’s clearly a substantial inherited component at play.

As of last year, GWAS efforts had uncovered 21 loci in which genetic variation affects disease risk. Translating these into biological insights (or better yet, therapeutic targets) has been challenging.

Massive GWAS: Common and Rare Variants

The International AMD Genomics Consortium (IAMDGC) brought together 16,000 AMD cases and 17,000 controls from 26 different studies, and genotyped them using a customized set of variants:

  • Common variants used for classic genome-wide association studies
  • Low-frequency coding variants, i.e. “exome chip”
  • Protein-altering variants detected by previous AMD gene sequencing studies

Altogether, the authors directly genotyped about 450,000 variants (160,000 of which were protein-altering). After imputation, they were able to analyze 12 million variants overall. Single-variant association testing revealed 34 susceptibility loci for AMD:

[Figure 1a: AMD GWAS loci (IAMDGC, Nature Genet, 2015)]

The 52 associated variants roughly double the number of genetic loci for AMD. The vast majority of them (45/52) are common, with MAF >1% and relatively small effects on risk. The odds ratio (OR), which measures the relative increase or decrease in risk conferred by such variants, ranges from 1.1 to 2.9.
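For readers less familiar with odds ratios, here is a minimal sketch of how one is computed from a simple 2x2 table of carriers and non-carriers. The counts are hypothetical and the function name is my own. The study itself would have estimated per-allele effects with regression models, so treat this as an illustration of the concept rather than the authors' method.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table of carrier status by disease status."""
    if min(case_noncarriers, control_carriers, control_noncarriers) == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls


# Hypothetical counts (not from the paper): 20 of 100 cases carry the
# variant, compared with 10 of 100 controls.
example_or = odds_ratio(20, 80, 10, 90)
print(f"{example_or:.2f}")  # prints 2.25
```

An OR above 1 means the variant is more common in cases (a risk variant), while an OR below 1 suggests a protective effect.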

The Role of Rare Variants

Yet the authors also observed 7 significantly associated rare variants (MAF<1%) with odds ratios of 1.1-47.6. All seven were located in or near complement genes (that’s “complement” as in the innate immune system complex), which had been implicated in AMD by sequencing studies over the past couple of years. Four genes also exhibited a significant burden of rare damaging variants, suggesting a functional link to disease risk.

Notably, three of those four burden signals were due to variants with frequency <0.1%, suggesting that trait-associated variants with clear functional consequences might be even rarer than we’d guessed. The corollary, of course, is that sample sizes will need to be much larger to detect them with any kind of power.
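To get a feel for why rarer variants demand larger cohorts, here is a rough back-of-the-envelope sketch of how many carriers you would expect to see at different allele frequencies. It assumes Hardy-Weinberg proportions and, as a simplification, that the allele frequency in the sample equals the stated frequency (true risk alleles would be somewhat enriched in cases). The function name and the cohort size of 16,000 (roughly the number of cases in this study) are illustrative choices, not a power calculation.

```python
def expected_carriers(n_individuals, allele_frequency):
    """Expected number of people carrying at least one copy of an allele.

    Assumes Hardy-Weinberg proportions: a person is a non-carrier only if
    both of their chromosomes lack the allele.
    """
    carrier_probability = 1.0 - (1.0 - allele_frequency) ** 2
    return n_individuals * carrier_probability


# Roughly the number of AMD cases in the study (illustrative).
cohort_size = 16_000
for maf in (0.05, 0.01, 0.001):
    carriers = expected_carriers(cohort_size, maf)
    print(f"MAF {maf:.3f}: about {round(carriers)} expected carriers")
```

At a frequency of 0.1%, even 16,000 individuals yield only a few dozen carriers, which is why single-variant tests run out of steam and burden tests (which pool rare variants within a gene) become attractive.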

Shared Genetics for Mendelian and Complex Disease

One of the rare variant burden genes, TIMP3, was previously associated with Sorsby’s fundus dystrophy, a rare disease similar to AMD but with earlier onset and Mendelian inheritance. The Mendelian disease variants occur largely in exon 5, but the IAMDGC’s study uncovered a number of rare variants of the same class (nonsynonymous changes disrupting cysteine residues) in other exons in AMD cases.

Carriers of such alleles also had a burden of other AMD-associated variants, suggesting that TIMP3 variation contributes to disease risk in conjunction with other variants. It’s a cool example of variation in the same gene giving rise to monogenic and complex disorders with similar clinical presentations.

Outlook for Common Disease Genomics

I like this study because it demonstrates the importance of looking at both common and rare variants, in a large number of samples, to more comprehensively interrogate the genome for complex disease loci. It sets the stage for large-scale sequencing of complex disease. We have the tools and we have the sample collections. Now, we just need the funding.

Fritsche LG, Igl W, Bailey JN, Grassmann F, Sengupta S, Bragg-Gresham JL, Burdon KP, Hebbring SJ, Wen C, Gorski M, Kim IK, Cho D, Zack D, Souied E, Scholl HP, Bala E, Lee KE, Hunter DJ, Sardell RJ, Mitchell P, Merriam JE, Cipriani V, Hoffman JD, Schick T, Lechanteur YT, Guymer RH, Johnson MP, Jiang Y, Stanton CM, Buitendijk GH, Zhan X, Kwong AM, Boleda A, Brooks M, Gieser L, Ratnapriya R, Branham KE, Foerster JR, Heckenlively JR, Othman MI, Vote BJ, Liang HH, Souzeau E, McAllister IL, Isaacs T, Hall J, Lake S, Mackey DA, Constable IJ, Craig JE, Kitchner TE, Yang Z, Su Z, Luo H, Chen D, Ouyang H, Flagg K, Lin D, Mao G, Ferreyra H, Stark K, von Strachwitz CN, Wolf A, Brandl C, Rudolph G, Olden M, Morrison MA, Morgan DJ, Schu M, Ahn J, Silvestri G, Tsironi EE, Park KH, Farrer LA, Orlin A, Brucker A, Li M, Curcio CA, Mohand-Saïd S, Sahel JA, Audo I, Benchaboune M, Cree AJ, Rennie CA, Goverdhan SV, Grunin M, Hagbi-Levi S, Campochiaro P, Katsanis N, Holz FG, Blond F, Blanché H, Deleuze JF, Igo RP Jr, Truitt B, Peachey NS, Meuer SM, Myers CE, Moore EL, Klein R, Hauser MA, Postel EA, Courtenay MD, Schwartz SG, Kovach JL, Scott WK, Liew G, Tan AG, Gopinath B, Merriam JC, Smith RT, Khan JC, Shahid H, Moore AT, McGrath JA, Laux R, Brantley MA Jr, Agarwal A, Ersoy L, Caramoy A, Langmann T, Saksens NT, de Jong EK, Hoyng CB, Cain MS, Richardson AJ, Martin TM, Blangero J, Weeks DE, Dhillon B, van Duijn CM, Doheny KF, Romm J, Klaver CC, Hayward C, Gorin MB, Klein ML, Baird PN, den Hollander AI, Fauser S, Yates JR, Allikmets R, Wang JJ, Schaumberg DA, Klein BE, Hagstrom SA, Chowers I, Lotery AJ, Léveillard T, Zhang K, Brilliant MH, Hewitt AW, Swaroop A, Chew EY, Pericak-Vance MA, DeAngelis M, Stambolian D, Haines JL, Iyengar SK, Weber BH, Abecasis GR, & Heid IM (2016). A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nature genetics, 48 (2), 134-43 PMID: 26691988

My Worst Kept Secret: The Rogue Retrieval

As you might guess from the hundreds of MassGenomics blog posts over the years, I enjoy writing. It’s a vital skill to scientists, not just for writing grants and papers, but for communicating the achievements (and importance) of science to the general public. The researchers whose work I admire, whose careers are a model for success, tend to be excellent writers.

Writing fiction, for me at least, is far more challenging. While there’s no barrier to entry, the odds of success are much lower. Few traditional publishers (Random House, HarperCollins, etc.) take submissions directly from authors. Instead, one must have a literary agent who can send a novel manuscript to a targeted list of editors. Most literary agents receive thousands of queries each year from aspiring authors, but take on only a handful of new clients. Even with an agent, the odds of landing an offer of publication (a “book deal” in industry parlance) are slim.

Through what I can only assume is a series of clerical errors, I have managed to navigate this gauntlet. My science fiction novel The Rogue Retrieval — about a researcher who goes A.W.O.L. into a pristine medieval world he’s been studying, and the mission to retrieve him — was published yesterday by HarperCollins.

Stage magician Quinn Bradley has one dream: to headline his own show on the Vegas Strip. And with talent scouts in the audience wowed by his latest performance, he knows he’s about to make the big-time. What he doesn’t expect is an offer to go on a quest to a place where magic is all too real.

That’s how he finds himself in Alissia, a world connected to ours by a secret portal owned by a powerful corporation. He’s after an employee who has gone rogue, and that’s the least of his problems. Alissia has true magicians…and the penalty for impersonating one is death. In a world where even a twelve-year-old could beat Quinn in a swordfight, it’s only a matter of time until the tricks up his sleeves run out.

Fans of Terry Brooks and Terry Pratchett will find this a thrilling read.

Links: HarperCollins, Amazon, Barnes & Noble, Goodreads, iBooks

In case you’re worried, publishing a book is surprisingly non-lucrative. In other words, I have no plans to leave the world of genomics. You won’t get rid of me that easily!

Sequencing for Common Complex Disease

So, the press embargo lifted yesterday on our worst-kept secret: we have won a four-year, $60 million grant to serve as a Center for Common Disease Genomics (CCDG). The CCDG marks a new direction for NHGRI’s flagship sequencing program to comprehensively study the genetic architecture of common disease.

This $240 million initiative aims to sequence 200,000 genomes over the next four years, making it among the largest sequencing studies in the world.

Institution                          Amount
The McDonnell Genome Institute       $60 million
Baylor College of Medicine           $60 million
Broad Institute of MIT and Harvard   $80 million
The New York Genome Center           $40 million
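As a quick sanity check on those figures, the four awards can be summed and divided by the 200,000-genome goal. This is a rough sketch that treats the whole program budget as if it were spent on sequencing, which is a simplifying assumption of mine (the awards also have to cover analysis, storage, and people), so the per-genome figure is an upper bound on what is available per sample rather than an actual price.

```python
# Award amounts in millions of dollars, as listed above.
awards_millions = {
    "The McDonnell Genome Institute": 60,
    "Baylor College of Medicine": 60,
    "Broad Institute of MIT and Harvard": 80,
    "The New York Genome Center": 40,
}

total_millions = sum(awards_millions.values())

# Program-wide goal stated in the announcement.
target_genomes = 200_000

# Simplifying assumption: every dollar goes toward sequencing.
budget_per_genome = total_millions * 1_000_000 / target_genomes

print(f"Total awards: ${total_millions} million")
print(f"Budget per genome: ${budget_per_genome:,.0f}")
```

The awards add up to the $240 million program total, which works out to $1,200 per genome across the whole initiative.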

The official announcement of the program came from NHGRI and has been well covered by national publications (like the STAT article that quotes me) and local media (like the St. Louis Post-Dispatch).

I thought I’d offer the inside view of someone involved in writing one of the successful applications. In other words, let me tell you a story.

The RFA: A New Direction

The request for applications for this program (RFA-HG-015-001) was posted in late December 2014. I probably read it at least eight times in its entirety. The 30+ pages made for an interesting read. Right off the bat, it was clear that for this program, the NHGRI was emphasizing:

  • Multiple common disease phenotypes. The definition of “common” was not specified, but the implication was that this meant non-Mendelian disorders of appreciable frequency in the population, things like heart disease, diabetes, and Alzheimer’s.
  • Diversity across the board, including disease phenotypes, genetic architectures, study designs, and perhaps most importantly, the human populations studied. Several sections of the RFA encouraged applicants to include under-represented populations (e.g. non-European ancestries) in their genetic studies.
  • Big sample numbers. The importance of rare variation and the empirically small effect sizes of variants implicated in common disease to date suggested that we’ll need BIG sample numbers to comprehensively study the genetic architecture of these diseases. The RFA mentions “as many as 25,000 cases and 25,000 controls could be required.”
  • Whole genome sequencing. It was obvious that, while NHGRI recognizes the utility of targeted (e.g. exome) sequencing and non-genomic sequencing (e.g. RNA-Seq), whole genome sequencing would be the priority. The advantage to WGS is that it’s a comprehensive assay, allowing one to study small variants (SNVs, indels) as well as large ones (SVs), both in coding and noncoding regions of the genome.

Each of these four themes offered a unique set of challenges.

The Challenge of Common Disease

It was quite clear, from the explicit language in the RFA, that NHGRI didn’t want cancer projects. Tumor-normal studies did not qualify, and even studies of cancer susceptibility would “receive lower priority.” This seems fair and reasonable, since there’s an entire institute at the NIH designed to fund cancer research, but it kind of sucks if your sequencing center spent the last eight years building a reputation in cancer genomics.

Fortunately, we have also conducted a number of human genetics studies over the past two decades. We’d recently published some high profile studies of AMD, cleft lip, metabolic syndromes, and other phenotypes that demonstrated our ability to unravel common complex disease. The main challenge, I think, was choosing which common diseases to propose. Many of the most obvious common diseases already had large genetic studies under way. Others were in that “gray area” of uncertainty as to whether they met the criterion of common disease.

We had to think about what the other applicants were doing, too. We didn’t want to propose identical projects, but we knew that all of the awardees would eventually be working together. So some amount of synergy was desirable.

The Diversity Challenge

Anyone working in human genetics understands the importance of studying non-European populations. The challenge, quite frankly, was finding such cohorts. Most of the well-phenotyped, consented samples available for research in the United States are of European ancestry. There are many complex reasons for this, and I’m not qualified to explain them all. It’s clear that we, as a research community, need to make a concerted effort to collect samples from under-represented populations. That’s just beyond the scope of this RFA.

Fortunately, the first project that’s likely to be undertaken by our center will involve sequencing thousands of African-American samples, and we’re very excited about that.

The Sample Number Goals

Many numbers have been tossed around as the minimum requirement to comprehensively study the genetic architecture of complex disease. Some have argued for 10,000 samples, while other models (referenced in the RFA) were talking 50,000. Needless to say, NHGRI hoped to see projects with big sample numbers. This, essentially, was one of the most challenging aspects of this application.

There *are* large sample collections for common disease studies that have been banked for the last decade or more. Some of them hit those sample counts. The issue is the informed consent. A strict requirement of this program is that all sequencing data be submitted to public repositories, i.e. dbGaP. This means that samples must be properly consented for public data sharing and deposition. Frankly, many sample collections are not consented in this manner, and IRBs are paying attention. It was a difficult challenge to find large, well-characterized sample sets with modern consents.

Whole Genome Sequencing Challenges

The language of the RFA made it quite clear that studies emphasizing whole genome sequencing were sought for this program. We’re big fans of this approach, of course, but it also comes with some limitations: the cost of generating WGS data (even with an Illumina X Ten installation) is considerable, especially when trying to design studies numbering tens of thousands of samples.

Unfortunately, while WGS costs have come down considerably, they’re not low enough for us to do all of the studies that we wanted. The sequencing costs for a 20,000-sample study at current prices exceed the entire budget of any one center. In other words, we had to prioritize samples and projects. And we had to choose studies that could be coordinated across multiple centers to maximize our discovery power.

An Open Door for Sequencing Studies

The proposals funded for CCDG all achieved exceptional grant scores from the study sections, which speaks to the high quality of the science that we all proposed. The projects that we’ve lined up for years 1 and 2 are very exciting. Much of it still needs to be finalized, but I think it’s safe to say that cardiovascular disease (the western world’s #1 killer) will be a mainstay of MGI’s efforts.

The CCDG award, when viewed in the light of the recent NIH budget increase, also presents a unique opportunity to seek funding for other large-scale common disease sequencing studies that might be co-funded with other institutes. If you have a large sample collection that seems to fit the CCDG mission (and an idea of the institute that would help support the work), please get in touch! We would love to expand our research portfolio to tackle other disease phenotypes that are critical for human health.

Come on, let’s do this.

Year in Review: NGS and Large-scale Genomics

This year (2015) was a dynamic and busy one for the field of next-gen sequencing. We saw the release of a paradigm-shifting sequencing platform, milestone publications of key big science projects, and the sustained acceleration of discovery enabled by high-throughput genomics technologies. Here are some of the things that rocked our world in 2015.

Large-Scale Genome Sequencing

My first post in 2015 was uncannily prescient: it reviewed a study that uncovered rare variants associated with myocardial infarction using exome data from NHLBI’s Exome Sequencing Project (ESP). The ESP was a pioneering effort in many ways, and its database of coding variation from thousands of exomes has been a critical resource for many research studies.

One clear trend from this year, however, is that the time of the exome is waning. Later in January, I profiled Illumina’s newest sequencing platforms. The most significant of these was the HiSeq X Ten, a 10-instrument “factory installation” that enabled the most cost-effective human whole genome sequencing to date: 18,000 genomes per year at a consumables cost of just over $1,000 each (note: this does not include the costs of data storage, analysis, or the $10 million buy-in). Correction: this figure does incorporate the cost of the instruments based on 100% capacity over a period of 4 years. However, I doubt it incorporates the system’s additional $100,000 “shipping & handling” fee.

Chief among the restrictions that accompanied the X Ten was the requirement that X Tens only be used for whole-genome sequencing, and only for human genomes.

The Value of Cohorts

For the first few months of 2015, my world was consumed by our center’s application to the Centers for Common Disease Genomics program from NHGRI. Among many things, it made me appreciate the value of large, diverse, deeply-phenotyped, widely consented sample cohorts for genomics research. And as I pointed out in February, consumer genetics firm 23andMe has one of the largest sample cohorts in the United States. Among the 700,000+ individuals who’ve undergone 23andMe testing are subsets of intense clinical interest, such as Parkinson’s disease patients.

Granted, their research cohort has some caveats to it, not the least of which is the fact that most of the phenotypes are self-reported. Even so, 23andMe made at least two big-money deals with pharmaceutical companies, which suggests that they might be onto something.

Epigenetics and Regulatory Variation

One of my favorite papers from this year was the landmark publication of the NIH Epigenomics Roadmap Consortium, which profiled 111 primary human tissues and cell types for histone modification patterns, DNA accessibility, DNA methylation, and gene expression. The Epigenomics Roadmap, the ENCODE Project, and other functional genomics initiatives are just so vital as we expand our search for phenotypically-relevant variants outside of the coding regions.

Recent forays into the study of regulatory variation have already been promising. Just a few months ago, I reviewed a paper demonstrating that regulatory variation near genes predicts gene dosage sensitivity. More studies like that are bound to come.

Genetic Variation and Human Disease

There were far more disease gene discovery papers than I could ever hope to cover on MassGenomics, so I admit to playing favorites. I enjoyed showcasing our targeted sequencing study of cleft lip, in which we used model systems in zebrafish and mice to functionally validate rare variants uncovered by sequencing known GWAS loci.

I also reviewed a wonderfully informative study on human de novo mutations in 250 Dutch families sequenced by the Genome of the Netherlands Consortium. My favorite tidbit of this paper was the observation that 75% of de novo mutations come from the father, whose age was correlated not just with the number of de novo mutations, but also their location relative to late-replicating regions of the genome.

In June, I highlighted some of the latest additions to the catalogue of known retinal disease genes, also known as RetNet. Among the many diseases whose genetic underpinnings can be studied by NGS, retinal diseases might be the top beneficiary. In the June update, 278 retinal disease genes had been mapped, including the first bona fide noncoding gene linked to a retinal disease.

Beyond Research: Sequencing in the Clinic

Next-generation sequencing continues to find new clinical applications. Exome sequencing, especially via the GeneDx service, has become a routine diagnostic test. Over the summer, I wrote a post emphasizing the importance of clinical sequence data sharing in repositories like the ClinVar database, which has already proven life-saving (or life-changing) for thousands of patients and their families. A month later, I covered a paper in Nature Genetics that offered some sound advice on how to succeed at clinical genome sequencing.

My institute and St. Louis Children’s Hospital also launched a new initiative this year, called the Pediatric Genomics Board (PGB), to perform research exome sequencing for infants with severe (but undiagnosed) genetic disorders. I’m excited to see the potential of state-of-the-art sequencing and analysis techniques to determine the molecular cause of such cases.

Personal Highlights for 2015

As the year draws to a close, I find myself grateful, because 2015 has been good to me. I saw my 60th research publication this year, and helped write one of the best grant applications I’ve ever been a part of (look for a press release next month). I got to see Oregon and meet all kinds of nice people when the OHSU Program for Molecular and Cellular Biology invited me to deliver the keynote at their annual retreat. I got to chair an outstanding session on cancer genomics at the ASHG meeting in Baltimore, and I was on TV for almost five minutes! (it was ASHG TV, but that counts in my book).

And speaking of books, I learned that HarperCollins will publish The Rogue Retrieval, my novel about a Vegas magician who infiltrates a medieval world. It comes out January 19th, by the way. If you’ve read this far in my blog post, it’s probably right up your alley.

Thank you for your loyal readership of MassGenomics this year. See you in 2016!