Sequencing for Common Complex Disease

So, the press embargo lifted yesterday on our worst-kept secret: we have won a four-year, $60 million grant to serve as a Center for Common Disease Genomics (CCDG). The CCDG marks a new direction for NHGRI’s flagship sequencing program, one aimed at comprehensively studying the genetic architecture of common disease.

This $240 million initiative aims to sequence 200,000 genomes over the next four years, making it among the largest sequencing studies in the world.

Institution	Amount
The McDonnell Genome Institute	$60 million
Baylor College of Medicine	$60 million
Broad Institute of MIT and Harvard	$80 million
The New York Genome Center	$40 million

The official announcement of the program came from NHGRI, and it has been well covered by national publications (like the STAT article that quotes me) and local media (like the St. Louis Post-Dispatch).

I thought I’d offer the inside view of someone involved in writing one of the successful applications. In other words, let me tell you a story.

The RFA: A New Direction

The request for applications for this program (RFA-HG-15-001) was posted in late December 2014. I probably read all 30+ pages at least eight times, and they made for an interesting read. Right off the bat, it was clear that for this program, NHGRI was emphasizing:

Multiple common disease phenotypes. The definition of “common” was not specified, but the implication was that this meant non-Mendelian disorders of appreciable frequency in the population, things like heart disease, diabetes, and Alzheimer’s.
Diversity across the board, including disease phenotypes, genetic architectures, study designs, and perhaps most importantly, the human populations studied. Several sections of the RFA encouraged applicants to include under-represented populations (e.g. non-European ancestries) in their genetic studies.
Big sample numbers. The importance of rare variation and the empirically small effect sizes of variants implicated in common disease to date suggest that we’ll need BIG sample numbers to comprehensively study the genetic architecture of these diseases. The RFA notes that “as many as 25,000 cases and 25,000 controls could be required.”
Whole genome sequencing. It was obvious that, while NHGRI recognizes the utility of targeted (e.g. exome) sequencing and non-genomic sequencing (e.g. RNA-Seq), whole genome sequencing would be the priority. The advantage of WGS is that it’s a comprehensive assay, allowing one to study small variants (SNVs, indels) as well as large ones (SVs), in both coding and noncoding regions of the genome.
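To make the small-versus-large distinction concrete, here is a minimal sketch that labels a variant from its REF and ALT alleles, as they would appear in a VCF record. The 50 bp boundary between indels and SVs is a widely used convention that I’m assuming here (the RFA doesn’t define one), and `classify_variant` is a hypothetical helper, not part of any CCDG pipeline.

```python
SV_MIN_LENGTH = 50  # assumed indel/SV boundary in bp; a common convention, not from the RFA


def classify_variant(ref: str, alt: str, sv_min_length: int = SV_MIN_LENGTH) -> str:
    """Label a sequence-resolved REF/ALT pair as SNV, MNV, indel, or SV by size.

    Sketch only: symbolic alleles (e.g. <DEL>) and balanced events such as
    inversions, which change no length, are not handled.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"  # single-nucleotide variant
    length_change = abs(len(ref) - len(alt))  # net bases inserted or deleted
    if length_change >= sv_min_length:
        return "SV"  # large deletion or insertion
    if length_change == 0:
        return "MNV"  # same-length multi-base substitution
    return "indel"  # small insertion or deletion


if __name__ == "__main__":
    examples = [("A", "G"), ("A", "AT"), ("ACGT", "A"), ("A" + "C" * 60, "A")]
    for ref, alt in examples:
        print(f"{ref[:10]:>10} -> {alt[:10]:<10} {classify_variant(ref, alt)}")
```

Exome sequencing and SNP arrays capture mostly the first three categories in coding regions; a single WGS run can, in principle, feed all four across the whole genome.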

Each of these four themes offered a unique set of challenges.

The Challenge of Common Disease

It was quite clear, from the explicit language in the RFA, that NHGRI didn’t want cancer projects. Tumor-normal studies did not qualify, and even studies of cancer susceptibility would “receive lower priority.” This seems fair and reasonable, since there’s an entire institute at the NIH designed to fund cancer research, but it kind of sucks if your sequencing center spent the last eight years building a reputation in cancer genomics.

Fortunately, we have also conducted a number of human genetics studies over the past two decades. We’d recently published some high-profile studies of AMD, cleft lip, metabolic syndromes, and other phenotypes that demonstrated our ability to unravel common complex disease. The main challenge, I think, was choosing which common diseases to propose. Many of the most obvious common diseases already had large genetic studies under way. Others fell into a “gray area” where it was unclear whether they met the criterion of common disease.

We had to think about what the other applicants were doing, too. We didn’t want to propose identical projects, but we knew that all of the awardees would eventually be working together. So some amount of synergy was desirable.

The Diversity Challenge

Anyone working in human genetics understands the importance of studying non-European populations. The challenge, quite frankly, was finding such cohorts. Most of the well-phenotyped, consented samples available for research in the United States are of European ancestry. There are many complex reasons for this, and I’m not qualified to explain them all. It’s clear that we, as a research community, need to make a concerted effort to collect samples from under-represented populations. That’s just beyond the scope of this RFA.

Fortunately, the first project that’s likely to be undertaken by our center will involve sequencing thousands of African-American samples, and we’re very excited about that.

The Sample Number Goals

Many numbers have been tossed around as the minimum required to comprehensively study the genetic architecture of complex disease. Some have argued for 10,000 samples, while other models (referenced in the RFA) called for 50,000. Needless to say, NHGRI hoped to see projects with big sample numbers, and meeting that expectation was one of the most challenging aspects of this application.
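To see why small effect sizes push sample counts so high, here is a rough power calculation for a single-variant allelic (case-control) test. It assumes a two-proportion z-test on allele counts, the conventional genome-wide significance threshold of 5e-8, and hypothetical inputs (a variant at 20% frequency in controls with an odds ratio of 1.1). It’s a simplification for illustration, not the model the RFA cited; rare-variant studies usually rely on gene-level aggregation tests instead.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and an allelic odds ratio."""
    case_odds = odds_ratio * p_control / (1.0 - p_control)
    return case_odds / (1.0 + case_odds)


def allelic_test_power(n_cases: int, n_controls: int, p_control: float,
                       odds_ratio: float, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a two-sided allelic test (two-proportion z-test on allele counts)."""
    p_case = case_allele_freq(p_control, odds_ratio)
    a_cases, a_controls = 2 * n_cases, 2 * n_controls  # diploid: two alleles per person
    p_pooled = (a_cases * p_case + a_controls * p_control) / (a_cases + a_controls)
    # Standard error of the allele-frequency difference under the null and under the alternative
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / a_cases + 1 / a_controls))
    se_alt = sqrt(p_case * (1 - p_case) / a_cases + p_control * (1 - p_control) / a_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the observed difference clears the critical value (opposite tail ignored)
    return NormalDist().cdf((abs(p_case - p_control) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical common variant: 20% frequency in controls, odds ratio 1.1
    for n in (5_000, 10_000, 25_000):
        power = allelic_test_power(n, n, p_control=0.2, odds_ratio=1.1)
        print(f"{n:>6} cases + {n:>6} controls: power = {power:.2f}")
```

Under these assumptions, power stays low at a few thousand cases and climbs steeply toward the 25,000-case, 25,000-control design the RFA mentions. Rarer variants or smaller odds ratios push the requirement higher still.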

There *are* large sample collections for common disease studies that have been banked for the last decade or more. Some of them hit those sample counts. The issue is informed consent. A strict requirement of this program is that all sequencing data be submitted to public repositories, i.e. dbGaP. This means that samples must be properly consented for public data sharing and deposition. Frankly, many sample collections are not consented in this manner, and IRBs are paying attention. It was difficult to find large, well-characterized sample sets with modern consents.

Whole Genome Sequencing Challenges

The language of the RFA made it quite clear that studies emphasizing whole genome sequencing were sought for this program. We’re big fans of this approach, of course, but it also comes with some limitations. The cost of generating WGS data, even with an Illumina HiSeq X Ten installation, is considerable, especially when designing studies of tens of thousands of samples.

Unfortunately, while WGS costs have come down considerably, they’re not low enough for us to do all of the studies that we wanted. The sequencing costs for a 20,000-sample study at current prices exceed the entire budget of any one center. In other words, we had to prioritize samples and projects. And we had to choose studies that could be coordinated across multiple centers to maximize our discovery power.

An Open Door for Sequencing Studies

The proposals funded for CCDG all achieved exceptional grant scores from the study sections, which speaks to the high quality of the science that we all proposed. The projects that we’ve lined up for years 1 and 2 are very exciting. Many of the details still need to be finalized, but I think it’s safe to say that cardiovascular disease (the western world’s #1 killer) will be a mainstay of MGI’s efforts.

The CCDG award, when viewed in the light of the recent NIH budget increase, also presents a unique opportunity to seek funding for other large-scale common disease sequencing studies that might be co-funded with other institutes. If you have a large sample collection that seems to fit the CCDG mission (and an idea of the institute that would help support the work), please get in touch! We would love to expand our research portfolio to tackle other disease phenotypes that are critical for human health.

Come on, let’s do this.

Comments

cariaso says

January 15, 2016 at 10:24 pm

Bravo for what’s here, but I hope for something more I don’t see. Is there any way for these 200k people to get access to their own sequencing data? If not, what is the barrier that prevents it?
Dan Koboldt says

January 15, 2016 at 11:34 pm

That’s a great question, and I don’t know the answer. It will depend (to some extent) on the informed consent signed by the participants. In my experience, many of these do not have a provision for return of data (or results), but I will look into this and report back.