Inherited retinal disease genomics at ASHG 2016


Inherited retinal diseases (IRDs) comprise a heterogeneous group of retinal degenerations, including retinitis pigmentosa, choroideremia, Leber congenital amaurosis, and other dystrophies. These disorders exhibit remarkable genetic heterogeneity: mutations in numerous different genes can cause retinal disease, and these mutations can be inherited in dominant, recessive, or X-linked fashion.

I proposed an invited session for the 2016 ASHG meeting because I believe that inherited retinal diseases offer several key advantages as a model system for gene discovery, functional validation, genetic counseling, and individualized treatment.

IRDs as a Model for Discovery and Translation

The genetic heterogeneity within IRDs encompasses a broad set of genes, inheritance patterns, and disease pathways. While the number of patients affected by mutations in any single gene may be small, this breadth provides a wider set of options for targeted therapies.

IRDs typically affect only a single tissue (the retina). Although the genes implicated in these disorders are involved in a variety of pathways, most share a molecular phenotype: high (and often transcript-specific) expression in retinal tissue. This offers a compelling avenue for experimental approaches to functionally validate new candidate disease genes.

The fact that most IRDs are not life threatening allows for long-term studies and recruitment from an actively engaged patient population. Thanks to decades of work to identify the genes and mutations responsible for IRDs, more than half of patients who undergo routine genetic testing receive a definitive molecular diagnosis. This success, combined with the typically slow onset of many retinal disorders, suggests that new therapeutic interventions could dramatically improve the quality of life for thousands of patients. Indeed, Leber congenital amaurosis (LCA) was one of the first recessive disorders to undergo clinical trials for gene replacement therapy.

The catalogue of pathogenic mutations and disease-associated genes for IRDs has grown considerably over the past two decades, fueled by both technological advances (e.g., next-generation sequencing) and well-established genetic testing programs. So, too, has our knowledge of the molecular mechanisms linking pathogenic mutations to distinct disease phenotypes. Yet there remains a pressing need for innovative approaches to functionally validate genetic discoveries, and to incorporate that information into patient counseling and clinical care.

Our Session at ASHG 2016

I’m pleased to say that the ASHG program committee selected our proposed session for the 2016 meeting! It brings together a diverse panel of experts in gene discovery, molecular characterization, genetic counseling, and treatment for inherited retinal diseases. Together, they will provide a snapshot of the current state of the art in IRD research and use it as context to discuss best practices for transforming genetic discoveries into personalized precision medicine.

Title: Gene discovery, genetic counseling, and clinical care of patients with inherited retinal diseases (Session #61).

When: Friday, October 21st, 11:00 a.m. to 1:00 p.m.

Where: Convention Centre Room 302, West Building.

Twitter-friendly? Hell yes.

Here’s the schedule:

  • Gene discovery and mutation detection in families with dominant retinitis pigmentosa. Steve Daiger, University of Texas Health Science Center, Houston.
  • Integrating genetic testing into the inherited retinal disease clinic. Kari Branham, Kellogg Eye Center, University of Michigan, Ann Arbor.
  • Functional validation and genetic intervention for retinal disease genes. Val Sheffield, University of Iowa Carver College of Medicine, Iowa City.
  • From gene discovery to clinical trials for inherited retinal diseases. Eric Pierce, Massachusetts Eye & Ear Infirmary, Boston.

We believe this session will appeal not just to groups working on retinal diseases, but also to the wider community of researchers, genetic counselors, and clinicians who seek to translate genetic discoveries into improved clinical care. I hope to see you there. Please take pity on the moderator, and ask the speakers some questions!

What Clinicians Want from NGS-based Tests

Next-gen sequencing technologies have introduced two paradigm shifts for genomic medicine. First, they’ve accelerated the discovery of pathogenic variants and disease-associated genes for myriad inherited diseases, providing the basis for expanded genetic testing and carrier screens. Second, they’ve provided a superior assay for performing those tests in a clinical setting.

From a researcher’s point of view, it seems obvious, even inevitable, that NGS will become a routine part of clinical care. After all, these technologies have fundamentally altered how we study the genetic basis of disease. From that vantage, it can be hard for us to understand why these powerful tools haven’t been rapidly and thoroughly adopted by healthcare providers. In my new digs, I’m fortunate to work directly with clinicians and genetic counselors, which has been enormously helpful in understanding their perspective on genetic testing.

Here’s a useful illustration. Genetic testing panels — focused diagnostic assays that interrogate a specific set of genes associated with a certain phenotype or disorder — have been a routine part of clinical care for decades. NGS has the potential to improve the speed, cost, and diagnostic efficiency of such panels. Indeed, many testing providers are offering NGS-based tests. For any given condition, there might be 10+ commercial tests available.

This is good news for clinicians, because they can compare multiple options and choose the one that best fits their needs. As far as I can tell, the choice comes down to four things:

1. Comprehensiveness.

For panels, the set of genes tested should be as inclusive as possible. The total number tested is one of the first metrics that clinicians ask about. Obviously, this is a moving target: as new disease-associated genes are identified and credentialed, they’ll ideally be incorporated into the test.

Another important aspect of comprehensiveness is the coverage of the panel genes. For historical reasons, clinicians have come to expect that a gene included in a panel test will have 100% coverage, i.e., that the assay interrogates every single coding base. In the days of ABI 3730 capillary sequencing, test providers ensured this by repeating assays (or even designing new primers) to achieve exhaustive coverage of all coding bases. Even recently, some test providers that moved to NGS will use capillary sequencing to fill the gaps in coverage.

This model is almost certainly untenable for a rapid, reasonably priced genetic test that relies on NGS. It’s inherent to targeted sequencing that some regions will have low or no coverage in some samples. What’s interesting to me is that some clinicians choose panels over whole-exome tests because they have the impression that sequencing coverage is somehow more complete in a panel test. Yet most test providers actually perform exome sequencing and simply limit their reports to the panel genes.

In other words, unless special probes are spiked in, the coverage of a given gene from a company’s panel test is probably the same as you’d get from their exome-wide test.
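
If you’re curious how a given panel actually performs, per-gene coverage is easy to compute from per-base depth output. Here’s a minimal Python sketch along those lines; it assumes a depth file like the one produced by samtools depth (chrom, pos, depth) and a BED file of panel exons with gene symbols in the fourth column. The file names and the 20x threshold are illustrative, not prescriptive.

```python
#!/usr/bin/env python3
"""Per-gene coverage summary for a panel, from per-base depth output.

A minimal sketch: assumes a (chrom, pos, depth) file such as the output
of 'samtools depth -b panel.bed sample.bam', plus a BED file of panel
exons with the gene symbol in column 4.
"""
import sys
from collections import defaultdict

MIN_DEPTH = 20  # illustrative clinical reporting threshold

def load_exons(bed_path):
    """Map (chrom, pos) -> gene for every base in the panel BED."""
    base_to_gene = {}
    with open(bed_path) as bed:
        for line in bed:
            chrom, start, end, gene = line.rstrip("\n").split("\t")[:4]
            # BED is 0-based, half-open; depth output is 1-based
            for pos in range(int(start) + 1, int(end) + 1):
                base_to_gene[(chrom, pos)] = gene
    return base_to_gene

def main(bed_path, depth_path):
    base_to_gene = load_exons(bed_path)
    total = defaultdict(int)
    covered = defaultdict(int)
    for gene in base_to_gene.values():
        total[gene] += 1
    with open(depth_path) as depths:
        for line in depths:
            chrom, pos, depth = line.split("\t")
            gene = base_to_gene.get((chrom, int(pos)))
            if gene is not None and int(depth) >= MIN_DEPTH:
                covered[gene] += 1
    for gene in sorted(total):
        pct = 100.0 * covered[gene] / total[gene]
        print(f"{gene}\t{pct:.1f}% of coding bases at >= {MIN_DEPTH}x")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Run the same script on a panel test and an exome capture, and you can see for yourself how similar the per-gene numbers usually are.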

2. Del/Dup testing.

Panel tests should report not only single-nucleotide variants and small indels, but also structural variants (deletions and duplications) that might span exons or entire genes. This is vital when surveying the known genes for a specific condition, as many of the known causal lesions fall into this category.

Importantly, this requires a separate assay: usually a custom array with oligos designed for each exon in every panel gene. It’s tempting to try to use the read-depth information from targeted sequencing to call these variants, saving both time and expense. Can exome-based technologies uncover such forms of variation? Absolutely. I developed one of those methods. Yet I can also tell you that they’re inherently noisy, particularly for targeted sequencing data, and I would not be confident in using them to guide clinical care.
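
For what it’s worth, the core idea behind most read-depth approaches is simple: normalize each exon’s depth for total sequencing yield, compare it against a reference built from many samples, and flag exons whose log2 ratio departs from zero. Here’s a minimal sketch of that general strategy (not any specific published method); the reference panel and the 0.8 log2 cutoff are illustrative assumptions.

```python
import math

def exon_cnv_calls(sample_depths, reference_depths, log2_cutoff=0.8):
    """Flag putative deletions/duplications from exon read depths.

    A sketch of the general read-depth strategy, not a specific method.
    Both arguments map exon ID -> mean depth; the reference would
    typically be a median across many unrelated samples (assumed here).
    """
    # Normalize for overall sequencing yield in each dataset
    sample_total = sum(sample_depths.values())
    ref_total = sum(reference_depths.values())
    calls = {}
    for exon, depth in sample_depths.items():
        ref = reference_depths.get(exon)
        if not ref:
            continue  # no reference coverage; can't assess this exon
        ratio = (depth / sample_total) / (ref / ref_total)
        log2r = math.log2(ratio) if ratio > 0 else float("-inf")
        if log2r <= -log2_cutoff:
            calls[exon] = ("possible deletion", log2r)
        elif log2r >= log2_cutoff:
            calls[exon] = ("possible duplication", log2r)
    return calls
```

Real pipelines layer on GC-bias correction, segmentation across adjacent exons, and confidence metrics, and that is exactly where the noise I mentioned creeps in.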

3. Fewer VUS.

Variants of Unknown Significance (VUS) on clinical reports are increasingly common, particularly as panels expand. I understand some of the reasons for this from the analysis point of view. However, every VUS on the report creates a burden for the ordering clinician, and many of them could probably be promoted (to pathogenic) or ruled out with deeper analysis.

Thanks to efforts like the Exome Aggregation Consortium, we now have allele frequency data from cohorts of more than 50,000 individuals. These deep catalogs of human genetic variation should make it possible to discount an ever-larger swath of rare but neutral variants.
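
In practice, a first-pass frequency triage can be as simple as the sketch below, which assumes variants annotated with a population allele frequency in the VCF INFO field (an AF key, as in ExAC-annotated files) and flags anything too common to cause a rare Mendelian disorder. The 0.5% cutoff is an illustrative choice, not a standard.

```python
def too_common(vcf_line, max_af=0.005):
    """Return True if a variant's population allele frequency exceeds
    the cutoff for a rare Mendelian disorder.

    Assumes an AF=x annotation in the INFO field (as in ExAC-annotated
    VCFs); the 0.5% cutoff is illustrative.
    """
    info = vcf_line.rstrip("\n").split("\t")[7]
    for field in info.split(";"):
        if field.startswith("AF="):
            # Multi-allelic sites list one frequency per alternate allele
            return any(float(af) > max_af for af in field[3:].split(","))
    return False  # no frequency data, so we can't rule it out
```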

4. Competitive pricing and turnaround.

Most would-be NGS testing providers can meet requirements 1-3 (or come close). The real challenge is to do so in a short timeframe for a market-competitive price. Doing so requires, among other things, robust automation of lab processes and analysis pipelines.

Determining whether a test’s price is competitive is itself a complex problem. More inclusive tests, for example, should logically cost more. Direct comparisons of competing products are complicated by differences in inclusiveness, coverage, reporting, and the like.

The question of who pays adds another layer of complexity. A 100-gene panel test might cost the patient $5,000 or more, and often the patients who need genetic testing the most are the ones who can least afford it, particularly if it’s not covered by insurance or Medicaid. I admit, I don’t know much about this part of healthcare, but it seems to me that the ideal genetic test is one whose cost can be justifiably reimbursed by insurance companies.


We take for granted that disruptive technologies tend to see rapid adoption in the research community, where the pressure from competition is high, but the risk is mitigated. None of our activities directly impact the well-being of patients or their families. By comparison, the adoption of new technologies in clinical settings seems painfully slow and tedious. It helps me to remember that clinical decisions have direct consequences on the lives and well-being of real people. With stakes that high, an abundance of caution seems warranted.


dbSNP exceeds a ridiculous 150 million variants

Earlier this week, I took a look at the dbSNP VCF file for build 147 (human) with Ben Kelly from the White Lab at NCH. Even summary statistics took a while to generate, and soon we realized why: dbSNP now contains a jaw-dropping 152.7 million reference variants. Roughly speaking, that’s one variant for every 20.5 base pairs in the human genome. They’re not all rare variants, either: 86 million variants are classified as common (G5, G5A, or COMMON), with minor allele frequencies >1% in at least one population.
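
If you want to reproduce these counts, a straightforward pass over the dbSNP VCF will do it. Here’s a minimal Python sketch that tallies total and common records, assuming the G5/G5A flags and COMMON key that dbSNP carries in its INFO field; the file name is whatever release you download (e.g., 00-All.vcf.gz for build 147).

```python
import gzip

def count_common_variants(vcf_path):
    """Tally total and 'common' records in a dbSNP VCF release.

    Counts a record as common if its INFO field carries the G5 or G5A
    flag or COMMON=1, the annotations dbSNP uses for variants with
    appreciable minor allele frequency in at least one population.
    """
    total = common = 0
    with gzip.open(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue
            total += 1
            info = line.rstrip("\n").split("\t")[7].split(";")
            if any(f in ("G5", "G5A", "COMMON=1") for f in info):
                common += 1
    return total, common

total, common = count_common_variants("00-All.vcf.gz")
print(f"{total} variants, {common} common ({100.0 * common / total:.1f}%)")
```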

dbSNP’s Ridiculous Growth

During the HapMap project in 2003-2004, we were astonished to see dbSNP hit 10 million variants. My boss at the time, Ray Miller, told me that some thought it could one day hit 50 million. We thought it might take decades, but dbSNP surpassed 50 million refSNPs just seven years later.

[Figure: dbSNP growth, 2002-2016]

Even more astonishing are the 6+ million coding variants: depending on how you define the exome, that’s about one variant every 5 or 6 base pairs in coding regions. Because a single variant might affect multiple transcripts or genes, the number of observed human coding variant annotations exceeds 22 million.

[Figure: dbSNP coding variant counts]

That being said, the fact remains that the vast majority of known variants in our genome lie outside of protein-coding exons. When annotated with SnpEff, more than 80 million variants fall within or near genes, where they might play a regulatory role (again, multiple transcripts mean multiple annotations per variant).

[Figure: Noncoding dbSNP annotations]
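
Tallies like these come straight from the annotation field. As a rough sketch, here’s how one might count SnpEff effect types from an annotated VCF, assuming the newer ANN INFO format (one comma-separated entry per affected transcript, with the effect in the second pipe-delimited field):

```python
from collections import Counter

def count_snpeff_effects(vcf_lines):
    """Tally SnpEff effect types from an annotated VCF.

    Assumes the ANN INFO format (ANN=Allele|Annotation|Impact|Gene|...),
    with one comma-separated entry per affected transcript, which is
    why one variant can contribute several annotations.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]
        for field in info.split(";"):
            if field.startswith("ANN="):
                for ann in field[4:].split(","):
                    effect = ann.split("|")[1]
                    counts[effect] += 1
    return counts
```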

The Evolving Utility of dbSNP

In the early days of next-generation sequencing, dbSNP provided a vital discriminatory tool. In exome sequencing studies of Mendelian disorders, any variant already present in dbSNP was usually common, and therefore unlikely to cause rare genetic diseases. Some of the first high-profile disease gene studies therefore used dbSNP as a filter. Similarly, in cancer genomics, a candidate somatic mutation observed at the position of a known polymorphism typically indicated a germline variant that was under-called in the normal sample. Again, dbSNP provided an important filter.

Now, the presence or absence of a variant in dbSNP carries very little meaning. The database includes over 100,000 variants from disease mutation databases such as OMIM or HGMD. It also contains an appreciable number of somatic mutations that were submitted before databases like COSMIC became available. And, like any biological database, dbSNP undoubtedly includes false positives.

On the bright side, however, the rapid generation of genomic data worldwide has enabled deeper characterization of the variants that we know about. The 1000 Genomes Project contributed genome-wide data for 2,504 individuals from several continental groups, while the Exome Aggregation Consortium (ExAC) has compiled exome data from 60,706 individuals at the time of writing.

The Value of Variant Allele Frequencies

As a central repository for variant allele frequency (VAF) data, dbSNP can be a powerful resource for human genetics studies. Of particular relevance for rare disease genetics are the VAFs observed in worldwide populations. For a rare autosomal recessive disorder affecting 1 in 100,000 individuals, compound-heterozygous variants with VAFs of 0.01 in a given population are too common: the expected frequency of carrying both is roughly 0.01 × 0.01 = 0.0001, or 1 in 10,000, ten times the disease prevalence. In contrast, most known disease-causing variants (mutations imported from OMIM, for example) are exceedingly rare.

Thus, while the mere presence of a variant in dbSNP is a blunt tool for variant filtering, dbSNP’s deep allele frequency data make it incredibly powerful for genetics studies: it can rule out variants that are too prevalent to be disease-causing, and prioritize ones that are rarely observed in human populations. This discriminatory power will only increase as ambitious large-scale sequencing projects like the Centers for Common Disease Genomics (CCDG) make their data publicly available.
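
This frequency sanity check is trivial to encode. Here’s a minimal sketch of the compound-het example above; it glosses over phase, the factor of two for allele ordering, and assumptions like Hardy-Weinberg equilibrium and complete penetrance.

```python
def plausibly_causal(af1, af2, prevalence):
    """Crude test: can a compound-het pair explain a recessive disorder?

    If the expected frequency of carrying both alleles exceeds the
    disease prevalence, the pair is too common to be causal. Glosses
    over phase and assumes Hardy-Weinberg and complete penetrance.
    """
    return af1 * af2 <= prevalence

# The example from the text: two 1% variants vs. a 1-in-100,000 disorder
print(plausibly_causal(0.01, 0.01, 1e-5))  # False: 1e-4 > 1e-5
```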

Ancestry and Inclusion

Importantly, allele frequency data are most useful when the population matches the ancestry of the sample(s) being studied. In their current form, our databases are skewed towards major population groups (northwest European, West African, and East Asian). Many important geographic and ethnic groups are still under-represented. The reasons for this are complex, and not the focus of this post, but I think we can all agree that it’s vital to seek out and include samples from diverse ancestries as large-scale sequencing efforts move forward.

In Summary

At 150+ million variants, dbSNP is a massive beast, but it still offers a useful discriminatory tool when used correctly. Proceed with caution.


A New Era for MassGenomics

When I started MassGenomics in 2008, next-generation sequencing was in its infancy. We’d sequenced AML1 — the first cancer genome — with two nascent platforms: Illumina/Solexa (32-bp reads) and 454 FLX (450-bp reads). Already, we had a glimpse of the bioinformatics challenges that these technologies brought forth.

Sequencing for Common Disease

It’s astonishing how far the field has come in just eight years. Factory-scale sequencing now makes it practically and economically feasible to sequence tens of thousands of whole human genomes in a single year. Washington University and other large-scale sequencing centers are currently applying this capacity to ambitious studies of cardiovascular, autoimmune, and neurological conditions. By studying tens of thousands of genomes, it should be possible to comprehensively define the genetic architecture underlying each of these common complex diseases.

Rare Disease and Clinical Applications

Yet there are other important applications of next-gen sequencing, such as:

  • The identification of genes underlying rare inherited disorders
  • Molecular diagnosis and characterization of undiagnosed diseases
  • Utilization of genomic information to improve clinical care

The distribution model for these applications is different from the factory-scale sequencing operation required for common disease. The democratization of NGS has empowered hundreds of smaller labs to carry out such research, and enabled rapid clinical sequencing at the point of care. That’s where the rubber hits the road, and it’s also where I want to be.

A New Position: Nationwide Children’s Hospital

Thus, after 13 years at Washington University, I’ve accepted a position as Principal Investigator at Nationwide Children’s Hospital. If the name of that institution sounds familiar, it’s because they’ve recruited Rick Wilson and Elaine Mardis to establish a new Institute for Genomic Medicine (IGM). Under their leadership, I’ll help build the research program on the genetic basis of rare pediatric disorders.

Future Directions

So, what does this mean for MassGenomics? The blog will continue, hopefully at a greater frequency, and with a new emphasis on pediatric and clinical genomics. I should state for the record that the blog does not represent the views of Nationwide Children’s Hospital or The Ohio State University (where I’m now an assistant professor).

The McDonnell Genome Institute at Washington University will continue on, by the way. The talented faculty and staff have already begun work on the Centers for Common Disease Genomics (CCDG) program, while University leadership has initiated a search for a new director. They have capacity to spare, so if you’re looking for high-quality exome or genome sequencing (human or non-human), please reach out to Bob Fulton.

So that’s my news, and I hope to have more to share in the weeks to come.