Return of Results: Genetics Experts Weigh In

Genetics experts return of results

Image credit: 123 RF

In my last post, I wrote about the return of results from next-gen sequencing, specifically a recent paper in AJHG about secondary findings in ~6500 ESP exomes. Today we’ll delve into another paper in the same issue on the attitudes of genetics professionals toward the return of incidental findings from whole genome sequencing (WGS) and exome sequencing (ES).

Joon-Ho Yu and colleagues conducted a survey of around 850 genetics professionals to gauge their attitudes toward:

  1. The return of clinical ES/WGS results
  2. The process of returning results
  3. The ACMG recommendations for secondary findings

Responding Genetics Experts

To identify potential respondents, the authors first collected e-mail addresses, professional degrees, and states of residence from three societies: the American Society of Human Genetics (ASHG), the American College of Medical Genetics (ACMG), and the National Society of Genetic Counselors (NSGC). They sent out 9,857 invitation e-mails and had 847 respondents, for a response rate of about 8.6%.

The majority of those respondents were:

  • Female (58%)
  • White (98%)
  • Non-Hispanic (96%)
  • Residents of the U.S. (81%)
  • In academia (73%)

Various professions were well-represented among respondents, including clinical geneticists (24%), genetic counselors (22%), and human geneticists (19%).

Return of Incidental Findings

Responses to the heady questions from the survey are depicted in Figure 1, which I’ve adapted here:

Return of results survey

Adapted from Figure 1 of Yu et al, AJHG 2014

Overall, genetics experts were very supportive of the idea that some secondary findings should be returned from clinical ES/WGS. The majority of respondents agreed that incidental results should be offered to:

  • Adult patients (85% agreed)
  • Healthy adults (75% agreed)
  • Parents of a child with a medical condition (74% agreed)

Where the Experts Agree

Nearly all experts (88%) supported offering results about childhood-onset conditions to the parents of child patients, and most (62%) would also offer information about the child’s results for adult-onset conditions. The last result in the figure above is an important one: the vast majority of experts (81%) agree that the preferences of a patient or family should guide which results are offered for return. And most (66%) agreed that a web-based tool would suffice to assess those preferences.

Where the Experts Don’t Agree

The experts were divided (~40% agreed/disagreed) on whether only actionable secondary results should be returned, and fewer than half (44%) thought that giving patients and families the option to choose which results to receive would improve care. Respondents also differed in their opinions on what kind of results to return.

Obligation to return results

Yu et al, Supp. Fig 1 (AJHG 2014)


When asked about the type(s) of conditions that merited return of positive incidental findings, the experts chose:

  • Mendelian disorders (67%)
  • Adverse drug reactions (61%)
  • Carrier status (49%)
  • Complex traits (20%)

And about 25% responded that healthcare providers had no obligation to return secondary results.

How to Return Results

Winning the survey’s least-surprising category was the part where respondents were asked to rank, by order of preference, the manner in which incidental findings should be communicated.

| Method | 1st Choice | 2nd Choice | 3rd Choice | 4th Choice |
|---|---|---|---|---|
| A face-to-face meeting with a genetic counselor | 78.5% | 9.5% | 7.2% | 4.8% |
| A phone call with a genetic counselor | 5.6% | 63.0% | 26.5% | 4.9% |
| An interactive website with access to counseling | 13.6% | 23.4% | 53.4% | 9.5% |
| A report sent in the mail | 2.3% | 4.0% | 13.0% | 80.8% |

Unsurprisingly, most respondents (78.5%) put a face-to-face meeting with a genetic counselor as their first choice. The most popular second choice was a phone call with a genetic counselor (63.0%). An interactive website with access to genetic counseling by phone or online was a popular third choice (37% ranked it first or second). Nearly everyone hated the idea of sending reports by mail: 80.8% ranked it last.
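
For anyone who wants to check the "first or second" figure, here is a small sketch that adds up the first two columns of the ranking table. The percentages are copied from the table; the function name `top_two_share` is my own, not something from the paper.

```python
# Ranking percentages copied from the table above (1st to 4th choice).
rankings = {
    "Face-to-face meeting with a genetic counselor": [78.5, 9.5, 7.2, 4.8],
    "Phone call with a genetic counselor": [5.6, 63.0, 26.5, 4.9],
    "Interactive website with access to counseling": [13.6, 23.4, 53.4, 9.5],
    "Report sent in the mail": [2.3, 4.0, 13.0, 80.8],
}


def top_two_share(row):
    """Percentage of respondents who ranked a method first or second."""
    return round(row[0] + row[1], 1)


for method, row in rankings.items():
    print(f"{method}: {top_two_share(row)}% ranked it first or second")
```

The website row is the one quoted in the text; the mailed report comes out at the bottom by the same measure.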

The ACMG Gene List

Recently, the ACMG published their recommended list of 57 genes/conditions for which incidental findings should be returned. Their recommendations got a lot of press, and received a vigorous (and mixed) response from the medical and research community. In this survey, 68% of genetics professionals agreed that results from the ACMG list should be reported, regardless of the indication for sequencing.

However, only 29% felt it was the responsibility of the health care professional to decide which results on the minimal list should be returned. And the majority of respondents (70%) disagreed with the notion of returning secondary findings from the ACMG list regardless of the patient/family preferences for getting that information.

Challenges of Returning Results

Next, the genetics professionals were asked for their perspective on the greatest challenge of returning results from clinical ES/WGS.

Challenges of returning results

Yu et al, Supp. Fig 1 (AJHG 2014)

The greatest concern is one that I’ve heard before, particularly in the roundtable discussion on genetic testing last year: health care providers simply don’t have the time and may not have the expertise to return incidental findings. There are also concerns about the effect of returning secondary findings on the patients and families:

Concern results may cause

Yu et al, Supp. Fig 1 (AJHG 2014)

The foremost concern by a long margin was the anxiety and stress that this knowledge might cause the patient. That’s why asking and honoring the patient/family preferences beforehand (i.e. before the sequencing even happens) is so important. There are privacy concerns as well; about a third of respondents worried that recipients of secondary findings might experience discrimination.

Clearly, the decisions about whether to return results, which results to return, and how to do so will be difficult to address. It’s also unclear, at least to me, which group or organization or (dare I say) government body should call the shots. We’ve heard from the genetics professionals, but it’s also important to hear from two other groups: primary care physicians and the general public (i.e. patients and families). These people arguably have the most at stake, so their opinions should carry significant weight.

We should move quickly to collect the information necessary for well-guided decision making, because one thing is clear: Next-gen sequencing will soon be a routine part of clinical care, whether we like it or not.

Yu JH, Harrell TM, Jamal SM, Tabor HK, & Bamshad MJ (2014). Attitudes of genetics professionals toward the return of incidental results from exome and whole-genome sequencing. American Journal of Human Genetics, 95 (1), 77-84 PMID: 24975944

Return of Results from Next-gen Sequencing

Return of results next gen sequencing

Image credit: CDC Blogs

The rapid adoption of next-gen exome and genome sequencing for clinical use (i.e. with patient DNA) raises some difficult questions about the return of results to patients and their families. In contrast to traditional genetic testing, which usually checks for variants in specific genes, high-throughput sequencing has the potential to reveal a number of secondary findings, i.e., genetic variants with medical relevance but not related to the condition that merited the test.

Two articles in the current issue of AJHG delve into the sticky issue of incidental genetic findings. Holly Tabor et al analyzed de-identified exome data from 6,517 individuals obtained from NHLBI’s Exome Sequencing Project (ESP). They examined the burden of pathogenic variants in three sets of biomedically important genes:

  1. Genes underlying 31 Mendelian conditions, most of which are inborn errors of metabolism, recommended for newborn screening (NBS, n=39)
  2. Genes associated with the risk of age-related macular degeneration, a complex disease and the most prevalent form of vision loss (ARMD, n=17)
  3. Genes known to influence drug response, i.e. replicated pharmacogenetics hits from PharmGKB (PGx, n=14).

Variant GERP scores

Tabor et al, AJHG 2014

Looking only at SNVs called by GATK, the authors identified 10,879 variants affecting the 70 disease genes across the full ESP cohort. Unsurprisingly, filtering this set to include only variants with a high call rate that were listed in OMIM and HGMD for the correct phenotype reduced the set by over 90%, to around 400 total variants.
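
To make the filtering step concrete, here is a minimal sketch of that kind of filter. It is not the authors' pipeline: the `Variant` fields, the 0.9 call-rate cutoff, and the example records are all assumptions of mine, since the post only says "high call rate" and "listed in OMIM and HGMD for the correct phenotype".

```python
from dataclasses import dataclass

# Assumed cutoff: the post only says "high call rate", so 0.9 is a placeholder.
MIN_CALL_RATE = 0.9


@dataclass
class Variant:
    variant_id: str          # hypothetical identifier
    call_rate: float         # fraction of samples with a genotype call
    in_omim: bool            # listed in OMIM
    in_hgmd: bool            # listed in HGMD
    phenotype_match: bool    # database phenotype matches the gene set's condition


def passes_filters(v, min_call_rate=MIN_CALL_RATE):
    """Keep well-called variants listed in both databases for the right phenotype."""
    return (
        v.call_rate >= min_call_rate
        and v.in_omim
        and v.in_hgmd
        and v.phenotype_match
    )


# Made-up records, purely for illustration.
candidates = [
    Variant("var1", 0.99, True, True, True),
    Variant("var2", 0.99, True, False, True),   # missing from HGMD
    Variant("var3", 0.60, True, True, True),    # poorly called
    Variant("var4", 0.95, True, True, False),   # listed for a different phenotype
]

included = [v for v in candidates if passes_filters(v)]
excluded = [v for v in candidates if not passes_filters(v)]
print(len(included), "included,", len(excluded), "excluded")
```

Requiring all conditions at once is what makes the filter so aggressive: a variant drops out as soon as any single criterion fails.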

Included versus Excluded Variants

Next, the authors evaluated some of the characteristics of variants that made it through to their final set, versus variants that they’d excluded. Because pathogenic mutations should be under strong purifying selection, one would expect them to be extremely rare and to occur at positions with high conservation across evolution.

The lovely violin plots at right show the mean GERP scores (a measure of conservation) for included versus excluded variants in the newborn screening (NBS), age-related macular degeneration (ARMD), and pharmacogenetics (PGx) genes examined. As the authors hoped, GERP scores were significantly higher for included versus excluded variants, particularly for the severe recessive disease genes screened for in newborns. Included ARMD and PGx variants also had higher GERP scores, but with a wider spread.

A comparison of the Polyphen-2 scores, which offer computational estimates of how damaging amino acid substitutions will be, also showed significant differences, with included variants in the NBS predicted to be far more deleterious than the excluded variants. The effect was again consistent but less striking in the ARMD/PGx sets.

Together, these patterns are consistent with the idea that mutations underlying severe, highly penetrant phenotypes (i.e. the NBS set) are more deleterious — and thus under stronger natural selection — than variants associated with complex phenotypes like ARMD and PGx.

Rate of Incidental Findings

Carrier burden recessive alleles

Tabor et al, AJHG 2014

Having established that their final set of ~400 variants was properly vetted, the authors set out to establish the burden of pathogenic mutations that might be found in any individual’s exome. The majority of included variants were rare, with MAF<0.5%.

The carrier burden in the NBS set was surprisingly high (0.57 per exome), with 45% of individuals carrying at least one allele and 11% carrying at least two alleles. If the ARMD and PGx variants were also considered, each individual carried 15.3 risk alleles on average.
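
The three carrier numbers above (mean alleles per exome, share with at least one, share with at least two) are simple summaries of a per-individual allele count. Here is a hedged sketch of how one might compute them; the toy counts are invented and do not reproduce the ESP figures.

```python
def carrier_summary(allele_counts):
    """Summarize per-individual counts of pathogenic alleles.

    allele_counts: one integer per individual (number of risk alleles carried).
    """
    n = len(allele_counts)
    return {
        "mean_alleles_per_exome": sum(allele_counts) / n,
        "frac_at_least_one": sum(c >= 1 for c in allele_counts) / n,
        "frac_at_least_two": sum(c >= 2 for c in allele_counts) / n,
    }


# Invented counts for ten individuals, for illustration only.
toy_counts = [0, 1, 0, 2, 0, 1, 0, 0, 1, 0]
summary = carrier_summary(toy_counts)
print(summary)
```

Note that the mean can exceed the carrier fraction, because individuals with two or more alleles count once as carriers but contribute several alleles to the mean.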

These findings challenge the assumption that secondary findings (actionable results) and incidental findings (potential clinical utility) uncovered by exome or genome sequencing are rare. Indeed, a research highlight on the paper from Nature Genetics noted that the study demonstrates the “striking prevalence of actionable incidental or secondary results, including ones of direct clinical usefulness, which might be obtained in patient sequencing.”

In my next post, I’ll tackle the medical community’s opinions about sharing secondary findings, based on a recent survey of about 850 genetics professionals.

Tabor HK, Auer PL, Jamal SM, Chong JX, Yu JH, Gordon AS, Graubert TA, O’Donnell CJ, Rich SS, Nickerson DA, NHLBI Exome Sequencing Project, & Bamshad MJ (2014). Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. American Journal of Human Genetics, 95 (2), 183-93 PMID: 25087612

Sequencing Finnish Population Isolates (SISu)

sequencing in finnish suomi

If you compare any individual’s genome to the human reference sequence, you’ll find around 3 million differences. Most of these (95%) are already known, and have been catalogued in databases like dbSNP. Many are common, and shared by 5% or more of human populations. They may still have biomedical relevance, of course; genome-wide association studies (GWAS) of common genetic variation have found thousands of genetic loci associated with disease susceptibility and other complex traits.

But there are still huge numbers of rare (MAF<0.5%) and low-frequency (MAF 0.5-5%) genetic variants. Their contribution to human health is harder to understand, particularly because such variants:

  • Are usually not included on high-density SNP arrays
  • Occur in few individuals, and thus require large cohorts
  • Have low individual power for genetic association
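
The frequency classes used throughout these posts can be written down as a tiny helper. This is only a sketch: the thresholds (0.5% and 5%) come from the text, but which class the exact boundary values fall into is my assumption.

```python
def frequency_class(maf):
    """Classify a variant by minor allele frequency (given as a fraction, not a percent)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be between 0 and 0.5")
    if maf < 0.005:          # below 0.5%
        return "rare"
    if maf <= 0.05:          # 0.5% to 5% (boundary handling is an assumption)
        return "low-frequency"
    return "common"          # above 5%


for maf in (0.001, 0.02, 0.2):
    print(maf, frequency_class(maf))
```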

One way to address the challenges of rare variants is to study them in founder populations in which such variants are more common. Ashkenazi Jews and Amish families, for example, have undergone population bottlenecks: a limited number of founders gave rise to the current populations.

This breeding isolation, whether cultural or geographic in nature, increases the frequency of some variants that are otherwise quite rare in broad populations. And if those variants underlie a genetic disorder, the risk of the disease is increased. Ashkenazi Jews, for example, have increased risk of many uncommon genetic disorders.

Finland has a unique population history — a bottleneck followed by geographic isolation — the result of which is a Finnish “disease heritage”: a high incidence of 40+ Mendelian disorders. Dozens of rare Mendelian disease genes were mapped in Finns, and that knowledge is valuable for understanding disease biology. What about rare variants underlying common, complex disease? Here the Finns have an important resource: nationalized health records with decades of follow-up data.

Sequencing Initiative Suomi (SISu)

The Sequencing Initiative Suomi (SISu) aims to combine the unique population structure, the health records, and the substantial Finnish interest in genetics. The first study from SISu, just out in PLoS Genetics, compares the exomes of 3,000 Finns to an equal number of non-Finnish Europeans (NFEs). They found:

  • A depletion of “singletons” (variants only seen in one individual) in Finns: 3.7 times fewer singletons than NFEs
  • An excess of low-frequency variants (MAF 0.5-5%) in Finns relative to NFEs
  • Similar patterns of common variants between Finns and NFEs

Finnish loss of function variants

Lim et al, PLoS Genetics 2014

All of these are consistent with the expected bottleneck effect on Finnish populations. When variants were stratified by annotation (i.e. their predicted effect on genes), Finns had a higher proportion of likely-deleterious missense variants and more severe loss-of-function (LoF, or protein-truncating) variants. The average Finn had 0.160 homozygous LoF variants, whereas the average NFE had 0.095.
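
The singleton comparison in the list above boils down to counting variants seen exactly once in each cohort. Here is a rough sketch under two assumptions of mine: each cohort is summarized as a mapping from variant ID to alternate allele count, and "seen in one individual" is approximated by an allele count of 1. The variant IDs and counts are invented.

```python
def count_singletons(allele_counts):
    """Number of variants whose alternate allele was observed exactly once."""
    return sum(1 for ac in allele_counts.values() if ac == 1)


# Invented allele counts for two equally sized cohorts.
finns = {"v1": 1, "v2": 12, "v3": 40, "v4": 3}
nfes = {"v1": 1, "v2": 9, "v5": 1, "v6": 1, "v7": 1}

finn_singletons = count_singletons(finns)
nfe_singletons = count_singletons(nfes)
fold_depletion = nfe_singletons / finn_singletons
print(finn_singletons, nfe_singletons, fold_depletion)
```

The comparison is only meaningful when the cohorts are the same size, which is why the study matched the number of Finns and NFEs.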

To determine if some of these enriched LoF variants have phenotypic effects, the authors genotyped 83 of them in 36,262 individuals from three large Finnish cohorts. Using the deep phenotype data — quantitative traits like blood pressure, lipids, etc. — they found 5 significant associations.

LPA association

LPA level (Lim et al, PLoS Genet 2014)

One of these was an association between splice site variants in the gene encoding lipoprotein A (LPA) and decreased levels of circulating lipoprotein A. As it happens, circulating LPA is a risk factor for coronary heart disease. Looking at the medical records showed that LPA splice variants are protective against cardiovascular disease.

This is only a proof-of-principle study, the tip of the SISu iceberg. Yet it shows the value of sequencing Finnish populations to identify rare variants contributing to complex diseases. Undoubtedly, as large-scale sequencing of Finnish cohorts continues at places like WashU and the Broad Institute, we’ll have even more power to identify genes relevant for common diseases.




Population Whole-genome Sequencing: Dutch Edition

Genomes of the netherlands

GoNL Consortium, Nat. Gen. 2014

The last time I checked, the database of human genetic variation (dbSNP) contained over 50 million unique sequence variants. And yet, as anyone who analyzes exome or whole-genome sequencing data can tell you, every individual harbors a significant number of variants (usually around 5% of single nucleotide variants, or SNVs) that dbSNP has never seen.

These “private” or rare variants undoubtedly contribute to important phenotypes, such as disease susceptibility. Non-SNV variants, like indels and structural variants, are also under-represented in public databases. The only way to fully elucidate the genetic basis of a trait is to consider all of these types of variants, and the only way to find them is by large-scale sequencing.

In this month’s issue of Nature Genetics, the Genome of the Netherlands (GoNL) Consortium reports the whole-genome sequencing of 250 Dutch families from 5 biobanks across the Netherlands. The families comprised mostly parent-child trios (n=231), along with some family quartets with monozygotic (n=11) or dizygotic (n=8) twins. All told, the genomes of 769 individuals were sequenced to ~13x depth.

Variant Calling

Granted, this is a modest coverage depth, considering that sequencing to 30x or 40x might be considered standard. To help address this, the authors performed joint sample calling with GATK. That, along with a combination of 10 indel/SV calling tools, yielded the following:

  • 20.4 million biallelic SNVs
  • 1.2 million biallelic indels of 1-20 bp
  • 27,500 larger deletions (>20 bp)

Here’s a quick tour of the highlights in each variant class.

Single Nucleotide Variants (SNVs)

Dutch genome SNPs dbSNP

GoNL Consortium, Nat. Gen. 2014

Half of the 20.4 million SNVs discovered in this study were rare, with MAF < 0.5%. The others were not quite evenly divided between low-frequency SNVs (4.0 million with MAF 0.5-5%) and common SNVs (6.2 million with MAF >5%). Altogether, there were around 7.6 million SNVs that were novel to dbSNP 137. Most of those (75%) were singletons, meaning that they were observed in just one individual. If we consider only the 500 unrelated individuals sequenced (the parents), that’s about 15,200 novel contributed variants per sequenced genome.
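
The numbers in this paragraph hang together, and a few lines of arithmetic show how. All inputs are taken from the text; the variable names are mine.

```python
total_snvs = 20_400_000
rare = total_snvs // 2            # "half" of all SNVs were rare
low_frequency = 4_000_000
common = 6_200_000
novel = 7_600_000
unrelated_individuals = 500       # the parents

# The low-frequency and common classes should account for the other half.
not_rare = total_snvs - rare
novel_per_genome = novel / unrelated_individuals
novel_singletons = 0.75 * novel   # 75% of novel SNVs were singletons

print(not_rare == low_frequency + common)
print(novel_per_genome)
print(novel_singletons)
```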

Among the ~2 million singletons uncovered in the European panel of a different project (1,000 Genomes), 16.5% were observed in GoNL. The authors therefore expect that a “substantial number” of singleton variants reported by these projects will be seen again as larger European cohorts are sequenced. Even so, that’s a lot of “private” variation. Remember, too, that these cohorts are from northwest Europe, arguably one of the best-characterized ancestry groups thus far.

Indels and Structural Variants

Compared to SNV calling, the detection of indels and larger SVs remains a considerable challenge. Anyone working in NGS informatics can tell you that. The authors have put forth a good effort in this arena by combining the results of 10 different variant callers: GATK UnifiedGenotyper, Pindel, 1-2-3SV, Breakdancer, DWAC, CNVnator, FACADE, MATE-CLEVER, GenomeSTRiP and SOAPdenovo. These are all different algorithms, but they boil down to five approaches for uncovering indels and SVs:

  1. Gapped read alignments to the reference (e.g. GATK)
  2. Split reads, an approach pioneered by Pindel
  3. Paired-end read distance/orientation (e.g. BreakDancer)
  4. Overall read depth changes (e.g. CNVnator)
  5. De novo assembly of SV breakpoints.

Dutch genome SV calls

GoNL Consortium, Nat. Gen 2014

Some of the tools use one approach, while others employ multiple approaches. No single indel/SV caller has emerged as vastly superior to all others, so combining the results from a suite of different tools seems like a good strategy. The authors divided variants into three size categories (1-20 bp, 20-100 bp, and >100 bp) and kept any SV detected by at least two orthogonal tools. Their validation rate (138/144, or 96.5%) for randomly-chosen SVs of at least 20 bp is impressive.
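
Here is a rough sketch of the "at least two tools" idea. It is not the GoNL consensus method: the post doesn't say how calls from different tools were matched, so the 50% reciprocal overlap rule, the greedy clustering, and the toy calls are all assumptions of mine, and "orthogonal" is simplified to "distinct tool names".

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval; 0 if the calls don't overlap."""
    if a["chrom"] != b["chrom"]:
        return 0.0
    overlap = min(a["end"], b["end"]) - max(a["start"], b["start"])
    if overlap <= 0:
        return 0.0
    longest = max(a["end"] - a["start"], b["end"] - b["start"])
    return overlap / longest


def consensus_calls(calls, min_tools=2, min_overlap=0.5):
    """Greedily cluster calls, then keep clusters supported by enough distinct tools."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c["chrom"], c["start"], c["end"])):
        for cluster in clusters:
            # Compare against the first call in the cluster (its representative).
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x["tool"] for x in c}) >= min_tools]


# Toy deletion calls; coordinates are invented.
toy_calls = [
    {"tool": "Pindel", "chrom": "chr1", "start": 1000, "end": 1300},
    {"tool": "BreakDancer", "chrom": "chr1", "start": 1010, "end": 1320},
    {"tool": "CNVnator", "chrom": "chr2", "start": 5000, "end": 11000},
    {"tool": "Pindel", "chrom": "chr2", "start": 20000, "end": 20050},
    {"tool": "Pindel", "chrom": "chr2", "start": 20005, "end": 20050},
]

kept = consensus_calls(toy_calls)
print(len(kept), "consensus call(s)")
```

Counting distinct tools rather than raw calls matters: two overlapping calls from the same tool should not validate each other.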

The size distribution of consensus calls showed peaks at +/- 4 bp (microsatellite instability), ~300 bp (SINEs), and ~6 kbp (LINEs). Not remarked upon in the manuscript is the largest peak right near zero, since 1-2 bp indels are by far the most common. While 54.4% of short indels (<20 bp) were already in dbSNP, virtually all of the mid-size deletions (30-500 bp) were not (98.4%). Thus, this study helps fill an important gap in the catalogue of human sequence variation.

Functional Variation

LOF variants in dutch genomes

Rare, low freq, and common variant distributions (GoNL, Nat. Gen 2014)

Because these families were not obtained “on the basis of phenotype or disease,” their patterns of genetic variation provide a useful model for apparently healthy individuals.

Rare Loss-of-Function Variants

Among rare variants identified in this study, the authors observed an excess of nonsense SNVs and frameshift indels, consistent with the expectation that damaging variants would be under strong purifying selection.

A similar excess-of-rare-events was evident for larger deletions that removed the first exon or >50% of the coding sequence of a gene. The effect was even stronger when considering only genes in the OMIM database, reflecting strong purifying selection against structural changes in key genes.

On average, each individual in GoNL had about 60 nonsense or splice-site SNVs. Most of these, however, were common in the cohort (MAF>5%, and thus unlikely to be deleterious), which illustrates the need for cautious interpretation of apparent loss-of-function (LOF) variants. Looking at rare variants, and using synonymous SNVs as a baseline, the authors estimate that each individual might have 4-5 rare loss-of-function SNVs.

Compound Heterozygous Events

Individuals that were compound-heterozygous (i.e. one variant on each parental haplotype) for rare loss-of-function SNVs/indels/SVs were extremely rare. The authors found just 3 such instances across the cohort (an average of 0.01 events per individual). Such events are thus of considerable interest for disease studies.
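
To pin down what counts as a compound-heterozygous event, here is a small sketch. It assumes phasing is already known (each loss-of-function allele is labeled with its parental haplotype), and the gene and variant names are placeholders.

```python
from collections import defaultdict


def compound_het_genes(lof_alleles):
    """Genes hit by two different LoF variants, one on each parental haplotype.

    lof_alleles: iterable of (gene, variant_id, haplotype) for one individual,
    where haplotype is "paternal" or "maternal".
    """
    by_gene = defaultdict(lambda: {"paternal": set(), "maternal": set()})
    for gene, variant_id, haplotype in lof_alleles:
        by_gene[gene][haplotype].add(variant_id)

    hits = []
    for gene, haps in by_gene.items():
        # Require a paternal and a maternal variant that are not the same variant.
        if any(p != m for p in haps["paternal"] for m in haps["maternal"]):
            hits.append(gene)
    return sorted(hits)


# Placeholder data for one individual.
alleles = [
    ("GENE_A", "varA1", "paternal"),
    ("GENE_A", "varA2", "maternal"),   # compound heterozygous
    ("GENE_B", "varB1", "paternal"),
    ("GENE_B", "varB2", "paternal"),   # both on the same haplotype
    ("GENE_C", "varC1", "paternal"),
    ("GENE_C", "varC1", "maternal"),   # homozygous for one variant
]

print(compound_het_genes(alleles))
```

Only GENE_A qualifies here: GENE_B still has an intact maternal copy, and GENE_C is homozygous rather than compound heterozygous.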

Compound heterozygosity for common LOF variants should have been far more prevalent, because these are less likely to be truly deleterious. Indeed it was; there were about 3 compound heterozygous events of common LOF variants per individual. Interestingly, the 1,917 such events observed across the entire cohort were confined to 11 genes (C11orf40, DEFB126, GSTT2, HTR3D, KRTAP4-8, MS4A14, OR13C2, SIGLEC12, TRY6, VWDE, and WNK1) which all seem to have high mutation tolerance.

HGMD False Positives

The Human Gene Mutation Database (HGMD) is a commercial repository of “disease causing” variation in humans. When the authors of this study annotated variants with HGMD information, each individual harbored about 20 such variants. In other words, HGMD annotation would suggest that a large number of GoNL individuals have diseases with profound physical (or even lethal) consequences. Whoops.

It’s possible that the HGMD variants simply cause disease in non-Dutch populations, or have low penetrance. An alternative possibility is that HGMD has a lot of false positives. Among the 1,093 HGMD variants in GoNL, almost a third had MAF>1%, which is much higher than the frequency of the diseases they’re reported to cause.

De Novo Mutations

de novo mutation calling

De novo mutation calling performance, GoNL, Nat. Gen 2014

One of the most fascinating aspects of this study was the exploration of de novo mutations (variants present in a child but absent from both parents). These events are extremely rare (occurring at a rate of around 1 in 100 million bases), and identifying them absolutely requires sequencing at least three genomes: an individual and both biological parents.

Even then, they’re very difficult to find: across the 258 independent offspring in GoNL there were 4.5 million apparent Mendelian violations. The authors applied a method (PhaseByTransmission) to refine this to 29,162 candidate autosomal de novo mutations. That’s about 113 per offspring, far too many.
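
For readers unfamiliar with the term, a Mendelian violation is a child genotype that could not have been inherited from the two parental genotypes. The sketch below shows the naive check at a single biallelic autosomal site, with genotypes coded as alternate allele counts (0, 1, or 2). It is not PhaseByTransmission, which models genotype likelihoods rather than hard calls.

```python
def transmissible_alleles(genotype):
    """Alleles (0 = ref, 1 = alt) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_violation(child, father, mother):
    """True if the child's genotype cannot arise from the parents' genotypes."""
    consistent = {
        f + m
        for f in transmissible_alleles(father)
        for m in transmissible_alleles(mother)
    }
    return child not in consistent


def is_candidate_de_novo(child, father, mother):
    """Simplest de novo pattern: heterozygous child, both parents homozygous reference."""
    return child == 1 and father == 0 and mother == 0


print(is_mendelian_violation(1, 0, 0), is_candidate_de_novo(1, 0, 0))
print(is_mendelian_violation(1, 0, 1), is_candidate_de_novo(1, 0, 1))
print(is_mendelian_violation(0, 2, 0), is_candidate_de_novo(0, 2, 0))
```

With low-coverage data, a missed heterozygous call in a parent looks exactly like a de novo mutation to this check, which is one reason raw violation counts run so high.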

So the authors attempted to independently validate over 1,000 candidate de novo mutations by orthogonal sequencing, and found that around 50% were false positives. Some independent Complete Genomics data for 19 parents and 1 child revealed another 1,137 events that were false positives. From these 2,270 observations, the authors developed a random forest classifier to predict whether a predicted mutation would be truly de novo or not based on a number of different properties. This is something that many other groups (including ours) have done for somatic mutation calling in cancer genomes.

The classifier in this study, which had an estimated accuracy of 92%, relied primarily on factors related to the sequencing depth and read counts, which happens to be the basis for mutation detection with VarScan 2. When applied to the GoNL dataset, the classifier nominated 11,020 high-confidence de novo mutations (roughly 42.7 per offspring, with a range of 18 to 74). That’s a bit higher than it should be, but still reasonable for downstream analysis.

Paternal Age and de novo Mutation Rate

paternal age and mutation rate

GoNL Consortium, Nat. Gen 2014

The authors observed a significant correlation between the father’s age at conception and the number of de novo mutations in the child. This is the third study to report such a trend, and the largest sample size yet. Although the ages of mother and father are highly correlated, the effect of parental age on the de novo mutation rate was primarily due to paternal influence.

The authors estimate that each additional year of paternal age caused a 2.5% increase in the number of de novo mutations in the child. Under their model, about 75% of de novo mutations come from the father, and 25% from the mother. Phase analysis using read pairs (a complex process I won’t go into) revealed that 76% of de novo mutations were indeed on the paternal haplotype. So you can thank your dad for 3/4 of your de novo mutations.

De Novo Indels and SVs

The authors attempted to find de novo indels and structural variants. It didn’t go well.

In Summary

The authors have employed moderate-coverage whole genome sequencing to build a resource of 1,000 haplotypes for a small, densely-populated country in northwestern Europe. They added 7.6 million SNVs to dbSNP, and also characterized a large number of new indels and SVs. Many more studies which apply genome sequencing to large population cohorts will be necessary to fill out the catalogue of human genetic variation.