Next-gen Sequencing for DNA Forensics

NGS forensics

Image Credit: brilliantbias dot com

The criminal justice system in the United States is rarely called an early adopter of new technology. Despite the major impact of forensic DNA testing over the last quarter-century, the tools deployed by most forensics laboratories are rudimentary by modern standards. If you compare what the FBI does with an unknown DNA sample to what 23andMe can do (faster and with lower costs), the differences are quite striking. The FBI can determine whether the DNA matches a reference sample (or an entry in its CODIS database) and that’s about it. In contrast, 23andMe can tell you how much Neanderthal is in your DNA and help you find your third cousins.

There’s encouraging news for the field of applied DNA forensics, as earlier this month the National Institute of Justice awarded an $825,000 grant to the Battelle Memorial Institute to “conduct feasibility and validation tests on a suite of new investigative tools that use next-generation sequencing.” It came to my attention thanks to Julia Karow’s feature article over at In Sequence.

Now seems like an excellent time to discuss the potential for NGS applications as well as some of the significant challenges.

How DNA is Currently Used in Forensics

Codis DNA forensics loci

Codis Core Loci (Wikipedia)

Importantly, nearly all routine DNA testing for forensic purposes involves capillary electrophoresis of short tandem repeats, or STRs. Unlike SNPs, which tend to have two alleles, STRs have numerous alleles, defined by the number of repeats at each locus. Because they’re so polymorphic, STRs are well-suited for finding a “match” between two samples.

Theoretically speaking, about 10-12 carefully chosen STR loci are sufficient to identify an individual. In the U.S., the FBI uses a panel of 13 (plus the AMELX/AMELY loci for sex determination) for its Combined DNA Index System (CODIS) database.
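To make the matching idea concrete, here is a minimal Python sketch that treats an STR profile as a mapping from locus to an unordered pair of repeat counts. The locus names are real CODIS core loci, but the genotypes and the helper names (`normalize`, `profiles_match`) are made up for illustration. Real casework relies on validated software and match probabilities, not a simple equality test.

```python
from typing import Dict, Tuple

Profile = Dict[str, Tuple[float, float]]  # locus -> two repeat counts (9.3 is a valid allele)

def normalize(profile: Profile) -> Dict[str, tuple]:
    """Sort each genotype so (16, 15) and (15, 16) compare equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}

def profiles_match(a: Profile, b: Profile) -> bool:
    """True if the profiles share at least one locus and agree at every shared locus."""
    a_norm, b_norm = normalize(a), normalize(b)
    shared = set(a_norm) & set(b_norm)
    return bool(shared) and all(a_norm[locus] == b_norm[locus] for locus in shared)

# Hypothetical genotypes at three CODIS loci
scene = {"D3S1358": (15, 16), "TH01": (6, 9.3), "FGA": (21, 24)}
suspect = {"D3S1358": (16, 15), "TH01": (9.3, 6), "FGA": (24, 21)}
other = {"D3S1358": (15, 16), "TH01": (7, 9.3), "FGA": (21, 24)}

print(profiles_match(scene, suspect))  # same alleles at every locus
print(profiles_match(scene, other))    # differs at TH01
```

Because each locus has many possible repeat counts, agreement across a dozen or so loci is very unlikely to happen by chance, which is the intuition behind the 10-12 locus figure.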

The current applications for DNA in forensics laboratories are focused on matters of the justice system:

  • Matching a crime scene DNA sample to a reference (e.g. a cheek swab) to implicate a suspect, either directly or indirectly (e.g. using a close relative).
  • Searching for a match among the profiles of criminals in federal databases like the FBI’s CODIS.
  • Comparing samples from two crime scenes to learn if the same suspect was responsible.
  • Identifying human remains or found individuals in missing persons cases.

Notably, although the STR profiles in CODIS are well-suited for matching DNA samples, they’re poorly suited for distinguishing an individual’s ancestry.

How DNA Could be Used in Forensics

DNA sequencing/genotyping technologies and applied human genetics have advanced considerably since the establishment of CODIS in the 1990s. One obvious application, hinted at in the proposed plans of the Battelle/NIJ grant, is to empower the identification of “unknown” DNA samples (from crime scenes or missing persons cases) by building a profile of likely physical characteristics:

  • Ancestry or continent of origin, based on ancestry-informative markers (AIMs). Back in 2006, when I worked in the lab of Raymond E. Miller, we developed a panel of about 25 SNPs that reliably distinguished between individuals of African, Asian, or European origins. With more SNPs one could trace ancestry with much higher resolution.
  • Physical appearance, especially eye or hair color. These are not simple Mendelian traits, but with sufficient markers one could determine the probability of brown versus blonde hair, or blue versus brown eyes. Of course, it’s now possible to change your apparent hair or eye color using dyes and colored contacts, respectively, but let’s not go there.
  • Deep familial matching is also possible, particularly with a sequencing-based assay that will detect rare variants unique to certain pedigrees.
  • Faster, less expensive results may ultimately be what drives the adoption of new technologies. The turnaround time for DNA testing in most situations is slow, and the backlog is substantial.
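The ancestry idea in the list above can be sketched as a simple likelihood comparison. The snippet below is only an illustration: the allele frequencies are invented, the three SNPs are hypothetical, and it assumes Hardy-Weinberg proportions, independent markers, and equal priors. It is not the panel or method from the Miller lab work.

```python
import math
from typing import Dict, List

# Hypothetical reference-allele frequencies for three made-up SNPs.
# Real AIM panels use empirically measured frequencies.
FREQS: Dict[str, List[float]] = {
    "African": [0.90, 0.10, 0.20],
    "Asian": [0.15, 0.85, 0.30],
    "European": [0.10, 0.20, 0.95],
}

def genotype_prob(ref_count: int, p: float) -> float:
    """Hardy-Weinberg probability of carrying 0, 1, or 2 reference alleles."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}[ref_count]

def ancestry_posteriors(genotypes: List[int]) -> Dict[str, float]:
    """Posterior over groups, assuming equal priors and independent SNPs."""
    log_lik = {
        group: sum(math.log(genotype_prob(g, p)) for g, p in zip(genotypes, freqs))
        for group, freqs in FREQS.items()
    }
    top = max(log_lik.values())  # subtract the max for numerical stability
    weights = {group: math.exp(ll - top) for group, ll in log_lik.items()}
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}

posteriors = ancestry_posteriors([0, 0, 2])
best = max(posteriors, key=posteriors.get)
print(best, round(posteriors[best], 3))
```

With only a handful of well-chosen markers the posterior is already lopsided, which is why a panel of a couple dozen SNPs can separate continental groups, and why more SNPs are needed for finer resolution.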

Challenges of DNA Forensics

Given the obvious utility of new sequencing and genotyping technologies in forensic applications, you might be wondering: Why has it taken so long? Having done some work in this field, and discussed the matter with multiple law enforcement agencies, I can tell you some of the reasons.

Judicial Merit, Acceptance, and Precedent

First, adoption of new technologies is slow in the criminal justice system. To be useful in a criminal investigation, they must establish judicial precedent, which takes a lot of vetting and a lot of time. No one wants to waste money on a technology that will be easily disputed by defense attorneys, right? The governmental agencies that operate most forensics laboratories exist as part of the justice system. They need to prove the merit of something before devoting substantial resources to it.

Established Infrastructure

It is difficult to overstate how important the CODIS database is to forensic laboratories. Any technology that displaces capillary electrophoresis of STRs absolutely must provide CODIS profiles. Remember, CODIS has been around a long time (officially since the federal “DNA Identification Act of 1994”). As of last year, it contained 10.3 million offender profiles, 1.5 million arrestee profiles, and 493,500 forensic profiles. And it had assisted more than 200,000 investigations.

In short, CODIS is not going away. Next-gen sequencing is theoretically capable of generating STR profiles, but short read lengths make this more difficult.
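Here is a rough Python sketch of why read length matters. To call an STR allele from a single read, the read has to span the entire repeat plus some unique flanking sequence on both sides. The flanks, the motif, and the function name `call_str_allele` are all made up for illustration; real STR callers work from alignments and handle sequencing errors.

```python
import re
from typing import Optional

def call_str_allele(read: str, left_flank: str, right_flank: str, motif: str) -> Optional[int]:
    """Return the repeat count if the read spans the whole repeat, else None."""
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    hit = re.search(pattern, read)
    if hit is None:
        return None  # the read ends inside the repeat or misses a flank
    return len(hit.group(1)) // len(motif)

# Hypothetical locus: 8 copies of a tetranucleotide repeat between two unique flanks
left, right, motif = "GGTCACAG", "AGGGAAAT", "AATG"
locus = left + motif * 8 + right

full_read = "TT" + locus + "CC"   # spans both flanks
short_read = locus[:30]           # truncated inside the repeat

print(call_str_allele(full_read, left, right, motif))
print(call_str_allele(short_read, left, right, motif))
```

The truncated read carries some repeat units but cannot say how many there are in total, so it yields no call. Longer alleles need longer reads, which is the practical obstacle for short-read platforms.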

Impure and Degraded Samples

In a research setting, we panic when we have sample contamination or low amounts of genomic DNA. We know it affects things like mapping rate and duplication rate of the resulting reads. Just a heads-up here: the DNA samples often encountered in forensic situations are a lab manager’s nightmare. They’re almost always mixtures of DNA from multiple people. They may be heavily degraded. They might not even be human in origin. Rarely will pure samples with good amounts of DNA come into a forensics laboratory.

This makes many of the desirable applications of forensic DNA sequencing a bit more difficult. Degraded DNA is typically fragmented and thus more difficult to sequence. Even when you can, the coverage might be incomplete. As I mentioned, the FBI uses 13 identification loci in its CODIS database. To add a profile, all 13 must be attempted, and at least 10 must be tested successfully.
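The completeness rule just described (all 13 loci attempted, at least 10 successful) is easy to express in code. This is a sketch of the rule as stated in this post, not of any official CODIS software; the function name and the use of `None` to mark a failed locus are my own conventions, and the genotypes are placeholders.

```python
from typing import Dict, Optional, Tuple

CODIS_CORE = (
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA", "D3S1358", "D5S818",
    "D7S820", "D8S1179", "D13S317", "D16S539", "D18S51", "D21S11",
)

def eligible_for_upload(
    results: Dict[str, Optional[Tuple[float, float]]],
    min_success: int = 10,
) -> bool:
    """Every core locus must be attempted (present as a key); None marks a failure."""
    attempted_all = all(locus in results for locus in CODIS_CORE)
    successes = sum(1 for locus in CODIS_CORE if results.get(locus) is not None)
    return attempted_all and successes >= min_success

# Hypothetical degraded sample: every locus attempted, three failed to type
degraded: Dict[str, Optional[Tuple[float, float]]] = {
    locus: (10, 12) for locus in CODIS_CORE
}
for failed in ("FGA", "D18S51", "D21S11"):
    degraded[failed] = None

print(eligible_for_upload(degraded))
```

With ten loci typed, this sample just clears the bar; one more dropout and it would not.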

Ethics and Privacy Concerns

You know how much I like to talk about the elephant in the room: ethical, legal, and social considerations for expanded DNA testing capabilities by government agencies. Groups such as the ACLU are already concerned with how DNA testing can provide “partial” (familial) matches. Imagine getting a knock at your door, and opening it to find two police detectives.

“Are you Mr. Smith?” one of them asks.

“Uh, yes,” you answer, already nervous from the sight of their badges and guns.

“We’re investigating a homicide,” says the other detective. “DNA evidence from the scene was a partial match to you. We’d like to ask you about all of your second and third cousins.”

This is a slight exaggeration, but with the power of next-gen sequencing and human genetics, it’s still plausible. We can’t live our lives distrusting every single person and organization of authority, but we do need to have open conversations about the implications of advanced DNA testing.

I tip my hat to the Battelle Institute for tackling both the hard science and the sticky issues. They have a bumpy road ahead.




Return of Results: Genetics Experts Weigh In

Genetics experts return of results

Image credit: 123 RF

In my last post, I wrote about the return of results from next-gen sequencing, specifically a recent paper in AJHG about secondary findings in ~6500 ESP exomes. Today we’ll delve into another paper in the same issue on the attitudes of genetics professionals on return of incidental findings from whole genome sequencing (WGS) and exome sequencing (ES).

Joon-Ho Yu and colleagues conducted a survey of around 850 genetics professionals to gauge their attitudes toward:

  1. The return of clinical ES/WGS results
  2. The process of returning results
  3. The ACMG recommendations for secondary findings

Responding Genetics Experts

To identify potential respondents, the authors first collected e-mail addresses, professional degrees, and states of residence from three societies: the American Society of Human Genetics (ASHG), the American College of Medical Genetics (ACMG), and the National Society of Genetic Counselors (NSGC). They sent out 9,857 invitation e-mails and had 847 respondents, for a completion rate of about 8.6%.

The majority of those respondents were:

  • Female (58%)
  • White (98%)
  • Non-Hispanic (96%)
  • Residents of the U.S. (81%)
  • In academia (73%)

Various professions were well-represented among respondents, including clinical geneticists (24%), genetic counselors (22%), and human geneticists (19%).

Return of Incidental Findings

Responses to the heady questions from the survey are depicted in Figure 1, which I’ve adapted here:

Return of results survey

Adapted from Figure 1 of Yu et al, AJHG 2014

Overall, genetics experts were very supportive of the idea that some secondary findings should be returned from clinical ES/WGS. The majority of respondents agreed that incidental results should be offered to:

  • Adult patients (85% agreed)
  • Healthy adults (75% agreed)
  • Parents of a child with a medical condition (74% agreed)

Where the Experts Agree

Nearly all experts (88%) supported offering results about childhood-onset conditions to the parents of child patients, and most (62%) would also offer information about the child’s results for adult-onset conditions. The last result in the figure above is an important one: the vast majority of experts (81%) agree that the preferences of a patient or family should guide which results are offered for return. And most (66%) agreed that a web-based tool would suffice to assess those preferences.

Where the Experts Don’t Agree

The experts were divided (roughly 40% agreed and a similar share disagreed) on whether only actionable secondary results should be returned, and fewer than half (44%) thought that giving patients and families the option to choose which results to receive would improve care. Respondents also differed in their opinions on what kind of results to return.

Obligation to return results

Yu et al, Supp. Fig 1 (AJHG 2014)


When asked about the type(s) of conditions that merited return of positive incidental findings, the experts chose:

  • Mendelian disorders (67%)
  • Adverse drug reactions (61%)
  • Carrier status (49%)
  • Complex traits (20%)

And about 25% responded that healthcare providers had no obligation to return secondary results.

How to Return Results

Winning the survey’s least-surprising category was the part where respondents were asked to rank, by order of preference, the manner in which incidental findings should be communicated.

Method                                             | 1st Choice | 2nd Choice | 3rd Choice | 4th Choice
A face-to-face meeting with a genetic counselor    | 78.5%      | 9.5%       | 7.2%       | 4.8%
A phone call with a genetic counselor              | 5.6%       | 63.0%      | 26.5%      | 4.9%
An interactive website with access to counseling   | 13.6%      | 23.4%      | 53.4%      | 9.5%
A report sent in the mail                          | 2.3%       | 4.0%       | 13.0%      | 80.8%

Unsurprisingly, most respondents (78.5%) put a face-to-face meeting with a genetic counselor as their first choice. The most popular second choice (63.0%) was a phone call with a genetic counselor. An interactive website with access to genetic counseling by phone or online was the most common third choice (53.4%), although 37% ranked it first or second. Almost everyone hated the idea of sending reports by mail: 80.8% ranked it last.
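If you want to check the 37% figure or summarize the rankings another way, the arithmetic is short. The sketch below copies the percentages from the table above; the two summary functions (`top_two_share` and `mean_rank`) are my own choices, not statistics reported in the paper.

```python
# Percent of respondents assigning each rank (1st to 4th), copied from the table
rankings = {
    "Face-to-face meeting": [78.5, 9.5, 7.2, 4.8],
    "Phone call": [5.6, 63.0, 26.5, 4.9],
    "Interactive website": [13.6, 23.4, 53.4, 9.5],
    "Mailed report": [2.3, 4.0, 13.0, 80.8],
}

def top_two_share(percents):
    """Percent of respondents who ranked the method first or second."""
    return round(percents[0] + percents[1], 1)

def mean_rank(percents):
    """Average rank (1 = most preferred), normalizing rows that sum to ~100."""
    total = sum(percents)
    return sum(rank * pct for rank, pct in enumerate(percents, start=1)) / total

for method, pcts in rankings.items():
    print(f"{method:22s} top-two {top_two_share(pcts):5.1f}%  mean rank {mean_rank(pcts):.2f}")

by_preference = sorted(rankings, key=lambda m: mean_rank(rankings[m]))
print(by_preference)
```

Either summary gives the same ordering: in person, then phone, then website, then mail.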

The ACMG Gene List

Recently, the ACMG published their recommended list of 57 genes/conditions for which incidental findings should be returned. Their recommendations got a lot of press, and received a vigorous (and mixed) response from the medical and research community. In this survey, 68% of genetics professionals agreed that results from the ACMG list should be reported, regardless of the indication for sequencing.

However, only 29% felt it was the responsibility of the health care professional to decide which results on the minimal list should be returned. And the majority of respondents (70%) disagreed with the notion of returning secondary findings from the ACMG list regardless of the patient/family preferences for getting that information.

Challenges of Returning Results

Next, the genetics professionals were asked for their perspective on the greatest challenge of returning results from clinical ES/WGS.

Challenges of returning results

Yu et al, Supp. Fig 1 (AJHG 2014)

The greatest concern is one that I’ve heard before, particularly in the roundtable discussion on genetic testing last year: health care providers simply don’t have the time and may not have the expertise to return incidental findings. There are also concerns about the effect of returning secondary findings on the patients and families:

Concern results may cause

Yu et al, Supp. Fig 1 (AJHG 2014)

The foremost concern by a long margin was the anxiety and stress that this knowledge might cause the patient. That’s why asking and honoring the patient/family preferences beforehand (i.e. before the sequencing even happens) is so important. There are privacy concerns as well; about a third of respondents worried that recipients of secondary findings might experience discrimination.

Clearly, the decisions about whether to return results, which results to return, and how to do so will be difficult to address. It’s also unclear, at least to me, which group or organization or (dare I say) government body should call the shots. We’ve heard from the genetics professionals, but it’s also important to hear from two other groups: primary care physicians and the general public (i.e. patients and families). These people arguably have the most at stake, so their opinions should carry significant weight.

We should move quickly to collect the information necessary for well-guided decision making, because one thing is clear: Next-gen sequencing will soon be a routine part of clinical care, whether we like it or not.

Yu JH, Harrell TM, Jamal SM, Tabor HK, & Bamshad MJ (2014). Attitudes of genetics professionals toward the return of incidental results from exome and whole-genome sequencing. American Journal of Human Genetics, 95 (1), 77-84 PMID: 24975944

Return of Results from Next-gen Sequencing

Return of results next gen sequencing

Image credit: CDC Blogs

The rapid adoption of next-gen exome and genome sequencing for clinical use (i.e. with patient DNA) raises some difficult questions about the return of results to patients and their families. In contrast to traditional genetic testing, which usually checks for variants in specific genes, high-throughput sequencing has the potential to reveal a number of secondary findings, i.e., genetic variants with medical relevance but not related to the condition that merited the test.

Two articles in the current issue of AJHG delve into the sticky issue of incidental genetic findings. Holly Tabor et al analyzed de-identified exome data from 6,517 individuals obtained from NHLBI’s Exome Sequencing Project (ESP). They examined the burden of pathogenic variants in three sets of biomedically important genes:

  1. Genes underlying 31 Mendelian conditions, most of which are inborn errors of metabolism, recommended for newborn screening (NBS, n=39)
  2. Genes associated with the risk of age-related macular degeneration, a complex disease and the most prevalent form of vision loss (ARMD, n=17)
  3. Genes known to influence drug response, i.e. replicated pharmacogenetics hits from PharmGKB (PGx, n=14).

Variant GERP scores

Tabor et al, AJHG 2014

Looking only at SNVs called by GATK, the authors identified 10,879 variants affecting the 70 disease genes across the full ESP cohort. Unsurprisingly, filtering this set to include only variants that had a high call rate and were listed in OMIM and HGMD for the correct phenotype reduced the set by over 90%, to around 400 total variants.
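As a rough illustration of that kind of filter, here is a Python sketch. The `Variant` fields, the 0.95 call-rate cutoff, and the example records are all assumptions of mine; the paper's actual thresholds and database logic are not described in this post.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Variant:
    name: str
    call_rate: float            # fraction of samples with a confident genotype
    listed_for_phenotype: bool  # curated databases list it for the expected phenotype

def keep(variant: Variant, min_call_rate: float = 0.95) -> bool:
    """Both conditions must hold; 0.95 is an assumed cutoff, not the paper's."""
    return variant.call_rate >= min_call_rate and variant.listed_for_phenotype

# Made-up candidate variants
candidates: List[Variant] = [
    Variant("var1", 0.99, True),
    Variant("var2", 0.99, False),  # never reported for this phenotype
    Variant("var3", 0.80, True),   # poorly genotyped
    Variant("var4", 0.97, True),
]
included = [v.name for v in candidates if keep(v)]
print(included)

# Reduction implied by the counts quoted in the text (about 400 of 10,879 retained)
reduction = 1 - 400 / 10879
print(f"{reduction:.1%} of candidate variants removed")
```

Using the rounded counts from the text, the filter removed roughly 96% of the candidates, comfortably "over 90%".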

Included versus Excluded Variants

Next, the authors evaluated some of the characteristics of variants that made it through to their final set, versus variants that they’d excluded. Because pathogenic mutations should be under strong purifying selection, one would expect them to be extremely rare and to occur at positions with high conservation across evolution.

The lovely violin plots at right show the mean GERP scores (a measure of conservation) for included versus excluded variants in the newborn screening (NBS), age-related macular degeneration (ARMD), and pharmacogenetics (PGx) genes examined. As the authors hoped, GERP scores were significantly higher for included versus excluded variants, particularly for the severe recessive disease genes screened for in newborns. Included ARMD and PGx variants also had higher GERP scores, but with a wider spread.

A comparison of the Polyphen-2 scores, which offer computational estimates of how damaging amino acid substitutions will be, also showed significant differences, with included variants in the NBS predicted to be far more deleterious than the excluded variants. The effect was again consistent but less striking in the ARMD/PGx sets.

Together, these patterns are consistent with the idea that mutations underlying severe, highly penetrant phenotypes (i.e. the NBS set) are more deleterious — and thus under stronger natural selection — than variants associated with complex phenotypes like ARMD and PGx.

Rate of Incidental Findings

Carrier burden recessive alleles

Tabor et al, AJHG 2014

Having established that their final set of ~400 variants was properly vetted, the authors set out to establish the burden of pathogenic mutations that might be found in any individual’s exome. The majority of included variants were rare, with MAF<0.5%.

The carrier burden in the NBS set was surprisingly high (0.57 per exome), with 45% of individuals carrying at least one allele and 11% carrying at least two alleles. If the ARMD and PGx variants were also considered, each individual carried 15.3 risk alleles on average.
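A quick sanity check on those NBS numbers: if pathogenic alleles were scattered across individuals independently, the per-person count would be roughly Poisson-distributed with a mean of 0.57. That model is my own simplifying assumption, not an analysis from the paper, but it gives a feel for whether the reported percentages hang together.

```python
import math

def poisson_at_least(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson-distributed count with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

mean_alleles = 0.57  # NBS carrier burden per exome, as quoted in the text

p_one_or_more = poisson_at_least(1, mean_alleles)
p_two_or_more = poisson_at_least(2, mean_alleles)
print(f"P(>=1 allele)  = {p_one_or_more:.3f}")
print(f"P(>=2 alleles) = {p_two_or_more:.3f}")
```

The model predicts roughly 43% of people with at least one allele and 11% with at least two, close to the reported 45% and 11%.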

These findings challenge the assumption that secondary findings (actionable results) and incidental findings (potential clinical utility) uncovered by exome or genome sequencing are rare. Indeed, a research highlight on the paper from Nature Genetics noted that the study demonstrates the “striking prevalence of actionable incidental or secondary results, including ones of direct clinical usefulness, which might be obtained in patient sequencing.”

In my next post, I’ll tackle the medical community’s opinions about sharing secondary findings, based on a recent survey of about 850 genetics professionals.

Tabor HK, Auer PL, Jamal SM, Chong JX, Yu JH, Gordon AS, Graubert TA, O’Donnell CJ, Rich SS, Nickerson DA, NHLBI Exome Sequencing Project, & Bamshad MJ (2014). Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. American Journal of Human Genetics, 95 (2), 183-93 PMID: 25087612

Sequencing Finnish Population Isolates (SISu)

sequencing in finnish suomi

If you compare any individual’s genome to the human reference sequence, you’ll find around 3 million differences. Most of these (95%) are already known, and have been catalogued in databases like dbSNP. Many are common, and shared by 5% or more of human populations. They may still have biomedical relevance, of course; genome-wide association studies (GWAS) of common genetic variation have found thousands of genetic loci associated with disease susceptibility and other complex traits.

But there are still huge numbers of rare (MAF<0.5%) and low-frequency (MAF 0.5-5%) genetic variants. Their contribution to human health is harder to understand, particularly because such variants:

  • Are usually not included on high-density SNP arrays
  • Occur in few individuals, and thus require large cohorts
  • Have low individual power for genetic association
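For reference, the frequency classes used in this post can be written as a small Python helper. The cutoffs (0.5% and 5%) are the ones quoted here; other studies draw the lines elsewhere, and the function names are my own.

```python
def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """MAF from an alternate-allele count; the rarer of the two alleles is the minor one."""
    if total_alleles <= 0 or not 0 <= alt_count <= total_alleles:
        raise ValueError("need total_alleles > 0 and 0 <= alt_count <= total_alleles")
    alt_freq = alt_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq)

def frequency_class(maf: float) -> str:
    """Bin a minor allele frequency using the cutoffs quoted in this post."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"

# 6,000 diploid exomes carry 12,000 alleles at an autosomal site
for alt_count in (1, 240, 3000):
    maf = minor_allele_frequency(alt_count, 12000)
    print(alt_count, f"{maf:.4%}", frequency_class(maf))
```

A variant seen once in 6,000 people has a MAF under 0.01%, which shows why rare-variant studies need such large cohorts.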

One way to address the challenges of rare variants is to study them in founder populations in which such variants are more common. Ashkenazi Jews and Amish families, for example, have undergone population bottleneck effects: a limited number of founders gave rise to the current populations.

This breeding isolation, whether cultural or geographic in nature, increases the frequency of some variants that are otherwise quite rare in broad populations. And if those variants underlie a genetic disorder, the risk of the disease is increased. Ashkenazi Jews, for example, have increased risk of many uncommon genetic disorders.
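A back-of-the-envelope calculation shows how a bottleneck does this. Treat the founders as a random sample of chromosomes from a larger source population (simple binomial sampling, an idealization). A rare allele is usually lost entirely, but if even one copy gets through, its frequency among the founders is far higher than it was before. The founder count below is hypothetical and not an estimate for any real population.

```python
def survival_probability(freq: float, founders: int) -> float:
    """Chance that at least one copy of an allele is among 2 * founders chromosomes."""
    return 1.0 - (1.0 - freq) ** (2 * founders)

def minimum_founder_frequency(founders: int) -> float:
    """Lowest non-zero frequency possible right after the bottleneck (a single copy)."""
    return 1.0 / (2 * founders)

founders = 100       # hypothetical number of diploid founders
source_freq = 0.001  # allele at 0.1% in the source population

p_survive = survival_probability(source_freq, founders)
floor_freq = minimum_founder_frequency(founders)
print(f"P(allele survives the bottleneck) = {p_survive:.3f}")
print(f"Frequency if it survives: at least {floor_freq:.3%}")
print(f"Minimum enrichment: {floor_freq / source_freq:.0f}x")
```

Under these assumptions most very rare alleles vanish, while the survivors start out at least five times more common than before.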

Finland has a unique population history — a bottleneck followed by geographic isolation — the result of which is a Finnish “disease heritage”: a high incidence of 40+ Mendelian disorders. Dozens of rare Mendelian disease genes were mapped in Finns, and that knowledge is valuable for understanding disease biology. What about rare variants underlying common, complex disease? Here the Finns have an important resource: nationalized health records with decades of follow-up data.

Sequencing Initiative Suomi (SISu)

The Sequencing Initiative Suomi (SISu) aims to combine the unique population structure, the health records, and the substantial Finnish interest in genetics. The first study from SISu, just out in PLoS Genetics, compares the exomes of 3,000 Finns to an equal number of non-Finnish Europeans (NFEs). They found:

  • A depletion of “singletons” (variants only seen in one individual) in Finns: 3.7 times fewer singletons than NFEs
  • An excess of low-frequency variants (MAF 0.5-5%) in Finns relative to NFEs
  • Similar patterns of common variants between Finns and NFEs

Finnish loss of function variants

Lim et al, PLoS Genetics 2014

All of these are consistent with the expected bottleneck effect on Finnish populations. When variants were stratified by annotation (i.e. their predicted effect on genes), Finns had a higher proportion of likely-deleterious missense variants and more severe loss-of-function (LoF, or protein-truncating) variants. The average Finn had 0.160 homozygous LoF variants, whereas the average NFE had 0.095.

To determine if some of these enriched LoF variants have phenotypic effects, the authors genotyped 83 of them in 36,262 individuals from three large Finnish cohorts. Using the deep phenotype data — quantitative traits like blood pressure, lipids, etc. — they found 5 significant associations.

LPA association

LPA level (Lim et al, PLoS Genet 2014)

One of these was an association between splice site variants in LPA, the gene encoding lipoprotein(a), and decreased levels of circulating lipoprotein(a). As it happens, circulating lipoprotein(a) is a risk factor for coronary heart disease. Looking at the medical records showed that LPA splice variants are protective against cardiovascular disease.

This is only a proof-of-principle study, the tip of the SISu iceberg. Yet it shows the value of sequencing Finnish populations to identify rare variants contributing to complex diseases. Undoubtedly, as large-scale sequencing of Finnish cohorts continues at places like WashU and the Broad Institute, we’ll have even more power to identify genes relevant for common diseases.