Many, if not most, human diseases have a genetic component. Thanks to advances in next-gen sequencing, a plethora of studies in recent years have shed light on the role of germline variants in heritable diseases, and of somatic mutations in cancer. They are also beginning to unravel the role in human disease of true de novo mutations: genetic variants that arise in a child but are not present in either parent. This was the subject of an excellent article just out in Nature Reviews Genetics. Let me give you the highlights.
Most of what we know about de novo mutations in humans comes from recent whole-genome and exome sequencing studies in families:
- On average, humans acquire ~74 de novo single nucleotide variants (SNVs) per genome per generation.
- The rate of de novo mutations seems higher in individuals with genetic diseases, particularly sporadic disorders such as intellectual disability and autism.
- Perhaps surprisingly, the de novo mutational load seems correlated with paternal (as opposed to maternal) age.
- Mutations linked to sporadic disease are usually highly disruptive to gene function, often affecting important domains of developmental genes.
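To put the ~74 figure in context, here is a rough back-of-the-envelope conversion to a per-base rate. The genome and exome sizes below are round-number assumptions of mine, not values from the review, so treat the results as order-of-magnitude estimates only.

```python
# Rough conversion of "~74 de novo SNVs per generation" into a per-base rate.
# Genome and exome sizes are assumed round figures, not from the review.

DE_NOVO_SNVS_PER_GENERATION = 74   # figure quoted in the summary above
HAPLOID_GENOME_BP = 3.1e9          # assumption: approximate haploid genome size
HAPLOID_CODING_BP = 30e6           # assumption: roughly 1% of the genome is coding


def per_base_rate(events, genome_bp):
    """Mutations per base pair per generation."""
    return events / genome_bp


def expected_in_region(rate, region_bp):
    """Expected number of new mutations in a region of the given size."""
    return rate * region_bp


# A child carries two genome copies, so divide by the diploid size.
rate = per_base_rate(DE_NOVO_SNVS_PER_GENERATION, 2 * HAPLOID_GENOME_BP)
exome_expect = expected_in_region(rate, 2 * HAPLOID_CODING_BP)

print(f"per-base rate: {rate:.2e} per generation")
print(f"expected coding de novo SNVs per child: {exome_expect:.2f}")
```

Under these assumptions the rate comes out near 1.2 x 10^-8 per base per generation, and the expected number of coding de novo SNVs is somewhat below one per child.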
Diseases Linked to de novo Mutations
De novo mutations tend to be more deleterious than inherited variation because they haven’t undergone the same level of evolutionary selection. This fact, combined with the observation that they occur with some appreciable frequency, makes de novo mutation an intriguing explanation for sporadic diseases. In support of this notion, recent family-based exome sequencing studies have implicated de novo mutations in a number of rare syndromes.
Gene | Product | Disorder |
SETBP1 | SET binding protein 1 | Schinzel-Giedion syndrome (mental retardation and neurodegeneration) |
MLL2 | Mixed-lineage leukemia 2 | Kabuki syndrome (intellectual disability and congenital anomalies) |
ASXL1 | Additional sex combs like 1 | Bohring-Opitz syndrome (severe intellectual disability and congenital malformations) |
ANKRD11 | Ankyrin repeat domain 11 | KBG syndrome (facial/skeletal malformations and developmental delay) |
CHD7 | Chromodomain helicase DNA binding protein 7 | CHARGE syndrome (birth defects, heart defects, breathing problems) |
ACTB/ACTG1 | Actin beta / actin gamma 1 | Baraitser-Winter syndrome (brain malformation) |
AKT1 | V-akt murine thymoma viral oncogene homolog 1 | Proteus syndrome (skin overgrowth and atypical bone development) |
The last example, Proteus syndrome, does not run in families but has been reported in discordant monozygotic twins. Intriguingly, the causal mutation was only identified when exome sequencing was performed on the affected, overgrown tissue; it was absent from peripheral blood. This phenomenon of somatic mosaicism has since been observed in other sporadic disorders.
Challenges for Detection of de novo Mutations
The authors of this article are well-informed, and they take particular care to highlight some of the challenges facing de novo mutation discovery. True mutations are quite rare, which means that the vast majority of initial predictions from next-gen sequencing will likely be artifacts. A search for variants found in a child but absent from both parents will enrich for two types of errors: false positives found only in the child, and false negatives that were simply missed in one or both parents. Indeed, when I heard about some of the first attempts to characterize de novo variants by whole-genome sequencing, the false positive rate was >99%.
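A quick calculation shows why so few raw predictions hold up. The sketch below assumes a hypothetical per-site false call rate and a rounded number of callable sites; neither number comes from the review, and real error rates vary widely with coverage and pipeline.

```python
# Why rare true events drown in errors: positive predictive value (PPV)
# of naive de novo calls. Error rate and site count are hypothetical.

def positive_predictive_value(true_events, sites_tested, false_call_rate,
                              sensitivity=1.0):
    """Fraction of de novo predictions that are real.

    true_events     -- real de novo mutations present (about 74 per genome)
    sites_tested    -- number of positions examined
    false_call_rate -- per-site chance of a spurious de novo call (assumed)
    sensitivity     -- fraction of real events that get called
    """
    true_calls = true_events * sensitivity
    false_calls = (sites_tested - true_events) * false_call_rate
    return true_calls / (true_calls + false_calls)


# Assumptions: 3 billion callable sites, 1 spurious call per 100,000 sites.
ppv = positive_predictive_value(74, 3_000_000_000, 1e-5)
print(f"PPV: {ppv:.4%}  (false positive share: {1 - ppv:.2%})")
```

With these made-up inputs there are about 30,000 spurious calls against 74 real ones, so only roughly 0.25% of predictions are genuine, which is consistent with a false positive rate above 99%.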
Since sequencing mother-father-child trios is the preferred method to discover these mutations, one immediately thinks of an outstanding resource: the CEU (CEPH European) and YRI (Yoruban) trios that were part of the HapMap and later 1000 Genomes projects. These are among the most extensively characterized genomes in the world, from genotyping to sequencing to gene expression to cellular phenotyping. Unfortunately, while the DNA from these samples is readily available, it comes from lymphoblastoid cell lines (LCLs). LCLs harbor mutations of their own due to immortalization and culturing, and these are hard to tell apart from true de novo events. A similar difficulty would be encountered with whole-genome amplified (WGA) DNA, because amplification can introduce artifacts of its own.
Because true de novo mutations occur randomly (and newly) in individuals, there’s no database like dbSNP to guide discovery. We must instead rely on deeper sequence coverage, better algorithms, and ultimately, orthogonal validation.
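As an illustration of the basic trio logic, here is a minimal sketch of a candidate filter that works on read counts. The class name, function name, and every threshold are hypothetical choices of mine, not a published method; requiring adequate depth in both parents is one simple way to guard against variants that were merely missed in a parent.

```python
# Minimal trio filter sketch: variant seen in the child, absent from both
# parents, with enough coverage everywhere. All thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class SampleCall:
    """Read counts for one sample at one site."""
    ref_reads: int
    alt_reads: int

    @property
    def depth(self):
        return self.ref_reads + self.alt_reads

    @property
    def alt_fraction(self):
        return self.alt_reads / self.depth if self.depth else 0.0


def is_candidate_de_novo(child, mother, father, min_depth=20,
                         min_child_alt_fraction=0.3, max_parent_alt_reads=0):
    """Return True if the site looks like a de novo candidate."""
    # Low coverage in any trio member makes the call unreliable; in a
    # parent it risks missing an inherited variant (a false negative).
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False
    # The child should show solid support for the variant allele.
    if child.alt_fraction < min_child_alt_fraction:
        return False
    # Neither parent may show (more than the allowed) variant reads.
    return (mother.alt_reads <= max_parent_alt_reads
            and father.alt_reads <= max_parent_alt_reads)


child = SampleCall(ref_reads=18, alt_reads=16)
print(is_candidate_de_novo(child, SampleCall(40, 0), SampleCall(35, 0)))
print(is_candidate_de_novo(child, SampleCall(5, 0), SampleCall(35, 0)))
```

The first call passes; the second is rejected because the mother has only five reads, too few to rule out an inherited variant. Note that a fixed allele-fraction cutoff like this one would also discard low-level somatic mosaics, and any surviving candidate still needs orthogonal validation.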
Large Forms of de novo Mutation
True de novo mutations need not be SNVs. Cytogenetic analysis has been used for decades to characterize large-scale de novo events (such as trisomy 21 in Down syndrome). Deletions and duplications have been linked to mental retardation for some time, initially using microarray technology. More recent studies have estimated that large (>100 kbp) de novo CNVs occur in about 1 in 50 individuals. Undoubtedly there are smaller structural variants and indels that we simply aren’t yet able to detect accurately.
Predicting Phenotypic Consequences of de novo Mutations
Even when true de novo mutations have been characterized, predicting their phenotypic consequences presents a number of challenges. In coding sequences, the best evidence to implicate a gene requires looking across a significant number of samples, to find genes that:
- Harbor mutations in multiple (unrelated) cases with a similar phenotype, and
- Lack similarly damaging mutations in populations of unaffected individuals
This evidence can be bolstered, of course, with information from model organisms, functional assays, pathway analysis, and evolutionary conservation of the affected nucleotides.
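The two criteria above can be sketched as a simple recurrence filter. The function name, input layout, and gene labels are placeholders for illustration; a real analysis would also account for gene size and per-gene mutation rate before calling recurrence meaningful.

```python
# Sketch of gene prioritization: keep genes hit in several unrelated cases
# and not hit by similarly damaging mutations in unaffected individuals.
# Gene labels and the min_cases threshold are placeholders.
from collections import defaultdict


def recurrent_genes(case_mutations, control_genes, min_cases=2):
    """Return genes mutated in >= min_cases distinct cases and no controls.

    case_mutations -- iterable of (case_id, gene) pairs for damaging
                      de novo mutations in affected, unrelated individuals
    control_genes  -- set of genes with similarly damaging mutations
                      in unaffected individuals
    """
    cases_by_gene = defaultdict(set)
    for case_id, gene in case_mutations:
        cases_by_gene[gene].add(case_id)  # a set counts each case once
    return sorted(
        gene for gene, cases in cases_by_gene.items()
        if len(cases) >= min_cases and gene not in control_genes
    )


cases = [
    ("case1", "GENE_A"), ("case2", "GENE_A"),  # recurrent, clean in controls
    ("case3", "GENE_B"),                       # seen only once
    ("case1", "GENE_C"), ("case4", "GENE_C"),  # recurrent but hit in controls
]
print(recurrent_genes(cases, control_genes={"GENE_C"}))
```

In this toy input only GENE_A survives: GENE_B is mutated in a single case, and GENE_C carries comparable mutations in unaffected individuals.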
The realization that de novo mutations are collectively common across human populations (~74 per individual) suggests that they may contribute to common disease susceptibility as well. The authors deduce that for any given genetic disease, the proportion of cases due to de novo mutation will be high if:
- Monogenic causes predominate
- The number of dominant disease-associated genes is high
- Dominant mutations have strongly negative fitness effects, so natural selection removes them quickly and inherited variation is less likely to play a role.
Conversely, if dominant mutations have modest fitness effects, then inherited variants likely play a larger role because they’re far more numerous.
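The link between fitness effects and the de novo share can be illustrated with the textbook mutation-selection balance approximation for a rare, fully penetrant dominant allele. This is a standard population-genetics simplification that I am supplying for illustration, not the authors' own model, and the mutation rate used is an arbitrary placeholder.

```python
# Textbook mutation-selection balance for a rare dominant allele
# (an illustrative simplification, not the review's own model).
#   mu -- new mutations per allele per generation (placeholder value)
#   s  -- selection coefficient: fractional loss of fitness in carriers


def equilibrium_incidence(mu, s):
    """Approximate incidence of carriers at equilibrium: 2*mu / s."""
    if not 0 < s <= 1:
        raise ValueError("selection coefficient must be in (0, 1]")
    return 2 * mu / s


def de_novo_fraction(mu, s):
    """Share of cases caused by a new mutation rather than an inherited one."""
    new_case_incidence = 2 * mu  # two allele copies can each mutate
    return new_case_incidence / equilibrium_incidence(mu, s)


for s in (0.9, 0.5, 0.1):
    print(f"s = {s}: de novo share of cases is about {de_novo_fraction(1e-5, s):.2f}")
```

In this simplified model the de novo share equals the selection coefficient itself: when carriers rarely reproduce, nearly every case is a new mutation, and when the fitness cost is modest, most cases are inherited. It captures only the fitness condition above, not the effect of the number of genes involved.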
Maternal and Paternal Age and Social Change
The long-known fact that maternal age is correlated with the incidence of aneuploidy in offspring, and the recent observation that the number of de novo SNVs correlates with paternal age, have important implications in light of this trend: people are waiting longer to start families. Average maternal and paternal ages have risen steadily for decades.
While there are many social and economic factors contributing to this, the biology is unchanged. This suggests that the incidence of disease caused by de novo mutations, too, is probably on the rise.
References
Veltman JA, Brunner HG (2012). De novo mutations in human genetic disease. Nature Reviews Genetics, 13(8), 565-575. PMID: 22805709