The current issue of PLoS Genetics has an interesting article on the distribution of fitness effects (DFE) among new amino-acid-changing (nonsynonymous) mutations.
Adam R. Boyko et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5): e1000083. May 2008.
Call me old-fashioned, but I’m still impressed by strong datasets. The authors of this study resequenced the exons of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans), yielding uniformly ascertained allele frequency estimates for 47,576 coding SNPs. The paper itself is highly statistical, fitting a range of “selection models” to tease apart the demographic and selective forces acting on amino acid variation in the human genome. I’ll admit that I understand only the fundamentals of such things. Although the authors look only at nonsynonymous and synonymous variants, they’ve done a lot of work to investigate evolutionary models comprehensively with their data. Let me hit you with the highlights:
- They investigated the unfolded nonsynonymous site frequency spectra using 13 different selection models, including some complex two-parameter and three-parameter models.
- The authors inferred a similar mean selection coefficient (-0.030) for newly arising mutations in European Americans and in African Americans, despite the complicated demographic histories (including admixture) of both groups.
- Various manipulations of the data showed that two major potential confounders (SNP ascertainment bias and weak selection at presumably neutral sites) had little influence on the inferences drawn from the data set.
- The authors estimate that 10-20% of amino acid divergence between humans and chimpanzees is due to positive selection, and the estimate holds for both the African American and the European American samples.
- According to the best-fitting models, 27-29% of nonsynonymous changes are neutral, 30-42% are modestly deleterious, and the remainder are highly deleterious. Because purifying selection is so strong, however, deleterious mutations make up <1% of common segregating SNPs (MAF ≥ 0.05) in human populations.
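The 10-20% figure in the list above is what population geneticists usually call α, the proportion of amino acid substitutions fixed by positive selection. As a simple point of reference, here is the classic McDonald-Kreitman-style estimator, which compares divergence and polymorphism counts directly. It is a swapped-in textbook estimator, not necessarily the method used in the paper, and the counts in the example are made up for illustration.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimate the fraction of nonsynonymous substitutions that were adaptive.

    McDonald-Kreitman-style estimator: alpha = 1 - (Ds * Pn) / (Dn * Ps), where
      dn, ds = nonsynonymous / synonymous fixed differences between species
      pn, ps = nonsynonymous / synonymous polymorphisms within one species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for illustration only; these are NOT data from Boyko et al.
alpha = mk_alpha(dn=120, ds=200, pn=80, ps=160)
print(f"alpha = {alpha:.3f}")  # alpha = 0.167
```

With these hypothetical counts, α = 1 - (200 × 80)/(120 × 160) ≈ 0.17. One known weakness of this estimator: slightly deleterious nonsynonymous polymorphisms inflate Pn and push α downward, which is one motivation for approaches that model the DFE explicitly.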
The final highlight implies that the vast majority of common human genetic variation, i.e. SNPs with minor allele frequencies of at least 5%, is neutral or nearly neutral with respect to fitness. If this is true, it has important implications for genetic association studies, which often rely on surveys of common variation in the human genome: such studies may miss the rare, highly deleterious mutations that are both evolutionarily and medically relevant.
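The reasoning above rests on a standard population-genetic result: the more strongly selection acts against an allele, the less likely it is to drift to appreciable frequency. This sketch illustrates the idea with the Poisson random field expectation for the site frequency spectrum under additive selection, assuming a constant-size Wright-Fisher population. The scaled selection coefficient `gamma` (negative for deleterious; conventions for its scaling differ by a factor of 2), the lower frequency cutoff `eps`, and the `gamma` values in the loop are hypothetical choices, not values fitted in the paper. For each `gamma`, it reports the share of segregating sites with MAF ≥ 0.05.

```python
import math


def sfs_density(x: float, gamma: float) -> float:
    """Relative density of segregating sites at derived allele frequency x.

    Poisson random field expectation under additive (genic) selection with
    scaled selection coefficient gamma (gamma < 0: deleterious). The overall
    mutation-rate factor theta is omitted because only ratios are used below.
    """
    if gamma == 0.0:
        return 1.0 / x  # neutral expectation: proportional to 1/x
    if gamma > 0.0:
        num = -math.expm1(-2.0 * gamma * (1.0 - x))
        den = -math.expm1(-2.0 * gamma)
    else:
        g = -gamma
        # Same formula as above, rearranged so exp() cannot overflow for large |gamma|.
        num = math.exp(-2.0 * g * x) * -math.expm1(-2.0 * g * (1.0 - x))
        den = -math.expm1(-2.0 * g)
    return num / (den * x * (1.0 - x))


def integrate_log(f, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint rule in u = ln(x), which tames the 1/x behaviour near x = 0."""
    lo, hi = math.log(a), math.log(b)
    du = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = math.exp(lo + (i + 0.5) * du)
        total += f(x) * x * du  # dx = x du
    return total


def common_share(gamma: float, eps: float = 1e-4, maf: float = 0.05) -> float:
    """Fraction of segregating sites (frequency in [eps, 1 - eps]) with MAF >= maf."""
    def density(x: float) -> float:
        return sfs_density(x, gamma)

    segregating = integrate_log(density, eps, 1.0 - eps)
    common = integrate_log(density, maf, 1.0 - maf)
    return common / segregating


# Hypothetical gamma values spanning neutral to strongly deleterious.
for gamma in (0.0, -1.0, -10.0, -100.0):
    print(f"gamma = {gamma:6.1f}: share with MAF >= 0.05 = {common_share(gamma):.4f}")
```

In the neutral case the density is proportional to 1/x, so the share equals ln(19)/ln(9999), about 0.32. It falls as `gamma` becomes more negative, and for strongly deleterious alleles almost no segregating sites reach 5%. Because the result depends on `eps` (roughly the lowest observable frequency), treat the numbers as qualitative.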
The authors conclude that “re-sequencing in large samples of phenotypically extreme individuals, on the other hand, is much more likely to discover rare, large-effect mutations that are predicted… to be deleterious.” As a HapMap Consortium member, I’m not sure that I agree outright, but as an employee of the WashU Genome Sequencing Center, I have to say that resequencing is not a bad way to go.