Last week I attended the third annual “Personal Genomes” meeting at Cold Spring Harbor. The meeting opened with a keynote talk by NHGRI director Eric Green, who reminded us that finding the pathway to genomic medicine is the central mission of NHGRI. He mentioned several of the past successful initiatives that have yielded key findings concerning human genetic variation and its relationship to phenotype: The HapMap Project (common variation), the ENCODE Project (functional variation), and the 1,000 Genomes Project (rare variation), to name a few. He showed the absolutely stunning growth of the NHGRI-hosted genome-wide association study (GWAS) catalog, which currently holds ~2,600 associations from 780 publications.
Dr. Green also discussed the dichotomy of genetic architecture underlying human diseases. He took the position that while we’ve made substantial progress studying rare, monogenic, Mendelian disorders (predominantly caused by coding mutations), we face a more daunting task with common, complex, multigenic diseases, because he believes that these arise primarily from noncoding mutations.
Theme 1: Human Mutation Rates
Several talks addressed the topic of mutation rate in human genomes. Donald Conrad, who will be joining the WashU Genetics Department next year, presented mutation rate as a quantitative trait based on 1,000 Genomes Project trio data. Three of the primary sources of variation in mutation rate are age (males have 3x-6x higher rates), environment, and genetic variation (e.g. inherited aging disorders).
Lee Hood gave an excellent keynote on “Systems Genetics and P4 Medicine”, part of which was a discussion of mutation rate. His group uses whole-genome sequencing (WGS) of family cohorts (in this case, the Miller syndrome family quartet), focusing on the ~2.3 Gbp of non-repetitive reference sequence. Using the family information and inheritance modeling, they identify de novo mutations in the offspring, which manifest as errors of Mendelian inheritance. Validation using a custom capture array for 60,000 candidate sites followed by deep sequencing showed that only 1/1,000 “new” mutations in the offspring were real; the vast majority proved to be sequencing errors. That works out to a mutation rate of 1.1 x 10^-8 per base per generation, or roughly 70 mutations per child.
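As a back-of-the-envelope check on how a per-base rate turns into a per-child count, here is a minimal sketch. The haploid genome size (~3.1 Gbp) and the factor of two for a diploid genome are my assumptions, not figures from the talk, and the function name is just illustrative.

```python
# Rough sanity check of the "roughly 70 mutations per child" figure.
# Only the mutation rate comes from the talk; the other inputs are assumptions.

MUTATION_RATE = 1.1e-8      # per base per generation (as quoted in the talk)
HAPLOID_GENOME_BP = 3.1e9   # ASSUMPTION: approximate haploid human genome size
PLOIDY = 2                  # ASSUMPTION: a child inherits two haploid copies


def expected_de_novo_mutations(rate, haploid_bp, ploidy=2):
    """Expected number of new mutations per child for a given per-base rate."""
    return rate * haploid_bp * ploidy


expected = expected_de_novo_mutations(MUTATION_RATE, HAPLOID_GENOME_BP, PLOIDY)
print(f"~{expected:.0f} de novo mutations per child")
```

Under those assumptions the estimate lands in the high 60s, which is consistent with the "roughly 70" quoted above.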
Lynn Jorde (Univ. of Utah) later gave a talk on directly estimating human mutation rate by WGS, also using the Miller syndrome quartet. Sequencing by Complete Genomics yielded >50x coverage per subject; there were ~4 million positions in the 1.8 Gbp of “useful” reference sequence in which at least one subject differed from the reference. Only 330,000 or so SNPs were novel (not known to dbSNP), and 20% of these proved to be sequencing errors. More array validation, more calculations, and the same answer as given by Dr. Hood: a mutation rate of 1.1 x 10^-8.
Theme 2: Personal Cancer Genomes
Cancer genomes were another focus of the meeting. Sean Grimmond (Univ. of Queensland, Brisbane, Australia) presented some of his group’s work on pancreatic cancer as part of the International Cancer Genome Consortium (ICGC). Pancreatic cancer is one of the deadliest forms of cancer; about 90% of patients diagnosed die within one year. The Brisbane group has assembled a very nice workflow from sample collection to sequencing that includes pathology review, tumor dissection, QA, and microarray analysis to determine tumor cellularity. The sequencing strategy (WGS, exome, and RNA-seq) differs between high-cellularity (70-100%) and low-cellularity (~30%) tumors. The ultimate deliverable is a “tumor report” documenting cellularity estimates, microarray findings, cytogenetics, what sequencing was done, and what mutations were found.
James Brugarolas (UT Southwestern Medical Center) described the genome evaluation and functional studies of a patient with clear cell renal carcinoma. I learned a bit more about this form of cancer – 85% of tumors prove to be the “clear cell” carcinoma; common lesions include 3p loss (VHL gene) and 5q35 gain. This particular tumor underwent Illumina whole-genome sequencing to 35x coverage; some 46 somatic mutations were validated. One of these was in a gene whose protein product complexes with mTOR, the central player in a known cancer pathway. The tumor was successfully xenografted to a mouse model; some 43/46 somatic mutations were retained, and all had higher frequencies (similar to our findings on basal-like breast cancer). The xenograft let them test a few different cancer drugs – erlotinib (an EGFR inhibitor that had no effect), sunitinib (the front-line therapy for these patients, also no effect), and others. Intriguingly, however, the tumor was sensitive to an mTOR inhibitor compound.
Rick Wilson (The Genome Center at Washington University) gave a talk on whole-genome sequencing of leukemia patients at WashU. Of the 50+ leukemia patients sequenced to date, most have fewer than 20 valid protein-altering mutations. For most patients, low-resolution cytogenetic screens are the paradigm for disease classification and treatment decisions. Favorable-risk patients (17% of cases) undergo light chemotherapy. For adverse-risk patients (22% of cases), an allo-matched bone marrow transplant is the standard of care. That leaves a large body of patients (~61%) with “intermediate” risk according to cytogenetics; here, the correct treatment decision is harder to make. Better stratification of intermediate-risk patients is the first goal. Dr. Wilson related a fascinating case study, a 39-year-old female with suspected acute promyelocytic leukemia, in which rapid-turnaround WGS was able to provide an accurate diagnosis that was not obtained by conventional FISH, and ultimately guided her treatment.
Theme 3: Genome Regulation and Epigenetics
Peter Laird (Univ. Southern California, LA) led us out of the genome to the epigenome with his talk on mining the cancer methylome. He argued that the first steps in oncogenesis may be epigenetic changes, specifically, the dysregulation of genes due to abnormal methylation. Dr. Laird presented what he’s calling the first cancer methylome – a tumor sample and matched normal control that underwent bisulfite treatment and sequencing to ~30x coverage. As expected, bisulfite sequencing yielded very accurate estimates of DNA methylation (r=0.97 with Illumina Infinium), and it did so across the complete human genome with base-pair resolution.
Theme 4: Exome Sequencing
There is a ton of exome sequencing going on. I saw at least two posters describing “whole” exome sequencing in 1,000 cases and 1,000 controls. I put “whole” in quotes because it’s not true at this point; people really shouldn’t be going around saying that the “whole exome” was sequenced. It’s more like 80-90% of known genes. Rick Lifton spoke about some of the valuable applications of exome sequencing – finding dominant reproductive lethal mutations, unraveling recessive traits with high locus heterogeneity, characterizing somatic mutations in cancer, and identifying rare variants associated with common disease. He described recently published work in which recessive mutations in WDR62 were linked to severe brain malformations by exome sequencing. Matt Bainbridge gave a nice overview of the exome sequencing currently under way at Baylor. So yes, it turns out that groups outside of WashU are doing exome sequencing too.
Other Presentations of Note
There were just too many presentations to talk about. Stacia Wyman (Fred Hutchinson Cancer Center, Seattle) described post-transcriptional modification of microRNAs in prostate cancer. Randeep Singh (Philips Research Asia) brought us up to date on population genetics in India, and mentioned that we’ll soon see publication of the genomes of two “high profile” Indians. Two speakers from HudsonAlpha Institute (Huntsville, AL) – Richard Myers and Katherine Varley – spoke about “functional genomics” of allele-specific TF binding and methylation, respectively.
I look forward to hearing how the CSHL talks compare to those going on at “Genome Informatics”, currently underway at the Wellcome Trust Sanger Institute.
wei says
Is there a typo here: ‘Three of the primary sources of variation in mutation rate are age (males have 3x-6x higher rates)’?
“only 1/1,000 “new” mutations in the offspring were real” is shocking. How does this translate to the error rate among singletons in a random sample?