Mary-Claire King on Inherited Breast/Ovarian Cancer

It is a rare but delightful opportunity to learn about something from an acknowledged world expert. Such was the case last month when I heard Mary-Claire King give the Stanley J. Korsmeyer Memorial lecture, hands-down one of the best talks I’ve ever heard. She was a wonderful public speaker: funny, charming, and straight-shooting.

Her topic, of course, was inherited breast and ovarian cancer. If you don’t know the story already, Dr. King wrote a wonderful perspective in Science about her role in the discovery of the BRCA1 gene and the race to clone it in the early 1990s. Fascinatingly, she walked us through some of the pedigrees from early-onset breast cancer families described in the 1990 linkage study by her group.

The women in those families got breast cancer very young (in their 20s or 30s) and usually died from it. Male obligate carriers were generally unaffected. Even for a highly penetrant mutation like BRCA1, there were exceptions, like the carrier who lived to 81 without ever getting cancer.

Of the seven early-onset breast cancer families, six harbored mutations in BRCA1 and one had a mutation in BRCA2. That paper was the culmination of 17 years of work and mapped the BRCA1 locus to chromosome 17.

Mapping the BRCA1 Region (Hall et al, Science 1990)

The existence of a gene for predisposition to breast cancer triggered enormous interest in big labs in government, universities, and the private sector. It was the birth of cancer genetics.

BRCA1, DNA Repair, and Chemotherapy

At the time of its discovery, we knew nothing about the function of the BRCA1 gene. Subsequent genetic studies would reveal that it works as a tumor suppressor in a two-hit model of inherited cancer: the disease develops only after carriers of one loss-of-function mutation (generally a nonsense change or frameshift indel) lose the other copy to somatic mutation in a vulnerable cell type.

Normally, BRCA1 forms a heterodimer with BARD1, which stabilizes the BRCA1/BARD1/Fanconi complex. That complex repairs double-stranded DNA breaks via the homologous recombination repair pathway. Mutations in several other DNA repair genes (TP53, PALB2, CHEK2, BARD1, BRIP1, ATM, RAD51C, and RAD51D) are also known to predispose to breast and ovarian cancer.

Although BRCA1/2 carriers suffer a significantly higher risk of breast and ovarian cancer, they also tend to respond better to chemotherapy. This is not terribly surprising, because the loss of homologous DNA repair capability diminishes the ability of cancer cells to recover from DNA damage. Yet there’s also a different mechanism for DNA repair, non-homologous end joining (NHEJ), that does not involve BRCA1/2.

The bad news is that this may enable tumor cells to resist chemotherapy. The good news is that we have a class of drugs, PARP inhibitors, that shut down yet another repair route: PARP enzymes are needed to fix single-strand breaks, and a cell that already lacks homologous recombination has very few options left once they are blocked. The first clinical trial of PARP inhibitors in BRCA1/2 null cancer patients “crashed,” according to Dr. King, because the compound being used didn’t actually inhibit PARP. New clinical trials are under way. Hopefully, they’ll demonstrate that PARP inhibitors make BRCA1/2 null patients more responsive to chemotherapy, which will make genetic testing even more critical.

Genetics and Epidemiology of Familial Breast Cancer

The epidemiology of breast cancer is fairly well known. By rough approximation, 1 in 8 women will get breast cancer at some point in their lifetimes, and 10-20% of patients will turn out to carry an inherited mutation in a known predisposition gene. Like many cancers, risk of breast/ovarian cancer is highly age-dependent. BRCA1/2 carriers not only have a higher lifetime risk of disease, but also have a considerably higher age-dependent risk; some might even be diagnosed with disease in their 20s or 30s.

There is also a widely accepted trend related to breast cancer incidence that’s been apparent for decades: more women are getting it, and seemingly at younger ages. Indeed, Dr. King showed results from two large epidemiological studies of breast cancer in which the incidence curves (incidence by age, classified by carrier/non-carrier status) differ strikingly when you split the women into two groups: those born before 1940, and those born after 1940.

There are lots of theories for why this might be, including some I might call conspiracy theories (e.g. radiation exposure, or hormones in milk). Yet Dr. King offered an explanation that I find both simple and convincing. We know that certain factors increase a woman’s risk of breast cancer. Two examples are the age of first menstruation (earlier = higher risk) and the age at which she has her first child (later = higher risk).

In 1950, a woman typically began menstruating at 15 and bore her first child at 21. Today, menstruation often begins sooner (say age 11, due to some complicated factors like better nutrition) and the first child often comes later (age 30, because women often pursue higher education and/or careers).

Nutrition and education/independence, of course, are good things. However, the side effect is that the window of time between menstruation and first child went from ~6 years in 1950 to ~19 years today. And during that window, a woman’s breast tissues are bathed in estrogen. It makes for some super-healthy cells that don’t die easily, even if they suffer mutations. That longer window simply increases the odds that a second “hit” will occur in the gene for which a woman already carries a loss-of-function mutation.
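The window arithmetic is easy to check for yourself. Here is a minimal sketch that uses only the ages quoted above; the function name is mine, purely for illustration.

```python
def estrogen_window(age_menarche: int, age_first_birth: int) -> int:
    """Years between first menstruation and first childbirth."""
    return age_first_birth - age_menarche


# Ages quoted in the post for a typical woman in 1950 and today.
window_1950 = estrogen_window(age_menarche=15, age_first_birth=21)
window_today = estrogen_window(age_menarche=11, age_first_birth=30)

print(window_1950)                           # 6
print(window_today)                          # 19
print(round(window_today / window_1950, 1))  # 3.2
```

In other words, the window has roughly tripled.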

In support of this idea, if researchers adjust for the length of that time window, the year-of-birth effect totally goes away. I think that’s some fascinating stuff.

Genetic Structure of BRCA1/2

Interestingly, although the two most famous breast cancer susceptibility genes (BRCA1 and BRCA2) share no sequence similarity, they have a similar (and distinctive) genomic structure: many small exons and a large central exon. The central exon encodes a big portion of the protein and is surprisingly robust to amino acid substitutions, which is why most missense mutations in BRCA1 and BRCA2 are non-pathogenic.

BRCA1 and BRCA2 (Fackenthal & Olopade, Nat. Rev. Cancer, 2007)

Yet because these genes are so large, mutation databases have catalogued thousands of individual rare mutations that look deleterious. This is why a genotyping-based genetic test, like the one that was a cash cow for Myriad Genetics until recently, was never going to work in the long term. Now, with targeted sequencing, we have the capability to detect all types of mutation (substitutions, indels, even large SVs) affecting BRCA1/2 and other susceptibility genes.

From Gene Discovery to Population Screening

As the cost of sequencing-based genetic testing continues to drop, we’re in the position to screen the entire female population for cancer susceptibility genes. The World Health Organization offered guidelines for when genetic testing should be performed. In essence, four criteria must be met.

  1. The disease must be an important health problem.
  2. Risk of disease for patients testing “positive” should be high.
  3. The mutations responsible for conferring risk must be identifiable.
  4. Effective interventions must exist.

Dr. King makes a pretty compelling argument that familial breast/ovarian cancer meets these requirements. #1 and #2 are well-established. #3 is true if you know your stuff: for a while, companies like Myriad leaned heavily on the “Variant of Unknown Significance” classification when they encountered a new variant, to the point that 88% of results were reported as such. Yet an expert team, like the one at UW, can classify more than 98% of variants as either pathogenic or non-pathogenic. The PARP inhibitor clinical trials should give us the answer for #4.

There are, of course, other considerations, like the cost of testing, the burden of genetic counseling, the age at which testing should be performed (Dr. King suggests 30), etc. Yet these are hurdles that can be overcome. Hurdles that must be overcome, if we’re to use our growing knowledge of disease genetics to improve the state of human health.


Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, & King MC (1990). Linkage of early-onset familial breast cancer to chromosome 17q21. Science (New York, N.Y.), 250 (4988), 1684-9 PMID: 2270482
King MC (2014). “The race” to clone BRCA1. Science (New York, N.Y.), 343 (6178), 1462-5 PMID: 24675952

The Human Epigenome Roadmap

Roadmap Epigenomics Consortium, Nature 2015

Aside from the occasional somatic mutation, the genome of every cell in an individual’s body is largely preserved. Yet different types of cells (and tissues, and organs) are incredibly diverse. The majority of that specialization is governed by epigenetic changes — histone modifications, DNA accessibility, and methylation — that influence when and how genes are expressed.

Our knowledge of the epigenome has lagged well behind our knowledge of the genome, partly because it’s been difficult to study. The application of next-gen sequencing to RNA libraries (RNA-Seq), chromatin immunoprecipitates (ChIP-Seq), bisulfite-treated DNA, and regions of open chromatin (DNase-Seq) makes it possible to interrogate many aspects of the epigenome in high-throughput fashion.

The NIH Roadmap Epigenomics Consortium has just published the largest collection of epigenomes characterized to date: 111 primary human tissues and cells profiled for histone modification patterns, DNA accessibility, DNA methylation, and gene expression. The 2,805 genome-wide datasets comprise 150.2 billion sequencing reads, equivalent to 3,174x coverage of the human genome. The findings, published in a slew of Nature papers earlier this year, provide remarkable insights into the complexity of the human epigenome.
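Those two numbers are linked by the usual back-of-the-envelope formula: coverage = reads × read length / genome size. The post doesn't give a read length, so the sketch below works backwards to the average read length the quoted figures imply. The 3.1 Gb genome size is my own assumption, not a number from the paper.

```python
TOTAL_READS = 150.2e9      # sequencing reads, as quoted above
REPORTED_COVERAGE = 3174   # fold coverage, as quoted above
GENOME_SIZE_BP = 3.1e9     # ASSUMPTION: approximate size of the human genome


def fold_coverage(reads: float, read_length_bp: float,
                  genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Classic estimate: total sequenced bases divided by genome size."""
    return reads * read_length_bp / genome_size_bp


# Rearranged: the mean read length implied by the quoted totals.
implied_read_length = REPORTED_COVERAGE * GENOME_SIZE_BP / TOTAL_READS
print(f"implied mean read length: {implied_read_length:.1f} bp")
```

Under that assumption the implied read length comes out in the mid-60s of base pairs, which is at least plausible for a mix of short-read ChIP-Seq and longer bisulfite or RNA-Seq libraries.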

Chromatin States from 5 Core Histone Marks

The authors first generated a common set of chromatin states across 127 epigenomes (111 of their own, and 16 more borrowed from ENCODE), all of which had been profiled for five core histone marks. These marks are somewhat confusingly named, but all of them indicate the addition of a methyl group (me) to a specific lysine residue (K) of histone H3, a key component of the nucleosome that undergoes various post-translational modifications. In any given cell, certain chromatin states tend to be marked with specific histone modifications:

  • Enhancer and promoter regions are marked with mono- or tri-methylation of lysine 4 (H3K4me1 or H3K4me3), respectively.
  • Transcribed regions are marked with tri-methylation of lysine 36 (H3K36me3).
  • Polycomb-repressed regions are marked with tri-methylation of lysine 27 (H3K27me3).
  • Tightly-packed heterochromatin is marked with tri-methylation of lysine 9 (H3K9me3).
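If the naming scheme still looks cryptic, it helps to pull a mark name apart mechanically. The sketch below is only an illustration: the regular expression and the `describe_mark` helper are my own, and the lookup table simply restates the list above.

```python
import re

# Mark -> region type, exactly as listed above.
CORE_MARKS = {
    "H3K4me1": "enhancer",
    "H3K4me3": "promoter",
    "H3K36me3": "transcribed region",
    "H3K27me3": "Polycomb-repressed region",
    "H3K9me3": "heterochromatin",
}

# histone (H3) + residue letter (K) + position (27) + modification (me/ac) + optional count
MARK_PATTERN = re.compile(r"(H\d)([A-Z])(\d+)(me|ac)(\d?)")
DEGREE = {"1": "mono", "2": "di", "3": "tri"}
RESIDUE = {"K": "lysine"}


def describe_mark(name: str) -> str:
    """Spell out a histone mark name such as 'H3K4me3' in words."""
    match = MARK_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name}")
    histone, residue, position, mod, count = match.groups()
    kind = "methylation" if mod == "me" else "acetylation"
    prefix = DEGREE.get(count, "") + "-" if count else ""
    residue_name = RESIDUE.get(residue, residue)
    return f"{prefix}{kind} of {residue_name} {position} on histone {histone}"


for mark, region in CORE_MARKS.items():
    print(f"{mark}: {describe_mark(mark)} -> {region}")
```

The same helper handles the acetylation marks too, so `describe_mark("H3K27ac")` reads as acetylation of lysine 27 on histone H3.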

Using the five core histone methylation marks I’ve just described, the authors trained a chromatin state model that classified every region into one of 15 states:

15 Chromatin States (Roadmap Epigenomics Consortium, Nature 2015)

Each state has a characteristic histone pattern and can be roughly classified as either active (the first 8) or repressed. Looking at coverage, we can see that for any given epigenome, the majority of bases (68%) lack any histone marks, suggesting a quiescent (low activity) state. However, a significant fraction bore marks of active chromatin, including ~5% that appear to be active promoters or enhancers.

Another important histone modification is acetylation (ac), which is the addition of an acetyl group to a lysine residue. A subset of the epigenomes was also profiled for H3K27ac and H3K9ac, which mark increased activation of enhancer and promoter regions.

Patterns of Chromatin States

DNA methylation around genes (REC, Nature 2015)

Next, the authors turned to their other epigenomic profiling datasets (DNA accessibility from DNase-Seq, methylation from bisulfite sequencing, and RNA transcription from RNA-Seq) to examine and compare the properties of these chromatin states. Consistent with previous studies, they found that:

  • Promoter states showed low DNA methylation and high accessibility
  • Transcribed states showed high DNA methylation and low accessibility
  • Enhancers showed intermediate DNA methylation and accessibility

Genes proximal to H3K27ac-marked enhancers showed significantly higher transcription, supporting the idea that enhancers act as local (cis) regulators of gene expression.

DNA accessibility by state (REC, Nature 2015)

Chromatin states sometimes predicted differences in RNA expression that weren’t captured by DNA methylation or accessibility measurements. For example, both enhancer (Enh) and polycomb-repressed (ReprPC) states show intermediate levels of methylation (50-75%), but enhancers were more accessible and had more RNA transcription.

The Importance of Enhancers

Looking across all reference epigenomes, about 2.3 million regions (12.6% of the genome) showed evidence of promoter or enhancer activity in at least one cell or tissue type. These two states were enriched for non-exonic, evolutionarily conserved regions. They also remained consistent across various cell types, except for a small subset that appeared to switch between promoter and enhancer states.

Enhancers that showed similar activity levels across cell/tissue types were enriched for similar gene functions and GWAS hits, suggesting that they may represent “coordinately regulated modules.” The sequences of these enhancers were enriched for many of the same transcription factor (TF) binding motifs; the authors could propose upstream regulators for about half of the enhancer modules they observed.

There were also many enhancers that showed tissue-specific activity. These were enriched for genes known to have tissue-specific expression. When the authors looked at disease-associated variants from the GWAS catalog, there were 58 studies with significant enrichment for certain tissue types. Many of these were in fact tissues known to be relevant for the disease. For example, several immune diseases (rheumatoid arthritis, lupus, coeliac disease, etc) were enriched for immune cell enhancers.

Epigenomes on the Map

There is much, much more to this study than I could hope to cover in one post, and that doesn’t begin to address the dozen or so companion papers that came out at the same time. The epigenome appears to be just as intricate and variable as the genome. Studying it will undoubtedly help us better understand how a deceptively simple genetic code provides the instructions for incredibly complex human beings.

Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu YC, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh KH, Feizi S, Karlic R, Kim AR, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJ, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai LH, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, & Kellis M (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518 (7539), 317-30 PMID: 25693563

Rare Variants in Complex Disease: ABCA7 and Alzheimer’s

Although the cost of sequencing continues to fall precipitously (cue the NIH sequencing-versus-Moore’s-Law figure), it’s still expensive relative to high-throughput genotyping. Whole-genome sequencing on the HiSeq X Ten costs around $2500 per sample by the time you account for basic analysis and data storage. This means that a well-powered genetic association study for complex disease (10,000 samples) would cost about $25 million just for data generation. The same cohort genotyped on a high-density SNP array might only cost about $1 million. Undoubtedly, that’s why most large-scale genome-wide association studies to date (>50,000 samples) have relied primarily on SNP array data.
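For anyone who wants to plug in their own cohort size, here is the cost comparison as a minimal sketch. It uses only the figures quoted above; the per-sample array price is simply what the $1 million total implies.

```python
SAMPLES = 10_000
WGS_COST_PER_SAMPLE = 2_500    # USD per genome, as quoted above
ARRAY_TOTAL_COST = 1_000_000   # USD for the whole cohort, as quoted above

wgs_total = SAMPLES * WGS_COST_PER_SAMPLE
array_per_sample = ARRAY_TOTAL_COST / SAMPLES
fold_difference = wgs_total / ARRAY_TOTAL_COST

print(f"WGS total:        ${wgs_total:,}")             # $25,000,000
print(f"Array per sample: ${array_per_sample:,.0f}")   # $100
print(f"WGS is {fold_difference:.0f}x the array cost") # 25x
```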

There is a growing body of evidence, however, that rare variants (especially ones not present on SNP arrays) might confer a significant proportion of the genetic risk for complex disease. In age-related macular degeneration (AMD), for example, sequencing studies of moderate size (~5,000 samples) were able to identify rare coding variants in C3 and CFH associated with risk of disease. An important advantage of a sequencing approach is the ability to perform aggregation tests of private and rare coding variants (e.g. with the sequence kernel association test, SKAT) to boost the power to detect association.

A recent paper in Nature Genetics illustrates the feasibility of this approach for sequencing studies of complex disease. Stacy Steinberg and colleagues from deCODE Genetics conducted a search for rare functional variants in the known risk loci for Alzheimer’s disease (AD) using a unique resource: whole-genome sequences of 2,636 Icelanders imputed into 104,220 long-range phased individuals and their relatives.

So here we have a rare variant association study (RVAS) that employs several strategies for an efficient design:

  1. Studying an isolated population (Iceland), whose genetic structure enabled accurate genotype imputation of a large sample set (>100k individuals) with sequencing data for just ~2,600.
  2. Analyzing missense variants with SKAT, which aggregates rare variants (i.e. collapses them at the level of the gene) to boost power for association but allows for multiple directions of effect.
  3. Examining only regions known to be associated with AD — which seem likely to harbor [rare] functional variants — to reduce the multiple testing penalty.

Targeted Association Studies

There are, of course, disadvantages to limiting the scope of association testing to known regions. Obviously you won’t be discovering any new associations, especially ones that sequencing (but not genotyping) might be able to uncover. Even so, you’re stacking the deck in your favor because the known GWAS loci almost certainly harbor some functional variation that hasn’t yet been fully interrogated.

Sometimes, sequencing will only serve to replicate the common variant association signal (i.e. not find anything new). Yet these targeted approaches might help narrow the boundaries of the associated region, which could encompass dozens or hundreds of genes, or, even better, identify disruptive variants whose LD with the lead SNP makes them good candidates for causal variants. One might also uncover secondary independent association signals in GWAS loci, implying that there are multiple haplotypes that influence disease risk.

Variant Annotation and Aggregation

As anyone who has done aggregation/burden testing in association studies can tell you, the analysis choices can have a significant impact on results. The annotation tool/source, MAF threshold, and variant mask (definition of what’s deleterious and should be included) can introduce a lot of variability. In this case, the authors tried two variant masks:

  1. Loss of function variants: nonsense, frameshift or canonical splice site variants. These are usually quite rare, and so the authors collapsed them to a single “meta variant” at the level of the gene.
  2. Missense variants: nonsynonymous variants or splice region variants. This latter one is an interesting choice, and not necessarily one I’d have thought to make at the discovery stage.

The burden tests included only variants with MAF < 1% and an imputation information score > 0.80. The authors tested about 80 genes across the 17 loci, and the top-scoring hit was ABCA7 (p=0.00020).
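To make the collapsing idea concrete, here is a toy sketch of the loss-of-function mask: filter variants by consequence, MAF, and information, collapse the survivors into a single per-gene carrier flag, and compare carriers between cases and controls. To be clear, this is a simple collapsing test, not SKAT, and not the authors' pipeline. Every variant, genotype, and case label below is made up for illustration, as are the field and function names.

```python
LOF_CONSEQUENCES = {"nonsense", "frameshift", "canonical_splice"}
MAX_MAF = 0.01    # MAF < 1%, as in the post
MIN_INFO = 0.80   # information > 0.80, as in the post

# HYPOTHETICAL variants in a single gene.
variants = [
    {"id": "v1", "consequence": "nonsense",         "maf": 0.002, "info": 0.95},
    {"id": "v2", "consequence": "frameshift",       "maf": 0.004, "info": 0.60},
    {"id": "v3", "consequence": "missense",         "maf": 0.003, "info": 0.99},
    {"id": "v4", "consequence": "canonical_splice", "maf": 0.030, "info": 0.99},
]

# HYPOTHETICAL genotypes: person -> {variant id: alternate allele count}.
genotypes = {
    "p1": {"v1": 1}, "p2": {"v2": 1}, "p3": {"v3": 1},
    "p4": {"v4": 2}, "p5": {"v1": 1, "v3": 1}, "p6": {},
}
is_case = {"p1": True, "p2": True, "p3": False,
           "p4": False, "p5": False, "p6": False}


def qualifies(variant):
    """Apply the loss-of-function mask plus the MAF and information filters."""
    return (variant["consequence"] in LOF_CONSEQUENCES
            and variant["maf"] < MAX_MAF
            and variant["info"] > MIN_INFO)


def collapse_to_carriers(variants, genotypes):
    """Collapse qualifying variants into one carrier flag per person."""
    kept = {v["id"] for v in variants if qualifies(v)}
    return {person: any(calls.get(vid, 0) > 0 for vid in kept)
            for person, calls in genotypes.items()}


def odds_ratio(carriers, is_case):
    """Odds ratio from the 2x2 carrier-by-status table (no zero-cell handling)."""
    case_car = sum(1 for p in carriers if carriers[p] and is_case[p])
    case_non = sum(1 for p in carriers if not carriers[p] and is_case[p])
    ctrl_car = sum(1 for p in carriers if carriers[p] and not is_case[p])
    ctrl_non = sum(1 for p in carriers if not carriers[p] and not is_case[p])
    return (case_car * ctrl_non) / (case_non * ctrl_car)


carriers = collapse_to_carriers(variants, genotypes)
print(carriers)
print(odds_ratio(carriers, is_case))  # 3.0 on this toy data
```

Only v1 survives all three filters here, which shows how much the choice of mask and thresholds shapes the final carrier counts.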

Splice Region Variation in ABCA7

ABCA7 encodes ATP-binding cassette transporter A7, a member of the ABC transporter family, whose members move lipids across membranes. The SKAT result was primarily driven by a single variant, c.5570+5G>C. Without it, the test had a p-value of 0.46. If you’re familiar with the notation, then you know that c.5570+5 indicates a noncoding variant 5 bases into an intron. We call this the “splice region” and, unlike the canonical splice site (+/- 2bp), it’s not clear that variants here affect splicing.
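For readers less familiar with the notation, here is a small sketch that decodes a variant name of this form and sorts it into canonical splice site, splice region, or deeper intronic sequence. The +/- 2bp canonical window comes from the text above. The 8bp outer limit for the “splice region” is my assumption (some annotation tools use a cutoff like this), and the post doesn't say what the authors used. The parser only handles simple intronic substitutions.

```python
import re

# Matches simple intronic substitutions such as "c.5570+5G>C" or "c.100-2A>G".
HGVS_INTRONIC = re.compile(r"c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])")

CANONICAL_MAX_OFFSET = 2       # canonical splice site: +/- 2bp, as in the text
SPLICE_REGION_MAX_OFFSET = 8   # ASSUMPTION: outer edge of the "splice region"


def classify_intronic(hgvs: str) -> dict:
    """Decode an intronic substitution and label its distance from the exon."""
    match = HGVS_INTRONIC.fullmatch(hgvs)
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {hgvs}")
    anchor, sign, offset, ref, alt = match.groups()
    offset = int(offset)
    if offset <= CANONICAL_MAX_OFFSET:
        region = "canonical splice site"
    elif offset <= SPLICE_REGION_MAX_OFFSET:
        region = "splice region"
    else:
        region = "deep intronic"
    return {
        "exon_anchor": int(anchor),  # nearest coding position (c.5570)
        "side": "donor" if sign == "+" else "acceptor",
        "offset": offset,            # bases into the intron
        "ref": ref,
        "alt": alt,
        "region": region,
    }


print(classify_intronic("c.5570+5G>C"))
```

For the ABCA7 variant this reports a G-to-C change 5 bases into the intron on the donor side, which lands in the splice region rather than the canonical site.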

But the authors had another NGS tool to look at this: RNA-seq. When they looked at the transcript sequences of c.5570+5G>C carriers, they found that the transcripts retained an intron, which eventually introduced a stop codon.

Intron retention in carriers (Steinberg et al, Nat. Gen. 2015, Fig S1)

The image here is from Supplemental Figure 1 (the main text had no figures) and shows the intron retention in c.5570+5G>C carriers. Side note: according to the legend, the coordinates are on NCBI build 36, which is practically a crime. But moving on, the RNA-seq results justified including the variant in the loss-of-function test (mask #1), which then yielded a p-value of 5.3e-10 with an odds ratio of 1.97.

Follow-up and Replication of Association

With a possible causal variant in hand, the authors next examined the long-range haplotypes to see if this variant was on the same background as rs4147929, the common variant previously associated with AD by GWAS. It was never on the same allele, which is a fascinating result; the common variant signal and this rare variant association appear to be independent. It’s possible, therefore, that the mechanisms are different as well.

To replicate the association, the authors genotyped ABCA7 loss-of-function variants in study groups from Europe and the United States, finding a p-value of 0.0056 with OR of 1.73. When combined with the Icelandic data by meta-analysis, the OR was 2.03 and the p-value 6.8e-15.

What’s Next for AD and Common Disease

ABCA7 certainly merits future studies, both in the genetics realm and in the laboratory for functional evaluation. It’s strongly expressed in the brain, where it promotes the efflux of phospholipids and cholesterol to apoA-I and apoE. But the ortholog of ABCA7 in C. elegans and results from mouse models suggest that regulation of phagocytosis might be the primary function of the gene. The authors tested for correlation between variants in ABCA7 and two disease-associated alleles (in APOE and TREM2), but found none. Thus, the mechanism by which ABCA7 loss-of-function confers susceptibility to AD will need further investigation.

Still, it’s a promising start to detangling the etiology of a complex human disease, and a demonstration of the power of genome sequencing to uncover promising new leads.

Steinberg S, Stefansson H, Jonsson T, Johannsdottir H, Ingason A, Helgason H, Sulem P, Magnusson OT, Gudjonsson SA, Unnsteinsdottir U, Kong A, Helisalmi S, Soininen H, Lah JJ, DemGene, Aarsland D, Fladby T, Ulstein ID, Djurovic S, Sando SB, White LR, Knudsen GP, Westlye LT, Selbæk G, Giegling I, Hampel H, Hiltunen M, Levey AI, Andreassen OA, Rujescu D, Jonsson PV, Bjornsson S, Snaedal J, & Stefansson K (2015). Loss-of-function variants in ABCA7 confer risk of Alzheimer’s disease. Nature genetics PMID: 25807283

Science Fiction: Going Viral

The rapid advance of next-generation sequencing technologies, particularly in the last several years, has almost seemed like something out of a science fiction novel. Think about it: on a HiSeq X Ten instrument, we can sequence a complete human genome in less than a week, at a cost that’s less than 0.0001% of the roughly $3 billion it took to fund the Human Genome Project.

It might surprise you to learn that, in addition to my blog posts here and the grant/paper writing I do for my job, I dabble in science fiction writing as well. If you think that scientific publication/success is hard (10% acceptance rate for top-tier journals, or 8% NIH funding level), you should look into the fiction side of publishing sometime.

The acceptance rate for most professional science fiction magazines (for short fiction) is generally below 1%. The pay is usually $0.05-$0.10 per word, meaning that a 4,000 word story might bring $200-400 in the (unlikely) event that you get it professionally published. The odds of landing a literary agent — which is required, if you want to have your novel shopped to most traditional publishing houses — are about 1 in 1,000.

A few months ago, Third Flatiron Publishing (which does quarterly science fiction anthologies) announced that their Spring 2015 anthology would be themed around world-altering events. As it happened, I’d written a science fiction story that seemed like it might fit — it was about a couple of researchers working in a dusty lab who stumble upon a universal cure for cancer (you remember I said science fiction, right?), and their struggle to make it available to the world.

The Time It Happened

I’m thrilled to say that the editors at Third Flatiron liked my story enough to choose it for their anthology The Time It Happened, which just came out and is available on Amazon in both Kindle and paperback versions. They’ve also bought audio rights, and intend to create a free podcast of my story (as well as a couple of others) sometime in the near future.

Since you readers enjoyed the non-fiction I write for MassGenomics, hopefully you’ll enjoy this as well.