Integrating copy number and gene expression data in breast cancer

A study in Nature reports the genomic and transcriptomic architecture of breast cancer from a survey of ~2,000 tumors. These samples were collected in Canada and the UK; what makes the collection particularly valuable is that they were fresh-frozen and clinically annotated, with long-term follow-up. Patients whose tumors were ER-negative and/or lymph-node-positive (LN-positive) had received systematic chemotherapy, whereas ER-positive or LN-negative patients had not. None of the patients with Her2+ tumors received Herceptin (trastuzumab). Thus, the tumors were clinically homogeneous within subgroups, making this a great resource for studying the genomic landscape of breast cancer.

Breast Cancer Subtypes

A quick overview of breast cancer subtypes seems appropriate here. Most breast cancers are carcinomas, meaning that they arise from epithelial cells. A histology review typically classifies these as originating from the milk ducts (ductal) or the milk-producing glands (lobular) of the breast. Tumors can also be assigned to subgroups on the basis of gene expression: a 50-gene assay called PAM50 is widely used to classify tumors as one of 4-5 “intrinsic” subtypes. Among the most important genes from a clinical perspective are those encoding the estrogen receptor (ER), the progesterone receptor (PR), and the Her2 (ERBB2) receptor. The four most common intrinsic subtypes are:

| Subtype | Typical ER/PR/Her2 status | Prevalence | Notes |
|---|---|---|---|
| Luminal A | ER+ and/or PR+, Her2- | 42-59% | Most common and best prognosis |
| Luminal B | ER+ and/or PR+, Her2+ | 6-19% | Slightly worse prognosis |
| Her2-enriched | ER-, PR-, Her2+ | 14-20% | Often poor prognosis |
| Basal-like/Triple-negative | ER-, PR-, Her2- | 7-12% | Often aggressive, poorer prognosis |

Source: Susan G. Komen Foundation

There is substantial but incomplete overlap between basal-like and triple-negative breast cancer. The genetic basis of these tumors is not as well understood as that of the other subtypes, and they typically don’t respond to hormone or Her2-targeted therapies because they don’t express ER, PR, or Her2.
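As a rough illustration of how the table reads, the receptor columns can be turned into a simple lookup. This is only a sketch: the function name `typical_subtype` is hypothetical, and real intrinsic subtyping (PAM50, for example) is based on gene expression rather than on three receptor calls, so the result is at best a proxy.

```python
def typical_subtype(er: bool, pr: bool, her2: bool) -> str:
    """Return the subtype that a receptor profile typically matches in the table.

    Hypothetical helper for illustration only: it mirrors the table rows and
    is not a substitute for expression-based (e.g. PAM50) classification.
    """
    hormone_positive = er or pr  # "ER+ and/or PR+" in the table
    if hormone_positive and not her2:
        return "Luminal A"
    if hormone_positive and her2:
        return "Luminal B"
    if her2:  # ER-, PR-, Her2+
        return "Her2-enriched"
    return "Basal-like/Triple-negative"  # ER-, PR-, Her2-


if __name__ == "__main__":
    for profile in [(True, True, False), (True, False, True),
                    (False, False, True), (False, False, False)]:
        print(profile, "->", typical_subtype(*profile))
```

Because the overlap between receptor status and intrinsic subtype is incomplete, a lookup like this will misassign some tumors.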

Integrating SNP and Copy Number Data with Gene Expression

In this study, the authors assessed the impact of SNPs, inherited copy number variants (CNVs), and acquired copy number alterations (CNAs) on the gene expression landscape. With the statistical power of 2,000 samples (half in a discovery set, half in a validation set), they were able to search for both cis-regulatory (variants affecting nearby genes) and trans-regulatory (variants affecting distant genes) relationships. Genome-wide analysis of variance (ANOVA) revealed that germline SNPs/CNVs and somatic CNAs influenced >39% of gene expression probes, roughly half acting in cis and half in trans.

  • Somatic CNAs dominated the regulatory picture, contributing to >96% of significant expression associations.
  • On a gene-by-gene basis, however, germline SNPs rivaled CNAs in the proportion of expression variation they explained.
  • The contribution of inherited CNVs was minimal by comparison.

Although the dominating influence of somatic CNAs is understandable, the relatively small contribution of CNVs to the expression picture is rather surprising. It’s possible that inherited regions of CNV with strong influence on gene expression are targeted for amplification/deletion by cancer cells, which might obscure their effect in an otherwise normal cell. Otherwise, it does seem to suggest that germline SNPs have a greater influence than CNVs when it comes to modulating gene expression.
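To make the ANOVA idea concrete, here is a minimal sketch of a one-way ANOVA for a single gene: expression values are grouped by the copy number state of the locus, and the F statistic compares between-group to within-group variance. The function name, the state labels ("loss", "neutral", "gain"), and the toy values are assumptions for illustration; the post does not describe the authors' actual model or covariates.

```python
from collections import defaultdict


def one_way_anova_f(expression, copy_state):
    """Return (F, df_between, df_within) for expression grouped by copy state.

    Illustrative sketch only: a plain one-way ANOVA on one gene, not the
    genome-wide procedure used in the study.
    """
    groups = defaultdict(list)
    for value, state in zip(expression, copy_state):
        groups[state].append(value)

    n, k = len(expression), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(expression) / n
    means = {state: sum(vals) / len(vals) for state, vals in groups.items()}

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(vals) * (means[s] - grand_mean) ** 2
                     for s, vals in groups.items())
    # Variation of samples around their own group mean.
    ss_within = sum((v - means[s]) ** 2
                    for s, vals in groups.items() for v in vals)

    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy data: expression rises with copy number state.
    expr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    state = ["loss"] * 3 + ["neutral"] * 3 + ["gain"] * 3
    print(one_way_anova_f(expr, state))
```

A large F suggests that copy number state explains much of the gene's expression variance. Turning F into a p-value requires the F distribution, and a genome-wide scan would also need multiple-testing correction.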

Cis versus Trans Regulation

Roughly 20% of loci examined exhibited cis-regulatory associations between somatic CNAs and gene expression. In other words, acquired copy number alterations influence the expression of genes within them or nearby. The authors undertook a higher-resolution survey of these associations within tumor subtypes, finding known driver events, such as amplifications of MYC, CCND1, MDM2, ERBB2, and CCNE1 and deletion of PTEN, as well as putative but suggestive events involving MDM1, MDM4, CDK3, CDK4, PI4KB, NCOR1, and others. They also highlight three apparently novel cis-regulatory associations that may influence breast cancer development and progression:

  1. Loss of PPP2R2A, a regulatory subunit of a complex that governs mitotic exit. Somatic mutations in another subunit of the same complex (PPP2R1A) were recently identified in clear cell ovarian cancers and endometrioid cancers.
  2. Frequent deletion of MTAP that co-occurs with deletion of known tumor suppressors CDKN2A and CDKN2B.
  3. Recurrent deletion of MAP2K4 concomitant with outlying expression in ER-positive cases.

To examine trans-regulatory events, the authors plotted matrices of CNA-expression relationships by chromosome (gene location on the Y-axis, CNA location on the X-axis). Visualized in this manner, cis-acting events, in which a CNA influences a gene on the same chromosome, fall along the diagonal, so any pattern off the diagonal indicates a trans-acting event. There was strong evidence of such patterns on chromosomes 1q, 7p, 8, 11q, 14q, 16, 17q, and 20q, all of which are the targets of frequent large-scale copy number alteration in breast cancer.
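The diagonal versus off-diagonal reading of such a matrix can be sketched in a few lines. Everything here is illustrative: the function names, the `cis_window` cutoff of 3 Mb, and the coordinates in the example are assumptions, not values from the study.

```python
from collections import Counter


def classify_association(cna_chrom, cna_pos, gene_chrom, gene_pos,
                         cis_window=3_000_000):
    """Label one CNA-expression association as 'cis' or 'trans'.

    cis_window (in base pairs) is an assumed cutoff for "nearby"; the post
    does not state the threshold the authors used.
    """
    same_chrom = cna_chrom == gene_chrom
    if same_chrom and abs(cna_pos - gene_pos) <= cis_window:
        return "cis"    # on or near the diagonal of the matrix
    return "trans"      # off the diagonal


def trans_hotspots(associations, cis_window=3_000_000):
    """Count trans associations per CNA chromosome, most frequent first."""
    counts = Counter(
        cna_chrom
        for cna_chrom, cna_pos, gene_chrom, gene_pos in associations
        if classify_association(cna_chrom, cna_pos, gene_chrom, gene_pos,
                                cis_window) == "trans"
    )
    return counts.most_common()


if __name__ == "__main__":
    # Made-up (CNA chrom, CNA pos, gene chrom, gene pos) tuples.
    example = [
        ("17", 37_800_000, "17", 37_850_000),
        ("17", 37_800_000, "3", 1_000_000),
        ("8", 128_700_000, "12", 5_000_000),
        ("17", 37_800_000, "5", 2_000_000),
    ]
    print(trans_hotspots(example))
```

CNA locations that accumulate many trans associations are the candidates for the "hotspots" discussed next.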

The “hotspots” of these trans associations, when grouped by pathway, highlight known targets of dysregulation in breast cancer such as ERBB2 and MYC. You might notice that these two were also in the list of cis-regulatory associations above, and make the intuitive leap to conclude that amplifications targeting ERBB2 (on chr17) and MYC (on chr8) increase the expression of these genes, which in turn drives expression changes for genes elsewhere in the genome.

Integrative Clustering Reveals Novel Subgroups

The authors next took 997 tumors in the discovery set, integrated copy number and gene expression data, and performed clustering analyses to identify subgroups of tumors with distinct features and clinical outcomes. They came up with 10 “integrative clusters”, which they replicated in the validation set (995 cases). Among these clusters are some interesting subsets:

  • A high-risk, ER-positive subgroup with a steep mortality trajectory (bad), composed of 11q13/14 cis-acting luminal tumors that harbor other common alterations. The authors note that 11q13 contains the CCND1 gene, frequently targeted for amplification in breast cancer. This is an important exception to the often favorable prognosis for ER+ tumors.
  • A subgroup of predominantly luminal A cases with low genomic instability that was enriched for histology types with good prognoses (e.g. lobular and tubular carcinomas).
  • Another subgroup with favorable prognosis, but containing a mixture of ER statuses and subtypes. Their common feature was a nearly flat copy number landscape. The authors note that this “CNA-devoid” subgroup is “ripe for mutational profiling.”
  • A stable, mostly high-genomic-instability subgroup that comprised nearly all basal-like tumors and showed good long-term outcomes.
  • A group of Her2-enriched and ER-positive tumors with ERBB2 amplification. These patients were all enrolled before Herceptin (trastuzumab) became available, and had the worst disease-specific survival.
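The general idea of clustering on integrated data can be sketched as follows. This post does not describe the authors' clustering method, so the example swaps in something much simpler: it standardizes copy number and expression features, concatenates them per tumor, and runs plain k-means with a deterministic start. All function names and the toy values are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and unit variance."""
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def kmeans(points, k, n_iter=50):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def integrative_clusters(copy_number, expression, k):
    """Cluster tumors on concatenated, standardized CN + expression features.

    A stand-in for illustration, not the method used in the study.
    """
    cn = zscore_columns(copy_number)
    ex = zscore_columns(expression)
    joint = [c + e for c, e in zip(cn, ex)]
    return kmeans(joint, k)


if __name__ == "__main__":
    # Four toy tumors, one copy number feature and one expression feature each.
    copy_number = [[0.0], [0.1], [2.0], [2.1]]
    expression = [[1.0], [1.2], [5.0], [5.1]]
    print(integrative_clusters(copy_number, expression, k=2))
```

Standardizing before concatenation keeps either data type from dominating the distance purely because of its scale. A real analysis would also need a principled way to choose the number of clusters and to check their stability in an independent set, as the authors did with their validation cohort.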

These findings demonstrate how useful it is to construct a cohort, not just of many cases, but with long-term follow-up so that researchers can link the genomic architecture of tumors to the eventual death or survival of the patients.
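Comparing outcomes between subgroups in a cohort like this usually starts with a survival curve per group. Below is a minimal sketch of the standard Kaplan-Meier estimator; the post does not say which survival methods the authors used, and the function name and toy follow-up times are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    times  : follow-up time for each patient
    events : True if death was observed at that time, False if censored

    Illustrative sketch of the Kaplan-Meier product-limit estimator.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


if __name__ == "__main__":
    # Toy follow-up in years; False marks a patient censored (still alive).
    print(kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]))
```

Running this once per subgroup gives curves that can be compared directly, which is only possible because the cohort has long-term follow-up.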


Curtis, C., Shah, S., Chin, S., et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. DOI: 10.1038/nature10983

