Prostate cancer exomes, and sequencing matched normals

November 29, 2011 by Dan Koboldt

A new study in PNAS from Jay Shendure’s group at the University of Washington describes exome sequencing of 23 prostate cancers. These tumors were derived from aggressive primary tumors or lethal metastases, and propagated in immunocompromised mice as xenografts. For most of the tumors, matched normal DNA was unavailable, so the authors developed a filtering strategy in which the growing catalogs of human sequence variation are employed to identify and remove germline polymorphisms from the lists of tumor genetic variants. Specifically, the authors used pilot project data from the 1,000 Genomes Project, and internally-available variants from ~2,000 additional exomes they’d sequenced. For the majority of tumors, this reduced ~13,500 coding SNVs down to ~350 “nov-SNVs” per tumor (a reduction of 97.4%). The authors readily admit that these nov-SNVs comprise a mixture of:

Somatic mutations that were present in the original tumor.
Somatic mutations that occurred during tumor propagation and evolution in the mouse model.
Germline variants present in the patient’s constitutional genome that are absent from public databases, presumably due to rarity (e.g. private SNPs).
False-positive variant calls.
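At its core, the germline filter is a set difference: remove every tumor call that appears in the union of the known-variant catalogs. Here's a minimal Python sketch. It assumes each variant is keyed by chromosome, position, reference allele and alternate allele. The `Variant` type and `filter_known_germline` function are hypothetical illustrations, not the authors' pipeline:

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based genomic position
    ref: str
    alt: str


def filter_known_germline(tumor_calls: Iterable[Variant],
                          *catalogs: set[Variant]) -> list[Variant]:
    """Keep only tumor calls absent from every germline catalog (the "nov-SNVs")."""
    known: set[Variant] = set().union(*catalogs)
    return [v for v in tumor_calls if v not in known]


# Hypothetical data: one catalogued polymorphism and one novel call.
public_catalog = {Variant("1", 100, "A", "G")}
in_house_exomes = {Variant("2", 500, "T", "C")}
tumor = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T")]
print(filter_known_germline(tumor, public_catalog, in_house_exomes))
# [Variant(chrom='1', pos=200, ref='C', alt='T')]
```

Exact matching cuts both ways. A somatic mutation that happens to match a catalog entry gets discarded along with the true polymorphisms. A private germline variant that no catalog contains passes straight through.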

Recurrently Altered Gene Filtering

Given a set of mutations from multiple tumors of the same type, the logical next step was to look for genes recurrently altered in the group. Recurrence offers perhaps the best evidence that a gene harbors "driver" mutations, which confer advantages for tumor growth and progression, as opposed to "passenger" mutations, which do not. The problems for this study were twofold. First, 16 unique tumors (from unrelated individuals) is a small cohort with correspondingly low power to identify recurrent alterations. Nothing could be done about that. Second, even among just these 16 tumors, 135 genes harbored non-synonymous nov-SNVs in two or more exomes. A substantial fraction of these are undoubtedly due to rare germline variants missed by the filter, rather than to recurrent somatic mutation.

To address this, the authors excluded from consideration the 1% of all genes (not just those mutated in this study) with the highest rate of rare germline variants in control exomes. In other words, they removed the most polymorphic genes. I suspect this set includes (1) genes with high genetic diversity, and (2) genes whose sequence characteristics make them prone to false-positive variant calls. The danger of this strategy is that genes with high genetic diversity may also be more prone to somatic mutation, and it's quite possible that some of them are driver genes in carcinogenesis. Nevertheless, this step reduced the list to 104 genes altered in two or more exomes. That's still too many to tell a story about, so the authors took another step.
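The gene-level exclusion can be sketched the same way. The post doesn't say how the germline variant rate was normalized. This version assumes rare germline variants per coding base in the control exomes, and rounds the 1% cutoff up to a whole number of genes. `exclude_most_polymorphic` and its inputs are hypothetical:

```python
import math


def exclude_most_polymorphic(rare_germline_counts: dict[str, int],
                             coding_lengths: dict[str, int],
                             fraction: float = 0.01) -> set[str]:
    """Return the genes to drop: the top `fraction` of all genes ranked by
    rare germline variants per coding base in control exomes (assumed metric)."""
    rates = {gene: rare_germline_counts.get(gene, 0) / length
             for gene, length in coding_lengths.items()}
    n_drop = math.ceil(len(rates) * fraction)  # assumption: round the cutoff up
    ranked = sorted(rates, key=rates.get, reverse=True)
    return set(ranked[:n_drop])


# Candidate genes recurrently hit in the tumors are then screened against this set:
# remaining = recurrent_genes - exclude_most_polymorphic(counts, lengths)
```

The ranking is computed over all genes, not just the ones mutated in the tumors, so the cutoff doesn't depend on the study's own mutation calls.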

Using a control set of 1,865 exomes, the authors performed iterative sampling (I believe this is a bootstrap) to estimate the probability that a given gene would harbor recurrent nov-SNVs due to germline variation alone. Genes with a germline recurrence probability of 0.001 or higher were excluded, which sharply reduced the list to 20 genes with nov-SNVs in two or more prostate tumors (10 of these in three or more).
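The sampling step might look something like the following sketch, which rests on assumptions. It repeatedly draws a control cohort the same size as the tumor cohort (16 exomes) without replacement. It then counts how often the gene shows germline hits in at least two of the sampled exomes. A true bootstrap would resample with replacement, and the authors' exact scheme may differ. The `carriers` input is hypothetical: it holds the indices of control exomes with a rare, non-synonymous variant in the gene that survives the same germline filter.

```python
import random


def germline_recurrence_probability(carriers: set[int], n_controls: int,
                                    cohort_size: int = 16, min_hits: int = 2,
                                    n_iter: int = 10_000, seed: int = 0) -> float:
    """Estimate P(a random control cohort shows >= min_hits carriers of the gene)."""
    rng = random.Random(seed)            # seeded for reproducibility
    controls = range(n_controls)
    recurrent_draws = 0
    for _ in range(n_iter):
        cohort = rng.sample(controls, cohort_size)   # sampling without replacement
        hits = sum(1 for idx in cohort if idx in carriers)
        if hits >= min_hits:
            recurrent_draws += 1
    return recurrent_draws / n_iter


# A gene would then be dropped if its estimate is >= 0.001, e.g.
# germline_recurrence_probability(carriers_of_gene, n_controls=1865) >= 0.001
```

With 10,000 iterations, the estimate can only resolve probabilities down to 0.0001. A threshold as small as 0.001 therefore needs plenty of draws to be meaningful.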

After all of these steps were taken, the top recurrent gene was TP53, which was altered in 5 of 16 tumors (31.25%). No other gene had as many recurrent hits in the study. This is a vote of confidence for the approach, because TP53 is one of the most frequently perturbed genes in many solid tumor types, including breast and ovarian cancers. Another believable recurrent gene was GPC6, which encodes a cell surface proteoglycan believed to act as a receptor for growth factors and other signaling molecules. Other recurrent genes highlighted in this study (DLK2 and SDF4) are less convincing. The simple fact is that we don't know for certain which mutations are truly somatic in the primary tumor, so it's difficult to draw strong conclusions.

Direct Comparison with Matched Normals

A few of the tumors did have matched normal tissue available, and the authors examined these in detail to assess the accuracy of their germline filtering approach. For three tumors, the authors had (1) mouse xenograft tumor tissue, (2) tumor tissue taken from the patient prior to metastasis, and (3) matched normal tissue. They applied exome sequencing to these samples to determine the set of true somatic mutations (valid mutations) in the original tumor exomes. They then compared the valid mutations with the xenograft's predicted nov-SNVs to obtain four measures:

the number of valid mutations detected (valid detected)
the number missed (valid missed)
the fraction detected (sensitivity)
the proportion of nov-SNVs that were actually false positives (either germline variants or miscalls)

Tumor ID	nov-SNVs	Valid Mutations	Valid Detected	Valid Missed	Sensitivity	False Positives
LuCap 92	193	56	51	5	91.07%	73.58%
LuCap 145.2	281	122	106	16	86.89%	62.28%
LuCap 147*	2,122	2,045	1,823	222	89.14%	14.09%

Note that LuCap 92 was the only case where the sequenced patient tissue was the same tumor used to make the xenograft. The other two (LuCap 145.2 and LuCap 147) were neighboring metastases, presumably closely related to the xenografted tumor. Exome sequencing and germline filtering of the xenograft enabled detection of ~89% of valid somatic mutations across all three cases. This is worrisome, because it means that 11% of valid somatic mutations were removed by the germline filtering strategy. More on that later. Perhaps even more troubling is the inferred false-positive rate (the fraction of nov-SNVs that are not valid somatic mutations in the tumor), which averaged ~68% for LuCap 92 and LuCap 145.2.
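The sensitivity and false-positive columns follow directly from the counts. This short Python check reproduces the table's percentages and the pooled ~89% sensitivity. The function name is mine, not the paper's:

```python
def filter_performance(nov_snvs: int, valid: int, detected: int) -> tuple[float, float]:
    """Return (sensitivity, false-positive fraction) for one tumor."""
    sensitivity = detected / valid                 # valid detected / valid mutations
    false_pos = (nov_snvs - detected) / nov_snvs   # nov-SNVs that are not valid somatic
    return sensitivity, false_pos


table = {                 # (nov-SNVs, valid mutations, valid detected)
    "LuCap92": (193, 56, 51),
    "LuCap145.2": (281, 122, 106),
    "LuCap147": (2122, 2045, 1823),
}
for tumor, row in table.items():
    sens, fp = filter_performance(*row)
    print(f"{tumor}: sensitivity {sens:.2%}, false positives {fp:.2%}")

# Pooled sensitivity across all three tumors
detected = sum(row[2] for row in table.values())
valid = sum(row[1] for row in table.values())
print(f"pooled sensitivity {detected / valid:.2%}")
# LuCap92: sensitivity 91.07%, false positives 73.58%
# LuCap145.2: sensitivity 86.89%, false positives 62.28%
# LuCap147: sensitivity 89.14%, false positives 14.09%
# pooled sensitivity 89.07%
```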

LuCap 147 is notable in that it was one of three "hypermutated" prostate cancer tumors, with roughly 10-fold the number of nov-SNVs. It also had a lower false-positive rate because there were so many valid somatic mutations to detect. There was no distinctive feature to explain the high number of mutations in the hypermutated tumors, though the pattern suggests an acquired defect in the DNA repair machinery. Because only ~15% of tumors had this hypermutated phenotype, its low false-positive rate is an outlier. For most tumors, two-thirds of the nov-SNVs obtained by the filtering approach are not valid somatic mutations.

Reasons to Always Sequence the Matched Normal

I have heard it said that sometime in the near future, our catalogs of human genetic variation will be complete enough that we won’t need to sequence matched normal tissue when studying cancer samples. The authors of this study claim that their results give credence to that notion. I respectfully disagree. True, the germline filtering strategy provided a 150-fold enrichment for valid somatic mutations. However, more than half of the final set of nov-SNVs were false positives (not somatic), and 11% of valid somatic mutations were inadvertently removed. I give you, then, my reasons why I believe we should always sequence the matched normal:

Public databases are not as good as you think. In this study, curated catalogs of sequence variants from known sources (the authors themselves, and the 1,000 Genomes Project) overlapped with 11% of valid somatic mutations, causing their removal. A filter based on the latest dbSNP is even more dangerous because, as some of us have recently discovered, dbSNP contains a lot of somatic (not inherited) mutations. This is because certain cancer projects have submitted their somatic mutation callsets to dbSNP, and these have been accepted. Also, given the low barrier to entry, one should be aware that a lot of dbSNP entries are experimental false positives. Both of these can overlap with mutations in a tumor genome and cause them to be dismissed as germline variants.
Non-SNV alterations are not amenable to filtering. Tumor genomes acquire insertions, deletions, structural variants, and copy number alterations, some of which may activate oncogenes or disrupt tumor suppressors. Let’s be honest: the databases of non-SNV variants in germline form are woefully incomplete. Unlike SNVs, the coordinates and alleles of larger variants are ambiguous, which makes comparisons to existing variant catalogs very difficult. There are also other types of genetic changes in a tumor, such as loss of heterozygosity (LOH), that will be missed when you don’t know the normal genotype.
True somatic mutations are exceptionally rare compared to germline variants. Inherited sequence variants occur at a rate of one per 500-1000 base pairs. In contrast, for most tumors, somatic mutations occur at a rate of one per million base pairs. Let's say you have 20,000 coding variants in a tumor and 98% of those are in dbSNP. That leaves 400 private SNPs that filtering won't remove, whereas most solid tumors harbor fewer than 100 somatic coding mutations. In this realistic scenario, only one out of every five post-filtered variants is a somatic mutation.
Sequencing is cheap, but mistakes are not. Not long ago, you could argue that sequencing matched normals was too costly to be done systematically, even if they were available. That’s no longer the case. A single HiSeq lane gives you enough sequence for two exomes. Why not eliminate the largest source of false-positive mutations – the constitutional genome – by sequencing it as well? It will give you better predictions, and if you go on to validate candidate mutations (as you certainly should), it will probably end up saving you money. Trust me, it’s far better to sequence tumor-normal pairs together, at the same time, same exome platform, ideally same instrument run, to minimize batch effects between them.
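The arithmetic behind the "one in five" scenario can be made explicit. This sketch makes the same simplifying assumptions as the scenario. Every germline variant missing from dbSNP survives the filter, and no somatic mutation is in dbSNP:

```python
def somatic_fraction_after_filter(germline_variants: int, frac_in_db: float,
                                  somatic: int) -> float:
    """Fraction of post-filter calls that are somatic, assuming all germline
    variants absent from the database survive and all somatic calls survive."""
    private_germline = round(germline_variants * (1 - frac_in_db))
    return somatic / (private_germline + somatic)


print(somatic_fraction_after_filter(20_000, 0.98, 100))  # 100 / (400 + 100) = 0.2
```

Even a very good database leaves the private germline variants outnumbering the somatic mutations several-fold. Only the matched normal removes them directly.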

Availability of Matched Normals

Of course, sequencing a matched normal sample requires that such material is available. I recognize that this is not always the case. Some of the better-studied cancer cell lines, for example, were made from the tumors of long-dead cancer patients. For less common cancer types, many of the available samples will be frozen or FFPE samples, and getting a matched normal won't be possible. However, if matched normal tissue is available, I'd argue that it should be sequenced under the same protocols as the tumor sample. And when you find those germline variants, don't forget to submit them to dbSNP.

References

Kumar A, White TA, MacKenzie AP, Clegg N, Lee C, Dumpit RF, Coleman I, Ng SB, Salipante SJ, Rieder MJ, Nickerson DA, Corey E, Lange PH, Morrissey C, Vessella RL, Nelson PS, & Shendure J (2011). Exome sequencing identifies a spectrum of mutation frequencies in advanced and lethal prostate cancers. Proceedings of the National Academy of Sciences of the United States of America, 108 (41), 17087-92 PMID: 21949389

The Great Divide: Cancer Genomics and Clinical Care

November 23, 2011 by Dan Koboldt

Fueled by advances in next-generation sequencing and consortium-scale efforts, the field of cancer genomics is maturing at a rapid pace. As the catalog of genetic lesions in cancer expands across samples and tumor types, we are learning more and more about the DNA sequence changes underlying tumor development, growth, progression, and response to treatment. One would think that these advances would quickly translate into better diagnosis and treatment of the disease. If only it were so. Despite their potential to improve patient management and care, the findings of cancer genomics efforts have been slow to reach the clinic.

One Step Forward: Limited Genetic Testing

There has been some progress. Major cancer centers like Johns Hopkins University have announced that they'll begin applying standard genetic tests to every cancer patient who comes through the door. The test is limited to a handful of common, clinically actionable mutations. Even so, it provides some genetic information that could guide prognosis and treatment.

Genes Currently Tested
ALK, BRAF, CHIC2, CSF1R, CTNNB1, DNMT3A, EGFR, FLT3, IDH1, IDH2, JAK2, KIT, KRAS, MET, MAPK1 (ERK), MAPK2 (MEK), MLL, NPM1, NRAS, PDGFRA, PIK3CA, PTEN, PTPN11, RET, RUNX1, TP53, WT1

It is good to see a general acknowledgement that genomic information is relevant for cancer patient management. These clinical testing panels also offer some important advantages:

First and foremost, these are well-established cancer genes that offer relevant diagnostic, prognostic, and treatment-related information.
Second, the limited scope allows the technical assays to be perfected, ensures completeness, and keeps interpretation of the findings manageable.
Third, mutations in these genes are recurrent across a number of tumor types. This standard test can therefore be given to any cancer patient, with a good chance of finding something actionable.
Finally, the use of sequencing instead of a genotyping platform makes it feasible to detect rare, occult mutations without knowing the position and variant allele beforehand.

Limitations of Focused Testing Panels

Of course, for those of us in the sequencing world, it's hard not to see the disadvantages. The use of sequencing and FISH will improve the sensitivity of the assay, but some types of alterations, such as structural variants (SVs), will still be missed. Case in point: in a study published in JAMA this year, Welch and colleagues used whole-genome sequencing to identify a cryptic fusion oncogene (bcr3 PML-RARA) in a patient with acute promyelocytic leukemia (APL). This discovery qualified the patient for treatment with all-trans retinoic acid (ATRA), which induced cancer remission and saved her life.

There are currently 468 known, curated cancer genes according to the Cancer Gene Census, and somatic mutations have been reported in hundreds (if not thousands) of others. Large-scale sequencing efforts are revealing that a single tumor may harbor anywhere from ten to 1,000 mutations in coding genes. Yes, only a fraction of these are likely to be driver events, and some of those will occur in genes already covered by these panels. Even so, we know other important cancer genes are out there. Once they're discovered and validated, they may have clinical relevance. Sure, they can be added to the panel, but that doesn't help patients who were already tested.

Why Not Exome or Whole-genome Sequencing?

Some have argued that whole-genome sequencing is too expensive for use as a diagnostic tool. This is no longer a valid excuse: thanks to plummeting sequencing costs, a genome can now be sequenced for less than $10,000. That seems like a lot until you consider what surgery, radiation, chemotherapy, and other state-of-the-art treatments cost. Why not apply whole-genome, or at least whole-exome, sequencing to every tumor that comes through the door? Doing so would offer a number of advantages:

For the patient, it would provide a catalog of their tumor's somatic mutations that could be stored and consulted as new cancer genes are discovered.
For the clinician, it would provide a new avenue of investigation when all other treatment strategies have failed. A guided shot in the dark is better than no shot in the dark at all.
For other patients, this information might be valuable. Imagine being able to say: here's the list of mutations your tumor harbors; here are the ones we've seen before, and here's how those patients responded to the treatment you're about to receive.
For researchers, standard-of-care clinical tumor sequencing could contribute substantially to our catalog of somatic mutations. It would allow new recurrent genes to be found and new clinical correlations to be identified.

I applaud the efforts of Johns Hopkins, Washington University, and other major centers to incorporate genetic testing into cancer care. This is an important practical step as well as a symbolic one: it acknowledges that genomic information has clinical consequences that should be used in patient care. At the same time, I say it’s not enough. We should continue to push until more comprehensive genome sequencing is the standard of care in cancer diagnosis and treatment.

Major Exome Platforms Compared

October 26, 2011 by Dan Koboldt

A few recent studies have sought to compare commercial exome sequencing technologies. These kits, which selectively target coding regions for next-generation sequencing, have matured rapidly over the past couple of years. I like the recent study out of Michael Snyder's lab (Stanford) the best. In it, the authors compared three major exome platforms to each other and to whole-genome sequencing (35x), all for a single individual:

Agilent's SureSelect Human All Exon (50 Mbp)
Roche/Nimblegen's SeqCap EZ v2.0
Illumina's TruSeq Exome Enrichment

From Figure 1: Major exome platforms (Clark et al, Nat. Biotech., 2011)

Differences in Target Space

First off, a comparison of the declared exome targets for each platform.

Credit: Clark et al, Nat. Biotech. 2011

A large number of bases (29.45 Mbp), presumably the "meat" of the exome, are targeted by all three platforms. Individually, the platforms have 4-28 Mbp of unique target space. Agilent does better for Ensembl transcripts; Nimblegen has better coverage of miRNAs. These two platforms share more target space with each other than either does with the Illumina platform, primarily because Illumina also targets untranslated regions (UTRs). I can't decide whether this is an advantage. On one hand, it certainly appeals to investigators interested in variation in UTRs. On the other, that's a lot to sequence. Indeed, the authors note that 50 million 2×100 bp reads yield only 30x coverage on the Illumina platform, compared to 60x for Agilent and 68x for Nimblegen.
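The coverage trade-off follows from a simple idealized relationship. Mean depth is the usable sequence that lands on target, divided by target size. Here's a sketch, assuming uniform coverage. The on-target fraction and target sizes in the example are illustrative round numbers, not values from the paper:

```python
def mean_target_depth(n_reads: float, read_len: int, on_target_frac: float,
                      target_bp: float, dup_frac: float = 0.0) -> float:
    """Idealized mean depth over a capture target.

    n_reads: total reads sequenced (count each mate of a pair separately).
    on_target_frac: fraction of non-duplicate bases that fall on the target.
    dup_frac: fraction of reads discarded as PCR duplicates.
    """
    usable_bases = n_reads * read_len * (1.0 - dup_frac) * on_target_frac
    return usable_bases / target_bp


# Same sequencing effort and capture efficiency, two hypothetical target sizes:
for target_mbp in (50, 62):
    depth = mean_target_depth(50e6, 100, on_target_frac=0.6, target_bp=target_mbp * 1e6)
    print(f"{target_mbp} Mbp target: ~{depth:.0f}x")
# 50 Mbp target: ~60x
# 62 Mbp target: ~48x
```

With everything else held fixed, depth scales inversely with target size. Differences in capture efficiency widen or narrow the gap from there.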

Target Enrichment Efficiency and GC Content

The authors performed exome capture and sequencing on a single sample (a healthy volunteer of European descent) using all three exome kits. Each exome library got one lane of 2×100 bp reads on the Illumina HiSeq 2000 (11 to 18 Gbp per library). BWA mapped 99% of these to the reference sequence, and some 10-15% were PCR duplicates. Targeting efficiency was assessed by normalizing each exome to 80 million reads and measuring the fraction of targeted bases covered at 10x, 20x, and 30x. The authors wrote "At all read counts and depth cut-offs, the Nimblegen platform enriched a higher percentage of its targeted bases than the other two platforms." They attribute this efficiency to the higher-density, overlapping baits used by the Nimblegen platform.
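The 10x/20x/30x metric is simply the fraction of targeted positions whose depth meets each cutoff. Here's a minimal sketch. It assumes per-base depths over the target have already been extracted, for example from a pileup of the downsampled alignments:

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int],
                     thresholds: Sequence[int] = (10, 20, 30)) -> dict[int, float]:
    """Fraction of targeted bases with read depth >= each threshold.

    depths: one depth value per targeted base position.
    """
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


print(fraction_covered([0, 5, 10, 15, 20, 25, 30, 35, 40, 45]))
# {10: 0.8, 20: 0.6, 30: 0.4}
```

Normalizing every library to the same read count before computing this metric is what makes the comparison fair across platforms.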

Unsurprisingly, all platforms showed a marked reduction in coverage over high-GC and low-GC targets. In the low-GC range (20% to 40%), however, the Agilent platform showed only a slight decrease in read depth. The authors suggest several possible reasons: fewer PCR cycles, longer baits, and/or the RNA probes unique to this platform.
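To look for a GC effect like this in your own data, one simple approach is to bin capture targets by GC content and compare mean depth per bin. Here's a sketch, assuming you have each target's sequence and mean depth. The 10-percentage-point bin width is an arbitrary choice:

```python
from collections import defaultdict
from statistics import mean


def gc_percent(seq: str) -> int:
    """GC content of a sequence as a floored integer percentage."""
    s = seq.upper()
    return 100 * (s.count("G") + s.count("C")) // len(s)


def depth_by_gc_bin(targets: list[tuple[str, float]], bin_pct: int = 10) -> dict[int, float]:
    """Mean depth of targets grouped by GC bin (key = lower bin edge, in percent).

    targets: (target sequence, mean read depth over that target) pairs.
    """
    bins: dict[int, list[float]] = defaultdict(list)
    for seq, depth in targets:
        bins[gc_percent(seq) // bin_pct * bin_pct].append(depth)
    return {edge: mean(depths) for edge, depths in sorted(bins.items())}


print(depth_by_gc_bin([("GGCCAATT", 40.0), ("GCGCGCAT", 20.0), ("AATTAATG", 10.0)]))
# {10: 10.0, 50: 40.0, 70: 20.0}
```

Integer percentages keep the binning exact; dividing floating-point fractions by the bin width can push values like 0.3 into the wrong bin.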

Detection of Single Nucleotide Variants (SNVs) and Small Indels

Detection of small sequence variants, especially SNVs, is a major goal of exome sequencing. Using the normalized ~80 million-read sets, the authors called SNVs (using GATK) in each exome. All three platforms showed high concordance between SNV calls and high-density SNP array genotypes. The reference allele was slightly favored at heterozygous SNP positions (reference allele fraction 0.53-0.55), suggesting a slight mapping bias against variant-containing reads. However, there was no bias toward or against specific substitution types. For all platforms, the number of SNVs detected increased with read count, but not linearly: at 30 million reads, over 95% of SNVs were already detected. In shared regions, Nimblegen consistently captured the most SNVs and reached saturation with the fewest reads.
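The reference-bias figure is a ratio of read counts at heterozygous sites, where an unbiased pipeline should give about 0.5. Here's a sketch that pools read counts across sites. Whether the authors pooled counts or averaged per-site ratios is an assumption on my part:

```python
def reference_allele_fraction(het_sites: list[tuple[int, int]]) -> float:
    """Pooled fraction of reads supporting the reference allele at heterozygous sites.

    het_sites: (reference-supporting reads, alternate-supporting reads) per site.
    Values above 0.5 indicate reads carrying the variant allele are under-mapped.
    """
    ref_reads = sum(ref for ref, _ in het_sites)
    total_reads = sum(ref + alt for ref, alt in het_sites)
    return ref_reads / total_reads


print(reference_allele_fraction([(55, 45), (27, 23)]))  # 82 / 150, about 0.547
```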

Nimblegen also detected the most indels in shared and RefSeq regions, owing to more efficient capture and thus deeper coverage. At low read counts, Agilent detected more indels in shared regions, but at 50 million reads, Illumina surpassed Agilent (and, unsurprisingly, detected many more UTR indels). Most indels were 1 bp in size. The authors also saw a slight enrichment of indels in the 4 bp and 8 bp bins, consistent with human-primate genome comparisons. They observed the enrichment for multiples of three expected from selection against frameshift mutations as well.
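Indel lengths and the in-frame fraction can be tabulated directly from VCF-style alleles. This sketch assumes each indel's REF and ALT share a single anchor base, so the indel size is the difference in allele lengths. In coding sequence, sizes divisible by three preserve the reading frame:

```python
from collections import Counter


def indel_size_summary(indels: list[tuple[str, str]]) -> tuple[Counter, float]:
    """Summarize indels given as (REF, ALT) allele pairs sharing one anchor base.

    Returns (count of indels by length in bp, fraction whose length is a multiple of 3).
    """
    sizes = Counter(abs(len(alt) - len(ref)) for ref, alt in indels)
    n = sum(sizes.values())
    in_frame = sum(count for size, count in sizes.items() if size % 3 == 0)
    return sizes, in_frame / n


sizes, in_frame = indel_size_summary([("A", "AT"), ("ATTT", "A"), ("C", "CGGG"), ("G", "GAAAAAA")])
print(sizes, in_frame)  # sizes 1, 3, 3, 6 -> three of four are in-frame (0.75)
```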

Comparison with Whole-Genome Sequencing

A key strength of this study was that the authors also performed whole-genome sequencing to 35x mean coverage on the same sample. WGS data had 98.5% concordance at heterozygous SNP positions as detected by SNP array. To simulate the multiplexed sequencing of 3 or 6 exome libraries per lane (GAIIx or HiSeq, respectively), the authors normalized exome datasets to 50 million reads apiece. In each exome-WGS comparison, the WGS dataset was restricted to regions targeted by that exome product. This step seems necessary for an apples-to-apples comparison, but it downplays a key strength of WGS, which provides relatively unbiased coverage across all coding regions. In other words, this restriction slightly favors the exome dataset by examining only regions that its platform was willing, and able, to target.
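The restriction to each platform's targets, followed by partitioning the calls, is straightforward interval logic. This sketch assumes 0-based, half-open, non-overlapping target intervals (as in BED files). It also assumes calls are keyed by (chromosome, 0-based position, ref, alt). All names here are hypothetical:

```python
import bisect

Interval = tuple[int, int]          # 0-based, half-open [start, end)
Call = tuple[str, int, str, str]    # (chrom, 0-based pos, ref, alt)


def build_index(targets: dict[str, list[Interval]]) -> dict[str, list[Interval]]:
    """Sort each chromosome's non-overlapping target intervals by start."""
    return {chrom: sorted(ivs) for chrom, ivs in targets.items()}


def in_target(index: dict[str, list[Interval]], chrom: str, pos: int) -> bool:
    """Binary-search for the last interval starting at or before pos."""
    ivs = index.get(chrom, [])
    i = bisect.bisect_right(ivs, (pos, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos < ivs[i][1]


def compare_calls(exome_calls, wgs_calls, index) -> dict[str, set[Call]]:
    """Restrict both call sets to the exome's targets, then partition them."""
    exome = {c for c in exome_calls if in_target(index, c[0], c[1])}
    wgs = {c for c in wgs_calls if in_target(index, c[0], c[1])}
    return {"shared": exome & wgs, "exome_only": exome - wgs, "wgs_only": wgs - exome}


index = build_index({"1": [(100, 200), (300, 400)]})
result = compare_calls([("1", 150, "A", "G"), ("1", 350, "C", "T")],
                       [("1", 150, "A", "G"), ("1", 380, "G", "A"), ("1", 250, "T", "C")],
                       index)
print(result)  # the chr1:250 WGS call falls outside the targets and is ignored
```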

From Figure 6: Overlap of SNVs called by Exome and WGS (Clark et al, Nat. Biotech 2011)

The vast majority of SNVs in exome space were detected by both exome and WGS data, but there were some differences. Notably, the exome-specific and WGS-specific calls in each comparison tended to have (1) lower confidence scores, (2) a higher proportion of novel (not in dbSNP) variants, and (3) better coverage in the platform that detected them. WGS-specific SNVs often had zero reads in the exome data (probably hybridization failure). In contrast, most exome-specific SNVs had coverage in WGS, though it tended to be lower.

It seems clear from this figure that the number of SNVs detected by both exome and WGS correlates with the "reach" of the exome platform. Illumina, which had the biggest target space and also went after UTRs, had the highest number of shared SNVs. Agilent had more than Nimblegen, but Nimblegen's sensitivity for true positives in its target regions was much higher than that of the other two platforms.

How to Choose an Exome

The authors conclude that all three exome platforms are pretty good. Choosing among them probably depends on the goals, priorities, and budget of the investigator. For the cost-conscious, Nimblegen offers the most efficient enrichment of exons (and also of miRNAs). For the variant-hunters, Agilent provides a wider reach but requires a bit more sequence data. Illumina requires the most sequence data, but it alone surveys untranslated regions, which might appeal to some researchers.

References

Clark MJ, Chen R, Lam HY, Karczewski KJ, Chen R, Euskirchen G, Butte AJ, & Snyder M (2011). Performance comparison of exome DNA sequencing technologies. Nature Biotechnology, 29 (10), 908-14 PMID: 21947028

Whole-genome sequencing and clinical annotation

September 16, 2011 by Dan Koboldt

Next-generation sequencing has immense transformative potential for medicine in the coming decade. Rapid, economical whole-genome sequencing can provide a wealth of information useful for diagnosis, treatment, and even prevention of disease. Very soon (if not already), generating whole-genome sequencing data will be routine. The challenges will lie in accurate variant calling, phasing, annotation, and clinical interpretation.

A new study in PLoS Genetics reports the whole-genome sequencing and detailed genetic risk assessment of a family quartet with a history of familial thrombophilia. There’s a lot to like about this paper, but let me give you the highlights.

Construction of and alignment to an ethnicity-specific major allele reference sequence yielded improved alignment and more accurate genotyping, especially at disease-associated loci.
Mendelian inheritance state analysis in the family structure enabled identification and removal of >90% of variants arising from sequencing errors.
Per-trio phasing, inheritance state of adjacent variants, and population-level linkage disequilibrium data were integrated to provide long-range phased haplotypes.
The authors fine-mapped recombination events to sub-kilobase resolution and performed sequence-based human leukocyte antigen (HLA) typing.
A curated database of genotype-phenotype correlations made it possible to construct comprehensive genetic risk profiles, including multigenic risk of inherited thrombophilia, common disease susceptibility, and pharmacogenomics.

Advantages of an Ethnically-Concordant Reference Sequence

The human reference sequence is a composite, assembled using pooled sequence data from about 20 individuals. Several groups have reported that the current reference harbors a number of biases. Some of the alleles it contains are minor alleles in world populations, and insertions are better represented than deletions. Using SNP genotype data from the 1,000 Genomes Project (~6-10 million loci), the authors of this study developed three ethnicity-specific reference sequences. These covered the CEU (northern and western European ancestry), YRI (Sub-Saharan Africa), and CHB/JPT (Han Chinese / Tokyo Japanese) populations. They did so by determining the major allele in each population, and swapping it in wherever the NCBI reference base differed. This resulted in ~1.6 million substitutions for each population reference:

Credit: Dewey et al, PLoS Genetics 2011.

There were almost 800,000 positions where the reference allele was not the major allele in all three populations. Thus, at roughly 10% of SNP positions examined, the NCBI reference sequence contained a minor allele relative to European, African, and Asian populations.
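The construction of a major allele reference can be sketched in a few lines. This assumes biallelic SNPs only (the post describes substitutions, so indels are ignored). It also assumes per-population allele frequencies keyed by 0-based position and an uppercase reference. The function is hypothetical, not the authors' code:

```python
def major_allele_reference(ref_seq: str,
                           allele_freqs: dict[int, dict[str, float]]) -> tuple[str, int]:
    """Swap in the population major allele wherever it differs from the reference base.

    ref_seq: uppercase reference sequence for one region (0-based positions).
    allele_freqs: position -> {allele: frequency in one population}.
    Returns (major allele sequence, number of substitutions made).
    """
    seq = list(ref_seq)
    n_subs = 0
    for pos, freqs in allele_freqs.items():
        major = max(freqs, key=freqs.get)   # ties resolve to the first allele listed
        if major != seq[pos]:
            seq[pos] = major
            n_subs += 1
    return "".join(seq), n_subs


print(major_allele_reference("ACGTA", {1: {"C": 0.3, "T": 0.7}, 3: {"T": 0.9, "G": 0.1}}))
# ('ATGTA', 1)
```

Running this once per population with that population's frequencies gives the per-population substitution counts. Intersecting the substituted positions across populations identifies sites where the NCBI base is the minor allele everywhere.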

Self-reported ethnicity of the parents in the quartet was northern/western European, a claim largely confirmed by principal component analysis (PCA). The authors therefore aligned all genomes to the CEU major allele reference. This produced a small increase (0.1%) in the fraction of reads mapped by BWA. That seems like a small fraction, but it works out to around 6 million reads across the four samples. Presumably, more reads were mapped because the population-matched reference reduces allele-specific mapping bias (ASMB) against non-reference bases. Next, the authors compared variants to an internally curated database of genotype-phenotype correlations, identifying 9,389 correlated variants in the family quartet. This number would have been 10,396 if the NCBI reference were used. In other words, roughly 10% of the disease-associated markers flagged against the NCBI reference are in fact major population alleles, which are less likely to contribute to inter-individual variation in disease susceptibility.

The ethnicity-matched reference also enabled a more accurate estimation of the population mutation rate (7.8 × 10⁻⁴). Using the NCBI reference, this rate was 9.2 × 10⁻⁴, indicating that a standard reference sequence yields inflated population mutation rates.
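The post doesn't say which estimator was used. One common choice is Watterson's estimator, theta_W = S / (a_n × L). Here S is the number of segregating sites among n sampled chromosomes in L surveyed bases, and a_n = 1 + 1/2 + ... + 1/(n-1). This sketch is only meant to illustrate the quantity, not to reproduce the authors' calculation:

```python
def watterson_theta(segregating_sites: int, n_chromosomes: int, length_bp: int) -> float:
    """Watterson's per-base estimate of theta (= 4 * Ne * mu in a diploid population).

    segregating_sites: positions polymorphic among the sampled chromosomes.
    n_chromosomes: number of sampled haploid sequences (2 per diploid individual).
    length_bp: number of callable bases surveyed.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating_sites / (a_n * length_bp)
```

One plausible way a mismatched reference can inflate such estimates: a pipeline might count every non-reference position as "variant". Positions where all sampled chromosomes carry the major allele, and only the reference carries the minor one, would then be counted as segregating even though they are monomorphic in the sample.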

Mendelian Inheritance and Long-Range Haplotyping

Whole-genome sequencing of a “nuclear” family (mother, father, son, daughter) has a number of advantages:

It enables comprehensive Mendelian inheritance analysis, to facilitate the removal of false-positive variants, isolate putative de novo mutations, and even identify regions of structural variation based on blocks of Mendelian inconsistencies.
Meiotic crossover sites can be comprehensively surveyed, in this case to sub-kilobase resolution.
Trio information (each child compared to both parents) helps phase the variants, that is, determine which variants lie on the paternal chromosome and which on the maternal chromosome. This is especially useful for identifying compound heterozygotes for recessive traits.
Paired with population linkage disequilibrium data from HapMap and the 1,000 Genomes Project, this information can be used to infer long-range haplotypes. On chromosome 6, the authors used haplotype and population information to accurately determine HLA genotypes for every sample.
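The Mendelian check and trio phasing in the points above reduce to simple genotype logic at each biallelic site. This sketch assumes autosomal sites and unphased genotypes given as allele pairs. Real pipelines also model genotyping error, and these function names are hypothetical:

```python
from itertools import product
from typing import Optional

Genotype = tuple[str, str]   # unordered pair of alleles at one autosomal site


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    return any(sorted(child) == sorted((m, f)) for m, f in product(mother, father))


def phase_from_trio(child: Genotype, mother: Genotype,
                    father: Genotype) -> Optional[tuple[str, str]]:
    """Return (maternal allele, paternal allele) for the child, or None when the
    site is Mendelian-inconsistent or ambiguous (e.g. all three heterozygous)."""
    options = {(m, f) for m, f in product(mother, father)
               if sorted((m, f)) == sorted(child)}
    return options.pop() if len(options) == 1 else None


print(phase_from_trio(("A", "G"), ("A", "A"), ("A", "G")))  # ('A', 'G'): G came from the father
```

A site that fails the Mendelian check in a child is either a sequencing/genotyping error or a candidate de novo mutation. Sites where mother, father, and child are all heterozygous for the same alleles can't be phased from the trio alone. That is where the population linkage data come in.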

The family information also made possible this fascinating mosaic of chromosomal inheritance:

Credit: Dewey et al, PLoS Genetics 2011.

There are obviously key benefits to having sequence data for everyone in the family. In the future, when clinical sequencing is commonplace, don’t forget to bring your parents along.

Synonymous But Not the Same

One downstream analysis that I particularly enjoyed was that of synonymous coding variants. These variants are often ignored in studies of human genetics. Yet a growing body of evidence shows they can have functional effects through codon usage bias, mRNA stability, and splice site alteration. The authors developed an algorithm to evaluate these effects for 186 rare, novel synonymous SNPs found in the family. One of these, in the gene ATP6V0A4, is predicted to significantly affect mRNA secondary structure by disrupting a stable "tetraloop", likely reducing mRNA stability. This is relevant because homozygous loss-of-function variants in this gene have been associated with distal renal tubular acidosis (a disease in which the kidneys don't remove enough acid into the urine).
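Deciding whether a coding SNV is synonymous in the first place is a codon-table lookup. Here's a sketch using the standard genetic code. It only classifies the change. It makes no attempt at the downstream effects (codon usage, mRNA structure, splicing) that the authors' algorithm assessed:

```python
# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def is_synonymous(codon: str, offset: int, alt_base: str) -> bool:
    """True if substituting alt_base at offset (0-2) in codon keeps the same amino acid
    ('*' denotes a stop codon)."""
    codon = codon.upper()
    mutant = codon[:offset] + alt_base.upper() + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


print(is_synonymous("CTT", 2, "C"))  # True: CTT and CTC both encode leucine
print(is_synonymous("TGG", 2, "A"))  # False: tryptophan becomes a stop codon
```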

Clinical Annotation and Interpretation

The authors build on their previous work to comprehensively annotate clinically relevant variants in all family members. There's an extensive amount of work done here, much of it hinging on the authors' internally developed, hand-curated database of 16,400 SNPs associated with disease traits. An analysis of rare variants, bolstered with evolutionary conservation data, highlighted variants in two genes related to thrombophilia. One was in the F5 gene, which encodes coagulation factor V (the factor V Leiden variant, which increases thrombophilia risk). The other was in the MTHFR gene (love that gene symbol), which predisposes carriers to hyperhomocysteinemia.

Looking ahead to the probable treatment of family members with blood-thinning medication, the authors next undertook a pharmacogenetic analysis. Perhaps the best-known example of pharmacogenetics is warfarin (coumadin), an oral anticoagulant given to patients at risk for stroke or deep vein thrombosis (DVT). Warfarin was the fifth-most prescribed drug in the U.S. the last time I checked, but it has a narrow therapeutic window. Too little, and it has no anticoagulant effect. Too much, and it can cause internal bleeding. Variants in a number of genes have been associated with warfarin dosing, but two are predominant: CYP2C9, the primary metabolizing enzyme for the drug, and VKORC1, the drug target. In this family, all four members were homozygous for the CYP2C9*1 allele, which is associated with normal dosing. However, all were heterozygous for the VKORC1 -1639G>A variant, associated with "therapeutic prolongation" of warfarin response at low doses. Based on these genotypes and patient clinical data, the authors applied the International Warfarin Dosing Algorithm to determine the appropriate dose.

All told, this is an interesting study that clearly involved a substantial amount of work (the pre-print PDF totaled more than 100 pages). Undoubtedly, many of the strategies presented here will be useful as whole-genome sequencing moves into the clinic.

References

Frederick E. Dewey, Rong Chen, Sergio P. Cordero, Kelly E. Ormond, Colleen Caleshu, Konrad J. Karczewski, Michelle Whirl-Carrillo, Matthew T. Wheeler, Joel T. Dudley, Jake K. Byrnes, Omar E. Cornejo, Joshua W. Knowles, Mark Woon, Katrin Sangkuhl, Li Gong, Madeleine P. Ball, Alexander W. Zaranek, Heidi L. Rehm, George M. Church, John S. West, Carlos D. Bustamante, Michael Snyder, Russ B. Altman, Teri E. Klein, Atul J. Butte, & Euan A. Ashley (2011). Phased whole genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genetics, 7 (9)
