Science Fiction: Going Viral

March 13, 2015 by Dan Koboldt

The rapid advance of next-generation sequencing technologies, particularly in the last several years, has almost seemed like something out of a science fiction novel. Think about it: on a HiSeq X Ten instrument, we can sequence a complete human genome in less than a week, at a cost that’s 0.00001% of what it took to fund the Human Genome Project.

It might surprise you to learn that, in addition to my blog posts here and the grant/paper writing I do for my job, I dabble in science fiction writing as well. If you think that scientific publication/success is hard (10% acceptance rate for top-tier journals, or 8% NIH funding level), you should look into the fiction side of publishing sometime.

The acceptance rate for most professional science fiction magazines (for short fiction) is generally below 1%. The pay is usually $0.05-$0.10 per word, meaning that a 4,000 word story might bring $200-400 in the (unlikely) event that you get it professionally published. The odds of landing a literary agent — which is required, if you want to have your novel shopped to most traditional publishing houses — are about 1 in 1,000.

A few months ago, Third Flatiron Publishing (which does quarterly science fiction anthologies) announced that their Spring 2015 anthology would be themed around world-altering events. As it happened, I’d written a science fiction story that seemed like it might fit — it was about a couple of researchers working in a dusty lab who stumble upon a universal cure for cancer (you remember I said science fiction, right?), and their struggle to make it available to the world.

The Time It Happened

I’m thrilled to say that the editors at Third Flatiron liked my story enough to choose it for their anthology The Time It Happened, which just came out and is available on Amazon in both Kindle and paperback versions. They’ve also bought audio rights, and intend to create a free podcast of my story (as well as a couple of others) sometime in the near future.

Since you readers enjoyed the non-fiction I write for MassGenomics, hopefully you’ll enjoy this as well.

Ten Years and 50 Publications in Big Science

February 21, 2014 by Dan Koboldt

This week at the American Journal of Human Genetics you’ll find a new method for exome-based mapping and rare variant prioritization in Mendelian disorders. The freely-available software package, MendelScan, is designed to help researchers score and prioritize candidate variants in family exome sequencing studies of Mendelian disease.

It’s my 50th research publication, and since I’m also celebrating ten years of service at Washington University, it seems like the perfect time for a retrospective.

HapMap Beginnings

I came to work for WashU in 2003, joining the lab of Ray Miller as a bioinformatician. I had a background in computer science and decent Perl programming skills, and enough biology to scrape by. Ray’s lab was in a partnership with Pui Kwok’s group at UCSF to serve as one of the genotyping centers for phase I of the International Haplotype Map (HapMap) project, the ambitious effort to map common genetic variation (SNPs) in human populations. The HapMap is what made all of those SNP chips and genome-wide association studies possible.

While the HapMap was ramping up, I got my name on a paper for the first time, High-density single-nucleotide polymorphism maps of the human genome (Genomics, 86:2, 2005). SNP discovery was a big deal back then. Simply put, we didn’t have enough SNPs to genotype for the dense map that was planned. Most of dbSNP’s early growth was driven by the needs of the HapMap project.

The HapMap was also my introduction to so-called “big-science”: large-scale, multi-center projects with a data control center and weekly conference calls. They’re a lot of work, but they’ve helped build some of the most important resources for genetic research. I also had my first experiences with Illumina, a company that’s still a big part of my work.

The SNP-in-Primer Effect

Another milestone I can attribute to the HapMap Project was my first first-author paper, Distribution of human SNPs and its effect on high-throughput genotyping, one of the first studies to highlight the connection between nearby SNPs (in primer sites) and genotyping assay failure (including allele dropout).

Mapping C. briggsae

As our role in the HapMap concluded, we took on a new grant: constructing the genetic map of C. briggsae, a small roundworm similar to the well-known model organism, C. elegans. This was a two-part project. First, we used shotgun sequencing data to identify SNPs in a few recombinant inbred lines (RILs). Second, we leveraged the same genotyping platform we’d used for the HapMap (and later, an Illumina GoldenGate array) to genotype the variants in a number of different samples.

It was an interesting project, and I learned far more about worms than I probably ever wanted to. The thing I enjoyed most during this project was the worm research community. They’re a tight-knit and extremely collaborative group. We built some fantastic collaborations, and the three papers out of that one project are a testament to that.

Genetics Hired Guns

We had excess capacity during the C. briggsae project, and that opened the door to some unique collaborations at WashU. Ray knew so many people, both at WashU and in the research community as a whole. Thus, we had a number of smaller projects that helped support the lab and generate publications.

Pharmacogenetics of Warfarin Dose

We worked with Brian Gage and Deepak Voora on the pharmacogenetics of warfarin (coumadin), a widely-used oral anticoagulant. Warfarin is the poster-child for PGx. It has a narrow therapeutic window: too little won’t prevent clotting, and too much causes internal hemorrhaging. The two well-known genes in which genetic variants influence dose are:

CYP2C9, a cytochrome P450 enzyme that metabolizes warfarin in the liver, and
VKORC1, a component of the multi-protein VKOR complex involved in the vitamin K cycle that’s targeted by warfarin.

Interesting side note: VKORC1 was first mapped in warfarin-resistant rats, because high-dose warfarin is also used as a rodenticide. We helped with the analysis of targeted 3730 sequencing of some candidate genes in a warfarin patient cohort, and uncovered another gene (CALU, encoding calumenin) in which variants influenced warfarin dose.

Immunogenetics of Smallpox Vaccination

We also worked with Sam Stanley on a targeted sequencing study to characterize the immunogenetics of smallpox vaccination. Smallpox has been all but eradicated in the West thanks to the development of a vaccine (which most of our parents had). Even so, members of the military were routinely given the vaccine because they might travel to parts of the world where smallpox persisted.

As with many vaccines, the pox vaccine sometimes had side effects, the most common (and measurable) of which was fever. Dr. Stanley’s project sought to identify variants in candidate immune system genes that might contribute to the phenotype.

This was before the completion of the HapMap and the availability of SNP chips, so we had to choose SNPs on a per-gene basis, using dbSNP and preliminary HapMap data. Then we genotyped them in cases and controls, and ran an association study. The study was published in the awesomely titled Journal of Infectious Diseases and highlighted the role of cytokine IL-1A in vaccine response.

The Genome Center

All good things must come to an end, and Ray’s lab eventually closed due to lack of funding. We knew this was coming, so I began looking for a new position at WashU, and I really only had eyes for one place: the Genome Sequencing Center, the place that had helped sequence the human genome while I was still in high school. It was mecca for me. I was fortunate enough to know some people there already, and others at WashU who could put in a good word.

Cancer Genomics

Ultimately, I landed a job in the Medical Genomics Group, charged with the analysis of sequencing data from our high-throughput pipelines. At the time, that was traditional capillary sequencing data. The projects were still ambitious: large-scale targeted sequencing of deadly tumors, like lung cancer and glioblastoma.

AML Cells. Credit: Univ. of Virginia

Things were changing, though. We were using two new sequencing platforms — Solexa and 454 — to unravel the genome and transcriptome of a leukemia patient, one we called AML1. In 2008, it became the first published cancer genome. We went on to publish a second leukemia genome, this one in The New England Journal of Medicine. That was an important milestone for us, because even clinicians and med students read NEJM.

Cancer genomics has been our bread-and-butter for a long time. Collaborations with outstanding oncologists at the Siteman Cancer Center, the Pediatric Cancer Genome Project with St. Jude Children’s Hospital, and our role in The Cancer Genome Atlas, let us work on many of the common cancer types — notably breast cancer and ovarian cancer, as well as brain, colorectal, renal, and others.

The next few years were an incredibly productive period for our group. Next-gen sequencing of tumor genomes was relatively new, and the potential payoff was huge. Admittedly, for many tumor types, the most frequently mutated genes had already been identified by candidate gene sequencing, copy number analysis, and other approaches. NGS nevertheless provided an unbiased assay for genomic changes, and we managed to get some nice publications out of it.

Human Genetics

Our institute took on a number of non-cancer human genetics projects as well. One of the earliest of these was a collaboration with Steve Daiger’s group at the University of Texas, Houston. His group studies inherited retinal diseases, with an emphasis on retinitis pigmentosa (RP), a Mendelian disorder of photoreceptor (rod) degeneration affecting 1 in ~3,500 individuals. I jumped at the chance to work on this project because I have family members with RP. And these were some great collaborators, as evidenced by the publications we managed to put together and our continued interest in discovering new RP genes.

Perhaps because of my background in human genetics and my work on RP, I was pulled into other human genetics projects as they came along too. One of the longest-running of these, a targeted sequencing study of GWAS regions for metabolic syndrome, just came out in PLoS Genetics. Eventually, our workload in human genetics grew enough that we formed a separate analysis group, the group I now manage.

VarScan and NGS Tool Development

Way back during the AML1 project, I offered to try and uncover indels from the 454 data. Given the nature of pyrosequencing, this was not the best idea. But that initial methods development would evolve into VarScan, our tool for SNP and indel calling in individual and pooled samples. It needs to be said that development of analysis tools was not our primary mission. Often there simply wasn’t a good software tool that did what we needed.

Nearly all of our analysis tools were created to meet a specific need in our analyses of NGS data. We developed SomaticSniper to detect somatic mutations in whole-genome sequencing data while allowing for tumor contamination of normal cells that occurs in some types of leukemia. We developed VarScan 2 for somatic mutation and copy number alteration calling in exome data for tumor-normal pairs. The MuSiC package comprised our growing suite of cohort-level mutation analysis tools for cancer. This week’s publication, MendelScan, represents our methodology for linkage-type mapping and scoring candidate variants in Mendelian disorders, most of which was developed during our RP collaboration.

From 50 to 100 Publications

There’s a thrill to getting any paper accepted and published that I enjoy no matter the journal or impact factor. One of the greatest perks of our information age is that biomedical literature goes out to the world, for the entire community to search for, read, and build upon. I also take a certain enjoyment in getting something into a journal that’s never published my work before.

That said, one of the benefits of working in big science is taking part in work that can get into top-tier journals. My most-published-in journal by a big margin is Nature (14 papers), followed by the New England Journal of Medicine (4 papers). Those, along with Cell and Science, account for 41% of the publications in my list.

I’ve always liked the notion that 100 publications is the mark of a distinguished research career. Assuming that one works and actively publishes for 20 years, that works out to 5 publications a year. Ten years in, with 50 publications, I’m on track for that. Honestly, I hope it won’t take another decade to reach 100, and I certainly don’t plan on stopping when I do.

Functional Validation of Genomic Discoveries

July 12, 2013 by Dan Koboldt

Credit: Riken Research

Next-gen sequencing technologies have enabled rapid identification of many genes contributing to human disease. Rapid, inexpensive exome sequencing quickly gave us access to the low-hanging fruit: rare Mendelian disorders with single, highly penetrant coding mutations. Since 2009, we’ve seen an avalanche of reports of disease-causing mutations and novel disease genes. Family studies, case-control studies, and population cohorts are picking up this kind of signal everywhere.

The trouble, as anyone who’s analyzed this kind of data understands all too well, is that there are a lot of possibilities out there. You can take just about any gene from a sequencing study or GWAS and — with the assistance of a nice resource like Gene Cards — come up with a story that might connect mutations/variants in that gene to your phenotype of choice. But the burden of proof remains.

Functional Validation Required

It should now be obvious to most that publication of novel disease genes in top-tier journals requires more than just genetic or genomic data. It requires some kind of functional validation, an assay that demonstrates how genetic differences have a measurable phenotypic effect that makes sense for the disease. Genomic and statistical approaches are hypothesis generation tools. Those hypotheses, well-supported as they may be, must be tested in vitro or in vivo to see if they hold up. Because, as I said, you can spin a story about almost any gene.

Let’s say that you’ve identified a new possible cancer susceptibility gene, a candidate tumor-suppressor. You found it by looking for rare germline variants in a cohort of patients with a specific form of cancer. You’ve already done the genomics to establish that:

Rare variants in the gene are enriched among cases (maybe 5% of patients harbored rare deleterious variants in that gene, compared to 0.1% of 1000 Genomes or NHLBI-ESP populations).
In tumors, the gene is a target for biallelic inactivation by somatic mutation, deletion, LOH, or epigenetic silencing.
Expression of the gene is reduced or ablated in affected patients or tissues.
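
To make the first criterion concrete, here is a minimal sketch of how that kind of enrichment might be tested with a one-sided Fisher's exact test. The counts are hypothetical, chosen only to match the 5% versus 0.1% carrier rates in the example, and the Bonferroni threshold assumes roughly 20,000 genes were tested. The function names are illustrative and do not come from any particular package.

```python
from math import comb


def fisher_exact_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test on a 2x2 carrier table.

    Returns the probability of seeing at least `case_carriers` carriers
    among the cases if carrier status were unrelated to disease
    (hypergeometric null with fixed margins).
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    non_carriers = total - carriers
    denom = comb(total, case_total)
    upper = min(case_total, carriers)
    numer = sum(
        comb(carriers, k) * comb(non_carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying a rare variant in cases relative to controls."""
    a, b = case_carriers, case_total - case_carriers
    c, d = control_carriers, control_total - control_carriers
    return (a * d) / (b * c)


# Hypothetical cohort: 10 of 200 patients (5%) vs. 5 of 5,000 controls (0.1%).
p_value = fisher_exact_one_sided(10, 200, 5, 5000)
effect = odds_ratio(10, 200, 5, 5000)

# Assumed exome-wide correction: 0.05 divided by ~20,000 genes tested.
bonferroni_threshold = 0.05 / 20000

print(f"odds ratio = {effect:.1f}, p = {p_value:.2e}")
print("passes exome-wide threshold:", p_value < bonferroni_threshold)
```

Even a result that clears a threshold like this only tells you the association is unlikely to be chance in that cohort. It does not show what the gene actually does.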

Everything looks right, it sure looks like a tumor suppressor, but where’s the proof? With over 20,000 known protein-coding genes, widespread genetic variation, and the continual accumulation of mutations in somatic tissues, there are plenty of candidates that will meet these criteria by chance alone. Editors and reviewers of top-tier journals know this, and they want more. They want functional tests demonstrating that defects in your gene improve the growth, survival, proliferation, or metastatic potential of cells. They want a null mouse for your gene that’s prone to tumors. As much as it pains me to say it, the following statement is true.

Genomics is not enough.

Options for Functional Validation

This bitter medicine undoubtedly tastes sweet to the molecular biologists and bench scientists whose efforts may have been overshadowed by genomics in recent years. Because now, after all of our fancy high-throughput instruments, robust informatics and clever statistics have provided some answers, we have to leave the computer and head back to the laboratory. And many of us, including the author, have little to no experience there.

Even so, I’ll do my best to summarize some of the options for functional validation, and ask you readers to comment with the things I’ve gotten wrong or forgotten.

Molecular Assays

Subcellular localization (Weiqiao, PNAS 1998)

Additional evidence can be garnered at the molecular level by showing that your gene functions

mRNA expression. Genome wide (RNA-seq) or targeted (RT-PCR) mRNA expression assays provide insight about gene expression at the transcript level, including exon usage and alternative splicing.
Transcript/protein localization. It has been possible for some time to examine the tissue and/or intracellular location of a protein using specific dye-tagged antibodies, which may lend support to the idea that your gene of interest plays an important role at that location.
Protein-DNA interaction. New, high-throughput chromatin immunoprecipitation and sequencing (ChIP-Seq) makes it possible to identify sequences bound by specific proteins. This can be used to evaluate the protein that does the binding (showing that a variant alters when/where/how it binds) or the target regulatory sequence (showing that variants affect binding of an important regulatory protein, such as a transcription factor).
Protein-Protein interaction. Another intriguing possibility for functional validation is showing that your suspect gene encodes a protein that interacts with a known key player in your disease pathway, such as BRCA1/2 for homologous DNA repair in breast and ovarian cancers.

Biological Assays

Morpholino Knockdown (Wikipedia)

Functional validation of a candidate disease gene can also be performed in living cells or organisms. Often this garners more compelling evidence of a gene’s importance, because it demonstrates the relationship between a genetic entity and phenotype visible at the cellular level or above. Some of the approaches here include:

Human cell lines. Gene knockdown (by siRNA or other methods) or gene delivery (by transfection, or by transduction with a virus genetically engineered to carry a certain gene) in cell lines serves to demonstrate its importance for measurable cellular phenotypes, such as apoptosis, growth, proliferation, and contact inhibition.
Animal models. We are lucky enough to control the fates of lesser organisms, which means we can use reverse genetics techniques to alter their genomes and see what happens. The advantage here is that you get to study a gene’s effect on a complete organism, which more closely resembles what could be happening in humans. Mouse models are often the method of choice, though some other model organisms provide good experimental systems for certain phenotypes, such as morpholinos (antisense oligos) in zebrafish.
Human patients. This generally isn’t possible, but in some cases genetic information (e.g. specific tumor alterations) has been used to tailor treatment to individual patients, in which case the outcome of the treatment validates the genomic finding. Case in point: the use of whole-genome sequencing to diagnose a cryptic PML-RARA fusion. This approach obviously has many ethical and legal hurdles, and probably wouldn’t be approved for truly novel discoveries.

A Call to Reviewers

In closing, I would like to appeal to peer reviewers of those journals who now wish to see functional validation of genomic findings. Asking the authors to “provide some functional validation” of their findings may be a valid critique, but it’s not terribly helpful. It would be better to outline what kind of experiments you’d like to see to become convinced. Because the odds are, you’ll be reading this manuscript again at some point, and wouldn’t it be nice if they performed the validation that you were looking for?

In fairness, some of those who work in the field of next-gen sequencing, even to tackle genetic diseases, do not have knowledge of (or even access to) laboratory techniques that could functionally validate their findings. It would benefit the entire research community if we took a moment to outline potential avenues of functional validation so that we “dry lab” scientists can begin to explore them.

Science Blogging with WordPress, Part 1

May 30, 2013 by Dan Koboldt

In my last article, I covered how and why to start a science blog. Now it’s time to get into the nuts and bolts: setting up your web site through WordPress. The wonderful thing about this is that there’s no real technical skill required — you don’t need to know HTML or buy any expensive web design software — everything is point-and-click.

Set Up Your Hosting and Domain Name

As I mentioned before, if you’re serious about getting into blogging you should host your own site (meaning you have your own domain name, instead of myblog.wordpress.com). And I recommended getting a hosting package and domain through GoDaddy simply because:

The prices are competitive. It’s usually something like $3-5 per month for unlimited bandwidth.
Their administration panel is easy to use. You log in and can set everything up with a few clicks.
I host multiple web sites with them, and thus I’m quite familiar with the setup.

Please, for the love of God, get one of the Linux hosting packages (not Windows). Once you’re set up on GoDaddy, you log into the control panel which looks something like this:

Your Control Panel

Here, you’ll manage your domain, e-mail, and web site. Under the “Web Hosting” menu, find your web site and click “Launch” to bring up its control panel.

Install WordPress to Manage Your Site

We live in a golden age of web publishing, because there are great content management systems (CMSs) that do almost everything for you. WordPress is the most widely-used of these. GoDaddy’s servers can install it automatically for you. On your web site’s control panel, find the “Applications” section and click on it. You’ll see something like this:

Install WordPress Application

Install WordPress in your root directory (/). You’ll be prompted for a username and password, then it begins processing. This will take a few minutes, and you’ll get an e-mail when it’s done. Congratulations! You now have a web site.

Configure WordPress

The next few steps are really important. WordPress is great, but it needs a little customization so that you don’t look like the complete rookie that you are by having a site named “My Web Site”.

Change the General Settings. Under Settings > General, you’ll see where you can give your website or blog a name (the “Site Title”) and a “Tagline” (a brief sentence about what your blog covers). For example, you might have a site named “Fashion Diva” with the tagline “Fashion tips and advice for women”. You’ll also put in your e-mail address so that WordPress can notify you of comments on your site and such.
Change your website’s link structure. Go to Settings > Permalinks. Under common settings, change the radio button to “Post name”. Save this setting. Now, your posts won’t be categorized under yourdomain.com/year/month/date/first-post. Instead, they’ll be at yourdomain.com/first-post. Search engines like this, and it also makes your web site easier to bookmark and navigate.
Remove the clutter. Go to Appearance > Widgets. In the middle of the page are all of the available “Widgets” that you can put in your sidebar. These are little display items, like the “Tag Cloud”. Most of the time, all they do is distract visitors. So you should remove most of these from the sidebar. Maybe just start with Recent Posts, Categories, and Meta (which has your login link and RSS feeds for visitors).
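
To see what the permalink change does to your URLs, here is a rough sketch of the two link structures. It is only an illustration, not WordPress's actual code: the `slugify` and `permalink` helpers are hypothetical names, and the real slug rules in WordPress handle more edge cases than this.

```python
import re
import unicodedata
from datetime import date


def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated slug."""
    # Drop accents, then collapse every run of non-alphanumerics into one hyphen.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def permalink(domain, title, published=None):
    """Build a "Post name" link, or a date-based link if `published` is given."""
    slug = slugify(title)
    if published is None:
        return f"{domain}/{slug}"
    return f"{domain}/{published:%Y/%m/%d}/{slug}"


# Date-based structure versus the "Post name" structure.
print(permalink("yourdomain.com", "First Post", date(2013, 5, 30)))
print(permalink("yourdomain.com", "First Post"))
```

The shorter form keeps the post title right next to the domain, which is the part readers and search engines care about.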

Customize Your Theme and Appearance

Last but not least, you should customize your site’s look and feel. The principal way that you do this is by selecting a theme. There are many great free themes out there for WordPress; I personally like Suffusion because it’s incredibly versatile and well-supported. You can customize your fonts, color scheme, and header image. Experiment with these elements until you find a look that you like. Graphic and site design isn’t my core area of expertise, so I’ll leave that area for others. In my next post I’ll discuss how to begin networking with other bloggers and building your profile in the online world.
