
    How and Why to Start a Science Blog

    May 24th, 2013

    There are two features of the digital age in which we live that have changed the world. The first is that a vast wealth of information is available to us, more than we could ever hope to look at in our lifetime. Wikipedia alone has 4.2 million articles written in English. The second is that the barrier to publishing has almost disappeared: where once only a small segment of society had the means to print their work (or have it printed), now virtually anyone can. All it takes is a computer with an internet connection.

    Writing and publishing are an integral part of most careers in science, particularly for academic research. The digital age has transformed how research is published, but perhaps more importantly, has offered new channels by which scientists around the world can communicate with one another. Blogs and social media are powerful tools for scientists, and as highlighted by a recent article on the Scientific American blog, they may even offer a path to a career in scientific writing.


    A few friends of mine have considered the idea of starting a blog and come to me for advice. It’s inspired me to begin a series of articles on science blogging.

    Why to Start a Blog

    There are many good reasons to consider starting a blog related to your work or research:

    • Share opinions. A blog is the epitome of free speech. There’s no editorial or peer review; you can write whatever you want, within reason. Just remember that there’s no guarantee of anonymity.
    • Gain fame or notoriety. There are thousands of scientists out there, but only a fraction of them bother to start a blog. It’s therefore another means to separate yourself from the crowd.
    • Build relationships. What I’ve enjoyed the most about Massgenomics is the network of other scientific bloggers, science writers, and researchers I’ve met along the way.
    • Practice writing. It is a simple fact that how well you write will affect your scientific career. Obviously practice makes perfect, but the opportunities to practice writing can be limited in the traditional workplace. Blogging changes all of that; you can get as much practice as you want.

    Why Not to Start a Blog

    It occurs to me that I should offer some notes of caution. Blogging isn’t for everyone. Here are some points to consider before you launch into one:

    • It takes time. The hours you spend writing on your blog could be spent writing grants or manuscripts or, you know, doing your job. Blogging can be a huge time sink, and you might end up with little to show for it.
    • It can get you into trouble. Free speech, combined with the lack of editorial oversight, can be a hazard to your career. Assume that the people who pay your salary, fund your grants, and review your papers will read it, and proceed with caution.
    • Readers are fickle. Let’s be honest, with the incredibly low barrier to entry, there’s a lot of crap in the blogosphere. People have only a limited amount of bandwidth for reading in any given day. Building and retaining an audience is tough. You may write dozens of wonderful articles, only to learn that no one is reading them.

    Getting Started with a Blog

    There are a few different ways to get started with blogging. The lowest investment (but also lowest staying power) is to start a free blog on a site such as wordpress.com. But if you’re serious about writing a blog, gaining recognition, and building an audience, you should probably get your own site. This requires two things:

    1. A domain name, which is how people get to your site.
    2. A web hosting package, which provides the files and database required for your site.

    Hosting your own domain and web site has a cost, though it’s not substantial. The up-side is that it requires little or no technical expertise to run your own site. I recommend that you start with a basic hosting package from GoDaddy.com. Right now it’s about $2.99 a month, which works out to less than $36 per year, and you get a free domain name with it. In my next articles, I’ll talk about how to set up your blog and begin to build a following.


    Genetics of Vision Disorders: Research, Testing, and Translation

    May 10th, 2013

    This week I attended the annual meeting of ARVO (the Association for Research in Vision and Ophthalmology), which brings together 15,000 clinicians and researchers working in those fields. This is an unusual type of meeting for me, quite unlike the CSHL or Marco Island meetings, in that the primary focus is on patient care. Few areas of medicine have benefited from next-generation sequencing more than research into vision disorders. Many groups employing whole-exome or more targeted sequencing presented their work here, and also highlighted the challenges of translating NGS technologies into clinical use.

    Inherited Eye Disorders

    Vision disorders with a large genetic component range from rare, penetrant Mendelian disorders like retinitis pigmentosa (RP) to common complex diseases (AMD, glaucoma). In many ways, these are well-suited to genetic studies:

    • They often manifest phenotypes that can be quantitatively measured and tracked over time.
    • For Mendelian disorders, many (if not most) of the common disease-causing genes have been identified.
    • Vision disorders are usually non-fatal, meaning that they can be studied over lifetimes and generations.

    Next-generation sequencing has become a powerful tool for genetic screening in visual disorders. In autosomal dominant RP, for example, screening a panel of known retinal disease genes uncovers the causal mutation in about 75% of cases. Many groups including ours have employed exome sequencing to search for new causal mutations in the other 25%.

    Surprisingly, however, the number of new RP genes identified since exome sequencing became widely available is relatively small. An explanation I can offer from my own experience, and one echoed by many of those presenting at ARVO, is that there are thousands of variants in every individual’s exome and identifying the one that causes disease (especially dominant disease) is hard. By definition, we are looking for a variant in a gene that isn’t yet linked to retinal disease, and that means almost every gene.

    Challenges of Mendelian Disorders

    Searching for Mendelian disease genes by exome sequencing seems like a straightforward exercise, but I think many of us have come to recognize that it’s not as easy as perhaps it should be. Inherited retinal diseases are a good model to help explain some of the difficulties:

    1. Modes of inheritance.

    Even with extensive pedigrees and good clinical phenotyping, the inferred mode of inheritance for retinal disorders can be wrong. A study this year by Steve Daiger’s group at the University of Texas-Houston reported that around 10% of apparently dominant RP pedigrees turn out to be X-linked RP, often with expression of the disease in carrier females. A further complication is that 6 of the 23 known genes that cause dominant RP are also linked to recessive RP.

    2. Refractory mutations.

    By this I mean segmental duplications, large-scale deletions, and other classes of variants that remain difficult to detect by current sequencing technologies. About 5% of disease-causing mutations in dominant RP are large deletions; these remain difficult to detect by capillary or even exome sequencing. Interestingly, many are associated with the gene PRPF31, the only known gene for which haploinsufficiency causes dominant RP.

    3. Unexpected causals.

    Ed Stone of the University of Iowa presented a fascinating report on Stargardt’s disease, one of the most common inherited eye diseases, caused predominantly by mutations in ABCA4. His group tackled 208 patients with classic (autosomal recessive) Stargardt’s in whom only one of the requisite 2 mutations in the gene was observed. It turned out that alternative splicing due to a noncoding or synonymous coding variant explained the biallelic gene loss in 40% of such cases. Many of these variants were in regions not captured by current exome kits.

    Targeted Panels versus Exomes

    Another important debate in retinal genetics surrounds the use of targeted panels versus exome sequencing as the front-line diagnostic tool. Currently, CLIA testing for dominant RP identifies the causal mutation in about 50% of cases. It’s conducted by PCR and Sanger sequencing, with a cost of about $10,000. It seems likely that this routine diagnostic tool will soon be replaced with a next-generation sequencing assay. The question is whether or not that should be a custom panel targeting known retinal disease genes (which seems to explain about 75% of the cases) or exome sequencing.

    Both approaches have strengths and weaknesses. Panel testing is cheaper than exome sequencing, can be tweaked to ensure coverage of important genes/regions, and generally yields results that can be confidently reported to the clinician. Exome sequencing provides more information and would likely be performed anyway should a panel test come back negative. But it comes at a higher cost and might also not sufficiently cover some important targets. It seems likely, too, that an exome approach would often yield reports of “variant of unknown significance,” which have limited use to clinicians (and patients, for that matter).

    It may also turn up variants not relevant for the vision disorder but with important health implications nonetheless; whether and how to report such findings is an important ethical debate.

    Clinical Translation and Genetic Counseling

    An emphasis on patient counseling and care distinguishes ARVO from the genetics/genomics meetings I more often attend. Hearing from clinicians and genetic counselors provided a new perspective. These individuals are the front line for patients, whereas researchers like myself are often two or three levels removed. I might see sample codes, phenotype information, and demographics like age and ethnicity, but they know the faces and voices and families of their patients.

    As a result, we have some philosophical differences in how we conduct the research. I might be comfortable reporting a gene with, say, 90% probability of causing the disease. A clinician or genetic counselor might not.

    Ultimately, next-generation sequencing is destined for the clinic, to aid the diagnosis, prognosis, and treatment decisions for patients. Most of us in the field can see this. It must be said that not everyone is convinced, especially those on the clinical side. Some of them may take the initiative on their own, and come to our meetings (ASHG, AGBT, etc.) to learn about NGS and its potential. We will do even better, I think, by taking that message to them at their own society meetings and lecture halls. There is a great deal of common ground between research and clinical care; meetings like ARVO seem like a good place to find it.

     


    Rare Variants in Human Disease

    April 29th, 2013
    rare variants in humans

    Image Credit: Nature (1000 Genomes)

    In the ten years since the first draft of the human genome was published, our knowledge of the relationship between genotype and phenotype has grown exponentially. Perhaps the most important achievement of the genome assembly is that it provided a reference, against which individual DNA sequences could be compared to identify sequence variation.

    The sequel to the Human Genome Project, the International HapMap Project, set out to build a map — specifically, a haplotype map — of common genetic variation in several world populations. And the sequel to that project, the 1,000 Genomes Project, sought to characterize the nature and extent of genetic variation, especially variants that were rare (MAF<0.05).

    HapMap and Genome-Wide Association Studies

    The completion of the HapMap marked an important turning point for human genetics: the possibility of genome-wide association studies (GWAS). With this map, and the high-throughput genotyping technologies that developed during the HapMap, it became feasible to rapidly genotype many samples at thousands of polymorphic, informative positions throughout the genome. And because these markers were chosen on the basis of linkage disequilibrium information, they managed to capture (represent) much of the variation in each individual.

    In the years that followed, researchers used GWAS to study a multitude of diseases and phenotypes. Many new genetic associations arose, as illustrated by this figure from the NHGRI GWAS database.

    gwas reports

    GWAS reports from 2005-2012 (Credit: NHGRI)

    At the same time, it became apparent that common variants alone did not explain the genetic component of most complex human diseases. And from the 1,000 Genomes Project, we began to understand that a significant proportion of variants in any individual genome (5-10%) have never been seen before.

    Rise of Rare Variants

    It has become clear that these rare variants are equally important for understanding the genetic basis of human disease. Next-generation sequencing makes it possible to identify these in individual samples, but understanding their contribution to phenotype presents some challenges:

    • By definition, rare variants will only be found in a small fraction of sequenced samples. 
    • Large cohorts will be required to assess genetic association of rare variants with a phenotype. We’re talking about thousands or tens of thousands of individuals.
    • Sequencing large numbers of samples, especially whole-genome sequencing, remains costly. Yes, I know, NGS is so cheap now. But when you multiply the cost for WGS (~$3,000) by the number of samples (say 1,000), suddenly we’re talking about $3 million.
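    To make the trade-off in these points concrete, here is a rough back-of-the-envelope sketch in Python. It multiplies out the sequencing cost quoted above and estimates how many carriers of a rare variant a cohort of a given size would contain. The function names are made up for illustration, and the carrier estimate assumes Hardy-Weinberg proportions, which is an assumption of this sketch rather than something stated in the post.

```python
def sequencing_cost(n_samples, cost_per_sample):
    """Total cost of sequencing a cohort at a flat per-sample price."""
    return n_samples * cost_per_sample


def expected_carriers(n_samples, maf):
    """Expected number of individuals carrying at least one minor allele.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch):
    P(carrier) = 1 - (1 - maf)^2.
    """
    return n_samples * (1 - (1 - maf) ** 2)


# Figures from the text: ~$3,000 per genome, 1,000 samples.
total = sequencing_cost(1000, 3000)
print(f"WGS for 1,000 samples: ${total:,}")

# Hypothetical rare variant with a minor allele frequency of 0.5%.
carriers = expected_carriers(1000, 0.005)
print(f"Expected carriers among 1,000 samples: {carriers:.1f}")
```

    With only about ten expected carriers in a thousand samples, it is easy to see why cohorts of thousands or tens of thousands come up in rare-variant association studies.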

    Currently, researchers are employing a number of strategies to get results without spending millions of dollars. In general, these strategies fall into two categories:

    1. Targeted sequencing, where only certain regions of the genome are sequenced (e.g., the exome)
    2. Fewer samples, with the idea of using statistical methods to address insufficient power

    Special Frontiers Issue: Rare Variants in Human Disease

    The development of statistical and analytical methods to uncover rare variants associated with human disease is a rapidly evolving field. This summer, I am helping edit an issue of Frontiers (now Nature Frontiers) entitled Identification of rare genetic variants contributing to human diseases. The deadline for abstracts is July 1. Desired topics include:

    1. Calling, genotyping, QC, and validation of rare variants
    2. Discovery of novel variants and de novo mutations
    3. Annotating and predicting function of rare variants
    4. Methods for discovering rare causal variants in Mendelian disease
    5. Methods for associating rare variants with complex disease
    6. Burden and non-burden test methods for genes, pathways, and other self-defined variant sets
    7. Estimating rare variant effect size and heritability
    8. Integrative analysis of rare and common variants
    9. Applying rare variant analysis to specific diseases

    For details, please visit the special issue’s page on the Frontiers web site.


    Genotype Imputation Aids eQTL Discovery

    April 8th, 2013
    human-chimpanzee differences

    (image by Tim O’Brien)

    There are only about 20,000 genes in the human genome, but they generate a surprising amount of diversity. Given that 0.1% of DNA sequence differs when any two individuals are compared, and only 4% differs when a human and a chimpanzee are compared, it’s clear that protein-coding differences alone can’t account for how we differ from chimps and from one another.

    Differences in gene expression, however, are virtually unlimited in their ability to influence phenotypic diversity, and we know that many of those have a genetic basis. It stands to reason that achieving our goal of identifying functional noncoding variation will require a deep understanding of how transcription is controlled at the genetic level.

    Gene Expression Quantitative Trait Loci

    Gene expression levels themselves are a trait that can be studied and correlated to genetic variation in a particular individual. The identification of expression quantitative trait loci (eQTLs) offers insights into the mechanisms of gene transcription regulation, and also helps interpret the results of pure genetic studies, such as the GWAS.
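    As a minimal illustration of the idea, an eQTL test at a single SNP can be sketched as a simple linear regression of expression on genotype dosage (0, 1, or 2 copies of the alternate allele). The toy data and the function name below are invented for illustration; real analyses use dedicated tools and account for covariates, relatedness, and multiple testing.

```python
def regress_expression_on_genotype(dosages, expression):
    """Least-squares fit of expression = intercept + slope * dosage.

    Returns (slope, intercept, r_squared).
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy data: six individuals, expression rises with each copy of the allele.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept, r_squared = regress_expression_on_genotype(dosages, expression)
print(f"slope={slope:.2f} intercept={intercept:.2f} r^2={r_squared:.3f}")
```

    The slope is the estimated change in expression per allele copy, and a SNP whose slope is reliably different from zero is a candidate eQTL for that gene.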

    Since high-throughput gene expression and genotyping technologies became available, a number of studies have sought to use both on the same samples to better understand the relationship between genotype and transcription.

    At the same time, efforts such as the HapMap and 1,000 Genomes projects are creating incredible resources for understanding (and exploiting) the nature of genetic variation in humans. A new study on eQTLs in Genome Research demonstrates how leveraging those resources improves the power of eQTL detection, and may help uncover much of the “missing heritability” in complex human diseases.

    Gene Expression and SNP Genotype Data

    Liang et al generated a dataset of global gene expression and genome-wide SNP genotypes in two family cohorts:

    Panel | Gene Expression    | Samples
    MRCA  | Affymetrix Hu133A  | 206 siblings of British descent ascertained from a child with asthma
    MRCE  | Illumina Human6 v1 | 550 children from 320 British families ascertained from a child with eczema

    All samples were genotyped using Illumina SNP chips (ILMN300K, ILMN100K, or both). You’ll notice, however, that the gene expression platforms were from different manufacturers. The probe design is quite different: Affy uses multiple 25 bp probes per transcript, whereas Illumina uses a single 50-mer, and there’s no guarantee that the probe sites overlap.

    Even so, without making any adjustments to the data, the authors could map 2,934 individual eQTLs from the combined dataset (1,534 for MRCA/Affy, 1,784 for MRCE/Illumina).

    Reducing False Positives and Increasing Power for eQTL Detection

    The authors investigated two strategies to evaluate and improve their eQTL detection. Because current literature suggests that most eQTL associations are in cis (gene expression correlates with variants near the gene), the proportion of distant (trans) associations provides a conservative metric of false discovery rate (FDR).

    eQTL false discovery rate

    Proportion of trans effects by eQTL rank (Liang et al 2013)
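    The metric described above can be sketched in a few lines of Python: rank the associations by p-value, label each one cis or trans, and report the trans fraction in successive rank bins. The 1 Mb cis window, the record layout, and the toy hits are all assumptions made for this sketch, not details taken from the paper.

```python
CIS_WINDOW = 1_000_000  # assumed cis distance in base pairs (not from the paper)


def is_cis(hit, window=CIS_WINDOW):
    """A hit is cis if the SNP lies on the gene's chromosome, within the window."""
    same_chromosome = hit["snp_chr"] == hit["gene_chr"]
    return same_chromosome and abs(hit["snp_pos"] - hit["gene_pos"]) <= window


def trans_proportion_by_rank(hits, bin_size):
    """Fraction of trans associations in each bin of hits ranked by p-value."""
    ranked = sorted(hits, key=lambda h: h["p"])
    proportions = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        n_trans = sum(1 for h in chunk if not is_cis(h))
        proportions.append(n_trans / len(chunk))
    return proportions


# Hypothetical associations, invented for illustration.
hits = [
    {"snp_chr": "1", "snp_pos": 1_000, "gene_chr": "1", "gene_pos": 50_000, "p": 1e-30},
    {"snp_chr": "2", "snp_pos": 5_000_000, "gene_chr": "2", "gene_pos": 5_200_000, "p": 1e-20},
    {"snp_chr": "3", "snp_pos": 100, "gene_chr": "7", "gene_pos": 100, "p": 1e-8},
    {"snp_chr": "4", "snp_pos": 1_000_000, "gene_chr": "4", "gene_pos": 1_500_000, "p": 1e-6},
]

print(trans_proportion_by_rank(hits, bin_size=2))
```

    If most true eQTLs act in cis, a rising trans fraction further down the ranking suggests that more of those weaker hits are false positives.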

     Principal Components Analysis of Gene Expression

    To address the influence of non-genetic effects in their expression data, the authors estimated principal components (PCs) from the gene expression values in the family panels:

    • 69 PCs for the MRCA/Affy panel
    • 61 PCs for the MRCE/Illumina panel

    These PCs were used as covariates in the genetic association analysis to control for non-genetic effects in the data. Notably, including top PCs yielded three times as many eQTL probesets as the un-PC-adjusted analysis.
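    Here is a rough sketch of why adjusting for expression PCs can help. It simulates a hidden non-genetic factor that affects many genes, estimates the top principal component of the expression matrix, and removes it before testing a SNP against one gene. Everything here is an assumption for illustration: the simulated data, the use of a single PC (the authors used dozens), and the plain power iteration standing in for a proper PCA routine.

```python
import math
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_pc_scores(expression, iterations=200):
    """Per-sample scores on the top principal component (unit length).

    expression: list of genes, each a list of per-sample values.
    Runs power iteration on X^T X, where X is the gene-centered matrix.
    """
    rows = [center(gene) for gene in expression]
    n_samples = len(rows[0])
    scores = [1.0 + 0.01 * i for i in range(n_samples)]  # arbitrary start vector
    for _ in range(iterations):
        projections = [dot(row, scores) for row in rows]  # X v
        updated = [sum(row[s] * p for row, p in zip(rows, projections))
                   for s in range(n_samples)]             # X^T (X v)
        norm = math.sqrt(dot(updated, updated))
        scores = [u / norm for u in updated]
    return scores


def residualize(values, covariate):
    """Remove the linear effect of one covariate from a vector."""
    v, c = center(values), center(covariate)
    beta = dot(v, c) / dot(c, c)
    return [a - beta * b for a, b in zip(v, c)]


def pearson_r(x, y):
    xc, yc = center(x), center(y)
    return dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))


random.seed(0)
n = 60
genotype = [random.choice([0, 1, 2]) for _ in range(n)]
hidden = [random.gauss(0, 1) for _ in range(n)]  # simulated non-genetic factor

# Thirty background genes driven by the hidden factor, plus one target gene
# that also carries a modest genetic effect.
background = [[3 * h + random.gauss(0, 1) for h in hidden] for _ in range(30)]
target = [0.5 * g + 3 * h + random.gauss(0, 1) for g, h in zip(genotype, hidden)]

pc1 = first_pc_scores(background + [target])
r_raw = pearson_r(genotype, target)
r_adjusted = pearson_r(residualize(genotype, pc1), residualize(target, pc1))
print(f"genotype-expression r before PC adjustment: {r_raw:.3f}")
print(f"genotype-expression r after PC adjustment:  {r_adjusted:.3f}")
```

    Residualizing both the genotype and the expression on the PC is equivalent to including the PC as a covariate in the regression. The intent is that removing shared non-genetic variation leaves the genetic signal easier to detect.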

    HapMap and 1,000 Genomes Genotype Imputation

    Genotype imputation — in which missing genotypes in a sample are inferred using a panel of [ethnically-matched] individuals that have already been genotyped at those positions — is a common practice when conducting genome-wide association studies. The thinking is that you can save money by genotyping only a few hundred thousand SNPs, but then impute genotypes for the rest. Well-characterized reference panels from the HapMap and 1,000 Genomes projects now make it possible to impute genotypes for millions of SNPs genome-wide.
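    The core intuition can be shown with a deliberately naive toy: find the reference haplotype that best matches a sample at the genotyped sites, then copy its alleles into the untyped sites. This is far simpler than the statistical models used by real imputation software, and the panel, the sample, and the function name are invented for illustration.

```python
def impute_haplotype(observed, reference_panel):
    """Fill untyped sites (None) from the best-matching reference haplotype.

    observed: list of alleles (0 or 1), with None at untyped positions.
    reference_panel: list of fully typed haplotypes of the same length.
    """
    def mismatches(reference):
        return sum(1 for o, r in zip(observed, reference)
                   if o is not None and o != r)

    best = min(reference_panel, key=mismatches)
    return [r if o is None else o for o, r in zip(observed, best)]


# Hypothetical reference haplotypes over five SNP positions.
reference_panel = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]

# The sample was genotyped at positions 0, 2, and 4 only.
observed = [1, None, 0, None, 1]

print(impute_haplotype(observed, reference_panel))
```

    Real methods weigh many reference haplotypes at once, allow for recombination between them, and report a probability for each imputed genotype rather than a single hard call.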

    How accurate are imputed genotypes? How do they improve the results of genetic association studies? These are two important questions that the authors were able to address in this study. Using the MaCH program, they were able to identify stretches of shared haplotypes between the reference panels and their study samples, and impute ~2.5 million SNPs (HapMap) or ~7.4 million SNPs (1000G) into them. Because some samples were genotyped on two platforms (Illumina 100K and 300K), they were able to measure the accuracy of this imputation genome-wide at over 60,000 SNP positions. Here’s the correlation between imputed and actual genotypes:

    genotype imputation accuracy

    Genotype imputation accuracy (Liang et al, Genome Res. 2013)
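    For readers who want to run this kind of check on their own data, here is a small sketch of two common accuracy measures at a single SNP: the squared correlation between imputed dosages and array genotypes, and the concordance of best-guess calls. The numbers are invented, and the functions are illustrative rather than the procedure used in the paper.

```python
import math


def squared_correlation(observed, imputed):
    """Squared Pearson correlation between observed genotypes and imputed dosages."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    sxy = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((i - mean_i) ** 2 for i in imputed)
    return (sxy / math.sqrt(sxx * syy)) ** 2


def concordance(observed, imputed):
    """Fraction of samples whose rounded imputed dosage equals the observed genotype."""
    matches = sum(1 for o, i in zip(observed, imputed) if round(i) == o)
    return matches / len(observed)


# Hypothetical SNP typed directly in six samples and also imputed.
observed = [0, 1, 2, 1, 0, 2]            # genotypes from the array
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]  # imputed allele dosages

r2 = squared_correlation(observed, imputed)
print(f"r^2 = {r2:.3f}, concordance = {concordance(observed, imputed):.2f}")
```

    Correlation is usually more informative than concordance for rare variants, where calling every sample homozygous reference would already give high concordance.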

    Overall, the imputation accuracy is pretty good, with HapMap SNPs yielding slightly better results, which we can probably attribute to the precision of array-based genotyping compared to low-pass whole-genome sequencing. Furthermore, imputation yielded more power to detect eQTLs: 6-7% additional signals from imputing 2.4 million HapMap SNPs, and another 5-8% more signals from imputing ~8 million 1000G SNPs.

    genotype imputation increases power

    Increased power from imputation (Liang et al, Genome Res 2013)

    Imputation also provided a denser and more localized map of eQTLs, which the authors nicely demonstrate for a cis-eQTL identified for TIMM22.

    Before imputation: [figure: eQTL before imputation]
    After imputation: [figure: eQTL after imputation]

    Conclusion: PC Adjustment and Imputation for GWAS

    In summary, the authors have demonstrated that principal components analysis to control for non-genetic effects, and systematic imputation using HapMap/1000G data, serve to:

    1. Reduce false-positive associations
    2. Detect additional associations by increasing power
    3. Provide a denser and more precise map of genotype-phenotype associations

    These results demonstrate (to my satisfaction, at least) that genotype imputation should be a standard practice whenever possible for GWAS analysis, and that such studies will have even more to gain as we continue to build ever-more-powerful maps of human genetic variation.

    References
    Liang L, Morar N, Dixon AL, Lathrop GM, Abecasis GR, Moffatt MF, & Cookson WO (2013). A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines. Genome Research, 23(4), 716-726. PMID: 23345460
