Archives for February 2009

AGBT Day 2: Pac Bio, Helicos, and Complete Genomics

February 6, 2009 by Dan Koboldt

The first full day at Marco Island was overwhelming. It opened with a general session in the morning, a diverse and fascinating set of talks, including:

Jun Wang of BGI on “Sequencing, Sequencing, Sequencing,” in which he began to illustrate the quietly growing powerhouse of sequencing in China. Three campuses, 1,000 people, and who knows how many sequencers. They completed their first whole genome (a Han Chinese) late last year, and are also sequencing the rice and panda genomes. Dr. Wang’s explanation: “If it’s cute, sequence it. Tastes good? Sequence it.” Hilarious.
Eddy Rubin from the DOE-funded Joint Genome Institute on Genomics of Cellulosic Biofuels – a fascinating overview of how DOE is driving sequencing of switchgrass, poplar trees, and numerous other plants/organisms to extract sugar (and thence alcohol) from “biomass” that can be grown on marginal lands.
Cancer talks – an overview of Canadian efforts and the ICGC by Tom Hudson, and Phil Stephens (in for A. Futreal) on structural variation in cancer.
Comparative analysis of 2X mammalian genomes by Adam Siepel, a well-spoken individual who’s charged with correcting and analyzing the 40+ vertebrate genomes that have been sequenced to light coverage.

Strange Cats: Pacific Biosciences and Helicos

I was eager to hear presentations from the third-generation sequencing companies. First came Pacific Biosciences, which made a big splash last year and seems to have been the golden child since. Their SMRT-seq technology uses fluorescent color labeling of nucleotides that are read (at 3 bases/second) with a “zero-mode waveguide.” At the start of the talk, the speaker fired off a real-time run that scrolled along colorfully at the bottom of his screen for the rest of the slides. At present, they have 12 prototype instruments in house that run 3,000 reactions in parallel with average read lengths of 964 bp. Since their recent proof-of-principle paper, they’ve also sequenced a 107-kb human BAC (to 68X) and a strain of E. coli (to 38X). They claim accuracies of over 99%, though in the human BAC their technology missed 3 of 24 SNPs in non-repetitive regions and 7 of 20 SNPs in repetitive regions. Even so, the data they showed was impressive, and I’m sure I’m not the only one eager for their early-access release planned for the second half of 2009.
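To put the missed-SNP counts in perspective, here is a minimal Python sketch that converts them into detection sensitivity. It assumes the figures quoted above mean "missed out of known SNPs" in each region class; the function name and the pooled "overall" figure are my own illustration, not something shown in the talk.

```python
from fractions import Fraction


def sensitivity(missed, total):
    """Fraction of known SNPs that were detected (hypothetical helper)."""
    return Fraction(total - missed, total)


# Counts as reported for the human BAC: missed / total known SNPs.
non_repetitive = sensitivity(3, 24)
repetitive = sensitivity(7, 20)

# Pooling both region classes is an illustrative assumption.
overall = sensitivity(3 + 7, 24 + 20)

for label, value in [
    ("non-repetitive", non_repetitive),
    ("repetitive", repetitive),
    ("overall", overall),
]:
    print(f"{label}: {float(value):.1%}")
```

By this arithmetic, roughly one in eight SNPs was missed in non-repetitive sequence and about one in three in repeats, even at 68X coverage.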

Immediately following was Helicos, a company that I’d first heard of a few years ago, but haven’t noticed much since. Amusingly, they opened with the quote: “The rumors of our demise are greatly exaggerated.” You have to appreciate candor like that! Then came a not-too-subtle jab at the previous speaker: “Unlike technologies that go in a circle, we are going to show you true single-molecule data…” They sequenced the canonical strain (N2-Bristol) of C. elegans to 27X coverage (7 channels, 88 million reads, with an average error rate of 3.4%). The concerns I had were two-fold: first, error rates are fairly high, which will severely affect variant detection. Second, their reads are only 33 bp long, and they readily admit that they’re not sure why.
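The Helicos numbers can be sanity-checked with the usual back-of-the-envelope coverage formula (reads × read length ÷ genome size). The sketch below is only that: the ~100 Mb C. elegans genome size is my assumption rather than a figure from the talk, and it treats every read as exactly 33 bp and usable.

```python
def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth if every read aligned once, at full length."""
    return num_reads * read_length_bp / genome_size_bp


# Assumption: C. elegans genome is roughly 100 Mb (not stated in the talk).
C_ELEGANS_GENOME_BP = 100_000_000

NUM_READS = 88_000_000   # as reported
READ_LENGTH_BP = 33      # as reported
ERROR_RATE = 0.034       # as reported (3.4% average)

raw_coverage = mean_coverage(NUM_READS, READ_LENGTH_BP, C_ELEGANS_GENOME_BP)

# Expected number of erroneous bases in a single 33-bp read.
expected_errors_per_read = READ_LENGTH_BP * ERROR_RATE

print(f"raw coverage: {raw_coverage:.1f}X")
print(f"expected errors per read: {expected_errors_per_read:.2f}")
```

The raw estimate lands a little above the reported 27X, which seems plausible if not every read contributed to the final alignment. It also shows why the error rate worried me: at 3.4%, a 33-bp read carries about one error on average.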

Daniel MacArthur covered Complete Genomics

I chose a different session to attend, but Daniel MacArthur has a detailed profile of Complete Genomics’ presentation yesterday. Though I doubt the big genome centers will have need of a sequencing-for-service-only company, the data they presented must have been impressive, because he was pretty fired up afterward.

AGBT Day 1: Illumina Shines

February 5, 2009 by Dan Koboldt

The first day of AGBT was an exciting glimpse of what’s to come. On the shuttle from the airport I met Aaron Quinlan, formerly of Gabor Marth’s lab, who will give a talk on SV detection in mouse genomes. While we talked, Jon Armstrong from WashU put the finishing touches on his talk concerning WU-CAP, the WashU solution-phase capture platform.

My own poster on short read aligners was done, though as usual I wished for time to do even more with it. In the end I focused on 10 aligners, but more on that later.

The opening sessions offered tantalizing views of some of the running themes of this meeting. Jay Shendure gave a great overview of different approaches to capture and their principal areas of application. Ryan Morin, a graduate student in the Marra lab, walked us through a summary of transcriptome sequencing and how it can be used not only for gene expression, but also discovery of mutations and characterization of alternate splicing. He also pointed out that RNA-Seq coupled with paired-end libraries can be used to detect fusion transcripts in a straightforward fashion. Next up was Alex Meissner of the Broad Institute, who provided the epigenomics perspective – particularly how ChIP and bisulfite sequencing can be used to examine epigenetic modifications (DNA methylation and histone modifications).

Illumina Makes Ambitious Promises for 2009

One of the highlights for me was the Illumina “exclusive” event just before dinner. After a brief marketing intro, the Illumina CIO presented some improvements to the Illumina pipeline software (better clustering and quality scoring) that should add 20% more data to the already-impressive 15-20 GB of output from the GAII. The newer software can be used to re-process old data, but unfortunately, it needs all of the image files to do so (who keeps them!?). His last slide was a plot of Illumina quality compared to Sanger (from Maq) quality for a 36-bp read, which showed that the two scoring systems track pretty well, though Sanger scoring has a notable drop between bases 28 and 29 (this we knew). Still, it was amusing to hear them discuss Maq so offhandedly, something that inspired my question at the end.

Next they presented another improvement to the system that yielded insight into why we see so many false-positive SNP calls from Illumina. It turns out that once DNA pieces are attached to a flowcell, they’re amplified using a polymerase that, Illumina admitted, “has a lot of mis-incorporation.” If a mis-incorporation event happens early, it’s propagated to all of the subsequent copies, and thus hard to distinguish from a real mutation. Evidently they are releasing a new higher-fidelity polymerase to address this problem. The software and polymerase together will bring output to at least 20 GB per run. Also, there’s a machine upgrade which uses a new manifold design to get more “imageable” area on the flowcell.

Together, these improvements are being released as the Illumina GAIIx “upgrade” and will yield 35 GB per run with higher sequence quality. More intriguing were the slides about Illumina “long reads” – 2×50, 2×75, 2×100, and even 2×125. While the fact that you need multiple kits to get these read lengths was conveniently not mentioned, I must admit that they’re starting to push into the 454 domain.

Solexa + BeadArray = Semi Ordered Cluster Array

Then Illumina gave us the real highlights. They’re working to combine the Solexa technology with the BeadArray technology into a “semi-ordered cluster array” that will squeeze more reads onto a flowcell. Using micron-sized beads, by around Q3 this year they hope to have outputs of 50 GB per run. Using sub-micron beads, with 2×150 libraries, by the end of this year Illumina promised 95 GB per run.
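For a sense of scale, here is a rough Python sketch of how many read pairs the 95 GB promise implies. It assumes "GB" means gigabases (10^9 bases) and that every cluster yields a full-length 2×150 pair; neither assumption comes from Illumina's slides, and the function name is purely illustrative.

```python
def read_pairs_needed(output_bases, read_length_bp):
    """Read pairs required to reach a given output with paired-end reads."""
    bases_per_pair = 2 * read_length_bp
    return output_bases / bases_per_pair


# Assumption: 95 GB per run means 95 gigabases.
PROMISED_OUTPUT_BASES = 95_000_000_000
READ_LENGTH_BP = 150  # 2x150 libraries, as stated

pairs = read_pairs_needed(PROMISED_OUTPUT_BASES, READ_LENGTH_BP)
print(f"read pairs per run: {pairs / 1e6:.0f} million")
```

Under those assumptions a single run would need on the order of 300 million usable clusters, which is presumably why they want the tighter packing of the bead-based array.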

That’s an enormous wealth of data, and at the end of the talk, I had to ask: are you planning improvements to ELAND to catch up with the performance of other aligners (note subtle slam of ELAND) and to address the longer reads? Of course, his answer was “yes” – they plan to both improve ELAND performance, and make the entire pipeline more plug-and-play so that you can use Maq or other aligners in its place.

Countdown to AGBT

February 2, 2009 by Dan Koboldt

The Marco Island meeting is just a few short days away! As I look over the agenda, I find myself more than a little daunted at the esoteric group of scientists who’ll be presenting. From WashU there will be talks by David Dooling on next-gen informatics, Todd Wylie on miRNA sequencing, and Jon Armstrong on our liquid-phase capture technology. I’ve seen early versions of these talks, and all look to be outstanding. Our center’s leadership will be well-represented; Elaine Mardis is on the organizing committee and will chair the “New Genomic Frontiers” session on Saturday just after a keynote session by our own Rick Wilson. I’m also looking forward to hearing:

Chad Nusbaum (Broad) on technology platforms.
Keynotes by Eric Lander (Broad), Eddy Rubin (JGI), Richard Gibbs (Baylor), and Kari Stefansson (deCODE)
Tom Hudson (OICR), an old friend from the HapMap Project
Andy Futreal (Sanger) on structural variation in cancer
Howard McLeod (UNC, formerly WashU) on pharmacogenomics
Len Pennacchio (LBNL) on ChIP-seq to predict tissue-specific enhancers
Workshops by Illumina/Solexa and Roche/454
Presentations by Pacific Biosciences and Complete Genomics

Genomics Bloggers Will Meet!

I’m also looking forward to meeting Daniel MacArthur, of the Genetic Future blog (and also Sanger) who will give a talk on molecular bar-coding with Illumina sequencing. We’re planning to get together with David Dooling (PolITigenomics), Anthony Fejes (Fejes.ca), and other bloggers from the genomics community for lunch one day. Please shoot one of us an e-mail if you’d like to join us!
