The first day of AGBT was an exciting glimpse of what’s to come. On the shuttle from the airport I met Aaron Quinlan, formerly of Gabor Marth’s lab, who will give a talk on SV detection in mouse genomes. While we talked, Jon Armstrong from WashU put the finishing touches on his talk concerning WU-CAP, the WashU solution-phase capture platform.
My own poster on short read aligners was done, though as usual I wished for time to do even more with it. In the end I focused on 10 aligners, but more on that later.
The opening sessions offered tantalizing views of some of the running themes of this meeting. Jay Shendure gave a great overview of different approaches to capture and their principal areas of application. Ryan Morin, a graduate student in the Marra lab, walked us through a summary of transcriptome sequencing and how it can be used not only for gene expression, but also for the discovery of mutations and the characterization of alternative splicing. He also pointed out that RNA-Seq coupled with paired-end libraries can be used to detect fusion transcripts in a straightforward fashion. Next up was Alex Meissner of the Broad Institute, who provided the epigenomics perspective, particularly how ChIP and bisulfite sequencing can be used to examine epigenetic modifications (DNA methylation and histone modifications).
Illumina Makes Ambitious Promises for 2009
One of the highlights for me was the Illumina “exclusive” event just before dinner. After a brief marketing intro, the Illumina CIO presented some improvements to the Illumina pipeline software (better clustering and quality scoring) that should add 20% more data to the already-impressive 15-20 GB of output from the GAII. The newer software can be used to re-process old data, but unfortunately, it needs all of the image files to do so (who keeps them!?). His last slide was a plot of Illumina quality compared to Sanger (from Maq) quality for a 36-bp read, which showed that the two scoring systems track pretty well, though Sanger scoring has a notable drop between bases 28 and 29 (this we knew). Still, it was amusing to hear them discuss Maq so offhandedly, something that inspired my question at the end.
Next they presented another improvement to the system that yielded insight into why we see so many false-positive SNP calls from Illumina. It turns out that once DNA pieces are attached to a flowcell, they’re amplified using a polymerase that, as Illumina admitted, “has a lot of mis-incorporation.” If a mis-incorporation event happens early in amplification, it’s propagated to all of the subsequent copies, and is thus hard to distinguish from a real mutation. Evidently they are releasing a new higher-fidelity polymerase to address this problem. The software and polymerase together will bring output to at least 20 GB per run. There’s also a machine upgrade that uses a new manifold design to get more “imageable” area on the flowcell.
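To put a rough number on why an early mis-incorporation is so hard to tell from a real variant, here is a back-of-the-envelope sketch in Python. It assumes idealized amplification in which every molecule in the cluster is copied exactly once per cycle (real cluster amplification is messier), and the function name `error_fraction` is purely illustrative, not anything from Illumina's software.

```python
def error_fraction(cycle: int) -> float:
    """Fraction of the final cluster carrying an error that first appeared
    in a single newly synthesized strand during `cycle` (1-based).

    Assumption: perfect doubling, so after cycle k there are 2**k molecules
    and exactly one of them is the original erroneous copy. Every later
    cycle doubles erroneous and correct molecules alike, so the fraction
    stays fixed at 1 / 2**k.
    """
    if cycle < 1:
        raise ValueError("cycle must be >= 1")
    return 1.0 / (2 ** cycle)


if __name__ == "__main__":
    for cycle in (1, 2, 3, 5, 10):
        print(f"error in cycle {cycle:2d}: {error_fraction(cycle):.4%} of the cluster")
```

Under this simplified model, an error in the very first copy ends up in half of the molecules in the cluster, which is plenty to muddy the base call, while an error ten cycles in is carried by roughly one molecule in a thousand and gets drowned out by the correct signal.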
Together, these improvements are being released as the Illumina GAIIx “upgrade” and will yield 35 GB per run with higher sequence quality. More intriguing were the slides about Illumina “long reads” – 2×50, 2×75, 2×100, and even 2×125. While the fact that you need multiple kits to get these read lengths was conveniently not mentioned, I must admit that they’re starting to push into the 454 domain.
Solexa + BeadArray = Semi-Ordered Cluster Array
Then Illumina gave us the real highlights. They’re working to combine the Solexa technology with the BeadArray technology into a “semi-ordered cluster array” that will squeeze more reads onto a flowcell. Using micron-sized beads, they hope to reach outputs of 50 GB per run by around Q3 this year. Using sub-micron beads with 2×150 libraries, Illumina promised 95 GB per run by the end of this year.
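For a sense of scale, the promised 95 GB figure can be turned into an approximate number of read pairs. The sketch below assumes that "GB" here means 10^9 sequenced bases and that every read reaches the full 2×150 length; the helper name `read_pairs` is made up for illustration.

```python
def read_pairs(output_gigabases: float, read_length: int) -> float:
    """Approximate number of read pairs in a paired-end run.

    Assumptions: 1 GB = 1e9 bases, and each pair contributes exactly
    2 * read_length bases (no trimmed or filtered reads).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    total_bases = output_gigabases * 1e9
    return total_bases / (2 * read_length)


if __name__ == "__main__":
    pairs = read_pairs(95, 150)
    print(f"about {pairs / 1e6:.0f} million read pairs per run")
```

With those assumptions, 95 GB of 2×150 data works out to a bit over 300 million read pairs per run, all of which would need to be aligned.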
That’s an enormous wealth of data, and at the end of the talk, I had to ask: are you planning improvements to ELAND to catch up with the performance of other aligners (note subtle slam of ELAND) and to address the longer reads? Of course, his answer was “yes”: they plan both to improve ELAND performance and to make the entire pipeline more plug-and-play so that you can use Maq or other aligners in its place.