Working at a major genome center can skew one’s view of the scientific community. You forget, for example, that not every research lab has access to dozens of next-gen sequencers churning out data and an entire building of computing infrastructure to help analyze it. In fact, there’s a very strong market for smaller, cheaper instruments that meet the needs and budget requirements of a smaller lab. Three different benchtop next-gen sequencers have come on the market to address that need: the 454 GS Junior (Roche), MiSeq (Illumina), and Ion Torrent PGM (Life Technologies).
This week in Nature Biotechnology, Nick Loman and his colleagues from the University of Birmingham (UK) present a performance comparison of these three instruments by sequencing an E. coli strain linked to an outbreak of food poisoning in Germany last year. This experiment is well-suited to the comparison for two reasons:
- This strain of E. coli has already been sequenced in previous whole-genome efforts, and the key genetic structures underlying its toxicity have been extensively characterized.
- The rapid turnaround and lower throughput of benchtop sequencers are a good fit for sequencing bacterial genomes.
Reference Assembly for E. coli O104:H4 strain 280
- Long reads on the 454 FLX+ platform, with a modal read length of 812 bp and maximum read lengths >1100 bp.
- Large-insert (8kb) paired-end reads on the 454 FLX platform with Titanium chemistry.
Benchtop Sequencer Characteristics
| Platform | 454 GS Junior | Ion Torrent PGM | Illumina MiSeq |
| --- | --- | --- | --- |
| Sample prep | Emulsion PCR | Emulsion PCR | On-instrument |
| Cost per run | $1,100 | $425 (316 chip) | $750 |
| Throughput per run | 71-72 Mbp | 260-304 Mbp | 1,653 Mbp |
| Avg. read length | 522 bp | 123 bp | 2 x 150 bp |
Note that the MiSeq throughput was enough that the authors multiplexed some other samples on the same run; the actual dataset generated for the comparison totaled 250 Mbp.
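To put the table in per-base terms, here is a rough sketch that divides the cost per run by the throughput per run. The numbers are copied from the table; using the midpoint of each throughput range is my own simplification, and the result ignores instrument price, labor, and sample prep.

```python
# Cost and throughput figures copied from the table above.
# Assumption: where a throughput range is given, use its midpoint.
runs = {
    "454 GS Junior": {"cost_usd": 1100, "throughput_mbp": (71 + 72) / 2},
    "Ion Torrent PGM (316 chip)": {"cost_usd": 425, "throughput_mbp": (260 + 304) / 2},
    "Illumina MiSeq": {"cost_usd": 750, "throughput_mbp": 1653},
}


def cost_per_mbp(cost_usd: float, throughput_mbp: float) -> float:
    """Reagent cost per megabase for a single run."""
    return cost_usd / throughput_mbp


costs = {name: cost_per_mbp(**run) for name, run in runs.items()}

# Print platforms from cheapest to most expensive per megabase.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f} per Mbp")
```

By this crude measure the MiSeq comes out well under a dollar per Mbp, the PGM at roughly $1.50, and the GS Junior at roughly $15, though the GS Junior figure buys much longer reads.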
Base Quality Score
Each instrument manufacturer has its own software algorithm to generate base qualities, so a direct comparison of these is difficult. To address this, the authors recalibrated base qualities by alignment to the reference genome. Their quality score takes into account the number of matches and mismatches between each read and the reference sequence, since mismatches generally represent sequencing errors. By this metric, the MiSeq produced the highest-quality reads, with few mismatches and virtually no indel errors. There was generally good agreement between this score and the one provided by the manufacturer’s software, though the PGM slightly under-estimated base quality and the other instruments slightly over-estimated it.
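The idea behind an alignment-based quality score can be sketched with the standard Phred definition, Q = -10 * log10(error rate). This is only an illustration of the principle, not the authors' actual recalibration procedure; the function name and the counts below are hypothetical.

```python
import math


def empirical_phred(matches: int, mismatches: int) -> float:
    """Phred-scaled quality from aligned base counts: Q = -10 * log10(error rate).

    Assumption: every mismatch against the reference is a sequencing error.
    Indel errors could be folded into the `mismatches` count in the same way.
    """
    total = matches + mismatches
    if total == 0:
        raise ValueError("no aligned bases to score")
    if mismatches == 0:
        return math.inf  # no observed errors, so the estimate is unbounded
    return -10 * math.log10(mismatches / total)


# Hypothetical example: bases that the instrument software reported as Q28.
reported_q = 28
observed_q = empirical_phred(matches=99_900, mismatches=100)

if observed_q > reported_q:
    print("software under-estimates quality for this bin")
else:
    print("software over-estimates (or matches) quality for this bin")
```

With these made-up counts the observed error rate is 1 in 1,000, or about Q30, so software reporting Q28 would be slightly pessimistic, which is the pattern described for the PGM.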
Homopolymer Errors on 454 and Ion Torrent
The 454 sequencing platform is infamous for sequencing errors associated with runs of a single base (homopolymers). Indeed, the base quality recalibration revealed 0.38 indel errors per 100 sequenced bases, or 1.74 indels per read. This issue was a concern for the IonTorrent platform as well; homopolymer-associated errors were quite obvious in the first public release of IonTorrent 316 chip data despite the spoken assurances from Jonathan Rothberg at AGBT 2010 when the question was raised by a certain blogger.
In the current study, homopolymer issues were again apparent on the Ion Torrent PGM platform; there were 1.5 indel errors per 100 bases, or 1.72 indels per read. Even homopolymers of 2-3 bases caused a significant number of sequencing errors. This put the PGM at a disadvantage for sequence assembly; it had large numbers of gaps in its assemblies relative to the other two platforms, likely because it could not match the accuracy of the MiSeq or the read length of the GS Junior.
Comparison of De Novo Assemblies
Speaking of assemblies, we must appreciate the work of Nick Loman and his co-authors in dutifully generating assemblies with four different assembly programs (MIRA, Newbler, Velvet, and CLC Assembly Cell). That’s a lot of work. Depending on how assemblies were generated, they fell into two groups:
- Both IonTorrent PGM datasets, single 454 GS Junior runs, and single-end (ignoring pairing) MiSeq data yielded heavily fragmented assemblies.
- Combining both GS Junior runs, or utilizing the read pairing information in MiSeq, yielded less fragmented assemblies.
None of the assemblies aligned unambiguously to cover 100% of the high-quality reference that was generated from long-read and long-insert data. Contigs from the 454 data covered a greater proportion of the reference (96.28%) than MiSeq (96.05%) or PGM (95.4%).
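A coverage percentage like the ones above boils down to merging the reference intervals that contigs align to and dividing by the reference length. The sketch below shows that calculation under simple assumptions: ambiguous alignments have already been filtered out upstream, and the intervals and reference length are made up for illustration rather than taken from the paper.

```python
def covered_fraction(intervals, reference_length):
    """Fraction of the reference covered by at least one aligned contig.

    `intervals` holds (start, end) pairs in 0-based, half-open reference
    coordinates. Overlapping or touching intervals are merged so that no
    reference base is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


# Hypothetical alignments against a 1,000 bp reference.
alignments = [(0, 400), (300, 700), (800, 1000)]
print(f"{covered_fraction(alignments, 1000):.2%} of the reference covered")
```

Here the first two intervals overlap and merge into one 700 bp block, so 900 of the 1,000 reference bases are covered.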
Choosing a Benchtop Sequencer
To their credit, the authors find advantages for each benchtop sequencer. The GS Junior, though affected by homopolymer issues, yielded the longest read lengths and the greatest assembly coverage. The MiSeq offered the highest throughput and sequencing accuracy; its assemblies also generated accurate MLST (multi-locus sequence typing) profiles for the E. coli strain. It’s also the only platform that doesn’t require emulsion PCR as part of the sample prep. The IonTorrent is the lowest-price instrument, and offers greater flexibility in reagent costs because three different chips are available. It’s also a platform undergoing rapid development and improvement.
There’s no clear winner among benchtop sequencers in this comparison, meaning that researchers will have to consider all of the pros and cons to make the choice that’s best for their individual needs.
Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, & Pallen MJ (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology. PMID: 22522955