Comparison of Benchtop Sequencers

Working at a major genome center can skew one’s view of the scientific community. You forget, for example, that not every research lab has access to dozens of next-gen sequencers churning out data and an entire building of computing infrastructure to help analyze it. In fact, there’s a very strong market for smaller, cheaper instruments that meet the needs and budget requirements of a smaller lab. Three different benchtop next-gen sequencers have come on the market to address that need: the 454 GS Junior (Roche), MiSeq (Illumina), and Ion Torrent PGM (Life Technologies).

This week in Nature Biotechnology, Nick Loman and his colleagues from the University of Birmingham (UK) present a performance comparison of these three instruments by sequencing an E. coli strain linked to an outbreak of food poisoning in Germany last year. This experiment is well-suited to the comparison for two reasons:

This strain of E. coli has already been studied in previous whole-genome sequencing efforts; the key genetic structures underlying its toxicity have been extensively characterized.
The rapid turnaround and modest throughput of benchtop sequencers are well suited to sequencing bacterial genomes.

Reference Assembly for E. coli O104:H4 strain 280

To enable comparisons of the benchtop sequencers, the authors first generated a reference assembly for the bacterial isolate. Their sample came from a female traveler who, after traveling to Germany, developed hemolytic uremic syndrome and thrombotic thrombocytopenic purpura, which I take as fancy terms for “really bad food poisoning.” Using standard (not benchtop) Roche/454 instruments, the authors generated two datasets:

Long reads on the 454 FLX+ platform, with a modal read length of 812 bp and maximum read lengths >1100 bp.
Large-insert (8kb) paired-end reads on the 454 FLX platform with Titanium chemistry.

The combination of these datasets resulted in a “very high quality draft genome assembly” with three scaffolds, which are essentially big blocks of assembled sequence. The largest of these corresponded to the 5.3 Mbp bacterial chromosome, while two smaller scaffolds corresponded to two large E. coli plasmids. There were still some gaps in these assemblies, but overall they were pretty good.

Benchtop Sequencer Characteristics

There are a number of differences among the three benchtop platforms in terms of instrument cost, run time, sample prep, throughput, and even (as the authors found) data quality, all of which are important considerations for the lab looking to buy one. That’s part of why this study is important; it provides a direct and somewhat unbiased comparison of these platforms in a real-world application setting.

You should be aware of some possible financial conflicts of interest: the first author has been a paid speaker at IonTorrent and Illumina meetings, and the senior author won his IonTorrent instrument through the European PGM Grant Program. In spite of this, I find their comparison to be very fair with regard to all three platforms.

Platform	454 GS Junior	IonTorrent PGM	Illumina MiSeq
Instrument Cost:	$108,000	$80,490	$125,000
Sample Prep:	Emulsion PCR	Emulsion PCR	On-instrument
Run Time:	4h	3h	27h
Cost per Run:	$1,100	$425 (316 chip)	$750
Throughput/run:	71-72 Mbp	260-304 Mbp	1,653 Mbp
Avg. Read Length:	522 bp	123 bp	2 x 150 bp
Reads Aligned:	99%	90%	99%

Note that the MiSeq throughput was enough that the authors multiplexed some other samples on the same run; the actual dataset generated for the comparison totaled 250 Mbp.
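
To make the table easier to compare, here is a rough sketch that turns the per-run figures into a reagent cost per megabase. The numbers are copied from the table above; taking the midpoint of each throughput range is my own simplification, and the function and variable names are just illustrative.

```python
# Per-run figures copied from the table above. Throughput ranges are
# collapsed to their midpoint (an assumption made here for simplicity).
platforms = {
    "454 GS Junior": {"cost_per_run": 1100, "mbp_per_run": (71 + 72) / 2},
    "IonTorrent PGM (316 chip)": {"cost_per_run": 425, "mbp_per_run": (260 + 304) / 2},
    "Illumina MiSeq": {"cost_per_run": 750, "mbp_per_run": 1653},
}


def cost_per_mbp(cost_per_run, mbp_per_run):
    """Reagent cost in dollars per megabase for a single run."""
    return cost_per_run / mbp_per_run


for name, figures in platforms.items():
    print(f"{name}: ${cost_per_mbp(**figures):.2f} per Mbp")
```

That works out to roughly $15 per Mbp for the GS Junior, $1.50 for the PGM, and $0.45 for the MiSeq. It only counts the per-run cost listed in the table, so instrument price, sample prep, and staff time are all left out.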

Base Quality Score

Each instrument manufacturer has its own software algorithm to generate base qualities, so a direct comparison of these is difficult. To address this, the authors recalibrated base qualities by alignment to the reference genome. Their quality score takes into account the number of matches and mismatches between read and reference sequence, since these generally represent sequencing errors. By this metric, the MiSeq produced the highest-quality reads with few mismatches and virtually no indel errors. There was generally good agreement between this score and the one provided by the manufacturer’s software, though the PGM slightly under-estimated base quality and the other instruments slightly over-estimated it.
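
For readers who have not seen this kind of recalibration, the idea can be sketched as an empirical Phred score: count how often bases agree or disagree with the reference, then convert the error rate to the usual -10 * log10 scale. This is a minimal illustration, not the authors' actual procedure; the pseudocount and the function name are my assumptions.

```python
import math


def empirical_phred(matches, mismatches):
    """Phred-scaled quality estimated from agreement with the reference."""
    total = matches + mismatches
    if total == 0:
        raise ValueError("no aligned bases observed")
    # Pseudocount (an assumption of this sketch) so that a bin with zero
    # observed mismatches does not produce an infinite quality score.
    error_rate = (mismatches + 1) / (total + 2)
    return -10 * math.log10(error_rate)


# Example: bases that the instrument labeled with some quality value,
# of which 9,990 matched the reference and 10 did not.
print(round(empirical_phred(9990, 10), 1))
```

If the empirical score for a group of bases comes out higher than the score the instrument assigned, the instrument under-estimated quality for that group; if it comes out lower, the instrument over-estimated it.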

Homopolymer Errors on 454 and Ion Torrent

The 454 sequencing platform is infamous for sequencing errors associated with runs of a single base (homopolymers). Indeed, the base quality recalibration revealed 0.38 indel errors per 100 sequenced bases, or 1.74 indels per read. This issue was a concern for the IonTorrent platform as well; homopolymer-associated errors were quite obvious in the first public release of IonTorrent 316 chip data despite the spoken assurances from Jonathan Rothberg at AGBT 2010 when the question was raised by a certain blogger.

In the current study, homopolymer issues were again apparent on the Ion Torrent PGM platform; there were 1.5 indel errors per 100 bases, or 1.72 indels per read. Even homopolymers of 2-3 bases caused a significant number of sequencing errors. This put the PGM at a disadvantage for sequence assembly; it had large numbers of gaps in its assemblies relative to the other two platforms, likely because it could not match the accuracy of the MiSeq or the read length of the GS Junior.
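
If you want to see how much of a genome is exposed to this problem, it is easy to list the homopolymer runs in a sequence. The sketch below is only an illustration (the function name and the default minimum length of 2 are my choices); it says nothing about how the authors tallied errors.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=2):
    """Return (start, base, length) for each run of identical bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGGGTTA"))
# [(2, 'G', 3), (5, 'T', 2)]
```

Since even runs of 2-3 bases caused trouble on the PGM, a low `min_len` is the relevant setting here.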

Comparison of De Novo Assemblies

Speaking of assemblies, we must appreciate the work of Nick Loman and his co-authors in dutifully generating assemblies with four different assembly programs (MIRA, Newbler, Velvet, and CLC Assembly Cell). That’s a lot of work. Depending on how the assemblies were generated, they fell into two groups:

Both IonTorrent PGM datasets, single 454 GS Junior runs, and single-end (ignoring pairing) MiSeq data yielded heavily fragmented assemblies.
Combining both GS Junior runs, or utilizing the read pairing information in MiSeq, yielded less fragmented assemblies.
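
One common way to put a number on "fragmented" is the N50 contig length: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembled bases. I am not claiming this is the metric the authors leaned on; the sketch below simply shows the calculation, with a made-up list of contig lengths.

```python
def n50(contig_lengths):
    """Contig length at which the longest contigs hold half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths, purely for illustration.
print(n50([80, 70, 50, 40, 30, 20]))
# 70
```

A more fragmented assembly of the same genome has many short contigs and therefore a lower N50.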

None of the assemblies aligned unambiguously to cover 100% of the high-quality reference that was generated from long-read and long-insert data. Contigs from the 454 data covered a greater proportion of the reference (96.28%) than MiSeq (96.05%) or PGM (95.4%).
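
Coverage figures like these boil down to taking the union of the reference intervals that contigs align to and dividing by the reference length. Here is a minimal sketch under my own assumptions: alignments are given as half-open (start, end) coordinates on a single reference sequence, and the name `covered_fraction` is hypothetical.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of the reference covered by the union of the intervals."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


# Toy example: three aligned contigs on a 100 bp reference.
print(covered_fraction([(0, 50), (40, 80), (90, 100)], 100))
# 0.9
```

Merging overlaps first matters, because simply summing contig lengths would count doubly covered regions twice.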

Choosing a Benchtop Sequencer

To their credit, the authors find advantages for each benchtop sequencer. The GS Junior, though affected by homopolymer issues, yielded the longest read lengths and the greatest assembly coverage. The MiSeq offered the highest throughput and sequencing accuracy; its assemblies also generated accurate MLST (multi-locus sequence typing) profiles for the E. coli strain. It’s also the only platform that doesn’t require emulsion PCR as part of the sample prep. The IonTorrent is the lowest-price instrument, and offers greater flexibility in reagent costs because three different chips are available. It’s also a platform undergoing rapid development and improvement.

There’s no clear winner among benchtop sequencers in this comparison, meaning that researchers will have to consider all of the pros and cons to make the choice that’s best for their individual needs.

References

Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, & Pallen MJ (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology. PMID: 22522955

Comments

Gavin Oliver says

April 27, 2012 at 1:07 pm

Great work by Nick and team. He was kind enough to summarise some of his findings for me pre-publication which helped guide our own benchtop purchase.

I’m just wondering if generating the reference genome by 454 sequencing is likely to introduce homopolymer issues to the reference and/or improve the 454 results in any way? I don’t have much hands-on experience of the PGM or 454 homopolymer problems so I might be way off!