Like most curmudgeons, I fought the change as stubbornly as I could. Leave Maq behind for something else? Never! Yet over the past few months I have come to realize that BWA, as it’s called, is not bad. At our genome center we still generate both Maq and BWA alignments for Illumina data; thanks to the hard work of our automated pipeline (apipe) group, I never have to run either on my own. Admittedly, even without a learning curve to daunt me, I found myself reluctant to use the BWA results over the Maq ones.
How I Came to Love BWA
Speed was obviously the most compelling argument. Whereas Maq can take 1-2 days to map a single lane of Illumina data to the human genome, BWA can do it in a few hours. Also, BWA outputs directly into SAM format, which we’ve universally (and gratefully) adopted as the standard for next-gen sequencing data. There are, as it turns out, many differences between Maq and BWA, which I’ve summarized in the table below:
| | Maq | BWA |
|---|---|---|
| Algorithm: | Hash lookup table | Burrows-Wheeler Transform |
| Input format: | Binary FastQ | FastQ |
| Output format: | MAP file | SAM file |
| Gapped Alignment: | Paired-end only; one mate must map | Single-end and paired-end |
| Maximum Read Length: | 63 bp | 200 bp (aln) or 1,000 bp (dbwtsw) |
| Assembly / SNP calling: | Tools included | None. Use SAMtools. |
| Simulation Tools: | Tools included | None |
There are other differences, but these are the ones I find most important. The ability to take standard FastQ input is a plus for BWA, but its native SAM output is critical. Granted, it’s possible to convert the output of several aligners (Maq, Bowtie, Novoalign, even BLAST) to SAM format, but direct output is more convenient. Compared to Maq, BWA is lean and mean: it has no assembly, SNP calling, or simulation tools. Its single purpose is mapping reads to a reference, and it relies on companion software (SAMtools) for subsequent analysis.
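To make the appeal of SAM a little more concrete, here is a minimal Python sketch of reading one alignment record. It assumes only the basic layout of a SAM line: eleven mandatory tab-separated columns followed by optional tags. The function name `parse_sam_line` and the example read are made up for illustration, and the column names follow the current SAM specification (early versions used different names for the mate columns). For real work, SAMtools or an established SAM/BAM library is the better choice.

```python
# Names of the eleven mandatory SAM columns, in order.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
INT_FIELDS = ("flag", "pos", "mapq", "pnext", "tlen")


def parse_sam_line(line):
    """Parse one SAM alignment line (not a '@' header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated columns")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["tags"] = cols[11:]
    # Two commonly checked FLAG bits.
    record["is_unmapped"] = bool(record["flag"] & 0x4)
    record["is_reverse"] = bool(record["flag"] & 0x10)
    return record


# Hypothetical record: an 8 bp read mapped to the reverse strand of chr1.
example = "read1\t16\tchr1\t100\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["rname"], rec["pos"], rec["cigar"], rec["is_reverse"])
```

Because every aligner that writes SAM fills in the same columns, downstream code like this does not need to know which aligner produced the file.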
Two new features in BWA, gapped alignment of single-end reads and support for longer reads, are a devastating blow to other next-generation sequencing aligners. BWA can now map single-end reads with gaps, a job one might previously have given to Novoalign, and it can map longer 454 reads, which used to be SSAHA2 territory. Like Maq, BWA has color-space functions as well.
The take-home message: BWA can map data from all three available next-generation sequencers – Illumina/Solexa, Roche/454, and ABI SOLiD.
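For readers who, like me, rarely run the aligner by hand, here is a rough sketch of what the short-read workflow looks like. It only builds the command lines and does not execute anything. The subcommands (`bwa index`, `bwa aln`, `bwa samse`, `bwa sampe`) are those of the original short-read mode of BWA; the helper name `bwa_short_read_commands` and all file names are hypothetical, and the options your installed version supports may differ, so check its documentation before relying on this.

```python
import shlex


def bwa_short_read_commands(ref, reads, out_sam, mates=None):
    """Return shell command lines for a `bwa aln` run (illustrative only).

    ref     -- reference FASTA file
    reads   -- FastQ file (read 1 for paired-end data)
    out_sam -- SAM file to write
    mates   -- optional FastQ file of read 2 for paired-end data
    """
    q = shlex.quote
    fastqs = [reads] if mates is None else [reads, mates]
    # `bwa aln` writes one intermediate .sai file per FastQ file.
    sais = [fq + ".sai" for fq in fastqs]

    commands = ["bwa index " + q(ref)]
    for fq, sai in zip(fastqs, sais):
        commands.append("bwa aln {} {} > {}".format(q(ref), q(fq), q(sai)))

    # samse for single-end data, sampe for paired-end data.
    subcommand = "samse" if mates is None else "sampe"
    args = [ref] + sais + fastqs
    commands.append("bwa {} {} > {}".format(
        subcommand, " ".join(q(a) for a in args), q(out_sam)))
    return commands


# Hypothetical file names for one paired-end Illumina lane.
for cmd in bwa_short_read_commands("ref.fa", "lane1_1.fq", "lane1.sam",
                                   mates="lane1_2.fq"):
    print(cmd)
```

The final command in each case writes SAM directly, which is the point where SAMtools takes over for everything downstream.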
The Heng Li Factor
No software tool can remain competitive without continued development. Thus, when I heard that Heng Li did not plan any further releases of Maq, I knew its days were numbered. Some of us had already gotten the impression that Heng was focusing on BWA more than Maq anyway, but that made it official. Maq is still an incredible piece of software, and it represented a technological leap for NGS analysis that helped bring about the era of whole-genome sequencing. We were among the first to climb aboard the Maq train, and it took us far. Now, sadly but with eager anticipation, we transfer to the express train, the bullet train. BWA.
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754-1760. DOI: 10.1093/bioinformatics/btp324