Heng Li’s brilliant short-read alignment tool finally went legit with a publication in Genome Research that came online this month. It’s an important milestone for the open source tool that, by most accounts, outperforms just about every other next-gen alignment algorithm released so far.
To commemorate the occasion, I decided to put together this list of my Ten Favorite Things About Maq:
10. The map file. This single file is a one-stop shop. It keeps the alignments, sequences, everything you need to process Solexa data.
9. Random placement. Reads in repeats are assigned alignment scores of zero and randomly placed, which helps paint a more accurate picture of the sequencing coverage across a genome.
8. Conversion tools. Although you have to convert just about any input file 2-3 times, at least maq provides all of the conversion scripts.
7. No gaps, please. Maq generally won’t even try for gapped alignments for short reads, a decision that I wholeheartedly support.
6. The version. It’s widely used and well documented, yet the version number hasn’t even reached 0.7.
5. Binary files. You know a program’s fast when it won’t touch ASCII input.
4. Alignment qualities. The real reason maq is superior to most aligners: maq uses individual base qualities when searching for a read’s best alignment.
3. Read simulation. Maq will “train” itself on a real data set, then generate simulated Solexa reads from a reference sequence based on the “real” data characteristics.
2. Good docs. For once, software that comes with complete, usable documentation.
1. The name. You might think “maq” is confusing, but it’s better than the old name, mapASS.
It reminds me of this gem of dialogue from “The Princess Bride”:
Prince: Such an unusual name, “Latrine.” How did your family come by it?
Latrine: We changed it in the 9th century.
Prince: You mean you changed it TO “Latrine”?
Latrine: Yeah. Used to be “Shithouse.”
Maq is good stuff. Thanks, Brian, for showing me the light.
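For the curious, here is a toy sketch of how items 4 and 9 fit together. It is not Maq's actual code: the function names are made up for illustration, it scans every ungapped position instead of using seeds, and the "confidence" it returns is a crude stand-in (the gap between the best and second-best scores), not Maq's real mapping quality formula. The idea it shows is that each candidate position is scored by summing the base qualities at mismatched bases, the lowest sum wins, and a read with several equally good positions is placed at random with a quality of zero.

```python
import random

def mismatch_quality_sum(read, quals, ref, pos):
    """Sum the base qualities at positions where the read disagrees with the reference."""
    window = ref[pos:pos + len(read)]
    return sum(q for r, w, q in zip(read, window, quals) if r != w)

def place_read(read, quals, ref, rng=random):
    """Return (position, confidence) for an ungapped placement of one read.

    Hypothetical helper for illustration only. Assumes len(read) <= len(ref).
    """
    n = len(read)
    scores = {pos: mismatch_quality_sum(read, quals, ref, pos)
              for pos in range(len(ref) - n + 1)}
    best = min(scores.values())
    ties = [pos for pos, s in scores.items() if s == best]
    if len(ties) > 1:
        # Repeat: several equally good positions, so pick one at random
        # and report a confidence of zero.
        return rng.choice(ties), 0
    others = [s for pos, s in scores.items() if pos != ties[0]]
    # Toy confidence: how much worse the runner-up is (an assumption, not Maq's formula).
    confidence = min(others) - best if others else 0
    return ties[0], confidence

# A mismatch on a low-quality base costs less than one on a high-quality base,
# so the same read can land in different places depending on its qualities.
print(place_read("ACGC", [30, 30, 30, 5], "ACGTGGTCGC"))
print(place_read("ACGC", [5, 30, 30, 30], "ACGTGGTCGC"))
```

Reads placed with a confidence of zero still count toward coverage, which is the point of item 9, but downstream steps can filter them out by that score.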
Two more favorites:
10: The Sanger pileup file. This seems like a great one-stop shop for track information, including SNPs, quality scores, and position within the read if you so desire.
11: Heng. Incredibly responsive to questions and requests.
One drawback:
Maq can spend an inordinate amount of time on low-complexity or repetitive sequences. It would be nice to have an option to disregard reads with more than a certain number of seed hits.
12. Integration with other aligners, if you don’t want to use “maq map” to do alignments. In the latest development version of maq there are converters for eland (eland2maq) and novocraft (novo2maq), and you still have the ability to use all those neat downstream tools like assemble, cns2win, cns2snp, mapstat, abpair, etc.
I’ll second point (11): Li Heng is fantastic at helping out with maq-related problems.
Thanks to the folks at GT for picking up this post in the Daily Scan:
http://www.genome-technology.com/issues/blog/general/149036-1.html
The quote is not from “The Princess Bride” but from “Men in Tights.”