Heng Li’s brilliant short-read alignment tool, Maq, finally went legit with a publication in Genome Research that came online this month. It’s an important milestone for the open-source tool that, by most accounts, outperforms just about every next-gen alignment algorithm to come out.
To commemorate the occasion, I decided to put together this list of my Ten Favorite Things About Maq:
10. The map file. This single file is a one-stop shop. It keeps the alignments, sequences, everything you need to process Solexa data.
9. Random placement. Reads that fall in repeats get a mapping quality of zero and are placed at random among their equally good positions, which helps paint a more accurate picture of the sequencing coverage across a genome.
8. Conversion tools. Although you have to convert just about any input file 2-3 times, at least Maq provides all of the conversion scripts.
7. No gaps, please. Maq generally won’t even try for gapped alignments for short reads, a decision that I wholeheartedly support.
6. The version. It’s widely used and well documented, yet it hasn’t even reached version 0.7.
5. Binary files. You know a program’s fast when it won’t touch ASCII input.
4. Alignment qualities. The real reason Maq is superior to most aligners: it uses individual base qualities when searching for a read’s best alignment.
3. Read simulation. Maq will “train” itself on a real data set, then generate simulated Solexa reads from a reference sequence based on the “real” data characteristics.
2. Good docs. For once, software that comes with complete, usable documentation.
1. The name. You might think “maq” is confusing, but it’s better than the old name, mapASS.
It reminds me of this gem of dialogue from “Robin Hood: Men in Tights”:
Prince: Such an unusual name, “Latrine.” How did your family come by it?
Latrine: We changed it in the 9th century.
Prince: You mean you changed it TO “Latrine”?
Latrine: Yeah. Used to be “Shithouse.”
Maq is good stuff. Thanks, Brian, for showing me the light.
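P.S. For anyone wondering what items 9 and 4 look like in practice, here is a toy Python sketch. It is not Maq's code or its actual scoring formula. It assumes ungapped placement only, scores each candidate position by summing the Phred qualities of mismatched bases, and uses a made-up mapping quality (the gap to the runner-up score, capped at 99). The function names `mismatch_quality_sum` and `place_read` are hypothetical.

```python
import random


def mismatch_quality_sum(read, quals, ref, pos):
    """Sum the base qualities of read bases that disagree with ref at pos (ungapped)."""
    window = ref[pos:pos + len(read)]
    return sum(q for base, q, ref_base in zip(read, quals, window) if base != ref_base)


def place_read(read, quals, ref, rng=random):
    """Return (position, mapping_quality) for the best ungapped placement of read on ref.

    Lower mismatch-quality sums are better, so a mismatch at a low-quality
    base costs less than one at a high-quality base (item 4).
    """
    scores = {pos: mismatch_quality_sum(read, quals, ref, pos)
              for pos in range(len(ref) - len(read) + 1)}
    best = min(scores.values())
    ties = [pos for pos, score in scores.items() if score == best]

    if len(ties) > 1:
        # Repeat: several equally good positions, so pick one at random
        # and report a mapping quality of zero (item 9).
        return rng.choice(ties), 0

    # Toy mapping quality (an assumption, not Maq's formula):
    # the gap between the best score and the runner-up, capped at 99.
    others = [score for pos, score in scores.items() if pos != ties[0]]
    runner_up = min(others) if others else 99
    return ties[0], min(99, runner_up - best)
```

With the reference `"ACGTTGGGGGTCGTA"`, the read `"ACGTA"` is one mismatch away from two sites. Give its last base a low quality and `place_read` prefers the site where that base is the mismatch; give its first base the low quality instead and the other site wins. A read like `"ACGT"` against `"ACGTACGT"` matches twice, so it lands at one of the two positions at random with mapping quality zero.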