The Open Source Software Debate in NGS Bioinformatics

The rise of next-generation sequencing technology has been a boon for the field of bioinformatics, since the unprecedented throughputs, along with the diversity of possible applications in research and healthcare, brought forth a new generation of software tools for sequence analysis and interpretation. The fact that the growth in sequencing throughput has outstripped Moore’s Law for several years running has forced incredible innovation in the field of bioinformatics tool development, because we no longer have the luxury of more computing power than we could ever possibly use.

The demand for new and improved analytical tools has only increased as next-gen sequencing technologies have become accessible to the wider research community. NGS bioinformatics has become an industry of its own: researchers can now make a career out of it, and countless private organizations are trying to sell it. Unlike the market for sequencing technology, which is dominated by Illumina, the market for sequence analysis tools and platforms remains wide open.

Open Source Software Innovation

Importantly, many of the most innovative and popular tools, such as the BWA-MEM aligner, are open source software packages developed by academic researchers. The free-to-use, open source license is undoubtedly a huge factor in their success, as it confers several key advantages:

Rapid adoption by the research community to establish a strong user base
Community-sourced code improvements and support
Incorporation and expansion into other tools and pipelines
Free, fully-featured hosting on sites like SourceForge

There’s also a general sentiment among programmers and bioinformaticians (the people who build, implement, and apply software tools) that free, open-source software is a good and noble thing, particularly when it provides a cheap alternative to commercial software monopolies, e.g. the rise of Linux as a competitor to Microsoft Windows.

Disadvantages of Free Open Source

Although choosing a free, open-source software model for bioinformatics tools has benefits, it also carries some disadvantages. Just as bioinformatics analysis is not free, neither is software development. Good bioinformatics software must be maintained, supported, and improved to remain competitive and useful in this rapidly evolving field. This can become a substantial burden for developers, one that takes time away from developing new tools, writing grants, publishing papers, etc. Often, promising bioinformatics tools don’t get this follow-through, which leads many researchers to hesitate before adding a new software component to their pipeline.

The open nature of the code can also be a disadvantage, because it allows one’s competitors to see exactly what was done, and how it was done — information that they can incorporate into their own competing tools. Most open source licenses also permit commercial entities to freely modify and adapt software into a “product” that they can sell for a profit. This is essentially the business model for many commercial NGS software providers.

The Financial Crunch

In theory, bioinformaticians can continue to develop and publish open source software because their work is supported by grant funding. This model works quite well in a world where there’s plenty of grant money to go around. But we don’t live in that world: research budgets are shrinking, and competition for grants is higher than ever. This means that the researchers who develop crucial bioinformatics tools may not be able to find the funding to improve and support them. That leaves two options:

Stop supporting, developing, and improving the software tool, or
Identify new, alternative funding sources that could support the work

A possible solution for door number two would be to find a way to generate revenue from bioinformatics tools. In theory, allowing them to be licensed by other groups and organizations could provide the funds to sustain development. Of course, one loses some of the advantages of free open-source software: this will limit adoption of the tool, and can also be damaging to its reputation.

The Middle Ground: Free for Non-Commercial Use

There is a middle ground, which is developing open source software packages that are free for non-commercial use. In essence, this allows researchers at academic and nonprofit institutions to use the tool without paying for it. Commercial users, however — biotechs, pharmaceutical companies, bioinformatics software/service providers — must negotiate and pay for a license. It’s not a perfect solution, because such a license does not allow a tool to be hosted on SourceForge and may limit adoption of the tool by the private research community.

This is the licensing model that we use for VarScan, our tool for identifying germline variants and somatic mutations in NGS data. The license, as made formal in the publications in Bioinformatics and Genome Research, is “free for non-commercial use.” In other words, the binaries and source code are freely available, and researchers at nonprofit/academic institutions can use them however they’d like to. Commercial users, however, must obtain a license through WashU’s Office of Technology Management (OTM).

We are not alone in this: you’ll notice that other widely-used NGS analysis tools such as SOAP2 and GATK also have a license requirement for commercial users. We’re all motivated by the same thing: the desire to continue supporting our software for the research community, combined with the inability to fund that support through grants alone. I’m sorry, but there’s just not enough public funding that supports bioinformatics software maintenance and improvement. There’s not enough public funding in general. Call your elected officials and point this out, please. In the current funding climate, licensing software to commercial entities is often the only way to survive.

And I’ll go out on a limb here to argue that it’s only fair. Biotechs and pharmaceutical companies use next-gen sequencing to develop patents, drugs, crops, and other products that they sell for enormous profits. NGS software companies charge steep prices for their products and services, most of which they sell to biotechs, hospitals, and clinics. Rather than all profits going to the shareholders of such companies, a small portion should perhaps support the researchers and institutions who developed these tools in the first place.

Comments

BrianRepko says

November 13, 2015 at 2:10 pm

I would argue for a Creative Commons Attribution / Non-Commercial / ShareAlike license personally. But there are a lot of software engineers out there that can help make bioinformatics software better. The challenge is how to get them involved when they work at a commercial entity. Commercial groups are willing and able to do this and actually provide joint funding when it is not a competitive advantage – which this type of software rarely is. You could stick with Non-Commercial and then consortium-based licensing or reduced licensing costs for share-alike vs non-share-alike?
wmnwmn says

November 28, 2015 at 10:46 am

I’ve been saying this for years. I think the NSF/NIH needs to take the lead and put into its grants money for academic end-users to PAY FOR functioning software, instead of paying only for grad students to waste their time kludging something from ten different “free” packages that are all halfway maintained.
The model where NSF and other grants fund development of tools which are then supposed to be provided free is a bad model, or at least an incomplete one.
Bioinformatics isn’t new anymore…it needs large and well-maintained software packages. The flip side is that academic bioinformaticists aren’t likely to be the ones developing such packages, which will need a conventional software dev team plus testers etc. In other words, a software company.