Last week I attended a meeting on Semantic Physical Science organized by Peter Murray-Rust and colleagues. The stand-out talk for me was given by Cameron Neylon on why good software engineering practice is so important in science and how the scientific publishing market is in dire need of change.

He started by listing three reasons for scientists to take software engineering practice seriously.

  • Firstly, one of our major purposes in academia is to produce a trained workforce, many of whom will move into industry where good software engineering skills will be an asset and a requirement.
  • Secondly, it simply makes our lives easier. A bare minimum of effort invested in version control and good documentation, for example, is repaid many times over when we share our code with others or come back to it ourselves at a later date.
  • Thirdly, practices such as unit testing and continuous integration are thoroughly compatible with, and supportive of, the repeatability and consistency we expect of good practice in science. Also, good software engineering can inform and translate into the experimental domain.

Unfortunately, science (perhaps especially so in the long tail) is plagued by badly written code, bad habits and the consequent inability or unwillingness to share code and data alongside publications. A large part of the problem is that our current incentive system in academia, centred as it is around the journal publication and various measures such as the number of citations of a publication, does not reward the generation of high quality software that might be used by others.

Cameron’s answer is to “hack the system”: take the existing measures and play with them a little. He has recently established a new journal, “Open Research Computation” (ORC), under the aegis of open access publisher BioMed Central. ORC will take submissions focussing on software developed for use in any area of science: “algorithms, useful code snippets, as well as large applications or web services, and libraries.” As Cameron pointed out, if ORC publishes 100 papers/year, with perhaps 5-10 of them describing software with a substantial number of users and therefore likely to be heavily cited, ORC stands a good chance of gaining a respectable impact factor.
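To make the arithmetic behind that argument concrete, here is a rough sketch in Python. It uses the standard two-year impact factor (citations received in one year by papers from the previous two years, divided by the number of those papers). The citation counts are my own illustrative assumptions, not figures from Cameron's talk: 30 citations a year for each paper on widely used software and 1 for every other paper. The helper names are hypothetical too.

```python
def impact_factor(citations_per_paper):
    """Two-year impact factor: total citations received this year by the
    papers published in the previous two years, divided by how many of
    those papers there are."""
    if not citations_per_paper:
        raise ValueError("need at least one paper")
    return sum(citations_per_paper) / len(citations_per_paper)


def hypothetical_journal(papers_per_year, popular_per_year,
                         cites_popular, cites_other, years=2):
    """Build a made-up list of citation counts for a two-year window.

    All four numbers are assumptions chosen for illustration only.
    """
    popular = [cites_popular] * (popular_per_year * years)
    others = [cites_other] * ((papers_per_year - popular_per_year) * years)
    return popular + others


# 100 papers/year, of which 5 or 10 describe widely used software.
low = impact_factor(hypothetical_journal(100, 5, cites_popular=30, cites_other=1))
high = impact_factor(hypothetical_journal(100, 10, cites_popular=30, cites_other=1))

print(f"5 popular papers/year:  impact factor ~ {low:.2f}")
print(f"10 popular papers/year: impact factor ~ {high:.2f}")
```

Under these made-up numbers a handful of heavily cited software papers dominates the average, which is the point of the "hack": the result is only as good as the assumed citation rates.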

On the scientific publishing industry in general, the current problems with the proposed Research Works Act (not to mention SOPA & PIPA) in the US have been well documented elsewhere. Cameron’s view is that the market is now simply broken. It used to be that the distribution of paper copies of journals was the main service provided by publishers. Now that everybody reads journals online, the cost of distribution is essentially zero, and all the costs lie in generating the first copy of an article. In fact, the main service now provided by publishers is arranging for the peer review of articles. This is a service worth paying for. Why not re-configure the market to recognize this?

Of course, this won’t happen overnight. The efforts of open access publishers such as the Public Library of Science and BioMed Central (both of which charge authors a publication fee, rather than charging readers) are a great step in the right direction.

Another great place to start is to encourage funders, research institutions and publishers to recognise the importance of quality software engineering in science. Educational resources such as Software Carpentry deserve our support.

My own view is that the UK eScience programme recognised early on the importance of good software engineering. The most successful eScience projects, among them RealityGrid and MyGrid, employed full-time software engineers. The annual All Hands meeting regularly featured papers focussed on new software tools and methods. One of the recommendations of the RCUK review of eScience in 2009 was that “professional software engineers or informatics specialists who build reliable production-grade systems” in academia need better-defined roles, reward structures and career progression. My hope is that efforts like Open Research Computation will help bring this to fruition.
