2010-02-02

Credibility of Science and Open Source Methodology

Having discussed the need for freedom of data in Credibility of Science and the Freedom of Data, given the slightly tarnished credibility of the sciences, I'll speak briefly in this post on methodology in science and why open sourcing it would help.

Here's what I mean by an experiment's methodology being open source: all the steps and instruments, from the test tube to the data analysis, must be open to careful examination and testing to ensure validity. Traditionally, this hasn't been a problem at all. Beakers, test tubes, thermometers, etc, are physical objects inherently open to inspection, in case the instruments themselves add something to the materials or ingredients, or skew the data of the experiment. More and more, however, modern instruments are becoming opaque to scientists, especially as their insides are often controlled by computers and software. Unfortunately, software is notoriously closed off and difficult to inspect.

The issue of computers and software getting in the way of reproducible experiments in science, especially physical science (as opposed to math type science), has been discussed elsewhere as well (see Keeping computers from ending science's reproducibility). The problem isn't confined to the physical sciences, though; it's widespread through math and computing science as well. In fact, the need for open source systems was one of the motivations for developing the open source mathematics software Sage (see Mathematical Software and Me: A Very Personal Recollection).

Computing science, the area of research I'm currently engaged in, is a case in point. A huge amount of research in computer vision and machine learning is done using Matlab, for example, which is why open source projects like Octave are so important. Yet if computing scientists continue to use Matlab, communicating and depending on results obtained through it and other proprietary closed source software, then it seems questionable how much the existence of Octave helps (cf, if we have the cure to cancer but no one uses it, how effective is the cure? This is not to say that any software is like cancer; focus, instead, on the problem of the ineffective cure).

Ultimately, scientists need to take action themselves and use tools that are open to examination and testing. The public credibility of science rests largely on the premise that it's participatory in nature, the promise that anyone with the time can obtain the tools and material to repeat any experiment they like so they can see for themselves that what the scientists say is in fact true (ie, "don't take my word for it, try it for yourself!").

Not only does science have to be conducted transparently, it must appear transparent, if we want people to trust the results obtained by scientists. Closed source methods, instruments, and software, on the other hand, appear like obstacles for scientists to hide behind. How can anyone in the public, even just in principle, know the data wasn't tainted by the analysis software if what the software does (ie, its underlying program) is closed off from public view? [1]

[1] This is, by the way, not in itself a rationale to stop using commercial software in general. This is not about open source versus commercial closed source software. This is only about the software used by scientists as part of the scientific process.
