Open Recipe: Why Open Source Methodology in Science Matters

If data is the ingredients, then methodology is the recipe. Previously, in Credibility of Science and the Freedom of Data, I raised the issue that even so-called raw data is often interpreted data, just as the temperature reading of a classic mercury-in-glass thermometer is really an interpretation of the height of the column of mercury. Modern scientific instruments apply many layers of interpretation to the raw sensor feed before presenting anything to the user as "raw" data. But why does this even matter?

A key reason science is so effective at discovering useful properties of the world is that experiments can be, and are, repeated to ensure that discoveries are in fact consistent properties of the world rather than the result of mere chance or accident. Being able to repeat a procedure and get back the same or a similar result is a very important and useful feature of science. It is like having a recipe that claims to produce a great souffle: we'd have to verify the claim by baking souffles repeatedly to see whether we consistently get good-tasting souffles. We could only do this, however, if the entire recipe is open for us to follow and examine.

Imagine if one step in the souffle recipe says to feed all the ingredients into some Ultra-Souffle-Maker-branded machine, which then gives back to you a great souffle: how do you know the machine didn't cheat by adding something other than your ingredients? The analogy is a bit strained, but the point is this: given a set of data X and some result Y obtained by feeding X into some analysis machine, how can we be sure that Y follows validly from X, and is not just the result of the machine adding something extraneous to the data?

Here's an example of what this means. Imagine you're a scientist from SETI, looking at radio telescope data in a Search for Extra-Terrestrial Intelligence. You receive some data, feed it into some machine, and out comes the result: you've found an extra-terrestrial and intelligent signal! Or did you? How do you know the machine didn't, by accident or malice, add to the data the very thing you were looking for to begin with? To know, you'd have to examine the machine's insides and see what it's doing to your data.

That is why it's important for the methodology to be open source, so that it is open to careful examination and testing. All the steps, from the test tube to the data analysis results, must be open for examination to ensure validity. More and more, however, modern instruments are becoming opaque, often controlled by computer and software technologies that are notoriously closed off and difficult to inspect. That is why it's so important for the methodology of science to remain open source.
