Arfon Smith from GitHub was up to talk to us about academia and open source.
Arfon started with an example of a shared research proposal. You create a document and then you change the filename with each iteration, because word processing applications are not good at tracking changes and allowing collaboration. Git, though, is meant for this very thing. So he showed us a book example on GitHub where the collaborators worked together on a document.
In open source there is a ubiquitous culture of reuse. Academia doesn’t do this, but why not? The problem is the publishing requirement in academia. The first problem is that ‘novel’ results are preferred: you’re incentivized to publish new things to move ahead. The second problem is that the value of your citations carries more weight than the number of people you’ve worked with. And thirdly, and more generally, the format sucks. Even if it’s an electronic document it’s still hard to collaborate on it (see the document example above). This is state-of-the-art technology … for the late 17th century (Reinventing Discovery).
So, what do open source collaborations do well? There is sometimes a difference between open source and open source collaborations, and it is an important distinction. Open source is the right to modify; it’s not the right to contribute back. Open source collaborations are highly collaborative development processes that allow anyone to contribute if they show an interest. This brings us back to the ubiquitous culture of reuse. These collaborations also expose the process by which they work together, unlike the current black box of research in academia.
How do we get 4000 people to work together then? Using git, and GitHub specifically, you can fork the code from an existing project and work on it without breaking other people’s work. Then, when you want to contribute it back, you submit a pull request to the project. The beauty of this is ‘code first, permission later’, and every time this process happens the community learns.
The goal of a contribution on GitHub is to get it merged into the product. Not all open source projects are receptive to these pull requests though, so those are not the collaborative types of projects.
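The fork and pull request flow described above can be sketched as a toy model. This is not how git or GitHub work internally; the `Repo` and `PullRequest` classes and the file names are made up for illustration, and the sketch ignores deletions, history and merge conflicts. It only shows the idea that work on a fork leaves the original untouched until a pull request is merged.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical toy repository: an owner plus a mapping of file names to contents."""
    owner: str
    files: dict = field(default_factory=dict)

    def fork(self, new_owner):
        # A fork is an independent copy, so edits to it cannot break the original.
        return Repo(owner=new_owner, files=copy.deepcopy(self.files))


@dataclass
class PullRequest:
    """Hypothetical proposal to bring changed files from a fork back upstream."""
    source: Repo
    target: Repo

    def changed_files(self):
        # Files that are new in the fork or differ from the upstream copy.
        return {
            name: text
            for name, text in self.source.files.items()
            if self.target.files.get(name) != text
        }

    def merge(self):
        # Only at this point does the upstream project change.
        self.target.files.update(self.changed_files())


upstream = Repo("project", {"paper.md": "Draft 1"})

# Code first: the contributor forks and edits without asking anyone.
fork = upstream.fork("contributor")
fork.files["paper.md"] = "Draft 2, with a new methods section"

# Permission later: the change reaches upstream only when the pull request is merged.
pr = PullRequest(source=fork, target=upstream)
before_merge = upstream.files["paper.md"]
pr.merge()
after_merge = upstream.files["paper.md"]
```

In this sketch `before_merge` still holds the original draft, which is the ‘without breaking other people’s work’ part, and `after_merge` holds the contributor’s version.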
Fernando Perez: “open source is .. reproducible by necessity.” If people didn’t collaborate, these projects wouldn’t move forward, so they need to be collaborative. The difference in academia is that you have to work alone and in a closed fashion to move ahead and get recognition.
Open can mean within your team or institution; it doesn’t have to be worldwide like in open source. But making your content electronic and available (which does not mean a Word doc or email) makes working together easier. Academia can learn from open source. More importantly, academia must learn from open source to move forward.
All the above seems kind of negative, but Arfon did show us a lot of examples where people are sharing in academia. We just need this to become more widespread. Where might more significant change happen? The most obvious place to look is where communities form, like around a shared challenge or around shared data. Science and big data are where we’ll hopefully see more of this.
There are still challenges though, so how do we make sharing the norm? The main problem is that academia rewards ‘credit’, meaning articles written solely by you. A tool like Astropy is hugely successful on GitHub, but the authors had to write a paper about it to get credit. The other issue is trust: academics are reluctant to use other people’s stuff because we don’t know if their work is of value. In open source we have solved this problem already. If a package has been downloaded thousands of times, it’s probably reliable. There are also tools like Code Climate that give your code a grade.
In short, the barriers are cultural, not technical!