Attribution for Software and Source Code
The Earth Science community has made significant progress in developing expertise related to data citation. Data citation principles and guidelines, including those produced by ESIP [1], define recommended practices and identify key challenges. Data, however, are only one resource that scientists use in the course of their work. Many of the arguments for data citation also hold for other digital objects, such as software, models, etc.
This session will focus on the issue of attribution for software and source code. As a first step, we'll provide an adaptation of the ESIP data citation recommendations for software, with the goal of outlining where software as a research object aligns with, and differs from, data.
We also want to open a broader dialogue about how software and source code attribution might go beyond citations, and what role the ESIP community might play in designing or promoting these policies.
Recent initiatives to make GitHub source code repositories externally archivable and citable provide one example of how existing, widely used software development tools can serve as building blocks for better attribution mechanisms [2,3,4].
[1] http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations...
[2] http://arfon.org/building-a-data-archiving-service-using-the-github-api-...
[3] http://www.isgtw.org/spotlight/tool-developed-cern-makes-software-citati...
[4] http://thenextweb.com/dd/2014/03/17/mozilla-science-lab-github-figshare-...
DK quoting Borne: “Software work is inadequately visible in ways that count within the reputation system underlying science”
Metrics: the session directly after ours is about the counting.
This session is about visibility …
First speaker, Bob Downs - discussed the importance of citation for software along the dimensions of visibility, reproducibility, credit, and acknowledgment.
Overview of what should be included in a software citation - guidelines from major reference standards (MLA, APA, Chicago) differ slightly.
Second speaker, Don Middleton - overview of an approach to attaching identifiers (DOIs) to software and computing facilities. This has been a boon to tracking use: it is possible to text mine for a single identifier and to use resolution metrics to understand how people are arriving at the resource.
Lightning round!
Ted Hart of the rOpenSci project - in the packages that they serve, they add a BibTeX file, but find that the packages are sometimes cited, sometimes put in acknowledgments, and sometimes just mentioned.
Ruth Duerr - works with computational linguistics - they ask for very specific citation information, and find that the community is engaged in doing formal acknowledgment because of the central importance of algorithms in their work.
Tom Narock - stressed the importance of ontologies as software objects, which need to be recognized in the same way that algorithms are. Asks the question of when to assign DOIs in versioning both software and ontologies.
Pascal Hitzler - top 3 cited documents are tool descriptions. Semantic Web Journal conducts semi-open peer review (reviews are posted alongside publications, but reviewers can opt in or out of being identified).
Discussion:
- The DOAP ontology was recommended for resolving links between different types of resources: https://github.com/edumbill/doap/wiki
- JSON-LD is another approach to embedding information about a software package, encouraging reuse and attribution: http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit
- Similarly, Katz mentions the idea of 'transitive credit' as a way to avoid the "are we going to cite everything" conversation. Paper here: http://figshare.com/articles/Citation_and_Attribution_of_Digital_Product...
- Howison asks: why not embed the citation in the license for the software? This establishes a norm for citation - you're breaking the license if you use the software and do not acknowledge it.
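As a rough illustration of the JSON-LD idea raised in the discussion, the sketch below builds a small machine-readable description of a software package. It assumes the schema.org vocabulary (the `SoftwareSourceCode` type), which the notes do not name. The helper `software_jsonld`, the package name, repository URL, authors, and the DOI-style identifier are all hypothetical placeholders, not anything discussed in the session.

```python
import json


def software_jsonld(name, version, repository, authors, license_url, identifier=None):
    """Build a JSON-LD description of a software package as a plain dict.

    Assumption: uses the schema.org SoftwareSourceCode type; other
    vocabularies (for example DOAP) could be substituted.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": repository,
        "license": license_url,
        # One Person node per author so that credit can be attached to people.
        "author": [{"@type": "Person", "name": a} for a in authors],
    }
    # A persistent identifier (such as a DOI) is optional in this sketch.
    if identifier is not None:
        doc["identifier"] = identifier
    return doc


# Hypothetical example values, for illustration only.
record = software_jsonld(
    name="example-model",
    version="1.2.0",
    repository="https://example.org/example-model",
    authors=["A. Researcher", "B. Developer"],
    license_url="https://spdx.org/licenses/MIT.html",
    identifier="doi:10.0000/placeholder",  # placeholder, not a real DOI
)

print(json.dumps(record, indent=2))
```

A file like this could sit in the repository next to the source code, so that an archive or indexing service can read the name, version, authors, and license without parsing the code itself.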