Shedding Light on the "Dark Software" of Science


Although it is not uncommon to see scientific software published for others to use, this tends to happen with software that has significant investment associated with it. In geosciences, for example, modeling software is often published in shared repositories or in open source hosting services. However, a significant amount of software developed by scientists is never published, and as a result it cannot be reused by others and is eventually lost. This includes software for data transformation, quality control, and other data preparation tasks. We refer to this as “dark software”, by analogy with Heidorn’s “dark data”. This talk will argue that this software represents very valuable scientific products, and will describe our work on lowering the barriers for sharing all forms of software developed by scientists.


Dark Software: All the software that sits on scientists' desktops, that took a long time to create and isn't being shared
-Sometimes they don't know how to share it

Value of Software: Reproducibility.
-Science must be reproducible. Software is a big part of that.
-We want to break down the barriers to software sharing

"Scientists and engineers spend more than 60% of their time just preparing the data for model input or data-model comparison." (NASA A40)
-Some place that as high as 80%

"Because of the cost of reproducing somebody's work, we're not building on it as much."

Open source communities in the geosciences offer repositories for "Dark Software"

Open source licensing for software and code?

Software stewardship is a multidisciplinary challenge
-Recognizing and documenting the problem of dark software is the start of the solution

EarthCube GeoSoft group is looking at ways to bring a common framework for interoperability and best practices into the geosciences

-Standard and common formats
-Existing software repositories & modeling frameworks

-Disseminate best practices for software sharing
-Develop educational and professional training materials accessible to geoscientists
--Make software more reusable and portable
--Software licensing
--Capturing science-relevant data
--Best practices for open source software sharing
--Software citation
--Getting credit for software developed
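
As a rough illustration of the licensing, citation, and credit items above, here is a minimal sketch of the information a scientist might record next to a small data-preparation script before sharing it. Everything in it is an assumption made for illustration: the `SoftwareRecord` class, its field names, the citation layout, and the example values are hypothetical and do not come from the talk or from any EarthCube GeoSoft tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SoftwareRecord:
    """Hypothetical minimal description of a script that is about to be shared."""

    name: str
    version: str
    authors: list[str]
    license: str  # a license identifier chosen by the authors, e.g. "MIT"
    year: int
    url: str = ""  # where the code is archived, if anywhere

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "name": self.name,
            "version": self.version,
            "authors": self.authors,
            "license": self.license,
        }
        return [key for key, value in required.items() if not value]

    def citation(self) -> str:
        """Build a plain-text citation so that others can give credit."""
        authors = ", ".join(self.authors)
        text = f"{authors} ({self.year}). {self.name} (version {self.version}) [Software]."
        if self.url:
            text += f" {self.url}"
        return text


if __name__ == "__main__":
    # Placeholder values only; not a real package or author.
    record = SoftwareRecord(
        name="qc_filter",
        version="0.1.0",
        authors=["A. Researcher"],
        license="MIT",
        year=2020,
    )
    print(record.missing_fields())
    print(record.citation())
```

Even a record this small covers three of the items in the list: the license says what reuse is allowed, the version and archive location make the software findable, and the citation string gives others a concrete way to credit the author.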

-Geoscience software is a valuable research product
--Must embed best practices of software sharing into research activities
-Improve productivity, quality, dissemination and training

