We present an approach that helps scientists collaborate in multidisciplinary research by providing a wide spectrum of software tools for data science and by making their research outputs reproducible. The approach is built around a web application, the IPython Notebook, which lets scientists work with diverse and heterogeneous data and information sources, share the source code used to generate data products and associated metadata, and save and track workflow provenance. A key feature of IPython (Interactive Python) is that metadata embedded in the Notebook can be generated while data are accessed and processed. We are currently developing functionality to collect the provenance generated at each run of the workflow and to store this metadata in the JSON-LD (JSON for Linking Data) standard format. This makes it possible to record the provenance of derived data products and to trace them back to their original sources and to the processing that generated them.
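As an illustration only, the following Python sketch shows one way a provenance record for a single workflow run could be expressed in JSON-LD and kept in a notebook's metadata. It is not the implementation described in this abstract. It assumes the W3C PROV-O vocabulary for the terms, and the function names, the `provenance` metadata key, the `urn:sha256:` activity identifier, and the example.org identifiers are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# JSON-LD context mapping the "prov" prefix to the W3C PROV-O namespace.
PROV_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}


def build_provenance_record(product_id, source_ids, code, started, ended):
    """Describe one workflow run as a JSON-LD document using PROV-O terms."""
    # Hypothetical convention: identify the run by a hash of the code it executed.
    code_hash = hashlib.sha256(code.encode("utf-8")).hexdigest()
    sources = [{"@id": source_id} for source_id in source_ids]
    return {
        "@context": PROV_CONTEXT,
        "@id": product_id,
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": sources,
        "prov:wasGeneratedBy": {
            "@id": "urn:sha256:" + code_hash,
            "@type": "prov:Activity",
            "prov:startedAtTime": started.isoformat(),
            "prov:endedAtTime": ended.isoformat(),
            "prov:used": sources,
        },
    }


def attach_to_notebook(notebook, record):
    """Append a record under a hypothetical 'provenance' key in the notebook metadata."""
    notebook.setdefault("metadata", {}).setdefault("provenance", []).append(record)
    return notebook


def trace_sources(notebook, product_id):
    """Return the source identifiers recorded for a derived data product."""
    for record in notebook.get("metadata", {}).get("provenance", []):
        if record["@id"] == product_id:
            return [source["@id"] for source in record["prov:wasDerivedFrom"]]
    return []


if __name__ == "__main__":
    started = datetime.now(timezone.utc)
    record = build_provenance_record(
        product_id="http://example.org/products/monthly-mean",  # hypothetical
        source_ids=["http://example.org/data/raw-observations"],  # hypothetical
        code="monthly_mean = observations.mean()",
        started=started,
        ended=datetime.now(timezone.utc),
    )
    notebook = attach_to_notebook({"cells": [], "metadata": {}}, record)
    print(json.dumps(notebook["metadata"]["provenance"], indent=2))
    print(trace_sources(notebook, "http://example.org/products/monthly-mean"))
```

Because the record travels inside the notebook file, anyone who receives the notebook could call `trace_sources` to recover the inputs of a derived product without access to a separate provenance store.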
Development of cyberinfrastructure to facilitate collaboration and knowledge sharing for marine Integrated Ecosystem Assessments
Creative Commons License:
Creative Commons Attribution 3.0 License