Data management for the Marine Biodiversity Observation Networks
Biodiversity includes not only the species, genes, and ecosystems that make up the natural world but also the processes and functions carried out by those species (Noss 1990), and it can be used as a measure of ecosystem health and stability (Duffy et al. 2013). Marine biodiversity observation networks provide a way to integrate existing data on marine biodiversity and to build on those data to assess the status and trends of biodiversity in the oceans (Duffy et al. 2013). However, the breadth and variability of biodiversity mean that biological data are messy, and data collection varies greatly from one project to another. Therefore, while a central tenet of biodiversity observation networks is coordinating and standardizing legacy data with new data collection (Scholes et al. 2008, Duffy et al. 2013), achieving this in practice has been a barrier to progress.
USGS, through the Ocean Biogeographic Information System-USA (OBIS-USA), is partnering with NOAA, NASA, and BOEM on the Marine Biodiversity Observation Network demonstration projects. OBIS-USA has been seen as one of the stepping stones toward making these data publicly accessible, but it suffers from the perception that it is a “first order” data repository. In fact, OBIS-USA takes in a downstream derivative form of the data, standardized to Darwin Core, which describes the presence, absence, and abundance of taxa observations.
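To make “a downstream derivative form of the data, standardized to Darwin Core” more concrete, the sketch below maps one row of a hypothetical project survey sheet to a flat record keyed by Darwin Core term names (occurrenceID, datasetID, basisOfRecord, eventDate, decimalLatitude, decimalLongitude, scientificName, occurrenceStatus, individualCount). The `SourceCount` layout, the `to_darwin_core` helper, and the colon-joined occurrenceID convention are illustrative assumptions, not OBIS-USA's actual ingest pipeline. The point is that a zero count is kept as an explicit absence, and the identifier is built from source fields so the derivative record can be traced back to its source.

```python
from dataclasses import dataclass


@dataclass
class SourceCount:
    """One row from a hypothetical project survey sheet (field names are illustrative)."""
    survey_id: str
    date: str      # ISO 8601 date, e.g. "2016-07-12"
    lat: float     # decimal degrees
    lon: float     # decimal degrees
    taxon: str     # scientific name as recorded in the field
    count: int     # number of individuals observed


def to_darwin_core(row: SourceCount, dataset_id: str) -> dict:
    """Map a source row to a flat dict keyed by Darwin Core term names (hypothetical helper)."""
    if row.count < 0:
        raise ValueError(f"negative count in survey {row.survey_id}")
    return {
        # Assumed convention: build occurrenceID from source identifiers so the
        # derivative record stays traceable to the original survey row.
        "occurrenceID": f"{dataset_id}:{row.survey_id}:{row.taxon.replace(' ', '_')}",
        "datasetID": dataset_id,
        "basisOfRecord": "HumanObservation",
        "eventDate": row.date,
        "decimalLatitude": row.lat,
        "decimalLongitude": row.lon,
        "scientificName": row.taxon,
        # A zero count becomes an explicit absence instead of a dropped row.
        "occurrenceStatus": "present" if row.count > 0 else "absent",
        "individualCount": row.count,
    }


if __name__ == "__main__":
    # Hypothetical example rows; values are illustrative only.
    rows = [
        SourceCount("T12", "2016-07-12", 34.05, -119.4, "Macrocystis pyrifera", 0),
        SourceCount("T12", "2016-07-12", 34.05, -119.4, "Strongylocentrotus purpuratus", 7),
    ]
    for r in rows:
        print(to_darwin_core(r, "example-mbon"))
```

Keeping absences explicit matters here because presence, absence, and abundance are exactly what the Darwin Core derivative is meant to carry.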
Part of the MBON strategy for data management involves teaming up with the Integrated Ocean Observing System (IOOS) Regional Associations (RAs) in Alaska and Florida. Those IOOS RAs work closely with a commercial company, Axiom Data Science, to build and manage data management technologies, and they are moving ahead with an online digital data management system. For the Channel Islands Marine Sanctuary component of the project, the other option being pursued is to team up with the Santa Barbara LTER at UCSB.
The questions we are asking come partly from our role as a federal partner on these efforts, one that is also concerned with the Open Data and Public Access policies of the U.S. Government, and partly from our role as a producer of downstream derivative data that needs solid provenance and traceability back to the source data.
- How can these data be best managed so that they are available for the long term?
- Is managing the data through a commercial partner's data management system sufficient or appropriate for long-term preservation and availability of federally funded data assets?
- What is the role of federal data centers, such as NOAA's National Centers for Environmental Information or one or more NASA Distributed Active Archive Centers (DAACs), in these projects?
- What guidance should federal program managers for these grants and funding opportunities give project PIs on projects that are already under way?
- What guidance should we be putting into future funding opportunities to ensure open, public access for federally funded data production?
- If federal data centers are to take on a role for these and similar data, are they fully equipped to do so? Do they require additional funding from the project or funding programs to successfully take on the additional workload?
- If federal grant-based projects are collaborative across agencies with multiple groups putting in funding, who gets responsibility for the data?
- What might we learn from the developing NIH Commons concept [1]? Could we construct such a framework across earth science, not just within one agency?
[1] From a funding perspective, part of the NIH Commons idea is for project proposals to include specific budget line items for data management and distribution. These are issued as credits to specific data facilities in the Commons rather than as funds transferred directly to the successful institution. As a result, data management is funded and conducted the way NIH wants it done, but at the project level, where the PI and team are also invested in the outcome.
- The presenters for the session are Abby Benson from USGS and Gabrielle Canonico from US IOOS.
-
Networked science: open science has its benefits, but bringing about the necessary cultural change is challenging. We are in a transitional phase between pre-network and networked science.
-
The Marine Biodiversity Observation Networks (MBON) will support data collection in a way that allows different types of marine data to be combined and gaps in the knowledge base to be identified.
-
Metadata is one of the key areas that will be crucial to the success of MBON’s data collection.
-
The data management life cycle may be familiar to some, especially experienced data managers, but the concept and its steps may still be very new to many researchers and scientists.
- MBON is intended as a “demonstration” project showing how data management can be integrated into, and used by, a project that will produce data over the long term.
-
Problem statement: How do we provide guidance on adequate data management to multi-agency, federally funded projects that collect biological data?
-
Input from the audience:
-
Existing tools, such as DMPTool, could provide a common platform for the different agencies to collaborate.
-
A data management plan is a good starting point for researchers and scientists to think about how they could manage their data. It can also lead them to discover and consider the services that archives and repositories provide, so that the data can be used by other communities over the long term.
-
Is the data management plan the only way to provide “guidance”? Perhaps guidance is instead about giving researchers and scientists options to review. Those options might include considerations for choosing not to manage certain data. If so, what are the consequences of that choice?
-
Regardless of what the decision is, taking time to consider the strategies for data management is a worthwhile activity.
-
It is important to consider these strategies before the research starts and to review and modify them as needed. That way, the team can build in time for the agreed activities rather than trying to fit in additional data management tasks later. If data management tasks are not considered in advance, research and science teams may view them as extra burdens.
-
The “guidance” could also take the form of a list of basic “principles” that help each agency develop specific actions within its own environment. In that case, it would still help for the agencies to share their local practices with each other so that experience and lessons learned are shared.
-
A related issue in developing the “guidance”: how do we agree on the language that funding agencies should use to require data management?
-
Also, when asking people to “work together on data management,” which activities do we actually expect them to work on, and how?
- This matters because different science disciplines, research groups, and project types have different practices. How, then, do we enable people to work on data management without “disturbing” the way the research is performed?
-
Other related points:
-
A cost model, or economic sustainability more broadly, is crucial to ensuring that the resources needed for long-term data management are available.
-
For interoperability, using consistent language is important.
-
Sustainable data management is not a concern only for marine biodiversity data; it also applies to other data types from other science disciplines.
-
Having enough resources and funding to do everything a science project needs is a real concern: when resources or funding fall short, data management activities may be the first to be given lower priority.
- The answer to the question “If federal grant-based projects are collaborative across agencies with multiple groups putting in funding, who gets responsibility for the data?” could depend on many factors beyond who provided the most funding. For example, whether an organization has sufficient resources to provide adequate stewardship, and whether the data fit with its existing data collections, could both influence who should be responsible for the data.
-