ScienceBase: a big ol' scientific database


ScienceBase is a U.S. Geological Survey (USGS) effort to reflect in a database what we know about the complex Earth system. We admit that it is a challenging vision, but we think it is a worthwhile pursuit. ScienceBase started off as the Scientific Data Catalog and was later renamed the Comprehensive Science Catalog. This first generation of the concept was "yet another metadata catalog," and cataloging resources remains at the core of ScienceBase. The second generation, when we adopted the name ScienceBase, added a data repository capability and began addressing the long tail of dark data in the USGS. The third and fourth generations of ScienceBase will take us into the territory of data integration and proactive analytics, respectively. Those two future iterations are still mostly notional, so where are we currently?

  • ScienceBase has a document-based NoSQL database (MongoDB) and a data model based on the simple idea of an item (similar to owl:Thing, rdfs:Resource, etc.). Items have simple core information (what, where, when, how, why), any number of facets or extensions (which map to standards, etc.), and are arranged in a contrived hierarchy that enables permissions inheritance and the generation of dynamic services (WMS, WFS, KML, etc.).
  • ScienceBase is API-centric from the ground up: all functionality is built around a RESTful API that drives both user interface and data management software development.
  • The ScienceBase Repository is a digital repository service that allows any digital file to be attached to an item. Some types of files have special handlers to generate services.
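To make the item model concrete, here is a minimal Python sketch of an item carrying the five core questions, a list of facets, and a parent link used for permissions inheritance. The class, its field names, and the rule that the nearest ancestor with an explicit reader list wins are all illustrative assumptions, not the actual ScienceBase schema or permissions logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical, simplified item; field names are illustrative only."""
    title: str                                                # what
    bbox: Optional[tuple[float, float, float, float]] = None  # where: (minx, miny, maxx, maxy)
    dates: dict[str, str] = field(default_factory=dict)       # when: label -> ISO date
    methods: str = ""                                         # how
    purpose: str = ""                                         # why
    facets: list[dict] = field(default_factory=list)          # extensions, e.g. mappings to standards
    parent: Optional[Item] = None                             # position in the hierarchy
    readers: Optional[set[str]] = None                        # explicit ACL; None means "inherit"

    def effective_readers(self) -> set[str]:
        """Walk up the hierarchy until an item with an explicit ACL is found."""
        node: Optional[Item] = self
        while node is not None:
            if node.readers is not None:
                return node.readers
            node = node.parent
        return set()  # no ACL anywhere up the chain: nobody has read access
```

In this sketch a dataset created under a public project, with no reader list of its own, reports the project's readers, while a restricted sub-folder overrides the inheritance for everything beneath it.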
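An API-centric design means every client, whether the web interface or a batch data-management script, goes through the same resource-oriented entry point. The following Python sketch assumes a hypothetical `/items/{id}` path and an in-memory dictionary standing in for the database; it illustrates the pattern only and does not reproduce the real ScienceBase API.

```python
from __future__ import annotations

import json
import re

# Hypothetical in-memory store standing in for the item database.
_ITEMS: dict[str, dict] = {}
_ITEM_PATH = re.compile(r"^/items/(?P<item_id>[\w-]+)$")


def handle(method: str, path: str, body: str = "") -> tuple[int, str]:
    """Single entry point that every client (web UI, scripts) would call."""
    match = _ITEM_PATH.match(path)
    if match is None:
        return 404, json.dumps({"error": "not found"})
    item_id = match.group("item_id")
    if method == "GET":
        if item_id not in _ITEMS:
            return 404, json.dumps({"error": "no such item"})
        return 200, json.dumps(_ITEMS[item_id])
    if method == "PUT":
        # Create or replace the whole item document.
        _ITEMS[item_id] = json.loads(body)
        return 200, json.dumps(_ITEMS[item_id])
    if method == "DELETE":
        existed = _ITEMS.pop(item_id, None) is not None
        return (204, "") if existed else (404, json.dumps({"error": "no such item"}))
    return 405, json.dumps({"error": "method not allowed"})
```

Because the user interface would have no private back door, anything it can do is also available to scripts through the same calls.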
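Attaching a file and running a type-specific handler can be sketched as a lookup keyed on file extension. The `HANDLERS` registry, the extensions it covers, and the services each one is assumed to produce are hypothetical; in a real repository the handler would do the work of standing up the service rather than just naming it.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Callable


def vector_handler(filename: str) -> list[str]:
    # Hypothetical: vector data could back map, feature, and KML services.
    return ["WMS", "WFS", "KML"]


def raster_handler(filename: str) -> list[str]:
    # Hypothetical: raster data could back a map service only.
    return ["WMS"]


# Hypothetical registry mapping a lowercase file extension to its handler.
HANDLERS: dict[str, Callable[[str], list[str]]] = {
    ".shp": vector_handler,
    ".tif": raster_handler,
}


def attach_file(item_files: list[dict], filename: str) -> dict:
    """Attach any file to an item; run a special handler if one is registered."""
    suffix = PurePosixPath(filename).suffix.lower()
    handler = HANDLERS.get(suffix)
    services = handler(filename) if handler is not None else []
    entry = {"name": filename, "services": services}
    item_files.append(entry)
    return entry
```

For example, attaching `roads.shp` records the file and lists WMS, WFS, and KML, while a PDF is still stored but gets no services.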

What keeps us up at night?

  • ScienceBase is not a fully curated and managed repository and long-term archive, but we need to figure out how to provide one for the USGS. Storage is cheap; managing the data that goes into that storage for the long haul is expensive.
  • There are too many competing notions of what it takes to solve the discover-access-use problem, but we all want to enable fully open access to a digital government. We need to stop competing for scant resources and pull together to implement sustainable solutions to problems that are already conceptually solved.

Name: Sky Bristol
Organization(s): U.S. Geological Survey