The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology for assembling, and to assemble, a large inventory of high-quality information resources across geoscience domains, with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by governmental and academic data facilities, as well as from user contributions. Once harvested into CINERGI, metadata records are processed according to harvest adapter definitions, loaded into a staging database implemented in MongoDB, and validated for compliance with the ISO 19115/19139 metadata model and schema. Metadata defects detected by the validation engine are either corrected automatically with the help of several information extractors or flagged for manual curation. The metadata harvesting, validation and processing components generate provenance statements in W3C PROV notation, which are stored in a Neo4j database. All these components are organized into the CINERGI metadata curation pipeline. The core component of the pipeline is a set of "metadata enhancers": services responsible for correcting or enhancing metadata content, for example by adding spatial extent information, adding keywords based on a collection of registered vocabularies and the SciGraph annotation API, and validating and correcting organization information. The curated metadata records, along with their provenance information, are republished and can be accessed programmatically or through the CINERGI online application, a custom search application built over Geoportal, Solr and Neo4j.
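The enhancer stage described above can be illustrated with a minimal Python sketch. It is not the CINERGI implementation: the flat dictionary record, the `VOCABULARY` and `ORG_ALIASES` tables, the field names, and the functions `keyword_enhancer`, `organization_enhancer`, `validate` and `curate` are all hypothetical, and the provenance entries are plain dictionaries that only loosely mimic PROV activities rather than W3C PROV statements stored in a graph database.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for registered vocabularies and organization data.
VOCABULARY = {"basalt": "Igneous rock", "aquifer": "Hydrogeology"}
ORG_ALIASES = {"usgs": "U.S. Geological Survey"}
REQUIRED_FIELDS = ("title", "organization", "spatial_extent")


@dataclass
class CurationResult:
    record: dict
    provenance: list = field(default_factory=list)
    needs_manual_curation: list = field(default_factory=list)


def keyword_enhancer(record):
    """Add vocabulary labels whose terms occur in the title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    found = sorted({label for term, label in VOCABULARY.items() if term in text})
    existing = record.get("keywords", [])
    new = [label for label in found if label not in existing]
    if not new:
        return record
    return {**record, "keywords": existing + new}


def organization_enhancer(record):
    """Replace a known organization alias with its canonical name."""
    org = record.get("organization")
    if org is None:
        return record
    return {**record, "organization": ORG_ALIASES.get(org.strip().lower(), org)}


def validate(record):
    """Return the required fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def curate(record, enhancers):
    """Run each enhancer in order and log which fields it changed."""
    result = CurationResult(record=dict(record))
    for enhancer in enhancers:
        before = result.record
        after = enhancer(before)
        changed = sorted(k for k in after if after.get(k) != before.get(k))
        if changed:
            # Simplified provenance entry: which activity touched which fields.
            result.provenance.append(
                {"activity": enhancer.__name__, "changed_fields": changed}
            )
        result.record = after
    # Defects that no enhancer could repair are flagged for a human curator.
    result.needs_manual_curation = validate(result.record)
    return result
```

Calling `curate(record, [keyword_enhancer, organization_enhancer])` on a record that lacks a spatial extent returns the enhanced record, one provenance entry per enhancer that changed something, and `spatial_extent` in the list flagged for manual curation. Keeping each enhancer a pure function from record to record is a choice made for this sketch, so that the changed fields can be derived by comparing input and output.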
The project’s website is http://workspace.earthcube.