Ontology engineering for provenance enablement in the third National Climate Assessment


Every four years, the U.S. Global Change Research Program (USGCRP) [1] produces a National Climate Assessment (NCA) report that presents findings on global climate change and its impacts on the United States. The assessment builds on a large body of scientific research, which in turn generates provenance information about the entities, activities, and people involved in producing datasets, methods, and findings. Capturing and presenting this provenance, with links to the research papers, datasets, models, analyses, observations, satellites, and other resources that support the key findings, can increase understanding of, and trust in, the assessment process and the resulting report, strengthen their credibility, and aid the reproducibility of results and conclusions.


The USGCRP is now producing the third NCA report (NCA3) and is developing a Global Change Information System (GCIS) that will present the content of that report and its provenance, including the scientific support for the findings of the assessment. Because the GCIS will be web-based, it provides a platform for representing the provenance information and publishing it with Semantic Web technologies.


We are using a use case-driven iterative development methodology [2] to build a system that will present this information both through a human-accessible web site and through a machine-readable interface for automated mining of the provenance graph. A use case describes an objective that a primary actor wants to accomplish and the sequence of interactions between that actor and a system through which the objective is achieved. A use case also sets up a context in which domain scientists and computer scientists can work together on a computer system. Key steps in the iterative methodology include drafting a use case, assembling a team, developing ontologies for the use case, reviewing and iterating on the ontologies, adopting a technical infrastructure, building a rapid prototype, evaluating and iterating on all of this work, and preparing for the next use case. On the technical side, we use the World Wide Web Consortium (W3C) PROV data model and ontology [3], which are still under development, to represent the provenance information in the GCIS.


The ongoing research concentrates on the provenance of the NCA3 report. Following the iterative development methodology, we have worked through a number of use cases to refine an ontology that describes the entities, activities, and agents in the NCA3 report and their inter-relationships. We have also mapped those entities and relationships to the PROV-O ontology to obtain a formal representation of the provenance. Several prototype systems have been developed that let users browse and search provenance information by topic of interest. In the future, the GCIS will collect and link records of publications, datasets, instruments, organizations, methods, people, and other resources, eventually covering provenance information for the entire scope of global change.
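To illustrate what a mapping to PROV-O might look like, the following is a minimal sketch in plain Python, not the GCIS ontology or implementation. The `ex:` identifiers (a finding, a figure, an analysis, a dataset, an author) and the helper names `trace_support` and `to_turtle` are hypothetical and chosen only for illustration; the `prov:` class and property names are the ones defined by PROV-O. The sketch stores provenance as subject-predicate-object triples, follows backward links from a finding to the resources that support it, and serializes the triples as Turtle.

```python
from collections import deque

# Hypothetical example resources; only the prov: terms are real PROV-O names.
TRIPLES = [
    ("ex:finding-1", "rdf:type", "prov:Entity"),
    ("ex:figure-1", "rdf:type", "prov:Entity"),
    ("ex:dataset-1", "rdf:type", "prov:Entity"),
    ("ex:analysis-1", "rdf:type", "prov:Activity"),
    ("ex:author-1", "rdf:type", "prov:Agent"),
    ("ex:finding-1", "prov:wasDerivedFrom", "ex:figure-1"),
    ("ex:figure-1", "prov:wasGeneratedBy", "ex:analysis-1"),
    ("ex:analysis-1", "prov:used", "ex:dataset-1"),
    ("ex:analysis-1", "prov:wasAssociatedWith", "ex:author-1"),
    ("ex:finding-1", "prov:wasAttributedTo", "ex:author-1"),
]

# Properties that point from a result back toward what produced it.
BACKWARD = {"prov:wasDerivedFrom", "prov:wasGeneratedBy", "prov:used"}


def trace_support(start, triples=TRIPLES):
    """Return resources reachable from `start` via backward links, in breadth-first order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in BACKWARD and o not in seen:
                seen.add(o)
                order.append(o)
                queue.append(o)
    return order


def to_turtle(triples=TRIPLES):
    """Serialize the triples as Turtle; the ex: namespace URI is a placeholder."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix ex: <http://example.org/gcis/> .",
    ]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(trace_support("ex:finding-1"))
    print(to_turtle())
```

Under these assumptions, tracing from `ex:finding-1` visits the figure, then the analysis that generated it, then the dataset that analysis used. A production system would more likely hold the triples in an RDF store and query them with SPARQL; the in-memory list is used here only to keep the sketch self-contained.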

[1] http://www.globalchange.gov
[2] http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
[3] http://www.w3.org/TR/2012/WD-prov-overview-20121211
Creative Commons License:
Creative Commons Attribution 3.0 License