Global Change Information System (GCIS)
The U.S. Global Change Research Program (http://globalchange.gov) is sponsoring the creation of a new information system, the Global Change Information System (GCIS), which provides a web-based source of authoritative, accessible, usable, and timely information about climate and global change for use by scientists, decision makers, and the public. It captures and presents supporting information from the Third National Climate Assessment. A public version of the GCIS API is available at: http://data.globalchange.gov
This session will present an overview of the GCIS system, its status, and its progress. There will be time for discussion and feedback about the long-term vision for the system.
Reference: X. Ma, P. Fox, C. Tilmes, K. Jacobs, A. Waple, Capturing provenance of global change information, Nature Climate Change 4, 409–413 (Online 28 May 2014), doi:10.1038/nclimate2141, http://www.nature.com/nclimate/journal/v4/n6/full/nclimate2141.html?WT.ec_id=NCLIMATE-201406
Title: Global Change Information System (GCIS)
Presenters: Robert Wolfe ([email protected]) and Justin Goldstein ([email protected]) (Steve Aulenbach [email protected] is on the phone)
Speaker 1: Robert Wolfe of USGCRP (GCIS Technical Lead)
Presentation Title: Global Change Information System
Notes:
-
An introduction of GCIS will be presented.
-
5 discussion questions have also been identified.
-
US Global Change Research Program: coordinates, prioritizes, assesses, and communicates → has 14 agencies involved in the program.
-
Contributors: GCIS is a fairly new activity, but a number of people are already contributing (mainly from USGCRP, NCA TSU, Habitat Seven, RPI TWC, Forum One, and NASA JPL, along with Rahul Ramachandran from NASA and Jeffrey Chen, a Presidential Innovation Fellow).
-
Congress requires the National Climate Assessment to be published every 4 years. The 2014 version is the third publication (http://assessment.globalchange.gov). (The previous two were published in 2000 and 2009, respectively.)
-
Long-Term Vision: GCIS is intended to eventually become a unified web resource about climate and global change for use by scientists, policy makers, and the public.
-
Information Quality Act (IQA) focuses on reproducibility and transparency. In other words, transparency is what drives reproducibility.
-
Transparency and reproducibility can be measured on a scale from traceable sources (easier) to traceable tools (harder).
-
Data and the National Climate Assessment: The main challenge is the sheer volume of material involved in producing the report, including the number of authors, pages, chapters/appendices, figures, images, references, and data sources. → The solution is to define categories of information within the report and to build a process for collecting source information that will satisfy IQA and Highly Influential Scientific Assessment (HISA) requirements.
-
The responsive design of globalchange.gov - v2.0 can be used on all types of major devices/platforms.
-
The GCIS Structured Data Server is where the provenance information can be found.
-
The globalchange.gov website exchanges information with the structured data server via an open API.
-
An example provided is the dataset metadata for a figure that discussed “observed-us-temperature-change”.
-
Another example is linking to an instrument that is related to a figure, which provides information regarding “past-and-projected-changes-in-global-sea-level-rise”.
-
(There are 5 steps in this example).
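As a rough illustration of how a client might address a figure resource such as the ones above through the open API, the following Python sketch builds a resource URL. Only the host name comes from these notes; the path layout (report, chapter, figure), the `nca3` and `our-changing-climate` identifiers, and the `.json` suffix are assumptions made for illustration and should be checked against the API documentation.

```python
from urllib.parse import quote

# Public GCIS API host named in these notes.
BASE_URL = "http://data.globalchange.gov"


def figure_url(report, chapter, figure, fmt="json"):
    """Build the URL of a figure resource.

    The report/chapter/figure path layout and the format suffix are
    assumptions for this sketch, not a documented contract.
    """
    parts = ["report", report, "chapter", chapter, "figure", figure]
    # Percent-encode each path segment so unusual identifiers stay valid.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{BASE_URL}/{path}.{fmt}"


if __name__ == "__main__":
    # Identifiers other than the figure slug are hypothetical.
    print(figure_url("nca3", "our-changing-climate",
                     "observed-us-temperature-change"))
```

The function only constructs the address; fetching it (for example with `urllib.request.urlopen`) is left out so the sketch does not depend on network access.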
-
-
GCIS Structured Data Server: has the following main actions: capture (from a variety of sources), identify (identifiers for each element), organize (relationships between elements), present (machine accessible interfaces for metadata, including API), and maintain (to ensure quality and integrity - this is key because the information will be the basis for the next assessment).
-
Global Change Content Elements: the different “objects” (or “resources”), such as figures, images, and datasets, used as components in the report; findings, such as “climate is changing”; and concepts, such as “adaptation.”
-
GCIS Database/API: sample key points - RESTful API is available at data.globalchange.gov; primary storage is RDBMS (PostgreSQL); representation is serialized.
-
3 different examples of GCIS Ontology are also presented.
-
Full GCIS Ontology documents are available at: http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology
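The “identify” and “organize” actions described above amount to giving each element an identifier and recording relationships between elements. A minimal Python sketch of that idea, assuming a plain list of (subject, relation, object) statements, is shown here. All identifiers and relation names in it are hypothetical and are not taken from the GCIS ontology; it only illustrates how a figure could be traced back to a dataset, instrument, and platform.

```python
from collections import defaultdict

# Hypothetical statements: (subject, relation, object).
STATEMENTS = [
    ("/figure/example-figure", "derived_from", "/dataset/example-dataset"),
    ("/dataset/example-dataset", "measured_by", "/instrument/example-sensor"),
    ("/instrument/example-sensor", "carried_on", "/platform/example-satellite"),
]


def lineage(start, statements):
    """Return every statement reachable from `start`, in traversal order."""
    outgoing = defaultdict(list)
    for subject, relation, obj in statements:
        outgoing[subject].append((relation, obj))

    seen = {start}
    steps = []
    stack = [start]
    while stack:
        node = stack.pop()
        for relation, obj in outgoing.get(node, []):
            steps.append((node, relation, obj))
            if obj not in seen:  # guard against cycles
                seen.add(obj)
                stack.append(obj)
    return steps


if __name__ == "__main__":
    for subject, relation, obj in lineage("/figure/example-figure", STATEMENTS):
        print(f"{subject} --{relation}--> {obj}")
```

A real deployment would store these relationships in the database or as RDF rather than in a Python list; the traversal logic is the part being illustrated.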
-
-
SPARQL examples are also presented.
-
Two Parallel Paths: National Climate Assessment release and GCIS population.
-
Data and GCIS: The future roadmap is that the information in GCIS will continue to be completed; the Health Assessment and Ontology Improvements efforts will also continue; in addition, an Earth Observation Assessment might also be discussed.
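The SPARQL examples shown in the session are not reproduced in these notes. As a stand-in, the following Python sketch shows how a SPARQL query could be packaged as an HTTP GET request following the standard SPARQL protocol (`query` parameter). The endpoint URL is an assumption, and the query uses only the W3C PROV-O term `prov:wasDerivedFrom` rather than any GCIS-specific vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint location; verify before use.
SPARQL_ENDPOINT = "http://data.globalchange.gov/sparql"

# Generic provenance query using the W3C PROV-O vocabulary.
QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?entity ?source
WHERE { ?entity prov:wasDerivedFrom ?source }
LIMIT 10
""".strip()


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers)


if __name__ == "__main__":
    request = build_sparql_request(QUERY)
    print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would execute the query, assuming the endpoint exists at that address and supports JSON results.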
Comments/Questions from the Audience:
-
How is information extracted from the National Climate Assessment report?
-
GCIS did not extract information from the NCA3 report after it was completed (the final PDF and the information in GCIS were released concurrently). Instead, GCIS played an important role in the NCA3 report production process. GCIS provided reliable, consistent identifiers for the references (GCIS-generated UUIDs were added to EndNote), it helped to automate some of the validation process (detecting invalid DOIs, for instance), and it was a point of reference for these artifacts in an otherwise distributed process.
-
Most of the effort was automated through the use of a TSU Resource Collection Tool, scripts that interacted with the EndNote database, and scripts that extracted findings and other information from the draft PDF documentation.
-
Manual efforts were needed to obtain some additional information that was not part of the original document such as links to datasets and activities.
-
More semi-automatic processing and even tighter integration with the document production process are good goals to aim for.
-
Apache Tika is a tool that could be used for parsing. [Statement from the audience.]
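The identifier and validation steps mentioned in this answer (UUIDs for references, flagging invalid DOIs) can be sketched in a few lines of Python. This is not the GCIS team's actual tooling: the function names are hypothetical, and the check is purely syntactic, using a commonly cited DOI pattern (`10.` followed by a registrant code, a slash, and a suffix). It does not confirm that a DOI resolves.

```python
import re
import uuid

# Syntactic DOI check only: "10.<4-9 digits>/<non-space suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Strip whitespace and a leading resolver URL or 'doi:' label."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def is_plausible_doi(raw):
    """Return True if the value looks like a DOI (syntax only)."""
    return bool(DOI_PATTERN.match(normalize_doi(raw)))


def tag_references(raw_dois):
    """Attach a fresh random UUID and a syntax flag to each reference."""
    return [
        {
            "uuid": str(uuid.uuid4()),
            "doi": normalize_doi(raw),
            "valid_syntax": is_plausible_doi(raw),
        }
        for raw in raw_dois
    ]


if __name__ == "__main__":
    for record in tag_references(["doi:10.1038/nclimate2141", "10.1038"]):
        print(record)
```

Records flagged with `valid_syntax` set to false would be the ones routed to a person for manual correction.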
-
-
Are there any lessons that were learned along the way that might be applied to the current, ongoing effort, such as the indicators and ontology?
-
Tracing references was a complex issue; DOIs were used to help with the effort. As a result, being organized was very helpful. Also, the use of aliases along with full names made it easier to refer to entities that are known by different names.
-
GCIS Ontology 1.2 was the basis; however, it was realized that the ontology should be augmented to broaden the inclusion of additional climate change communities. As a result, the future ontology will include additional concepts beyond climate assessment.
-
dbpedia.org has been discussed.
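The alias approach mentioned in this answer (keeping aliases alongside full names so that differently named references point to the same entity) can be illustrated with a small lookup table. This Python sketch is an assumption about how such a table might work, not a description of the GCIS implementation; the identifier `/organization/nasa` is hypothetical.

```python
def _key(name):
    """Normalize a name: lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_alias_index(entities):
    """Map every known name or alias to its canonical identifier.

    `entities` maps a canonical identifier to a list of names.
    Raises ValueError if one name is claimed by two identifiers.
    """
    index = {}
    for canonical_id, names in entities.items():
        for name in names:
            existing = index.setdefault(_key(name), canonical_id)
            if existing != canonical_id:
                raise ValueError(f"ambiguous alias: {name!r}")
    return index


def resolve(name, index):
    """Return the canonical identifier for a name, or None if unknown."""
    return index.get(_key(name))


# Hypothetical identifier; the names are the full name and an acronym.
ENTITIES = {
    "/organization/nasa": [
        "National Aeronautics and Space Administration",
        "NASA",
    ],
}

if __name__ == "__main__":
    index = build_alias_index(ENTITIES)
    print(resolve("nasa", index))
```

Raising an error on ambiguous aliases is a design choice in this sketch: it forces disambiguation when the table is built rather than when a reference is resolved.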
-
-
Discussion Questions:
-
What should the GCIS criteria be for authoritative sources for details of datasets, models, etc.? What are the authoritative sources that meet these criteria?
-
Suggestions:
-
Use the report, or the information that has already been used in the report.
-
Use the metadata that is associated with the objects.
-
Define what it means to be authoritative and then how to disambiguate the sources.
-
Defining the depth to disambiguate is important as well.
-
The most difficult issue would be traceability, i.e. clarifying the lineage.
-
DOI does not cover the full lineage; DOI is only part of the picture that facilitates identification.
-
The slow change in culture to require data publication and citation could help with this issue because the next generation of contributors/users might be more used to identifying and tracing their work.
-
-
-
-
What steps can we take with GCIS to inspire broader community adoption? This includes wider use of the USGCRP instance of GCIS and adoption of the GCIS ontology by agencies (and others, including universities, states, NGOs, and international partners).
-
Most people believe that GCIS, or at least its concept, could be adopted by other communities with modification.
-
Testimonials from scientists might help.
-
The GCIS and the Assessment report are already very good showcases.
-
Getting these tools into classrooms is going to help with the younger generation.
-
-
-
How do we manage the wide variability across (and within) agencies in the granularity of “datasets”? Is there a way to automatically aggregate datasets to the appropriate level for incorporation in GCIS?
-
Different people have different definitions of “datasets;” as a result, the USGCRP would like feedback on what people have to suggest.
-
One audience member agreed that it is not always clear what “datasets” means. Likewise, granularity is also hard to define (it ranges from a single file to multiple files).
-
As a result, it might be better to define datasets based on their uses. In other words, the definition would depend on what information would be useful to the users.
-
This issue also applies to DOI definition, which is a related issue. DOI could be considered as a locator; as a result, the assignment of DOI would depend on the level and the details that need to be located.
-
-
-
-
How do we map the current platform-instrument-sensor model to measurement networks (temperature, water quality, etc.)?
-
In ISO 19115 terms, ISO 19115-2 could be applied.
-
-
How should the ontology be extended to include the relationship between “tools” and datasets?
-
The Semantic cluster is working on “tool match,” which could help with this effort. Two case studies are currently being developed by the Semantic cluster.
-
Identified key characteristics of tools and data that are related to each other.
-
-
This development could help with future scientific efforts.
-
GCIS is available on GitHub at:
https://github.com/USGCRP/gcis
Please join the API mailing list by accessing the page at:
https://groups.google.com/a/usgcrp.gov/forum/#!forum/gcis-api-users-group
Contact information for the presenters is available at the top of this page.