Linking Open Research Data for Earth and Space Science Informatics


Earth and Space Science Informatics (ESSI) is inherently multidisciplinary and requires close collaboration between scientists and information technologists. Identifying potential collaborators can be difficult, especially given how quickly technologies and informatics projects change. Being able to discover the technical competencies of other researchers in the community makes it easier to find research partners. The same data can also be used to analyze trends in the field, helping project managers distinguish emerging, well-established, and no longer relevant technologies and specifications. This information helps keep projects focused on the technologies and standards that are actually in use, making them more useful to the ESSI community. We present a two-part solution to this problem: a pipeline for generating structured data from ESSI abstracts, and an API and Web application for accessing the generated data. We use a Natural Language Processing (NLP) technique, Named Entity Disambiguation, to extract information about researchers, their affiliations, and the technologies they have applied in their research. The extracted data is encoded in the Resource Description Framework (RDF) using Linked Data vocabularies, including the Semantic Web for Research Communities (SWRC) ontology and the Friend of a Friend (FOAF) ontology. The data is exposed in four ways: a SPARQL endpoint, Linked Data, Java APIs, and a Web application. We also capture the provenance of the data transformations using the Proof Markup Language (PML), including confidence scores from the NLP algorithms used. Our implementation uses only open source components, including DBpedia Spotlight and OpenNLP. We plan to set up an open source project for this work so that it can continue to evolve through community contributions.

Submitted by: Eric Rozell, Rensselaer Polytechnic Institute, [email protected]
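As a rough illustration of the RDF encoding step described in the abstract, the Python sketch below serializes one extracted researcher record as N-Triples. It is not the authors' pipeline: the function name, the example.org namespace, the sample record, the use of foaf:topic_interest to link a researcher to a technology, and the confidence threshold are all assumptions made for illustration. Only the FOAF, SWRC, RDF, and DBpedia namespace URIs are real.

```python
# Hypothetical sketch: serialize an extracted researcher record as N-Triples.
# The BASE namespace, function names, and sample data are illustrative only.
FOAF = "http://xmlns.com/foaf/0.1/"
SWRC = "http://swrc.ontoware.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBPEDIA = "http://dbpedia.org/resource/"
BASE = "http://example.org/essi/"  # placeholder namespace (assumption)


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def researcher_triples(slug, name, org_slug, org_name, topics, min_confidence=0.5):
    """Return N-Triples lines for one researcher.

    topics is a list of (dbpedia_resource_name, confidence) pairs, such as
    an entity disambiguation tool might produce. Pairs scoring below
    min_confidence are dropped.
    """
    person = iri(BASE + "person/" + slug)
    org = iri(BASE + "org/" + org_slug)
    lines = [
        f"{person} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .",
        f"{person} {iri(FOAF + 'name')} {literal(name)} .",
        f"{person} {iri(SWRC + 'affiliation')} {org} .",
        f"{org} {iri(FOAF + 'name')} {literal(org_name)} .",
    ]
    for resource, confidence in topics:
        if confidence >= min_confidence:
            # Assumption: foaf:topic_interest links a person to a technology.
            lines.append(
                f"{person} {iri(FOAF + 'topic_interest')} {iri(DBPEDIA + resource)} ."
            )
    return lines


if __name__ == "__main__":
    sample = researcher_triples(
        "jane-doe", "Jane Doe", "example-univ", "Example University",
        [("SPARQL", 0.9), ("OpenNLP", 0.2)],
    )
    print("\n".join(sample))
```

Triples in this form could be loaded into any RDF store and queried through a SPARQL endpoint. Recording provenance in PML, as the abstract describes, would need additional statements that this sketch leaves out.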
