Linked data is a paradigm for publishing data on the Web that relies on, among other things, non-proprietary data formats and resolvable identifiers for the things described in a dataset. One linked data initiative, DBpedia, is widely used as a "crystallization point" for linked data on the Web [1]. It serves as a hub for data-level links from external datasets covering a broad variety of domains. Our project, ESSI-LOD, has converted more than 100,000 abstracts from the American Geophysical Union (AGU) into linked data using the Resource Description Framework (RDF), a graph-based data model. We have used this project to help visualize connections between members of the ESIP community and within the broader geoscience communities that attend AGU conferences. ESSI-LOD has revealed several key challenges in publishing linked data at scale, such as co-reference resolution (i.e., determining when two author or organization records actually refer to the same entity).
Beyond those challenges, we see ESSI-LOD as a crystallization point in its own right for linked data in the geosciences. Like Wikipedia, from which DBpedia is derived, AGU publications cover an extremely broad range of geoscience topics. Opportunities to build out this "ESSI-LOD cloud" include annotating abstracts, linking to referenced tools and datasets, and crowd-sourcing co-reference resolution. Terms from the SWEET ontology can be used to tag publications and datasets to improve information retrieval. Identifiers from GeoNames [2] can be used to annotate datasets and publications so that people interested in a particular spatial region can discover them quickly. These are only a few of the ways linked data could crystallize around ESSI-LOD. This poster will briefly highlight the work accomplished in ESSI-LOD and discuss the directions we hope to take it in the future.
This project was funded by a 2011 ESIP Funding Friday award.