Lessons learned in deploying a cloud-based knowledge platform for the ESIP Federation



Ontologies and semantic technologies are an essential infrastructure component of systems that support knowledge integration in the Earth sciences. Numerous Earth science ontologies exist, but they are hard to discover because they tend to be hosted by the projects that develop them. Few of these ontologies carry basic quality and provenance metadata, such as creation date, version, purpose, or number of classes and properties. Projects often develop ontologies for their own needs without considering existing ontology entities or derivation from formal, foundational ontologies. The outcome is a large number of mutually unaligned ontologies, many of which are not modular enough to be re-used in part or adapted for new purposes, despite widely used standards for ontology representation. A further obstacle to sharing and re-use is the lack of maintenance once a project ends. Together, these obstacles prevent semantic technologies from being fully exploited in a setting where they could become key enablers of service discovery and of matching data with services.

To start addressing this gap, we have deployed BioPortal, a mature, domain-independent ontology and semantic services system developed by the National Center for Biomedical Ontology (NCBO) [1], on the Testbed of the Federation of Earth Science Information Partners (ESIP), under the governance of the ESIP Semantic Web cluster. ESIP provides a forum for a broad-based, distributed community of data and information technology practitioners and stakeholders to collaborate on coordinated efforts and develop new ideas for interoperability solutions. The ESIP Testbed provides an environment where innovations and best practices can be explored and evaluated. One objective of this deployment was to provide a community platform that harnesses the organizational and cyberinfrastructure provided by ESIP at minimal cost. Another objective was to host ontology services on a scalable, public cloud and to investigate the business case for crowdsourcing ontology maintenance.

We deployed the system on Amazon’s Elastic Compute Cloud (EC2), where ESIP maintains an account. Our approach had three phases: 1) set up a private cloud environment at the University of South Carolina to familiarize the developer with the system's complex architecture and to enable basic customization; 2) coordinate with NCBO the production of a Virtual Appliance for the system and deploy it on the Amazon cloud; and 3) reach out to the ESIP community to solicit participation, populate the repository, and develop new use cases. Phase 2 is nearing completion and Phase 3 is underway. Ontologies were gathered during status updates to the ESIP Semantic Web cluster. Discussion points included the criteria for a shareable ontology and how to determine the appropriate size for an ontology to be re-usable. Outreach highlighted that the system can begin to address the integration of discovery frameworks by linking data and services in a pull model (data and service casting), a key issue for the ESIP Discovery cluster [2]. This work thus makes three contributions: 1) the transfer of technology from another domain into the Earth sciences; 2) the deployment of a mature knowledge platform on the EC2 cloud; and 3) the successful engagement of the community through the ESIP clusters and Testbed model.

[1] Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, Jonquet C, Rubin DL, Storey MA, Chute CG, Musen MA. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul 1;37


License: Creative Commons Attribution 3.0 License