Data Management for Information Access


This half-day session focuses on data management in support of information delivery. The first session highlights Earth Science data management and information delivery technologies, models and strategies. Presentations will focus on the specific models and/or strategies implemented for data management and facilitating access to information based on those data as developed by the member groups.

Part I - Presentations


The development of a data management and services architecture in support of a geospatial clearinghouse and research data portal. Karl Benedict, Mike Camponovo, Soren Scott, Su Zhang - University of New Mexico


Since the initial release in 2001 of the New Mexico Resource Geographic Information System (NM RGIS) interactive data portal, the Earth Data Analysis Center (EDAC) has continued to evolve the capabilities of the systems underlying RGIS and other applications in response to a broadening set of requirements from multiple user communities. The recently released version of EDAC's data management and services platform (Gstore V3), related metadata development and enhancements, and near-term planned expansion of the system all build upon a longstanding emphasis on enabling flexible data discovery, access and information delivery for the diverse data supported by the system. This presentation will define the driving access requirements, describe the tiered architectural model and underlying data management approach, and review the metadata development and improvement strategy developed in support of long-term data and information discovery, access, and use - all in the context of supporting diverse end user applications and use cases.


GeoSearch: A cloud-based lightweight brokering middleware for geospatial resource discovery. Phil Yang - George Mason University


Efficient and accurate geospatial resource discovery is a big challenge for earth science research and applications because of the large volume, heterogeneity, complexity, and decentralization of geospatial resources. To address these issues, we developed GeoSearch, a lightweight brokering middleware for efficient geospatial resource discovery. GeoSearch is based on the Microsoft Azure cloud platform and the GEOSS Clearinghouse, and also leverages other existing Geospatial Cyberinfrastructure (GCI) components to reduce integration costs. Specifically, (1) the framework provides integration capability and flexibility by adopting the brokering approach, implementing a 'plug-in'-based framework for metadata processing, and proposing a dynamically configurable search workflow; (2) the asynchronous messaging and batch processing-based metadata record retrieval mode enhances search performance and user interactivity; (3) an embedded semantic support system improves discovery recall and precision by providing semantic-based search rule creation and result similarity evaluation functions; and (4) the engine assists user decision-making by integrating a service quality monitoring and evaluation system, data/service visualization tools, multiple views and additional information. Experiments and a search example show that the proposed engine helps both scientists and general users find more accurate results with enhanced performance and user experience through a user-friendly interface.
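As a rough illustration of the brokering approach described in the abstract (class and plug-in names are hypothetical, not the actual GeoSearch code), each catalog plug-in normalizes its own source's records to a common schema, and the broker fans one query out across all registered plug-ins:

```python
# Minimal sketch of a 'plug-in'-based brokering middleware.
# All class names and record shapes are illustrative assumptions.

class CatalogPlugin:
    """Base class: a plug-in knows how to search one catalog and
    normalize its records to a common schema."""
    def search(self, query):
        raise NotImplementedError

class CswPlugin(CatalogPlugin):
    # In-memory stand-in for a CSW catalog endpoint.
    def __init__(self, records):
        self.records = records
    def search(self, query):
        return [{"title": r, "source": "csw"}
                for r in self.records if query.lower() in r.lower()]

class OpenSearchPlugin(CatalogPlugin):
    # In-memory stand-in for an OpenSearch endpoint.
    def __init__(self, records):
        self.records = records
    def search(self, query):
        return [{"title": r, "source": "opensearch"}
                for r in self.records if query.lower() in r.lower()]

class Broker:
    """Dispatches one query to every registered plug-in and merges results."""
    def __init__(self):
        self.plugins = []
    def register(self, plugin):
        self.plugins.append(plugin)
    def search(self, query):
        results = []
        for plugin in self.plugins:
            results.extend(plugin.search(query))
        return results

broker = Broker()
broker.register(CswPlugin(["Water Quality Grid", "Soil Moisture"]))
broker.register(OpenSearchPlugin(["Groundwater Levels"]))
hits = broker.search("water")
```

A real broker would add the asynchronous messaging, batch retrieval, and semantic ranking the abstract describes; the point here is only the fan-out/normalize pattern.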


Common Information Management Principles Among Earth Science and Defense/Intelligence Communities. Stefan Falke - Northrop Grumman Information Systems


Many of the challenges faced in earth science data management are the same challenges faced by the defense and intelligence communities. How to get the right information to the right people at the right time and in the right context is an objective these communities have in common, as both a general vision and in specific aspects of implementation. This objective is pursued by tackling issues in sharing and using information, interoperating across systems, and applying the latest best practices and technologies. This presentation provides an overview of information architectures and infrastructure approaches, strategies, and trends from the defense/intelligence community perspective and relates them to the earth science community perspective.


The Architecture of IOOS: Lessons learned from attempting to implement a data management framework for the ocean observing community. Derrick Snowden - US IOOS Program Office, NOAA


The Integrated Ocean Observing System community recently celebrated a milestone.  Ten years after the organizational beginnings of IOOS, the community gathered to celebrate the accomplishments of the last ten years and lay the groundwork for the next ten years.  A robust data system has always been central to the IOOS mission and is seen as one of the key elements of a strategy that brings activities spread across 17 federal agencies, eleven regional associations and an unknown number of commercial entities and local and tribal governments together into a single functioning system of systems.  This presentation will review the current state of the Data Management and Communications (DMAC) subsystem of IOOS, and make some statements about the lessons learned in trying to standardize and harmonize across a diverse community.  Some of these lessons relate to information systems technologies, but the most important ones are focused on the organizational issues inherent in a collaborative system of systems.  The initial capabilities of DMAC exist and have proven successful, but much work remains to solidify the infrastructure and make it an integral component of all relevant steps in the data stewardship and information lifecycle in which the ocean observing community participates.

Part II - Facilitated Discussion and Development of Data Management for Information Access working group

The final session consists of a roundtable discussion focusing on the challenges faced by the models presented earlier, areas for improvement, and future work to streamline data management to accelerate and enhance data usability in research, education and decision-making. In addition, the roundtable will include establishing a Data Management for Information Access working group with the initial goal of developing a white paper for presentation at the Summer 2013 meeting. The white paper will document experiences and recommendations for data management for information access that other organizations can use as a starting point for their own data management and information access implementations.




Notes on session 1:

UNM: 20 years of providing geospatial data clearinghouse services


 RGIS 1.0

FGDC WebMap CAP award: to start experimenting

Public health

PHAiRS: end-to-end services architecture (using full OGC suite and SOAP services)

SOAP was not strategy of choice

EPHT: interacting with external providers / using REST-based system


Current system: 220,000 records

3-tiered system

building additional services interfaces on top of the foundation platform


Metadata and data provider interaction

1st step in the lifecycle of NM EPSCoR


Process: PI contact Information

Request contact Information

Attempt to contact and provide initial documentation



E-mails + Phone

In-house tracking table

Which day contacted


Show Excel table

JIRA: detailed researcher status can be accessed by all metadata team members


Candidate Technical Solutions:



Platform independent


Jargon free


Integrate with current state website



Simple Excel forms

ESIP Generic Metadata Editor

(use of a platform called Islandora - very helpful)

Islandora on top of Drupal


Comprehensive Excel workbooks


Generation and Processing:


Have to generate FGDC forms

Modify on the way out to provide other elements needed




(graphic of process)


Technical solution:


Search request, metadata request (JSON response), interface integration, metadata harvesting


Data Access

Metadata API/Data API/Services API
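A minimal sketch of what querying such a tiered API stack (metadata, data, and services endpoints) might look like from a client's perspective. The base URL, endpoint paths, and response shape below are illustrative assumptions, not the actual Gstore routes:

```python
# Hypothetical client-side view of a tiered metadata/data/services API.
# Host, paths, and JSON fields are assumptions for illustration only.
import json
from urllib.parse import urlencode

BASE = "https://gstore.example.edu/api/v3"  # placeholder host

def metadata_search_url(keyword, limit=10):
    """Build a metadata-API search URL that asks for a JSON response."""
    params = urlencode({"q": keyword, "limit": limit, "format": "json"})
    return f"{BASE}/metadata/search?{params}"

# A client would GET the URL above; here we parse a canned JSON response
# shaped the way such an API might answer, with per-record links into
# the separate data and services tiers.
sample_response = json.loads("""
{"total": 1,
 "results": [{"id": "abc123",
              "title": "NM Elevation 10m DEM",
              "links": {"data": "/api/v3/data/abc123",
                        "services": "/api/v3/services/abc123/wms"}}]}
""")
first = sample_response["results"][0]
```

The design point captured here is the separation of concerns: discovery (metadata API) returns links that hand off to access (data API) and visualization (services API).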



Personal Hype-cycle

Visibility and Time:


slowly climbing slope of enlightenment


If you need to share data


less effort

more potential users

streamlined ops for integration


If you need someone else’s data


CS-W (Catalog Service for the Web): geoportal
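For reference, the kind of CSW GetRecords request a geoportal catalog accepts can be built and sanity-checked as below; the keyword and maxRecords are arbitrary, and the body follows the OGC CSW 2.0.2 schema:

```python
# Hand-built CSW 2.0.2 GetRecords request (keyword "water" is arbitrary).
import xml.etree.ElementTree as ET

GETRECORDS = """<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                xmlns:ogc="http://www.opengis.net/ogc"
                service="CSW" version="2.0.2" resultType="results"
                maxRecords="10">
  <csw:Query typeNames="csw:Record">
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
          <ogc:PropertyName>AnyText</ogc:PropertyName>
          <ogc:Literal>%water%</ogc:Literal>
        </ogc:PropertyIsLike>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
"""

# Sanity-check that the request is well-formed XML before POSTing it
# to a catalog endpoint (endpoint URL intentionally omitted here).
root = ET.fromstring(GETRECORDS)
```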


The IOOS Architecture

Derrick Snowden

(ICOOS Act 2009)


Interagency program, seated in NOAA; development of a data management architecture spanning the federal and nonfederal landscape

7 goals, 1 system

Observations/Data Management/ Modeling and Analysis


Weather and climate

Maritime operations

Natural hazards

Homeland security

Public health risks

Diagram: timeline of the program to aggregate disparate subsystems


Moored Buoys, time series, time series profile

High frequency Radar, Radials, grids

Profiling gliders, trajectory and trajectory profile

(not just mapping)

MARACOOS in position during Storm Sandy to monitor

System of Systems:



Operational Separation



Info is primary artifact

Advances are incremental and iterative

Interoperability is paramount


Defining cooperative integration (JPL)

Looking at costs


Push all the data through the National Data Buoy Center (as role of regional association DAC)

(working towards web services to the rest of the world)


3 major components of this data service

Data consumer / service registry / service provider - overlay this diagram with the tools and standards used

also experimenting with in situ data

using different tools for this type of data


What levels of interoperability must we achieve?


Using example of a music mp3 collection over time:

Missing metadata = duplicates, gaps in information
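The mp3 analogy can be made concrete: with complete metadata, duplicate records are detectable, while records missing a key field slip through. A toy sketch, not any IOOS tool:

```python
# Toy deduplication over a music library keyed on (artist, title);
# records lacking either field cannot be matched and are kept as-is.
def dedupe(tracks):
    """Keep one record per (artist, title) key; pass through records
    whose incomplete metadata makes them impossible to match."""
    seen, unique, unmatched = set(), [], []
    for t in tracks:
        key = (t.get("artist"), t.get("title"))
        if None in key:
            unmatched.append(t)      # incomplete metadata: can't dedupe
        elif key not in seen:
            seen.add(key)
            unique.append(t)
    return unique + unmatched

library = [
    {"artist": "A", "title": "Song"},
    {"artist": "A", "title": "Song"},  # exact duplicate, caught
    {"title": "Song"},                 # missing artist, slips through
]
cleaned = dedupe(library)
```

The same failure mode scales up to observing-system catalogs: incomplete metadata leaves duplicates and gaps that no amount of downstream tooling can repair.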


Most of the time analogy?

Interoperability: between who? And when are we done?

Good list of the clients we serve


Example of NCTOOLBOX for Matlab: showing how far we have come

(developing compelling use cases for why)

List of tools that IOOS supports



SOS Parser


Environmental Data Connector


NetCDF Java Library (unstructured grids)

(IOOS does not fully own any one of these, but leverages them)


Statement: beyond interoperability, the importance of stewardship

Ted does not wholly agree - need both




Computer-enhanced search interface for geospatial resource discovery

Same user experience for global users: interactive, fast, responsive

Share with 140+ countries


Showing GEOSS Clearinghouse and architecture

Both remote and local search to search dispatcher

Vocabularies and semantics: to improve accuracy

Example using search query "water" (CISC GMU Data Discovery)

Categorization of search results:

·      Based on performance

·      Based on who is providing

·      Based on relevance

Data Exploration

Time series animation


Next Step: concurrent intensive

Location of cloud (using Azure)

Spatiotemporal distribution

Several detailed papers available for info.


Questions: share software details?

Yes- available


What users have you seen - or are you targeting a specific set of users?

Mostly for the public, with rich graphics


Harvesting data? Several different locations for populating


---------- Second Half: Notes on Session 2 ----------


Stefan Falke

Common Information Vision Systems


NGA (National Geospatial-Intelligence Agency)


Paradigm shifts

1995 NRC recommendation for NASA EOSDIS


fuel for ESIP


Letitia Long @ GEOINT Symposium 2010: putting power into the user's hands

What can NGA do to empower the user?


Vision Integration 2.0

Strategic initiatives

Online GEOINT services

Open information technology environment


NSG Community Model (a community oriented framework)


Improving user access

Self/Assisted/Full Service

Model shows expansion of user/data relationship


Cloud-based Infrastructures

“cross-pollination of analysis”


Unique Identifiers: Structured & Unstructured Data

 (focus on provenance)

All data is stamped with entity ID



Entity-based identifiers: parent-to-child relationship is not lost
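A toy sketch of the entity-ID idea noted above: every derived product is stamped with its own identifier plus a pointer to its parent, so lineage can always be walked back to the source. Field names and the scheme itself are illustrative, not NGA's actual implementation:

```python
# Illustrative entity-ID stamping for provenance tracking.
import uuid

def stamp(payload, parent_id=None):
    """Wrap data with a unique entity ID and a pointer to its parent."""
    return {"entity_id": str(uuid.uuid4()),
            "parent_id": parent_id,
            "payload": payload}

def lineage(entity, registry):
    """Walk parent pointers back to the original source entity."""
    chain = [entity["entity_id"]]
    while entity["parent_id"] is not None:
        entity = registry[entity["parent_id"]]
        chain.append(entity["entity_id"])
    return chain

raw = stamp("raw imagery")
derived = stamp("analysis product", parent_id=raw["entity_id"])
registry = {e["entity_id"]: e for e in (raw, derived)}
chain = lineage(derived, registry)
```

Because the parent pointer travels with each record, the parent-to-child relationship survives any number of derivation steps.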


Web services choreography

NGA: Geospatial Intelligence Working Group

ITSA Focus Group Standardization Activities


Collaborative Analytics

Spending less time finding and getting ready and more time working with and synthesizing


Statement: Erin Robinson @ AGU retweeting tweets (collaborative analytics)

Identifying events faster


Questioning whether scientists are interested in using data from social media (not citable/ peer-reviewed)

There are semantic aspects to social media data, but it can be used

Flu outbreaks: Google analytics

Multi-source rules

“source vetting”


Question about whether anyone is using UNCERT


Spending last hour in discussion and breakout groups

Capture dimensions in 3 categories for white paper


1. Data Management (alternative models)

Matrix: talk about the specific data needs

How can we characterize the strengths and weaknesses as they relate to these external information needs?

This is what we need to manage… these are the alternative models



2. Dimensions of Discovery

Dimensions of use

(understanding too)

end up with multiple matrices

identify strengths and weaknesses in each


3. Data to information

Translate objects into actionable information and knowledge


Data Requirements

Information Requirements

(How do they relate to each other?)

less of a matrix, developing into scenarios


What do we mean by data management?

Who is the audience for the document?

Core question: What is the difference between data and information?


Self-organizing into groups: 10 minutes; 15 minutes to talk, wrap up, and move forward


Discussion continues surrounding Data Management models


Lifecycle management systems


Moving on to Core Questions

Question about intent with the white paper:

Targeted at the more technically oriented who are looking at implementing - targeted at system implementers

Emphasis on defining the overall framework

Envision focusing on a more technical audience

Thinking about mining the wiki for past thoughts and ideas.

White paper can be published on the ESIP Commons as a stable, citable resource


Wiki content may not provide the structure that a white paper can.

Versions of the white paper can provide update on the status of the work being done

"Too often we don't start with the user perspective and start with the tool perspective"

Data Management "planning"

Audience for the white paper may be someone who is looking for language or guidance as they plan or build a template

continuously changing suite of users

Next Steps:

Creating a cluster to work on this

Connection to the IT&I committee

Contribute further to translate into a white paper that could go into the Commons

4-5 months: looking towards the summer meeting in mid-July 2013

Keeping in contact through email









Benedict, K.; Scott, S. Data Management for Information Access. Winter Meeting 2013. ESIP Commons, October 2012.