Data Management for Information Access

Abstract/Agenda: 

This half-day session focuses on data management in support of information delivery. The first session highlights Earth Science data management and information delivery technologies, models and strategies. Presentations will focus on the specific models and/or strategies implemented for data management and facilitating access to information based on those data as developed by the member groups.

Part I - Presentations

 

The development of data management and services architecture in support of a geospatial clearinghouse and research data portal. Karl Benedict, Mike Camponovo, Soren Scott, Su Zhang - University of New Mexico

 

Since the initial release in 2001 of the New Mexico Resource Geographic Information System (NM RGIS, http://rgis.unm.edu) interactive data portal, the Earth Data Analysis Center (EDAC) has continued to evolve the capabilities of the systems underlying RGIS and other applications in response to a broadening set of requirements from multiple user communities. The recently released version of EDAC's data management and services platform (Gstore V3), related metadata development and enhancements, and near-term planned expansion of the system all build upon a longstanding emphasis on enabling flexible data discovery, access and information delivery for the diverse data supported by the system. This presentation will define the driving access requirements, describe the tiered architectural model and underlying data management approach, and review the metadata development and improvement strategy developed in support of long term data and information discovery, access, and use - all in the context of supporting diverse end user applications and use cases.

 

GeoSearch: A cloud based lightweight brokering middleware for geospatial resources discovery. Phil Yang - George Mason University

 

Efficient and accurate geospatial resource discovery is a big challenge for earth science research and applications because of the large volume, heterogeneity, complexity, and decentralization of geospatial resources. To address these issues, we developed a lightweight brokering middleware, GeoSearch, for efficient geospatial resource discovery. GeoSearch is based on the Microsoft Azure Cloud platform and the GEOSS clearinghouse, and also leverages other existing Geospatial Cyberinfrastructure (GCI) components to reduce integration costs. Specifically, (1) the framework provides integration capability and flexibility by adopting the brokering approach, implementing a ‘plug-in’-based framework for metadata processing and proposing a dynamically configurable search workflow; (2) the asynchronous messaging and batch processing-based metadata record retrieval mode enhances the search performance and user interactivity; (3) an embedded semantic support system improves the discovery recall level and precision by providing semantic-based search rule creation and result similarity evaluation functions; and (4) the engine assists user decision-making by integrating a service quality monitoring and evaluation system, data/service visualization tools, multiple views and additional information. Experiments and a search example show that the proposed engine helps both scientists and general users search for more accurate results with enhanced performance and user experience through a user-friendly interface.
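The brokering and plug-in ideas in this abstract can be illustrated with a small sketch. This is not GeoSearch code: the Broker class, the plug-in signature, and the two fake catalogues are all hypothetical names invented here, and the sketch only assumes that each catalogue can be wrapped as an asynchronous function that returns metadata records for a query.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical plug-in type: takes a query string, returns metadata records.
Plugin = Callable[[str], Awaitable[List[dict]]]


class Broker:
    """Dispatches one query to every registered catalogue plug-in."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        self._plugins[name] = plugin

    async def search(self, query: str) -> List[dict]:
        names = list(self._plugins)
        # Query all sources concurrently; exceptions are returned, not raised,
        # so one failing source does not block the others.
        results = await asyncio.gather(
            *(self._plugins[n](query) for n in names), return_exceptions=True
        )
        merged: List[dict] = []
        for name, result in zip(names, results):
            if isinstance(result, Exception):
                continue  # skip sources that failed
            for record in result:
                merged.append({**record, "source": name})
        return merged


# Two stand-in catalogues (assumptions for the demo, not real services).
async def fake_catalogue_a(query: str) -> List[dict]:
    return [{"title": f"{query} quality stations"}]


async def fake_catalogue_b(query: str) -> List[dict]:
    raise TimeoutError("catalogue unavailable")


def demo() -> List[dict]:
    broker = Broker()
    broker.register("catalogue_a", fake_catalogue_a)
    broker.register("catalogue_b", fake_catalogue_b)
    return asyncio.run(broker.search("water"))
```

Adding a new source in this sketch means registering one more function, which is the kind of flexibility the plug-in approach is meant to provide.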

 

Common Information Management Principles Among Earth Science and Defense/Intelligence Communities. Stefan Falke - Northrop Grumman Information Systems

 

Many of the challenges faced in earth science data management are the same challenges faced by the defense and intelligence communities. How to get the right information to the right people at the right time and in the right context is an objective these communities have in common, as both a general vision and in specific aspects of implementation. This objective is pursued by tackling issues in sharing and using information, interoperating across systems, and applying the latest best practices and technologies. This presentation provides an overview of information architectures and infrastructure approaches, strategies, and trends from the defense/intelligence community perspective and relates them to the earth science community perspective.

 

The Architecture of IOOS: Lessons learned from attempting to implement a data management framework for the ocean observing community. Derrick Snowden - US IOOS Program Office, NOAA

 

The Integrated Ocean Observing System community recently celebrated a milestone. Ten years after the organizational beginnings of IOOS, the community gathered to celebrate the accomplishments of the last ten years and lay the groundwork for the next ten years. A robust data system has always been central to the IOOS mission and is seen as one of the key elements of a strategy that brings activities spread across 17 federal agencies, eleven regional associations and an unknown number of commercial entities and local and tribal governments together into a single functioning system of systems. This presentation will review the current state of the Data Management and Communications (DMAC) subsystem of IOOS, and make some statements about the lessons learned in trying to standardize and harmonize across a diverse community. Some of these lessons relate to information systems technologies, but the most important ones are focused on the organizational issues inherent in a collaborative system of systems. The initial capabilities of DMAC exist and have proven successful, but much work remains to solidify the infrastructure and make it an integral component of all relevant steps in the data stewardship and information lifecycle in which the ocean observing community participates.

Part II - Facilitated Discussion and Development of Data Management for Information Access working group

The final session consists of a roundtable discussion focusing on the challenges faced by the models presented earlier, areas for improvement, and future work to streamline data management to accelerate and enhance data usability in research, education and decision-making. In addition, the roundtable will include establishing a Data Management for Information Access working group with the initial goal of developing a white paper for presentation at the Summer 2013 meeting. The white paper will document experiences and recommendations for data management for information access that other organizations can use as a starting point for their own data management and information access implementations.

 

Notes: 

GOOGLE DOC - https://docs.google.com/document/d/1laoMX9l7lXeqVKXyUqZVtvk6aML4ZYreJoT7BPxNVSs/edit

Notes on session 1:

UNM 20 years of providing geospatial data clearinghouse services

Context:

 RGIS 1.0

FGDC WebMap CAP award: to start experimenting

Public health

PHAiRS: end-to-end services architecture (using full OGC suite and SOAP services)

SOAP was not the strategy of choice

EPHT: interacting with external providers using a REST-based system

 

Current System 220,000 records

3-tiered system

building additional services interfaces on top of the foundation platform

 

Metadata – Data  provider interaction

1st Step in the lifecycle of NM EPSCOR

 

Process: PI contact Information

Request contact Information

Attempt to contact and provide initial documentation

 

Tools:

E-mails + Phone

In-house tracking table

Which day contacted

 

Show excel table

JIRA: detailed researcher status can be accessed by all metadata team members

http://www.atlassian.com/software/jira/overview?_mid=6622558b0394cb90307ba36aaead440d&gclid=CMr0_Mb427QCFYKK4Aod5EMAPQ

 

Candidate Technical Solutions:

 

Goals:

Platform independent

Dynamic

Jargon free

Validation

Integrate with current state website

 

Solutions

Simple Excel forms

ESIP Generic Metadata Editor

(use of a platform called ISLANDORA- very helpful)

Islandora on top of Drupal

http://islandora.ca/

 

Comprehensive Excel workbooks

 

Generation and Processing:

Generate

Have to generate FGDC forms

Modify on the way out to provide other elements needed

 

Processing:

Sluicebox

(graphic of process)

 

Technical solution:

Gstore API

Search request, metadata request, JSON response, interface integration, metadata harvesting

 

Data Access

Metadata API/Data API/Services API
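The search request and JSON response pattern noted above might look roughly like the sketch below. The base URL, parameter names, and response fields are placeholders made up for illustration; they are not the actual Gstore API, which these notes do not describe in detail.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL and parameter names, for illustration only.
BASE_URL = "https://example.org/api/search/datasets.json"


def build_search_url(keyword, bbox=None, limit=10):
    """Build a search request URL from a keyword and optional bounding box."""
    params = {"query": keyword, "limit": limit}
    if bbox is not None:
        # bbox is (west, south, east, north); sent as one comma-separated value
        params["box"] = ",".join(str(v) for v in bbox)
    return f"{BASE_URL}?{urlencode(params)}"


def dataset_titles(response_text):
    """Pull dataset titles out of an (assumed) JSON search response."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload.get("results", [])]


# Assumed response shape, not a real server reply.
SAMPLE_RESPONSE = '{"total": 1, "results": [{"id": "abc", "title": "Example layer"}]}'
```

A client would fetch the URL returned by build_search_url and pass the response body to dataset_titles; the same pattern would apply to a per-dataset metadata request.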

 

Conclusions

Personal Hype-cycle

Visibility and Time:

 

slowly climbing slope of enlightenment

 

If you need to share data

Standards=

less effort

more potential users

streamlined ops for integration

 

If you need someone else’s data

 

CS-W: geoportal

http://webhelp.esri.com/geoportal_extension/9.3.1/index.htm#geoportal_csw_cmpnts.htm
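For readers unfamiliar with CS-W, a catalogue search can be issued as a key-value-pair GetRecords request. The sketch below only builds the request URL, using parameter names from the OGC CSW 2.0.2 specification; the endpoint is a placeholder, and whether a given catalogue accepts the AnyText CQL filter shown here is an assumption to check against that catalogue's capabilities.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint, search_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL (KVP encoding) for a free-text search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",   # brief, summary, or full
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Assumed filter: match the text anywhere in the record
        "constraint": f"AnyText like '%{search_text}%'",
        "maxRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint, not a real catalogue.
EXAMPLE_URL = csw_getrecords_url("https://example.org/csw", "water", max_records=5)
```

The response to such a request is an XML document of matching records, which a client would then parse for titles, bounding boxes, and service links.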

 

The IOOS Architecture

Derrick Snowden

(ICOOS Act 2009)

http://www.ioos.gov/

 

Interagency program seated in NOAA: development of a data management architecture across the federal and nonfederal landscape

7 goals, 1 system

Observations/Data Management/ Modeling and Analysis

 

Weather and climate

Maritime operations

Natural hazards

Homeland security

Public health risks

Diagram timeline program to aggregate disparate subsystem

“OO”

Moored Buoys, time series, time series profile

High frequency Radar, Radials, grids

Profiling gliders, trajectory and trajectory profile

(not just mapping)

MARACOOS in position during Storm Sandy to monitor

System of Systems:

Geographical

Managerial

Operational Separation

 

Implications:

Info is primary artifact

Advances are incremental and iterative

Interoperability is paramount

 

Defining cooperative integration (JPL)

Looking at costs

 

Push all the data through the National Data Buoy Center (as role of regional association DAC)

(working towards web services to the rest of the world)

 

3 major components of this data service

data as: consumer / service registry / service; overlay this diagram with the tools and standards used

also experimenting with in situ data

using different tools for this type of data

 

What levels of interoperability must we achieve?

 

Using the example of an mp3 music collection over time:

Missing metadata = duplicates, gaps in information

 

Most of the time analogy?

Interoperability: between who? And when are we done?

Good list of the clients we serve

 

Example of nctoolbox for MATLAB: showing how far we have come

(developing compelling use cases for why)

List of tools that IOOS supports

Nctoolbox

52NorthSOS

SOS Parser

ncISO

Environmental Data Connector

Pyoos

NetCDF Java Library Unstructured

(does not fully any one of these but leveraging)

 

Statement that, beyond interoperability, stewardship is important

Ted does not wholly agree: need both

 

Phil

GeoSearch

Computer enhanced searching interface for geospatial resource discovery

Same user experience for global users: interactive, fast, responsive

Share with 140+ countries

 

Showing GEOSS Clearinghouse and architecture

Both remote and local search to search dispatcher

Vocabularies and semantics: to improve accuracy

Example using search query "water" (CISC GMU Data Discovery)

Categorization of search results:

·      Based on performance

·      Based on who is providing

·      Based on relevance
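The three categorizations above amount to different orderings or groupings of the same result list. A minimal sketch follows, assuming each result carries a provider name, a relevance score, and a measured service response time; these field names are invented for the example and do not come from GeoSearch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    provider: str
    relevance: float      # higher is better (assumed score)
    response_ms: float    # lower is better (assumed service performance measure)


def by_relevance(results):
    """Most relevant results first."""
    return sorted(results, key=lambda r: r.relevance, reverse=True)


def by_performance(results):
    """Fastest-responding services first."""
    return sorted(results, key=lambda r: r.response_ms)


def by_provider(results):
    """Group result titles under the organization providing them."""
    groups = defaultdict(list)
    for r in results:
        groups[r.provider].append(r.title)
    return dict(groups)


# Made-up sample data for the demo.
SAMPLE = [
    SearchResult("A", "p1", 0.4, 120.0),
    SearchResult("B", "p2", 0.9, 300.0),
    SearchResult("C", "p1", 0.7, 80.0),
]
```

Keeping the views as separate functions over one list lets an interface switch between them without repeating the search.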

Data Exploration

Time series animation

 

Next Step: concurrent intensive

Location of cloud (using Azure)

Spatiotemporal distribution

Several detailed papers available for info.

 

Questions: share software details?

Yes- available

WMS, WFS, WCS no DAP or SES

What users have you seen, or are you targeting a specific set of users?

Mostly for the public, with high graphics

 

Harvesting data? Several different locations for populating

 

----------Second Half --------- Notes on Session 2-----------

 

Stefan Falke

Common Information Vision Systems

 

NGA (National Geospatial-Intelligence Agency)

 

Paradigm shifts

1995 NRC recommendation for NASA EOSDIS

 

fuel for ESIP

 

Letitia Long @ GEOINT symposium 2010: putting power into the user’s hands

What can NGA do to empower the user?

 

Vision Integration 2.0

Strategic initiatives

Online GEOINT services

Open information technology environment

 

NSG Community Model (a community oriented framework)

 

Improving user access

Self/Assisted/Full Service

Model shows expansion of user/data relationship

 

Cloud-based Infrastructures

“cross-pollination of analysis”

 

Unique Identifiers: Structured & Unstructured Data

 (focus on provenance)

All data is stamped with entity ID

GUIDs

UUIDs

Entity based identifiers Parent to child relationship is not lost
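One way to keep the parent-to-child relationship recoverable is to derive each child identifier from its parent's UUID. The sketch below uses Python's standard uuid module; the notes do not say how the identifiers discussed in the talk are actually generated, so the name-based (version 5) scheme and the stamp helper are assumptions for illustration.

```python
import uuid


def new_entity_id():
    """Random (version 4) UUID for a new top-level entity."""
    return uuid.uuid4()


def child_id(parent, child_name):
    """Name-based (version 5) UUID derived from the parent's UUID.

    The same parent and name always give the same child ID, so the link
    can be re-derived and checked later.
    """
    return uuid.uuid5(parent, child_name)


def stamp(record, entity_id, parent_id=None):
    """Return a copy of the record stamped with its entity and parent IDs."""
    return {
        **record,
        "entity_id": str(entity_id),
        "parent_id": str(parent_id) if parent_id is not None else None,
    }


# Fixed example parent so the derivation is repeatable.
PARENT = uuid.UUID("12345678-1234-5678-1234-567812345678")
CHILD = child_id(PARENT, "tile-001")
STAMPED = stamp({"name": "tile-001"}, CHILD, PARENT)
```

Storing the parent ID alongside the entity ID, as stamp does, keeps provenance visible even to systems that cannot recompute the derived identifier.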

 

Webservices Choreography

NGA- geospatial intelligence working group

ITSA Focus Group Standardization Activities

 

Collaborative Analytics

Spending less time finding and getting ready and more time working with and synthesizing

 

Statement- Erin Robinson @ AGU retweeting tweets (collaborative analytics)

Identifying events faster

 

Questioning whether scientists are interested in using data from social media (not citable/ peer-reviewed)

Semantic aspects of social media data, but can be used

Flu outbreaks- Google analytics

Multi-source rules

“source vetting”

uncert http://www.uncert.com/

question about whether anyone is using UNCERT

 

Spending last hour in discussion and break out groups

Capture dimensions in 3 categories for white paper

 

1. Data Management (alternative models)

 Matrix- talk about the specific data needs

How can we characterize the strengths and weaknesses as they relate to these external information needs

This is what we need to manage… these are the alternative models

 

2. Interoperability

Dimensions of Discovery

Dimensions of use

(understanding too)

end up with multiple matrices

identify strengths and weaknesses in each

 

3. Data to information

Translate objects into actionable information and knowledge

 

Data Requirements

Information Requirements

(How do they relate to each other?)

less of a matrix, developing into scenarios

 

What do we mean by data management?

Who is the audience for the document?

 Core question: What is the difference between data and information?

 

Self-organizing into groups: 10 minutes, then 15 minutes to talk, wrap up and move forward…

 

Discussion continues surrounding Data Management models

 

Lifecycle management systems

 

Moving on to Core Questions

Question about intent with the white paper:

Targeted at more technically oriented readers looking at implementing; targeted at system implementers

Emphasis on defining the overall framework

Envision focusing on a more technical audience

Thinking about mining the wiki for past thoughts and ideas.

White Paper can be published on the commons as a stable cite-able resource

 

Wiki content may not provide the structure that a white paper can.

Versions of the white paper can provide updates on the status of the work being done

"Too often we don't start with the user perspective and start with the tool perspective"

Data Management "planning"

audience for the white paper may be someone who is looking for language or guidance as they plan or build a template

continuously changing suite of users

Next Steps:

creating a cluster to be working on this 

connection to the IT&I committee

contribute further to translate into a white paper that could go into the commons

4-5 months- looking towards summer meeting in mid July 2013

keeping in contact through email

END.

Citation:
Benedict, K.; Scott, S.; Data Management for Information Access; Winter Meeting 2013. ESIP Commons, October 2012