Data Management for Information Access
This half-day session focuses on data management in support of information delivery. The first session highlights Earth Science data management and information delivery technologies, models and strategies. Presentations will focus on the specific models and strategies that member groups have implemented for managing data and for facilitating access to information based on those data.
Part I - Presentations
The development of data management and services architecture in support of a geospatial clearinghouse and research data portal. Karl Benedict, Mike Camponovo, Soren Scott, Su Zhang - University of New Mexico
Since the initial release in 2001 of the New Mexico Resource Geographic System (NM RGIS, http://rgis.unm.edu) interactive data portal, the Earth Data Analysis Center has continued to evolve the capabilities of the systems underlying RGIS and other applications in response to a broadening set of requirements from multiple user communities. The recently released version of EDAC's data management and services platform (GStore V3), related metadata development and enhancements, and near-term planned expansion of the system all build upon a longstanding emphasis on enabling flexible data discovery, access and information delivery for the diverse data supported by the system. This presentation will define the driving access requirements, describe the tiered architectural model and underlying data management approach, and review the metadata development and improvement strategy developed in support of long term data and information discovery, access, and use - all in the context of supporting diverse end user applications and use cases.
GeoSearch: A cloud based lightweight brokering middleware for geospatial resources discovery. Phil Yang - George Mason University
Efficient and accurate geospatial resource discovery is a big challenge for earth science research and applications because of the large volume, heterogeneity, complexity, and decentralization of geospatial resources. To address these issues, we developed a lightweight brokering middleware, GeoSearch, for efficient geospatial resource discovery. GeoSearch is based on the Microsoft Azure Cloud platform and the GEOSS Clearinghouse, and also leverages other existing Geospatial Cyberinfrastructure (GCI) components to reduce integration costs. Specifically, (1) the framework provides integration capability and flexibility by adopting the brokering approach, implementing a ‘plug-in’-based framework for metadata processing and proposing a dynamically configurable search workflow; (2) the asynchronous messaging and batch processing-based metadata record retrieval mode enhances the search performance and user interactivity; (3) an embedded semantic support system improves the discovery recall level and precision by providing semantic-based search rule creation and result similarity evaluation functions; and (4) the engine assists user decision-making by integrating a service quality monitoring and evaluation system, data/service visualization tools, multiple views and additional information. Experiments and a search example show that the proposed engine helps both scientists and general users search for more accurate results with enhanced performance and user experience through a user-friendly interface.
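The asynchronous, batch-oriented metadata retrieval described in point (2) can be sketched roughly as follows. This is a minimal illustration of the idea, not GeoSearch's actual implementation: the in-memory record store stands in for the remote clearinghouse, and the names are hypothetical.

```python
import asyncio

# Hypothetical record store standing in for a remote catalog; in a real
# broker, each batch would be fetched over the network.
CATALOG = [{"id": i, "title": f"dataset-{i}"} for i in range(25)]

async def fetch_batch(offset, size):
    """Simulate an asynchronous request for one page of metadata records."""
    await asyncio.sleep(0)  # stand-in for network latency
    return CATALOG[offset:offset + size]

async def search(batch_size=10):
    """Issue batch requests concurrently instead of one large blocking call,
    so early batches can be shown to the user while later ones arrive."""
    offsets = range(0, len(CATALOG), batch_size)
    batches = await asyncio.gather(*(fetch_batch(o, batch_size) for o in offsets))
    return [record for batch in batches for record in batch]

records = asyncio.run(search())
```

The payoff of the batch mode is interactivity: the interface can render the first page as soon as its batch resolves rather than waiting for the full result set.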
Common Information Management Principles Among Earth Science and Defense/Intelligence Communities. Stefan Falke - Northrop Grumman Information Systems
Many of the challenges faced in earth science data management are the same challenges faced by the defense and intelligence communities. How to get the right information to the right people at the right time and in the right context is an objective these communities have in common, as both a general vision and in specific aspects of implementation. This objective is pursued by tackling issues in sharing and using information, interoperating across systems, and applying the latest best practices and technologies. This presentation provides an overview of information architectures and infrastructure approaches, strategies, and trends from the defense/intelligence community perspective and relates them to the earth science community perspective.
The Architecture of IOOS: Lessons learned from attempting to implement a data management framework for the ocean observing community. Derrick Snowden - US IOOS Program Office, NOAA
The Integrated Ocean Observing System community recently celebrated a milestone. Ten years after the organizational beginnings of IOOS, the community gathered to celebrate the accomplishments of the last ten years and lay the groundwork for the next ten years. A robust data system has always been central to the IOOS mission and is seen as one of the key elements of a strategy that brings activities spread across 17 federal agencies, 11 regional associations and an unknown number of commercial entities and local and tribal governments together into a single functioning system of systems. This presentation will review the current state of the Data Management and Communications (DMAC) subsystem of IOOS, and make some statements about the lessons learned in trying to standardize and harmonize across a diverse community. Some of these lessons relate to information systems technologies, but the most important ones are focused on the organizational issues inherent in a collaborative system of systems. The initial capabilities of DMAC exist and have proven successful, but much work remains to solidify the infrastructure and make it an integral component of all relevant steps in the data stewardship and information lifecycle in which the ocean observing community participates.
Part II - Facilitated Discussion and Development of Data Management for Information Access working group
The final session consists of a roundtable discussion focusing on the challenges faced by the models presented earlier, areas of improvement and future work to streamline data management to accelerate and enhance data usability in research, education and decision-making. In addition, the roundtable will include establishing a Data Management for Information Access working group with the initial goal of developing a white paper for presentation at the Summer 2013 meeting. The white paper will document the experiences and recommendations for data management for information access that other organizations can use as a starting point for their own data management and information access implementations.
GOOGLE DOC - https://docs.google.com/document/d/1laoMX9l7lXeqVKXyUqZVtvk6aML4ZYreJoT7BPxNVSs/edit
Notes on session 1:
UNM 20 years of providing geospatial data clearinghouse services
Context:
RGIS 1.0
FGDC WebMap CAP award: to start experimenting
Public health
PHAiRS: end-to-end services architecture (using the full OGC suite and SOAP services)
SOAP was not the strategy of choice
EPHT – interacting with external providers using a REST-based system
Current system: 220,000 records
3-tiered system
building additional services interfaces on top of the foundation platform
Metadata – Data provider interaction
1st Step in the lifecycle of NM EPSCOR
Process: PI contact Information
Request contact Information
Attempt to contact and provide initial documentation
Tools:
E-mails + Phone
In-house tracking table
Which day contacted
Show Excel table
JIRA – detailed researcher status can be accessed by all metadata team members
Candidate Technical Solutions:
Goals:
Platform independent
Dynamic
Jargon free
Validation
Integrate with current state website
Solutions
Simple Excel forms
ESIP Generic Metadata Editor
(use of a platform called Islandora – very helpful)
Islandora on top of Drupal
Comprehensive Excel workbooks
Generation and Processing:
Generate
Have to generate FGDC forms
Modify on the way out to provide other elements needed
Processing:
Sluicebox
(graphic of process)
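The generate-and-process workflow above centers on producing FGDC records. A minimal sketch of generating an FGDC CSDGM skeleton is shown below; the element names follow the CSDGM structure, but a real record carries many more required sections, and this is not EDAC's actual tooling.

```python
import xml.etree.ElementTree as ET

def make_fgdc_stub(title, originator, abstract):
    """Build a minimal FGDC CSDGM skeleton that a metadata editor
    could fill in; real records require many additional sections."""
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "origin").text = originator
    ET.SubElement(citeinfo, "title").text = title
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    return ET.tostring(root, encoding="unicode")

xml_doc = make_fgdc_stub("Sample dataset", "EDAC", "Demo record")
```

Generating a skeleton like this programmatically, then "modifying on the way out" to add elements a target system needs, matches the pattern the notes describe.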
Technical solution:
GStore API
Search request, metadata request, JSON response, interface integration, metadata harvesting
Data Access
Metadata API/Data API/Services API
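As a rough illustration of how a client might exercise a search/metadata API of this kind, the sketch below composes a search URL and parses a JSON response. The endpoint and parameter names are hypothetical, not the actual GStore V3 routes.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters; the actual GStore V3 routes differ.
BASE = "https://gstore.unm.edu/apps/rgis/search"

def build_search_url(keyword, limit=10):
    """Compose a keyword search request against a REST metadata API."""
    return f"{BASE}?{urlencode({'query': keyword, 'limit': limit})}"

def parse_response(body):
    """A JSON search response would carry matching records plus a count."""
    payload = json.loads(body)
    return payload["total"], payload["results"]

url = build_search_url("elevation")
total, results = parse_response('{"total": 1, "results": [{"title": "DEM"}]}')
```

The same JSON responses that drive the portal's own interface can feed external integrations and harvesters, which is the point of separating the metadata, data, and services APIs.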
Conclusions
Personal Hype-cycle
Visibility and Time:
slowly climbing slope of enlightenment
If you need to share data
Standards=
less effort
more potential users
streamlined ops for integration
If you need someone else’s data
CSW – Geoportal
http://webhelp.esri.com/geoportal_extension/9.3.1/index.htm#geoportal_csw_cmpnts.htm
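A CSW catalog such as Geoportal is typically queried by POSTing a GetRecords document. A minimal CSW 2.0.2 request with an illustrative keyword filter might look like the following; the filter value is a placeholder.

```python
# A minimal CSW 2.0.2 GetRecords request body, as would be POSTed to a
# Geoportal CSW endpoint; the keyword filter here is illustrative.
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"

def get_records_request(keyword):
    """Build a GetRecords document filtering on the AnyText queryable."""
    return f"""<csw:GetRecords xmlns:csw="{CSW_NS}" xmlns:ogc="{OGC_NS}"
    service="CSW" version="2.0.2" resultType="results">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
          <ogc:PropertyName>AnyText</ogc:PropertyName>
          <ogc:Literal>%{keyword}%</ogc:Literal>
        </ogc:PropertyIsLike>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>"""

req = get_records_request("hydrology")
```

Because CSW is a standard interface, the same request works against any compliant catalog, which is what makes it useful when you need someone else's data.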
The IOOS Architecture
Derrick Snowden
(ICOOS Act 2009)
Interagency program seated in NOAA – development of a data management architecture spanning the federal and non-federal landscape
7 goals 1 system
Observations/Data Management/ Modeling and Analysis
Weather and climate
Maritime operations
Natural hazards
Homeland security
Public health risks
Diagram: timeline of the program to aggregate disparate subsystems
“OO”
Moored buoys: time series, time series profile
High-frequency radar: radials, grids
Profiling gliders: trajectory, trajectory profile
(not just mapping)
MARACOOS in position during Storm Sandy to monitor
System of Systems:
Geographical
Managerial
Operational Separation
Implications:
Info is primary artifact
Advances are incremental and iterative
Interoperability is paramount
Defining cooperative integration (JPL)
Looking at costs
Push all the data through the National Data Buoy Center (as role of regional association DAC)
(working towards web services to the rest of the world)
3 major components of this data service
service consumer / service registry / service provider – overlay this diagram with the tools and standards used
also experimenting with in situ data
using different tools for this type of data
What levels of interoperability must we achieve?
Using example of music mp3 collection overtime:
Missing metadata= duplicates, gaps in information
Most of the time analogy?
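The mp3 analogy can be made concrete: when a metadata field is missing, deduplication keys no longer match, so gaps in metadata surface as apparent duplicates. A small sketch with made-up track records:

```python
# Illustrating the mp3 analogy: a record with a missing metadata field
# cannot be matched against its complete twin, so it survives deduplication.
tracks = [
    {"title": "So What", "artist": "Miles Davis"},
    {"title": "So What", "artist": None},        # missing artist tag
    {"title": "So What", "artist": "Miles Davis"},
]

def dedupe(records):
    """Keep the first record seen for each (title, artist) key."""
    seen, unique = set(), []
    for r in records:
        key = (r["title"], r["artist"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

clean = dedupe(tracks)  # the record with the missing tag is kept as a "duplicate"
```

The same failure mode applies to observational data: incomplete metadata makes otherwise identical records indistinguishable from genuinely new ones.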
Interoperability: between who? And when are we done?
Good list of the clients we serve
Example of NCTOOLBOX for Matlab: showing how far we have come
(developing compelling use cases for why)
List of tools that IOOS supports
Nctoolbox
52NorthSOS
SOS Parser
ncISO
Environmental Data Connector
Pyoos
NetCDF Java Library Unstructured
(IOOS does not fully own any one of these, but leverages them)
Statement that beyond interoperability, stewardship is important
Ted does not wholly agree – need both
Phil
Geosearch
Computer-enhanced search interface for geospatial resource discovery
Same user experience for global users: interactive, fast, responsive
Share with 140+ countries
Showing GEOSS Clearinghouse and architecture
Both remote and local search to search dispatcher
Vocabularies and semantics: to improve accuracy
Example using search query "water" (CISC GMU Data Discovery)
Categorization of search results:
· Based on performance
· Based on who is providing
· Based on relevance
Data Exploration
Time series animation
Next Step: concurrent intensive
Location of cloud (using Azure)
Spatiotemporal distribution
Several detailed papers available for info.
Questions: share software details?
Yes- available
WMS, WFS, WCS no DAP or SES
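The OGC services mentioned here (WMS among them) are driven by plain key-value requests. A sketch of composing a WMS 1.1.1 GetMap request follows; the endpoint and layer name are placeholders.

```python
from urllib.parse import urlencode

def getmap_url(base, layer, bbox, width=512, height=512):
    """Compose a WMS 1.1.1 GetMap request URL from its required parameters."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width, "HEIGHT": height, "FORMAT": "image/png",
    }
    return f"{base}?{urlencode(params)}"

# Placeholder endpoint and layer for illustration only.
url = getmap_url("https://example.org/wms", "sst", (-98, 18, -60, 46))
```

Supporting WMS/WFS/WCS but not DAP or SOS, as noted above, is a statement about which of these request conventions a system chooses to speak.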
What users have you seen, or are you targeting a specific set of users?
Most for the public with high graphics
Harvesting data? Several different locations for populating
----------Second Half --------- Notes on Session 2-----------
Stefan Falke
Common Information Vision Systems
NGA (National Geospatial-Intelligence Agency)
Paradigm shifts
1995 NRC recommendation for NASA EOSDIS
fuel for ESIP
Letitia Long @ GEOINT symposium 2010: putting power into the user's hands
What can NGA do to empower the user?
Vision Integration 2.0
Strategic initiatives
Online GEOINT services
Open information technology environment
NSG Community Model (a community oriented framework)
Improving user access
Self/Assisted/Full Service
Model shows expansion of user/data relationship
Cloud-based Infrastructures
“cross-pollination of analysis”
Unique Identifiers: Structured & Unstructured Data
(focus on provenance)
All data is stamped with entity ID
GUIDs
UUIDs
Entity based identifiers Parent to child relationship is not lost
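One way to realize entity-based identifiers that preserve the parent-to-child relationship is to derive child IDs deterministically from the parent. The sketch below uses name-based UUIDs; the naming scheme is illustrative, not NGA's actual scheme.

```python
import uuid

# Sketch of entity-based identifiers where a child ID is derived from its
# parent, so the parent-child (provenance) relationship is recoverable.
parent_id = uuid.uuid4()                       # random GUID for the source entity
child_id = uuid.uuid5(parent_id, "extract-1")  # name-based ID scoped to the parent

# Deriving the same name under the same parent always yields the same ID,
# so downstream systems can verify lineage without a central registry.
rederived = uuid.uuid5(parent_id, "extract-1")
```

Stamping every derived product with an ID of this kind is one way to keep provenance intact as data moves between systems.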
Webservices Choreography
NGA- geospatial intelligence working group
ITSA Focus Group Standardization Activities
Collaborative Analytics
Spending less time finding and getting ready and more time working with and synthesizing
Statement- Erin Robinson @ AGU retweeting tweets (collaborative analytics)
Identifying events faster
Questioning whether scientists are interested in using data from social media (not citable/ peer-reviewed)
Semantic aspects of social media data, but can be used
Flu outbreaks- Google analytics
Multi-source rules
“source vetting”
uncert http://www.uncert.com/
question about whether anyone is using UNCERT
Spending last hour in discussion and break out groups
Capture dimensions in 3 categories for white paper
1. Data Management (alternative models)
Matrix- talk about the specific data needs
How can we characterize the strengths and weaknesses as they relate to these external information needs
This is what we need to manage… these are the alternative models
2. Interoperability
Dimensions of Discovery
Dimensions of use
(understanding too)
end up with multiple matrices
identify strengths and weaknesses in each
3. Data to information
Translate objects into actionable information and knowledge
Data Requirements
Information Requirements
(How do they relate to each other?)
less of a matrix, developing into scenarios
What do we mean by data management?
Who is the audience for the document?
Core question: What is the difference between data and information?
Self-organizing into groups: 10 minutes; then 15 minutes to talk, wrap up, and move forward
Discussion continues surrounding Data Management models
Lifecycle management systems
Moving on to Core Questions
Question about intent with the white paper:
Targeted at the more technically oriented who are looking at implementing, i.e., system implementers
Emphasis on defining the overall framework
Envision focusing on a more technical audience
Thinking about mining the wiki for past thoughts and ideas.
White paper can be published on the commons as a stable, citable resource
Wiki content may not provide the structure that a white paper can.
Versions of the white paper can provide update on the status of the work being done
"Too often we don't start with the user perspective and start with the tool perspective"
Data Management "planning"
audience for the white paper may be someone who is looking for language or guidance as they plan or build a template
continuously changing suite of users
Next Steps:
creating a cluster to work on this
connection to the IT&I committee
contribute further to translate into a white paper that could go into the commons
4-5 months- looking towards summer meeting in mid July 2013
keeping in contact through email
END.