Access and use of NASA and other federal Earth science data
This session will address the access and use of Earth science data that is offered by NASA and other federal agencies. Data users will describe their experiences in accessing and using data to analyze environmental phenomena or create mobile applications. The session will feature a panel of federal data users and a discussion with data users and providers.
Lead: Ethan McMahon
Update: IT&I Rant & Rave will be Thursday, October 2, at 3 PM ET and will follow up from the summer meeting panel on this topic.
Rant & Rave Title: Access and use of federal Earth science data
Are you curious about what data users want? How can we help new audiences for Earth science data, like the interested public and developers, answer scientific questions or create decision support tools? Call in to the Rant and Rave session on Thursday at 3 PM Eastern time to explore these questions in more depth.
At the ESIP summer meeting, a panel offered their views about the access and use of federal Earth science data (see http://commons.esipfed.org/node/2419 for notes and presentations). On Thursday we can follow up on that discussion and talk about these questions. Feel free to edit this page with your suggestions and please bring your experience and ideas to the discussion!
- What do new users of ES data (interested public and developers) want?
- How can we find out what they want?
- What audiences should the data provider community focus on?
- What can the data provider community do to help them?
Notes from the panel at the summer meeting:
The session is focused on the data user perspective.
Introduction, then panelists describe their experiences, followed by discussion.
There are many Earth science data users: researchers, the interested public, and software designers.
Interest from the public and software developers is increasing. Some perform their own analyses as citizen scientists. Others are making green apps or want to help but do not have the context.
Working with developers on apps: they posted problem statements and data, then asked the developers to see what they could come up with using open source code and systems. There were two good outcomes, and many prototypes that can become more robust products. The developers have lots of questions, which means waiting for emails or calls, and the questions are often outside what the data producers have the expertise or knowledge to answer. Developers also might not succeed because of poor documentation, unusable data, or because they are not experts in this area.
Mentioned a few examples, see slides.
White House initiative - Big Earth Data Initiative. Common approaches to discoverability, accessibility, and usability. Hoping to figure out how people can do their work so we can get to the analysis stages.
Tamara Ledley of TERC
Janet Fredericks of Woods Hole Oceanographic Institution
Margaret Mooney of UW-Madison
Rob Carver of The Weather Channel
Ethan introduced each speaker before their presentation.
Making geoscience data accessible and usable in education.
Earth Exploration Toolbook (EET) - each chapter provides step-by-step instructions for access, use and analysis of the data. It is for the teachers, so they have enough experience to create an activity for their students. There are 44 chapters.
How do we get to creating these chapters once we have the format? Started with DLESE. Found a data provider and then bridged the gap between the creators and educators. Available in digital libraries.
Each data access team has 5-6 people: data provider, tool specialist, scientists, curriculum developer, educator. These vary depending on the topic of the chapter. The areas of expertise were treated as peers with equal weight in the conversations. For many educators this was the first time they were treated at this level.
They held 6 access data workshops from 2004-2009. They had ESIP funding for one through the Funding Friday events. There were 240 participants and 10 teams per workshop. They conducted an evaluation with primary and secondary roles. In the evaluation, they asked what people found most valuable. What was striking was that people wanted more team breakout time. Networking with those in other fields, such as educators with scientists or technologists, was also important.
They conducted a longitudinal study of the science/tech and educational communities. Meeting science standards and pattern recognition were higher among the educators.
Other results related to “what do you do with the data?” and depended on the educational community as well.
Longitudinal impacts survey - gave some examples from the results related to data specialists. Gave some examples of how the workshops impacted their process in managing data and making it available.
Review criteria for data sets - gave examples of some of these points. Ex: Curriculum developers can find and use data easily - for high school vs. undergrad etc.
Data sheets - as published in EOS in 2008, Ledley et al. Gave example of the educationally relevant metadata standards for data sets.
Data sheets website has more examples of collections etc. We looked at one of the websites, http://serc.carleton.edu/usingdata/browse_sheets.html, and walked through selecting a data set and exploring the data found.
EET utilisation metrics - looking over a 7.5 year range. Visit duration, page depth, returning vs. new visitors.
Experiences Accessing Federal Data
Talking about two stories - 1) data accessed to look at a coastal breach, and 2) another looking at satellite data - elevations etc.
1) Went to Google and looked for information on the timing of the breach - when did the breach happen? Found it happened between July 2006 and July 2007. Google did not help further, so went to Data.gov. Found the viewer she had used 3 years ago, updated the browser, logged in, downloaded the data needed, and narrowed the date range to between April 1st and May 3rd. Then went back to the data and was able to be more detailed/direct.
2) Looking at features on the Mid-Atlantic continental shelf, and whether the conditions were changing. Talked to people in the community. The links that worked in 2010 no longer work when checking the sources now. Then she looked at the coastal relief model. Tried a few others (seafloor etc.) but found the right one. In the past she had to download the individual blocks of data, dealing with two volumes and multiple blocks that were hard to reconnect. Now you can get the whole thing, and new features let her just put in coordinates. But the tool was difficult to find, and hard to find again. In 2010 she used Rich Signell’s query, which you could use in Matlab. Had to update many things when retrying in 2014, but used the service on his site to get actual data instead of a map.
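The "just put in coordinates" idea can be illustrated with a minimal sketch. This is not Rich Signell's query or any agency service. It only assumes a gridded elevation product held in memory as plain lists, and the function name, coordinates, and depth values are made up for illustration.

```python
def subset_grid(lats, lons, values, south, north, west, east):
    """Return the part of a regular grid that falls inside a bounding box.

    lats and lons are the grid axes; values[i][j] belongs to lats[i], lons[j].
    All names here are hypothetical, chosen only for this sketch.
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    sub_lats = [lats[i] for i in rows]
    sub_lons = [lons[j] for j in cols]
    sub_values = [[values[i][j] for j in cols] for i in rows]
    return sub_lats, sub_lons, sub_values


# Made-up depths (metres) on a tiny 4 x 3 grid.
lats = [38.0, 39.0, 40.0, 41.0]
lons = [-75.0, -74.0, -73.0]
depths = [
    [-10, -20, -30],
    [-15, -25, -35],
    [-20, -30, -40],
    [-25, -35, -45],
]

sub_lats, sub_lons, sub_depths = subset_grid(
    lats, lons, depths, south=38.5, north=40.5, west=-74.5, east=-72.0
)
```

A server that does this selection for the user replaces the older workflow of downloading every block and stitching the blocks back together locally.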
Confessions of a Data Hoarder
Open data and the Weather Company - took open data and became a big business. Data usually comes from the weather service and is used to create forecasts and tell stories. Over 100 TB of data in many locations. Model data, radar, shapefile data (FEMA, Census Bureau etc).
Locating data - google and literature search, ????, Data! Had to make evaluations about the project to pick sources that showed up in the results.
Most data is accessed in the Unidata manager, and ECMWF pushes down data. This is ingested into the forecast system and GrADS. Then it is archived on local disk arrays and Amazon S3.
NCSC archives - order data, pull down two year sets because it is easier than asking for a subset.
FEMA flood maps - problematic acquisition. One DVD per state, with large shapefiles. Had to parse them and break them down by county.
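A minimal sketch of the "break it down by county" step, assuming the shapefile records have already been parsed into feature dictionaries by some reader. The attribute name county_fips, the zone codes, and the sample records are hypothetical and are not taken from the FEMA products.

```python
from collections import defaultdict


def split_by_county(features, key="county_fips"):
    """Group parsed features by a county identifier stored in their properties."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"][key]].append(feature)
    return dict(groups)


# Made-up records standing in for one state's parsed flood-zone features.
features = [
    {"properties": {"county_fips": "25001", "zone": "AE"}, "geometry": None},
    {"properties": {"county_fips": "25025", "zone": "X"}, "geometry": None},
    {"properties": {"county_fips": "25001", "zone": "VE"}, "geometry": None},
]

by_county = split_by_county(features)
```

Each value in by_county could then be written out as its own smaller file, so that later lookups only touch one county's data.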
Suggestions - data in a difficult/proprietary format is just wasted disk space; use well-supported open source software packages for data formats; instead of complex CSV files use self-describing formats like JSON, XML, etc.; data/navigation files should use the same naming process; don’t use overly large archive files; data pools attached to large disk arrays are awesome; for really large data sets BitTorrent would be useful.
Applications of satellite data used in the SatCam and WxSat mobile apps
SSEC data center has various data, which is funded by soft money and is free. Pushing it out carries a fee. Two apps, WxSat and SatCam. In SatCam there are 4 different screens. Walked through how the app works, with satellite images. Working to add more data on air quality to SatCam.
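To illustrate the self-describing suggestion, here is a minimal sketch that wraps a bare CSV in a JSON document carrying its own field names and units. The station codes, values, field names, and units are invented for the example and do not come from the talk.

```python
import csv
import io
import json

# A bare CSV: nothing in the file says what each column means.
CSV_TEXT = (
    "KBOS,2014-07-09T12:00Z,24.5,1012.3\n"
    "KMSN,2014-07-09T12:00Z,27.1,1009.8\n"
)

# Hypothetical column descriptions that would otherwise live in a separate README.
FIELDS = [
    {"name": "station", "type": "str", "units": None},
    {"name": "time", "type": "str", "units": "ISO 8601 UTC"},
    {"name": "air_temperature", "type": "float", "units": "degC"},
    {"name": "air_pressure", "type": "float", "units": "hPa"},
]


def csv_to_self_describing(csv_text, fields):
    """Bundle the rows together with the field descriptions in one structure."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        record = {}
        for field, raw in zip(fields, row):
            record[field["name"]] = float(raw) if field["type"] == "float" else raw
        records.append(record)
    return {"fields": fields, "records": records}


document = csv_to_self_describing(CSV_TEXT, FIELDS)
json_text = json.dumps(document, indent=2)
```

Anyone who receives json_text can recover the column meanings and units from the file itself, without a separate document.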
Matt - How was visualisation used to verify that you have the data you want before you download, to ascertain data quality, and/or in the metadata standards? What worked and what more would have been wanted? Tamara said those were suggestions and not guidelines. They asked participants how the workshops had impacted them, and these were suggestions of things that would be needed.
Jeff from NOAA - question to Rob, working on ways to make things better, but specifically asked: is GRIB a good or bad format? Rob - I am used to GRIB, so maybe it is Stockholm syndrome. Obviously GRIB1 is bad, but GRIB2 is better, and it is open.
Question - given satellite data becoming bigger, should we push grabbing data vs central servers? Maybe ship algorithm and then we can send you the data rather than asking for all of it? So bandwidth is not tied up? Rob - that might have to be a community process in the cloud.
Follow up to that - does most of Carver's data get processed by WSI? Follow up to follow up - does he see WSI being the baseline, and will WSI put base stations down for others? Answer from audience - SMAP made the decision not to build direct broadcast capability, but other venues and such may. Two issues - latency vs bandwidth.
Answer to Ethan's question to group - construct services that people can build on, that everyone can use.
Rama adds, you can't have one set of services - because you have different types of users.
Follow up from Ge - should we make datasets catered to different users, or have datasets follow standards to then use services for the users?
From Rama: What about the value chain? Others adding to value chain will be on the backs of agencies.
Answer: First and foremost - original has value, get that metadata. Then move toward interoperable.
There is a way to have the data to be available, and then you might later offer them in other "easier" formats.
World moving towards data-services…
The DAACs get metrics on how many times their data sets are downloaded. Two questions for the panel - How can we get more information that would be useful? What are the biggest stumbling points?