Finding a "ToolMatch" for your Data Collection

Abstract/Agenda: 

This breakout session will briefly describe the initial use cases, conceptual model, and assumptions underlying the ToolMatch service being built by Semantic Web Cluster participants, in collaboration with the ESIP Energy & Climate Cluster and the ESIP Products & Services Committee, using Semantic Web technologies. The initial phase of the ToolMatch service will be demonstrated from both the tool developer and the data collection user perspectives. Following the demonstration, workshop participants will be asked to enter specific information about visualization tools appropriate for Earth Science-focused data collections, and/or specific information about data collections appropriate for use with Earth Science visualization tools. Successes and failures of the service will then be discussed among the ToolMatch development team and workshop participants in order to refine the use cases, clarify assumptions, and improve the ToolMatch service.

Notes: 

ToolMatch Service: Finding Tools for Your Data and Data for Your Tools

A collaboration between ESIP's Semantic Web Cluster and Products & Services Committee

Presenters: Chris Lynnes, Patrick West, Matthew Ferritto, Nancy Hoebelheinrich

 

·         ToolMatch is an outgrowth of the Semantic Web Cluster

·         The point is to find tools for your data and data for your tools

·         Use cases – e.g., a dataset is accessible in the OPeNDAP Hyrax server and the user wants to know which tools can use the result; or a user needs specific data to run a model (e.g., rainfall data for the model)

·         Showed the different capabilities of different tools:

o   HDFView – only a little bit of the data is displayed

o   Ferret – better view of the data – can see it in 2D

o   Panoply – can project the data on a map

o   Giovanni – the reverse use case: what data can I view with this tool?

·         Put in a proposal for funding for at least one use case – the Semantic Web Cluster funded one of them

·         Planned outcomes of ToolMatch:

o   Proof of concept

o   Refine/augment

o   Integrate

§  These two were part of the DSTCCP, which isn’t moving forward… looking for a new partner

o   Facilitate moving the service into a center (JPL or Goddard)

·         BUT needed by Sept 30:

o   Ontology / inference rules

o   Knowledge store

o   Entering tool/data descriptions

o   Tools to match

o   Products/services on website

·         http://Toolmatch.esipfed.org

·         Currently have 7 tools – each includes a description that takes you to the tool's page. Eventually will add additional information on each tool

o   Hope to expand the ontology today

·         Dataset – add a DOI to pull the metadata and features, or supply a GCMD DIF to pull from that service, or use an access URL (see the metadata sketch after this list)

o   When you click on the link, you get a list of tools that match the dataset

o   A dataset can also be a collection – if you have tons of data and it all looks the same, you want to represent it in a simple way: the tool can be used with any of the datasets or files in the collection
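
One common way to "add a DOI and pull the metadata," as described above, is HTTP content negotiation against doi.org, sketched below in Python. This is only an illustration of the idea; whether ToolMatch actually harvests metadata this way, from a GCMD DIF record, or from the access URL is not specified in these notes, and the DOI shown is hypothetical.

```python
# Illustrative sketch only: resolve a dataset DOI to citation-style metadata
# via HTTP content negotiation against doi.org. Not necessarily how ToolMatch
# does it; the DOI below is hypothetical.
import requests

def fetch_doi_metadata(doi):
    """Return CSL-JSON metadata for a DOI using doi.org content negotiation."""
    resp = requests.get(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    meta = fetch_doi_metadata("10.5067/EXAMPLE")  # hypothetical DOI
    print(meta.get("title"), "-", meta.get("publisher"))
```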

·         Second use case, on the tool list – add information about a tool and get the data collections that you know of that work with that tool

·         This is the browser-based interface; we also want to provide a web service – we do the match and return the information, which you can then view in your own client (a hypothetical call is sketched below)

o   Whether as a RESTful service or a browser-based service
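
A hypothetical sketch of what the planned RESTful interface could look like from a client's perspective follows. The endpoint path and parameter names are invented purely for illustration; the notes only say that a web service (REST or browser-based) is planned and that it would do the match and return the information.

```python
# Hypothetical sketch of calling a RESTful ToolMatch service. The "api/match"
# path and "datasetAccessUrl" parameter are invented for illustration; no such
# API is documented in these notes.
import requests

def find_tools_for_dataset(access_url):
    resp = requests.get(
        "http://toolmatch.esipfed.org/api/match",   # hypothetical endpoint
        params={"datasetAccessUrl": access_url, "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g., a list of matching tool descriptions
```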

·         Q: Would it help to take this to other clusters?

o   Yes, and possibly feature it at AGU

o   We have a collection of tools and some are misrepresented – would like to explain, e.g., how to project a plot with Ferret

o   Interacting with the tool makers will help facilitate this – also need to use the updated versions

o   Include the version

·         Q: For tool makers, are the steps documented on your site?

o   YES – do you want to suss that out yourself or contact the group that makes the tool?

·         Q: What kinds of inputs do we need to collect, and where should users go for more information?

o   Do you want to link recipes to the tool maker?

o   Want to point back to the tool's website; looking to collect existing tools

o   E.g., a user wouldn't have thought to look up "curvilinear" to get Ferret to map on lat/lon

·         http://bit.ly/1rY95sO

·         Q: Have you looked at GCMD for the tools?

o   Yes, but many are outdated

o   Is there an effort to update this list?

o   Q: The challenge is that you don't know the status of the tools – need to know whether a tool is appropriate for use

·         The presenters are not going to be the ones who create the database of tool use – need tool makers or expert users to add the information

o   Q: Do you have a need for provenance? – Hadn't thought of it before

·         The idea is that once the information is entered, we apply inference rules that produce the list of matching tools or data collections (a minimal sketch follows)
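
A minimal sketch of that idea, assuming an RDF knowledge store: a couple of tool and dataset descriptions plus a SPARQL query standing in for an inference rule. The tm: namespace and property names below are invented for illustration; the actual ToolMatch ontology may model formats and capabilities differently.

```python
# Minimal sketch of the matching idea using rdflib. The tm: vocabulary below is
# invented for illustration and is not the actual ToolMatch ontology.
from rdflib import Graph

DESCRIPTIONS = """
@prefix tm: <http://example.org/toolmatch#> .

tm:Panoply      a tm:Tool ;    tm:canRead tm:NetCDF ;   tm:hasCapability tm:RasterMapping .
tm:Ferret       a tm:Tool ;    tm:canRead tm:NetCDF ;   tm:hasCapability tm:RasterGridding .
tm:RainfallData a tm:Dataset ; tm:hasFormat tm:NetCDF ; tm:needsCapability tm:RasterMapping .
"""

# A SPARQL query standing in for an inference rule: a tool matches a dataset
# when it can read the dataset's format and offers a capability the dataset needs.
MATCH_RULE = """
PREFIX tm: <http://example.org/toolmatch#>
SELECT ?tool ?dataset WHERE {
  ?tool    a tm:Tool ;    tm:canRead   ?fmt ; tm:hasCapability   ?cap .
  ?dataset a tm:Dataset ; tm:hasFormat ?fmt ; tm:needsCapability ?cap .
}
"""

g = Graph()
g.parse(data=DESCRIPTIONS, format="turtle")
for tool, dataset in g.query(MATCH_RULE):
    print(tool, "matches", dataset)
```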

·         Prospects for collaboration

o   USGS is thinking of a similar project but doesn't yet have an ontology

§  By September, only able to talk about it

§  Hopefully the service will be set up by then and they can add more use cases

o   EarthCube – would like to include tools that are prototyping tools – possibly adding release data

§  Moving forward as a task by the governance IT team – have started on a vocabulary and inventory for EarthCube

o   Geosoft / Turbosoft efforts – about documenting key information about reuse of the software; Turbosoft has an intelligent advisor

§  Push ahead in a prototyping kind of way

§  We provide key real-life links

o   USGCRP – also interested in the data-matching effort

o   Matt's paper at Funding Friday – extending the second use case: given a tool, look at the semantics (meaning / content)

·         Hackathon – originally looking at the website, but it isn't ready – will want to hold a hackathon in the future, online

o   Need a knowledge store for tools and data

o   Want to reach out to tool developers

·         NOAA ERDDAP might be an interesting tool to add

·         Q: … would have to have an OpenSearch service

o   Wonder if you are seeing a granularity issue with DOIs – they don't map cleanly because of collections – have not started to populate that field yet

·         Q: How does the process of building out the visualization types work – what do they represent in terms of graphical output – more science-based output like Taylor diagrams?

o   Right now, a simple model in terms of capability (see the sketch after this list)

§  Raster gridding, raster mapping, vector mapping, vector gridding

§  Question is whether it should remain simple or grow more complex
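
To make the current, deliberately simple capability model concrete, here is one way it could be written down; the four values come straight from the discussion above, but the Python names are an assumption about spelling, not the ontology's actual terms.

```python
# Sketch of the simple visualization-capability vocabulary discussed above.
# The enum names are one possible spelling, not the ontology's actual terms.
from enum import Enum

class VisualizationCapability(Enum):
    RASTER_GRIDDING = "raster gridding"
    RASTER_MAPPING = "raster mapping"
    VECTOR_MAPPING = "vector mapping"
    VECTOR_GRIDDING = "vector gridding"

# A tool record could then simply carry a set of these values, e.g.:
example_tool_capabilities = {VisualizationCapability.RASTER_MAPPING}
```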

·         Q: What about a data-size capability?

o   Not yet, because sometimes it is a configuration limit

o   Only look at bugs in software

o   Might think about an annotation – e.g., "may not work with files over 2 GB" – textual information about a capability that is not already described in the model

·         Q: Would it be worth keeping track of different installations of tools?

o   This is one of the complexities – let's deal with desktop-based tools first

·         Q: What about plug-ins or frameworks – e.g., you use a tool but need a plug-in to do a particular thing?

o   A pathfinder tool is ArcGIS – not a lot of ArcGIS users know what can be used

o   Q: Plug-ins have versions, and some might have issues with the core software

o   Only looking at the simple part for now

·         Collection model – kept simple, because otherwise there is too much metadata

o   All you needed to know was the format of the data

o   Distinguish between plotting and mapping

o   Past that, CF DSG (discrete sampling geometry – point or swath or…)

o   Additionally, a lot of data that wouldn't normally work, if it is presented through OPeNDAP, then enables more functionality

o   CF is tricky & fertile – think of it as 3 conventions (4 if you count dataset discovery): CF coordinate conventions, CF DSG (related to coordinates), and CF standard names, which give semantic meaning (see the inspection sketch below)
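
A sketch of how the three CF facets just mentioned can be read from a NetCDF file with the netCDF4-python library. The Conventions, featureType, and standard_name attributes are standard CF metadata; the file name is hypothetical.

```python
# Reading the three CF facets discussed above from a NetCDF file with
# netCDF4-python. The file name is hypothetical.
from netCDF4 import Dataset

ds = Dataset("example_granule.nc")

# 1. CF coordinate conventions, advertised in the global Conventions attribute.
print("Conventions:", getattr(ds, "Conventions", "not set"))

# 2. CF discrete sampling geometry (DSG), advertised in the global featureType
#    attribute (e.g., point, timeSeries, trajectory) for non-gridded data.
print("featureType:", getattr(ds, "featureType", "not set"))

# 3. CF standard names, which give each variable its semantic meaning.
for name, var in ds.variables.items():
    std = getattr(var, "standard_name", None)
    if std:
        print(name, "->", std)

ds.close()
```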

·         Want help populating tool capabilities

o   This will be the first shot, and will ask developers to help

·         People who are keen will help populate it

·         This is going to be an interesting time because of the difference between the NetCDF classic and enhanced data models (a quick check is sketched below)
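
A quick way to see which side of that divide a given file falls on, using netCDF4-python; the file name is hypothetical. Classic-model files report a NETCDF3_* or NETCDF4_CLASSIC data model, while enhanced-model (NETCDF4) files may use groups or user-defined types that classic-only tools cannot read.

```python
# Checking the NetCDF data model (classic vs. enhanced) with netCDF4-python.
from netCDF4 import Dataset

ds = Dataset("example_granule.nc")  # hypothetical file
print("data model:", ds.data_model)    # e.g., NETCDF3_CLASSIC or NETCDF4
print("has groups:", bool(ds.groups))  # groups exist only in the enhanced model
ds.close()
```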

From Nancy Hoebelheinrich:

Question: Should we go to other domain clusters to get more information about tools?
Answer: Yes, e.g., the HDF Group, the Discovery Cluster, tool makers in any group; take the question to AGU.

Suggestion made to provide more information about versions and their associated capabilities: e.g., in the case of Ferret, there are different "recipes" for different tasks associated with different versions of the tool. We had considered how to add version information, but have not included it as yet. Would like guidance from the community on this issue.

Question: Have we looked at the GCMD tool directory for information about tools? We'd need SERF IDs to access it. (from Tyler Stevens)
Answer: Have not done so because the information is not necessarily up to date. In order to use it, we'd need to know that the source of the information was correct. Information about tools might best come from tool developers or tool mavens who know the tools very well.

Comment: There will be lots of edge cases where tools can't easily be categorized to visualize data a certain way. Might be valuable to include the capability of adding annotations and/or crowdsourcing to the tool so that tool users can more fully describe a situation (or collection) for which a tool does or does not work. May need to have IDs associated with the input on the tools so that people can better judge. Or, perhaps have more than one input entry on a given tool.

Annotations should help capture contextual information about a tool's characteristics that imposes limitations or constraints on the tool. We plan to deal with desktop tools first, before tackling the larger problem of server- or web-based tools.

Note: We should specifically invite tool developers for the online hackathon when we hold it.

Suggestion (from Matt Austin): Add the NOAA ERDDAP tool to the list.

Suggestion: Add OpenSearch as a basis for web services, in addition to REST-based services.

Question: Have we dealt with granularity issues with respect to the use of DOIs?
Answer: As we have not yet collected much information about datasets, we have not yet dealt with this issue. Hopefully, granules will be similar enough to each other, and to a given data collection, that a DOI for a collection will suffice. We'll see!

Comment: The delineation of visualization types is currently very sparse; will need more specificity with respect to the process of visualization. (from Roland Schweitzer?) He can help!

Suggestion (Stefan from RPI): Add a way to record the need for, or capability of adding, other kinds of software or plug-ins, especially when linked with versions.

We'll need and want help from the community filling out the dataset conceptual model, as that's next.

Suggestion: Add discrete geometries to the data convention property (attribute), e.g., CF-DSG.

Suggestion: Put out a call for help filling out the knowledge store for both tools and datasets, and for determining what kinds of capabilities should be added to tools.

Volunteers: Tyler Stevens, Roland Schweitzer, Matt Austin.

Citation:
Hoebelheinrich, N. Finding a "ToolMatch" for your Data Collection. Summer Meeting 2014. ESIP Commons, April 2014.