Do you want your datasets to show up on the first screen of search engine results? Of course you do! This hands-on session is for people who would like to learn more about the extension to the schema.org vocabulary for datasets and data catalogs. We will begin with the basics, introducing schema.org and then highlighting the Dataset and DataCatalog classes. Many data providers generate dataset landing pages on the fly by pulling information from a relational database, perhaps enhanced with content from a semantic knowledge base. The original structured data is often obscured once the dataset landing pages are formatted in HTML, but recent vocabularies provided by schema.org can help restore that structure through shared markup. Proper use of the schema.org vocabulary, encoded in microdata, RDFa, or JSON-LD, exposes structured and parsable information in content published as HTML, and search engines recognize the marked-up pages.
Several ESIP partners have successfully implemented schema.org Dataset for their dataset landing pages and will guide participants through the process, including a review of tools available from Google Webmaster Tools for validation and testing of search results. Participants in this session are encouraged to bring a sample of their dataset landing pages (e.g. the HTML source code for a representative dataset landing page) and receive guidance on how to add schema.org Dataset markup.
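As a rough illustration of the JSON-LD option mentioned above, the following Python sketch builds a schema.org Dataset description and wraps it in the script element that would be placed in a landing page's HTML. The dataset name, description, keywords, and URL are placeholders invented for illustration; they are not taken from any partner's catalog, and a real landing page would likely carry more properties.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Return a schema.org Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def to_script_tag(doc):
    """Serialize the dict as a JSON-LD script element for an HTML page."""
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical values, for illustration only.
example = dataset_jsonld(
    name="Example sediment core measurements",
    description="Placeholder description of an example dataset.",
    url="https://example.org/datasets/123",
    keywords=["paleoclimate", "sediment core"],
)
print(to_script_tag(example))
```

The printed element can be pasted into the head of a landing page and then checked with a structured data testing tool.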
Related information and examples: http://logd.tw.rpi.edu/
Co-conveners: Doug Fils and Adam Shepherd
Code example (jsbin)
People want a search box, but discovery occurs only when people know where to look. Current practices don't do a good job of enabling data discovery.
Motivation: the community has expressed a desire to be able to access data
Connections are lost in the code, portability and sustainability decrease, and the information becomes dark knowledge
What can we do about it?
APIs: difficult to use and not a good way to drive discovery
Linked open data: doesn't necessarily drive discovery
Current search is not enough
Current tools, why the hate?
Schema.org: placing machine-readable information into HTML pages
The resolution: make the web representation of your data smarter
What is involved? Dataset is the only "science data" focused vocabulary at this time.
-extend your own vocabulary
-no guarantee that others will use it
-there is a formal extension pattern (see FAQ)
schema.org is not a magic bullet; there are other ways. It's not focused on geospatial data
It is in active development, with new developments in the works: schema.org/actions
There are tools ( http://goo.gl/Ro5kDv ), e.g. Apache Any23
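To give a feel for what such tools do, here is a minimal Python sketch that pulls JSON-LD blocks out of an HTML page using only the standard library. It is not how Any23 or the Google tools are implemented, and it covers JSON-LD only (not microdata or RDFa); the class name, function name, and sample page are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content arrives here as raw text, possibly in several pieces.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return every JSON-LD document embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical landing page, for illustration only.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body><h1>Example dataset</h1></body></html>"""

for doc in extract_jsonld(page):
    print(doc["@type"], "-", doc["name"])
```

A check like this only confirms that the markup parses as JSON; it does not validate the properties against the schema.org vocabulary, which is what the dedicated testing tools are for.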
Live demo: paleo-search.appspot.com/#gsc.tab=0
Crafting a scientific knowledge graph
Virtuoso, Cayley (from Google)
Bring forth the affordances:
need to be used to be sustainable.....
There needs to be a data tab on google.com, but for that to happen datasets need to be marked up with schema.org
If we mark up our datasets with information about tools and uses for the data, then new tools will be built that leverage that information
Use DataCatalog if you have multiple datasets in a group
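A minimal sketch of that grouping, in Python, might look like the following. It assumes the schema.org DataCatalog class with its dataset property; the catalog name, dataset names, and URLs are hypothetical, and this is not necessarily how the conveners structured their own catalogs.

```python
import json


def catalog_jsonld(name, url, datasets):
    """Group several Dataset descriptions under one schema.org DataCatalog."""
    return {
        "@context": "https://schema.org/",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        # The "dataset" property lists the datasets contained in the catalog.
        "dataset": [
            {"@type": "Dataset", "name": d["name"], "url": d["url"]}
            for d in datasets
        ],
    }


# Hypothetical entries, for illustration only.
datasets = [
    {"name": "Example dataset A", "url": "https://example.org/datasets/a"},
    {"name": "Example dataset B", "url": "https://example.org/datasets/b"},
]
catalog = catalog_jsonld(
    name="Example data catalog",
    url="https://example.org/datasets",
    datasets=datasets,
)
print(json.dumps(catalog, indent=2))
```

Each dataset's own landing page can still carry a full Dataset description; the catalog entry here only points at it by name and URL.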
few examples, see notes for addresses....
tips for implementation....
Need to put notes in ESIP system...
How Adam and Doug implemented....ask for help if you need it.
Q: Is schema.org code viewable in pages?