Speeding Adoption of Web Services in Place of FTP
Purpose:
To explore and perhaps enhance the future of remote data access, highlighting the merits of accessing data via specialized Web services in contrast to FTP, identifying obstacles to (speedier) adoption of such services, and recommending practical actions (perhaps by ESIP) that would address those obstacles.
Panel Discussion with Audience Participation
Moderator: Dave Fulker
Panel Members: Jim Frew, Hook Hua, and others TBD (hopefully including Peter Cornillon)
In the ESIP community, adoption of Web-based data-access services (in lieu of FTP, e.g.) appears to be lagging related trends in other contexts, as evidenced at the recent Extremely Large Databases conference. This lag may become especially problematic for the ESIP community as:
- Data volumes outpace the growth of network bandwidths;
- Server functions extend the usual notions of "query" to encompass computations (regridding or binning operations, e.g.) that are best executed near the data;
- Multi-disciplinary studies create needs for complex data-discovery workflows, wherein data access occurs only after a sequence of preliminary queries to determine the suitability of a dataset under consideration;
- Trends toward data citation create demands for ancillary data-access services, such as the provision of citation strings and/or persistent Digital Object Identifiers (DOIs).
Attendees of the Winter Meeting are ideally positioned to provide feedback on the existence/extent of the problem, on obstacles to the adoption of Web-based data-access services, and on actions to overcome these obstacles.
- 4 min: Panel introduction, with a focusing (controversial?) assertion:
  - "move data if and only if you can't query them"
- 20 min: Panel members (5 mins each):
  - reactions to the assertion
- 18 min: Audience comments and/or questions (for panel members)
- 16 min: Panel members (4 mins each):
  - obstacles to adoption
  - actions (by ESIP?) to overcome these
- 20 min: Audience comments and/or questions (for panel members)
- 12 min: Panel members (3 mins each):
  - concluding remarks
Dave Fulker begins, goes through slides
Panel: Jim Frew, Peter Baumann, Hook Hua, Peter Cornillon
Frew: Move data if and only if you can't query them. Most users grab data, then do analysis. It's a big transition to think about asking your questions on the server.
Hook: FTP ranks as #1 service used. Matlab-like users typically bring up an app and work in a particular directory all day.
Comment: file name is still the most popular metadata discovery mechanism
Peter C: (had slides, should be uploaded) General idea: FTP prevails because it's easier.
Peter B: The big V's in data: Volume, Velocity (again, the drinking-from-a-firehose analogy), Variety, Veracity. Perhaps 10x more data is downloaded than needed. DB development has not adapted well to GIS: it is table-centric, and rasters are considered "unstructured" data. In favor of: data/metadata integration.
Floor discussion...
Kristine: level of effort to expose your data favors FTP
Comment: possession aspect, people like to "own" a copy of the data
Hook: KISS philosophy - humans tend to gravitate toward simplicity out of necessity.
Peter B: Web Services add filtering and processing capabilities, above the simple "get the whole file" nature of FTP.
Comment: regardless of implementation, end-users prefer a "filesystem-like" view of the data
Frew: drivers of service adoption: 1) bandwidth, 2) what he called the "fixed schema" issue, but I would call interoperability
Peter B: in Europe, the data centers are the sticking point - they do not want to change their present models
Peter B: Scientists today spend 80% of their time massaging data and 20% on analysis; a measure of success would be to reverse this.
Frew: Need to ensure versioning confidence - assurance that what we get out of the web service pipe is the same each day, or that we know exactly what the diffs are. Just as you trust that each time you read a file off disk it is the same.
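
Note-taker's illustration (not presented in the session): one way a client might approximate the versioning confidence Frew describes is to record a checksum of each service response and, when the checksum changes, report exactly which lines differ. The sketch below is a minimal example under stated assumptions: the responses are small UTF-8 text payloads, the sample values are made up, and the function names `fingerprint` and `describe_change` are hypothetical rather than part of any existing service.

```python
import difflib
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact response body."""
    return hashlib.sha256(payload).hexdigest()


def describe_change(old: bytes, new: bytes) -> list[str]:
    """Return [] if the two payloads are identical, else unified-diff lines."""
    if fingerprint(old) == fingerprint(new):
        return []
    # Assumption: payloads are UTF-8 text (e.g. CSV); binary formats would
    # need a format-aware comparison instead of a line diff.
    old_lines = old.decode("utf-8").splitlines()
    new_lines = new.decode("utf-8").splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "yesterday", "today", lineterm="")
    )


# Made-up stand-ins for two responses to the same query on different days.
yesterday = b"time,sst\n0,14.2\n1,14.5\n"
today = b"time,sst\n0,14.2\n1,14.6\n"

for line in describe_change(yesterday, today):
    print(line)
```

A service could publish the same digest alongside each response, so that clients can confirm they received the version they expected without downloading the data twice.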