Evaluating “data-intensive capability” across the Environmental & Earth Sciences – introducing a new profiling tool for community self-assessment


Joint Workshop: School of Information Sciences, University of Pittsburgh; UKOLN Informatics, University of Bath; Microsoft Research Connections; Research Data Alliance

Overview: The partner organisations have collectively developed a Community Capability Model Framework (CCMF) for Data-Intensive Research [1], building on the principles described in The Fourth Paradigm [2]. The CCMF explores readiness for data-intensive science, and we have developed a profiling tool that can be applied to different communities and domains. We have chosen the environmental and earth sciences as a “deep-dive” area and are now seeking views from ESIP members to create and collect capability profiles. The work is also part of the global Research Data Alliance initiative, where we have formed a CCM Interest Group [3].

Objectives: This breakout session will include a scene-setting presentation, an opportunity for hands-on testing of the profiling tool, and time to give feedback and discuss ESIP community views. We aim to collect completed Capability Profiles from participants during the course of the ESIP meeting.

Outcomes: Participants will gain an understanding of the CCMF and its aims; participants will also complete a capability assessment of their own community using the CCMF Capability Profile Tool.  They will have the opportunity to discuss findings and provide feedback on the CCMF and Capability Profile Tool, as well as consider further applications.

[1] CCMF White Paper, http://communitymodel.sharepoint.com

[2] The Fourth Paradigm, http://research.microsoft.com/en-us/collaboration/fourthparadigm/

[3] RDA CCM-IG, http://rd-alliance.org/working-groups/community-capability-model-wg.html/


Taking the group through a tool that has been developed with Microsoft Research Connections

The speaker begins by talking through the background context of the program

Reference to the Fourth Paradigm

Data is at the center of the transformative research practices

Self-assessment: from systems, to diagnosis, to action

This can be done at different levels:

  • PI

  • Funding Level

  • Federation Level

  • Project Level

Title: CCM Framework http://www.communitymodel.sharepoint.com/

Developed over a year through 6 international workshops

Produced case studies

Culminated in a white paper (available from the web page)

Model is comprehensive: not just technical

8 capability factors:

  • Collaboration

  • Skills and Training

  • Openness

  • Technical Infrastructure

  • Common Practices

  • Economic and Business Models

  • Legal, Ethical and Commercial

  • Research Culture
3 case studies in 2012

Lists funding bodies, institutions, and researchers

In 2013 an RDA interest group was formed

Lists aims and activities of the interest group

Developed the tool and tested it in “deep dive” areas as well as light-touch areas

CCM-IG Capability Profile Template

Scorecard-based tool: categories for each of a series of characteristics

Series of assessments to find where your system or project sits
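The scorecard idea can be sketched in code. This is a purely illustrative sketch, not the actual Capability Profile Template: the factor names come from the CCMF, but the 1–5 maturity scale, the two-score-per-factor example, and the averaging are all assumptions made here for illustration.

```python
# Illustrative sketch of a scorecard-style capability profile.
# Factor names are from the CCMF; the 1-5 scale and the
# aggregation are hypothetical, not the official tool's method.

FACTORS = [
    "Collaboration",
    "Skills and Training",
    "Openness",
    "Technical Infrastructure",
    "Common Practices",
    "Economic and Business Models",
    "Legal, Ethical and Commercial",
    "Research Culture",
]

def summarise(profile):
    """Average the scores recorded for each capability factor."""
    return {factor: sum(scores) / len(scores)
            for factor, scores in profile.items()}

# Example: a respondent scores two characteristics per factor.
profile = {factor: [3, 4] for factor in FACTORS}
summary = summarise(profile)
print(summary["Openness"])  # 3.5
```

Aggregating per-factor scores like this is one way a completed profile could be summarised for comparison or gap analysis across a community.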

Walking through the Administrative section: about you and your data

Collaboration: Disciplinary - from your project - from your institution - in terms of your data

Addresses collaboration across sectors, across disciplines, within disciplines, and with the public

Q: How should collaboration with the public be defined?

Skills and Training:

Thinking about your discipline and instruction in data management, data collection and description, data identification, copyright, etc.

Q: Who is the audience for the evaluation?

Discussion of issues of semantics (e.g. “student” could be rephrased as “user”)

  • Suggestion to try applying it to a domain

  • Phrased in a way that is appropriate to the domain of research

  • Phrases that help you recognize yourself; additional phrases that refer to users


Openness:

In relation to data: openness in the course of research, of published literature, of data specifically, of methodologies, and reuse of existing data

Q: Why would reuse of data be related to openness?

A: Discussion of the classification; the choices of categories are debatable and some may disagree

Technical Infrastructure

Range of categories: tools, tool support, curation, discovery and access, integration and collaboration platforms, visualization, and platforms for citizen science

Q: Discussion of the wording of category 5, relating to tools

Common Practices

In terms of your discipline: data formats, data collection methods, standard vocabularies, semantics, data packaging and transfer

Economic and Business Models:

Sustainability of funding for research

Geographic scale of funding, physical size, funding for infrastructure, size and geographic scale of infrastructure funding, ROI

Q: Potential issue of one component canceling another out, given the eccentricities of funding; worth looking at other potential pairs of responses that would offset each other

Suggestion: might add some other examples that people are familiar with

Legal, Ethical and Commercial:

In respect to data

legal and regulatory framework, management of ethical responsibilities, management of commercial constraints

Q: Asks for clarification of “commercial constraints”

A: Think of a consortium project with both academic and private funding partners

Research Culture:

Entrepreneurship, innovation, and risk; reward models for researchers; quality and validation frameworks (the significance of this last row was emphasized)

Asks whether participants are willing to work as a group to create profiles for specific areas


Could use this for a center or for a project

Karl: could use it for a center at the University of New Mexico

Could use it for a center at George Mason University

Q: Curiosity about weighting categories depending on the scale: project or center, data-intensive or research-intensive

Suggestion from Liz for people trying the framework out to make notes as they fill it in

Carol: sees this as a tool for ESIP as a whole, to uncover strengths and weaknesses (gap analysis)

- Perhaps have everyone fill it out and see how consistent the responses are

Addresses the struggle to assess the effects internally, or to capture the community's effectiveness

-May not yield publishable results, but as a tool it is simple and effective

Karl: the ability to roll up or characterize an organization as a whole; the ability to see the range based on perspectives and where you sit on the value chain

Acknowledging the idea of evaluation perspective of the individual

Also interesting for measuring impacts or changes: repeat the analysis in the Federation or elsewhere to see how programs are changing
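The repeat-analysis idea can be sketched as a simple comparison of two profile summaries taken at different times. Everything here is hypothetical illustration (the factor names, scores, and per-factor subtraction are assumptions, not part of the CCMF tool):

```python
# Illustrative sketch (not part of the CCMF tool): comparing two
# capability summaries taken at different times to see change.

def delta(before, after):
    """Per-factor change between two {factor: score} summaries."""
    return {f: round(after[f] - before[f], 2) for f in before}

before = {"Openness": 2.5, "Research Culture": 3.0}
after = {"Openness": 3.5, "Research Culture": 3.0}
print(delta(before, after))  # {'Openness': 1.0, 'Research Culture': 0.0}
```

A positive delta would suggest a factor where the community's capability grew between assessments; a repeated longitudinal run like this is what the discussion above envisages.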

Mentions discussions about whether this tool is indeed a scorecard

Rebecca mentions that it is important to do this longitudinally as opposed to past anonymous assessments

Q: Carol asks how can ESIP help you?

A: Executive committee (some selection bias) could evaluate this

Or this could be done on a larger scale, using the mailing list, to examine these perspectives statistically

Mentions the importance of being sensitive to how information and responses would be used

There would need to be a statement up front about the safety of the information

Thinking about IRBs (the person who holds the instrument must go through the IRB)

Speaking of timeframe, it may be something that could be reintroduced at the summer meeting

Q: Delineating between pre- and post-project assessment

A: Discussions of interest in use by NSF and Microsoft

Carol: has a hunch that responses will differ greatly depending on who is asked; domain communities within the organization

Knowing where the members spend their time will be helpful in getting a full perspective

Speaker ends session and provides email address for further discussion.

[email protected]

Lyon, L.; Evaluating “data-intensive capability” across the Environmental & Earth Sciences – introducing a new profiling tool for community self-assessment; Winter Meeting 2014. ESIP Commons, November 2013