A Framework to Evaluate the Return on Investment (ROI) of a Data Repository


This working session will continue the efforts initiated at the Tempe Workshop in November 2015 and continued at the ESIP 2016 Winter Meeting. For background information, please see below. All are invited.

This effort seeks to develop a framework to evaluate the return on investment (ROI) of a data repository, helping to determine the value of data and the services around them to stakeholders with varying perspectives.

Our agenda for this working session is to:

a) BRIEFLY recap the summaries of our discussions of the references provided on this page: http://wiki.esipfed.org/index.php/Return_on_Investment_ROI_References. This is only a recap, as we expect to discuss these references more thoroughly via telecons leading up to this session. We request that participants either take part in those telecons or peruse the references beforehand in order to arrive at the session up to speed.

The summaries are very short reviews of the references with respect to these issues and questions:
·   How is value (to stakeholders) defined / discussed?
·   What are the definitions / explanations / categories of repository stakeholders?
·   How were similarities among data repositories defined / discussed?
·   How were differences among data repositories defined / discussed?
·   What, if any, were the metrics used to measure the value(s) returned to repository stakeholders?
·   What do the references say about the reason(s) for caring about this topic?

b) Discuss potential funding possibilities for a planning grant to develop this idea and scope some work.   We will lay groundwork for this part of the discussion beforehand by investigating possible funding opportunities and their requirements and goals, also via our telecons.

c) Target one of these possibilities and develop an outline for a proposal that takes into account their requirements, schedule, project scope, etc.


In November 2015, representatives from various data repositories, data service providers, and others participated in a two-and-a-half-day workshop in Tempe, AZ, sponsored and funded by NSF, to discuss collaborative strategies for sustained environmental data management. As an introduction, see the following quote from a briefing document participants received before the workshop:

“Many environmental data repositories were initiated to fulfill specific needs or objectives, i.e.  archiving and disseminating data from a project, network of research sites, institution, funding source, to accompany paper publications, or more recently, as data papers. This initiative was funded with the goal of exploring how we might develop this network of repositories in a way that will produce new collaboration and curation strategies that also cater to the currently underserved single investigators and move environmental data from ‘available’ to ‘usable’, in order to accelerate scientific inquiry.

With this goal in mind we are bringing together data curators from a range of environmental research fields, data aggregators, tool developers, computer scientists and environmental scientists (both data providers and users) for an informed dialog which draws on our collective experience managing data and repositories.”

Several of the topics that surfaced during the workshop garnered enough interest from participants to request that discussions continue under the auspices of a new ESIP cluster. Those topics have coalesced into:

    ⁃    Defining a Return on Investment (ROI) of Data Repositories for Society
    ⁃    Conducting a Landscape Analysis and Gaps for Environmental Data Repositories and Describing a Common Technical Vision

The ESIP cluster, Sustainable Data Management, was recently formed.  The wiki page is http://wiki.esipfed.org/index.php/Sustainable_Data_Management.   

The ROI working group exists under this cluster.  Notes from the ROI session at the ESIP Winter Meeting and other information are available at http://wiki.esipfed.org/index.php/Return_on_Investment_Subgroup_%28ESIP_....



  • Bob provided a brief introduction regarding the Sustainable Data Management cluster (please see attached slides for further details).
  • Using JISC's "The Value and Impact of Data Sharing and Curation" (attached as a PDF file), Anne provided an overview of the information that could be leveraged from this report to evaluate the ROI of data archives/repositories.
    • Key portions highlighted from the report include the following:
      • Figure 1: Methods for exploring the economic value and impacts of research data centres on Page 9 - provides 5 different areas of the method.
      • Table 2: Data and approaches used in the three studies on Page 11 - examples of how ROI is evaluated.
        • Figure 3: The value and impacts of the three UK data centres - summarizes the study using the areas from Figure 1.
          • This comparison indicates that while general guidelines could be used, each evaluation case needs to develop its own specific details.  However, overall, all studies benefitted from the ROI evaluations.
      • Qualitative benefits of the ROI evaluation are demonstrated through Figure 2: The KRDS Benefits Framework on page 10.
      • Additional recommendations are also provided; review of the attached PDF files is recommended.
  • Bob also presented the following - "Review of the Beagrie and Houghton Report: The Value and Impact of the European Bioinformatics EBI Institute" (presentation file is attached).
    • Access value is defined as the value perceived by those who accessed the data.
    • The presentation showed the following from Beagrie, N. & Houghton, J. 2016. The Value and Impact of the European Bioinformatics Institute. Charles Beagrie Ltd: Salisbury: 
      • Methodology used for the report (including analysis of log data, user surveys, and literature reviews).
        • There were several caveats related to the methodology, such as annual expense based on 3-year average minus research costs, limited statistics for some services, and unique accesses based on monthly reports across services.
          • These caveats point to areas that would need additional attention or improvement if the Sustainable Data Management cluster were to perform its own study.
        • There were also specific key characteristics associated with the methods used for deriving the usage-related value, contingent value, and research production value.
          • As with the caveats, the details of these methods point to areas that warrant further review when devising one's own ROI studies.
      • Key quantitative and qualitative findings from the report were also shown.
      • In general, when designing ROI studies, it is important to consider and plan for potential pitfalls and limitations of the methods used for the studies.
  • Next, Ruth presented "ROI: Why a Framework is needed?" (Ruth's presentation is attached as a PDF file).
    • The following terms are defined in Ruth's presentation for the purpose of this session's discussions:
      • Storage, archiving, preservation, and curation.
    • The value of a repository and the value of the data or datasets it holds both influence the discussion of the ROI of a data repository.
      • Maturity assessment and analytic potential are two complementary frameworks that could help inform the value of repositories and of data or datasets.
      • Metadata could also play a role in differentiating repositories. Likewise, the activities relating to preservation and curation could also differentiate repositories.
      • Ultimately, who should pay to ensure that high quality in these characteristics can be achieved?
        • This directly impacts ROI, so it is important to take all of these differentiators into account in an ROI framework.
  • As Sophie was moderating for the last portion of the meeting and could not take notes simultaneously, please refer to the recording for further details regarding the final portion of the session.
      Select: “July 21 PM Junior Ballroom A” (this session starts in the second half of the recording).


