OECD Report on Business Models for Sustainable Data Repositories

Today the OECD Global Science Forum has released a new report on Business Models for Sustainable Research Data Repositories, available on the OECD iLibrary.

This report is the culmination of two years of work by the OECD-CODATA's high-level expert group. The group includes DRI's Director Natalie Harrower, and is chaired by CODATA's Executive Director Simon Hodson and DANS' Deputy Director Ingrid Dillo. To prepare the recommendations, which are aimed at policy-makers, the group conducted in-depth surveys with a wide range of research data repositories from different domains across the globe, as well as extensive workshops and consultations with a range of stakeholders.

The publication is a contribution to the OECD #GoingDigital project, which aims to provide policymakers with the tools they need to help their economies and societies prosper in an increasingly digital and data-driven world. Its findings are relevant to governments, organisations, and funders involved in the areas of open science, open research, open data, research data management (RDM), digital preservation, business models, repositories, and sustainability.



The report makes the following recommendations:

Recommendation 1: All stakeholders should recognise that research data repositories are an essential part of the infrastructure for open science.

Research data repositories provide for the long-term stewardship of research data, thus enabling verification of findings and the re-use of data. They bring considerable economic, scientific, and social benefits. Hence, it is important to ensure the sustainability of research data repositories.

Sustainability depends, inter alia, on a clearly articulated value proposition and the development of a “business model” (see Recommendation 2).

  • Policy makers and research funders should take a strategic view of the data landscape and seek to ensure the appropriate provision of repositories. They can do this by ensuring that the researchers they fund have access to suitable and sustainable research data infrastructure, so that the research community can meet expectations for data preservation and sharing, and comply with open data mandates.

  • Research data repository operators and managers need to study and understand the value proposition of their repositories, and clearly articulate it for all stakeholders in the research system.

  • Research data repository operators and managers should continually review their business model as a repository evolves, and revise it accordingly.

Recommendation 2: All research data repositories should have a clearly articulated business model.

Actions needed to develop and maintain a successful business model include (Figure ES.1):

  • Understanding the lifecycle phase of the repository's development (e.g. the need for investment funding, development funding, ongoing operational funding, or transitional funding).

  • Developing the product/service mix (e.g. basic data, value-added data, value-added services and related facilities, or contract and research services).

  • Understanding the cost drivers and matching revenue sources (e.g. scaling with demand for data ingest, data use, the development and provision of value-adding services or related facilities, research priorities, and policy mandates).

  • Identifying revenue sources (e.g. structural funding, host institutional funding, deposit-side charges, access charges, and value-added services or facilities charges).

  • Identifying who the stakeholders are (e.g. data depositors, data users, research institutions, research funders, policy makers).

  • Making the value proposition to stakeholders (e.g. measuring impacts and making the research case, measuring value and making the economic case, informing and educating).

Because the context is dynamic, these actions should be revisited regularly throughout a data repository's lifecycle.


Recommendation 3: Policy makers, research funders, and other stakeholders need to consider the ways in which data repositories are funded, and the advantages and disadvantages of various business models in different circumstances.

It is important to consider what system of allocation will best ensure that the optimal level of funding will be made available for research data repositories. For example:

  • Structural funding typically involves a trade-off between funding for data repositories and funding for other research infrastructure or for research itself. That allocation will best be made by informed actors making choices, such as through a funding allocation process involving widespread research stakeholder participation, expert consultation, and “road-mapping”.

  • Funding models depending on deposit or access fees bring the trade-off closer to the researchers, but their success in optimising allocation will depend on the extent to which the actors are informed and on their freedom of choice. The latter may be constrained by open data mandates (regulation).

  • Host institutional funding may divorce informed actors from the funding decisions or require additional processes to ensure greater stakeholder understanding of the value of the repository services.

Project funding often provides a mechanism to test the need for a data repository and the initial capacity to create one. However, as the repository matures and scales to provide an ongoing, reliable and quality service, a different funding model is likely to be needed.

  • From an economic perspective, this is the distinction between investment funding to establish a business, and an ongoing revenue source during the operational phase.

  • This distinction is not yet well made in the research data repository environment, but should form an important part in the design and evolution of repository business models.

Research data repository costs will change over time. As the global data repository infrastructure evolves there will be increasing learning and scale economies, which have the potential to reduce repository costs, although this needs to be balanced against increased data flows.

  • Consequently, policy makers and funders should be wary of allocating a fixed percentage of research funding for research data repository infrastructure, as it would be very difficult to establish the appropriate level and very difficult to change it once established.

  • The allocation of funds is likely to be better made when left up to those closest to their application (e.g. allocating funding to science and letting researchers and research managers meet open data requirements as best suits their needs).

Recommendation 4: Research data repository business models are constrained by, and need to be aligned with, policy regulation (mandates) and incentives (including funding).

  • Policy makers should be cautious of “un-funded mandates”. They should combine regulation and incentives thoughtfully to achieve best results.

  • Some business models depend on willingness to fund the repository in recognition of a strong value proposition (e.g. structural or host funding). Other business models are heavily dependent on strong policy incentives and regulation (e.g. deposit-side charges). Still other business models may limit data re-use and reduce the overall benefits that could be derived from research data curation and sharing (e.g. access-side charges).

  • A key issue is matching funding and revenue sources to the main cost drivers, to ensure that revenue scales with demand and repository costs. These cost drivers can relate to the level of activities (e.g. deposits, access, and use), and/or to the level of curation (e.g. basic versus enhanced).

Recommendation 5: In the context of financial sustainability, opportunities for cost optimisation should be explored in order to be able to effectively manage digital assets over time.

Therefore, policy makers, research funders, and repository managers should:

  • Obtain greater clarity concerning costs, in order to fully understand and manage them.

  • Consider cost optimisation system-wide (throughout the whole data lifecycle), rather than simply focus on cost savings at the repository level, as there is a risk that repository cost saving may only lead to cost shifting and/or a reduction in overall access to data.

  • Consider the effect a funding model has on cost constraints, as the more a funding model depends on or creates low price elasticity of demand, the lower the incentive for cost constraints will be.

  • Monitor the research landscape for emerging opportunities. As data repository activities grow and develop, there will be increasing opportunities to buy services from specialist providers, potentially enabling greater cost optimisation.

  • Take advantage of economies of scale. For example:

− By encouraging or funding the establishment of lead organisations for open research data at the national level, and encouraging those organisations to collaborate globally.

− By encouraging or funding collaboration and federation. Not all research data repositories need to perform specialised curation and preservation tasks. Similarly, not all institutions or organisations need to create individual repositories. Collaboration and federation can help to manage and reduce costs.