Why is data stewardship useful?

In our current 🔰 Information Age, almost all ➰ data is digital, and the amount of data keeps increasing. The time, investment and research devoted to the management and analysis of so-called big data is a clear sign of this.

Additionally, the rise of data-intensive 🔰 Artificial Intelligence (AI) models, which require massive amounts of curated data, has placed a lot of focus on the way data is gathered, managed and ultimately used.

Data stewardship is a broad term which covers the way data, and in particular research data, is gathered, handled while being used, and ultimately shared with the public. In short, a data steward develops and implements data governance guidelines and policies for a given institution. See the page on ➰ data steward competences to learn more. The DSK specifically covers research data management in the context of public research-performing institutions, but these ideas can also apply to privately owned companies.

Having good data governance policies, especially in a large institution, has far-reaching benefits:

  • The amount of data created and its quality are monitored, and this monitoring data can be used for a variety of purposes, such as keeping an overview of the costs associated with data handling;
  • Data loss and corruption are prevented via data protection and backup policies;
  • The risk of forgetting or misinterpreting data created in the institution is minimized, thanks to the creation and maintenance of specific (meta)data standards and quality assurance procedures;
  • Sharing data, especially under stringent requirements from funders, becomes possible in the first place, and can be done quickly and easily;
  • Risks of data duplication are minimized, as old, properly stewarded data can be reused in new applications again and again, avoiding the repeated costs of data acquisition;
  • Sensitive data, for example data subject to privacy regulations such as the ➰ General Data Protection Regulation, is properly managed and protected, and correctly anonymized before being shared (if shared at all);
  • Creating in-house data storage and management solutions can give the institution better control over its assets, so that its output can be better appreciated and protected. These resources also allow researchers to fulfill the requirements set out by funders and national policies;
  • For the public at large, robust (meta)data standards, particularly when implementing the FAIR principles, can aid in the creation of semantics-aware and copyright-compliant large-scale data orchestration tools, such as meta-analyses and AI tools.

For individual researchers, the basics of data stewardship and administration can help in managing day-to-day data, preventing data loss, confusion or errors during analysis. Finally, data management policies are essential for the success of large-scale, multi-laboratory projects.

From an economic point of view, improper data stewardship causes direct and indirect losses of more than 10 billion euro per year in Europe alone (see 🏢 this European Union report), due to wasted researcher time and stunted economic growth.

On this point, it is often said that researchers spend about 80% of their time on data handling and management. This claim is repeated often: see, e.g., 💬 here, 💬 here, 💬 here, 💬 here or the book by Barend Mons, "Data Stewardship for Open Science" (ISBN: 978-1-032-09570-7). Looking in depth at these sources, some of these claims may be overstated. A more realistic figure, still alarmingly high, may lie between 40% and 50%.

Barend Mons proposes that 5% of all project budgets should be devoted to hiring and paying data experts (and specifically data stewards), to reduce the friction of using data inside the project and of reusing it outside the project.

It is also important to remember that analysis processes, for data stewards, are simply another type of data, one that can be "executed". Data management policies and solutions therefore also increase the quality, reproducibility and robustness of data analysis processes, and in turn of their results.

Finally, the conscious, purposeful management of data recognizes it as the most fundamental scientific product, giving value and recognition to the researchers who produce it and helping to prevent fraud and other forms of academic misconduct.


Data stewardship can improve researcher efficiency, enhance or even enable collaboration, promote the transparency and robustness of results, and bring direct benefits, including economic opportunities and growth, to the institutions that implement it.