Introduction

Over the last decade, the supply of socio-economic data available to researchers and policy makers has increased considerably, along with advances in the tools and methods available to exploit these data. This provides the research community and development practitioners with unprecedented opportunities to increase the use and value of existing data.

#Note: Data that were initially collected with one intention can be reused for a completely different purpose. (…) Because the potential of data to serve a productive use is essentially limitless, enabling the reuse and repurposing of data is critical if data are to lead to better lives. (World Bank, World Development Report 2021)

But data can be challenging to find, access, and use, resulting in many valuable datasets remaining underutilized. Data repositories and libraries, and the data catalogs they maintain, play a crucial role in making data more discoverable, visible, and usable. But many of these catalogs are built on sub-optimal standards and technological solutions, resulting in limited findability and visibility of their assets. To address such market failures, a better market place for data is needed.

A better market place for data can be developed on the model of large e-commerce platforms, which are designed to effectively and efficiently serve both buyers and sellers. In a market place for data, the “buyers” are the data users, and the “sellers” are the organizations who own or curate datasets and seek to make them available to users – preferably free of charge to maximize the use of data. Data platforms must be optimized to provide data users with convenient ways of identifying, locating, and acquiring data (which requires the implementation of a user-friendly search and recommendation system), and to provide data owners with a trustable mechanism to make their datasets visible and discoverable and to share them in a cost-effective, convenient, and safe manner.

Achieving such objectives requires detailed and structured metadata that properly describe the data products. Indeed, search algorithms and recommender systems exploit metadata, not data. Metadata are essential to the credibility, discoverability, visibility, and usability of the data. Adopting metadata standards and schemas is a practical and efficient solution to achieve completeness and quality of the metadata. This Guide presents a set of recommended standards and schemas covering multiple types of data along with guidance for their implementation. The data types covered include microdata, statistical tables, indicators and time series, geographic datasets, text, images, video recordings, and programs and scripts.

Chapter 1 of the Guide outlines the challenges associated with finding and using data. Chapter 2 describes the essential features of a modern data catalog, and Chapter 3 explains how rich and structured metadata, compliant with the metadata standards and schemas we describe in the Guide, can enable advanced search algorithms and recommender systems. Finally, Chapters 4 to 13 present the recommended standards and schemas, along with examples of their use.

This Guide was produced by the Office of the World Bank Chief Statistician as a reference guide for World Bank staff and for partners involved in the curation and dissemination of data related to social and economic development. The standards and schemas it describes are used by the World Bank in its data management and dissemination systems, and for the development of systems and tools for the acquisition, documentation, cataloguing, and dissemination of data. Among these tools is a specialized Metadata Editor designed to facilitate the documentation of datasets in compliance with the recommended standards and schemas, and a cataloguing application (“NADA”). Both applications are openly available.