Metadata Schema Guide
Preface
Acknowledgments
Introduction
I RATIONALE AND OBJECTIVES
1 The challenge of finding, accessing, and using data
1.1 Finding data
1.2 Accessing data
1.3 Using data
1.4 A FAIR solution
2 The features of a modern data dissemination platform
2.1 Features for data users
2.1.1 Browser
2.1.2 Simple search interface
2.1.3 Document as a query
2.1.4 Suggested queries
2.1.5 Advanced search
2.1.6 Geographic search
2.1.7 Query user interface
2.1.8 Semantic search and recommendations
2.1.9 Latest additions and history
2.1.10 Customized views
2.1.11 Data and metadata as a service
2.1.12 Ranking results
2.1.13 Filtering results
2.1.14 Sorting results
2.1.15 Collections
2.1.16 Linking results
2.1.17 Organized results
2.1.18 Saving and sharing results
2.1.19 Personalized results
2.1.20 Metadata display and formats
2.1.21 Variable-level comparison
2.1.22 Transparency in access policies
2.1.23 Data and metadata API
2.1.24 Online data access forms
2.1.25 Data preview
2.1.26 Data extraction
2.1.27 Data visualizations
2.1.28 Permanent URLs
2.1.29 Archive / tombstone
2.1.30 Catalog of citations
2.1.31 Reproducible and replicable scripts
2.1.32 Notifications or alerts
2.1.33 Providing feedback
2.1.34 Getting support
2.1.35 Web content accessibility
2.2 Features for data providers
2.2.1 Safety
2.2.2 Visibility
2.2.3 Low burden
2.2.4 Real-time information on usage
2.2.5 Feedback from users
2.3 Features for catalog administrators
2.3.1 Data deposit
2.3.2 Privacy protection
2.3.3 Free software
2.3.4 Security
2.3.5 IT affordability
2.3.6 Ease of maintenance
2.3.7 Interoperability
2.3.8 Flexibility on access policies
2.3.9 API-based system for automation and efficiency
2.3.10 Featuring tools
2.3.11 Usage monitoring and analytics
2.3.12 Multilingual capability
2.3.13 Embedded SEO
2.3.14 Widgets and plugins
2.3.15 Feedback to developers
2.4 Machine learning for a better user experience
2.4.1 Improved discoverability
2.4.2 Improved results ranking
3 The power of rich, structured metadata
3.1 Rich metadata
3.1.1 Benefits for data users
3.1.2 Benefits for data producers
3.1.3 Scope of the metadata
3.1.4 Controlled vocabularies
3.1.5 Tags
3.2 Structured metadata
3.2.1 What structure?
3.2.2 Formats for structured metadata: JSON and XML
3.2.3 Benefits of structured metadata
3.3 Augmenting metadata
3.4 Recommended standards and schemas
3.4.1 Documents
3.4.2 Microdata
3.4.3 Geographic datasets, data structures, and data services
3.4.4 Time series, indicators
3.4.5 Statistical tables
3.4.6 Images
3.4.7 Audio
3.4.8 Videos
3.4.9 Programs and scripts
3.4.10 External resources
3.5 Search engine optimization: schema.org
3.5.1 The basics of search engine optimization
3.6 Where to find the schemas’ documentation
3.7 Generating structured metadata
3.7.1 Generating compliant metadata using a metadata editor
3.7.2 Generating compliant metadata using R
3.7.3 Generating compliant metadata using Python
II STANDARDS AND SCHEMAS
4 Documents
4.1 MARC 21, Dublin Core, and BibTeX
4.2 Schema description
4.2.1 Metadata information
4.2.2 Document description
4.2.3 Provenance
4.2.4 Tags
4.2.5 LDA topics
4.2.6 Embeddings
4.2.7 Additional fields
4.3 Complete examples
4.3.1 Example 1: Working Paper
4.3.2 Example 2: Book
4.3.3 Example 3: Importing from a list of documents
5 Microdata
5.1 Definition of microdata
5.2 The Data Documentation Initiative (DDI) metadata standard
5.2.1 DDI-Codebook
5.2.2 DDI-Lifecycle
5.3 Some practical considerations
5.4 Schema description: DDI-Codebook 2.5
5.4.1 Document description
5.4.2 Study description
5.4.3 Description of data files
5.4.4 Variable description
5.4.5 Variable groups
5.4.6 Provenance
5.4.7 Tags
5.4.8 LDA topics
5.4.9 Embeddings
5.4.10 Additional
5.5 Generating and publishing DDI metadata
5.5.1 Using the World Bank Metadata Editor
5.5.2 Using R or Python
6 Geographic data and services
6.1 Background
6.2 Geographic information metadata standards
6.2.1 Documenting geographic datasets - The ISO 19115 standard
6.2.2 Describing data structures - The ISO 19115-2 and ISO 19110 standards
6.2.3 Describing data services - The ISO 19119 standard
6.2.4 Unified metadata specification - The ISO/TS 19139 standard
6.3 Schema description
6.3.1 Introduction to ISO 19139
6.3.2 Common sets of elements
6.3.3 Core metadata properties
6.3.4 Main metadata sections
6.4 ISO 19110 Feature Catalogue (feature_catalogue)
6.5 Provenance
6.6 Tags
6.7 LDA topics
6.8 Embeddings
6.9 Additional
6.10 Complete examples
6.10.1 Example 1 (vector - shape files): Bangladesh, Outline of camps of Rohingya refugees in Cox’s Bazar, January 2021
6.10.2 Example 2 (vector, CSV data): Syria Refugee Sites (OCHA)
6.10.3 Example 3 (vector, with Feature Catalogue) - The GDIS (beta) dataset
6.10.4 Example 4 (raster): Spatial distribution of the Ethiopian population in 2020
6.10.5 Example 5 (service): The United Nations Geospatial website
6.11 Useful tools
7 Databases of indicators
7.1 Database vs indicators
7.2 Schema description
7.2.1 Provenance
7.2.2 Tags
7.2.3 LDA topics
7.2.4 Embeddings
7.2.5 Additional
8 Indicators and time series
8.1 Indicators, time series, database, and scope of the schema
8.2 Schema description
8.2.1 The time series (indicators) schema
8.2.2 Provenance
8.2.3 Tags
8.2.4 Additional
8.3 Generating and publishing compliant metadata - Complete example
8.3.1 Use of AI for metadata augmentation
8.3.2 Using R
8.3.3 Using Python
9 Statistical tables
9.1 Introduction
9.2 Anatomy of a table
9.3 Schema description
9.3.1 Cataloguing parameters
9.3.2 Metadata information
9.3.3 Table description
9.3.4 Provenance
9.3.5 Tags
9.3.6 Additional (custom) elements
9.4 Complete examples
9.4.1 Example 1
9.4.2 Example 2
9.4.3 Example 3
10 Images
10.1 Image metadata
10.1.1 Embedded metadata: EXIF
10.1.2 IPTC and Dublin Core standards
10.1.3 Augmenting image metadata
10.2 Schema description
10.2.1 Common elements
10.2.2 IPTC option
10.2.3 Dublin Core option
10.2.4 Additional elements (IPTC and DCMI)
10.2.5 LDA topics
10.2.6 Embeddings
10.3 Examples
10.3.1 Example 1 - Using the IPTC option
10.3.2 Example 2 - Using the DCMI option
11 Videos
11.1 Augmenting video metadata
11.2 Schema description
11.2.1 Metadata information
11.2.2 Video description
11.3 Complete example
11.3.1 In R
11.3.2 In Python
12 Research projects and scripts
12.1 Rationale
12.2 Motivation for open analytics
12.3 Goal: discoverable code
12.4 Schema description
12.4.1 Document description
12.4.2 Project description
12.4.3 Provenance
12.4.4 Tags
12.4.5 Additional
12.5 Generating compliant metadata
12.5.1 Full example, using a metadata editor
12.5.2 Full example, using R
12.5.3 Full example, using Python
13 External resources
13.1 Example of use of external resources
ANNEXES
Annex 1: References and links
Annex 2: Mapping standards and schemas to schema.org
1.1 Microdata
1.2 Geographic data
1.3 Indicators (and database)
1.4 Tables
1.5 Images
Annex 3: Mapping the microdata schema to the DDI Codebook 2.5
Annex 4: Mapping the geographic schema to DCAT/schema.org
Annex 5: Mapping the indicator/time series schema to schema.org
Annex 6: Mapping the table schema to schema.org
Annex 7: Mapping the image schema to Dublin Core, IPTC, and schema.org
Annex 8: Mapping the audio schema to Dublin Core and schema.org
Annex 9: Mapping the video schema to Dublin Core and schema.org
Annex 10: Mapping the research/script schema to Dublin Core and schema.org
[DRAFT - WORK IN PROGRESS] Metadata Standards and Schemas for Improved Data Discoverability and Usability
Annex 9: Mapping the video schema to Dublin Core and schema.org
[to do]