[DRAFT - WORK IN PROGRESS] Metadata Standards and Schemas for Improved Data Discoverability and Usability

Annex 1: References and links

Documents

  • Asian Development Bank (ADB). 2021. Mapping the Spatial Distribution of Poverty Using Satellite Imagery in Thailand. ISBN 978-92-9262-768-3 (print), 978-92-9262-769-0 (electronic), 978-92-9262-770-6 (ebook). Publication Stock No. TCS210112-2. DOI: http://dx.doi.org/10.22617/TCS210112-2

  • Balashankar, A., L. Subramanian, and S.P. Fraiberger. 2021. Fine-grained prediction of food insecurity using news streams

  • British Ecological Society. 2017. Guide to Reproducible Code in Ecology and Evolution

  • Google. Google’s Search Engine Optimization (SEO) Starter Guide

  • Jurafsky, D. and J.H. Martin. 2000. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0-13-095069-7

  • Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. Efficient Estimation of Word Representations in Vector Space

  • Min, B. and Z. O’Keeffe. 2021. http://www-personal.umich.edu/~brianmin/HREA/index.html

  • Priest, G. 2010. The Struggle for Integration and Harmonization of Social Statistics in a Statistical Agency - A Case Study of Statistics Canada

  • Stodden et al. 2013. Setting the Default to Reproducible - Reproducibility in Computational and Experimental Mathematics

  • Turnbull, D. and J. Berryman. 2016. Relevant Search: With applications for Solr and Elasticsearch

Links (standards, schemas, controlled vocabularies)

  • American Psychological Association (APA): APA Style (example of a publication-specific style for tables)

  • Consortium of European Social Science Data Archives (CESSDA)

  • US Census Bureau, CSPro Users Guide: Parts of a Table

  • Data Documentation Initiative (DDI) Alliance

  • DDI Alliance, Data Documentation Initiative (DDI) Codebook

  • Dublin Core Metadata Initiative (DCMI)

  • eMathZone: Construction of a Statistical Table

  • GO FAIR: FAIR (Findable, Accessible, Interoperable, and Reusable) principles

  • International Household Survey Network (IHSN)

  • International Press Telecommunications Council (IPTC)

  • International Organization for Standardization (ISO) 19139: Geographic information — Metadata — XML schema implementation

  • LabWrite: Designing Tables

  • schema.org

  • Microsoft Bing: Bing Webmaster Tools Help & How-To Center, Bing Webmaster Guidelines

  • Vedantu: Tabulation

Links (tools)

  • CKAN open-source data management system
  • Elasticsearch
  • GeoNetwork
  • Milvus
  • NADA cataloguing application, web page
  • NADA cataloguing application, demo page
  • NADA cataloguing application, GitHub repository
  • NADAR package
  • Nesstar Publisher (DDI 1.n Metadata Editor)
  • R: The R Project for Statistical Computing
  • R Bookdown: Write HTML, PDF, ePub, and Kindle books with R Markdown
  • R geometa: Tools for Reading and Writing ISO/OGC Geographic Metadata
  • Solr

Links (others)

  • WorldPop: https://www.worldpop.org/