Using Dublin Core

Title:
Using Dublin Core

Creator:
Diane I. Hillmann
Project Manager & Metadata Specialist
National Science Digital Library Project at Cornell
Department of Computer Science
Cornell University
Ithaca, New York, USA
Date Issued:
2000-07-16
Identifier:
Replaces:
Is Replaced By:
Not applicable
Latest Version: http://dublincore.org/documents/usageguide/
Translations:
Status of Document:
This is a DCMI Working Draft.
Description of Document: This document is intended as an entry point for users of Dublin Core. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.

TABLE OF CONTENTS

1. Introduction

2. Which Syntax?

3. Basic Principles of Descriptive Elements

4. The Core Elements

5. Qualifiers

6. Examples

8. Glossary

9. Background Reading and References


1. INTRODUCTION

1.1. What is Metadata?

Metadata describes an information resource. The term "meta" comes from a Greek word that denotes something of a higher or more fundamental nature. Metadata, then, is data about other data. It is the Internet-age term for information that librarians traditionally have put into catalogs, and it most commonly refers to descriptive information about Web resources. However, metadata can serve a variety of purposes, from identifying resources that meet a particular information need, to evaluating their suitability for use, to tracking the characteristics of resources for maintenance or usage over time. Different communities of users meet such needs today with a wide variety of metadata standards.

A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author, title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf.

The linkage between a metadata record and the resource it describes may take one of two forms:

  1. elements may be contained in a record separate from the item, as in the case of the library's catalog record; or
  2. the metadata may be embedded in the resource itself.

Examples of embedded metadata, which is carried along with the resource itself, include the Cataloging In Publication (CIP) data printed on the verso of a book's title page and the TEI header in an electronic text. Many metadata standards in use today, including the Dublin Core standard, do not prescribe either type of linkage, leaving the decision to each particular implementation.

Although the concept of metadata predates the Internet and the Web, worldwide interest in metadata standards and practices has exploded with the increase in electronic publishing and digital libraries, and the concomitant "information overload" resulting from vast quantities of undifferentiated digital data available online. Anyone who has attempted to find information online using one of today's popular Web search services has likely experienced the frustration of retrieving hundreds, if not thousands, of "hits" with limited ability to refine the search or make it more precise. The wide-scale adoption of descriptive standards and practices for electronic resources will improve retrieval of relevant resources from the "Internet commons." As noted by Weibel and Lagoze, two leaders in the field of metadata development:

"The association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself." (Weibel and Lagoze, 1997)

It is this need for "standardized descriptive metadata" that the Dublin Core addresses.

1.2. What is the Dublin Core?

The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core standard comprises fifteen elements, the semantics of which have been established through consensus by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship.

The Dublin Core element set is outlined in Section 4. Each element is optional and may be repeated. Each element also has a limited set of qualifiers, attributes that may be used to further refine (not extend) the meaning of the element. The Dublin Core Metadata Initiative (DCMI) has defined standard ways to "qualify" elements with various types of qualifiers. A registry of qualifiers conforming to DCMI "best practice" is in progress.

Although the Dublin Core favors document-like objects (because traditional text resources are fairly well understood), it can be applied to other resources as well. Its suitability for use with particular non-document resources will depend to some extent on how closely their metadata resembles typical document metadata and also what purpose the metadata is intended to serve.

The Dublin Core has the following goals:

Simplicity of creation and maintenance

The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment.

Commonly understood semantics

Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next. The Dublin Core can help the 'digital tourist' -- a non-specialist searcher -- find his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For example, scientists concerned with locating articles by a particular author, and art scholars interested in works by a particular artist, can agree on the importance of a "creator" element. Such convergence on a common, if slightly more generic, element set increases the visibility and accessibility of all resources, both within a given discipline and beyond.

International scope

The Dublin Core Element Set was originally developed in English, but versions are being created in many other languages. As of November 1999, there were versions in over 20 languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The Working Group on Dublin Core in Multiple Languages is coordinating efforts to link these versions in a distributed registry using the Resource Description Framework technology being developed by the World Wide Web Consortium (W3C).

Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin Core development community, the involvement of representatives from almost every continent has ensured that the development of the standard considers the multilingual and multicultural nature of the electronic information universe.

Extensibility

While balancing the need for simplicity in describing digital resources with the need for precise retrieval, Dublin Core developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs. It is expected that other communities of metadata experts will create and administer additional metadata sets. Metadata elements from these sets could be linked with Dublin Core metadata to meet the need for extensibility. This model allows different communities to use the DC elements for core descriptive information that will be usable across the Internet, while allowing domain-specific additions that make sense within a more limited arena.

1.3. The Purpose and Scope of This Guide

This document is intended as an entry point for users of Dublin Core. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.

The guide will show in a non-technical fashion how Dublin Core metadata may be used by anyone to make their material more accessible. This guide discusses the layout and content of Dublin Core metadata elements, how to use them in composing a complete Dublin Core metadata record, as well as how to qualify elements to support use by a wide variety of communities.

Another important goal of this document is to promote "best practices" for describing resources using the Dublin Core element set. The Dublin Core community recognizes that consistency in creating metadata is an important key to achieving complete retrieval and intelligible display across disparate sources of descriptive records. Inconsistent metadata effectively hides desired records, resulting in uneven, unpredictable or incomplete search results.

2. Which Syntax?

In this guide, Dublin Core examples are given in several syntaxes: HTML (the Web's Hypertext Markup Language), RDF/XML (the Resource Description Framework expressed in eXtensible Markup Language), and a generic form (Element="value"). HTML provides an easily understood format for demonstrating Dublin Core's underlying concepts, but more complex applications that use qualification may find RDF/XML a better fit. When choosing a syntax, note that Dublin Core concepts are equally applicable to virtually any file format, as long as the metadata is in a form suitable for interpretation both by search engines and by human beings.

2.1. HTML

HTML has two tags that can be used to capture metadata: the "<META>" and "<LINK>" tags. When metadata is embedded in, or appears alongside, an actual document, these tags must appear within the HEAD section of the HTML document. For example:



Mating Habits of the Northern Hairy Nosed Wombat



Northern Hairy Nosed Wombats


The Northern Hairy Nosed Wombat is an animal native to Australia....



Indexing programs understand that the metadata record starts after the "<HEAD>" line and ends before the "</HEAD>" line, and are thus able to extract metadata automatically. The metadata does not appear during normal document formatting or printing, and metadata-aware Web browsers may even be able to exploit it. A number of current search engines have begun to make use of the HTML META tag in Web documents.

In HTML, each record element definition begins with "<META". Within the META tag, two attribute/value pairs (as found in other HTML tags) define the metadata: NAME, which identifies the element, and CONTENT, which carries its value.

This document does not cover the use of the LINK tag.
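
As a minimal sketch of the NAME/CONTENT pairing described above, the following Python builds META tags and places them inside a HEAD section. The helper names `meta_tag` and `head_section` and the sample element values are illustrative assumptions, not part of any Dublin Core specification.

```python
from html import escape


def meta_tag(name: str, content: str) -> str:
    """Render one metadata element as an HTML META tag."""
    # quote=True also escapes double quotes, keeping attribute values well-formed.
    return f'<META NAME="{escape(name, quote=True)}" CONTENT="{escape(content, quote=True)}">'


def head_section(title: str, elements: list[tuple[str, str]]) -> str:
    """Build a HEAD section with a TITLE and one META tag per (name, content) pair."""
    lines = ["<HEAD>", f"<TITLE>{escape(title)}</TITLE>"]
    lines += [meta_tag(name, content) for name, content in elements]
    lines.append("</HEAD>")
    return "\n".join(lines)


# Illustrative values only; a real record would describe the actual document.
print(head_section(
    "Mating Habits of the Northern Hairy Nosed Wombat",
    [
        ("DC.Title", "Mating Habits of the Northern Hairy Nosed Wombat"),
        ("DC.Subject", "Wombats"),
    ],
))
```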

2.1.1. Using HTML Syntax

Each descriptive element definition has a NAME attribute and a CONTENT attribute, as in:

Any metadata element may be omitted or repeated. When repeating elements, it is recommended best practice to list each element definition separately, as in:


However, it is also valid to express repeated elements using a single NAME attribute with multiple semi-colon delimited values for the CONTENT attribute, as in:
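
The two ways of repeating an element can be compared with a short Python sketch. The function names and creator values are hypothetical, and the sketch assumes values that need no HTML escaping.

```python
def repeat_separately(name: str, values: list[str]) -> list[str]:
    """Recommended form: one META tag per value."""
    return [f'<META NAME="{name}" CONTENT="{value}">' for value in values]


def repeat_delimited(name: str, values: list[str]) -> str:
    """Alternative form: a single META tag with semicolon-delimited values."""
    joined = "; ".join(values)
    return f'<META NAME="{name}" CONTENT="{joined}">'


def split_delimited(content: str) -> list[str]:
    """Recover individual values from a semicolon-delimited CONTENT string."""
    return [part.strip() for part in content.split(";") if part.strip()]


creators = ["Doe, Jane", "Roe, Richard"]  # hypothetical names
for tag in repeat_separately("DC.Creator", creators):
    print(tag)
print(repeat_delimited("DC.Creator", creators))
```

Note that `split_delimited` cannot tell a delimiter from a semicolon that is part of a value, which is one practical argument for listing each element separately.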

"A Proposed Convention for Embedding Metadata in HTML" describes a convention for identifying and grouping metadata schemes in HTML. This convention relies on the use of a prefix to indicate that the elements used are from Dublin Core or another metadata scheme. For increased readability, the prefix "DC" should be written in upper-case letters and each element name should begin with a capital letter. For example:

META NAME="DC.Title"
META NAME="DC.Creator"

NOT

DC.CREATOR or dc.CREATOR or DC.creator
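
A tool that generates META tags could apply this capitalization convention automatically. The following Python is a rough sketch under the assumption that only the prefix and the first name component need adjusting; the function name `normalize_dc_name` is hypothetical, and anything after a second dot is left untouched.

```python
def normalize_dc_name(name: str) -> str:
    """Rewrite a DC-prefixed name as 'DC.' plus a capitalized element name."""
    prefix, separator, rest = name.partition(".")
    if not separator or prefix.upper() != "DC":
        return name  # not a Dublin Core name; leave it as written
    element, dot, remainder = rest.partition(".")
    # str.capitalize() upper-cases the first letter and lower-cases the others.
    return "DC." + element.capitalize() + dot + remainder


for raw in ("DC.CREATOR", "dc.CREATOR", "DC.creator"):
    print(raw, "->", normalize_dc_name(raw))
```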

If non-ASCII characters are required, use the same conventions as in the body of the document. For example:
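
One convention often used in HTML bodies is the numeric character reference. The Python sketch below assumes that convention and shows how a CONTENT value might be made ASCII-safe; the function name `ascii_safe_content` is hypothetical.

```python
from html import escape, unescape


def ascii_safe_content(value: str) -> str:
    """Escape HTML-special characters, then replace non-ASCII characters
    with numeric character references such as &#252;."""
    escaped = escape(value, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


safe = ascii_safe_content("Dürer, Albrecht")
print(f'<META NAME="DC.Creator" CONTENT="{safe}">')
# The original string is recovered when the references are decoded.
print(unescape(safe))
```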

2.2. RDF/XML

[Text still needed here]

Below are some examples of how the META tag might be used in stand-alone and embedded metadata. Note that each metadata definition happens to fit on one line, but in general a definition can span several lines.

2.3. Stand-Alone Metadata

Stand-alone metadata can exist in any kind of database. This example describes a photograph in another file that has a location given by a Uniform Resource Locator (URL). The entire record file looks like this:
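
Because stand-alone metadata can live in any kind of database, one possible arrangement is a table with one row per element value. The following is a sketch only, using Python's built-in sqlite3 module; the table layout, function names, URL, and element values are all assumptions made for illustration.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory table holding one row per (resource, element, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (resource_url TEXT, element TEXT, value TEXT)")
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return conn


def record_for(conn, resource_url):
    """Return the (element, value) pairs for one resource, in insertion order."""
    cursor = conn.execute(
        "SELECT element, value FROM metadata WHERE resource_url = ? ORDER BY rowid",
        (resource_url,),
    )
    return cursor.fetchall()


photo_url = "http://example.org/photos/tree.jpg"  # hypothetical URL
store = build_store([
    (photo_url, "Title", "Tree by the roadside"),  # illustrative values
    (photo_url, "Type", "Image"),
    (photo_url, "Identifier", photo_url),
])
# Print the record in the generic Element="value" form.
for element, value in record_for(store, photo_url):
    print(f'{element}="{value}"')
```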






2.4. Metadata Contained in a Resource

The next example is of a metadata record contained in a file alongside the document that it describes. The document is a short poem expressed in HTML, the Web's Hypertext Markup Language [3].



Song of the Open Road








I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I'll never see a tree at all.
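
To illustrate how an indexing program might pull embedded metadata out of such a file, here is a minimal Python sketch built on the standard html.parser module. The class and function names are hypothetical, and the META values in the sample document are illustrative only.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.elements.append((attributes["name"], attributes["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_metadata(html_text):
    """Return the (name, content) pairs embedded in the document's HEAD."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.elements


document = """<HTML>
<HEAD>
<TITLE>Song of the Open Road</TITLE>
<META NAME="DC.Title" CONTENT="Song of the Open Road">
<META NAME="DC.Type" CONTENT="poem">
</HEAD>
<BODY><P>I think that I shall never see<BR>A billboard lovely as a tree.</P></BODY>
</HTML>"""
for name, content in extract_metadata(document):
    print(f'{name}="{content}"')
```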

3. Basic Principles of Descriptive Elements

Each element is optional and repeatable. Metadata elements may appear in any order. The ordering of multiple occurrences of the same element (e.g., Creator) may have a significance intended by the provider, but ordering is not guaranteed to be preserved in every user environment. For instance, RDF supports ordering, but HTML does not.
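
These principles suggest a simple in-memory representation: a list of (element, value) pairs, which permits omission, repetition, and any ordering. The Python below is a sketch under that assumption; the record contents and the function name are hypothetical.

```python
from collections import defaultdict

# No element is required, and Creator appears twice.
record = [
    ("Creator", "First Author"),
    ("Title", "An Example Resource"),
    ("Creator", "Second Author"),
]


def values_by_element(pairs):
    """Group values under their element name, keeping each element's values
    in the order in which they appeared."""
    grouped = defaultdict(list)
    for element, value in pairs:
        grouped[element].append(value)
    return dict(grouped)


grouped = values_by_element(record)
print(grouped["Creator"])        # both creators, in the provider's order
print(grouped.get("Publisher"))  # None: an omitted element is simply absent
```

A consumer that stored the values in an unordered collection instead would lose the provider's intended sequence, which is the situation the text warns about.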

3.2. Element Content and Controlled Vocabularies

Content data for some elements may be selected from a "controlled vocabulary," which is a limited set of consistently used and carefully defined terms. This can dramatically improve search results because computers are good at matching words character by character but weak at understanding the way people refer to one concept using different words (that is, synonyms). Without basic terminology control, inconsistent or incorrect metadata can profoundly degrade the quality of search results. For example, without a controlled vocabulary, "candy" and "sweet" might be used to refer to the same concept. Controlled vocabularies may also reduce the likelihood of spelling errors when recording metadata.

One cost of a controlled vocabulary is the need for an administrative body to review, update and disseminate the vocabulary. For example, the US Library of Congress Subject Headings (LCSH) and the US National Library of Medicine Medical Subject Headings (MeSH) are formal vocabularies, indispensable for searching rigorously cataloged collections. However, both require significant support organizations. Another cost is having to train searchers and creators of metadata so that they know when using MeSH, for example, to enter "myocardial infarction" instead of the more colloquial "heart attack."

Controlled vocabularies are used most effectively in combination with qualifiers.
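
The effect of terminology control can be sketched in a few lines of Python. The lookup table below is a tiny hypothetical vocabulary built around the example in the text, not actual MeSH data, and the function name is an assumption.

```python
# Maps each accepted entry term to its preferred form (hypothetical data).
PREFERRED_TERMS = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
}


def to_preferred_term(term, vocabulary=PREFERRED_TERMS):
    """Return the preferred form of a term, or None if the term is not in
    the vocabulary and needs review by a cataloger."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    return vocabulary.get(key)


print(to_preferred_term("Heart Attack"))
print(to_preferred_term("cardiac event"))
```

Returning None for unknown terms, rather than passing them through, is a design choice that keeps uncontrolled wording out of the record.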

4. The Core Elements

This section lists each Core element by its full name and label. For each element there is a reference description (taken from the RFC) and there are guidelines to assist in creating metadata content, whether it is done "from scratch" or by converting an existing record in another format. Links to examples and to recommended Dublin Core Qualifiers for each element are also provided.

The elements are listed in the order they were developed, but there are other useful ways to group them. In the following table, you can see that some elements relate to the content of the item, some to the item as intellectual property, still others to the particular instantiation, or version, of the item.
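
The three-way grouping can also be expressed as a small lookup structure. This Python sketch assumes the assignment of the fifteen elements to groups commonly given in DCMI documentation; the dictionary and function names are hypothetical.

```python
# Assumed grouping of the fifteen elements (for illustration).
ELEMENT_GROUPS = {
    "Content": ["Coverage", "Description", "Type", "Relation",
                "Source", "Subject", "Title"],
    "Intellectual Property": ["Contributor", "Creator", "Publisher", "Rights"],
    "Instantiation": ["Date", "Format", "Identifier", "Language"],
}


def group_of(element):
    """Return the name of the group an element belongs to, or None."""
    for group, elements in ELEMENT_GROUPS.items():
        if element in elements:
            return group
    return None


for name in ("Title", "Creator", "Format"):
    print(name, "->", group_of(name))
```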

5. Qualifiers

In July of 2000, the Dublin Core Metadata Initiative issued its list of recommended Dublin Core Qualifiers. At the time of the ratification of these qualifiers, the DCMI recognized two broad classes of qualifiers:

6. Glossary

7. Background Reading and References