Recording qualified Dublin Core metadata in HTML meta elements

Title:

Recording qualified Dublin Core metadata in HTML meta elements

Creator:
Creator:
Creator:
Date Issued:
2000-08-15
Identifier:
Replaces:
Is Replaced By:
Latest Version:
Status of Document:
This is a DCMI Working Draft.
Description of Document: The Dublin Core Metadata Element Set (DCMES) allows much descriptive information about resources to be expressed. However, in some applications it is desirable to refine the meanings of the DCMES metadata. A method for refining DCMES is encompassed in an extended model known as Qualified Dublin Core metadata, which requires additional labels and data, known generically as qualifiers. In this recommendation, we describe the methods provided directly by HTML, and explain how to record Qualified DCMES metadata in HTML using the <meta> element in the document <head>.

Table of Contents

  1. Introduction
  2. HTML notation
  3. Discussion
  4. Recommendation
  5. Examples
  6. References

1. Introduction

The Dublin Core Metadata Element Set (DCMES) [DCMES] allows much descriptive information about resources to be expressed. However, in some applications it is desirable to refine the meanings of the DCMES metadata. A method for refining DCMES is encompassed in an extended model known as Qualified Dublin Core metadata [qDC], which requires additional labels and data, known generically as qualifiers.

DCMES metadata may be recorded in many ways. These include, but are not restricted to, tables, database systems, and serialisations in XML [DCMES-XML] and HTML [DCMES-HTML]. Though popular, HTML is syntactically limited, particularly for recording more complex information models. Nevertheless, by using suitable conventions most of the requirements of the Qualified DC model may be covered. In this recommendation we describe the methods provided directly by HTML, and explain how to record Qualified DCMES metadata in HTML using the <meta> element in the document <head>.

2. HTML notation

Two elements from the <head> of an HTML document are used in recording metadata: <link> and <meta>. All data must be contained within the values of the attributes of these elements.

2.1 <link> element

An HTML <link> element allows a relationship with another document to be recorded. The HTML specification [HTML] defines the attributes of <link> elements, of which the following are useful to us here:

href  %URI; (CDATA)        identifies the related resource  
rel   %linktypes; (CDATA)  type of forward link, from this document to the related resource  
rev   %linktypes; (CDATA)  type of reverse link, from the related resource back to this document  

Usually only one of rel and rev will apply to a single link. We use <link> to indicate the location of schemas or definitions of the terms used elsewhere in the document, and then apply a shorthand notation similar to XML namespaces [XMLnames]. The href attribute records the location of the schema. The value of the rel attribute indicates that the link is to a schema and establishes a prefix which can be used with terms elsewhere in the document, as in the following example:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">

which ties the prefix DC to the schema located at http://purl.org/dc/elements/1.1/.
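The prefix convention above can be read mechanically. The following is a minimal sketch, assuming Python's standard html.parser module; the names SchemaLinkParser and schema_prefixes are hypothetical, and matching "schema." case-insensitively is an assumption of the sketch rather than something this document requires.

```python
from html.parser import HTMLParser


class SchemaLinkParser(HTMLParser):
    """Collect prefix -> schema URI pairs from <link rel="schema.PREFIX">."""

    def __init__(self):
        super().__init__()
        self.schemas = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        href = attributes.get("href")
        # Assumption: "schema." is matched case-insensitively, while the
        # prefix that follows it is kept exactly as written.
        if rel.lower().startswith("schema.") and href:
            self.schemas[rel[len("schema."):]] = href


def schema_prefixes(html_text):
    """Return a dict mapping each declared prefix to its schema location."""
    parser = SchemaLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.schemas
```

Because HTMLParser passes XHTML-style empty elements to the same handler, the sketch should also accept a <link> written with a trailing " />".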

The HTML specification [HTML] describes another method for indicating the schema location, using the profile attribute of the <head> element. That method effectively generates a default namespace for the terms used, without requiring a prefix. While the profile method may be used in many instances, care must be taken when mixing terms from multiple schemas (see section 3.2, Which schema?).

2.2 <meta> element

HTML <meta> elements allow a simple list of metadata to be recorded. The HTML specification [HTML] defines the attributes of <meta> elements, of which the following are useful to us here:

name     NAME            the metadata element label
content  CDATA           the metadata value
scheme   CDATA           indicates an encoding scheme used for the value
lang     %LanguageCode;  indicates the natural language of the value
dir      LTR | RTL       indicates the text direction of the value

2.2.1 Metadata Elements

The DCMES element-name and value are recorded in the name and content attributes respectively [DCMES-HTML], in the following pattern:

<meta name="DC.Element" content="Value">

where Element is one of the 15 DCMES Elements and Value is the value of this element for the resource of interest. The prefix to the element name, "DC.", refers to the schema indicated in a <link> element in the same document, as described above.

To record clearly the text and labels used for qualification of DC [qDC], additional positions in the notation are required. For qualified DC we use different positions for qualifiers, corresponding to the different classes of qualification.

2.2.2 Element Refinements

Element Refinements are not supported directly in HTML <meta> elements.

To refine the meaning of an element, an Element Refinement may be appended to the DCMES element name separated by a dot (.), and stored as the name attribute:

<meta name="DC.Element.ER" content="Value">

where ER is an element refinement.

This follows much existing practice [DCMES-HTML].
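The dot notation can be taken apart with ordinary string handling. The sketch below is illustrative only: QualifiedName and split_name are hypothetical names, and treating everything after the second dot as a single refinement is an assumption, since this document only shows one level of refinement.

```python
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    prefix: str                # schema prefix, e.g. "DC"
    element: str               # element name, e.g. "Date"
    refinement: Optional[str]  # element refinement, e.g. "created", or None


def split_name(name):
    """Split a name attribute such as "DC.Date.created" into its parts."""
    parts = name.split(".", 2)
    # A usable name needs at least a prefix and an element, neither empty.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a prefixed metadata name: {name!r}")
    refinement = parts[2] if len(parts) == 3 else None
    return QualifiedName(parts[0], parts[1], refinement)
```

For example, split_name("DC.Date.created") yields the prefix "DC", the element "Date" and the refinement "created", while split_name("DC.Title") yields a refinement of None.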

2.2.3 Value Encoding Schemes

Value Encoding Schemes are supported directly in HTML <meta> elements, using the attributes scheme and lang.

lang is used in cases where the value is in plain text, and scheme otherwise:

<meta name="DC.Element" scheme="schemeA" content="Value coded according to schemeA">
<meta name="DC.Element" scheme="listB" content="Value selected from listB">
<meta name="DC.Element" lang="langC" content="Value expressed in language langC">

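where schemeA names a notation that governs how the value is written, listB names a controlled list from which the value is selected, and langC is a code for the natural language of the value.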

If a scheme or lang is specified, then the value in the content must be encoded according to that scheme, including the use of any structure and punctuation.

2.3 qDC-HTML

The complete syntax for expressing qualified DC elements in HTML may be summarised:

<link
    rel="schema.DC"
    href="http://dublincore.org/qdcmes/1.0/"
    title="DCMES plus DCMI recommended qualifiers">
<meta name="DC.Element" content="Unqualified value">
<meta name="DC.Element.ER" scheme="schemeA" content="Value coded according to schemeA">
<meta name="DC.Element.ER" scheme="listB" content="Value selected from listB">
<meta name="DC.Element.ER" lang="langC" content="Value expressed in language langC">

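where the codewords are Element (one of the 15 DCMES element names), ER (an element refinement), schemeA and listB (value encoding schemes) and langC (a language code).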

In actual instances of DC metadata each of these codewords is replaced by tokens or strings defined in a qDC registry, indicating conformant qualifiers (see examples below).
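A harvester might read the syntax summarised above along the following lines. This is a minimal sketch using Python's standard html.parser module; QdcMetaParser and harvest are hypothetical names, and treating every name that contains a dot as a prefixed metadata name is an assumption of the sketch.

```python
from html.parser import HTMLParser


class QdcMetaParser(HTMLParser):
    """Collect <meta> elements whose name follows the PREFIX.Element pattern."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        # Assumption: names without a dot (e.g. "keywords") are not qDC.
        if not name or "." not in name:
            return
        self.records.append({
            "name": name,
            "content": attributes.get("content") or "",
            "scheme": attributes.get("scheme"),  # None when absent
            "lang": attributes.get("lang"),      # None when absent
        })


def harvest(html_text):
    """Return the qualified metadata records found in html_text, in order."""
    parser = QdcMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.records
```

Each record keeps the name, content, scheme and lang values side by side, so later processing can decide which qualifiers it understands.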

2.4 qDC-XHTML

XHTML is a reformulation of HTML using XML [XML]. The XHTML recommendation [XHTML] describes a number of changes that are necessary to make documents valid XHTML. Two of these concern us here:

  1. all attribute values must be quoted
  2. empty elements, such as <meta> and <link>, must be properly closed with a "/" before the closing ">".

Qualified DCMES metadata [qDC] can be recorded using the <meta> element in XHTML as follows:

<link
    rel="schema.DC"
    href="http://dublincore.org/qdcmes/1.0/"
    title="DCMES plus DCMI recommended qualifiers" />
<meta name="DC.Element" content="Unqualified value" />
<meta name="DC.Element.ER" scheme="schemeA" content="Value coded according to schemeA" />
<meta name="DC.Element.ER" scheme="listB" content="Value selected from listB" />
<meta name="DC.Element.ER" lang="langC" content="Value expressed in language langC" />

The space before the "/" is not strictly necessary for XHTML, but is recommended since it allows most HTML clients to treat the XHTML document correctly, thus allowing a single XHTML document to be used for both cases.
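Software that generates such elements needs to quote every attribute value and close the element with " />". The following is a small sketch under those two rules; the function name xhtml_meta is hypothetical, and the attribute order simply mirrors the samples in this document.

```python
from html import escape


def xhtml_meta(name, content, scheme=None, lang=None):
    """Render one qualified DC statement as an XHTML <meta /> element."""
    attributes = [("name", name)]
    if scheme is not None:
        attributes.append(("scheme", scheme))
    if lang is not None:
        attributes.append(("lang", lang))
    attributes.append(("content", content))
    # escape(..., quote=True) protects &, <, > and both kinds of quote mark,
    # so every value can safely sit inside double quotes.
    rendered = " ".join(
        f'{key}="{escape(value, quote=True)}"' for key, value in attributes
    )
    # The space before "/" follows the compatibility advice given above.
    return f"<meta {rendered} />"
```

For instance, xhtml_meta("DC.Date.created", "1999-04-21", scheme="W3CDTF") produces a single element with the name, scheme and content attributes in that order.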

3. Discussion

3.1 "Dumb-down" - recovering unqualified DC metadata

A client system may be unable to process qDC metadata presented to it for several reasons, particularly:

  1. it is only configured to support the basic 15 DCMES elements
  2. a particular qualifier is encountered that is not supported by the client.

In such cases it is necessary to consider how the information degrades to a simpler form.

For qDC metadata recorded in HTML according to the method described here a simple rule may be applied: discard any qualifiers that are not understood.

For encoding schemes the result is straightforward. While the full meaning of an encoded value requires that the client understands the notation, a client system may still process the value found in the content while ignoring the scheme or lang attribute. Any notation based on character-strings will not inconvenience existing software.

Furthermore, parsing of a value may not even be necessary for resource discovery. The string-matching methods used in most search operations should still find the target text-strings from within extended values. Systems will harvest a text value into an index, regardless of notation, and the resource may still be located by users who will often have a knowledge of specialised notations, independent of the indexing software, and thus will be able to construct sensible and successful queries.

For refined elements the unqualified ("dumb") version is recovered by removing the part of the name following the DCMES element name. This requires more sophistication on the part of client software. Nevertheless, as the hierarchical dot (.) notation includes the DCMES element name earlier within the token, the dumb element should always be clear.
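The dumb-down rule can be sketched in a few lines. The record layout (a dict with name, content and optional scheme keys) and the names dumb_down and known_schemes are assumptions made for this illustration; the lang attribute is simply dropped here.

```python
def dumb_down(record, known_schemes=frozenset()):
    """Return a copy of record with qualifiers the client does not understand removed.

    The element refinement is always dropped; the scheme is kept only when
    it appears in known_schemes.
    """
    parts = record["name"].split(".")
    # Keep the prefix and the DCMES element name, drop anything after them.
    simple = {"name": ".".join(parts[:2]), "content": record["content"]}
    scheme = record.get("scheme")
    if scheme in known_schemes:
        simple["scheme"] = scheme
    return simple
```

Called with no known schemes, the function reduces a record named "DC.Date.created" with scheme "W3CDTF" to the plain element "DC.Date" and its unchanged content string, which is the behaviour described for a client that supports only the basic elements.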

3.2 Which schema?

It is conventional, but not mandatory, for the prefix to use the character string "DC" when recording DC metadata, although any other string could be substituted. A different prefix might particularly be desirable in order to refer to a local schema that modifies or extends DCMES, and when using multiple schemas within the same document. For example:

<link
    rel="schema.DC"
    href="http://purl.org/dc/elements/1.1/"
    title="The Dublin Core metadata Element Set">
<link
    rel="schema.AGLS"
    href="http://www.naa.gov.au/recordkeeping/gov_online/agls/summary.html"
    title="The Australian Government Locator Service metadata element set">
<meta name="DC.Creator" content="Andrew Wilson">
<meta name="AGLS.Function" scheme="AGIFT" content="NT:Information Management Standards">

Here we have used the optional title attribute to record additional annotation.

When qualifiers are used, then these should also be defined in the schema linked to the prefix used on the element instance.
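Once the prefixes declared in a document are known, each element name can be tied back to its schema. The sketch below assumes the declarations have already been gathered into a dict of prefix to schema location; the function name resolve is hypothetical.

```python
def resolve(name, schemas):
    """Return (schema location, local name) for a prefixed metadata name.

    schemas maps each declared prefix to the href of its <link> element.
    A KeyError is raised when the name has no prefix or the prefix was
    never declared, since the term cannot then be tied to any schema.
    """
    prefix, separator, local = name.partition(".")
    if not separator or prefix not in schemas:
        raise KeyError(f"no schema declared for {name!r}")
    return schemas[prefix], local


# The two declarations from the example above.
schemas = {
    "DC": "http://purl.org/dc/elements/1.1/",
    "AGLS": "http://www.naa.gov.au/recordkeeping/gov_online/agls/summary.html",
}
agls_schema, agls_term = resolve("AGLS.Function", schemas)
```

Raising an error for an undeclared prefix is a design choice of this sketch; a more tolerant client might instead keep the statement and mark its schema as unknown.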

3.3 Structure and Grouping

A significant limitation of HTML is that there is no explicit, recursive grouping mechanism for elements. This means that general recording of fully structured metadata in HTML <meta> elements is not possible.

Nevertheless, there are two methods for listing repeated values for DC metadata elements, which may be important in particular cases:

  1. repeating the entire <meta name="DC.Element" ... > for the particular element, with different values
  2. putting the values in a list in a single element.

These two different grouping methods may be used by metadata providers to indicate structure distinguishing between values that need to be grouped (e.g. information identifying a single location in a Coverage element) and values that are distinct (e.g. identifiers for several different locations relevant to a single resource).
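A client that respects this distinction would treat each repeated element as a separate value and leave a list inside one content string intact. The following sketch assumes records shaped as dicts with name and content keys; group_values is a hypothetical name.

```python
from collections import defaultdict


def group_values(records):
    """Map each element name to the list of values recorded for it.

    Repeated <meta> elements contribute one list entry each, in document
    order. A content string that itself holds a list is kept as a single
    entry, so values the provider grouped together stay together.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["name"]].append(record["content"])
    return dict(grouped)
```

Two DC.Creator elements therefore yield two distinct creators, while one content string holding several parts remains one grouped value.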

4. Recommendation

The syntax described here contains the components required to represent the qualified DC model, while remaining fully conformant with HTML [HTML]. It offers a recording method compatible with HTML tools such as browsers and metadata harvesters.

While tools to make full use of the qualified information may not yet be widely available, metadata providers may nevertheless use the syntax described here to record rich information. The notation captures the requirements of the semantic model for qDC, so users may be confident that software can be built to extract the qualified metadata and migrate it into other notations with full semantics preserved. An investment in capturing rich information in this way will not be wasted.

A set of qualifiers for general use has been issued by DCMI [qDC], to encourage interoperability and to illustrate exemplary practice. Other qualifiers might be developed for use in local situations or by particular communities. They can be recorded using the same mechanisms as the DCMI approved qualifiers, provided they follow the guidelines of the qDC model.

5. Examples

The following examples would appear within the <head> element of HTML documents.

<link rel="schema.DC"
href="/qdcmes/1.0/"
title="DCMES plus DCMI recommended qualifiers">

<meta name="DC.Language" scheme="RFC1766" content="en-AU">

<meta name="DC.Date.created" scheme="W3CDTF" content="1999-04-21">
<meta name="DC.Date.modified" scheme="W3CDTF" content="1999-04-28">

<meta name="DC.Relation.requires" scheme="URI" content="http://www.foo.bar/stylesheet.css">
<meta name="DC.Relation.isFormatOf" scheme="URI" content="http://www.foo.bar/explanation.doc">
<meta name="DC.Relation.hasFormat" scheme="URI" content="http://www.foo.bar/explanation.pdf">

<meta name="DC.Format.medium" scheme="IMT" content="text/html">
<meta name="DC.Format.extent" content="27 kB">

<meta name="DC.Format.medium" scheme="IMT" content="image/gif">
<meta name="DC.Format.extent" content="27 kB">

An extended example using all the qualifying and structuring components discussed here, and using two distinct schemas, is:

<link
    rel="schema.DC"
    href="http://dublincore.org/qdcmes/1.0/"
    title="DCMES plus DCMI recommended qualifiers">
<link
    rel="schema.AGCRC"
    href="http://www.agcrc.csiro.au/4dgm/metadata_schema/"
    title="AGCRC metadata schema">
<meta name="DC.Identifier"
        scheme="URI"
        content="http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/figure1.gif">
<meta name="DC.Title"
        lang="en"
        content="A simple RDF assertion">
<meta name="DC.Type"
        scheme="DCMIType"
        content="image">
<meta name="DC.Date.created"
        scheme="W3CDTF"
        content="1999-04-27">
<meta name="DC.Coverage.temporal"
        scheme="DCMIPeriod"
        content="start=1999-04-27">
<meta name="AGCRC.Creator"
        scheme="DCMIDCSV"
        content="name.Given=Eric;
                name.Family=Miller;
                Employer=OCLC;
                Address=6565 Frantz Road, Dublin, Ohio, 43017-3395">
<meta name="DC.Creator"
        content="Miller, Paul">
<meta name="DC.Creator"
        content="Brickley, Dan">
<meta name="DC.Format.extent"
        content="4033 bytes">
<meta name="AGCRC.Format.extent"
        scheme="DCMIDCSV"
        content="cols=344; rows=82">
<meta name="DC.Format.medium"
        scheme="IMT"
        content="image/gif">
<meta name="DC.Relation.isVersionOf"
        lang="en"
        content="Figure 1 from RDF Model and Syntax">
<meta name="DC.Relation.isVersionOf"
        scheme="URI"
        content="http://www.w3.org/TR/REC-rdf-syntax/fig1.gif">
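The structured values in this example, such as "cols=344; rows=82", separate components with ";" and labels from values with "=". The sketch below reads that layout. It is deliberately simplified: parse_dcsv is a hypothetical name, and the sketch assumes no component contains an escaped ";" or "=", which a full reader of the scheme would have to handle.

```python
def parse_dcsv(value):
    """Split a structured value into a list of (label, text) pairs.

    Components without "=" are returned with a label of None.
    Assumption: ";" and "=" never appear escaped inside a component.
    """
    components = []
    for part in value.split(";"):
        part = part.strip()  # also removes line breaks used for layout
        if not part:
            continue
        label, separator, text = part.partition("=")
        if separator:
            components.append((label.strip(), text.strip()))
        else:
            components.append((None, part))
    return components
```

Applied to the AGCRC.Creator value above, the sketch returns four labelled components; the commas inside the address are left untouched because only ";" separates components.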

6. References

[DCMES]
DCMI, 1999. Dublin Core Metadata Element Set, Version 1.1: Reference Description. http://purl.org/dc/elements/1.1/

[DCMES-HTML]
J. Kunze, 1999. Encoding Dublin Core metadata in HTML. http://www.ietf.org/rfc/rfc2731.txt

[DCMES-XML]
D. Beckett, E. Miller, D. Brickley, 2000. Using Dublin Core in XML. http://dublincore.org/documents/dcmes-xml/

[DCMI]
Dublin Core Metadata Initiative, OCLC, Dublin, Ohio. http://dublincore.org/

[HTML]
Dave Raggett, Arnaud Le Hors, Ian Jacobs, 1999. HTML 4.01 Specification. http://www.w3.org/TR/html40/

[qDC]
DCMI, 2000. Dublin Core Qualifiers. http://dublincore.org/documents/dcmes-qualifiers/

[XHTML]
Steven Pemberton and many others, 2000. XHTML 1.0: The Extensible HyperText Markup Language. http://www.w3.org/TR/xhtml1

[XML]
W3C, 1998. Extensible Markup Language. http://www.w3.org/XML/

[XMLnames]
W3C, 1999. Namespaces in XML. http://www.w3.org/TR/REC-xml-names