http://purl.org/dc 
  Dublin Core Metadata Initiative
 
 
 

 

Title:

Recording qualified Dublin Core metadata in HTML

Creator:
Simon Cox, CSIRO
Date Issued:
1999-08-16
Identifier:
Replaces:
Not Applicable
Is Replaced By:
Not Applicable
Latest version:
Not Applicable
Status of document:

This document is a NOTE made available by the Dublin Core Metadata Initiative Directorate for discussion only. The publication of a NOTE by the Dublin Core implies no endorsement of any kind.

Description of document: We describe a notation for recording qualified Dublin Core metadata in HTML meta elements. The syntax includes recommended usage of the standard HTML syntax to record the different classes of qualification needed to represent the model.
Document
metadata:

  1. Introduction
  2. Qualified Dublin Core model
  3. HTML notation
  4. Recommendation
  5. Examples
  6. Acknowledgments
  7. References

1. Introduction

The 15 elements from Dublin Core version 1 [DCRFC] allow much descriptive information about resources to be recorded. The most widespread method for transporting Dublin Core metadata is probably by embedding it in <meta > elements in the <head> of HTML documents [DCHTML]. For example, this document might be described in part with the metadata

<html>
  <head>
    <title>qDC in HTML</title>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">
    <meta name="DC.Title" content="Recording qualified Dublin Core metadata in HTML">
    <meta name="DC.Description" content="We describe a notation ... ">
    <meta name="DC.Creator" content="Simon Cox">
    <meta name="DC.Contributor" content="Renato Iannella">
    <meta name="DC.Contributor" content="Kim Covil">
    <meta name="DC.Publisher" content="DCMI">
    <meta name="DC.Subject" content="metadata, Dublin Core, HTML">
    <meta name="DC.Relation" content="http://purl.org/dc/">
    <meta name="DC.Relation" content="http://www.w3.org/TR/REC-html40/">
    <meta name="DC.Language" content="en-AU">
    <meta name="DC.Format" content="text/html">
    <meta name="DC.Date" content="1999-08-09">
    [...]
  </head>
  <body>
    <h1>Recording qualified Dublin Core metadata in HTML</h1>
    [...]
  </body>
</html>

Each <meta > element effectively captures a statement that the present resource (i.e. the page containing this metadata) has a metadata value (labelled "content") for a property indicated by the metadata element (labelled "name").
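As a minimal sketch of how such statements might be harvested, the following Python fragment uses the standard library's html.parser module to collect (name, content) pairs from a page. The class name MetaStatementParser, the function extract_statements and the SAMPLE page are illustrative assumptions, not part of any Dublin Core tool.

```python
from html.parser import HTMLParser


class MetaStatementParser(HTMLParser):
    """Collect (name, content) pairs from <meta> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.statements.append((name, content))


def extract_statements(html_text):
    """Return every (name, content) statement found in html_text."""
    parser = MetaStatementParser()
    parser.feed(html_text)
    parser.close()
    return parser.statements


# Small sample page, abbreviated from the example above.
SAMPLE = """<html><head>
<title>qDC in HTML</title>
<meta name="DC.Creator" content="Simon Cox">
<meta name="DC.Language" content="en-AU">
</head><body></body></html>"""

statements = extract_statements(SAMPLE)
for name, content in statements:
    print(f"this resource has {name} = {content!r}")
```

Each pair corresponds to one statement about the page: the NAME gives the property and the CONTENT gives its value.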

However, it is often desirable to refine the meanings of the elements in particular instances. It may be desirable to restrict the semantics of the property or relationship between the resource and the metadata value, for example by specifying the exact type of contribution that a contributor made to the resource. It may be useful to indicate that the value has been selected from a particular controlled vocabulary, such as a list of keywords, or is encoded using a particular convention (the format for dates is an important case), or is expressed in a particular natural language. The metadata value may itself be usefully represented as a compound object, such as an address whose components (street, locality, post-code) are recorded separately. These refinements are encompassed in an extended datamodel for qualified Dublin Core metadata (qDC) [DC-datamodel]. A simplified guide to qDC, including many diagrams and examples, is also available [DCmodel-guide].

There are a variety of ways of recording qualified DC metadata. The canonical form used by the datamodel working group uses the XML expression of the RDF datamodel [qDC-RDF]. However, given the popularity of HTML and the widespread availability of tools for preparing and processing it, including software to harvest and use metadata transported in this way, it is desirable to have a way to record qDC within this document format.

HTML is syntactically limited in comparison to XML [XML]. Nevertheless, suitable conventions regarding the content of attributes of <meta > elements permit the recording of most aspects of qualified DC. In this note we describe the methods provided directly by HTML-4 [HTML4], and propose some conventions which extend the built-in syntax to encompass most of the requirements of the qDC model. These conventions concern a notation using certain punctuation characters in both the NAME and CONTENT attributes.

2. Qualified Dublin Core model

In the Dublin Core datamodel [DC-datamodel] qualification may occur in several ways.

A. Element Qualifiers
These modify the property by using additional terms to refine the element.
Examples:

  • Contributor might also have a "role" to indicate the nature of the contribution (illustrator, editor, collator, etc)
  • Relation will usefully have a "type" to indicate the nature of the relationship between the resources (isBasedOn, isFormatOf, hasPart, etc)

B. Value Qualifiers
These indicate how the value is to be interpreted, by referring to

  1. an authority for the terms chosen as values of DC metadata elements. This is normally through a controlled list of valid terms.
    Examples:
    • MIME types [MIME] for Format
    • LCSH [LCSH] for Subject
    • RFC 1766 [RFC1766] for Language
  2. a notation or language (i.e. an encoding syntax) used for the value.
    Examples:
    • ISO8601 [ISO8601] for Date
    • a term selected from RFC 1766 [RFC1766] as the language for any of the element-values recorded in plain-text, such as Title, Description
    • URIs [URI] or ISBNs [ISBN] for resource identifiers used in Identifier, Relation
    • vCard [vCard] for information about people or organisations used for values of Creator, Contributor, Publisher

C. Value Components
A metadata value may itself have structure, typically through components which are labelled either explicitly or implicitly (e.g. according to position within the string).
Examples:

  • dates and times have components corresponding to year, month, day, hour, minute, second
  • the dimensions of an image or 3D object contain measurements on multiple axes (e.g. height, breadth, depth)
  • information about people and organisations is commonly split into components, such as the name of the agent, and their contact address broken up into street, locality, region, postcode, country, telephone, email etc.


3. HTML notation

HTML <meta > elements allow a simple list of metadata to be recorded, which accommodates the basic version of Dublin Core. All data must be contained in text strings within the values of the attributes of these elements.

The DC element-name and value are recorded in the NAME and CONTENT attributes [DCHTML]. In order to have the text and labels that we use for the different classes of qualification for DC [above] clearly distinguished within the metadata statements, we must find three additional positions in the notation.

A. Element Qualifiers
Element Qualifiers are not supported directly in HTML <meta > elements.

To accommodate Element Qualifiers, dots (.) are used to append qualifiers to DC element names, which are stored in the NAME attribute text string. Multiple qualifiers may be appended, separated by dots, to create a hierarchical qualification scheme. This follows much existing practice [DCHTML].
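The dot convention for the NAME attribute can be illustrated with a short Python sketch. The function name split_qualified_name is hypothetical, and the sketch assumes that the first token is the schema prefix (such as DC), the second is the element, and any remaining tokens are qualifiers in hierarchical order.

```python
def split_qualified_name(name):
    """Split a NAME such as 'DC.Relation.isBasedOn' into its parts.

    Returns (prefix, element, qualifiers), where qualifiers is a list
    that is empty for an unqualified element.
    """
    prefix, *rest = name.split(".")
    if not rest or not all(rest):
        raise ValueError(f"not a prefixed element name: {name!r}")
    element, *qualifiers = rest
    return prefix, element, qualifiers


print(split_qualified_name("DC.Title"))
print(split_qualified_name("DC.Relation.isBasedOn"))
```

Software that does not understand a particular qualifier can fall back to the element alone, since the element name is always the second token.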

B. Value Qualifiers
Value Qualifiers are supported directly in HTML-4, using two additional attributes of the <meta > element - SCHEME and LANG.

LANG is used in cases where the value is in plain-text, and SCHEME otherwise. Where a SCHEME or LANG is specified, then the value must be encoded in the CONTENT according to that scheme, including the use of any structure or punctuation.

C. Value Components
Values encoded according to many schemes have a semantic structure. This is often indicated using punctuation within a text-string, which can therefore be used directly in the value recorded in the CONTENT of the HTML <meta >.

For example, a colon terminates the protocol label, and slashes, question-marks, ampersands and hashes are used to separate other fields in identifiers coded using the URI scheme [URI]. A colon separates each value from its label, and semi-colons and commas separate components within each of these, in descriptions of parties given in a common text notation for the vCard scheme [vCard]. Hyphens separate the components of a date in accordance with the profile of the ISO8601 scheme commonly used for DC [ISO8601]. These structuring devices are provided explicitly within the selected schemes.
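To illustrate how a SCHEME can tell software which structure to expect in the CONTENT, here is a hedged Python sketch that dispatches on the scheme label. The function names and the DECODERS table are assumptions made for illustration; the labels "URL" and "ISO8601" are the ones used in the examples in this note. Only complete YYYY-MM-DD dates are handled, although the ISO8601 profile also permits other precisions.

```python
from datetime import date
from urllib.parse import urlsplit


def decode_url(content):
    """Break a URL into the fields delimited by its own punctuation."""
    parts = urlsplit(content)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def decode_iso8601_date(content):
    """Decode a complete YYYY-MM-DD date into its components."""
    parsed = date.fromisoformat(content)
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}


# Hypothetical table mapping SCHEME labels to decoders.
DECODERS = {"URL": decode_url, "ISO8601": decode_iso8601_date}


def decode_value(scheme, content):
    """Decode content according to scheme; unknown schemes stay plain strings."""
    decoder = DECODERS.get(scheme)
    return decoder(content) if decoder else content


print(decode_value("ISO8601", "1999-04-21"))
print(decode_value("URL", "http://www.foo.bar/explication.html"))
```

A value whose scheme is not recognised is passed through unchanged, so the string remains available for simple text matching.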

To allow the recording of Value Components generically, we recommend the use of Dublin Core Structured Values (DCSV) syntax [DCSV]. DCSV provides a self-describing value-structuring method, to be used when no other suitable scheme is available. It uses punctuation characters as follows:

  • colons (:) separate plain-text labels of structured value-components from the values themselves
  • semi-colons (;) separate (optionally labelled) values within a list
  • dots (.) indicate hierarchical structure in value-component labels, if required.

Where no SCHEME is specified, the DC Element value recorded in the CONTENT attribute requires no parsing rules, and thus no structure is implied.
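The DCSV punctuation rules listed above can be sketched in a few lines of Python. This is a simplified illustration only: the function name parse_dcsv is hypothetical, and the sketch ignores any escaping conventions that the DCSV note [DCSV] may define, so an unlabelled value that itself contains a colon or semi-colon would be split incorrectly.

```python
def parse_dcsv(content):
    """Parse a DCSV string into a list of (label_path, value) pairs.

    label_path is a tuple of the dot-separated label tokens, or None
    when the item carries no label.
    """
    items = []
    for chunk in content.split(";"):          # semi-colons separate list items
        chunk = chunk.strip()
        if not chunk:
            continue                          # tolerate a trailing semi-colon
        label, separator, value = chunk.partition(":")  # colon ends the label
        if separator:
            path = tuple(label.strip().split("."))      # dots give hierarchy
            items.append((path, value.strip()))
        else:
            items.append((None, chunk))
    return items


print(parse_dcsv("u1; u2; u3"))
print(parse_dcsv("cA:v1; cB.part1:v2; cB.part2:v3"))
```

Keeping the label as a path of tokens lets a consumer regroup sub-components such as cB.part1 and cB.part2 under their common parent.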

3.1 qDC-HTML

The complete syntax for expressing qualified DC elements in HTML-4 may be summarised:

<META NAME="DC.Element" CONTENT="Unqualified value">
<META NAME="DC.Element.EQ" SCHEME="schemeA" CONTENT="Value coded according to schemeA">
<META NAME="DC.Element.EQ" SCHEME="listB" CONTENT="Value selected from listB">
<META NAME="DC.Element.EQ" LANG="langC" CONTENT="Value expressed in language langC">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="u1; u2; u3">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="cA:v1">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="cA:v1; cB.part1:v2; cB.part2:v3">

where the codewords are:

  • Element is one of the 15 DC Elements,
  • EQ represents one of the Element Qualifiers,
  • schemeA is a coding scheme,
  • listB is a controlled vocabulary,
  • langC is a language-code,
  • u1, u2 and u3 are an unlabelled list of values,
  • cA and cB are the labels of Structured Value components,
  • part1 and part2 are sub-components of cB,
  • v1, v2 and v3 are values of the components.

In actual instances of DC metadata each of these codewords is replaced by tokens or strings drawn from official DC vocabularies or elsewhere (see examples below).
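As a sketch of how the template above might be filled in programmatically, the following Python function assembles a <META> element from its parts. The function name render_meta and its parameters are assumptions made for illustration; html.escape from the standard library protects quotes and ampersands inside attribute values.

```python
from html import escape


def render_meta(element, content, qualifiers=(), scheme=None, lang=None):
    """Render one qDC-HTML <META> element as a string.

    element    -- one of the 15 DC element names, e.g. "Date"
    qualifiers -- optional Element Qualifiers appended with dots
    scheme     -- optional Value Qualifier for the SCHEME attribute
    lang       -- optional language code for the LANG attribute
    """
    name = ".".join(["DC", element, *qualifiers])
    parts = [f'NAME="{escape(name)}"']
    if scheme is not None:
        parts.append(f'SCHEME="{escape(scheme)}"')
    if lang is not None:
        parts.append(f'LANG="{escape(lang)}"')
    parts.append(f'CONTENT="{escape(content)}"')
    return "<META " + " ".join(parts) + ">"


print(render_meta("Date", "1999-04-21", qualifiers=["created"], scheme="ISO8601"))
print(render_meta("Title", "Diagram of data model for AGLS", lang="en"))
```

The attribute order follows the template: NAME first, then the value qualifier, then CONTENT.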

3.2 Features of the HTML notation

An important feature of the HTML syntax for qualified DC is that the association of qualifiers with the thing that they are qualifying is preserved. There are three positions for qualifiers corresponding to the three classes of qualification introduced above, with element and value qualifiers clearly attached to the element-name or value-string as appropriate.

The dot-syntax for element qualifiers is compatible with the sense that refining the element usually consists of selecting one type or flavour of the element from a list of alternatives. Qualified elements do not normally need grouping.

Value components are normally aspects of a value and are complete when considered as a set rather than being mutually exclusive. Using the DCSV semi-colon list separator, these may be grouped by recording them as a list, in order to associate them with the same value. The label tokens in both NAME and CONTENT may be structured to any depth in a simple hierarchical scheme using the dot-syntax.

An apparent disadvantage of embedding structured values in HTML, for example using DCSV [DCSV], is that parsing of the element value is now implied. However, the string syntaxes should not inconvenience any existing software since (a) the encoding is valid for HTML, and (b) parsing of the value is generally not essential for resource discovery, since the IR/string-matching methods used in most search operations would still find the target text-strings from within the extended values. Existing systems can harvest the complete colon- and semi-colon-syntax structured value from DCSV into an index and the resource could still be located using reasonable queries.

3.3 Limitations and Suggestions

We have not discussed how to identify the reference definition of each Value Qualifier indicated in the SCHEME attribute. It is possible that a method using the <link rel= ...> may be developed. In this document we have focussed on the representation of the qualified DC datamodel and we leave it to users, or to future work, to complete the details needed particularly to permit automatic processing of qualified metadata. Note, however, that many "schemes" have several variant notations. For example, each numeric code in the Dewey subject classification also corresponds to a standard text string. Indicating which variant of a scheme is to be used may require the development of profiles similar to the one developed for ISO8601.

A significant limitation of HTML is that there is no explicit, recursive, grouping mechanism for <meta > elements. This means that general recording of fully structured metadata in HTML <meta > elements is not possible.

Nevertheless, there are two methods for listing repeated values for DC metadata elements:

  1. repeating the entire <META NAME="DC.Element" ... > element for the particular element with different values
  2. putting the repeated values in a list in a single <meta > element, with items separated by the DCSV ";" list separator.

These two different grouping methods might be used by metadata providers to indicate structure distinguishing between values that need to be grouped (e.g. information about a single contributor) and values that are distinct (e.g. identifiers for different contributors).
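One way a consumer might interpret the two grouping methods is sketched below in Python: each repeated <meta > element yields a distinct value, while the labelled items inside a single DCSV list are kept together as one grouped value. The function name group_values, the (name, scheme, content) tuple layout and the sample data are illustrative assumptions, and the sketch assumes every DCSV item is labelled.

```python
from collections import defaultdict


def group_values(meta_elements):
    """Group harvested (name, scheme, content) triples by element name.

    A DCSV value becomes one dict of its labelled components; any other
    value is kept as a plain string. Repeated elements append new values.
    """
    grouped = defaultdict(list)
    for name, scheme, content in meta_elements:
        if scheme == "DCSV":
            components = {}
            for item in content.split(";"):
                if not item.strip():
                    continue
                label, _, value = item.strip().partition(":")
                components[label.strip()] = value.strip()
            grouped[name].append(components)   # one grouped value
        else:
            grouped[name].append(content)      # one distinct value
    return dict(grouped)


# Hypothetical harvested metadata: two contributors, each grouped separately.
harvested = [
    ("DC.Contributor", "DCSV", "name.given:Renato; name.family:Iannella"),
    ("DC.Contributor", "DCSV", "name.given:Kim; name.family:Covil"),
    ("DC.Format", "MIME", "text/html"),
]
result = group_values(harvested)
print(result)
```

Here the two contributors remain distinct because they come from separate elements, while the given and family names of each stay associated because they share one DCSV list.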

Some implementors have used numeric suffixes (DC.Creator.1, DC.Creator.2 etc) and similar techniques in order to overcome the grouping limitations. In general, such methods will be very implementation-specific and only useful in a local context. We do not attempt to propose a regularisation of these techniques here.

Attempting to record complex knowledge according to the fully qualified and structured DC datamodel [DC-datamodel] using HTML is not advisable. HTML has incomplete expressive ability in comparison with the complete qualified Dublin Core model, which is why new notations using RDF and XML are being developed. Note that fragments of XML can currently be embedded in HTML-4 [HTML4] documents, by using a restricted syntax in order to hide content from interpretation by current HTML clients or browsers [RDF-in-HTML]. However, there are almost no current applications which can parse and use metadata embedded in this way. Future versions of HTML are expected to overcome these limitations by allowing general XML documents to be included [XHTML].

4. Recommendation


The HTML syntax shown here contains many of the components required to represent the qualified DC model, while remaining fully compatible with the HTML-4 [HTML4] standard. It offers a recording method compatible with HTML tools such as browsers and metadata harvesters. While tools to make full use of the qualified information may not be widely available yet, metadata providers may use the qDC-HTML syntax to record rich information in the interim. Since the requirements of the semantic model for qualified DC are largely captured by the notation and usage described here, users may be confident that tools can be built to migrate the metadata into other notations preserving full semantics, so their investment in capturing rich information at this stage will be worthwhile.

This extension to DCHTML [DCHTML] adds no additional top-level elements or prefixes to those defined for use as values of the HTML <meta > element's NAME in the DCHTML document. Rather, it merely offers ways in which these might be further refined and enriched, consistent with both HTML-4 syntax and the qualified DC model, to enhance the description and discovery of resources. Resource descriptions utilising the qualification mechanisms discussed here should therefore be declared with reference to the same DC schema definition as basic Dublin Core descriptions, currently http://purl.org/dc/elements/1.0/ . A reference document for this extension to the DC schema dealing with the qualified DC model (perhaps this document) should be made available, and linked from the canonical schema. This will allow the current standard schema link in the HTML <head>, <link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">, to also define documents using qDC-HTML.

5. Examples

The following examples are snippets from within the <HEAD> element of HTML-4 documents. In complete documents, these should be preceded by the schema declaration <link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">

Diagrams of the models corresponding to many of the examples, together with alternative encodings using XML and XML-RDF, are also available [DCmodel-guide].

<META NAME="DC.Creator" SCHEME="DCSV" CONTENT="name.given:Simon; name.family:Cox; employer:CSIRO; height:177 cm">
<META NAME="DC.Language" SCHEME="RFC1766" CONTENT="en-AU">
<META NAME="DC.Contributor.illustrator" SCHEME="vCard" CONTENT="fn:Simon Cox; org:CSIRO">

<META NAME="DC.Date.created" SCHEME="ISO8601" CONTENT="1999-04-21">
<META NAME="DC.Date.revised" SCHEME="ISO8601" CONTENT="1999-04-28">

<META NAME="DC.Relation.isBasedOn" SCHEME="URL" CONTENT="http://www.foo.bar/explication.html">
<META NAME="DC.Relation.isFormatOf" SCHEME="URL" CONTENT="http://www.foo.bar/explanation.doc">
<META NAME="DC.Relation.hasFormat" SCHEME="URL" CONTENT="http://www.foo.bar/explanation.pdf">

<META NAME="DC.Format.media" SCHEME="MIME" CONTENT="text/html">
<META NAME="DC.Format.size" CONTENT="27 kB">

<META NAME="DC.Format.media" SCHEME="MIME" CONTENT="image/gif">
<META NAME="DC.Format.size" CONTENT="14 kB">
<META NAME="DC.Format.size" SCHEME="DCSV" CONTENT="rows:200; cols:450">

An extended example using all the qualifying and structuring components discussed here is:

<META NAME="DC.Identifier"
SCHEME="URL"
CONTENT="http://www.agcrc.csiro.au/projects/3018CO/metadata/agls/metadata_model.gif">
<META NAME="DC.Title"
LANG="en"
CONTENT="Diagram of data model for AGLS">
<META NAME="DC.Date.Created"
SCHEME="ISO8601"
CONTENT="1999-03-12">
<META NAME="DC.Creator"
SCHEME="DCSV"
CONTENT="Name.Given:Simon;
Name.Family:Cox;
Employer:CSIRO Exploration and Mining;
Contact:39 Fairway, Nedlands, W.A.">
<META NAME="DC.Contributor.reviewer"
SCHEME="vCard"
CONTENT="fn:Renato Iannella; org:DSTC; email:renato@dstc.edu.au">
<META NAME="DC.Format.size"
SCHEME="DCSV"
CONTENT="cols:600; rows:350">
<META NAME="DC.Format.media"
SCHEME="MIME"
CONTENT="image/gif">
<META NAME="DC.Relation.isBasedOn"
LANG="en"
CONTENT="Figure 1 from AGLS manual version 1.0">

 


6. Acknowledgments

Renato Iannella suggested the "colon-syntax" for structured values, and John Kunze suggested moving it into a separate specification.


7. References

[DC-datamodel]
Dublin Core Data Model Working Group mail archive http://www.mailbase.ac.uk/lists/dc-datamodel/archive.html
[DCHTML]
J. Kunze 1999 Encoding Dublin Core Metadata in HTML http://www.ietf.org/rfc/rfc2731.txt
[DCMI]
Dublin Core Metadata Initiative, OCLC, Dublin Ohio. http://purl.org/dc/
[DCmodel-guide]
S. Cox, 1999. A Guide to the Dublin Core datamodel with some notations for recording Dublin Core metadata http://www.agcrc.csiro.au/projects/3018CO/metadata/dc-guide/.
[DCRFC]
S. Weibel, J. Kunze, C. Lagoze, M. Wolf 1998. Dublin Core Metadata for Resource Discovery. RFC2413 http://info.internet.isi.edu/in-notes/rfc/files/rfc2413.txt
[DCSV]
S. Cox, R. Iannella, 1999. A syntax for writing a list of labelled values in a text string http://www.agcrc.csiro.au/projects/3018CO/metadata/dcsv/
http://purl.org/dc/documents/notes-cox-19990430.htm
[HTML4]
Dave Raggett, Arnaud Le Hors, Ian Jacobs, 1998, HTML 4.0 Specification http://www.w3.org/TR/REC-html40/
[ISBN]
International Standard Book Number for example, see http://www.nlc-bnc.ca/isbn/e-isbn.htm
[ISO8601]
M. Wolf and C. Wicksteed, 1997, Date and Time Formats, http://www.w3.org/TR/NOTE-datetime
[LCSH]
Library of Congress (USA) Subject Headings For information, follow links from http://lcweb.loc.gov/catdir/
[MIME]
List of registered content types (MIME types) ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types
as required by N. Borenstein, N. Freed, 1993 MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies RFC1521 http://info.internet.isi.edu/in-notes/rfc/files/rfc1521.txt
[qDC-RDF]
E. Miller, P. Miller, D. Brickley, 1999. Guidance on expressing the Dublin Core within the Resource Description Framework (RDF) http://www.ukoln.ac.uk/interop-focus/activities/dc/datamodel/
[RDF-in-HTML]
This uses the most compact form of XML-RDF [RDF-syntax], in which all the data occurs as attribute values. In this form several important capabilities are not available, such as multiple (repeated) values. For an example, see Figure 5 in S.J.D. Cox and K.D. Covil, "A web-based geological information system using metadata", Proc. 3rd IEEE META-DATA Conference, http://computer.org/conferen/proceed/meta/1999/papers/7/cox_covil.html
[RDF-syntax]
Ora Lassila, Ralph Swick, 1999 Resource Description Framework (RDF) Model and Syntax Specification http://www.w3.org/TR/REC-rdf-syntax/
[RFC1766]
H. Alvestrand, 1995 Tags for the Identification of Languages.
See also Codes for the representation of names of languages, ISO 639:1988 http://www.oasis-open.org/cover/iso639a.html.
See also Codes for the representation of names of countries, ISO 3166:1993 http://www.oasis-open.org/cover/country3166.html.
[URI]
T. Berners-Lee, R. Fielding, L Masinter, 1998 Uniform Resource Identifiers (URI): Generic Syntax RFC2396 http://info.internet.isi.edu/in-notes/rfc/files/rfc2396.txt
T. Berners-Lee, L. Masinter, and M. McCahill, 1994 Uniform Resource Locators, RFC1738 http://info.internet.isi.edu/in-notes/rfc/files/rfc1738.txt.
T. Berners-Lee, 1994 Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web, RFC1630 http://info.internet.isi.edu/in-notes/rfc/files/rfc1630.txt.
[vCard]
F. Dawson, T. Howes, 1998 vCard MIME Directory Profile RFC2426 http://info.internet.isi.edu/in-notes/rfc/files/rfc2426.txt
See also F. Dawson, P. Hoffman, 1998 The vCard v3.0 XML DTD http://www.ietf.org/internet-drafts/draft-dawson-vcard-xml-dtd-03.txt
[XHTML]
Steven Pemberton and many others, 1999 XHTML 1.0: The Extensible HyperText Markup Language http://www.w3.org/TR/WD-html-in-xml/
See also Dave Raggett, HyperText Markup Language Activity Statement http://www.w3.org/MarkUp/Activity.html
[XML]
Extensible Markup Language http://www.w3.org/XML/

 

 
 
 
     
© 2000 DCMI