
Notes on expressing Dublin Core metadata in HTML and XHTML

Introduction

This document discusses the use of the meta and link elements of HTML/XHTML for expressing Dublin Core metadata. More specifically, it focuses on the use of these elements to represent a DC metadata description set, as defined by the "Description Set Model" of the DCMI Abstract Model [DCAM]. In terms of the draft Interoperability levels for Dublin Core metadata [DC-LEVELS], it focuses on "DCAM-based syntactic interoperability" ("Level 3" interoperability), with some reference to "Semantic interoperability" ("Level 2" interoperability), which is based on the RDF model.

In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form according to the rules of a format or syntax. The DCMI Abstract Model itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".

Such a specification performs three functions:

The role of "encoding guidelines" and their relationship to the DCAM is illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].

Encoding DC metadata using HTML/XHTML

When DC metadata is encoded in an HTML/XHTML document, the constructs of a DC metadata description set are represented in the document header as HTML/XHTML elements and attributes, and as element content and attribute values. The conventions governing these meta and link elements and their attributes are described in what the HTML specification calls a "meta data profile" [HTML-PROFILE]. A meta data profile is identified by a URI, which is declared in the profile attribute of the document's head element, as in:

<head profile="http://dublincore.org/documents/2007/11/05/dc-html/">

The presence of this URI in the profile attribute indicates that the meta data profile should be applied in order to interpret the given HTML or XHTML instance.

DCMI currently defines two such meta data profiles:

A document encoded using HTML/XHTML can make use of one or more meta data profiles; it discloses the URIs of the profile(s) it uses as the value of the profile attribute of the HTML/XHTML head element (in HTML 4.01 this attribute holds a whitespace-separated list of URIs).

Comparison between the DC-HTML-2003 and DC-HTML-2008 HTML/XHTML meta data profiles

The DC-HTML-2003 profile and the DC-HTML-2008 profile are two different HTML meta data profiles. The DC-HTML-2008 profile is specified in terms of the DCAM description set model, and all features of the profile have a well-defined mapping to the constructs of the DCAM description set. The DC-HTML-2003 profile was not defined in terms of the DCAM description set model, which did not then exist in its current form. Although a mapping to the DCAM description set can be constructed retrospectively, only some features of the profile map to constructs of the description set. (For a full explanation of how the DCAM interpretation of the DC-HTML-2003 profile is constructed, see Appendix A.)

The features of the DCAM description set supported by the two meta data profiles are summarised in the following table:

DCAM Description Model feature | Supported in DC-HTML-2003 | Supported in DC-HTML-2008
--- | --- | ---
description set | One description set | One description set
description | One description | One description
described resource URI | Document URI/Base URI | Document URI/Base URI
statement | Multiple statements | Multiple statements
property URI | Supported | Supported
literal value surrogate | Partly supported | Supported
literal value surrogate / value string | Supported | Supported
literal value surrogate / value string language | Supported | Supported
literal value surrogate / SES URI | Not supported | Supported
non-literal value surrogate | Partly supported | Partly supported
non-literal value surrogate / value string | Not supported | Max one value string supported
non-literal value surrogate / value string language | Not supported | Supported
non-literal value surrogate / SES URI | Not supported | Not supported
non-literal value surrogate / value URI | Supported | Supported
non-literal value surrogate / VES URI | Not supported | Not supported

In terms of the features of the DCAM description set model supported, the differences between them are:

Note that neither the DC-HTML-2003 profile nor the DC-HTML-2008 profile supports the encoding of vocabulary encoding scheme URIs.

There are also differences in the syntactic features themselves:

In any HTML/XHTML document, the value of the profile attribute of the head element specifies which meta data profiles are used in that document. A document with a profile value of http://dublincore.org/documents/dcq-html/ is intended to be interpreted using the DC-HTML-2003 profile; and a document with a profile value of http://dublincore.org/documents/2008/mm/dd/dc-html/ is intended to be interpreted using the DC-HTML-2008 profile. Note that the presence of the URI of a profile licenses the interpretation of the document in accordance with the rules of that profile.
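As a minimal sketch of how a consumer might decide which interpretation is licensed, the Python below reads the whitespace-separated URIs from the profile attribute of the head element and reports which of the two DCMI profile URIs named above are present. The class and function names are hypothetical, and the DC-HTML-2008 URI is copied exactly as given in this document, including its "mm/dd" placeholder.

```python
from html.parser import HTMLParser

# Profile URIs exactly as given in this document (the 2008 URI keeps the
# "mm/dd" placeholder of the draft).
DC_HTML_2003 = "http://dublincore.org/documents/dcq-html/"
DC_HTML_2008 = "http://dublincore.org/documents/2008/mm/dd/dc-html/"


class ProfileFinder(HTMLParser):
    """Hypothetical helper: collects the URIs listed in head/@profile."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def dcmi_profiles(html_text):
    """Return the labels of the DCMI profiles declared by the document."""
    finder = ProfileFinder()
    finder.feed(html_text)
    labels = set()
    if DC_HTML_2003 in finder.profiles:
        labels.add("DC-HTML-2003")
    if DC_HTML_2008 in finder.profiles:
        labels.add("DC-HTML-2008")
    return labels
```

An empty result corresponds to the case where no interpretation is licensed by DCMI specifications; a result containing both labels is the combination that calls for caution.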

If both DCMI profile URIs are present, then a processor may apply both interpretations. However, metadata providers should use this combination with caution: some of the conventions used in the DC-HTML-2003 profile generate quite different sets of statements when interpreted using the DC-HTML-2008 profile. This is the case, for example, for "composite prefixed names", as in the following document:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head profile="xxx yyy">
<title>My Document</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
<meta name="DC.date.modified" content="2007-07-22" >
</head>
<body>
</body>
</html>

According to the DC-HTML-2003 profile, this should be interpreted as encoding a single statement with the property URI http://purl.org/dc/terms/modified; interpreted according to the DC-HTML-2008 profile, it generates a single statement with the property URI http://purl.org/dc/elements/1.1/date.modified. So if the document signals the use of both profiles, or if the value of the profile attribute is simply changed from http://dublincore.org/documents/dcq-html/ to http://dublincore.org/documents/2008/mm/dd/dc-html/ without changing the content of the meta/@name attribute, unexpected interpretations of the data will result.
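The divergence can be sketched in Python. The DC-HTML-2008 reading below splits the name at the first "." and appends the whole remainder to the namespace URI bound by the matching schema link, which reproduces the result stated above. The DC-HTML-2003 reading is a simplification: it assumes that a three-part name PREFIX.element.refinement resolves the refinement in the DCMI Terms namespace (http://purl.org/dc/terms/). That assumption reproduces the single example above but is not a rule quoted from DC-HTML-2003. All function names are hypothetical.

```python
# Assumption for the simplified 2003 reading: refinements live in DCMI Terms.
DCTERMS_NS = "http://purl.org/dc/terms/"


def parse_schema_links(links):
    """Map prefixes to namespace URIs from (rel, href) pairs such as
    ("schema.DC", "http://purl.org/dc/elements/1.1/")."""
    prefixes = {}
    for rel, href in links:
        if rel.startswith("schema."):
            prefixes[rel[len("schema."):]] = href
    return prefixes


def property_uri_2008(name, prefixes):
    """Split at the first dot and append the remainder to the namespace URI."""
    prefix, _, local = name.partition(".")
    return prefixes[prefix] + local


def property_uri_2003(name, prefixes):
    """Simplified reading of PREFIX.element or PREFIX.element.refinement."""
    parts = name.split(".")
    if len(parts) == 2:
        return prefixes[parts[0]] + parts[1]
    if len(parts) == 3:
        # Hypothetical rule: the refinement name is taken from DCMI Terms.
        return DCTERMS_NS + parts[2]
    raise ValueError(f"unexpected name: {name}")
```

Applied to the meta element above, the two functions return http://purl.org/dc/terms/modified and http://purl.org/dc/elements/1.1/date.modified respectively, which is exactly the conflict described.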

If neither DCMI profile URI is present, then no interpretation is licensed by DCMI specifications. An application may apply an interpretation of such a document as a DC description set, either as the result of the use of another profile defined by an agency other than DCMI, or as the result of some other agreement between provider and consumer.

The choice of profile depends on the requirements of the application: as the table above indicates, the DC-HTML-2008 profile supports some features of the DCAM description set model which are not supported by the DC-HTML-2003 profile (syntax encoding scheme URIs for literal value surrogates and value strings for non-literal value surrogates). It also simplifies the mechanism for encoding property URIs. The use of the profile attribute ensures that there is no ambiguity or confusion over how the provider of any document intends that it should be processed.

Recommendations

A provider of DC description sets encoded in the header of an HTML/XHTML document:

A consumer of DC description sets encoded in the header of an HTML/XHTML document:

Appendix A: DC-HTML-2003 and the DCAM

Expressing Dublin Core in HTML/XHTML meta and link elements (2003)

The DCMI Recommendation, Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003] pre-dates the development of the DCAM, so it does not perform the functions described in the introduction to this document: it does not describe either how components of (a subset of) the DCAM description set model are to be "encoded", or how features of the format are to be interpreted as representing a DC metadata description set.

However, DC-HTML-2003 does broadly follow the general approach described above, of making a distinction between an information structure (which it calls a "DC record") and the way that record is represented. Essentially, it defines its own "description model", based on the concept of the "DC record", and describes how instances of that information structure are to be represented in HTML/XHTML documents. The DC-HTML-2003 concept of the "DC record" is not based on the DCAM description set model, and indeed it uses some of the same terminology used in the DCAM, but with different meanings.

So any attempt to provide an interpretation of the DC-HTML-2003 recommendation in terms of the DCAM description set is, and must be, a retrospective exercise. It depends on a two-stage process:

If the first step reveals that some components of a "DC record" cannot be mapped to components of the DCAM description set, then there will be aspects of the syntax which, although they can be interpreted as representing components of a "DC record", cannot be interpreted as representing components of the DCAM description set. Similarly, the first step may show that some constructs and components of the DCAM description set have no counterpart in the "DC record", in which case the current (DC-HTML-2003) meta data profile provides no syntactic representation for them.

Mapping the "DC record" to the description set

Two approaches might be taken to constructing such a mapping:

The first thing to note is that, unfortunately, the concept of the "DC record" in the DC-HTML-2003 document is highly underspecified. The introduction refers to a "record" as

    some structured metadata about a resource, comprising one or more properties and their associated values.

In the context of DC-HTML-2003, the term "value" is used to refer to a literal. However, the document goes on to discuss concepts such as "element", "element refinement", "encoding scheme" and "language", and how instances of these concepts should be represented using the HTML/XHTML profile, without ever explaining how these concepts relate to that of the "record". For the purpose of this discussion, we assume that (using these terms as they are used in DC-HTML-2003, not as they are used in the DCAM):

Such an interpretation seems consistent with the use of those terms in the DCMI Recommendation Guidelines for implementing Dublin Core in XML [DC-XML-2003], which provides more explicit "abstract models" for the data being represented.

The "informal" approach

The following table is an attempt to specify a mapping between the "DC record" described by DC-HTML-2003 and the description set described by the DCAM, such that the assertions made by the description set correspond to, or at least do not contradict, the assertions made by the "DC record".

DC-HTML-2003 | DCAM
--- | ---
"DC record" | description set containing a single description
"Property + Value" | statement
"URI of Property" | property URI
"Value" | literal value surrogate/value string or non-literal value surrogate/value URI
"Language" | value string language

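To make the informal mapping concrete, here is a minimal Python sketch of the resulting data structure, assuming property URIs have already been resolved: a meta element's content becomes a literal value string (with its language, if any), a link element's href becomes a value URI, and the whole "DC record" becomes a description set containing one description of the document. The class and function names, and the example.org URIs used when exercising it, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    property_uri: str                   # "URI of Property"
    value_string: Optional[str] = None  # literal value surrogate / value string
    language: Optional[str] = None      # value string language
    value_uri: Optional[str] = None     # non-literal value surrogate / value URI


@dataclass
class Description:
    resource_uri: str                   # document URI (or base URI)
    statements: List[Statement]


def meta_to_statement(property_uri, content, lang=None):
    """A "Value" taken from meta/@content becomes a literal value string."""
    return Statement(property_uri, value_string=content, language=lang)


def link_to_statement(property_uri, href):
    """A "Value" taken from link/@href becomes a value URI."""
    return Statement(property_uri, value_uri=href)


def description_set(document_uri, statements):
    """A "DC record" maps to a description set with exactly one description."""
    return [Description(document_uri, list(statements))]
```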
There are several points worth noting:

Using this mapping in conjunction with the DC-HTML-2003 profile, the following DCAM interpretation for DC-HTML-2003 might be inferred.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


dc-extract.xsl

Dan Connolly of W3C produced an XSLT stylesheet which generates an RDF/XML representation of the encoded metadata from an XHTML document using the DC-HTML-2003 profile. In terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.
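If statements of this kind are expressed in RDF following the general DC-RDF approach, a literal value string (with an optional language tag) becomes a literal object, a value URI becomes a URI object, and the described resource (here the document URI) is the subject. The Python sketch below emits such triples as N-Triples lines. It illustrates that general correspondence only and does not reproduce the specific conventions of dc-extract.xsl; the function name and the tuple layout of its input are assumptions.

```python
def ntriples(resource_uri, statements):
    """statements: iterable of (property_uri, value, kind, lang), where kind is
    "literal" or "uri" and lang may be None. Returns N-Triples lines."""
    lines = []
    for prop, value, kind, lang in statements:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            # Escape backslashes first, then quotes and newlines.
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            obj = f'"{escaped}"' + (f"@{lang}" if lang else "")
        lines.append(f"<{resource_uri}> <{prop}> {obj} .")
    return lines
```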


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


Embedded RDF

Embedded RDF [ERDF], designed by Ian Davis (Talis), is a set of conventions for embedding RDF triples in HTML/XHTML. There is no formal association between Embedded RDF and the DC-HTML-2003 profile, but the Embedded RDF documentation notes that it was designed to be compatible with the DC-HTML-2003 profile, so an Embedded RDF interpretation can be made for an instance of the DC-HTML-2003 profile. Again, in the terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions, which are a subset of those used by dc-extract.xsl:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


A DCAM interpretation of DC-HTML-2003

The following is a "conservative" DCAM interpretation of the DC-HTML-2003 profile which is supported by all three of the approaches above. Note that this interpretation does not provide a mapping for the scheme attribute.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


Appendix B: DC-HTML-2008 and the DCAM

In contrast to DC-HTML-2003, the Proposed DCMI Recommendation Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008] is designed to support the encoding of a DC description set, and the document explicitly describes a mapping between a subset of the features of the DCAM description set model and the X/HTML meta and link elements.
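A minimal Python sketch of this mapping follows. It assumes, as we read DC-HTML-2008, that a meta element yields a literal value surrogate whose syntax encoding scheme URI comes from the scheme attribute, and that a link element yields a non-literal value surrogate whose value URI comes from href and whose single value string comes from title; prefixed names are resolved by appending the part after the first "." to the namespace URI bound to the prefix. The names are hypothetical, and elements are passed in as attribute dictionaries rather than parsed from a document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement2008:
    property_uri: str
    value_string: Optional[str] = None
    language: Optional[str] = None
    ses_uri: Optional[str] = None    # literal value surrogates only
    value_uri: Optional[str] = None  # non-literal value surrogates only


def resolve(prefixed_name, prefixes):
    # Split at the first dot; append the remainder to the namespace URI.
    prefix, _, local = prefixed_name.partition(".")
    return prefixes[prefix] + local


def from_meta(attrs, prefixes):
    """meta name/content/scheme/lang -> literal value surrogate."""
    scheme = attrs.get("scheme")
    return Statement2008(
        property_uri=resolve(attrs["name"], prefixes),
        value_string=attrs["content"],
        language=attrs.get("lang") or attrs.get("xml:lang"),
        ses_uri=resolve(scheme, prefixes) if scheme else None,
    )


def from_link(attrs, prefixes):
    """link rel/href/title -> non-literal value surrogate, at most one value string."""
    return Statement2008(
        property_uri=resolve(attrs["rel"], prefixes),
        value_uri=attrs["href"],
        value_string=attrs.get("title"),
        language=attrs.get("lang") or attrs.get("xml:lang"),
    )
```

Note that neither function produces a vocabulary encoding scheme URI, matching the table in the main text.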

An X/HTML document using the DC-HTML-2008 profile encodes a description set containing

References

[DCAM]
DCMI Abstract Model. DCMI Recommendation. 2007-06-04.
http://dublincore.org/documents/2007/06/04/abstract-model/

[DC-XML-2003]
Guidelines for implementing Dublin Core in XML. DCMI Recommendation. 2003-04-02.
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/

[DC-HTML-2003]
Expressing Dublin Core in HTML/XHTML meta and link elements. DCMI Recommendation. 2003-11-30.
http://dublincore.org/documents/2003/11/30/dcq-html/

[DC-HTML-2008]
Expressing Dublin Core using HTML/XHTML meta and link elements. DCMI Proposed Recommendation. 2007-11-05.
http://dublincore.org/documents/2008/mm/dd/dc-html/

[DC-EXTRACT]
Dublin Core Extraction Service.
http://www.w3.org/2000/06/dc-extract/form.html

[DC-LEVELS]
Interoperability levels for Dublin Core metadata.
http://dublincore.org/architecturewiki/InteroperabilityLevels

[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF). DCMI Recommendation. 2008-01-14.
http://dublincore.org/documents/2008/01/14/dc-rdf/

[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format. DCMI Recommended Resource. 2008-01-14.
http://dublincore.org/documents/2008/01/14/dc-rdf/

[ERDF]
Embedded RDF.
http://purl.org/NET/erdf/profile

[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation. 2007-09-11.
http://www.w3.org/TR/2007/REC-grddl-20070911/

[HTML-PROFILE]
Meta data profiles. In: HTML 4.01 Specification. W3C Recommendation. 1999-12-24.
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3

[RFC3986]
Uniform Resource Identifier (URI): Generic Syntax. IETF RFC 3986.
http://www.ietf.org/rfc/rfc3986.txt

[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial. DC-2007, Singapore.
http://www.dc2007.sg/T2-BasicSyntaxes.pdf