
Notes on expressing Dublin Core metadata in HTML and XHTML

Introduction

This document discusses the use of the meta and link elements of HTML/XHTML for expressing Dublin Core metadata. More specifically, its primary focus is on the use of these elements to represent a DC metadata description set, as defined by the "Description Set Model" of the DCMI Abstract Model [DCAM]. In the terms of the document Interoperability levels for Dublin Core metadata [DC-LEVELS], it focuses on "DCAM-based syntactic interoperability" ("Level 3" interoperability), with some reference to "Semantic interoperability" ("Level 2" interoperability), based on the RDF model.

In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form, according to the rules of a format or syntax. The DCAM itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".

Such a specification performs three functions:

The role of "encoding guidelines" and their relationship to the DCAM are illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].

Encoding DC metadata using HTML/XHTML

For the case of encoding DC metadata in the header of an HTML/XHTML document, the constructs of the DC metadata description set have to be represented as components in that HTML/XHTML document header, i.e. as HTML/XHTML elements and attributes and as element content and attribute values. This involves the definition of what the HTML specification calls a "meta data profile", which describes conventions used in meta and link elements and their attributes [HTML-PROFILE].

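Each "meta data profile" is identified by a URI. DCMI currently defines two such meta data profiles: the DC-HTML-2003 profile, identified by the URI http://dublincore.org/documents/dcq-html/, and the DC-HTML-2008 profile, identified by the URI http://dublincore.org/documents/2008/mm/dd/dc-html/.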

Each HTML/XHTML document can make use of one or more meta data profiles, and it discloses the URIs of those profiles as the value of the profile attribute of the HTML/XHTML head element.
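
As an illustration of how a consumer might read the profile attribute, the following is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and the second URI in the sample (http://example.org/other-profile) is an invented placeholder; the only convention relied on is that the attribute holds a white-space separated list of URIs.

```python
from html.parser import HTMLParser


class HeadProfileParser(HTMLParser):
    """Collects the URIs listed in the profile attribute of the head element."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag == "head":
            for name, value in attrs:
                if name == "profile" and value:
                    # The attribute value is a white-space separated list of URIs.
                    self.profiles.extend(value.split())


def head_profiles(html_text):
    """Return the list of profile URIs declared on the head element."""
    parser = HeadProfileParser()
    parser.feed(html_text)
    return parser.profiles


# Sample document; the second profile URI is a made-up placeholder.
sample = """<html>
<head profile="http://dublincore.org/documents/dcq-html/ http://example.org/other-profile">
<title>My Document</title>
</head>
<body></body>
</html>"""

print(head_profiles(sample))
```

A document with no profile attribute yields an empty list, which a consumer can treat as "no profile disclosed".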

Comparison between the DC-HTML-2003 and DC-HTML-2008 HTML/XHTML meta data profiles

The DC-HTML-2003 profile and the DC-HTML-2008 profile are two different HTML meta data profiles. The DC-HTML-2008 profile is specified in terms of the DCAM description set model, and all features of the profile have a well-defined mapping to the constructs of the DCAM description set. The DC-HTML-2003 profile was not defined in terms of the DCAM description set model; although a retrospective mapping to the DCAM description set can be constructed, only some features of the profile have a mapping to the constructs of the description set. (For a full explanation of how the DCAM interpretation of the DC-HTML-2003 profile is constructed, see Appendix A.)

The features of the DCAM description set supported by the two meta data profiles are summarised in the following table:

| DCAM Description Model | DC-HTML-2003 | DC-HTML-2008 |
|---|---|---|
| description set | One description set | One description set |
| description | One description | One description |
| described resource URI | Document URI/Base URI | Document URI/Base URI |
| statement | Multiple statements | Multiple statements |
| property URI | Supported | Supported |
| literal value surrogate | Partly supported | Supported |
| literal value surrogate / value string | Supported | Supported |
| literal value surrogate / value string language | Supported | Supported |
| literal value surrogate / SES URI | Not supported | Supported |
| non-literal value surrogate | Partly supported | Partly supported |
| non-literal value surrogate / value string | Not supported | Max one value string supported |
| non-literal value surrogate / value string language | Not supported | Supported |
| non-literal value surrogate / SES URI | Not supported | Not supported |
| non-literal value surrogate / value URI | Supported | Supported |
| non-literal value surrogate / VES URI | Not supported | Not supported |

In terms of the features of the DCAM description set model supported, the differences between them are:

There are also differences in the syntactic features themselves:

Note that neither the DC-HTML-2003 profile nor the DC-HTML-2008 profile supports the encoding of vocabulary encoding scheme URIs.
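
As a rough cross-check of the comparison table above, the support levels can be held in a small mapping and the rows on which the two profiles differ listed mechanically. The Python sketch below is illustrative only: the variable names are hypothetical, and the strings are copied from the table rather than taken from either profile specification.

```python
# Support levels copied from the comparison table:
# feature -> (DC-HTML-2003, DC-HTML-2008)
SUPPORT = {
    "property URI": ("Supported", "Supported"),
    "literal value surrogate": ("Partly supported", "Supported"),
    "literal value surrogate / value string": ("Supported", "Supported"),
    "literal value surrogate / value string language": ("Supported", "Supported"),
    "literal value surrogate / SES URI": ("Not supported", "Supported"),
    "non-literal value surrogate": ("Partly supported", "Partly supported"),
    "non-literal value surrogate / value string": (
        "Not supported",
        "Max one value string supported",
    ),
    "non-literal value surrogate / value string language": (
        "Not supported",
        "Supported",
    ),
    "non-literal value surrogate / SES URI": ("Not supported", "Not supported"),
    "non-literal value surrogate / value URI": ("Supported", "Supported"),
    "non-literal value surrogate / VES URI": ("Not supported", "Not supported"),
}

# Keep only the features whose support level differs between the profiles.
differences = [
    feature for feature, (in_2003, in_2008) in SUPPORT.items() if in_2003 != in_2008
]

for feature in differences:
    in_2003, in_2008 = SUPPORT[feature]
    print(f"{feature}: 2003 = {in_2003}; 2008 = {in_2008}")
```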

In any HTML/XHTML instance, the value of the profile attribute of the head element specifies which meta data profiles are used in that instance. An instance with a profile value of http://dublincore.org/documents/dcq-html/ is intended to be interpreted using the DC-HTML-2003 profile; and an instance with a profile value of http://dublincore.org/documents/2008/mm/dd/dc-html/ is intended to be interpreted using the DC-HTML-2008 profile.

If both DCMI profile URIs are present, then a processor may apply both interpretations. However, metadata providers should use this combination with caution. It is important to note that some of the conventions used in the DC-HTML-2003 profile will generate quite different sets of statements when interpreted using the DC-HTML-2008 profile. This is the case for "composite prefixed names", for example. Consider the following example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head profile="xxx yyy">
<title>My Document</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
<meta name="DC.date.modified" content="2007-07-22" >
</head>
<body>
</body>
</html>

According to the DC-HTML-2003 profile, this should be interpreted as encoding a single statement with the property URI http://purl.org/dc/terms/modified; interpreted according to the DC-HTML-2008 profile, it generates a single statement with the property URI http://purl.org/dc/elements/1.1/date.modified. So if the document signals the use of both profiles, or if the value of the profile attribute is simply changed from http://dublincore.org/documents/dcq-html/ to http://dublincore.org/documents/2008/mm/dd/dc-html/ without changing the content of the meta/@name attribute, then unexpected interpretations of the data will result.
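
The divergence can be sketched in a few lines of Python. This is not an implementation of either profile: the function names are hypothetical, and the 2003 branch assumes (consistent with the example above, but not derived from the profile text) that a three-part name of the form prefix.element.refinement names a property in the http://purl.org/dc/terms/ namespace. The 2008 branch assumes that everything after the first "." is appended to the namespace URI declared for the prefix.

```python
# Namespace assumed for element refinements under the 2003 reading.
DCTERMS_NS = "http://purl.org/dc/terms/"


def property_uri_2003(name, schemas):
    """Hypothetical DC-HTML-2003 style reading of a meta/@name value."""
    parts = name.split(".")
    if len(parts) == 3:
        # Assumption: prefix.element.refinement identifies the refinement,
        # taken from the DCMI terms namespace.
        return DCTERMS_NS + parts[2]
    if len(parts) == 2:
        return schemas[parts[0]] + parts[1]
    raise ValueError(f"unexpected name: {name!r}")


def property_uri_2008(name, schemas):
    """Hypothetical DC-HTML-2008 style reading of a meta/@name value."""
    # Everything after the first "." is appended to the prefix's namespace URI.
    prefix, _, local = name.partition(".")
    return schemas[prefix] + local


# Taken from <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
schemas = {"DC": "http://purl.org/dc/elements/1.1/"}

print(property_uri_2003("DC.date.modified", schemas))
print(property_uri_2008("DC.date.modified", schemas))
```

The same name yields two different property URIs, which is why switching the profile URI without revisiting the meta/@name values changes the meaning of the data.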

If neither DCMI profile URI is present, then no interpretation is licensed by DCMI specifications. An application may apply an interpretation of such a document as a DC description set, either as the result of the use of another profile defined by an agency other than DCMI, or as the result of some other agreement between provider and consumer.

The use of the profile attribute ensures that there is no question of ambiguity or confusion over how the provider of any single instance intends that it should be processed.
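
A consumer could apply these rules with a small dispatch step such as the Python sketch below. The function name and the returned labels are hypothetical; the two URIs are those given in this document for the DC-HTML-2003 and DC-HTML-2008 profiles.

```python
DC_HTML_2003 = "http://dublincore.org/documents/dcq-html/"
DC_HTML_2008 = "http://dublincore.org/documents/2008/mm/dd/dc-html/"


def licensed_interpretations(profile_value):
    """Return the DCMI interpretations signalled by a head/@profile value.

    An empty list means that no interpretation is licensed by DCMI
    specifications; any interpretation then rests on some other profile or
    on an agreement between provider and consumer.
    """
    uris = set((profile_value or "").split())
    found = []
    if DC_HTML_2003 in uris:
        found.append("DC-HTML-2003")
    if DC_HTML_2008 in uris:
        found.append("DC-HTML-2008")
    return found


print(licensed_interpretations(DC_HTML_2003))
print(licensed_interpretations(DC_HTML_2003 + " " + DC_HTML_2008))
print(licensed_interpretations("xxx yyy"))
```

When both labels are returned, a processor may apply both interpretations, with the caveat noted above that the two readings can produce different statements.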

Recommendations

A provider of DC metadata encoded in the header of an HTML/XHTML document:

A consumer of DC metadata encoded in the header of an HTML/XHTML document:

Appendix A: DC-HTML-2003 and the DCAM

Expressing Dublin Core in HTML/XHTML `meta` and `link` elements (2003)

The DCMI Recommendation, Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003] pre-dates the development of the DCAM, so it does not perform the functions described in the introduction to this document: it does not describe either how components of (a subset of) the DCAM description set model are to be "encoded", or how features of the format are to be interpreted as representing a DC metadata description set.

However, DC-HTML-2003 does broadly follow the general approach described above, of making a distinction between an information structure (which it calls a "DC record") and the way that record is represented. Essentially, it defines its own "description model", based on the concept of the "DC record", and describes how instances of that information structure are to be represented in HTML/XHTML documents. The DC-HTML-2003 concept of the "DC record" is not based on the DCAM description set model, and indeed it uses some of the same terminology used in the DCAM, but with different meanings.

So any attempt to provide an interpretation of the DC-HTML-2003 recommendation in terms of the DCAM description set is, and must be, a retrospective exercise. It depends on a two-stage process:

If the first step reveals that some components of a "DC record" cannot be mapped to components of the DCAM description set, then there will be aspects of the syntax which, while they do have an interpretation as representing components of a "DC record", do not have an interpretation as representing components of the DCAM description set. Similarly, the first step may show that there are constructs and components of the DCAM description set which have no correspondence in the "DC record", in which case there will be no syntactic representation of those constructs and components in the current (DC-HTML-2003) meta data profile.

Mapping the "DC record" to the description set

Two approaches might be taken to constructing such a mapping:

The first thing to note is that, unfortunately, the concept of the "DC record" in the DC-HTML-2003 document is highly underspecified. The introduction refers to a "record" as

    some structured metadata about a resource, comprising one or more properties and their associated values.

In the context of DC-HTML-2003, the term "value" is used to refer to a literal. However, the document goes on to discuss concepts such as "element", "element refinement", "encoding scheme" and "language", and how instances of these concepts should be represented using the HTML/XHTML profile, without ever explaining the relationship of these concepts to that of the "record". For the purpose of this discussion, we assume that (using these terms as they are used in DC-HTML-2003, not as they are used in the DCAM):

Such an interpretation seems consistent with the use of those terms in the DCMI Recommendation Guidelines for implementing Dublin Core in XML [DC-XML-2003], which provides more explicit "abstract models" for the data being represented.

The "informal" approach

The following table is an attempt to specify a mapping between the "DC record" described by DC-HTML-2003 and the description set described by the DCAM, such that the assertions made by the description set correspond to - or at least do not contradict - the assertions made by the "DC record".

| DC-HTML-2003 | DCAM |
|---|---|
| "DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string or non-literal value surrogate/value URI |
| "Language" | value string language |

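The mapping in the table above can be sketched as a small data transformation. The Python below is illustrative only: every class and field name is hypothetical, and the is_uri flag is an assumed way of recording whether a "Value" is to be treated as a value URI rather than a value string, since the table itself does not say how that choice is made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecordEntry:
    """One "Property + Value" pair of a DC-HTML-2003 "DC record"."""
    property_uri: str
    value: str
    language: Optional[str] = None
    is_uri: bool = False  # assumption: marks values to be treated as URIs


@dataclass
class Statement:
    """A DCAM statement, limited to the parts covered by the mapping."""
    property_uri: str
    value_string: Optional[str] = None
    value_string_language: Optional[str] = None
    value_uri: Optional[str] = None


@dataclass
class Description:
    described_resource_uri: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)


def record_to_description_set(document_uri, record):
    """Map a "DC record" to a description set containing a single description."""
    description = Description(described_resource_uri=document_uri)
    for entry in record:
        if entry.is_uri:
            # "Value" -> non-literal value surrogate / value URI
            statement = Statement(entry.property_uri, value_uri=entry.value)
        else:
            # "Value" -> literal value surrogate / value string,
            # "Language" -> value string language
            statement = Statement(
                entry.property_uri,
                value_string=entry.value,
                value_string_language=entry.language,
            )
        description.statements.append(statement)
    return DescriptionSet(descriptions=[description])


record = [
    RecordEntry("http://purl.org/dc/elements/1.1/title", "My Document", language="en"),
    RecordEntry("http://purl.org/dc/elements/1.1/relation",
                "http://example.org/other", is_uri=True),
]
result = record_to_description_set("http://example.org/doc", record)
print(len(result.descriptions), len(result.descriptions[0].statements))
```
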
There are several points worth noting:

Using this mapping in conjunction with the DC-HTML-2003 profile, the following DCAM interpretation for DC-HTML-2003 might be inferred.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


dc-extract.xsl

Dan Connolly of the W3C produced an XSLT stylesheet which generates an RDF/XML representation of the encoded metadata from an XHTML document using the DC-HTML-2003 profile, i.e. in the terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


Embedded RDF

Embedded RDF [ERDF], designed by Ian Davis (Talis), is a set of conventions for embedding RDF triples into HTML/XHTML. There is no formal association between Embedded RDF and the DC-HTML-2003 profile, but the documentation for Embedded RDF notes that it was designed to be compatible with the DC-HTML-2003 profile, so an Embedded RDF interpretation can be made for an instance of the DC-HTML-2003 profile. Again, in the terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions, which are a subset of those used by dc-extract.xsl:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


A DCAM interpretation of DC-HTML-2003

The following is a "conservative" DCAM interpretation of the DC-HTML-2003 profile which is supported by all three of the approaches above:


An X/HTML document using the DC-HTML-2003 profile encodes a description set containing


Appendix B: DC-HTML-2008 and the DCAM

In contrast to the case of DC-HTML-2003, the Proposed DCMI Recommendation, Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008] is designed to support the encoding of a DC description set, and the document explicitly describes a mapping between a subset of the features of the DCAM description set model and the X/HTML meta and link elements.

An X/HTML document using the DC-HTML-2008 profile encodes a description set containing

References

[DCAM]
DCMI Abstract Model DCMI Recommendation. 2007-06-04
http://dublincore.org/documents/2007/06/04/abstract-model/

[DC-XML-2003]
Guidelines for implementing Dublin Core in XML DCMI Recommendation. 2003-04-02
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/

[DC-HTML-2003]
Expressing Dublin Core in HTML/XHTML meta and link elements DCMI Recommendation. 2003-11-30
http://dublincore.org/documents/2003/11/30/dcq-html/

[DC-HTML-2008]
Expressing Dublin Core using HTML/XHTML meta and link elements DCMI Proposed Recommendation. 2007-11-05
http://dublincore.org/documents/2008/mm/dd/dc-html/

[DC-EXTRACT]
Dublin Core Extraction Service
http://www.w3.org/2000/06/dc-extract/form.html

[DC-LEVELS]
Interoperability levels for Dublin Core metadata
http://dublincore.org/architecturewiki/InteroperabilityLevels

[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF) DCMI Recommendation. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/

[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format DCMI Recommended Resource. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/

[ERDF]
Embedded RDF
http://purl.org/NET/erdf/profile

[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL) W3C Recommendation 11 September 2007
http://www.w3.org/TR/2007/REC-grddl-20070911/

[HTML-PROFILE]
"Meta data profiles" in HTML 4.01 Specification W3C Recommendation 24 December 1999.
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3

[RFC3986]
Uniform Resource Identifier (URI): Generic Syntax.
http://www.ietf.org/rfc/rfc3986.txt

[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial DC-2007, Singapore
http://www.dc2007.sg/T2-BasicSyntaxes.pdf