Notes on expressing Dublin Core metadata in HTML and XHTML
Introduction
This document discusses the use of the meta and link elements of HTML/XHTML for expressing Dublin Core metadata. More specifically, its primary focus is on the use of these elements to represent a DC metadata description set, as defined by the "Description Set Model" of the DCMI Abstract Model [DCAM]. In the terms of the document Interoperability levels for Dublin Core metadata [DC-LEVELS], it focuses on "DCAM-based syntactic interoperability" ("Level 3" interoperability), with some reference to "Semantic interoperability" ("Level 2" interoperability), based on the RDF model.
In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form, according to the rules of a format or syntax. The DCAM itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".
Such a specification performs three functions:
-
it defines the subset of the features of the DCAM description set model which the syntax supports
-
it describes how each of the supported constructs and components of the DCAM description set are "encoded" in the concrete format
-
(conversely) it describes how features of the format are to be interpreted or "decoded" as representing constructs and components of the DCAM description set
The role of "encoding guidelines" and their relationship to the DCAM is illustrated graphically in the introduction to the tutorial on "Basic Syntaxes" presented at the DC-2007 conference [SYNTAXTUT].
Encoding DC metadata using HTML/XHTML
For the case of encoding DC metadata in the header of an HTML/XHTML document, the constructs of the DC metadata description set have to be represented as components in that HTML/XHTML document header, i.e. as HTML/XHTML elements and attributes and as element content and attribute values. This involves the definition of what the HTML specification calls a "meta data profile", which describes conventions used in meta and link elements and their attributes [HTML-PROFILE].
Each "meta data profile" is identified by a URI. DCMI currently defines two such meta data profiles:
-
a profile defined by the document Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003] and identified by the URI http://dublincore.org/documents/dcq-html/, and
-
a profile defined by the document Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008] and identified by the URI http://dublincore.org/documents/2008/mm/dd/dc-html/.
Each HTML/XHTML document can make use of one or more meta data profiles, and it discloses the URIs of those profiles as the value of the profile attribute of the HTML/XHTML head element.
Comparison between the DC-HTML-2003 and DC-HTML-2008 HTML/XHTML meta data profiles
The DC-HTML-2003 profile and the DC-HTML-2008 profile are two different HTML meta data profiles. The DC-HTML-2008 profile is specified in terms of the DCAM description set model, and all features of the profile have a well-defined mapping to the constructs of the DCAM description set. The DC-HTML-2003 profile was not defined in terms of the DCAM description set model, and although a retrospective mapping to the DCAM description set can be constructed, only some features of the profile have a mapping to the constructs of the description set. (For a full explanation of how the DCAM interpretation of the DC-HTML-2003 profile is constructed, see Appendix A.)
The features of the DCAM description set supported by the two meta data profiles are summarised in the following table:
| DCAM Description Model | DC-HTML-2003 | DC-HTML-2008 |
| description set | One description set | One description set |
| description | One description | One description |
| described resource URI | Document URI/Base URI | Document URI/Base URI |
| statement | Multiple statements | Multiple statements |
| property URI | Supported | Supported |
| literal value surrogate | Partly supported | Supported |
| literal value surrogate / value string | Supported | Supported |
| literal value surrogate / value string language | Supported | Supported |
| literal value surrogate / SES URI | Not supported | Supported |
| non-literal value surrogate | Partly supported | Partly supported |
| non-literal value surrogate / value string | Not supported | Max one value string supported |
| non-literal value surrogate / value string language | Not supported | Supported |
| non-literal value surrogate / SES URI | Not supported | Not supported |
| non-literal value surrogate / value URI | Supported | Supported |
| non-literal value surrogate / VES URI | Not supported | Not supported |
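The support table can also be treated as data. The following Python sketch, which is illustrative only and not part of any DCMI specification, transcribes the rows above into a dictionary and lists the rows on which the two profiles differ; the names SUPPORT and differing_features are hypothetical.

```python
# Rows of the support table above: feature -> (DC-HTML-2003, DC-HTML-2008).
SUPPORT = {
    "description set": ("One description set", "One description set"),
    "description": ("One description", "One description"),
    "described resource URI": ("Document URI/Base URI", "Document URI/Base URI"),
    "statement": ("Multiple statements", "Multiple statements"),
    "property URI": ("Supported", "Supported"),
    "literal value surrogate": ("Partly supported", "Supported"),
    "literal value surrogate / value string": ("Supported", "Supported"),
    "literal value surrogate / value string language": ("Supported", "Supported"),
    "literal value surrogate / SES URI": ("Not supported", "Supported"),
    "non-literal value surrogate": ("Partly supported", "Partly supported"),
    "non-literal value surrogate / value string": ("Not supported", "Max one value string supported"),
    "non-literal value surrogate / value string language": ("Not supported", "Supported"),
    "non-literal value surrogate / SES URI": ("Not supported", "Not supported"),
    "non-literal value surrogate / value URI": ("Supported", "Supported"),
    "non-literal value surrogate / VES URI": ("Not supported", "Not supported"),
}


def differing_features(table):
    """Return, in table order, the features whose support differs between the profiles."""
    return [feature for feature, (p2003, p2008) in table.items() if p2003 != p2008]


for feature in differing_features(SUPPORT):
    print(feature, SUPPORT[feature])
```

All four rows it reports concern either syntax encoding schemes on literal values or value strings (and their language) on non-literal values.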
In terms of the features of the DCAM description set model supported, the differences between them are:
-
The DC-HTML-2008 profile supports syntax encoding schemes for value strings in literal value surrogates; the DC-HTML-2003 profile does not support syntax encoding schemes for value strings in literal value surrogates
-
The DC-HTML-2008 profile supports a single value string in non-literal value surrogates; the DC-HTML-2003 profile does not support value strings in non-literal value surrogates
-
The DC-HTML-2008 profile provides a well-defined mechanism for representing a property URI as a "DC-HTML Prefixed Name"; the DC-HTML-2003 profile provides a similar mechanism, using "composite prefixed names" (e.g. "DC.date.created"), but it is not possible in all cases to make the mapping from such a "composite prefixed name" to a property URI without additional information
There are also differences in the syntactic features themselves:
-
The DC-HTML-2003 profile supports a "composite prefixed name" construct which the DC-HTML-2008 profile does not.
Note that neither the DC-HTML-2003 profile nor the DC-HTML-2008 profile supports the encoding of vocabulary encoding scheme URIs.
In any HTML/XHTML instance, the value of the profile attribute of the head element specifies which meta data profiles are used in that instance. An instance with a profile value of http://dublincore.org/documents/dcq-html/ is intended to be interpreted using the DC-HTML-2003 profile; and an instance with a profile value of http://dublincore.org/documents/2008/mm/dd/dc-html/ is intended to be interpreted using the DC-HTML-2008 profile.
If both DCMI profile URIs are present, then a processor may apply both interpretations. However, metadata providers should use this combination with caution. It is important to note that some of the conventions used in the DC-HTML-2003 profile will generate quite different sets of statements when interpreted using the DC-HTML-2008 profile. This is the case for "composite prefixed names", for example. Consider the following example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head profile="xxx yyy">
    <title>My Document</title>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
    <meta name="DC.date.modified" content="2007-07-22" >
  </head>
  <body>
  </body>
</html>
According to the DC-HTML-2003 profile, this should be interpreted as encoding a single statement with the property URI http://purl.org/dc/terms/modified; interpreted according to the DC-HTML-2008 profile, it generates a single statement with the property URI http://purl.org/dc/elements/1.1/date.modified. So if the document signals the use of both profiles, or if the value of the profile attribute is simply changed from http://dublincore.org/documents/dcq-html/ to http://dublincore.org/documents/2008/mm/dd/dc-html/ without changing the content of the meta/@name attribute, then unexpected interpretations of the data will result.
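The divergence can be reproduced mechanically. In the Python sketch below, expand_2008 splits a name at its first "." into a prefix and a local name, which is how the DC-HTML-2008 reading is described here. expand_2003_assumed is a hypothetical reading of a composite prefixed name in which the last segment is taken to be an element refinement in the http://purl.org/dc/terms/ namespace; that rule is an assumption made for illustration only, since DC-HTML-2003 composite names cannot always be mapped to a property URI without additional information.

```python
DCTERMS = "http://purl.org/dc/terms/"


def expand_2008(name, namespaces):
    """DC-HTML-2008 style: the prefix is everything before the first '.'."""
    prefix, _, local = name.partition(".")
    return namespaces[prefix] + local


def expand_2003_assumed(name, namespaces):
    """Hypothetical DC-HTML-2003 reading of 'PREFIX.element.refinement'.

    Assumption (illustration only): the final segment of a composite name
    is a refinement defined in the DCMI Terms namespace.
    """
    parts = name.split(".")
    if len(parts) > 2:
        return DCTERMS + parts[-1]
    return namespaces[parts[0]] + parts[1]


# Namespace declaration from the example: <link rel="schema.DC" href="...">
namespaces = {"DC": "http://purl.org/dc/elements/1.1/"}
print(expand_2003_assumed("DC.date.modified", namespaces))  # http://purl.org/dc/terms/modified
print(expand_2008("DC.date.modified", namespaces))          # http://purl.org/dc/elements/1.1/date.modified
```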
If neither DCMI profile URI is present, then no interpretation is licensed by DCMI specifications. An application may apply an interpretation of such a document as a DC description set, either as the result of the use of another profile defined by an agency other than DCMI, or as the result of some other agreement between provider and consumer.
The use of the profile attribute ensures that there is no question of ambiguity or confusion over how the provider of any single instance intends that it should be processed.
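Because the profile attribute of the head element holds a whitespace-separated list of profile URIs, a consumer can decide which DCMI interpretations are licensed with a simple token check. The Python sketch below assumes the attribute value has already been extracted from the document; the function name licensed_profiles is hypothetical, and the DC-HTML-2008 URI is copied exactly as it is given in this document.

```python
DC_HTML_2003 = "http://dublincore.org/documents/dcq-html/"
DC_HTML_2008 = "http://dublincore.org/documents/2008/mm/dd/dc-html/"


def licensed_profiles(profile_attr):
    """Return the DCMI meta data profiles signalled by a head/@profile value.

    An empty result means no DCMI interpretation is licensed; any other
    interpretation would rest on a non-DCMI profile or a separate agreement.
    """
    uris = (profile_attr or "").split()  # split() treats any whitespace as a separator
    found = []
    if DC_HTML_2003 in uris:
        found.append("DC-HTML-2003")
    if DC_HTML_2008 in uris:
        found.append("DC-HTML-2008")
    return found


print(licensed_profiles(DC_HTML_2003 + " " + DC_HTML_2008))  # ['DC-HTML-2003', 'DC-HTML-2008']
print(licensed_profiles("http://example.org/other-profile"))  # []
```

When both names are returned, the composite-name example above shows why applying both interpretations may yield conflicting statements.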
Recommendations
A provider of DC metadata encoded in the header of an HTML/XHTML document:
-
MAY use one or more of the meta data profiles defined by DCMI or MAY use a meta data profile defined by another party.
-
MUST indicate the HTML meta data profile(s) in use by providing a suitable value for the profile attribute.
-
SHOULD ensure that their encoded data is consistent with the semantics defined by the meta data profile(s) they specify. In particular:
-
SHOULD provide "namespace declarations" (using the link/@rel="schema.xxx" convention) for the prefixes in prefixed names used as abbreviations for URIs, both in documents which use the DC-HTML-2003 profile and in documents which use the DC-HTML-2008 profile
-
SHOULD NOT use "composite prefixed names" (like "DC.date.created" as an abbreviation for the URI http://purl.org/dc/terms/created) in a document which uses the DC-HTML-2008 profile.
A consumer of DC metadata encoded in the header of an HTML/XHTML document:
-
SHOULD interpret the data in accordance with the meta data profile(s) specified in the value of the profile attribute. For "level 2"/"Semantic Interoperability", for the profiles provided by DCMI, this will be supported by the provision of GRDDL profile transformations [GRDDL].
-
SHOULD NOT apply an interpretation which is not licensed by the meta data profile(s) specified in the value of the profile attribute, unless there is some other agreement on interpretation between provider and consumer. In particular:
-
SHOULD NOT generate URIs from prefixed names if no "namespace declaration" for the prefix has been provided (using the link/@rel="schema.xxx" convention)
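A consumer can follow the last recommendation by collecting the namespace declarations first and declining to expand any prefix that lacks one. The Python sketch below uses the standard library's html.parser; the names SchemaLinkCollector and resolve are hypothetical, and it handles only simple "PREFIX.local" names, not DC-HTML-2003 composite names.

```python
from html.parser import HTMLParser


class SchemaLinkCollector(HTMLParser):
    """Collect prefix -> namespace URI from <link rel="schema.XXX" href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.namespaces = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values are kept as-is.
        # The default handle_startendtag calls this method, so XHTML <link ... /> works too.
        if tag != "link":
            return
        a = dict(attrs)
        rel, href = a.get("rel") or "", a.get("href")
        if rel.startswith("schema.") and href:
            self.namespaces[rel[len("schema."):]] = href


def resolve(prefixed_name, namespaces):
    """Return a URI, or None when the prefix has no namespace declaration."""
    prefix, sep, local = prefixed_name.partition(".")
    if not sep or prefix not in namespaces:
        return None
    return namespaces[prefix] + local


collector = SchemaLinkCollector()
collector.feed(
    '<head><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'
    '<meta name="DC.title" content="My Document"></head>'
)
print(resolve("DC.title", collector.namespaces))         # http://purl.org/dc/elements/1.1/title
print(resolve("DCTERMS.created", collector.namespaces))  # None: no schema.DCTERMS link
```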
Appendix A: DC-HTML-2003 and the DCAM
Expressing Dublin Core in HTML/XHTML `meta` and `link` elements (2003)
The DCMI Recommendation, Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003] pre-dates the development of the DCAM, so it does not perform the functions described in the introduction to this document: it does not describe either how components of (a subset of) the DCAM description set model are to be "encoded", or how features of the format are to be interpreted as representing a DC metadata description set.
However, DC-HTML-2003 does broadly follow the general approach described above, of making a distinction between an information structure (which it calls a "DC record") and the way that record is represented. Essentially, it defines its own "description model", based on the concept of the "DC record", and describes how instances of that information structure are to be represented in HTML/XHTML documents. The DC-HTML-2003 concept of the "DC record" is not based on the DCAM description set model, and indeed it uses some of the same terminology used in the DCAM, but with different meanings.
So any attempt to provide an interpretation of the DC-HTML-2003 recommendation in terms of the DCAM description set model is necessarily a retrospective exercise. It depends on a two-stage process:
-
defining a mapping from the DC-HTML-2003 "DC record" information structure to the DCAM description set information structure, in such a way that "what is said" by a DCAM description set is consistent with "what is said" by a "DC record"
-
based on that mapping between the two information structures, then establishing the mapping between the syntactic constructs used and the components of the DCAM description set
If the first step reveals that some components of a "DC record" can not be mapped to components of the DCAM description set, then there will be aspects of the syntax which, while they do have an interpretation as representing components of a "DC record", do not have an interpretation as representing components of the DCAM description set. And similarly, the first step may show that there are constructs and components of the DCAM description set which have no correspondence in the "DC record", in which case there will be no syntactic representation of those constructs and components in the current (DC-HTML-2003) meta data profile.
Mapping the "DC record" to the "description set"
Two approaches might be taken to constructing such a mapping:
-
an approach based on the interpretation of "what is said" in "informal" human-readable terms by a "DC record" and a DCAM description set;
-
an approach based on examining the "formal" interpretation of "what is said" by a "DC record" represented in HTML/XHTML using the DC-HTML-2003 HTML/XHTML meta data profile in terms of the RDF model and then using the description of "what is said" by a DCAM description set in terms of the RDF model, as defined by the DCMI Recommendation Expressing Dublin Core using the Resource Description Framework [DC-RDF], to derive a mapping between the "DC record" and the description set. While DCMI itself did not specify an RDF interpretation of the DC-HTML-2003 HTML/XHTML meta data profile, such interpretations have been provided by two other sources:
-
Dan Connolly (W3C) has provided an XSLT transform [DC-EXTRACT] which takes as input an instance of the DC-HTML-2003 profile and outputs RDF/XML
-
Ian Davis (Talis) defined a separate, more generic set of conventions for embedding RDF triples into HTML/XHTML called Embedded RDF [ERDF]. While there is no formal association between Embedded RDF and the DC-HTML-2003 profile, the documentation for Embedded RDF notes that it was designed to be compatible with the DC-HTML-2003 profile, so an Embedded RDF interpretation can be made for an instance of the DC-HTML-2003 profile
The first thing to note is that, unfortunately, the concept of the "DC record" in the DC-HTML-2003 document is highly underspecified. The introduction refers to a "record" as
some structured metadata about a resource, comprising one or more properties and their associated values.
In the context of DC-HTML-2003, the term "value" is used to refer to a literal. However, the document goes on to discuss concepts such as "element", "element refinement", "encoding scheme" and "language", and how instances of these concepts should be represented using the HTML/XHTML profile, without ever explaining the relationship of these concepts to that of the "record". For the purpose of this discussion, we assume that (using these terms as they are used in DC-HTML-2003, not as they are used in the DCAM):
-
an element is a property
-
an element refinement is a property
-
a value may be associated with an encoding scheme
-
a value may be associated with a language
Such an interpretation seems consistent with the use of those terms in the DCMI Recommendation Guidelines for implementing Dublin Core in XML [DC-XML-2003], which provides more explicit "abstract models" for the data being represented.
The "informal" approach
The following table is an attempt to specify a mapping between the "DC record" described by DC-HTML-2003 and the description set described by the DCAM, such that the assertions made by the description set correspond to - or at least do not contradict - the assertions made by the "DC record".
| DC-HTML-2003 | DCAM |
| "DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string or non-literal value surrogate/value URI |
| "Language" | value string language |
There are several points worth noting:
-
The mapping of "value" to either value string or value URI is arguably stretching the original definitions, but the intent seems to be to support either a literal or a URI reference to a non-literal resource.
-
No mapping is provided here for what DC-HTML-2003 calls "encoding schemes". From the model as described by DC-HTML-2003 alone, it is impossible to determine whether the target should be a vocabulary encoding scheme or a syntax encoding scheme. From the syntactic permutations available, and the fact that the XHTML scheme attribute is available only on the meta element, it might be reasonable to suggest that the target should be a syntax encoding scheme, not a vocabulary encoding scheme. However, the examples include cases where the scheme attribute value is a reference to a vocabulary encoding scheme. So a null mapping is chosen, on the basis that it is better to lose some information than to risk introducing assertions which were not made in the original data.
-
At the syntactic level, the generation of a property URI may be problematic in some cases. The convention permitted in the DC-HTML-2003 profile of allowing a "composite prefixed name" (e.g. "DC.Date.modified") as the value of the name attribute of the meta element makes it impossible to reliably generate a URI for the property without additional information.
Using this mapping in conjunction with the DC-HTML-2003 profile, the following DCAM interpretation for DC-HTML-2003 might be inferred.
An X/HTML document using the DC-HTML-2003 profile encodes a description set containing
-
a description with described resource URI = URI of document, where
-
meta element maps to statement with literal value surrogate
-
meta/@name maps to property URI
-
meta/@content maps to value string
-
meta/@xml:lang maps to value string language
-
link element maps to one or more statements with non-literal value surrogate
-
each token in link/@rel maps to property URI
-
link/@href maps to value URI
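As a rough illustration of this mapping, the following Python sketch reads the head of a DC-HTML-2003 document with the standard library's html.parser and produces DCAM-style statements. The class names are hypothetical. Property names are expanded only in the simple "PREFIX.local" form against declared link/@rel="schema.xxx" namespaces, so composite prefixed names would be expanded incorrectly; meta/@scheme is deliberately ignored, following the null mapping for encoding schemes; and the described resource (the document URI) is left implicit.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Statement:
    property_uri: str
    value_string: Optional[str] = None  # literal value surrogate (from meta/@content)
    language: Optional[str] = None      # value string language (from meta/@xml:lang)
    value_uri: Optional[str] = None     # non-literal value surrogate (from link/@href)


class DcHtml2003Reader(HTMLParser):
    """Sketch of the conservative DCAM reading of a DC-HTML-2003 head."""

    def __init__(self):
        super().__init__()
        self.namespaces, self.metas, self.links = {}, [], []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            self.metas.append(a)
        elif tag == "link":
            rel = a.get("rel") or ""
            if rel.startswith("schema.") and a.get("href"):
                self.namespaces[rel[len("schema."):]] = a["href"]
            else:
                self.links.append(a)

    def _property_uri(self, token):
        # Assumption: simple 'PREFIX.local' names only; undeclared prefixes are skipped.
        prefix, sep, local = token.partition(".")
        return self.namespaces[prefix] + local if sep and prefix in self.namespaces else None

    def statements(self):
        out = []
        for m in self.metas:  # meta/@scheme intentionally not mapped
            prop = self._property_uri(m.get("name") or "")
            if prop:
                out.append(Statement(prop, value_string=m.get("content"),
                                     language=m.get("xml:lang")))
        for link in self.links:  # one statement per resolvable rel token
            for token in (link.get("rel") or "").split():
                prop = self._property_uri(token)
                if prop:
                    out.append(Statement(prop, value_uri=link.get("href")))
        return out
```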
dc-extract.xsl
Dan Connolly of the W3C produced an XSLT stylesheet which generates an RDF/XML representation of the encoded metadata from an XHTML document using the DC-HTML-2003 profile. In the terms of the Interoperability Levels document, it therefore supports "Level 2" ("semantic interoperability") for the DC-HTML-2003 profile. It uses the following conventions:
-
meta element maps to RDF triple with literal object
-
meta/@name maps to predicate
-
meta/@content maps to literal object
-
meta/@xml:lang maps to language of plain literal object
-
meta/@scheme maps to datatype of typed literal object
-
link element maps to one or more RDF triples with RDF URI ref object
-
each token in link/@rel maps to predicate
-
link/@href maps to RDF URI ref object
-
link/@hreflang maps to separate triple with dc:language predicate and plain literal object
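The conventions above can be paraphrased in Python. This is not Connolly's XSLT; it is a hedged sketch that takes meta and link elements already parsed into attribute dictionaries, assumes simple "PREFIX.local" names (including in meta/@scheme), and emits (subject, predicate, object) tuples. The Literal type and the function names are hypothetical.

```python
from typing import NamedTuple, Optional

DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None      # plain literal: language tag
    datatype: Optional[str] = None  # typed literal: datatype URI


def expand(token, namespaces):
    """Assumption: simple 'PREFIX.local' names; None if the prefix is undeclared."""
    prefix, sep, local = token.partition(".")
    return namespaces[prefix] + local if sep and prefix in namespaces else None


def extract_triples(doc_uri, metas, links, namespaces):
    triples = []
    for meta in metas:
        predicate = expand(meta.get("name", ""), namespaces)
        if predicate is None:
            continue
        scheme = meta.get("scheme")
        datatype = expand(scheme, namespaces) if scheme else None
        # A typed literal carries no language tag, so xml:lang is kept only
        # for plain literals (a choice made for this sketch).
        lang = None if datatype else meta.get("xml:lang")
        triples.append((doc_uri, predicate, Literal(meta.get("content", ""), lang, datatype)))
    for link in links:
        href = link.get("href")
        if not href:
            continue
        matched = False
        for token in link.get("rel", "").split():
            predicate = expand(token, namespaces)
            if predicate is not None:
                triples.append((doc_uri, predicate, href))
                matched = True
        if matched and link.get("hreflang"):
            # hreflang describes the linked resource, not the document itself.
            triples.append((href, DC_LANGUAGE, Literal(link["hreflang"])))
    return triples
```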
If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.
An X/HTML document using the DC-HTML-2003 profile encodes a description set containing
-
a description with described resource URI = URI of document, where
-
meta element maps to statement with literal value surrogate
-
meta/@name maps to property URI
-
meta/@content maps to value string
-
meta/@xml:lang maps to value string language
-
meta/@scheme maps to SES URI
-
link element maps to one or more statements with non-literal value surrogate
-
each token in link/@rel maps to property URI
-
link/@href maps to value URI
-
for each link/@hreflang used, a description with described resource URI = link/@href
-
a statement with literal value surrogate
-
property URI = http://purl.org/dc/elements/1.1/language
-
link/@hreflang maps to value string
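Reading such a graph back as a description set amounts, in the simplest case, to grouping triples by subject, with each subject becoming one description. The Python sketch below makes that simplification explicit: literal objects become literal value surrogates (with the datatype read as an SES URI) and any other object is treated as a value URI. The full DC-RDF conventions cover more cases than are handled here, and the names used are hypothetical.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def to_description_set(triples):
    """Group (subject, predicate, object) triples into descriptions keyed by subject."""
    descriptions = {}  # dicts keep insertion order, so descriptions appear in first-seen order
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            statement = {
                "property URI": predicate,
                "value string": obj.lexical,
                "value string language": obj.lang,
                "SES URI": obj.datatype,
            }
        else:
            statement = {"property URI": predicate, "value URI": obj}
        descriptions.setdefault(subject, []).append(statement)
    return descriptions


triples = [
    ("http://example.org/doc", "http://purl.org/dc/elements/1.1/source", "http://example.org/src"),
    ("http://example.org/src", "http://purl.org/dc/elements/1.1/language", Literal("fr")),
]
for subject, statements in to_description_set(triples).items():
    print(subject, statements)
```

Here the hreflang triple produces the second description, whose described resource is the link target.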
Embedded RDF
Embedded RDF [ERDF], designed by Ian Davis (Talis), is a set of conventions for embedding RDF triples into HTML/XHTML. There is no formal association between Embedded RDF and the DC-HTML-2003 profile, but the documentation for Embedded RDF notes that it was designed to be compatible with the DC-HTML-2003 profile, so an Embedded RDF interpretation can be made for an instance of the DC-HTML-2003 profile. Again, in the terms of the Interoperability Levels document, it supports "Level 2" ("semantic interoperability") for the DC-HTML-2003 profile. It uses the following conventions, which are a subset of those used by dc-extract.xsl:
-
meta element maps to RDF triple with literal object
-
meta/@name maps to predicate (Note: Embedded RDF does not support the "composite prefixed name" convention used by DC-HTML-2003)
-
meta/@content maps to literal object
-
meta/@xml:lang maps to language of plain literal object
-
link element maps to one or more RDF triples with RDF URI ref object
-
each token in link/@rel maps to predicate
-
link/@href maps to RDF URI ref object
If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.
An X/HTML document using the DC-HTML-2003 profile encodes a description set containing
-
a description with described resource URI = URI of document, where
-
meta element maps to statement with literal value surrogate
-
meta/@name maps to property URI
-
meta/@content maps to value string
-
meta/@xml:lang maps to value string language
-
link element maps to one or more statements with non-literal value surrogate
-
each token in link/@rel maps to property URI
-
link/@href maps to value URI
A DCAM interpretation of DC-HTML-2003
The following is a "conservative" DCAM interpretation of the DC-HTML-2003 profile which is supported by all three of the approaches above:
An X/HTML document using the DC-HTML-2003 profile encodes a description set containing
-
a description with described resource URI = URI of document, where
-
meta element maps to statement with literal value surrogate
-
meta/@name maps to property URI
-
meta/@content maps to value string
-
meta/@xml:lang maps to value string language
-
link element maps to one or more statements with non-literal value surrogate
-
each token in link/@rel maps to property URI
-
link/@href maps to value URI
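The conservative interpretation is the set of mapping rules common to the three DCAM interpretations in this appendix. The Python sketch below records each interpretation as a set of (source construct, DCAM component) pairs, transcribed from the lists above, and intersects them. The set names are hypothetical; the informal and Embedded RDF interpretations happen to list the same rules, so both are copies of BASE.

```python
# Each DCAM interpretation in this appendix, as (construct, DCAM component) pairs.
BASE = {
    ("meta", "statement with literal value surrogate"),
    ("meta/@name", "property URI"),
    ("meta/@content", "value string"),
    ("meta/@xml:lang", "value string language"),
    ("link", "statement(s) with non-literal value surrogate"),
    ("link/@rel token", "property URI"),
    ("link/@href", "value URI"),
}

INFORMAL = set(BASE)
EMBEDDED_RDF = set(BASE)
DC_EXTRACT = BASE | {
    ("meta/@scheme", "SES URI"),
    ("link/@hreflang", "value string of a dc:language statement about link/@href"),
}

# The conservative interpretation keeps only rules supported by all three.
CONSERVATIVE = INFORMAL & DC_EXTRACT & EMBEDDED_RDF

for rule in sorted(CONSERVATIVE):
    print(rule)
```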
Appendix B: DC-HTML-2008 and the DCAM
In contrast to the case of DC-HTML-2003, the Proposed DCMI Recommendation, Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008], is designed to support the encoding of a DC description set, and the document explicitly describes a mapping between a subset of the features of the DCAM description set model and the X/HTML meta and link elements.
An X/HTML document using the DC-HTML-2008 profile encodes a description set containing
-
a description with described resource URI = URI of document, where
-
meta element maps to statement with literal value surrogate
-
meta/@name maps to property URI
-
meta/@content maps to value string
-
meta/@xml:lang maps to value string language
-
meta/@scheme maps to SES URI
-
link element maps to one or more statements with non-literal value surrogate
-
each token in link/@rel maps to property URI
-
link/@href maps to value URI
-
link/@title maps to value string
-
link/@xml:lang maps to value string language
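The rules that DC-HTML-2008 adds to the conservative DC-HTML-2003 reading (meta/@scheme, link/@title and link/@xml:lang) can be sketched in Python as follows. The element attributes are assumed to have been parsed into dictionaries already; prefixed names, including the scheme value, are expanded with a simplified first-"." split against declared namespaces; and the dataclass and function names are hypothetical rather than taken from the DC-HTML-2008 specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiteralValue:
    value_string: str               # from meta/@content
    language: Optional[str] = None  # from meta/@xml:lang
    ses_uri: Optional[str] = None   # from meta/@scheme


@dataclass
class NonLiteralValue:
    value_uri: str                      # from link/@href
    value_string: Optional[str] = None  # at most one, from link/@title
    language: Optional[str] = None      # from link/@xml:lang


@dataclass
class Statement:
    property_uri: str
    value: object


def expand(name, namespaces):
    """Simplified prefixed-name expansion: split at the first '.'; None if undeclared."""
    prefix, sep, local = name.partition(".")
    return namespaces[prefix] + local if sep and prefix in namespaces else None


def meta_statement(meta, namespaces):
    prop = expand(meta.get("name", ""), namespaces)
    if prop is None:
        return None
    scheme = meta.get("scheme")
    ses = expand(scheme, namespaces) if scheme else None
    return Statement(prop, LiteralValue(meta.get("content", ""), meta.get("xml:lang"), ses))


def link_statements(link, namespaces):
    statements = []
    for token in link.get("rel", "").split():
        prop = expand(token, namespaces)
        if prop is not None:
            # Each statement gets its own non-literal value surrogate.
            value = NonLiteralValue(link["href"], link.get("title"), link.get("xml:lang"))
            statements.append(Statement(prop, value))
    return statements
```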
References
[DCAM]
DCMI Abstract Model DCMI Recommendation. 2007-06-04
http://dublincore.org/documents/2007/06/04/abstract-model/
[DC-XML-2003]
Guidelines for implementing Dublin Core in XML DCMI Recommendation. 2003-04-02
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[DC-HTML-2003]
Expressing Dublin Core in HTML/XHTML meta and link elements DCMI Recommendation. 2003-11-30
http://dublincore.org/documents/2003/11/30/dcq-html/
[DC-HTML-2008]
Expressing Dublin Core using HTML/XHTML meta and link elements DCMI Proposed Recommendation. 2007-11-05
http://dublincore.org/documents/2008/mm/dd/dc-html/
[DC-EXTRACT]
Dublin Core Extraction Service
http://www.w3.org/2000/06/dc-extract/form.html
[DC-LEVELS]
Interoperability levels for Dublin Core metadata
http://dublincore.org/architecturewiki/InteroperabilityLevels
[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF) DCMI Recommendation. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/
[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format DCMI Recommended Resource. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/
[ERDF]
Embedded RDF
http://purl.org/NET/erdf/profile
[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL) W3C Recommendation 11 September 2007
http://www.w3.org/TR/2007/REC-grddl-20070911/
[HTML-PROFILE]
"Meta data profiles" in HTML 4.01 Specification W3C Recommendation 24 December 1999.
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3
[RFC3986]
Uniform Resource Identifier (URI): Generic Syntax.
http://www.ietf.org/rfc/rfc3986.txt
[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial DC-2007, Singapore
http://www.dc2007.sg/T2-BasicSyntaxes.pdf