Notes on the DC-XML-Full XML Format
| Title: | Notes on the DC-XML-Full XML Format |
| Creator: | Pete Johnston, Eduserv Foundation <pete.johnston@eduserv.org.uk> |
| Date Issued: | 2008-07-23 |
| Identifier: | http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFNotes/2008-07-23 |
| Replaces: | Not applicable |
| Is Replaced By: | Not applicable |
| Latest Version: | http://dublincore.org/architecturewiki/DCXMLFRevision/DCXMLFNotes |
| Description of Document: | This document describes the background to the development of Expressing Dublin Core metadata using XML (DC-XML-Full). |
Introduction
In September 2008, DCMI will circulate the document Expressing Dublin Core metadata using XML (DC-XML-Full) [DC-XML-FULL] as a DCMI Proposed Recommendation for public comment. This document describes the background to its development and its relationship to other DCMI specifications.
Background
The DCMI Abstract Model
Since 2003, DCMI has sought to formalise its model for Dublin Core metadata, and this has resulted in the publication of the DCMI Abstract Model [ABSTRACT-MODEL], the second version of which was given the status of DCMI Recommendation in June 2007.
The Abstract Model defines an abstract information structure called a DC metadata description set.
In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form according to the rules of a format or syntax. The DCMI Abstract Model itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".
Such a specification performs three functions:
-
it defines the subset of the features of the DCAM description set model which the syntax supports
-
it describes how each of the supported constructs and components of the DCAM description set are "encoded" in the concrete format
-
(conversely) it describes how features of the format are to be interpreted or "decoded" as representing constructs and components of the DCAM description set
The role of "encoding guidelines" and their relationship to the DCAM is illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].
Expressing Dublin Core using XML
In order to represent a DC metadata description set in an XML document, the constructs and components of that description set have to be represented as components of the XML document, i.e. as XML elements and XML attributes, XML element names and XML attribute names, and XML element content and XML attribute values.
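As a rough illustration of the kinds of XML component listed above, the following Python sketch walks a small XML document and prints its element names, attribute names, attribute values and element content. The document is a toy invented for this example: the names `record`, `entry`, `about`, `property` and `lang` are assumptions and are not the DC-XML-Full syntax.

```python
import xml.etree.ElementTree as ET

# Invented toy document: the element and attribute names below are
# NOT taken from DC-XML-Full; they only illustrate the kinds of XML
# component a format has at its disposal.
doc = """<record about="http://example.org/doc/1">
  <entry property="http://purl.org/dc/terms/title" lang="en">A title</entry>
</record>"""

root = ET.fromstring(doc)
for element in root.iter():
    # XML element name
    print("element name:", element.tag)
    for name, value in element.attrib.items():
        # XML attribute name and XML attribute value
        print("  attribute:", name, "=", value)
    text = (element.text or "").strip()
    if text:
        # XML element content
        print("  content:", text)
```

A format specification has to state which of these components carries which construct of the description set, for example whether a property URI appears as an element name or as an attribute value.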
In June 2006, the Working Draft Expressing Dublin Core metadata using XML [DC-XML-2006] was released for public comment. As a result of comments received and subsequent discussions within the DCMI Architecture Forum, work continued in parallel on drafts for two different XML formats, one supporting the full description set model of the Abstract Model, known as DC-XML-Full, and the other supporting only a subset of that model, known as DC-XML-Min. The drafts for both formats were updated in 2007 to reflect the changes made to the DCMI Abstract Model.
Following discussions at the meeting of the DCMI Architecture Community at the DC-2007 conference and in subsequent telecons, it was decided to put forward a modified version of the DC-XML-Full format as a Proposed Recommendation, while continuing to work on DC-XML-Min, and in particular clarifying the requirements for that second format.
Expressing Dublin Core using RDF
In January 2008, DCMI published the document Expressing Dublin Core metadata using the Resource Description Framework (RDF) [DC-RDF] as a DCMI Recommendation. That document describes how the features of the DCMI Abstract Model description set model are represented using the RDF model, and replaces earlier DCMI specifications for expressing DC metadata in RDF.
Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [GRDDL] is a W3C Recommendation which describes a set of conventions for associating an XML document with an algorithm for the extraction of a set of RDF triples from that document. One of the mechanisms defined by GRDDL is the association of what it calls a Namespace Transformation with an XML Namespace Name, so that the transformation can be applied to extract RDF triples from any document which uses that XML Namespace Name in the name of its root element.
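The namespace-based mechanism can be sketched roughly as follows. This is an illustration only, not a GRDDL implementation: a GRDDL-aware agent discovers the transformation by dereferencing the namespace name, whereas here a hypothetical in-memory table, `NAMESPACE_TRANSFORMATIONS`, stands in for that lookup. The namespace name and the transformation URI are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: XML Namespace Name -> transformation URI.
# A real GRDDL agent would discover this by dereferencing the namespace
# name; both URIs below are invented for illustration.
NAMESPACE_TRANSFORMATIONS = {
    "http://example.org/ns/format#": "http://example.org/xslt/format-to-rdf.xsl",
}


def root_namespace(xml_text):
    """Return the namespace name of the document's root element, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace}local".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def find_transformation(xml_text):
    """Return the transformation associated with the root namespace, if any."""
    return NAMESPACE_TRANSFORMATIONS.get(root_namespace(xml_text))


doc = '<f:record xmlns:f="http://example.org/ns/format#"/>'
print(find_transformation(doc))
```

The point of the design is that the document itself carries no reference to the transformation: the association is made once, at the namespace, and applies to every document whose root element uses that namespace name.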
Interoperability Levels for Dublin Core Metadata
The DCMI Architecture Community is currently developing a draft document titled Interoperability levels for Dublin Core metadata [DC-LEVELS].
It describes several categories or "levels" of interoperability that may be enabled using DC metadata, and specifies for each level the requirements that a metadata provider should meet (and what a metadata consumer can expect to be satisfied).
The DC-XML-Full Format (2008)
The current DC-XML-Full format described in the Proposed Recommendation emerges from, and is directly shaped by, several of the developments listed above.
The primary purpose of the DC-XML-Full format is to enable what the "levels" document calls "DCAM-based syntactic interoperability" ("Level 3" interoperability), by providing rules for interpreting an instance of the format as a DC description set.
A prerequisite for this is support for "Semantic interoperability" ("Level 2" interoperability), based on the RDF model. So the format also provides rules for interpreting an instance of the format as an RDF Graph, using the conventions specified in the DCMI Recommendation for representing DC metadata in RDF [DC-RDF]. Further, it provides an algorithm which implements this mapping to an RDF Graph in the form of a GRDDL Namespace Transformation.
The principles applied to the design of the DC-XML-Full format are described in the introduction to the document:
-
The format should provide a serialisation of all the features of the "Description Set Model" of the Abstract Model, i.e. it should be possible to represent all the constructs that make up a DC metadata description set.
-
The format is not required to address the features of the "Vocabulary Model" of the DCAM. For example, it is not required to express subproperty relationships between properties, subclass relationships between classes, etc.
-
The format should be easily usable with XML-based specifications such as XPath, XPointer and XQuery, i.e. for each construct in the DCAM there should be a mapping to exactly one construct in the XML syntax.
-
The format should not be dependent on features of a single XML Schema language.
-
It should be possible to describe the format using W3C XML Schema [XMLSCHEMA], but it is not a requirement that when the format is used to serialise description sets conforming to a DC Application Profile [DCAP], all the structural constraints expressed in the corresponding Description Set Profile [DSP] are captured using W3C XML Schema.
A W3C XML Schema for the DC-XML-Full format is available. The schema is currently located at http://www.incognitum.net/petej/projects/dc-xml/full/xsd/2008/07/23/dcxf.xsd and will be assigned a DCMI-owned URI before the specification is circulated for comment.
The DC-XML-Full format provides a "base-line" XML format for serialising DC description sets. It is important to note that, in addition to the DC-XML-Full format, the DCMI Architecture Community intends to continue work on DC-XML-Min, the second new XML format for serialising description sets. The requirements for that second format are still being clarified.
Comparison between DC-XML-Full and DC-XML-2003
The DC-XML-Full format is a different XML format from that specified by the current DCMI recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core in XML [DC-XML-2003]. For the purposes of this discussion, that XML format is referred to as "DC-XML-2003". It was not defined in terms of the DCAM description set model, which in 2003 did not exist in today's form, or of an RDF Graph. It provides its own "abstract models" for a "simple DC record" and a "qualified DC record", and specifies an XML format for the representation of instances of those two models.
Although a mapping to the constructs of an RDF Graph and of a DCAM description set might be constructed retrospectively for DC-XML-2003, such a mapping can be made only for some features of the format, and is at best approximate, as it relies on assumptions that may not accurately reflect the intent of metadata creators. Appendix A describes such a mapping for the DC-XML-2003 format.
The features of the DCAM description set model supported by the two XML formats (DC-XML-2003, DC-XML-Full) are summarised in the following table:
| DCAM Description Set Model feature | Supported in DCAM Description Set Model | Supported in DC-XML-2003 | Supported in DC-XML-Full |
| description set | One description set | One description set | One description set |
| description | One to many descriptions | One description | One to many descriptions |
| described resource URI | One per description; any URI | Not supported | One per description; any URI |
| statement | One-to-many statements per description | One-to-many statements per description | One-to-many statements per description |
| property URI | One per statement; any URI | One per statement; any URI | One per statement; any URI |
| literal value surrogate | One per statement | One per statement; partial support | One per statement |
| literal value surrogate / value string | One per literal value surrogate | One per literal value surrogate; partial support | One per literal value surrogate |
| literal value surrogate / value string language | Zero-to-one per value string | Zero-to-one per value string | Zero-to-one per value string |
| literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate | One per statement | Not supported | One per statement |
| non-literal value surrogate / value string | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-many per non-literal value surrogate |
| non-literal value surrogate / value string language | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate / value URI | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-many per non-literal value surrogate |
| non-literal value surrogate / VES URI | Zero-to-one per non-literal value surrogate | Not supported | Zero-to-one per non-literal value surrogate |
Appendix A: Guidelines for implementing Dublin Core in XML (2003) and the DCAM
The current DCMI recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core in XML [DC-XML-2003], pre-dates the development of the DCAM. That document provides its own "abstract models" for a "simple DC metadata record" and a "qualified DC metadata record", and specifies an XML format for the representation of instances of those two models. For the purposes of this discussion, that XML format is referred to as "DC-XML-2003".
However, the two models described by that document differ from the description set model provided by the DCAM: they use some different types of construct from those used by the DCAM, and also use different labels for constructs which are essentially similar to those used by the DCAM.
Simple Dublin Core (DC-XML-2003)
The "abstract model" for a "simple DC record" provided by DC-XML-2003 is:
-
A simple DC record is made up of one or more properties and their associated values.
-
Each property is an attribute of the resource being described.
-
Each property must be one of the 15 DCMES [DCMES] elements.
-
Properties may be repeated.
-
Each value is a literal string.
-
Each literal string value may have an associated language (e.g. en-GB).
Note that this is a much simpler model than that of the description set defined by the DCMI Abstract Model. In particular:
-
It has no construct analogous to that of the description set
-
It has no construct analogous to that of the described resource URI
-
It limits property URIs to a fixed set of URIs
-
It makes no distinction analogous to that between a non-literal value surrogate and a literal value surrogate
-
It has no construct analogous to that of the syntax encoding scheme URI
-
It has no construct analogous to that of the value URI
-
It has no construct analogous to that of the vocabulary encoding scheme URI
-
It has no notion that a non-literal value surrogate may include multiple value strings
On the basis of the description of the "simple DC record" model alone, it is not possible to determine whether a (simple DC record) "value" corresponds to:
-
A literal value surrogate containing a value string
-
A non-literal value surrogate containing a value string
To construct a mapping from the "simple DC record" model to (a subset of) the DCAM description set model, it is necessary to make a choice between those two options.
If one assumes that the intent of the "simple DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then the following table specifies a mapping between the "simple DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "simple DC record".
| DC-XML-2003 | DCAM |
| "Simple DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string |
| "Language" | value string language |
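The mapping in the table above can be sketched in Python as follows. This is a minimal sketch under the literal-value-surrogate assumption stated above. The class and function names are invented for the example, only the description set constructs that the mapping actually uses are modelled, and the record contents are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# --- "simple DC record" side (DC-XML-2003 abstract model) ---
@dataclass
class SimpleProperty:
    property_uri: str               # "URI of Property"
    value: str                      # "Value" (a literal string)
    language: Optional[str] = None  # "Language", e.g. "en-GB"


# --- DCAM description set side (only what the mapping needs) ---
@dataclass
class LiteralValueSurrogate:
    value_string: str
    language: Optional[str] = None  # value string language


@dataclass
class Statement:
    property_uri: str
    literal_value_surrogate: LiteralValueSurrogate


@dataclass
class Description:
    # No described resource URI: the simple DC record has no such construct.
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)


def simple_record_to_description_set(record: List[SimpleProperty]) -> DescriptionSet:
    """Map a simple DC record to a description set with a single description."""
    statements = [
        Statement(p.property_uri, LiteralValueSurrogate(p.value, p.language))
        for p in record
    ]
    return DescriptionSet([Description(statements)])


record = [
    SimpleProperty("http://purl.org/dc/elements/1.1/title",
                   "Notes on the DC-XML-Full XML Format", "en-GB"),
    SimpleProperty("http://purl.org/dc/elements/1.1/creator", "Pete Johnston"),
]
description_set = simple_record_to_description_set(record)
print(len(description_set.descriptions))
```

Each "Property + Value" pair becomes exactly one statement, so repeated properties in the record simply yield repeated statements in the single description.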
Qualified Dublin Core (DC-XML-2003)
The "abstract model" for a "qualified DC record" provided by DC-XML-2003 is:
-
A qualified DC record is made up of one or more properties and their associated values.
-
Each property is an attribute of the resource being described.
-
Each property must be either:
-
one of the 15 DC elements,
-
one of the other elements recommended by the DCMI (e.g. audience) [DCTERMS],
-
one of the element refinements listed in the DCMI Metadata Terms recommendation [DCTERMS].
-
Properties may be repeated.
-
Each value is a literal string.
-
Each value may have an associated encoding scheme.
-
Each encoding scheme has a name.
-
Each literal string value may have an associated language (e.g. en-GB).
Again, this is a simpler model than that of the description set defined by the DCMI Abstract Model. As above:
-
It has no construct analogous to that of the description set
-
It has no construct analogous to that of the described resource URI
-
It limits property URIs to a fixed set of URIs
-
It makes no distinction analogous to that between a non-literal value surrogate and a literal value surrogate
-
It has no construct analogous to that of the syntax encoding scheme URI
-
It has no construct analogous to that of the value URI
-
It has no construct analogous to that of the vocabulary encoding scheme URI
-
It has no notion that a non-literal value surrogate may include multiple value strings
For the "qualified DC record" model, the construction of a mapping to the DCAM description set model is more problematic.
As for the "simple DC record" case, there is no distinction between literal value surrogate and non-literal value surrogate. So, as above, on the basis of the description of the "qualified DC record" model alone, it is not possible to determine whether a (qualified DC record) "value" corresponds to:
-
A literal value surrogate containing a value string
-
A non-literal value surrogate containing a value string
Further, the "qualified DC record" model introduces a concept of "encoding scheme" but does not distinguish vocabulary encoding scheme URIs from syntax encoding scheme URIs, so it is not possible to determine whether a combination of (qualified DC record) "value" and "encoding scheme" corresponds to:
-
A literal value surrogate containing a value string plus syntax encoding scheme URI
-
A non-literal value surrogate containing a value string plus syntax encoding scheme URI
-
A non-literal value surrogate containing a value string plus vocabulary encoding scheme URI
If one makes the same assumption as for the "simple DC record" case, that the intent of the "qualified DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then only the first of the three options is possible for the interpretation of the "encoding scheme". However, examples in the DC-XML-2003 specification include references to "encoding schemes" which are vocabulary encoding schemes, so a mapping of "encoding scheme" to syntax encoding scheme would not be correct in all cases. The only "safe" option would appear to be not to define a mapping for DC-XML-2003 "encoding schemes".
On that basis, the following table specifies a mapping between the "qualified DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "qualified DC record".
| DC-XML-2003 | DCAM |
| "Qualified DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string |
| "URI of Encoding Scheme" | no mapping |
| "Language" | value string language |
Some points to note:
-
No mapping is provided here for what DC-XML-2003 calls "encoding schemes".
-
The mapping of "value" to a literal value surrogate/value string may not be compatible with the intent behind the original model, where the intent seems to be to support either a literal value surrogate/value string or a non-literal value surrogate/value string or a non-literal value surrogate/value URI.
-
The mapping of "value" to a literal value surrogate/value string may introduce contradictions arising from the range of the property.
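The "qualified DC record" mapping, including the decision to leave "encoding schemes" unmapped, can be sketched in Python as follows. This is a sketch under the same literal-value-surrogate assumption. The dictionary keys and the function name are invented for the example, and the record contents are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a "qualified DC record": one dict per
# "Property + Value", with optional "language" and "encoding_scheme" keys.
QualifiedProperty = Dict[str, str]


def qualified_record_to_description_set(
    record: List[QualifiedProperty],
) -> Tuple[dict, List[str]]:
    """Map a qualified DC record to a description set (as nested dicts).

    Returns the description set together with the encoding schemes that
    were left unmapped, so the loss of information is visible to the caller.
    """
    statements = []
    unmapped = []
    for prop in record:
        surrogate = {"value_string": prop["value"]}
        if prop.get("language"):
            surrogate["language"] = prop["language"]
        if prop.get("encoding_scheme"):
            # No mapping: the model does not say whether this is a syntax
            # encoding scheme or a vocabulary encoding scheme.
            unmapped.append(prop["encoding_scheme"])
        statements.append(
            {"property_uri": prop["property_uri"],
             "literal_value_surrogate": surrogate}
        )
    return {"descriptions": [{"statements": statements}]}, unmapped


record = [
    {
        "property_uri": "http://purl.org/dc/terms/issued",
        "value": "2008-07-23",
        "encoding_scheme": "http://purl.org/dc/terms/W3CDTF",
    },
    {
        "property_uri": "http://purl.org/dc/elements/1.1/title",
        "value": "Notes on the DC-XML-Full XML Format",
        "language": "en-GB",
    },
]
description_set, unmapped = qualified_record_to_description_set(record)
print(unmapped)
```

Returning the unmapped encoding schemes alongside the description set is a design choice of this sketch: it keeps the conversion "safe" in the sense described above while making clear that the result is lossy.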
References
[ABSTRACT-MODEL]
DCMI Abstract Model DCMI Recommendation. 2007-06-04
http://dublincore.org/documents/2007/06/04/abstract-model/
[DC-LEVELS]
Interoperability levels for Dublin Core metadata
http://dublincore.org/architecturewiki/InteroperabilityLevels
[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF) DCMI Recommendation. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/
[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format DCMI Recommended Resource. 2007-12-03
http://dublincore.org/documents/2007/12/03/dc-text/
[DC-XML-FULL]
Expressing Dublin Core metadata using XML (DC-XML-Full) Working Draft. 2008-07-23
http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFGuidelines/2008-07-23
[DC-XML-2003]
Guidelines for implementing Dublin Core in XML DCMI Recommendation. 2003-04-02
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[DC-XML-2006]
Expressing Dublin Core metadata using XML DCMI Working Draft. 2006-05-29
http://dublincore.org/documents/2006/05/29/dc-xml/
[DCAP]
The Singapore Framework for Dublin Core Application Profiles DCMI Recommended Resource. 2008-01-14
http://dublincore.org/documents/2008/01/14/singapore-framework/
[DSP]
Description Set Profiles: A constraint language for Dublin Core Application Profiles DCMI Working Draft. 2008-03-31
http://dublincore.org/documents/2008/03/31/dc-dsp/
[DCXML-DCAM]
DC-XML and the DCMI Abstract Model
http://www.ukoln.ac.uk/metadata/dcmi/dc-xml-issues/
[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial DC-2007, Singapore
http://www.dc2007.sg/T2-BasicSyntaxes.pdf
[XMLSCHEMA]
XML Schema Part 0: Primer Second Edition. W3C Recommendation 28 October 2004.
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/