Notes on the DC-XML-Full XML Format

Title: Notes on the DC-XML-Full XML Format
Creator: Pete Johnston, Eduserv Foundation <pete.johnston@eduserv.org.uk>
Date Issued: 2008-07-23
Identifier: http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFNotes/2008-07-23
Replaces: Not applicable
Is Replaced By: Not applicable
Latest Version: http://dublincore.org/architecturewiki/DCXMLFRevision/DCXMLFNotes
Description of Document: This document describes the background to the development of Expressing Dublin Core metadata using XML (DC-XML-Full).

Introduction

In September 2008, DCMI will circulate the document Expressing Dublin Core metadata using XML (DC-XML-Full) [DC-XML-FULL] as a DCMI Proposed Recommendation for public comment. This document describes the background to its development and its relationship to other DCMI specifications.

Background

The DCMI Abstract Model

Since 2003, DCMI has sought to formalise its model for Dublin Core metadata, and this has resulted in the publication of the DCMI Abstract Model [ABSTRACT-MODEL], the second version of which was given the status of DCMI Recommendation in June 2007.

The Abstract Model defines an abstract information structure called a DC metadata description set.

In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form according to the rules of a format or syntax. The DCMI Abstract Model itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".

Such a specification performs three functions:

The role of "encoding guidelines" and their relationship to the DCAM is illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].

Expressing Dublin Core using XML

In order to represent a DC metadata description set in an XML document, the constructs and components of that description set have to be represented as components of the XML document, i.e. as XML elements and XML attributes, as XML element names and XML attribute names, and as XML element content and XML attribute values.
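
As a rough illustration of that idea, the following Python sketch serialises a single description as XML. The namespace and the element and attribute names (descriptionSet, description, statement, resourceURI, propertyURI, lang) are hypothetical names chosen for this example; they are not the names defined by DC-XML-Full, and the sketch covers only statements with a literal value.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element/attribute names, for illustration only.
# These are NOT the names defined by the DC-XML-Full specification.
NS = "http://example.org/ns/description-set#"


def serialise(described_resource_uri, statements):
    """Serialise one description as an XML string.

    statements is a list of (property_uri, value_string, language) tuples,
    where language may be None.
    """
    ET.register_namespace("ex", NS)
    # The description set becomes the root XML element.
    root = ET.Element(f"{{{NS}}}descriptionSet")
    # The described resource URI becomes an XML attribute value.
    desc = ET.SubElement(
        root, f"{{{NS}}}description", {"resourceURI": described_resource_uri}
    )
    for property_uri, value_string, language in statements:
        # The property URI becomes an XML attribute value.
        stmt = ET.SubElement(
            desc, f"{{{NS}}}statement", {"propertyURI": property_uri}
        )
        if language is not None:
            stmt.set("lang", language)
        # The value string becomes XML element content.
        stmt.text = value_string
    return ET.tostring(root, encoding="unicode")
```

Other choices are equally possible, for example carrying the property URI in the element name rather than in an attribute value; the point is only that each construct of the description set needs some agreed place in the XML document.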

In June 2006, the Working Draft Expressing Dublin Core metadata using XML [DC-XML-2006] was released for public comment. As a result of comments received and subsequent discussions within the DCMI Architecture Forum, work continued in parallel on drafts for two different XML formats, one supporting the full description set model of the Abstract Model, known as DC-XML-Full, and the other supporting only a subset of that model, known as DC-XML-Min. The drafts for both formats were updated in 2007 to reflect the changes made to the DCMI Abstract Model.

Following discussions at the meeting of the DCMI Architecture Community at the DC-2007 conference and in subsequent telecons, it was decided to put forward a modified version of the DC-XML-Full format as a Proposed Recommendation, while continuing to work on DC-XML-Min, and in particular clarifying the requirements for that second format.

Expressing Dublin Core using RDF

In January 2008, DCMI published the document Expressing Dublin Core metadata using the Resource Description Framework (RDF) [DC-RDF] as a DCMI Recommendation. This document describes how the features of the DCMI Abstract Model description set model are represented using the RDF model, and replaces earlier DCMI specifications for expressing DC metadata in RDF.

Gleaning Resource Descriptions from Dialects of Languages (GRDDL)

Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [GRDDL] is a W3C Recommendation which describes a set of conventions for associating an XML document with an algorithm for the extraction of a set of RDF triples from that document. One of the mechanisms defined by GRDDL is the association of what it calls a Namespace Transformation with an XML Namespace Name, so that the transformation can be applied to extract RDF triples from any document which uses that XML Namespace Name in the name of its root element.
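
The lookup step of that mechanism can be sketched in Python as follows. This is a simplification under stated assumptions: in GRDDL the association between a namespace name and a transformation is discovered from the namespace document, and transformations are commonly written in XSLT, whereas here a dictionary and a Python function stand in for both. The namespace http://example.org/ns/records#, the "about" attribute and the toy transformation are hypothetical.

```python
import xml.etree.ElementTree as ET


def example_transformation(root):
    """Toy transformation (hypothetical): one triple per child element with text."""
    subject = root.get("about", "")
    triples = []
    for child in root:
        if child.text and child.text.strip():
            # child.tag has the form "{namespace}local-name"; it is used here
            # as a stand-in for a predicate URI.
            triples.append((subject, child.tag, child.text.strip()))
    return triples


# Hypothetical registry mapping an XML Namespace Name to a transformation.
# In GRDDL itself this association is obtained from the namespace document.
NAMESPACE_TRANSFORMATIONS = {
    "http://example.org/ns/records#": example_transformation,
}


def root_namespace(root):
    """Return the namespace name of the root element, or None if it has none."""
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def glean(xml_text):
    """Apply the transformation associated with the root element's namespace."""
    root = ET.fromstring(xml_text)
    transformation = NAMESPACE_TRANSFORMATIONS.get(root_namespace(root))
    if transformation is None:
        return []  # no transformation is known for this namespace
    return transformation(root)
```

Because the selection depends only on the name of the root element, any document in the format can be processed without carrying its own pointer to the transformation.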

Interoperability Levels for Dublin Core Metadata

The DCMI Architecture Community is currently developing a draft document titled Interoperability levels for Dublin Core metadata [DC-LEVELS].

It describes several categories or "levels" of interoperability that may be enabled using DC metadata, and specifies for each level the requirements that a metadata provider should meet and the expectations that a metadata consumer can rely on being satisfied.

The DC-XML-Full Format (2008)

The current DC-XML-Full format described in the Proposed Recommendation emerges from, and is directly shaped by, several of the developments listed above.

The primary purpose of the DC-XML-Full format is to enable what the "levels" document calls "DCAM-based syntactic interoperability" ("Level 3" interoperability), by providing rules for interpreting an instance of the format as a DC description set.

A prerequisite for this is support for "Semantic interoperability" ("Level 2" interoperability), based on the RDF model. The format therefore also provides rules for interpreting an instance of the format as an RDF Graph, using the conventions specified in the DCMI Recommendation for representing DC metadata in RDF [DC-RDF]. Further, it provides an algorithm which implements this mapping to an RDF Graph in the form of a GRDDL Namespace Transformation.
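
To make the second interpretation more concrete, the following Python sketch turns one description into a list of RDF-style triples. It is a simplified reading and not the mapping defined in [DC-RDF]: it handles only statements with literal value surrogates, it assumes that the value string language becomes the language tag of the literal and that the syntax encoding scheme URI becomes its datatype, and the dictionary keys and the blank node label are names invented for this example.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A minimal stand-in for an RDF literal (illustrative only)."""
    lexical_form: str
    language: Optional[str] = None
    datatype: Optional[str] = None


def description_to_triples(description, blank_node_label="_:b0"):
    """Map one description to (subject, predicate, object) triples.

    description is a dictionary with an optional "resource_uri" and a list
    of "statements", each holding "property_uri", "value_string" and,
    optionally, "language" and "ses_uri". These key names are assumptions.
    """
    # Without a described resource URI the subject is treated as a blank node.
    subject = description.get("resource_uri") or blank_node_label
    triples = []
    for statement in description["statements"]:
        literal = Literal(
            statement["value_string"],
            statement.get("language"),
            statement.get("ses_uri"),  # assumed to act as the datatype URI
        )
        triples.append((subject, statement["property_uri"], literal))
    return triples
```

Statements with non-literal value surrogates need further conventions, which [DC-RDF] specifies and which this sketch deliberately leaves out.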

The principles applied to the design of the DC-XML-Full format are described in the introduction to the document:

A W3C XML Schema for the DC-XML-Full format is available. The schema is currently located at the URI http://www.incognitum.net/petej/projects/dc-xml/full/xsd/2008/07/23/dcxf.xsd and will be assigned a DCMI-owned URI before the specification is circulated for comment.

The DC-XML-Full format provides a "base-line" XML format for serialising DC description sets. It is important to note that, in addition to the DC-XML-Full format, the DCMI Architecture Community intends to continue work on DC-XML-Min, the second new XML format for serialising description sets. The requirements for that second format are still being clarified.

Comparison between DC-XML-Full and DC-XML-2003

The DC-XML-Full format is a different XML format from that specified by the current DCMI recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core in XML [DC-XML-2003]. For the purposes of this discussion, that XML format is referred to as "DC-XML-2003". It was not defined in terms of the DCAM description set model, which in 2003 did not exist in today's form, or of an RDF Graph. It provides its own "abstract models" for a "simple DC record" and a "qualified DC record", and specifies an XML format for the representation of instances of those two models.

Although a mapping to the constructs of an RDF Graph and of a DCAM description set might be constructed retrospectively for DC-XML-2003, such a mapping can be made only for some features of the format, and is at best approximate, as it relies on assumptions that may not accurately reflect the intent of metadata creators. Appendix A describes such a mapping for the DC-XML-2003 format.

The features of the DCAM description set model supported by the two XML formats (DC-XML-2003, DC-XML-Full) are summarised in the following table:

| DCAM Description Set Model feature | Supported in DCAM Description Set Model | Supported in DC-XML-2003 | Supported in DC-XML-Full |
| --- | --- | --- | --- |
| description set | One description set | One description set | One description set |
| description | One-to-many descriptions | One description | One-to-many descriptions |
| described resource URI | One per description; any URI | Not supported | One per description; any URI |
| statement | One-to-many statements per description | One-to-many statements per description | One-to-many statements per description |
| property URI | One per statement; any URI | One per statement; any URI | One per statement; any URI |
| literal value surrogate | One per statement | One per statement; partial support | One per statement |
| literal value surrogate / value string | One per literal value surrogate | One per literal value surrogate; partial support | One per literal value surrogate |
| literal value surrogate / value string language | Zero-to-one per value string | Zero-to-one per value string | Zero-to-one per value string |
| literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate | One per statement | Not supported | One per statement |
| non-literal value surrogate / value string | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-many per non-literal value surrogate |
| non-literal value surrogate / value string language | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string |
| non-literal value surrogate / value URI | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-many per non-literal value surrogate |
| non-literal value surrogate / VES URI | Zero-to-one per non-literal value surrogate | Not supported | Zero-to-one per non-literal value surrogate |
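
The cardinalities in the table can be read as a data structure. The following Python sketch transcribes them into dataclasses and adds a check for whether a description set stays within the features that the table marks as available in DC-XML-2003. The class, field and function names are invented for this example, the cardinalities are taken from the table as given, and the rows marked "partial support" are treated as supported because the table does not say where the limits lie.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ValueString:
    text: str
    language: Optional[str] = None  # zero-to-one per value string
    ses_uri: Optional[str] = None   # zero-to-one per value string


@dataclass
class LiteralValueSurrogate:
    value_string: ValueString  # one per literal value surrogate


@dataclass
class NonLiteralValueSurrogate:
    # Cardinalities below are as listed in the table.
    value_uris: List[str] = field(default_factory=list)
    ves_uri: Optional[str] = None
    value_strings: List[ValueString] = field(default_factory=list)


@dataclass
class Statement:
    property_uri: str  # one per statement
    value_surrogate: Union[LiteralValueSurrogate, NonLiteralValueSurrogate]


@dataclass
class Description:
    statements: List[Statement]
    # Optional here so that a description without a described resource URI,
    # as in DC-XML-2003, can be represented.
    resource_uri: Optional[str] = None


@dataclass
class DescriptionSet:
    descriptions: List[Description]


def uses_only_dc_xml_2003_features(description_set):
    """Return True if no feature marked "Not supported" in DC-XML-2003 is used."""
    if len(description_set.descriptions) != 1:
        return False  # DC-XML-2003 carries a single description
    description = description_set.descriptions[0]
    if description.resource_uri is not None:
        return False  # described resource URI is not supported
    for statement in description.statements:
        surrogate = statement.value_surrogate
        if not isinstance(surrogate, LiteralValueSurrogate):
            return False  # non-literal value surrogates are not supported
        if surrogate.value_string.ses_uri is not None:
            return False  # SES URI is not supported
    return True
```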

Appendix A: Guidelines for implementing Dublin Core in XML (2003) and the DCAM

The current DCMI recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core in XML [DC-XML-2003], pre-dates the development of the DCAM. That document provides its own "abstract models" for a "simple DC metadata record" and a "qualified DC metadata record", and specifies an XML format for the representation of instances of those two models. For the purposes of this discussion, that XML format is referred to as "DC-XML-2003".

However, the two models described by that document differ from the description set model provided by the DCAM: they use some different types of construct from those used by the DCAM, and also use different labels for constructs which are essentially similar to those used by the DCAM.

Simple Dublin Core (DC-XML-2003)

The "abstract model" for a "simple DC record" provided by DC-XML-2003 is:

Note that this is a much simpler model than that of the description set defined by the DCMI Abstract Model. In particular

On the basis of the description of the "simple DC record" model alone, it is not possible to determine whether a (simple DC record) "value" corresponds to:

To construct a mapping from the "simple DC record" model to (a subset of) the DCAM description set model, it is necessary to make a choice between those two options.

If one assumes that the intent of the "simple DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then the following table specifies a mapping between the "simple DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "simple DC record".

| DC-XML-2003 | DCAM |
| --- | --- |
| "Simple DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string |
| "Language" | value string language |

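The mapping for the "simple DC record" can be sketched in Python as follows. The sketch rests on the same assumption as the table, namely that every "value" is a literal; the function name and the dictionary keys used for the record and for the resulting description set are invented for this example and do not come from either specification.

```python
def simple_record_to_description_set(record):
    """Map a "simple DC record" to a description set with one description.

    record is a list of dictionaries, one per "Property + Value" pair, with
    the keys "property_uri", "value" and, optionally, "language".
    The key names are assumptions made for this sketch.
    """
    statements = []
    for pair in record:
        # "Value" -> value string of a literal value surrogate
        value_string = {"text": pair["value"]}
        # "Language" -> value string language
        if pair.get("language"):
            value_string["language"] = pair["language"]
        # "Property + Value" -> statement; "URI of Property" -> property URI
        statements.append(
            {
                "property_uri": pair["property_uri"],
                "literal_value_surrogate": {"value_string": value_string},
            }
        )
    # "Simple DC record" -> description set containing a single description.
    # No described resource URI is produced, as the record does not carry one.
    return {"descriptions": [{"statements": statements}]}
```

If the intent behind a given "value" was in fact to refer to a non-literal value, the description set produced in this way would not reflect that intent, which is why the mapping is described as approximate.
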
Qualified Dublin Core (DC-XML-2003)

The "abstract model" for a "qualified DC record" provided by DC-XML-2003 is:

Again this is a simpler model than that of the description set defined by the DCMI Abstract Model. As above

For the "qualified DC record" model, the construction of a mapping to the DCAM description set model is more problematic.

As for the "simple DC record" case, there is no distinction between literal value surrogate and non-literal value surrogate. So, as above, on the basis of the description of the "qualified DC record" model alone, it is not possible to determine whether a (qualified DC record) "value" corresponds to:

Further, the "qualified DC record" model introduces a concept of "encoding scheme" but does not distinguish vocabulary encoding scheme URIs from syntax encoding scheme URIs, so it is not possible to determine whether a combination of (qualified DC record) "value" and "encoding scheme" corresponds to:

If one makes the same assumption as for the "simple DC record" case, that the intent of the "qualified DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then only the first of the three options is possible for the interpretation of the "encoding scheme". However, examples in the DC-XML-2003 specification include references to "encoding schemes" which are vocabulary encoding schemes, so a mapping of "encoding scheme" to syntax encoding scheme would not be correct in all cases. The only "safe" option would appear to be not to define a mapping for DC-XML-2003 "encoding schemes".

On that basis, the following table specifies a mapping between the "qualified DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "qualified DC record".

| DC-XML-2003 | DCAM |
| --- | --- |
| "Qualified DC record" | description set containing a single description |
| "Property + Value" | statement |
| "URI of Property" | property URI |
| "Value" | literal value surrogate/value string |
| "URI of Encoding Scheme" | no mapping |
| "Language" | value string language |

Some points to note

References

[ABSTRACT-MODEL]
DCMI Abstract Model. DCMI Recommendation. 2007-06-04
http://dublincore.org/documents/2007/06/04/abstract-model/

[DC-LEVELS]
Interoperability levels for Dublin Core metadata
http://dublincore.org/architecturewiki/InteroperabilityLevels

[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF). DCMI Recommendation. 2008-01-14
http://dublincore.org/documents/2008/01/14/dc-rdf/

[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format. DCMI Recommended Resource. 2007-12-03
http://dublincore.org/documents/2007/12/03/dc-text/

[DC-XML-FULL]
Expressing Dublin Core metadata using XML (DC-XML-Full). Working Draft. 2008-07-23
http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFGuidelines/2008-07-23

[DC-XML-2003]
Guidelines for implementing Dublin Core in XML. DCMI Recommendation. 2003-04-02
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/

[DC-XML-2006]
Expressing Dublin Core metadata using XML. DCMI Working Draft. 2006-05-29
http://dublincore.org/documents/2006/05/29/dc-xml/

[DCAP]
The Singapore Framework for Dublin Core Application Profiles. DCMI Recommended Resource. 2008-01-14
http://dublincore.org/documents/2008/01/14/singapore-framework/

[DSP]
Description Set Profiles: A constraint language for Dublin Core Application Profiles. DCMI Working Draft. 2008-03-31
http://dublincore.org/documents/2008/03/31/dc-dsp/

[DCXML-DCAM]
DC-XML and the DCMI Abstract Model
http://www.ukoln.ac.uk/metadata/dcmi/dc-xml-issues/

[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial. DC-2007, Singapore
http://www.dc2007.sg/T2-BasicSyntaxes.pdf

[XMLSCHEMA]
XML Schema Part 0: Primer Second Edition. W3C Recommendation 28 October 2004.
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/