
2011-11-06: Moved to http://wiki.dublincore.org/index.php/Glossary/DCAM_Review

A review of the DCMI Abstract Model with scenarios for its future

Tom Baker, Pete Johnston

Identifier: http://dublincore.org/architecturewiki/DcamInContext

Date: 2010-10-15

About this review

This paper:

The paper has been produced for discussion on 22 October at a Joint Meeting of the DCMI Architecture Forum and the W3C Library Linked Data Incubator Group. To the extent possible, the meeting will try to determine a realistic way forward for DCAM.

A short history of Dublin Core

The Dublin Core community's first step was the definition of twelve (later fifteen) "elements" in 1995, supplemented in 1997 by the addition of a notion of "qualifiers" of those elements, the "Canberra qualifiers" [3]. In July 2000, with the publication of "Dublin Core Qualifiers", qualifiers were differentiated into "element refinements" and "element encoding schemes" [4].

This "typology of terms" formed the basis of "DCMI Grammatical Principles", a guide for DCMI Usage Board vocabulary maintenance decisions created in February 2003, which further differentiated "encoding schemes" into "vocabulary encoding schemes" and "syntax encoding schemes" [5].

In addition to these initiatives to define a "typology of terms", and specific sets of terms based on that typology, work was also done to specify the format-independent "abstract data structure", or "abstract syntax", within which references to those terms were made. This work is usefully summarised in the 2000 D-Lib article "A Grammar of Dublin Core" [5a], which articulates a "grammar" of "statements" made up of:

DCMI's first specification of a concrete syntax for Dublin Core metadata, RFC 2731 "Encoding Dublin Core Metadata in HTML" [5b] (1999), was based, at least loosely and informally, on this model.

Starting in 1997, W3C's effort to define a Resource Description Framework (RDF) paralleled this work within DCMI, culminating in a first W3C Recommendation, "RDF Model and Syntax Specification", in February 1999 [6] and, following an extensive review process, a second W3C Recommendation, "RDF Concepts and Abstract Syntax", in February 2004 [7].

This led to the development within DCMI of two further sets of "encoding guidelines" for using Dublin Core terms in RDF, each specifying slightly different patterns and conventions:

The rationale for the DCMI Abstract Model

By the early 2000s, in addition to applications based on the specifications above, there was a growing number of "Dublin Core metadata" implementations that made no reference to any "abstract syntax", and the result was a landscape in which interoperability between applications was problematic. Parts of the Dublin Core community recognized the RDF abstract syntax as a crucial development (the "grammar of Dublin Core" model was intended in part to popularize its notion of "statement-based" metadata), but on the whole, a large part of the community saw RDF as a research project of dubious practical value. More specifically, RDF was seen less as a fundamentally different way of conceptualizing metadata than as an alternative XML format for metadata, and one that compared unfavorably with apparently simpler and more readable XML formats.

Given the political difficulty of directly promoting the RDF abstract syntax as a common basis for metadata, work on the DCMI Abstract Model was undertaken in 2003 in an attempt:

This effort resulted in a first DCMI Recommendation in March 2005 [1] and a second, revised DCMI Recommendation in June 2007 [2]. The term "element" was de-emphasized in favor of "RDF property". "Element refinements" were defined as "RDF sub-properties". Syntax Encoding Schemes were defined as RDF datatypes. The generic term "encoding scheme" and the even more generic term "qualifier" were further de-emphasized [8].
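One practical consequence of defining element refinements as RDF sub-properties is that a consumer which does not know a refinement can still recover the broader statement by RDFS inference. In DCMI Metadata Terms, for example, dcterms:created is declared a sub-property of dcterms:date. The sketch below applies the RDFS entailment rules rdfs5 (transitivity of rdfs:subPropertyOf) and rdfs7 (sub-property inheritance) to plain Python tuples; the function name and the example resource URI are hypothetical, and no RDF library is assumed.

```python
# Minimal sketch: an "element refinement" modelled as an RDF sub-property lets a
# consumer infer the broader statement (RDFS rules rdfs5 and rdfs7).
# Triples are plain (subject, predicate, object) string tuples.

RDFS_SUBPROPERTYOF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCTERMS = "http://purl.org/dc/terms/"


def infer_superproperty_statements(triples):
    """Return the input triples plus those entailed via rdfs:subPropertyOf,
    computed by iterating until no new triple appears (a fixed point)."""
    closure = set(triples)
    while True:
        sub_of = {(s, o) for (s, p, o) in closure if p == RDFS_SUBPROPERTYOF}
        new = set()
        # rdfs5: rdfs:subPropertyOf is transitive
        for (a, b) in sub_of:
            for (c, d) in sub_of:
                if b == c:
                    new.add((a, RDFS_SUBPROPERTYOF, d))
        # rdfs7: a statement using a sub-property also holds for its super-property
        for (s, p, o) in closure:
            for (sub, sup) in sub_of:
                if p == sub:
                    new.add((s, sup, o))
        if new <= closure:
            return closure
        closure |= new


doc = "http://example.org/report"  # hypothetical described resource
data = {
    (DCTERMS + "created", RDFS_SUBPROPERTYOF, DCTERMS + "date"),  # vocabulary
    (doc, DCTERMS + "created", "2010-10-15"),                    # instance data
}
result = infer_superproperty_statements(data)
print((doc, DCTERMS + "date", "2010-10-15") in result)  # True
```

Under the earlier "qualifier" framing, this "dumb-down" behaviour had to be specified separately; expressed as RDF sub-properties, it falls out of standard RDFS semantics.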

The DCMI Abstract Model defines an "abstract syntax" based on a data structure it calls the "description set". The specification "Expressing Dublin Core metadata using the Resource Description Framework (RDF)" [18] defines a mapping from the "description set" model to the RDF abstract syntax.

The revised DCMI Abstract Model of 2007 became the basis for revised concrete syntax specifications (DC-HTML [8a], DC-DS-XML [8b], DC-TEXT [8c]).

In 2009, Mikael Nilsson outlined a draft for an "RDF-based version" of the DCMI Abstract Model. The outline points towards an abstract model dramatically simplified with respect to the 2007 model. Entirely missing are two of the three component models of the 2007 model: the DCMI Resource Model (to be replaced by a simple reference to RDF) and the DCMI Vocabulary Model (to be replaced by a simple reference to the RDF Vocabulary Description Language, also known as RDF Schema). The intention of the "RDF-based version" was to define the abstract model entirely in terms of RDF while maintaining the "interface" to constructs defined in the third component of the 2007 model, the Description Set Model. As of 2010, this work remains at the stage of an early draft.

DC Application Profiles and Description Set Profiles

In 2000, the notion of an "application profile" was put forward and quickly became a central point of reference for the Dublin Core community. As originally proposed, an application profile was the specification of a particular pattern of "elements" and "encoding schemes" "used" in the context of a particular application or to describe a particular type of resource. In practice, this very general conceptualization was interpreted in a multiplicity of ways, resulting in a wide range of incompatible constructs.

The DCMI Abstract Model, with its formalization of an abstract syntax, provided the basis for a number of documents which sought to provide a formal specification of the "DC application profile" concept:

A user-oriented document, "Guidelines for Dublin Core Application Profiles", was written to guide users through the process of designing and creating application profiles on the basis of the DCMI Abstract Model [16].

In addition to the XML syntax and RDF representations specified in the Description Set Profile document itself, a wiki syntax [16a] was also developed, together with a MoinMoin extension [16b] which generated both a tabular human-readable view and an XML representation of an application profile. The Description Set Profile was intended to be used for applications such as the automatic configuration of metadata editing tools and the generation of schemas for document validation.

Other approaches to the "structural constraints"/"validation" question have been explored within the Semantic Web community:

Relationship of the DCMI Abstract Model to RDF

While the 2009 draft "RDF-based" revision of the DCMI Abstract Model was never developed beyond the outline stage, this discussion paper uses its basic ideas as a starting point. Specifically, this paper ignores the Resource Model and Vocabulary Model defined in the 2007 Abstract Model and focuses exclusively on its centerpiece: the Description Set Model. As a guide to how the constructs of this model translate into RDF, this paper additionally follows the 2008 guidelines, "Expressing Dublin Core metadata using the Resource Description Framework" (referred to hereafter by its short name, "DC-RDF") [18].

In the 2007 Abstract Model, the Description Set Model specifies both a set of syntactic elements (things found in data) and a set of referents in the real world (things to which the syntactic elements may be interpreted to refer). If the DCMI Abstract Model is to be based on the RDF abstract syntax, we can limit our analysis here to the syntactic elements. These include grouping constructs (Description Set, Description, Statement, Non-Literal Value Surrogate, and Literal Value Surrogate) and slots for URIs and character strings (Described Resource URI, Property URI, Value URI, Vocabulary Encoding Scheme URI, Syntax Encoding Scheme URI, Value String Language, Plain Value String, and Typed Value String). One might think of these slots as components of the DCAM abstract syntax that can be tested. In the 2007 specification, these syntactic elements are described using UML, but they are more popularly depicted in the form of a nested metadata template, as in Figure 1 below.

Figure 1: Description Set Model (part of DCMI Abstract Model)
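To make the nesting of grouping constructs and slots concrete, the following sketch models the syntactic elements of the Description Set Model as Python dataclasses. The class and field names are hypothetical (they are not defined by DCAM), and the cardinalities noted in the comments reflect our reading of the 2007 specification; the example URIs for the described resource and the title are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class PlainValueString:
    text: str
    language: Optional[str] = None  # Value String Language, e.g. "en"


@dataclass
class TypedValueString:
    text: str
    syntax_encoding_scheme_uri: str  # Syntax Encoding Scheme URI


ValueString = Union[PlainValueString, TypedValueString]


@dataclass
class LiteralValueSurrogate:
    value_string: ValueString  # exactly one value string


@dataclass
class NonLiteralValueSurrogate:
    value_uri: Optional[str] = None                       # Value URI (optional)
    vocabulary_encoding_scheme_uri: Optional[str] = None  # VES URI (optional)
    value_strings: List[ValueString] = field(default_factory=list)


@dataclass
class Statement:
    property_uri: str  # exactly one Property URI
    value: Union[LiteralValueSurrogate, NonLiteralValueSurrogate]


@dataclass
class Description:
    statements: List[Statement]
    described_resource_uri: Optional[str] = None  # Described Resource URI (optional)


@dataclass
class DescriptionSet:
    descriptions: List[Description]


# Example instance: one description with three statements (hypothetical values).
DCT = "http://purl.org/dc/terms/"
ds = DescriptionSet([
    Description(
        described_resource_uri="http://example.org/report",
        statements=[
            Statement(DCT + "title",
                      LiteralValueSurrogate(PlainValueString("Example report", "en"))),
            Statement(DCT + "created",
                      LiteralValueSurrogate(TypedValueString("2010-10-15", DCT + "W3CDTF"))),
            Statement(DCT + "subject",
                      NonLiteralValueSurrogate(
                          vocabulary_encoding_scheme_uri=DCT + "LCSH",
                          value_strings=[PlainValueString("Metadata", "en")])),
        ],
    )
])
```

Each optional field corresponds to a slot in Figure 1 that may be left empty; the grouping constructs become the nested classes.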

As an illustration of how the syntactic elements of the Description Set Model are used, Figure 2 fills the placeholders shown in Figure 1 with example URIs and character strings.

Figure 2: Description Set Model slots with example URIs and character strings

How the elements of the Description Set Model relate to RDF is roughly visualized in Figure 3 and described in more detail in Appendix B below.

Figure 3: Relationship of Description Set Model components to RDF graphs
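The relationship can also be sketched as a function that turns a description set into RDF triples. This is a minimal sketch based on our reading of DC-RDF [18], and should be checked against that document: each description becomes a set of triples sharing a subject (a blank node if there is no Described Resource URI); a literal value surrogate becomes a literal object, with a Syntax Encoding Scheme URI becoming the datatype; a non-literal value surrogate becomes the Value URI or a blank node, with a Vocabulary Encoding Scheme linked via dcam:memberOf and each value string attached via rdf:value. The dictionary input format, the `BNode` and `Literal` classes, and the function names are hypothetical; the sketch assumes no literal carries both a language and a datatype.

```python
from itertools import count
from typing import NamedTuple, Optional

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
DCAM_MEMBER_OF = "http://purl.org/dc/dcam/memberOf"


class Literal(NamedTuple):
    text: str
    language: Optional[str] = None  # from Value String Language
    datatype: Optional[str] = None  # from Syntax Encoding Scheme URI


class BNode(NamedTuple):
    label: str  # blank node label, generated locally


def value_string_to_literal(vs: dict) -> Literal:
    # A value string is either plain (optional language) or typed (SES URI -> datatype).
    return Literal(vs["text"], vs.get("language"), vs.get("ses"))


def description_set_to_triples(description_set: list) -> list:
    """Map a description set (hypothetical dict format) to a list of triples.
    URIs are plain strings; blank nodes and literals use the classes above."""
    counter = count(1)

    def fresh() -> BNode:
        return BNode(f"b{next(counter)}")

    triples = []
    for description in description_set:
        subject = description.get("resource") or fresh()
        for st in description["statements"]:
            prop = st["property"]
            if "literal" in st:  # literal value surrogate
                triples.append((subject, prop, value_string_to_literal(st["literal"])))
                continue
            value = st.get("value_uri") or fresh()  # non-literal value surrogate
            triples.append((subject, prop, value))
            if st.get("ves"):
                triples.append((value, DCAM_MEMBER_OF, st["ves"]))
            for vs in st.get("value_strings", []):
                triples.append((value, RDF_VALUE, value_string_to_literal(vs)))
    return triples


DCT = "http://purl.org/dc/terms/"
ds = [{
    "resource": "http://example.org/report",  # hypothetical described resource
    "statements": [
        {"property": DCT + "title", "literal": {"text": "Example report", "language": "en"}},
        {"property": DCT + "created", "literal": {"text": "2010-10-15", "ses": DCT + "W3CDTF"}},
        {"property": DCT + "subject", "ves": DCT + "LCSH",
         "value_strings": [{"text": "Metadata", "language": "en"}]},
    ],
}]
for triple in description_set_to_triples(ds):
    print(triple)
```

The subject statement expands into three triples: the blank node standing for the value, its membership in the vocabulary encoding scheme, and its value string.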

The Description Set Profile constraint language

The March 2008 specification "Description Set Profiles: a constraint language for Dublin Core application profiles" (hereafter DC-DSP) [9] provides a language for specifying a set of constraints on the "description set" construct defined by the DCMI Abstract Model. The word "constraints" reflects the idea that the slots defined by the Description Set Model can be filled in an unlimited number of ways, and that a profile narrows, or "constrains", those possibilities in specific ways for specific content.

In the terminology of the DC-DSP specification, sets of constraints are expressed in "templates". Templates use constraints to specify some community- or application-specific rules for the contents of a description set: first, the descriptions, description by description (Description Templates); then within each description, statement by statement (Statement Templates); and within each statement, slot by slot (Constraints).

Conceptually, templates are like cookie cutters for mass-producing actual descriptions and statements in real instance metadata. Conversely, descriptions and statements in instance metadata are said to "match" specific templates according to a matching algorithm.
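The idea of matching descriptions against templates can be illustrated with a deliberately simplified matcher. It is loosely modelled on the notion of statement templates with minimum and maximum occurrence, but it is not the matching algorithm defined in DC-DSP [9]: statements are reduced to a property URI and a value kind, and the `closed` flag (reject statements that no template accounts for) is an assumption of this sketch. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A statement reduced to (property URI, value kind), kind being "literal" or "nonliteral".
Statement = Tuple[str, str]


@dataclass
class StatementTemplate:
    property_uri: str
    value_kind: str                   # "literal" or "nonliteral"
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means unbounded

    def matches(self, statement: Statement) -> bool:
        prop, kind = statement
        return prop == self.property_uri and kind == self.value_kind


@dataclass
class DescriptionTemplate:
    statement_templates: List[StatementTemplate]
    closed: bool = True  # assumption: unmatched statements make the description fail


def description_matches(template: DescriptionTemplate, statements: List[Statement]) -> bool:
    # Check occurrence constraints, statement template by statement template.
    for st in template.statement_templates:
        n = sum(1 for s in statements if st.matches(s))
        if n < st.min_occurs:
            return False
        if st.max_occurs is not None and n > st.max_occurs:
            return False
    # In a closed template, every statement must match some statement template.
    if template.closed:
        for s in statements:
            if not any(st.matches(s) for st in template.statement_templates):
                return False
    return True


DCT = "http://purl.org/dc/terms/"
tpl = DescriptionTemplate([
    StatementTemplate(DCT + "title", "literal", min_occurs=1, max_occurs=1),
    StatementTemplate(DCT + "subject", "nonliteral"),
])
print(description_matches(tpl, [(DCT + "title", "literal"), (DCT + "subject", "nonliteral")]))  # True
print(description_matches(tpl, [(DCT + "subject", "nonliteral")]))  # False: no title
```

A full implementation would also check the slot-level constraints (for example allowed vocabulary encoding schemes or syntax encoding schemes) inside each statement template.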

Very broadly, DC-DSP provides a language for specifying things such as:

DCAM in 2010

It is difficult to track the use of freely available specifications once they are released on the Web, but as of 2010, DCMI is not aware that any of the Abstract-Model-related specifications, with the possible exception of specific syntax guidelines, have been widely implemented.

Rather than building a bridge from more traditional metadata communities to the Semantic Web, the Abstract Model appears to have fallen between two stools -- its use of the "description set" abstraction perplexing to users more accustomed to metadata specifications defined in terms of a concrete syntax, and its added layer of Dublin-Core-specific terminology confusing to users comfortable with the RDF model.

Since 2006, however, the rapid success of Linked Data has given the notion of a Semantic Web based on the Resource Description Framework (RDF) wider visibility and acceptance. If Linked Data is crossing the chasm to widespread deployment, and the conceptual model of RDF is reaching a wider community, is there still a need for the bridge that the DCMI Abstract Model was intended to provide?

As of 2010, moreover, new Semantic Web specifications such as the Simple Knowledge Organization System (SKOS) address issues that overlap with those of the DCMI Abstract Model. Discussions in the Semantic Web community about a new version of RDF ("RDF 2") point towards further developments in the core Semantic Web standards, such as Named Graphs, that parallel some of the more innovative features of the DCMI Abstract Model [17]. To what extent can the DCMI Abstract Model already be expressed in terms of newly mainstream Linked Data concepts? If aspects of the DCMI Abstract Model still cannot be expressed with more mainstream concepts, what are the prospects of doing so in the foreseeable future and, more to the point, what steps should be taken in the meantime to revise, deprecate, or replace the DCMI Abstract Model?

The question affects not just the Abstract Model but also the suite of related specifications, syntax guidelines, and user documentation built on it. What requirements are reflected in this considerable body of work, which of these requirements now appear most important, and how should we best proceed to address them?

Scenarios for the future of DCAM

Scenario 1. DCMI carries on developing DCAM as before

Scenario 2a. DCMI develops a "DCAM 2" spec as the basis for new work

Scenario 2b. DCMI develops "DCAM 2" as a transitional explanatory document

Scenario 3. DCMI deprecates the DCAM abstract syntax and embraces RDF abstract syntax

Scenario 4. DCMI does nothing - DCAM is simply left untouched

General issues for discussion

DCAM abstract syntax versus the RDF abstract syntax

Application Profiles

Appendix A. The DCAM family of specifications

Appendix B. Relationship of the Description Set Model to RDF

It is worth noting that the Dublin Core "grammar" for "statements" of circa 2000, pictured in Figure 4, was in part an early attempt to popularize the notion of metadata as being based on meaningful "statements" by way of analogy to commonly understood notions of natural-language grammar.

Figure 4: Early "grammar" of Dublin Core "statements" (circa 2000)

As part of the model, a rough mapping of the Dublin Core "grammar" to RDF "statements" was provided, as pictured in Figure 5.

Figure 5: Mapping of early Dublin Core "grammar" to RDF statements (circa 2000)

This appendix examines how the syntactic elements of the Description Set Model, the central component of the DCMI Abstract Model of 2005-2007, relate to RDF -- both as it currently exists and as it is likely to evolve in the medium term. Note that the 2007 abstract model defines these elements both as syntactic elements and in terms of the "things in the world" to which these elements refer. For simplicity of exposition, this analysis focuses exclusively on the syntactic elements.

Grouping constructs in the Description Set Model

The grouping constructs in the Description Set Model are:

Slots in the Description Set Model

The slots in the Description Set Model map to constructs in the RDF abstract syntax [19]. Following the DC-RDF guidelines [18], these slots may be characterized in terms of the RDF abstract syntax as follows:

References