2011-11-06: Moved to http://wiki.dublincore.org/index.php/Glossary/DCAM_Review
A review of the DCMI Abstract Model with scenarios for its future
Tom Baker, Pete Johnston
Identifier: http://dublincore.org/architecturewiki/DcamInContext
Date: 2010-10-15
About this review
This paper:
-
contextualizes the DCMI Abstract Model (DCAM) within the history of DCMI and Web standards;
-
describes the DCAM approach with reference to Resource Description Framework (RDF);
-
proposes alternative scenarios for the future development of DCAM;
-
assesses the impact of alternative scenarios on specifications that depend on DCAM.
The paper has been produced for discussion on 22 October at a
Joint Meeting of the DCMI Architecture Forum and the W3C Library Linked Data Incubator Group. To the extent possible, the meeting will try to determine a realistic way forward for DCAM.
A short history of Dublin Core
The Dublin Core community's first step was the definition of thirteen (later fifteen) "elements" in 1995, supplemented in 1997 by the notion of "qualifiers" of those elements, the
"Canberra qualifiers" [3]. In July 2000, with the publication of
"Dublin Core Qualifiers", qualifiers were differentiated into "element refinements" and "element encoding schemes" [4].
This "typology of terms" formed the basis of the
"DCMI Grammatical Principles" of February 2003, a guide for DCMI Usage Board vocabulary maintenance decisions, which further differentiated "encoding schemes" into "vocabulary encoding schemes" and "syntax encoding schemes" [5].
In addition to these initiatives to define a "typology of terms", and specific sets of terms based on that typology, work was also done to specify the format-independent "abstract data structure", or "abstract syntax", within which references to those terms were made. This work is usefully summarised in the 2000 D-Lib article
"A Grammar of Dublin Core" [5a], which articulates a "grammar" of "statements" made up of:
-
an implicit reference to the thing being described
-
a reference to one of the 15 "properties" or "elements" of the DCMES
-
a "property value" ("an appropriate literal")
-
(optionally) references to one or more "qualifiers"
DCMI's first specification for a concrete syntax for Dublin Core metadata,
RFC 2731 "Encoding Dublin Core Metadata in HTML" [5b] (1999), was based (at least loosely and informally) on this model.
Starting in 1997, W3C's effort to define a Resource Description Framework (RDF) paralleled this work within DCMI, culminating in a first W3C Recommendation,
"RDF Model and Syntax Specification", in February 1999 [6] and, following an extensive review process, a second W3C Recommendation,
"RDF Concepts and Abstract Syntax", in February 2004 [7].
This led to the development within DCMI of two further sets of "encoding guidelines" for using Dublin Core terms in RDF, each of which specified slightly different patterns and conventions:
-
Expressing Simple Dublin Core in RDF/XML [7a] (2002): specified the use of literal values only
-
Expressing Qualified Dublin Core in RDF [7b] (2002): a wider range of conventions, including use of Bag, Seq, Alt
The rationale for the DCMI Abstract Model
By the early 2000s, in addition to applications based on the specifications above, there was a growing number of "Dublin Core metadata" implementations that made no reference to any "abstract syntax", and interoperability between applications had become problematic. The RDF abstract syntax was recognized by parts of the Dublin Core community as a crucial development -- and the "grammar of Dublin Core" model was intended in part to popularize its notion of "statement-based" metadata -- but on the whole, RDF was seen by a large part of the Dublin Core community as a research project of dubious practical value. More specifically, RDF was seen less as a fundamentally different way of conceptualizing metadata and more as an alternative XML format for metadata -- and one that compared unfavorably to apparently simpler and more readable XML formats.
Given the political difficulty of directly promoting the RDF abstract syntax as a common basis for metadata, work on the DCMI Abstract Model was undertaken in 2003 in an attempt:
-
to clarify and formalize the "home-grown" model of metadata that had emerged from early Dublin Core workshops, and formed the basis of the DCMI Grammatical Principles; and
-
(particularly in the second revision) to align that model with that of the RDF abstract syntax and RDF semantics.
This effort resulted in a first
DCMI Recommendation in March 2005 [1] and a second, revised
DCMI Recommendation in June 2007 [2]. The term "element" was de-emphasized in favor of "RDF property". "Element refinements" were defined as "RDF sub-properties". Syntax Encoding Schemes were defined as RDF datatypes. The generic term "encoding scheme" and the even more generic term "qualifier" were further de-emphasized [8].
The DCMI Abstract Model defines an "abstract syntax" based on a data structure it calls the "description set". The specification
"Expressing Dublin Core metadata using the Resource Description Framework (RDF)" defines a mapping from the "description set" model to the RDF abstract syntax.
The revised DCMI Abstract Model of 2007 became the basis for revised concrete syntax specifications (DC-HTML [8a], DC-DS-XML [8b], DC-TEXT [8c]).
In 2009, Mikael Nilsson outlined a draft for an "RDF-based version" of the DCMI Abstract Model. The outline points towards an abstract model dramatically simplified with respect to the 2007 model. Entirely missing are two of the three component models of the 2007 model: the DCMI Resource Model (to be replaced by a simple reference to RDF) and the DCMI Vocabulary Model (to be replaced by a simple reference to the RDF Vocabulary Description Language, also known as RDF Schema). The intention of the "RDF-based version" was to define the abstract model entirely in terms of RDF while maintaining the "interface" to constructs defined in the third component of the 2007 model, the Description Set Model. As of 2010, this work remains at the stage of an early draft.
DC Application Profiles and Description Set Profiles
In 2000, the notion of an "application profile" was put forward and quickly became a central point of reference for the Dublin Core community. As originally proposed, an application profile was the specification of a particular pattern of "elements" and "encoding schemes" "used" in the context of a particular application or to describe a particular type of resource. In practice, this very general conceptualization was interpreted in a multiplicity of ways, resulting in a wide range of incompatible constructs.
The DCMI Abstract Model, with its formalization of an abstract syntax, provided the basis for a number of documents which sought to provide a formal specification of the "DC application profile" concept:
-
A definition of a constraint language,
"Description Set Profiles" [9];
-
The
"Singapore Framework for Dublin Core Application Profiles" (2007) [10]; and
-
Application profile review criteria used by the DCMI Usage Board [11].
A user-oriented document,
"Guidelines for Dublin Core Application Profiles", was written to guide users through the process of designing and creating application profiles on the basis of the DCMI Abstract Model [16].
In addition to the XML syntax and RDF representations specified in the Description Set Profile document itself, a wiki syntax [16a] was also developed, together with a MoinMoin extension [16b] which generated both a tabular human-readable view and an XML representation of an application profile. The Description Set Profile was intended to be used for applications such as the automatic configuration of metadata editing tools and the generation of schemas for document validation.
Other approaches to the "structural constraints"/"validation" question have been explored within the Semantic Web community:
-
In 2005, Dan Brickley put forward a proposal for conceptualizing
Dublin Core application profiles as "query profiles" [13].
-
In 2007, Alistair Miles proposed
"Son of Dublin Core", a draft approach for encoding and validating "graph-based metadata" using a concrete XML syntax and language for expressing application-specific syntax constraints over a metadata graph [14].
-
In 2009, at the Bristol Vocamp, Dave Reynolds noted
the use of OWL as a constraint language [14a]: "there is no problem at all with creating tools which make a closed world and unique name assumption for the purposes of data validation. They aren't violating the OWL semantics, so long as they don't purport to be doing OWL consistency checking, they are doing a different job but a useful one."
Relationship of the DCMI Abstract Model to RDF
While the 2009 draft "RDF-based" revision of the DCMI Abstract Model was never developed beyond the outline stage, this discussion paper uses its basic ideas as a starting point. Specifically, this paper ignores the Resource Model and Vocabulary Model defined in the 2007 Abstract Model and focuses exclusively on its centerpiece: the Description Set Model. As a guide to how the constructs of this model translate into RDF, this paper additionally follows the 2008 guidelines,
"Expressing Dublin Core metadata using the Resource Description Framework" (referred to hereafter by its short name, "DC-RDF") [18].
In the 2007 Abstract Model, the Description Set Model specifies both a set of syntactic elements (things found in data) and a set of referents in the real world (things to which the syntactic elements may be interpreted to refer). If the DCMI Abstract Model is to be based on the RDF abstract syntax, we can limit our analysis here to the syntactic elements. These include grouping constructs (Description Set, Description, Statement, Non-Literal Value Surrogate, and Literal Value Surrogate) and slots for URIs and character strings (Described Resource URI, Property URI, Value URI, Vocabulary Encoding Scheme URI, Syntax Encoding Scheme URI, Value String Language, Plain Value String, and Typed Value String). One might think of these slots as components of the DCAM abstract syntax that can be tested. In the 2007 specification, these syntactic elements are described using UML, but they are more popularly depicted in the form of a nested metadata template, as in Figure 1 below.
Figure 1: Description Set Model (part of DCMI Abstract Model)
As an illustration of how the syntactic elements of the Description Set Model are used, Figure 2 shows a set of example information values in the placeholders corresponding to those shown in Figure 1.
Figure 2: Description Set Model slots with example URIs and character strings
How the elements of the Description Set Model relate to RDF is roughly visualized in Figure 3 and described in more detail in Appendix B below.
Figure 3: Relationship of Description Set Profile components to RDF graphs
The Description Set Profile constraint language
The March 2008 specification
"Description Set Profiles: a constraint language for Dublin Core application profiles" (hereafter DC-DSP) [9] provides a language for specifying a set of constraints on the "description set" construct defined by the DCMI Abstract Model. The word "constraints" evokes the notion that the set of possible ways that the slots defined by the Description Set Model may be filled is infinite, and these infinite possibilities are being configured, or "constrained", in specific ways for specific content.
In the terminology of the DC-DSP specification, sets of constraints are expressed in "templates". Templates use constraints to specify some community- or application-specific rules for the contents of a description set: first, the descriptions, description by description (Description Templates); then within each description, statement by statement (Statement Templates); and within each statement, slot by slot (Constraints).
Conceptually, templates are like cookie cutters for mass-producing actual descriptions and statements in real instance metadata. Actual descriptions and statements in real instance metadata, in turn, are conceptualized as "matching" specific templates according to a matching algorithm.
Very broadly, DC-DSP provides a language for specifying things such as:
-
Minimum and maximum allowable occurrences of actual descriptions matching a given Description Template within a Description Set.
-
Whether descriptions matching a given Description Template may stand alone within a Description Set or whether their presence depends on the presence of descriptions matching another Description Template (example: "no stand-alone descriptions of authors in the absence of a description of the book they wrote").
-
Minimum and maximum allowable occurrences of actual statements matching a given Statement Template within a Description.
-
For slots designed to hold URIs -- such as Property URI, Value URI, and Vocabulary Encoding Scheme URI -- whether it is mandatory, optional, or disallowed that the given slot be filled in a given Statement Template, or a list of URIs that may be used in the given slot (as in: "the slot labeled Property URI must contain one of these URIs").
-
For slots designed to hold character strings or language tags, whether it is mandatory, optional, or disallowed that the given slot be filled in a given Statement Template, or a list of character strings that may be used in the given slot.
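The kinds of constraints listed above can be illustrated with a small sketch. This is not the DC-DSP matching algorithm: the class `StatementTemplate`, the function `check_description`, and the representation of statements as (property URI, value) pairs are hypothetical simplifications introduced here, and the sketch assumes that a statement matching no template counts as a violation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StatementTemplate:
    """Hypothetical, simplified stand-in for a DC-DSP Statement Template."""
    property_uris: List[str]          # the Property URI slot must hold one of these
    min_occurs: int = 0               # minimum matching statements per description
    max_occurs: Optional[int] = None  # None means no upper bound


def check_description(statements: List[Tuple[str, str]],
                      templates: List[StatementTemplate]) -> List[str]:
    """Return a list of violations; an empty list means the description conforms.

    `statements` is a simplified list of (property URI, value) pairs.
    """
    problems = []
    allowed = {uri for t in templates for uri in t.property_uris}
    # Assumption of this sketch: every statement must match some template.
    for prop, _value in statements:
        if prop not in allowed:
            problems.append(f"statement with property {prop} matches no template")
    # Occurrence constraints, template by template.
    for t in templates:
        n = sum(1 for prop, _value in statements if prop in t.property_uris)
        if n < t.min_occurs:
            problems.append(f"expected at least {t.min_occurs} statement(s) "
                            f"for {t.property_uris}, found {n}")
        if t.max_occurs is not None and n > t.max_occurs:
            problems.append(f"expected at most {t.max_occurs} statement(s) "
                            f"for {t.property_uris}, found {n}")
    return problems


TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

# Illustrative profile: exactly one title, any number of creators.
templates = [
    StatementTemplate([TITLE], min_occurs=1, max_occurs=1),
    StatementTemplate([CREATOR]),
]

ok = check_description(
    [(TITLE, "An Example Book"), (CREATOR, "http://example.org/person/1")],
    templates,
)
bad = check_description([(CREATOR, "http://example.org/person/1")], templates)
```

Here `ok` is empty, while `bad` reports the missing title. A fuller treatment would also cover Description Templates, stand-alone rules, and the per-slot constraints on value strings and language tags.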
DCAM in 2010
It is difficult to track the use of freely available specifications once they are released on the Web, but as of 2010, DCMI is not aware that any of the Abstract-Model-related specifications, with the possible exception of specific syntax guidelines, have been widely implemented.
Rather than building a bridge from more traditional metadata communities to the Semantic Web, the Abstract Model appears to have fallen between two stools -- its use of the "description set" abstraction perplexing to users more accustomed to metadata specifications defined in terms of a concrete syntax, and its added layer of Dublin-Core-specific terminology confusing to users comfortable with the RDF model.
Since 2006, however, the rapid success of Linked Data has given the notion of a Semantic Web, based on the Resource Description Framework (RDF), wider visibility and acceptance. If Linked Data is crossing the chasm to widespread deployment, and the conceptual model of RDF is reaching a wider community, is there still a need for the bridge that the DCMI Abstract Model was intended to be?
As of 2010, moreover, new Semantic Web specifications such as Simple Knowledge Organization System (SKOS) address issues that overlap with those of the DCMI Abstract Model.
Discussions in the Semantic Web community about a new version of RDF ("RDF 2") point towards further developments in the core Semantic Web standards, such as Named Graphs, that parallel some of the more innovative features of the DCMI Abstract Model [17]. To what extent can the DCMI Abstract Model already be expressed in terms of newly-mainstream Linked Data concepts? If aspects of the DCMI Abstract Model still cannot be expressed with more mainstream concepts, what are the prospects for being able to do so in the foreseeable future and, more to the point, what steps should be taken in the meantime to revise, deprecate, or replace the DCMI Abstract Model?
The question does not affect just the Abstract Model, but the suite of related specifications, syntax guidelines, and user documentation built on the Abstract Model. What requirements are reflected in this considerable body of work, which of these requirements now appear to be most important, and how should we best proceed to address those requirements?
Scenarios for the future of DCAM
Scenario 1. DCMI carries on developing DCAM as before
-
DCMI carries on developing DCAM as before, incrementally improving the DCAM and Description Set Profile specifications, with a work plan for developing further concrete syntaxes based on DCAM.
-
Questions:
-
Is there a demonstrated interest?
-
Who would edit the specs?
-
How would testing and review be managed?
Scenario 2a. DCMI develops a "DCAM 2" spec as the basis for new work
-
DCMI develops a "DCAM 2" specification -- simplified and better aligned with RDF
-
In Variant 2a, the improved DCAM 2 specification would be taken as the new basis
-
for the Description Set Profile language of structural constraints for application profiles
-
for a workplan to develop new and improve existing concrete syntaxes on the basis of "DCAM 2".
-
Questions:
-
Is there a demonstrated interest in "DCAM 2"?
-
As in Scenario 1 above: Who would edit the specs? How would review and testing be managed?
-
What would be the impact of "DCAM 2" on specifications in the existing "DCAM family of specifications"?
Scenario 2b. DCMI develops "DCAM 2" as a transitional explanatory document
-
DCMI develops "DCAM 2" as an explanation for how the legacy DCAM model relates to RDF
-
In Variant 2b, "DCAM 2" would serve the purposes of:
-
Clarification, for the Dublin Core community generally and for users of DCAM in particular, of how DCAM relates to RDF and Linked Data
-
Role of a "transitional" specification, to be deprecated over time in favor of RDF
-
No workplan for new concrete syntaxes would be undertaken.
-
Questions:
-
Is there interest in the clarification that a "DCAM 2" spec would provide?
-
Who would edit "DCAM 2"?
-
What should be done with the existing "DCAM family" of specifications? (Currently, most are DCMI Recommendations or DCMI Recommended Resources.)
-
What is an "application profile"? Is it based on DCAM/DSP or on the RDF abstract syntax?
Scenario 3. DCMI deprecates the DCAM abstract syntax and embraces RDF abstract syntax
-
Rather than explain in any detail how the legacy DCAM model relates to RDF, DCMI simply depicts DCAM as a "product of its time" and henceforth promotes the RDF abstract syntax.
-
Questions:
-
Are there current users of DCAM who would be negatively impacted?
-
What should be done with the existing "DCAM family" of specifications (e.g., in terms of status as DCMI Recommendations or Recommended Resources)?
-
What explanation would DCMI provide, particularly with regard to application profiles -- hitherto a central aspect of the DCMI message? What is an "application profile" if it is not based on DCAM, DSP, and the Singapore Framework?
Scenario 4. DCMI does nothing - DCAM is simply left untouched
-
DCMI does nothing to change the statuses of DCAM-related specifications.
-
DCAM and DSP are in effect "frozen" and de-emphasized, with no particular explanation.
-
Scenario 4 is the most "economical" in terms of (human) resources.
-
Questions:
-
If DCMI does not, in fact, stand behind specifications that continue to bear the status of DCMI Recommendations, what would be the cost to DCMI in terms of credibility?
General issues for discussion
DCAM abstract syntax versus the RDF abstract syntax
-
Should DCAM dissolve into mainstream RDF? For example:
-
Are Descriptions and Description Sets expressible as Named Graphs?
-
Are there significant differences between Vocabulary Encoding Schemes and SKOS Concept Schemes?
-
Do aspects of the DCAM mapping to RDF need to be revisited (e.g., the use of rdf:value for value strings associated with object nodes, as opposed to skos:prefLabel, rdfs:label, foaf:name, skos:notation, or dcterms:title)?
Application Profiles
-
Does RDF need a notion of Application Profiles?
-
If so, what are the requirements?
-
Do application profiles need to express constraints?
-
If not with DCAM, how should patterns of constraints at the level of RDF graphs be expressed?
-
Using syntax pattern checks (patterns "in the graph" rather than "in the world") along the lines of the Description Set Profile constraint language? Or might it be enough to use SPARQL query patterns?
-
Using OWL applied with closed-world assumptions?
-
Is it useful, as in the Singapore Framework, to distinguish strongly between declared vocabularies and declared vocabularies as used and constrained in data formats?
-
Should constraints be wired into the formal specifications of vocabularies?
-
Or should constraints be expressed as patterns matched to the data?
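To make the option of "patterns matched to the data" more concrete, the following sketch checks a pattern "in the graph" under closed-world and unique-name assumptions. It is purely illustrative: the graph is modelled as a plain set of string triples rather than with any RDF library, the function names are hypothetical, and the class URI `http://example.org/terms/Book` is invented for the example. A comparable check could presumably be written as a SPARQL query.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all as plain strings

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def subjects_of_type(graph: Set[Triple], cls: str) -> Set[str]:
    """Subjects explicitly typed as `cls` in the graph (no inference is applied)."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


def check_exactly_one(graph: Set[Triple], cls: str, prop: str) -> List[str]:
    """Closed-world check: each instance of `cls` has exactly one value for `prop`.

    A missing value is a violation (closed world), and two different
    values are counted as two values (unique names).
    """
    violations = []
    for s in sorted(subjects_of_type(graph, cls)):
        values = {o for s2, p, o in graph if s2 == s and p == prop}
        if len(values) != 1:
            violations.append(f"{s} has {len(values)} values for {prop}")
    return violations


BOOK = "http://example.org/terms/Book"   # hypothetical class URI
TITLE = "http://purl.org/dc/terms/title"

graph = {
    ("http://example.org/book/1", RDF_TYPE, BOOK),
    ("http://example.org/book/1", TITLE, "An Example Book"),
    ("http://example.org/book/2", RDF_TYPE, BOOK),  # no title given
}

violations = check_exactly_one(graph, BOOK, TITLE)
```

Under these assumptions the second book is reported because the graph contains no title for it. Under OWL's open-world semantics the same graph would simply be incomplete, not inconsistent.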
Appendix A. The DCAM family of specifications
-
Description Set Profiles: A constraint language for Dublin Core Application Profiles
-
DCMI Usage Board Criteria for the Review of Application Profiles
-
Expressing Dublin Core metadata using the Resource Description Framework (RDF)
-
Expressing Dublin Core Description Sets using XML (DC-DS-XML)
-
Expressing Dublin Core metadata using HTML/XHTML meta and link elements (DC-HTML)
Appendix B. Relationship of the Description Set Model to RDF
It is worth noting that the Dublin Core "grammar" for "statements" of circa 2000, pictured in Figure 4, was in part an early attempt to popularize the notion of metadata as being based on meaningful "statements" by way of analogy to commonly understood notions of natural-language grammar.
Figure 4: Early "grammar" of Dublin Core "statements" (circa 2000)
As part of the model, a rough mapping of the Dublin Core "grammar" to RDF "statements" was provided, as pictured in Figure 5.
Figure 5: Mapping of early Dublin Core "grammar" to RDF statements (circa 2000)
This appendix examines how the syntactic elements of the Description Set Model, the central component of the DCMI Abstract Model of 2005-2007, relate to RDF -- both as it currently exists and as it is likely to evolve in the medium term. Note that the 2007 abstract model defines these elements both as syntactic elements and in terms of the "things in the world" to which these elements refer. For simplicity of exposition, this analysis focuses exclusively on the syntactic elements.
Grouping constructs in the Description Set Model
The grouping constructs in the Description Set Model are:
-
Description Set: A set of one or more Descriptions.
-
Description: A set of statements about one, and only one, resource. Syntactically, the Description consists of an optional Described Resource URI plus one or more Statements.
-
Statement: A Property URI slot plus a set of slots grouped either in a Non-Literal Value Surrogate or a Literal Value Surrogate.
-
Non-Literal Value Surrogate: A set of slots consisting of any combination of zero or one Value URI, zero or one Vocabulary Encoding Scheme URI, and zero or more Value Strings, each of which may be a Plain Value String (with an optional Value String Language) or a Typed Value String (with a Syntax Encoding Scheme URI).
-
Literal Value Surrogate: A slot or set of slots consisting of either a Plain Value String (with an optional Value String Language), or a Typed Value String (with a Syntax Encoding Scheme URI).
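The grouping constructs above can be sketched as nested data types. The following Python sketch is an illustration only and not part of any DCMI specification: the class and field names are assumptions chosen to mirror the slot names, and the `example.org` URIs are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ValueString:
    """A Plain Value String (optional language) or a Typed Value String."""
    text: str
    language: Optional[str] = None                    # Value String Language
    syntax_encoding_scheme_uri: Optional[str] = None  # Syntax Encoding Scheme URI

    def __post_init__(self) -> None:
        # A value string is either plain (optional language tag) or typed, not both.
        if self.language and self.syntax_encoding_scheme_uri:
            raise ValueError("a value string cannot be both plain and typed")


@dataclass
class LiteralValueSurrogate:
    value_string: ValueString  # exactly one plain or typed value string


@dataclass
class NonLiteralValueSurrogate:
    value_uri: Optional[str] = None                       # zero or one
    vocabulary_encoding_scheme_uri: Optional[str] = None  # zero or one
    value_strings: List[ValueString] = field(default_factory=list)  # zero or more


@dataclass
class Statement:
    property_uri: str
    value_surrogate: Union[LiteralValueSurrogate, NonLiteralValueSurrogate]


@dataclass
class Description:
    statements: List[Statement]                   # one or more
    described_resource_uri: Optional[str] = None  # optional

    def __post_init__(self) -> None:
        if not self.statements:
            raise ValueError("a description needs at least one statement")


@dataclass
class DescriptionSet:
    descriptions: List[Description]  # one or more

    def __post_init__(self) -> None:
        if not self.descriptions:
            raise ValueError("a description set needs at least one description")


# Illustrative description set with one description and two statements.
example = DescriptionSet([
    Description(
        described_resource_uri="http://example.org/book/1",
        statements=[
            Statement(
                "http://purl.org/dc/terms/title",
                LiteralValueSurrogate(ValueString("An Example Book", language="en")),
            ),
            Statement(
                "http://purl.org/dc/terms/creator",
                NonLiteralValueSurrogate(
                    value_uri="http://example.org/person/1",
                    value_strings=[ValueString("A. Author")],
                ),
            ),
        ],
    )
])
```

The cardinalities in the list above appear here as `Optional` fields, lists, and the checks in `__post_init__`.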
Slots in the Description Set Model
The slots in the Description Set Model map to constructs in the
RDF abstract syntax [19]. Following the
DC-RDF guidelines [18], these slots may be characterized in terms of the RDF abstract syntax as follows:
-
Described Resource URI: In RDF terms, the URI in the Described Resource URI slot identifies the resource that is the subject of a Description. When a URI is present in this slot, that URI is the subject of triples about the described resource. In the absence of a Described Resource URI (i.e., the slot is empty), a blank node is the subject of triples about the described resource.
-
Property URI: A URI identifying a property that is the predicate of an RDF triple about the described resource.
-
Value URI: A URI identifying a resource that is the object of an RDF triple about the described resource. In the absence of a Value URI (i.e., the slot is empty), the object is a blank node.
-
Vocabulary Encoding Scheme URI: A URI identifying an enumerated set of resources ("Vocabulary Encoding Scheme") to which the object of the RDF triple about the described resource belongs. This is expressed with a triple in which the subject is a reference (whether a blank node or URI) to the value resource, the predicate is the property "dcam:memberOf", and the object is the URI in the Vocabulary Encoding Scheme URI slot.
-
Plain Value String: A character string with an optional language tag; in RDF terms, a plain literal. For a Value String within a Literal Value Surrogate, the plain literal is directly the object of an RDF triple about the described resource. For a Value String within a Non-Literal Value Surrogate, the plain literal is the object of an RDF triple in which the predicate is the property "rdf:value", and the subject is a reference (whether a blank node or URI) to the value resource.
-
Value String Language: The language tag used together with the character string in the Plain Value String slot to form an RDF literal.
-
Typed Value String: A character string associated with exactly one URI in the Syntax Encoding Scheme URI slot; in RDF terms, a typed literal. For a Value String within a Literal Value Surrogate, the typed literal is directly the object of an RDF triple about the described resource. For a Value String within a Non-Literal Value Surrogate, the typed literal is the object of an RDF triple in which the predicate is the property "rdf:value", and the subject is a reference (whether a blank node or URI) to the value resource.
-
Syntax Encoding Scheme URI: A URI identifying an RDF datatype, which for legacy reasons is known in the DCMI Abstract Model as a "syntax encoding scheme".
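The slot-by-slot mapping above can be sketched as a small function that turns one Description into RDF triples. This is an illustration under stated assumptions, not the DC-RDF specification itself: descriptions are modelled as plain dictionaries with hypothetical keys, RDF terms as tagged tuples, and the `example.org` URIs are invented. No RDF library is used.

```python
from itertools import count
from typing import Iterator, List, Optional, Tuple

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
DCAM_MEMBER_OF = "http://purl.org/dc/dcam/memberOf"

# Terms: ("uri", u), ("bnode", label) or ("literal", text, language, datatype)
Term = Tuple
Triple = Tuple[Term, Term, Term]


def node(uri: Optional[str], bnodes: Iterator[int]) -> Term:
    """A URI node if the slot is filled, otherwise a fresh blank node."""
    return ("uri", uri) if uri else ("bnode", f"b{next(bnodes)}")


def literal(value_string: dict) -> Term:
    """Plain literal (optional language) or typed literal (datatype URI)."""
    return ("literal", value_string["text"],
            value_string.get("language"), value_string.get("datatype"))


def description_to_triples(description: dict, bnodes: Iterator[int]) -> List[Triple]:
    triples: List[Triple] = []
    # Described Resource URI, or a blank node when the slot is empty.
    subject = node(description.get("described_resource_uri"), bnodes)
    for st in description["statements"]:
        predicate = ("uri", st["property_uri"])
        if "literal" in st:
            # Literal Value Surrogate: the literal is directly the object.
            triples.append((subject, predicate, literal(st["literal"])))
            continue
        # Non-Literal Value Surrogate: Value URI, or a blank node when empty.
        value = st["non_literal"]
        obj = node(value.get("value_uri"), bnodes)
        triples.append((subject, predicate, obj))
        ves = value.get("vocabulary_encoding_scheme_uri")
        if ves:
            triples.append((obj, ("uri", DCAM_MEMBER_OF), ("uri", ves)))
        for vs in value.get("value_strings", []):
            triples.append((obj, ("uri", RDF_VALUE), literal(vs)))
    return triples


book = {
    "described_resource_uri": "http://example.org/book/1",
    "statements": [
        {"property_uri": "http://purl.org/dc/terms/title",
         "literal": {"text": "An Example Book", "language": "en"}},
        {"property_uri": "http://purl.org/dc/terms/subject",
         "non_literal": {
             "vocabulary_encoding_scheme_uri": "http://example.org/schemes/topics",
             "value_strings": [{"text": "Metadata"}]}},
    ],
}

triples = description_to_triples(book, count(1))
```

For this input the function yields four triples: the title literal, a subject triple whose object is a blank node, a `dcam:memberOf` triple for that node, and an `rdf:value` triple carrying the value string.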
References
-
[1] http://dublincore.org/documents/2005/03/07/abstract-model/
-
[2] http://dublincore.org/documents/2007/06/04/abstract-model/
-
[4] http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/
-
[5] http://dublincore.org/usage/documents/2003/02/07/principles/
-
[7b] http://dublincore.org/documents/2002/05/15/dcq-rdf-xml/
-
[8] http://dublincore.org/documents/abstract-model/#app-a (relationship to legacy terminology)
-
[11] http://dublincore.org/documents/2009/03/02/profile-review-criteria/
-
[12] http://dublincore.org/documents/interoperability-levels/
-
[13] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0509&L=DC-RDF-TASKFORCE&P=R2034&I=-3
-
[14] http://web.archive.org/web/20080214232032/http://isegserv.itd.rl.ac.uk/sodc/SODC-0_2/
-
[16] http://dublincore.org/usage/documents/profile-guidelines/
-
[16a] http://dublincore.org/documents/2008/10/06/dsp-wiki-syntax/
-
[16b] http://dublincore.org/documents/2008/10/06/dsp-wiki-syntax/DescriptionSetProfile-dist.zip
-
[22] http://www.w3.org/2001/sw/wiki/index.php?title=RDF_Core_Work_Items&oldid=1990#Graph_Identification




