Dublin Core metadata in RDF: implications of new guidelines for legacy implementations
About this note
DCMI is currently considering the assignment of domains and ranges to
DCMI metadata terms. Such a step would have important implications for
the interpretation of legacy metadata. This note presents a high-level
view of the issue and its implications. No such changes
will be undertaken by DCMI until their impact has been well understood
and discussed in a public comment period. Implementers
with an opinion about the issues presented here are invited to
participate in discussion on the DCMI Architecture Working Group mailing
list (
http://www.jiscmail.ac.uk/lists/dc-architecture.html).
The addition of domains and ranges would help clarify the semantics of DCMI properties in a formal sense. It should be noted, however, that this would have practical consequences only for the creation and interpretation of Dublin Core metadata in RDF. Metadata creators would need to add a few extra angle brackets to ensure that RDF-consuming applications interpret value strings as properties of nodes, and metadata consumers might need to "special-case" the processing of value strings associated directly with Dublin Core properties (i.e., without intervening nodes). The generation of Dublin Core metadata in RDF would become slightly more complex for anyone producing metadata by hand. However, these measures would eliminate the current ambiguity, so that metadata could be mapped more consistently to the DCMI Abstract Model (DCAM) and tools could offer better support thanks to machine-processable restrictions.
The expression of Dublin Core metadata in the other formats
recommended by DCMI -- i.e., "Expressing Dublin Core in HTML/XHTML meta
and link elements" (
http://dublincore.org/documents/dcq-html) and the existing "Guidelines for implementing Dublin Core in XML" (
http://dublincore.org/documents/dc-xml-guidelines)
-- would not be negatively affected by these developments. The rules
for interpreting metadata in these syntaxes in terms of the DCAM are
simpler than for RDF, as they are not bound by the semantics of RDF.
Historical context
Since 1997, the "Dublin Core data model" has evolved in a process of
mutual influence with W3C's Resource Description Framework (RDF). This
process has resulted in the DCMI Abstract Model, which was published in
2005 as a DCMI Recommendation ([http://dublincore.org/documents/abstract-model/ (1)]). The DCMI Abstract Model now provides a reference model on the basis of which particular DC encoding guidelines (2) can be defined.
DCMI currently has two specifications for expressing Dublin Core
metadata in RDF. The first, "Expressing Simple Dublin Core in RDF/XML",
or "DC-Simple-in-RDF" for short (3), became a DCMI Recommendation in 2002. The second, "Expressing Qualified Dublin Core in RDF/XML", or "DC-Qualified-in-RDF" (4), has been a DCMI Proposed Recommendation since 2002.
Current developments
The RDF Task Force of the DCMI Architecture Working Group is currently drafting a document which is intended to replace these two legacy specifications with a single consolidated and updated DCMI Recommendation for expressing Dublin Core in RDF ([http://dublincore.org/architecturewiki/DCRDFGuidelines (5)]).
This process has important implications for how DCMI "defines" its
metadata terms. DCMI metadata terms have hitherto been defined entirely
in natural language; the RDF expression of the DCMI term set (e.g.,
(6))
served essentially to convey these English-language definitions in a
form ingestible by RDF applications. As part of the process of
clarifying the RDF expression for Dublin Core metadata, the RDF Task
Force has recommended that DCMI supplement these English-language
definitions with machine-understandable definitions of the "domain" and
"range" of DCMI metadata terms ([http://dublincore.org/architecturewiki/DCPropertyDomainsRanges
(5)]). Such additional, machine-understandable precision is necessary
as Dublin Core is deployed in the context of inference engines and
ontology-based solutions.
For most DCMI metadata terms, the process of specifying domains and ranges in machine-understandable form is straightforward and unambiguous. However, one problem with regard to legacy metadata usage is serious enough to bear closer consideration. In the early years, the Dublin Core community distinguished between Simple and Qualified Dublin Core -- a distinction which was reflected in the difference between the specifications "DC-Simple-in-RDF" and "DC-Qualified-in-RDF".
The two legacy specifications differ with regard to whether properties such as dc:creator and dc:date have values that are resources (e.g., a Person or a Date, seen as entities), or strings representing those resources (i.e., value strings). In "DC-Simple-in-RDF", a dc:creator is a name:
<http://www.example.com> dc:creator "John Smith" .
In "DC-Qualified-in-RDF", in contrast, a dc:creator is an entity, as in:
<http://www.example.com> dc:creator <http://www.example.org/person32> .
or
<http://www.example.com> dc:creator _:xxx .
_:xxx rdf:type foaf:Person .
_:xxx dcrdf:valueString "John Smith" .
These two contrasting approaches may be pictured as follows:
[Figure 1: dc:creator with a literal value] [Figure 2: dc:creator with an entity value]
The current draft DC-in-RDF specification under development follows the latter approach -- dc:creator refers to an entity which can be identified (e.g., in an authority file) and described in its own right (e.g., with a name, an affiliation, and a birth date). The English-language definitions of these terms bear out this interpretation; dc:creator is "an entity primarily responsible for making the content of the resource", examples being "a person, an organization, or a service". However, the usage comments associated with these definitions also reflect the ambiguity: "Typically, the name of a Creator should be used to indicate the entity".
In accordance with the current approach, the DCMI Usage Board would assign a range of "Agent" to dc:creator and dc:contributor, where "Agent" would be defined as "the class of all things that are a Person, Organization, or Service". If it is used at all, the range "Literal" would apply only to metadata terms which are typically associated with value strings, such as dc:title. In most cases, the ranges to be defined are reasonably obvious given usage patterns in practice.
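To illustrate what a declared range would mean for machine processing, the following is a minimal Python sketch of how an RDFS-style application might act on it. It is not part of any DCMI specification. Triples are modelled as plain tuples, the `Literal` class and the `apply_ranges` function are hypothetical helpers, and the IRI used for the "Agent" class is a placeholder, since no IRI for that class is given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal (a value string). IRIs and blank nodes are plain str."""
    value: str


DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: placeholder IRI for the proposed "Agent" class.
AGENT = "http://example.org/terms/Agent"

# Hypothetical range declarations of the kind discussed above.
RANGES = {
    DC + "creator": AGENT,
    DC + "contributor": AGENT,
}


def apply_ranges(triples):
    """Return (inferred_type_triples, conflicting_triples).

    For each statement whose property has a declared range, a non-literal
    value is inferred to be an instance of the range class. A literal
    value is reported as a conflict instead, because a value string is
    not itself an Agent.
    """
    inferred, conflicts = [], []
    for s, p, o in triples:
        cls = RANGES.get(p)
        if cls is None:
            continue  # no range declared for this property
        if isinstance(o, Literal):
            conflicts.append((s, p, o))
        else:
            inferred.append((o, RDF_TYPE, cls))
    return inferred, conflicts


data = [
    ("http://www.example.com", DC + "creator", "http://www.example.org/person32"),
    ("http://www.example.com", DC + "creator", Literal("John Smith")),
    ("http://www.example.com", DC + "title", Literal("An example page")),
]
inferred, conflicts = apply_ranges(data)
```

In this sketch the statement pointing at `person32` yields a type statement, the literal "John Smith" is flagged, and dc:title is ignored because no range is declared for it.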
Due to the ambiguous usage of dc:creator and dc:contributor over the years, however, the assignment of any range would make one or another part of the legacy metadata appear invalid in the context of machine processing. Declaring "Agent" as the range of dc:creator will mean that inferencing applications will expect to treat the value of the dc:creator property as an entity. Where metadata represents names as literal values for dc:creator, applications will need to treat these as "special cases" in order to merge them with metadata which associates those names with the expected entity constructs.
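One way a consuming application might implement such special-casing is sketched below in Python, under stated assumptions. Each literal value of a listed property is moved onto a fresh blank node carrying the value string, mirroring the blank-node pattern shown earlier. The `Literal` class and `lift_literal_values` function are hypothetical, and `dcrdf:valueString` is kept as a prefixed name because its namespace IRI is not given here.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal (a value string). IRIs and blank nodes are plain str."""
    value: str


DC = "http://purl.org/dc/elements/1.1/"

# Assumption: kept as a prefixed name; the namespace IRI is not specified.
VALUE_STRING = "dcrdf:valueString"


def lift_literal_values(triples, properties):
    """Rewrite literal values of the given properties as intervening nodes.

    (s, p, "name") becomes (s, p, _:bN) plus (_:bN, dcrdf:valueString, "name").
    No rdf:type is added, because the literal alone does not say whether
    the agent is a person, an organization, or a service.
    Note: the generated labels _:b1, _:b2, ... are assumed not to clash
    with blank node labels already present in the input.
    """
    ids = count(1)
    out = []
    for s, p, o in triples:
        if p in properties and isinstance(o, Literal):
            node = f"_:b{next(ids)}"
            out.append((s, p, node))
            out.append((node, VALUE_STRING, o))
        else:
            out.append((s, p, o))
    return out


legacy = [
    ("http://www.example.com", DC + "creator", Literal("John Smith")),
    ("http://www.example.com", DC + "creator", "http://www.example.org/person32"),
]
lifted = lift_literal_values(legacy, {DC + "creator", DC + "contributor"})
```

The sketch only changes the shape of the data. Deciding whether the new node for "John Smith" denotes the same entity as an existing resource is a separate matching problem that it does not attempt.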
The existing DCMI specifications have not taken these applications into account, which has resulted in an unknown amount of Dublin Core-based RDF data that is inconsistent with the definitions of the Dublin Core properties. The RDF Task Force has judged that these changes are necessary, albeit painful, to ensure the long-term viability of Dublin Core in RDF.
References
- Expressing Simple Dublin Core in RDF/XML, DCMI Recommendation, 2002-07-31
- Expressing Qualified Dublin Core in RDF/XML, DCMI Proposed Recommendation, 2002-05-15
- Expressing Dublin Core Metadata in RDF, Working Draft
- Release Notes for the above draft