DCMI Abstract Model
Creator: | Andy Powell, UKOLN, University of Bath, UK; Mikael Nilsson, KMR Group, CID, NADA, KTH (Royal Institute of Technology), Sweden; Ambjörn Naeve, KMR Group, CID, NADA, KTH (Royal Institute of Technology), Sweden; Pete Johnston, UKOLN, University of Bath, UK |
---|---|
Date Issued: | 2005-01-31 |
Identifier: | http://dublincore.org/specifications/dublin-core/abstract-model/2005-01-31/ |
Replaces: | http://dublincore.org/specifications/dublin-core/abstract-model/2004-12-08/ |
Is Replaced By: | http://dublincore.org/specifications/dublin-core/abstract-model/2005-03-07/ |
Latest Version: | http://dublincore.org/specifications/dublin-core/abstract-model/ |
Status of Document: | This is a DCMI Proposed Recommendation. |
Description of Document: | This document describes an abstract model for DCMI metadata descriptions. |
Table of contents
- Introduction
- DCMI abstract model
- Descriptions, description sets and records
- Values
- Dumb-down
- Encoding guidelines
- Terminology
- References
- Acknowledgements
- Appendix A - A note about structured values
- Appendix B - The abstract model and RDF
- Appendix C - The abstract model and XML
- Appendix D - The abstract model and XHTML
1. Introduction
This document specifies an abstract model for DCMI metadata [DCMI]. The primary purpose of this document is to provide a reference model against which particular DC encoding guidelines can be compared. To function well, a reference model needs to be independent of any particular encoding syntax. Such a reference model allows us to gain a better understanding of the kinds of descriptions that we are trying to encode and facilitates the development of better mappings and translations between different syntaxes.
This document is primarily aimed at the developers of software applications that support Dublin Core™ metadata, people involved in developing new syntax encoding guidelines for Dublin Core™ metadata and those people developing metadata application profiles based on the Dublin Core™.
2. DCMI abstract model
The abstract model of the resources being described by DCMI metadata descriptions is as follows:
- Each resource has zero or more property/value pairs.
- Each property/value pair is made up of one property and one value.
- Each value is a resource (the physical or conceptual entity that is associated with a property when it is used to describe a resource).
- Each resource may be a member of one or more classes.
- Each property and class has some declared semantics.
- Each class may be related to one or more other classes by a refines (sub-class) relationship (where the two classes share some semantics such that all resources that are members of the sub-class are also members of the related class).
- Each property may be related to one or more other properties by a refines (sub-property) relationship (where the two properties share some semantics such that whenever a resource is related to a value by the sub-property, it follows that the resource is also related to that same value by the property).
The abstract model of DCMI metadata descriptions is as follows:
- A description is made up of one or more statements (about one, and only one, resource) and zero or one resource URI (a URI reference that identifies the resource being described).
- Each statement instantiates a property/value pair and is made up of a property URI (a URI reference that identifies a property), zero or one value URI (a URI reference that identifies a value of the property), zero or one vocabulary encoding scheme URI (a URI reference that identifies the class of the value) and zero or more value representations of the value.
- The value representation may take the form of a value string or a rich representation.
- Each value string is a simple, human-readable string that is a representation of the resource that is the value of the property.
- Each value string may have an associated syntax encoding scheme URI that identifies a syntax encoding scheme.
- Each value string may have an associated value string language that is an ISO language tag (e.g. en-GB).
- Each rich representation is some marked-up text, an image, a video, some audio, etc. or some combination thereof that is a representation of the resource that is the value of the property.
- Each value may be the subject of a separate related description.
The italicized words and phrases used above are defined in the terminology section below. A number of things about the model are worth noting:
- A related description describes a related resource and is therefore not part of the description - for example, a related description may provide metadata about the person that is the creator of the described resource.
- Syntax encoding schemes are also known as 'datatypes' in some contexts.
- Each resource may be a member of one or more classes. Note that where the resource is a value, the class is referred to as a vocabulary encoding scheme.
- In DCMI metadata descriptions, the class of the resource being described is normally indicated by the value of the DC Type property.
The DCMI abstract model for resources and descriptions is represented as UML class diagrams [UML] in figures 1 and 2.
**Figure 1 - the DCMI resource model**
**Figure 2 - the DCMI description model**
Readers who are not familiar with UML class diagrams should note that lines ending in a block-arrow should be read as 'is' or 'is a' (for example, 'a vocabulary encoding scheme is a class') and that lines starting with a block-diamond should be read as 'contains a' or 'has a' (for example, 'a statement contains a property URI'). Other relationships are labeled appropriately. The classes represented by the clear boxes are not mentioned explicitly in the textual description of the abstract model above but are discussed in Appendix A. Note that the UML modeling used here shows the abstract model but is not intended to form a suitable basis for the development of DCMI software applications.
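As an informal illustration only, the description model listed above can be written down as a handful of data structures. The sketch below is an assumption-laden reading of the textual model, not a DCMI-endorsed design (the UML itself is explicitly not intended as a basis for software). All class and field names are hypothetical, rich representations are simplified to raw bytes, and related descriptions are held as links from the statement whose value they describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueString:
    """A simple, human-readable string that represents the value."""
    text: str
    syntax_encoding_scheme_uri: Optional[str] = None  # zero or one
    language: Optional[str] = None  # language tag such as "en-GB"


@dataclass
class Statement:
    """Instantiates one property/value pair."""
    property_uri: str  # exactly one
    value_uri: Optional[str] = None  # zero or one
    vocabulary_encoding_scheme_uri: Optional[str] = None  # zero or one
    # Zero or more value representations (strings or rich representations).
    value_strings: List[ValueString] = field(default_factory=list)
    rich_representations: List[bytes] = field(default_factory=list)  # simplification
    # Related descriptions describe the value; they are linked, not contained.
    related_descriptions: List["Description"] = field(default_factory=list)


@dataclass
class Description:
    """One or more statements about one, and only one, resource."""
    statements: List[Statement]
    resource_uri: Optional[str] = None  # zero or one

    def __post_init__(self) -> None:
        if not self.statements:
            raise ValueError("a description needs at least one statement")


# Hypothetical example: a date statement about a made-up resource.
description = Description(
    resource_uri="http://example.org/report",
    statements=[
        Statement(
            property_uri="http://purl.org/dc/elements/1.1/date",
            value_strings=[
                ValueString(
                    "2005-01-31",
                    syntax_encoding_scheme_uri="http://purl.org/dc/terms/W3CDTF",
                )
            ],
        )
    ],
)
```

The cardinalities in the comments mirror the "zero or one" and "zero or more" wording of the model; only the "one or more statements" rule is enforced at construction time.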
3. Descriptions, description sets and records
The abstract model described above indicates that each DCMI metadata description describes one, and only one, resource. This is commonly referred to as the one-to-one principle.
However, real-world metadata applications tend to be based on loosely grouped sets of descriptions (where the described resources are typically related in some way), known here as description sets. For example, a description set might comprise descriptions of both a painting and the artist. Furthermore, it is often the case that a description set will also contain a description about the description set itself (sometimes referred to as 'admin metadata' or 'meta-metadata').
Description sets are instantiated, for the purposes of exchange between software applications, in the form of metadata records, according to one of the DCMI encoding guidelines (XHTML meta tags, XML, RDF/XML, etc.) [DCMI-ENCODINGS].
This document defines a description set and a DCMI metadata record as follows:
- A description set is a set of one or more descriptions about one or more resources.
- A DCMI metadata record is a description set that is instantiated according to one of the DCMI encoding guidelines (XHTML meta tags, XML, RDF/XML, etc.)
4. Values
A DCMI metadata value is the physical or conceptual entity that is associated with a property when it is used to describe a resource. For example, the value of the DC Creator property is a person, organization or service - a physical entity. The value of the DC Date property is a point (or range) in time - a conceptual entity. The value of the DC Coverage property may be a geographic region or country - a physical entity. The value of the DC Subject property may be a concept - a conceptual entity - or a physical object or person - a physical entity. Each of these entities is a resource.
The value may be identified using a value URI; the value may be represented by one or more value strings and/or rich representations; the value may have some related descriptions - but the value is a resource.
5. Dumb-down
The notions of 'simple DC' and 'qualified DC' are widely used within DCMI documentation and discussion fora. This document does not present a definitive view of what these phrases mean because their usage is somewhat variable. In general terms, however, the phrase 'simple DC' refers to DC metadata that makes no use of encoding schemes or element refinements and in which each statement contains only a value string. The phrase 'qualified DC' refers to metadata that makes use of all the features of the abstract model described here.
The process of translating qualified DC into simple DC is normally referred to as 'dumbing-down'. The process of dumbing-down can be separated into two parts: property dumb-down and value dumb-down. Furthermore, each of these processes can be approached in one of two ways. Informed dumb-down takes place where the software performing the dumb-down algorithm has knowledge built into it about the property relationships and values being used within a specific DCMI metadata application. Uninformed dumb-down takes place where the software performing the dumb-down algorithm has no prior knowledge about the properties and values being used.
Based on this analysis, it is possible to outline a 'dumb-down algorithm' matrix, shown below:
 | Element dumb-down | Value dumb-down |
---|---|---|
Uninformed | Discard any statement in which the property URI identifies a property that isn't in the Dublin Core™ Metadata Element Set [DCMES]. | Use value URI (if present) or value string as new value string. Discard any related descriptions and rich representations. Discard any encoding scheme URIs. |
Informed | Recursively resolve sub-property relationships until a recognised property is reached and substitute the property URI of that property for the existing property URI in the statement. If no recognised property is reached, then discard the statement. (In many cases, this process stops when a property is reached that is not an element refinement.) | Use knowledge of any rich representations, related descriptions or the value string to create a new value string. |
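The 'Uninformed' row of the matrix can be sketched in a few lines. This is a minimal illustration, assuming a statement is held as a plain dictionary with the hypothetical keys `property_uri`, `value_uri` and `value_string`; any other keys (encoding scheme URIs, rich representations, related descriptions) are simply not copied, which is how they are discarded. Dropping a statement that has neither a value URI nor a value string is an assumption of this sketch, since the matrix does not cover that case.

```python
from typing import Dict, List, Optional

DCMES_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)
DCMES_PROPERTY_URIS = frozenset(DCMES_NS + name for name in DCMES_ELEMENTS)


def uninformed_dumb_down(
    statements: List[Dict[str, Optional[str]]]
) -> List[Dict[str, str]]:
    """Apply uninformed element and value dumb-down to each statement."""
    simple = []
    for statement in statements:
        # Element dumb-down: discard statements whose property is not in DCMES.
        if statement["property_uri"] not in DCMES_PROPERTY_URIS:
            continue
        # Value dumb-down: prefer the value URI, else fall back to the value string.
        new_value_string = statement.get("value_uri") or statement.get("value_string")
        if new_value_string is None:
            continue  # assumption: nothing left to carry over
        # Only the property URI and the new value string survive.
        simple.append(
            {"property_uri": statement["property_uri"], "value_string": new_value_string}
        )
    return simple
```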
Note that software should make use of the DCMI term declarations represented in RDF schema language [DC-RDFS], the DC XML namespace URIs [DC-NAMESPACES] and the appropriate DCMI encoding guidelines (XHTML meta tags, XML, RDF/XML, etc.) [DCMI-ENCODINGS] to automate the resolution of sub-property relationships.
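To make the resolution of sub-property relationships concrete, the following sketch walks a refinement table until it reaches a recognised property. It is illustrative only: the `REFINES` table is a small hand-written excerpt rather than something loaded from the RDF schemas, the function name is hypothetical, and the table assumes each property refines at most one parent (which matches DCMI practice for element refinements but is narrower than the abstract model allows). The recursion is written as a loop with a guard against cyclic declarations.

```python
from typing import Collection, Dict, Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hand-written excerpt of sub-property (refines) relationships.
# A real application would derive this table from the DCMI term declarations.
REFINES: Dict[str, str] = {
    DCTERMS + "created": DC + "date",
    DCTERMS + "abstract": DC + "description",
    DCTERMS + "temporal": DC + "coverage",
}


def resolve_property(
    property_uri: str,
    recognised: Collection[str],
    refines: Dict[str, str] = REFINES,
) -> Optional[str]:
    """Follow refines links until a recognised property is reached.

    Returns the recognised property URI, or None if the statement
    should be discarded.
    """
    seen = set()
    current: Optional[str] = property_uri
    while current is not None and current not in seen:
        if current in recognised:
            return current
        seen.add(current)  # guard against cycles in the table
        current = refines.get(current)
    return None
```

A caller would substitute the returned URI for the statement's existing property URI, and discard the statement when the result is `None`.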
In cases where software is dumbing-down a description set containing multiple descriptions, it may either generate several 'simpler' descriptions (one per description in the original description set) or a single 'simple' description (in which case it will have to determine which is the 'primary' description in the original description set). This is an application-specific decision.
6. Encoding guidelines
Particular encoding guidelines (HTML meta tags, XML, RDF/XML, etc.) [DCMI-ENCODINGS] do not need to encode all aspects of the abstract model described above. However, DCMI recommendations that provide encoding guidelines should refer to the DCMI abstract model and indicate which parts of the model are encoded and which are not. In particular, encoding guidelines should indicate the mechanism by which resource URIs and value URIs are encoded. Note that the abstract model does not indicate that a value string with an associated http://purl.org/dc/terms/URI syntax encoding scheme should be treated as a value URI or resource URI. Encoding guidelines should provide an explicit mechanism for encoding these features of the model. Encoding guidelines should also indicate whether any rich representations or related descriptions associated with a statement are embedded within the record or are encoded in a separate record and linked to it using a URI reference.
Appendices B, C and D below provide a summary comparison between the abstract model and the RDF/XML, XML and XHTML encoding guidelines.
7. Terminology
This document uses the following terms:
- class
- A class is a group containing members that have attributes, behaviours, relationships or semantics in common; a kind of category.
- class URI
- A class URI is a URI reference that identifies a class.
- description
- A description is made up of one or more statements about one, and only one, resource.
- description set
- A description set is a set of one or more descriptions about one or more resources.
- element
- Within DCMI, element is typically used as a synonym for property. However, it should be noted that the word element is also commonly used to refer to a structural markup component within an XML document.
- element refinement
- An element refinement is a property of a resource that shares the meaning of a particular DCMI property but with narrower semantics. Since element refinements are properties, they can be used in metadata descriptions independently of the properties they refine. In DCMI practice, an element refinement refines just one parent DCMI property.
- encoding scheme
- Encoding scheme is the generic name for vocabulary encoding scheme and syntax encoding scheme.
- encoding scheme URI
- The generic name for a vocabulary encoding scheme URI or a syntax encoding scheme URI.
- marked-up text
- A string that contains HTML, XML or other markup (for example TeX) and that is associated with the value of a property.
- property
- A property is a specific aspect, characteristic, attribute, or relation used to describe resources.
- property URI
- A property URI is a URI reference that identifies a single property.
- property/value pair
- A property/value pair is the combination of a property and a value, used to describe a resource.
- qualifier
- Qualifier was the generic name used for the terms that are now usually referred to specifically as element refinements or encoding schemes.
- record
- A record is a description set that is instantiated according to one of the DCMI encoding guidelines (XHTML meta tags, XML, RDF/XML, etc.)
- related description
- A related description is a description of a resource that is related to the resource being described.
- resource
- A resource is anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, concepts and bound books in a library can also be considered resources.
- resource URI
- A resource URI is a URI reference that identifies a single resource.
- rich representation
- Some marked-up text, an image, a video, some audio, etc. (or some combination thereof) that is associated with the value of a property.
- statement
- A statement is made up of a property URI (a URI reference that identifies a property), zero or one value URI (a URI reference that identifies a value of the property), zero or one vocabulary encoding scheme URI (a URI reference that identifies the class of the value) and zero or more value representations of the value.
- structured value
- Structured value is the generic name for the following:
  - A value string that contains machine-parsable component parts (and which has an associated syntax encoding scheme that indicates how the component parts are encoded within the string).
  - Some marked-up text.
  - A related description.
- syntax encoding scheme
- A syntax encoding scheme indicates that the value string is formatted in accordance with a formal notation, such as "2000-01-01" as the standard expression of a date.
- syntax encoding scheme URI
- A syntax encoding scheme URI is a URI reference that identifies a syntax encoding scheme. For all DCMI recommended encoding schemes, the URI reference is constructed by concatenating the name of the encoding scheme with the http://purl.org/dc/terms/ namespace URI.
- term
- The generic name for a property (i.e. element or element refinement), vocabulary encoding scheme, syntax encoding scheme or concept taken from a controlled vocabulary (concept space).
- term URI
- The generic name for a URI reference that identifies a term.
- value
- A value is the physical or conceptual entity that is associated with a property when it is used to describe a resource.
- value URI
- A value URI is a URI reference that identifies the value of a property.
- value representation
- A value representation is a surrogate for (i.e. a representation of) the value.
- value string
- A value string is a simple string that represents the value of a property. In general, a value string should not contain any marked-up text.
- value string language
- The value string language indicates the language of the value string.
- vocabulary encoding scheme
- A vocabulary encoding scheme is a class that indicates that the value of a property is taken from a controlled vocabulary (or concept-space), such as the Library of Congress Subject Headings.
- vocabulary encoding scheme URI
- A vocabulary encoding scheme URI is a URI reference that identifies a vocabulary encoding scheme. For all DCMI recommended encoding schemes, the URI reference is constructed by concatenating the name of the encoding scheme with the http://purl.org/dc/terms/ namespace URI.
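The construction rule stated for syntax and vocabulary encoding scheme URIs amounts to simple string concatenation. A minimal sketch, with a hypothetical function name:

```python
DCTERMS_NS = "http://purl.org/dc/terms/"


def encoding_scheme_uri(scheme_name: str) -> str:
    """Build the URI reference for a DCMI recommended encoding scheme."""
    return DCTERMS_NS + scheme_name


# W3CDTF is a syntax encoding scheme; LCSH is a vocabulary encoding scheme.
w3cdtf_uri = encoding_scheme_uri("W3CDTF")
lcsh_uri = encoding_scheme_uri("LCSH")
```

The rule applies only to DCMI recommended encoding schemes; schemes declared elsewhere carry whatever URI their owner assigns.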
References
- DCMI
- Dublin Core™ Metadata Initiative
<http://dublincore.org/>
- UML
- The Unified Modeling Language User Guide
Grady Booch, James Rumbaugh and Ivar Jacobson, Addison-Wesley, 1998
- DCTERMS
- DCMI Metadata Terms
<http://dublincore.org/specifications/dublin-core/dcmi-terms/>
- DCMES
- Dublin Core™ Metadata Element Set, Version 1.1: Reference Description
<http://dublincore.org/specifications/dublin-core/dces/>
- DCMI-ENCODINGS
- DCMI Encoding Guidelines
<http://dublincore.org/schemas/>
- DC-RDFS
- DCMI term declarations represented in RDF schema language
<http://dublincore.org/schemas/rdfs/>
- DC-NAMESPACES
- Namespace Policy for the Dublin Core™ Metadata Initiative (DCMI)
<http://dublincore.org/specifications/dublin-core/dcmi-namespace/>
Acknowledgements
Thanks to Tom Baker, Rachel Heery, the members of the DC Usage Board and the members of the DC Architecture Working Group for their comments on previous versions of this document.
Appendix A - A note about structured values
This appendix discusses 'structured values', as they are used in DC metadata applications at the time of writing.
Many existing applications of DC metadata have attempted to encode relatively complex 'value representations' (i.e. representations that are not just a simple string). These attempts have been loosely referred to as 'structured values'. It is possible to identify a number of different kinds of structured values that have been commonly used. Four are enumerated below. The first two of these are recommended by the DCMI, in the sense that there are existing encoding schemes that define values that conform to these definitions of structured values. The latter two are not currently recommended, but it is likely that they are in fairly common usage across metadata applications worldwide.
Labeled strings
These are strings that contain explicitly labeled components. Examples of this kind of structured value include:
- DCSV
- and the various DCMI syntax encoding schemes built on it - Period, Point and Box. An example of the use of DCSV in Period is:
<meta name="dcterms:temporal" scheme="dcterms:Period" content="start=Cambrian period; scheme=Geological timescale; name=Phanerozoic Eon;" />
- vCard
- for example:
<meta name="dc:creator" content="BEGIN:VCARD\nORG:University of Oxford\nEND:VCARD\n" />
Note that vCard is not currently a DCMI recommended encoding scheme.
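To show what 'explicitly labeled components' means in practice, the sketch below splits the Period value string from the example above into its labeled parts. It is a deliberately simplified reading, not a conforming DCSV parser: it does not handle any escaping of ';' or '=' inside component values, and treating a component without a label as an error is an assumption of this sketch. The function name is hypothetical.

```python
from typing import Dict


def parse_labeled_string(value_string: str) -> Dict[str, str]:
    """Split 'label=value; label=value;' into a dictionary of components."""
    components: Dict[str, str] = {}
    for part in value_string.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        label, separator, value = part.partition("=")
        if not separator:
            raise ValueError(f"component has no label: {part!r}")
        components[label.strip()] = value.strip()
    return components


period = parse_labeled_string(
    "start=Cambrian period; scheme=Geological timescale; name=Phanerozoic Eon;"
)
```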
Unlabeled strings
These are strings that contain implicit components within the string, i.e. the components are determined based solely on their position within the string. Examples of this kind of structured value include:
- W3CDTF
- the date-time format used within most DC metadata. For example:
<meta name="dc:date" scheme="dcterms:W3CDTF" content="2003-06-10" />
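By contrast with labeled strings, the components of an unlabeled string are recovered purely from their position. The following sketch does this for the date-only granularities of W3CDTF (year, year-month, year-month-day). Time-of-day and time zone forms are left out, and no calendar validation is attempted; both are simplifications of this sketch. The function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Year, optionally followed by month, optionally followed by day.
_W3CDTF_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_w3cdtf_date(value_string: str) -> Dict[str, Optional[int]]:
    """Extract positional date components from a W3CDTF date string."""
    match = _W3CDTF_DATE.match(value_string)
    if match is None:
        raise ValueError(f"not a date-only W3CDTF string: {value_string!r}")
    year, month, day = match.groups()
    return {
        "year": int(year),
        "month": int(month) if month else None,
        "day": int(day) if day else None,
    }


date_parts = parse_w3cdtf_date("2003-06-10")
```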
Marked-up text
These are strings containing 'presentational' or other markup, for example adding paragraph breaks, superscripts or chemical/mathematical markup to a dc:description. It is possible to characterize various kinds of markup as follows:
- Markup based on a version of HTML.
- Markup based on other XML-based languages such as CML and MathML.
- Non-XML markup languages such as TeX.
Related resource descriptions
These are metadata descriptions that describe a second resource (i.e. not the resource being described by the DC description). For example, a related description associated with the value of dc:creator could contain a complete description of the resource author (including birthday, eye-colour and favourite beverage if desired!).
In the past, 'related resource descriptions' have tended to be encoded using XML, vCard (see above) or by inventing multiple 'refinements' of DCMES properties (for example DC.Creator.Address). The RDF/XML encoding of DC (see below) provides us with a more thorough modeling of related metadata records through the use of multiple linked nodes in an RDF graph.
Summary
The categories outlined above are not watertight and there are certainly overlaps between them. For example, labeled strings can be viewed as a type of non-XML markup language. In addition, there will be cases where marked-up text (e.g. MathML) can be viewed as a related resource description.
Nevertheless, the purpose of the categorization used here is to analyze existing usage of complex metadata structures within current DC metadata applications. In the context of the abstract model proposed here, all the types of structured values outlined above form part of the DCMI abstract model:
- A labeled string should be treated as a related description (though it should be noted that DCSV and the various DCMI syntax encoding schemes built on it - Period, Point and Box - are currently encoded as value strings with an appropriate syntax encoding scheme).
- An unlabeled string should be treated as a value string with an appropriate syntax encoding scheme.
- Marked-up text should be treated as a rich representation.
- A related resource description should be treated as a related description.
Appendix B - The abstract model and RDF
This appendix discusses the relationship between the DCMI abstract model and the Resource Description Framework (RDF).
RDF currently provides DCMI with the richest encoding environment of the available encoding syntaxes. It is therefore worth taking a brief look at how the abstract model described here compares with the RDF model.
Note that the intention here is not to provide a full and detailed description of how to encode DC metadata records in RDF. Instead, three simple examples of the use of DC in RDF are considered.
Example 1: dc:creator
Figure 3 shows a simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:creator). The value of the property is a second (blank) node, representing the creator of the resource. This second blank node has several properties, used to describe the creator, and an rdfs:label property that is used to provide the value string for the dc:creator property.
Figure 4 shows the same information separated into two graphs. In this case the related description that describes the creator has been more clearly separated from the description of the resource by moving it into a separate RDF/XML document. In order to do this, the node representing the value has been assigned a value URI, allowing the two nodes in the two RDF/XML documents to be treated as representing the same thing. The related description in the second RDF/XML document is linked to the first using the rdfs:seeAlso property and the URI of the RDF/XML document. Note that it is not strictly necessary to separate the two graphs in this way; it is perfectly valid to represent the second graph as a sub-graph of the first, as shown in figure 3. However, for the purposes of this document, the two graphs have been separated in order to more clearly differentiate the description from the related description. In some cases it will be good practice to facilitate this separation anyway, for example in order to serve the second graph from a directory service of some kind.
Example 2: dc:subject
Figure 5 shows a second simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:subject). The value of the property is a second (blank) node, representing the subject of the resource. This second blank node has an rdfs:label property that is used to provide the value string for the dc:subject property, an rdf:value property that is used to provide the classification scheme notation and an rdf:type property to provide the encoding scheme URI.
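The shape of the graph described for dc:subject can be approximated with plain triples. The sketch below is not a reproduction of the figure: every resource URI, label, notation and scheme URI in it is a made-up placeholder, triples are modeled as simple string tuples (so literals and URIs are not distinguished), and the helper name is hypothetical. It shows how the parts of a statement in the abstract model could be read back off the blank node.

```python
from typing import Dict, List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"

Triple = Tuple[str, str, str]

# Placeholder graph: a resource whose dc:subject is a blank node.
triples: List[Triple] = [
    ("http://example.org/doc", DC + "subject", "_:b0"),
    ("_:b0", RDFS + "label", "Example subject"),            # value string
    ("_:b0", RDF + "value", "000.0"),                       # scheme notation
    ("_:b0", RDF + "type", "http://example.org/ExampleScheme"),  # encoding scheme URI
]


def statement_from_graph(
    graph: List[Triple], resource: str, property_uri: str
) -> Optional[Dict[str, Optional[str]]]:
    """Read one statement about `resource` off the graph, if present."""
    value_nodes = [o for s, p, o in graph if s == resource and p == property_uri]
    if not value_nodes:
        return None
    node = value_nodes[0]

    def first(predicate: str) -> Optional[str]:
        return next((o for s, p, o in graph if s == node and p == predicate), None)

    return {
        "property_uri": property_uri,
        "value_string": first(RDFS + "label"),
        "notation": first(RDF + "value"),
        "vocabulary_encoding_scheme_uri": first(RDF + "type"),
    }


subject_statement = statement_from_graph(triples, "http://example.org/doc", DC + "subject")
```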
Figure 6 shows the same information separated into two graphs. In this case the related description that describes the subject has been more clearly separated from the description of the resource by moving it into a separate RDF/XML document. In order to do this, the node representing the value has been assigned a value URI, allowing the two nodes in the two RDF/XML documents to be treated as representing the same thing. The related description in the second RDF/XML document is linked to the first using the rdfs:seeAlso property and the URI of the RDF/XML document. Note that it is not strictly necessary to separate the two graphs in this way; it is perfectly valid to represent the second graph as a sub-graph of the first, as shown in figure 5. However, for the purposes of this document, the two graphs have been separated in order to more clearly differentiate the description from the related description. In some cases it will be good practice to facilitate this separation anyway, for example in order to serve the second graph from a terminology service of some kind.
Example 3: dc:description
Figure 7 shows a third simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:description). The value of the property is a second (blank) node with an rdfs:label property that is used to provide the value string for the dc:description property. The second node also has an rdfs:seeAlso property that links to a rich representation - in this case some HTML marked-up text that provides a richer representation of the description. Note that it is possible to embed the marked-up text within a single RDF graph (using rdf:parseType="Literal"). However, this is not shown here.
Summary
By re-visiting the second figure from example 2 (figure 6) it is possible to layer the terminology used in the abstract model above over the RDF graph. Almost all aspects of the DCMI abstract model are supported by the RDF encoding guidelines though, at the time of writing, some issues about how best to handle description sets still need to be resolved.
Appendix C - The abstract model and XML
This appendix compares the DCMI abstract model with the Guidelines for implementing Dublin Core™ in XML DCMI recommendation.
Simple DC
**Figure 9**
Figure 9 shows an example simple DC description encoded according to the XML guidelines above. The example shows how the encoding supports the property URI, value string and value string language aspects of the DCMI abstract model. It should be noted that all the values that are encoded in this syntax are represented by value strings, even those that look, to the human reader, as though they are URIs.
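As a rough illustration of how these three aspects can be recovered from an XML record, the sketch below reads a small hand-written sample. The sample is not the content of Figure 9: the `metadata` container element, the titles and the URIs are placeholders, and the function name is hypothetical. The property URI is taken to be the element's namespace URI concatenated with its local name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample record; the container element name is an assumption.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">An example title</dc:title>
  <dc:identifier>http://example.org/doc</dc:identifier>
</metadata>"""


def statements_from_xml(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Extract property URI, value string and language from each child element."""
    root = ET.fromstring(xml_text)
    statements = []
    for child in root:
        if not child.tag.startswith("{"):
            continue  # element without a namespace: no property URI to build
        namespace, _, local_name = child.tag[1:].partition("}")
        statements.append(
            {
                "property_uri": namespace + local_name,
                # Always a value string, even when it looks like a URI.
                "value_string": (child.text or "").strip(),
                "language": child.get(XML_LANG),
            }
        )
    return statements


xml_statements = statements_from_xml(SAMPLE)
```

Note that the identifier is returned as a value string only; the sketch makes no attempt to promote it to a value URI or resource URI.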
Qualified DC
**Figure 10**
Figure 10 shows an example qualified DC description encoded according to the XML guidelines above. This example shows how the encoding supports the property URI, value string, value string language, encoding scheme URI and resource class aspects of the DCMI abstract model. Note also that, although the resource class is indicated, the class URI is not encoded anywhere in this description.
Summary
The following aspects of the DCMI abstract model are supported by the Guidelines for implementing Dublin Core™ in XML recommendation:
- properties
- property URIs
- value strings
- value string languages
- encoding schemes
- encoding scheme URIs
- resource classes
The following aspects of the DCMI abstract model are not supported:
- resource URIs
- value URIs
- rich representations
- related descriptions
- property/sub-property relationships
- resource class URIs
The following constraints apply:
- Each property may have one value string (but not more than one).
- Vocabulary encoding schemes and syntax encoding schemes are handled in exactly the same way.
Note that, at the time of writing, neither resource URIs nor value URIs can be explicitly encoded in the XML encoding syntax. Although it may be the case that some software applications have chosen to interpret the use of a http://purl.org/dc/terms/URI syntax encoding scheme as an indication that the URI in the value string is a resource URI or value URI, this is not guaranteed to be a correct interpretation of the metadata record in all cases.
Appendix D - The abstract model and XHTML
This appendix compares the DCMI abstract model with the Expressing Dublin Core™ in HTML/XHTML meta and link elements DCMI recommendation.
Simple DC
**Figure 11**
Figure 11 shows an example simple DC description encoded according to the XHTML guidelines above. This example shows how the encoding supports the property URI, value string, value string language and value URI aspects of the DCMI abstract model. Again, it should be noted that the value of the DC Identifier property represented in this encoding syntax is denoted by a value string, even though it looks, to the human reader, as though it is a URI.
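The following sketch shows one way software might read these aspects out of an XHTML head. It is not the content of Figure 11. It assumes the convention in which a `link` element with `rel="schema.PREFIX"` declares a namespace URI, a `meta` element named `PREFIX.term` carries a value string, and a `link` element with `rel="PREFIX.term"` carries a value URI; the sample markup, class name and method name are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

SAMPLE = """<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" lang="en" content="An example title" />
<link rel="DC.relation" href="http://example.org/other" />
</head>"""


class DCHeadParser(HTMLParser):
    """Collect DC statements from meta and link elements."""

    def __init__(self) -> None:
        super().__init__()
        self.schemas: Dict[str, str] = {}
        self._raw: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link":
            rel = a.get("rel") or ""
            if rel.startswith("schema."):
                # Namespace declaration, e.g. schema.DC
                self.schemas[rel[len("schema."):]] = a.get("href") or ""
            elif "." in rel:
                # Value URI: no value string on the same statement.
                self._raw.append({"name": rel, "value_uri": a.get("href"),
                                  "value_string": None, "language": None})
        elif tag == "meta" and "." in (a.get("name") or ""):
            # Value string: no value URI on the same statement.
            self._raw.append({"name": a["name"], "value_uri": None,
                              "value_string": a.get("content"),
                              "language": a.get("lang") or a.get("xml:lang")})

    def statements(self) -> List[Dict[str, Optional[str]]]:
        """Resolve prefixes once all schema declarations have been seen."""
        resolved = []
        for raw in self._raw:
            prefix, _, local_name = raw["name"].partition(".")
            if prefix not in self.schemas:
                continue  # undeclared prefix: not treated as DC metadata here
            resolved.append({"property_uri": self.schemas[prefix] + local_name,
                             "value_uri": raw["value_uri"],
                             "value_string": raw["value_string"],
                             "language": raw["language"]})
        return resolved


parser = DCHeadParser()
parser.feed(SAMPLE)
xhtml_statements = parser.statements()
```

Each collected statement carries either a value string or a value URI, never both, and no resource URI is produced.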
Qualified DC
**Figure 12**
Figure 12 shows an example qualified DC description encoded according to the XHTML guidelines above. This example shows how the encoding supports the property URI, value string, value string language, value URI, encoding scheme URI and resource class aspects of the DCMI abstract model. Note that although the resource class is indicated, the class URI is not encoded anywhere in this description. Finally, note that although the http://purl.org/dc/terms/URI syntax encoding scheme means that software can reliably interpret the DC Identifier value string as a URI, it should not be interpreted as a resource URI.
Summary
The following aspects of the DCMI abstract model are supported by the Expressing Dublin Core™ in HTML/XHTML meta and link elements DCMI recommendation:
- properties
- property URIs
- value strings
- value string languages
- value URIs
- encoding schemes
- encoding scheme URIs
- resource classes
The following aspects of the DCMI abstract model are not supported:
- resource URIs
- rich representations
- related descriptions
- property/sub-property relationships
- resource class URIs
The following constraints apply:
- Each property may have one value string (but not more than one) or a value URI but not both.
- Vocabulary encoding schemes and syntax encoding schemes are handled in exactly the same way.
Note that, at the time of writing, resource URIs cannot be explicitly encoded in the XHTML encoding syntax. However, the resource URI may be implicit from the URI of the resource into which the record is embedded.