Notes on the W3C XML Schemas for Qualified Dublin Core
Creator: | Pete Johnston, UKOLN, University of Bath |
---|---|
Contributor: | Tim Cole, University of Illinois at Urbana-Champaign |
Contributor: | Thomas Habing, University of Illinois at Urbana-Champaign |
Contributor: | Jane Hunter, DSTC, University of Queensland |
Contributor: | Carl Lagoze, Cornell University |
Contributor: | Andy Powell, UKOLN, University of Bath |
Date Issued: | 2003-04-02 |
Identifier: | http://dublincore.org/schemas/xmls/qdc/2003/04/02/notes/ |
Replaces: | Not applicable |
Is Replaced By: | Not applicable |
Latest version: | http://dublincore.org/schemas/xmls/qdc/notes/ |
Description of document: | This document provides a brief description of a set of W3C XML Schemas which implement the XML encoding conventions described in the Guidelines for implementing Dublin Core™ in XML. |
1. Introduction
The schemas presented in this document conform to the W3C XML Schema 1.0 recommendation [XMLSCHEMA]. They are designed to support the conventions for representing Dublin Core™ metadata in XML that are described in the DCMI recommendation, Guidelines for implementing Dublin Core™ in XML [DCXMLGUIDELINES]. These schemas are suggested rather than prescribed and may co-exist with other schemas for exchanging Dublin Core™ metadata. XML schemas are interoperability vehicles: the more applications that agree on a single schema, the easier it is to share Dublin Core™ metadata. It is hoped that these schemas will be useful to a wide range of applications, but it is recognized that some applications may require different functionality, provided by different schemas.
Qualified Dublin Core™
The functionality these schemas support is congruent with the Dublin Core™ model of "qualification" [DCPRINCIPLES]. Applications that employ other schemas that express additional functionality should recognize that doing so potentially compromises interoperability with applications that use these schemas.
The three schemas for the DCMI namespaces declare XML elements to represent the Dublin Core™ elements and their refinements. The container schemas provided here restrict the elements in a valid instance document to:
- the 15 Dublin Core™ elements [DCMES],
- the additional elements listed in the DCMI Metadata Terms recommendation (e.g., "audience") [DCTERMS],
- the element refinements listed in the DCMI Metadata Terms recommendation [DCTERMS]
The value of a DC element or refinement (the XML element content) may be associated with one of the named encoding schemes, which are also listed in the DCMI Metadata Terms recommendation [DCTERMS].
Application profiles
Because the container schemas admit only the elements and element refinements listed above, so-called application profiles that mix in elements from other namespaces or metadata vocabularies are not valid according to these container schemas. An application profile schema may import one or more of the base schemas listed here and use them in association with schemas for other non-DCMI namespaces. However, implementers adopting that approach should consider the implications for interoperability with applications based on the schemas which specify that only Dublin Core™ elements and element refinements are valid.
"Structured values"
According to the schemas, the XML elements representing Dublin Core™ elements and element refinements may only have simple "string" values (which may be further restricted in the manner described below), defined by the type dc:SimpleLiteral. The xml:lang attribute permits the recording of the language of the string that is the element value. Complex or structured values (i.e., additional XML elements nested within the XML elements representing Dublin Core™ elements and element refinements) are not valid. By exploiting features of the XML Schema specification, the proposed schemas are designed so that they can be imported into an extension schema that does allow additional nested elements as values for the Dublin Core™ elements. Such extensions will not be valid according to the container schemas listed here and are therefore not interoperable except by translation methods (not yet defined here or by the DCMI).
2. The Schemas and their Use
The schemas were created jointly by an ad hoc working group of: Tim Cole (University of Illinois at Urbana-Champaign), Thomas Habing (University of Illinois at Urbana-Champaign), Jane Hunter (DSTC, University of Queensland), Pete Johnston (UKOLN, University of Bath), Carl Lagoze (Cornell University) and Andy Powell (UKOLN, University of Bath).
Base schemas
These three schemas declare XML elements to represent the Dublin Core™ elements and element refinements and a number of complexTypes to represent encoding schemes:
- Schema: dc.xsd
Target XML Namespace: http://purl.org/dc/elements/1.1/
- Schema: dcterms.xsd
Target XML Namespace: http://purl.org/dc/terms/
- Schema: dcmitype.xsd
Target XML Namespace: http://purl.org/dc/dcmitype/
Container schemas
These schemas declare XML elements to act as containers for specified subsets of the Dublin Core™ elements and element refinements declared in the base schemas:
- simpledc.xsd
Target XML Namespace: none
- qualifieddc.xsd
Target XML Namespace: none
Sample application schemas
These schemas provide examples of how a container schema might be used in an application:
- appsimpledc.xsd
Target XML Namespace: [decided by application]
- appqualifieddc.xsd
Target XML Namespace: [decided by application]
Schema: dc.xsd
Target XML Namespace: http://purl.org/dc/elements/1.1/
The schema dc.xsd defines a complexType called SimpleLiteral:

    <xs:complexType name="SimpleLiteral">
      <xs:complexContent mixed="true">
        <xs:restriction base="xs:anyType">
          <xs:sequence>
            <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="xml:lang" use="optional"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>

The SimpleLiteral complexType makes the xml:lang attribute available. The type is defined in terms of mixed complexContent. However, the cardinality attributes on the xs:any element (minOccurs="0" and maxOccurs="0") dictate that this complexType does not permit child elements.
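The effect of SimpleLiteral on instance data can be illustrated with a small sketch. The wrapper element "fragments" and the child element "name" are hypothetical and appear only to make the example well-formed and to show a nested element; neither is part of the DCMI schemas.

```xml
<!-- "fragments" is a hypothetical wrapper used only to keep this sketch well-formed. -->
<fragments xmlns:dc="http://purl.org/dc/elements/1.1/">

  <!-- Expected to be accepted: character content with the optional xml:lang attribute. -->
  <dc:title xml:lang="en">Notes on the W3C XML Schemas for Qualified Dublin Core</dc:title>

  <!-- Expected to be accepted: xml:lang may be omitted. -->
  <dc:identifier>http://dublincore.org/schemas/xmls/qdc/2003/04/02/notes/</dc:identifier>

  <!-- Expected to be rejected: SimpleLiteral permits no child elements,
       so the hypothetical "name" element is not allowed here. -->
  <dc:creator><name>Pete Johnston</name></dc:creator>

</fragments>
```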
The fifteen Dublin Core™ elements in this namespace are represented as XML elements. The schema declares an abstract element any with a type of SimpleLiteral. Because it is declared as abstract, this element cannot be used in an instance document. Each XML element representing a Dublin Core™ element is declared as a non-abstract element which is substitutable for the any element, for example:
<xs:element name="title" substitutionGroup="any"/>
Finally, the schema defines a group elementsGroup and a complexType elementContainer. With the dc:any element, these two constructs provide mechanisms by which external schemas can reference the set of elements declared in this schema without referencing each element individually, though it is still possible for an external schema to reference individual elements if desired.
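As a rough sketch of the two referencing styles just described, an external schema might look like the following. The namespace http://example.org/records/ and the element names "record" and "citation" are assumptions made for illustration; only dc.xsd, dc:elementsGroup, dc:title and dc:creator come from the DCMI schema. Note that a schema like this mixes in a non-DCMI namespace, so the interoperability caveats in the Introduction apply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns="http://example.org/records/"
           targetNamespace="http://example.org/records/"
           elementFormDefault="qualified">

  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="dc.xsd"/>

  <!-- Style 1: reference the whole element set through the group. -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="dc:elementsGroup"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Style 2: reference individual elements by name. -->
  <xs:element name="citation">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="dc:title"/>
        <xs:element ref="dc:creator" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```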
For example, a schema can simply import the dc.xsd schema and use the elementContainer complexType as the type of an element, which makes the DC elements available as child elements:

    <xs:import namespace="http://purl.org/dc/elements/1.1/" schemaLocation="dc.xsd"/>
    <xs:element name="simpledc" type="dc:elementContainer"/>

Such a schema is provided as simpledc.xsd.
The simpledc.xsd schema does not use a targetNamespace. It is possible to validate an instance directly against this schema. DCMI makes no recommendation for the XML Namespace with which this simpledc container element is associated. Where an application wishes to specify a namespace for the container element, it can be assigned when this schema is included in an application schema.
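A minimal sketch of assigning a namespace by inclusion follows. It relies on the standard XML Schema behaviour that a schema with no targetNamespace takes on the target namespace of the schema that includes it. The namespace name http://example.org/myapp/ is an assumption chosen for illustration, and this sketch is not a copy of any schema distributed by DCMI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical application schema; the namespace name is illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/myapp/"
           targetNamespace="http://example.org/myapp/"
           elementFormDefault="qualified">

  <!-- simpledc.xsd declares no targetNamespace, so its simpledc element
       is placed in http://example.org/myapp/ by this include. -->
  <xs:include schemaLocation="simpledc.xsd"/>

</xs:schema>
```

Under this sketch, an instance document would use a root element of simpledc in the http://example.org/myapp/ namespace, while its children remain in the http://purl.org/dc/elements/1.1/ namespace.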
An example of such an application schema is provided as appsimpledc.xsd.
An example of an instance document which validates against that application schema is provided as testsimpledc.xml.
An example of an instance document which fails to validate against that application schema is provided as testsimpledc2.xml. (dcterms:modified is not permitted.)
Note: You can reference the simpledc.xsd schema in your application if you wish. The appsimpledc.xsd schema, however, is provided as an example only. It uses an XML Namespace name based on a reserved DNS name (example.org). You must create your own version of this schema.
Schema: dcterms.xsd
Target XML Namespace: http://purl.org/dc/terms/
The schema dcterms.xsd imports the schema dc.xsd. The Dublin Core™ elements and element refinements in this namespace are all represented as XML elements, and importing the dc.xsd schema makes the any abstract element and the SimpleLiteral complexType available for use. Importing the dc.xsd schema also enables the indication of relationships between DC element refinements and the elements that they refine, using substitutionGroups.
An XML element which represents a DC element in this namespace is declared as substitutable for the any abstract element:
<xs:element name="audience" substitutionGroup="dc:any"/>
An XML element which represents a DC element refinement is declared as substitutable for the element it refines:
<xs:element name="alternative" substitutionGroup="dc:title"/>
Encoding schemes are mechanisms for constraining the "value spaces" of DC elements and element refinements. In this schema, they are represented as named complexTypes derived from the SimpleLiteral complexType. For example, the complexType corresponding to the encoding scheme for "W3CDTF" is as follows:

    <xs:complexType name="W3CDTF">
      <xs:simpleContent>
        <xs:restriction base="dc:SimpleLiteral">
          <xs:simpleType>
            <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
          </xs:simpleType>
          <xs:attribute ref="xml:lang" use="prohibited"/>
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>

N.B. Some schema-validating XML parsers may not support this construct. See Appendix A.
The use of one of these complexTypes is specified by the use of the xsi:type attribute in the instance document. The value of the xsi:type attribute is a QName corresponding to the name of the complexType:
<dc:date xsi:type="dcterms:W3CDTF">2002-07-09</dc:date>
Use of this datatype means that a validating parser will check that the element content conforms to one of the built-in date/time types.
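The following sketch shows one value for each member type of the union, followed by two values that a validator should reject. The wrapper element "examples" is hypothetical and is present only to keep the fragment well-formed.

```xml
<!-- "examples" is a hypothetical wrapper used only to keep this sketch well-formed. -->
<examples xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dcterms="http://purl.org/dc/terms/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- Expected to be accepted: one value per member type of the union. -->
  <dc:date xsi:type="dcterms:W3CDTF">2002</dc:date>                      <!-- xs:gYear -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07</dc:date>                   <!-- xs:gYearMonth -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07-09</dc:date>                <!-- xs:date -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07-09T19:20:30+01:00</dc:date> <!-- xs:dateTime -->

  <!-- Expected to be rejected: slashes match none of the member types. -->
  <dc:date xsi:type="dcterms:W3CDTF">1963/08/17</dc:date>

  <!-- Expected to be rejected: the W3CDTF type prohibits xml:lang. -->
  <dc:date xsi:type="dcterms:W3CDTF" xml:lang="en">2002-07-09</dc:date>

</examples>
```

One caveat: the union is an approximation of the W3CDTF profile rather than an exact match. For instance, W3CDTF allows a time without seconds (such as 2002-07-09T19:20+01:00), whereas xs:dateTime requires the seconds field, so such a value would probably fail validation against this type.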
Not all of the complexTypes associated with encoding schemes impose such "tight" validation. For example, the complexType for "LCSH" prescribes only that the element content is a character string:

    <xs:complexType name="LCSH">
      <xs:simpleContent>
        <xs:restriction base="dc:SimpleLiteral">
          <xs:simpleType>
            <xs:restriction base="xs:string"/>
          </xs:simpleType>
          <xs:attribute ref="xml:lang" use="prohibited"/>
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>

In theory at least, it is possible to define a complexType which enumerates all the possible values of a Library of Congress Subject Heading, but it would be impractical to validate against such a list. However, the principle of validating against an enumerated list of values is illustrated in the schema dcmitype.xsd for the DCMI Type Vocabulary (see next section).
An example schema which takes this approach for ISO639-2 language codes is available at http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_schemas/iso639-2.xsd.
Like the dc.xsd schema, the dcterms.xsd schema defines a group elementsAndRefinementsGroup as a means of referring to all the elements and element refinements. A complexType elementOrRefinementContainer is also defined.
A schema can simply import the dcterms.xsd schema and use the elementOrRefinementContainer complexType as the type of an element, which makes the DC elements and element refinements available as child elements:

    <xs:import namespace="http://purl.org/dc/terms/" schemaLocation="dcterms.xsd"/>
    <xs:element name="qualifieddc" type="dcterms:elementOrRefinementContainer"/>

An example of such a schema is provided as qualifieddc.xsd.
Like the simpledc.xsd schema, the qualifieddc.xsd schema does not use a targetNamespace. An implementation may validate directly against this schema or it may specify a namespace for the container element by including this schema in an application schema.
An example of such an application schema is provided as appqualifieddc.xsd.
An example of an instance document which validates against that application schema is provided as testqualifieddc.xml.
An example of an instance document which fails to validate against that application schema is provided as testqualifieddc2.xml. ('1963/08/17' is not a valid W3CDTF date.)
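For orientation, an instance document validated directly against qualifieddc.xsd (that is, with the container element in no namespace) might look like the sketch below. It is not a copy of testqualifieddc.xml; the element values, including the alternative title, are illustrative, and the relative schema location is an assumption about where the schema file is stored.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative instance; validated directly against qualifieddc.xsd,
     so the container element is in no namespace. -->
<qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="qualifieddc.xsd">

  <dc:title xml:lang="en">Notes on the W3C XML Schemas for Qualified Dublin Core</dc:title>

  <!-- An element refinement, substitutable for dc:title. -->
  <dcterms:alternative xml:lang="en">QDC schema notes</dcterms:alternative>

  <dc:creator>Pete Johnston</dc:creator>

  <!-- Encoding schemes are selected with xsi:type. -->
  <dc:subject xsi:type="dcterms:LCSH">Metadata</dc:subject>
  <dcterms:issued xsi:type="dcterms:W3CDTF">2003-04-02</dcterms:issued>

  <!-- An element declared in the dcterms namespace. -->
  <dcterms:audience>Implementers</dcterms:audience>

</qualifieddc>
```

Replacing 2003-04-02 with a value such as 1963/08/17 would be expected to make this document fail validation, for the reason noted above.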
Note: As in the case of the simpledc.xsd schema, you can reference the qualifieddc.xsd schema in your application if you wish. The appqualifieddc.xsd schema, however, is provided as an example only. It uses an XML Namespace name based on a reserved DNS name (example.org). You must create your own version of this schema.
Schema: dcmitype.xsd
Target XML Namespace: http://purl.org/dc/dcmitype/
The dcmitype.xsd schema includes only a named simpleType which defines an enumerated list of values for the DCMI Type Vocabulary.
This simpleType is referenced in a complexType in the dcterms.xsd schema.
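The general pattern of an enumerated simpleType can be sketched as follows. This is not the content of dcmitype.xsd: the namespace http://example.org/vocab/ and the type name ResourceKind are hypothetical, the base type is a choice made for this sketch, and only a few of the DCMI Type Vocabulary terms are listed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema illustrating validation against an enumerated list. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/vocab/"
           targetNamespace="http://example.org/vocab/">

  <xs:simpleType name="ResourceKind">
    <xs:restriction base="xs:string">
      <!-- Only a few terms are shown; a real list would be complete. -->
      <xs:enumeration value="Collection"/>
      <xs:enumeration value="Dataset"/>
      <xs:enumeration value="Image"/>
      <xs:enumeration value="Text"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

A complexType for an encoding scheme can then restrict dc:SimpleLiteral to such a simpleType, in the same way that the LCSH complexType restricts it to xs:string, so that any value outside the list is rejected.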
Appendix A: Parser Behaviour
The parsers/validators tested
- XSV 2.2-1 of 2002/12/01 21:59:33
- Xerces Xerces-J 2.2.1 2002/11/11 17:40
- MSXML4 Microsoft XML Core Services 4.0 SP1
Results
Instance document | Parser | Result | Messages |
---|---|---|---|
testsimpledc.xml | XSV | Schema and instance accepted as valid | |
testsimpledc.xml | Xerces | Schema and instance accepted as valid | |
testsimpledc.xml | MSXML4 | Schema and instance accepted as valid | |
testqualifieddc.xml | XSV | Schema and instance accepted as valid | |
testqualifieddc.xml | Xerces | Schema dcterms.xsd rejected as invalid | [Error] dcterms.xsd:nnn:nn: src-ct.2: Complex Type Definition Representation Error for type 'xxxx'. When simpleContent is used, the base type must be a complexType whose content type is simple, or, only if extension is specified, a simple type. (where 'xxxx' is the name of a complexType corresponding to one of the encoding schemes.) |
testqualifieddc.xml | MSXML4 | Schema and instance accepted as valid | |
The "dc:SimpleLiteral" problem
The schema dc.xsd defines a base complexType called SimpleLiteral:

    <xs:complexType name="SimpleLiteral">
      <xs:complexContent mixed="true">
        <xs:restriction base="xs:anyType">
          <xs:sequence>
            <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="xml:lang" use="optional"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>

Encoding schemes are represented as complexTypes derived from the SimpleLiteral complexType. For example, the complexType corresponding to the encoding scheme for "W3CDTF" is as follows:

    <xs:complexType name="W3CDTF">
      <xs:simpleContent>
        <xs:restriction base="dc:SimpleLiteral">
          <xs:simpleType>
            <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
          </xs:simpleType>
          <xs:attribute ref="xml:lang" use="prohibited"/>
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>

This derivation of a complexType with simpleContent by restriction of a base complexType with complexContent is valid under section 3.4.6 of XML Schema Part 1: Structures, specifically item 5.1.2 of the section "Schema Component Constraint: Derivation Valid (Restriction, Complex)", because the base complexContent is mixed and emptiable.
This was confirmed by Henry Thompson; see, for example:
http://www.w3.org/2001/05/xmlschema-rec-comments#pfiSimpleContent
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0005.html
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0008.html
Conclusion: Xerces appears to be behaving incorrectly in rejecting this derivation.
References
[XMLSCHEMA] XML Schema
http://www.w3.org/XML/Schema
[DCXMLGUIDELINES] Guidelines for implementing Dublin Core™ in XML
http://dublincore.org/documents/dc-xml-guidelines/
[DCPRINCIPLES] DCMI Grammatical Principles
http://dublincore.org/usage/documents/principles/
[DCMES] Dublin Core™ Metadata Element Set, Version 1.1: Reference Description
http://dublincore.org/documents/dces/
[DCTERMS] DCMI Metadata Terms
http://dublincore.org/documents/dcmi-terms/