Title: Recording qualified Dublin Core metadata in HTML
Creator:
Date Issued: 1999-08-16
Identifier:
Replaces: Not Applicable
Is Replaced By: Not Applicable
Latest version: Not Applicable
Status of document: This document is a NOTE made
available by the Dublin Core Metadata Initiative Directorate
for discussion only. The publication of a NOTE by the
Dublin Core implies no endorsement of any kind.
Description of document: We describe a notation for recording
qualified Dublin Core metadata in HTML meta elements. The
syntax includes recommended usage of the standard HTML syntax
to record the different classes of qualification needed to
represent the model.
Document metadata:
- Introduction
- Qualified Dublin Core model
- HTML notation
- Recommendation
- Examples
- Acknowledgments
- References
1. Introduction
The 15 elements from Dublin Core version 1 [DCRFC]
allow much descriptive information about resources to be recorded.
The most widespread method for transporting Dublin Core metadata
is probably by embedding it in <meta> elements in the <head>
of HTML documents [DCHTML]. For example,
this document might be described in part with the metadata
<html>
<head>
<title>qDC in HTML</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">
<meta name="DC.Title" content="Recording qualified Dublin Core metadata in HTML">
<meta name="DC.Description" content="We describe a notation ... ">
<meta name="DC.Creator" content="Simon Cox">
<meta name="DC.Contributor" content="Renato Iannella">
<meta name="DC.Contributor" content="Kim Covil">
<meta name="DC.Publisher" content="DCMI">
<meta name="DC.Subject" content="metadata, Dublin Core, HTML">
<meta name="DC.Relation" content="http://purl.org/dc/">
<meta name="DC.Relation" content="http://www.w3.org/TR/REC-html40/">
<meta name="DC.Language" content="en-AU">
<meta name="DC.Format" content="text/html">
<meta name="DC.Date" content="1999-08-09">
[...]
</head>
<body>
<h1>Recording qualified Dublin Core metadata in HTML</h1>
[...]
</body>
</html>
Each <meta> element effectively captures a statement that
the present resource (i.e. the page containing this metadata)
has a metadata value (labelled "content") for
a property indicated by the metadata element (labelled
"name").
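The name/content pairing described above can be illustrated with a minimal Python sketch that harvests such statements from a page. It uses the standard-library html.parser module; the class name DCMetaHarvester and the function harvest are hypothetical names chosen for this illustration, and the sketch assumes that Dublin Core statements are recognised simply by a NAME beginning with "DC.".

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect (name, content) pairs from <meta> elements whose NAME starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and attribute names.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Assumption: the "DC." prefix identifies Dublin Core statements.
        if name.upper().startswith("DC."):
            self.statements.append((name, attributes.get("content")))


def harvest(html_text):
    """Return the list of (name, content) statements found in html_text."""
    harvester = DCMetaHarvester()
    harvester.feed(html_text)
    return harvester.statements


page = """<html><head>
<meta name="DC.Creator" content="Simon Cox">
<meta name="DC.Date" content="1999-08-09">
<meta name="keywords" content="not Dublin Core">
</head><body></body></html>"""

for name, content in harvest(page):
    print(name, "=", content)
```

Each harvested pair corresponds to one statement about the page: the property named by the element and the value it takes.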
However, it is often desirable to refine the meanings of the elements
in particular instances. It may be desirable to restrict
the semantics of the property or relationship between the resource
and the metadata value, for example by specifying the exact type
of contribution that a contributor made to the resource. It may
be useful to indicate that the value has been selected from a
particular controlled vocabulary, such as a list of keywords,
or is encoded using a particular convention (the format for dates
is an important case) or in a particular natural language. The
metadata value may itself be usefully represented as a compound
object, such as an address whose components (street, locality,
post-code) are recorded separately. These refinements
are encompassed in an extended datamodel for qualified Dublin
Core metadata (qDC) [DC-datamodel]. A simplified
guide to qDC, including many diagrams and examples, is also available
[DCmodel-guide].
There are a variety of ways of recording qualified DC metadata.
The canonical form used by the datamodel working group uses the
XML expression of the RDF datamodel [qDC-RDF].
However, given the popularity of HTML and the widespread availability
of tools for preparing and processing it, including software to
harvest and use metadata transported in this way, it is desirable
to have a way to record qDC within this document format.
HTML is syntactically limited in comparison to XML [XML].
Nevertheless, suitable conventions regarding the content of attributes
of <meta> elements permit the recording of most aspects
of qualified DC. In this note we describe the methods provided
directly by HTML-4 [HTML4], and propose some
conventions which extend the built-in syntax to encompass most
of the requirements of the qDC model. These conventions concern
a notation using certain punctuation characters in both the NAME
and CONTENT attributes.
2. Qualified Dublin Core model
In the Dublin Core datamodel [DC-datamodel]
qualification may occur in several ways.
A. Element Qualifiers
These modify the property by using additional terms to refine
the element.
Examples:
- Contributor might also have a "role" to indicate
the nature of the contribution (illustrator, editor, collator,
etc)
- Relation will usefully have a "type" to
indicate the nature of the relationship between the resources
(isBasedOn, isFormatOf, hasPart, etc)
B. Value Qualifiers
These indicate how the value is to be interpreted, by referring
to
- an authority for the terms chosen as values of DC metadata
elements. This is normally through a controlled list
of valid terms.
Examples:
- MIME types [MIME] for Format
- LCSH [LCSH] for Subject
- RFC 1766 [RFC1766] for Language
- a notation or language (i.e. an encoding syntax) used for the
value.
Examples:
- ISO8601 [ISO8601] for Date
- a term selected from RFC 1766 [RFC1766]
as the language for any of the element-values recorded in
plain-text, such as Title, Description
- URIs [URI] or ISBNs [ISBN]
for resource identifiers used in Identifier, Relation
- vCard [vCard] for information about
people or organisations used for values of Creator,
Contributor, Publisher
C. Value Components
A metadata value may itself have structure, typically through
components which are labelled either explicitly or implicitly
(e.g. according to position within the string).
Examples:
- dates and times have components corresponding to year, month,
day, hour, minute, second
- the dimensions of an image or 3D object contain measurements
on multiple axes (e.g. height, breadth, depth)
- information about people and organisations is commonly split
into components, such as the name of the agent, and their contact
address broken up into street, locality, region, postcode, country,
telephone, email etc.
3. HTML notation
HTML <meta> elements allow a simple list of metadata
to be recorded, which accommodates the basic version of Dublin Core.
All data must be contained in text strings within the values of
the attributes of these elements.
The DC element-name and value are recorded in the NAME
and CONTENT attributes [DCHTML].
In order to have the text and labels that we use for the different
classes of qualification for DC (section 2 above) clearly
distinguished within the metadata statements, we must find three
additional positions in the notation.
A. Element Qualifiers
Element Qualifiers are not supported directly in HTML <meta>
elements.
To accommodate Element Qualifiers, dots (.) are used to append
qualifiers to DC element names, which are stored in the NAME
attribute text string. Multiple qualifiers may be appended, separated
by dots, to create a hierarchical qualification scheme. This follows
much existing practice [DCHTML].
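As a rough illustration of the dot convention, the following Python sketch splits a NAME string into its element and its chain of qualifiers. The function name split_qualified_name is hypothetical, and the sketch assumes the conventional "DC" prefix; it does not check that the element is one of the 15 DC elements.

```python
def split_qualified_name(name):
    """Split 'DC.Element.q1.q2' into ('Element', ['q1', 'q2'])."""
    parts = name.split(".")
    # Assumption: the schema prefix is the conventional "DC".
    if len(parts) < 2 or parts[0].upper() != "DC":
        raise ValueError("not a Dublin Core element name: %r" % name)
    element = parts[1]
    qualifiers = parts[2:]  # empty list for an unqualified element
    return element, qualifiers


print(split_qualified_name("DC.Title"))
print(split_qualified_name("DC.Contributor.illustrator"))
```

A consumer that does not understand a qualifier can fall back to the element alone, since the qualifiers only narrow its meaning.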
B. Value Qualifiers
Value Qualifiers are supported directly in HTML-4, using two additional
attributes of the <meta> element: SCHEME and LANG.
LANG is used in cases where the value is in plain-text,
and SCHEME otherwise. Where a SCHEME or LANG
is specified, the value must be encoded in the CONTENT
according to that scheme, including the use of any structure or
punctuation.
C. Value Components
Values encoded according to many schemes have a semantic structure.
This is often indicated using punctuation within a text-string,
which can therefore be used directly in the value recorded in
the CONTENT of the HTML <meta> element.
For example, a colon terminates the protocol label, and slashes,
question-marks, ampersands and hashes are used to separate other
fields in identifiers coded using the URI scheme [URI].
A colon separates each value from its label, and semi-colons and
commas separate components within each of these, in descriptions
of parties given in a common text notation for the vCard scheme
[vCard]. Hyphens separate the components
of a date in accordance with the profile of the ISO8601 scheme commonly
used for DC [ISO8601]. These structuring devices
are provided explicitly within the selected schemes.
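To make the point about scheme-supplied structure concrete, here is a minimal Python sketch that recovers components from two of the schemes mentioned above using only standard-library parsers. The function name parse_by_scheme and the returned dictionary keys are assumptions made for this illustration; only the full YYYY-MM-DD form of the date profile is handled, and vCard is left unparsed.

```python
from datetime import date
from urllib.parse import urlsplit


def parse_by_scheme(scheme, content):
    """Return the components of content as implied by the named scheme."""
    if scheme == "ISO8601":
        # Hyphens separate year, month and day (full-date form only).
        parsed = date.fromisoformat(content)
        return {"year": parsed.year, "month": parsed.month, "day": parsed.day}
    if scheme in ("URL", "URI"):
        # Colon, slashes, '?' and '#' delimit the fields of a URI.
        parts = urlsplit(content)
        return {
            "protocol": parts.scheme,
            "host": parts.netloc,
            "path": parts.path,
            "query": parts.query,
            "fragment": parts.fragment,
        }
    # Unknown scheme: leave the value as an opaque string.
    return content


print(parse_by_scheme("ISO8601", "1999-04-21"))
print(parse_by_scheme("URL", "http://www.foo.bar/explication.html"))
```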
To allow the recording of Value Components generically, we recommend
the use of Dublin Core Structured Values (DCSV) syntax
[DCSV]. DCSV provides a self-describing
value-structuring method, to be used when no other suitable scheme
is available. It uses punctuation characters as follows:
- colons (:) separate plain-text labels of structured value-components
from the values themselves
- semi-colons (;) separate (optionally labelled) values within
a list
- dots (.) indicate hierarchical structure in value-component
labels, if required.
Where no SCHEME is specified, the DC Element value
recorded in the CONTENT attribute requires no parsing rules,
and thus no structure is implied.
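The three DCSV punctuation rules listed above can be sketched in a few lines of Python. This is a simplified illustration, not a conforming DCSV parser: the function name parse_dcsv is hypothetical, any escaping rules defined in the DCSV note are ignored, and a value that itself contains a colon (such as a URL) would be split incorrectly at its first colon.

```python
def parse_dcsv(content):
    """Parse a DCSV string into a list of (label, value) pairs.

    label is None for an unlabelled item, otherwise a tuple of the
    dot-separated parts of the component label.
    """
    items = []
    for chunk in content.split(";"):      # semi-colons separate list items
        chunk = chunk.strip()
        if not chunk:
            continue
        label, colon, value = chunk.partition(":")  # colon separates label from value
        if colon:
            path = tuple(label.strip().split("."))  # dots give label hierarchy
            items.append((path, value.strip()))
        else:
            items.append((None, chunk))
    return items


print(parse_dcsv("u1; u2; u3"))
print(parse_dcsv("cA:v1; cB.part1:v2; cB.part2:v3"))
```

Representing each label as a tuple keeps the hierarchy explicit, so components sharing a first part (here cB) can be regrouped by the caller if needed.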
3.1 qDC-HTML
The complete syntax for expressing qualified DC elements in
HTML-4 may be summarised:
<META NAME="DC.Element" CONTENT="Unqualified value">
<META NAME="DC.Element.EQ" SCHEME="schemeA" CONTENT="Value coded according to schemeA">
<META NAME="DC.Element.EQ" SCHEME="listB" CONTENT="Value selected from listB">
<META NAME="DC.Element.EQ" LANG="langC" CONTENT="Value expressed in language langC">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="u1; u2; u3">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="cA:v1">
<META NAME="DC.Element.EQ" SCHEME="DCSV" CONTENT="cA:v1; cB.part1:v2; cB.part2:v3">
where the codewords are:
- Element is one of the 15 DC Elements,
- EQ represents one of the Element Qualifiers,
- schemeA is a coding scheme,
- listB is a controlled vocabulary,
- langC is a language-code,
- u1, u2 and u3 are an unlabelled list
of values,
- cA and cB are the labels of Structured Value
components,
- part1 and part2 are sub-components of cB,
- v1, v2 and v3 are values of the components.
In actual instances of DC metadata each of these codewords is
replaced by tokens or strings drawn from official DC vocabularies
or elsewhere (see examples below).
3.2 Features of the HTML notation
An important feature of the HTML syntax for qualified DC is
that the association of qualifiers with the thing that they are
qualifying is preserved. There are three positions for qualifiers
corresponding to the three classes of qualification introduced
above, with element and value qualifiers clearly attached to the
element-name or value-string as appropriate.
The dot-syntax for element qualifiers is compatible with the
sense that refining the element usually consists of selecting
one type or flavour of the element from a list of alternatives.
Qualified elements do not normally need grouping.
Value components are normally aspects of a value and are complete
when considered as a set rather than being mutually exclusive.
Using the DCSV semi-colon list separator, these may be grouped
by recording them as a list, in order to associate them with the
same value. The label tokens in both NAME and CONTENT
may be structured to any depth in a simple hierarchical scheme
using the dot-syntax.
An apparent disadvantage of embedding structured values in HTML,
for example using DCSV [DCSV], is that parsing
of the element value is now implied. However, the string syntaxes
should not inconvenience any existing software, since (a) the encoding
is valid for HTML, and (b) parsing of the value is generally not essential
for resource discovery, because the information-retrieval and
string-matching methods used in most search operations would still
find the target text-strings within the extended values. Existing
systems can harvest the complete colon- and semi-colon-syntax
structured value from DCSV into an index, and the resource can
still be located using reasonable queries.
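The claim that unparsed structured values remain discoverable can be illustrated with a small Python sketch of a naive substring search. The index contents, document keys and the function name search are all hypothetical; real harvesters and search engines will differ, but the principle of matching against the raw CONTENT string is the same.

```python
# Hypothetical index mapping a document key to the raw, unparsed CONTENT string.
index = {
    "doc1": "name.given:Simon; name.family:Cox; employer:CSIRO",
    "doc2": "rows:200; cols:450",
}


def search(index, term):
    """Return the keys of documents whose raw value contains term (case-insensitive)."""
    term = term.lower()
    return sorted(key for key, value in index.items() if term in value.lower())


print(search(index, "csiro"))
print(search(index, "cox"))
```

No knowledge of the DCSV punctuation is needed for either query to find its target.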
3.3 Limitations and Suggestions
We have not discussed how to identify the reference definition
of each Value Qualifier indicated in the SCHEME attribute.
It is possible that a method using the <link rel= ...> may
be developed. In this document we have focussed on the representation
of the qualified DC datamodel and we leave it to users, or to
future work, to complete the details needed particularly to permit
automatic processing of qualified metadata. Note, however, that
many "schemes" have several variant notations. For example,
each numeric code in the Dewey subject classification also corresponds
to a standard text string. Indicating which variant of
a scheme is to be used may require the development of profiles
similar to the one developed for ISO8601.
A significant limitation of HTML is that there is no explicit,
recursive grouping mechanism for <meta> elements. This
means that general recording of fully structured metadata in HTML
<meta> elements is not possible.
Nevertheless, there are two methods for listing repeated values
for DC metadata elements:
- repeating the entire <META NAME="DC.Element" ...> element
for the particular element with different values
- putting the repeated values in a list in a single <meta>
element, with items separated by the DCSV ";"
list separator.
These two different grouping methods might be used by metadata
providers to indicate structure distinguishing between values
that need to be grouped (e.g. information about a single contributor)
and values that are distinct (e.g. identifiers for different contributors).
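One possible reading of this distinction is sketched below in Python: each repeated <meta> element yields a separate group, while items in a DCSV list inside one element stay together in the same group. The function name group_values and the (name, scheme, content) tuple layout are assumptions made for this illustration, not part of the notation.

```python
def group_values(metas, name):
    """Group the values recorded for one element name.

    metas is a list of (name, scheme, content) tuples, one per <meta> element.
    Each matching element produces one group; a DCSV list is split on ';'
    and its items are kept together in that group.
    """
    groups = []
    for meta_name, scheme, content in metas:
        if meta_name != name:
            continue
        if scheme == "DCSV":
            items = [item.strip() for item in content.split(";") if item.strip()]
            groups.append(items)
        else:
            groups.append([content])
    return groups


metas = [
    ("DC.Creator", "DCSV", "name.given:Simon; name.family:Cox"),
    ("DC.Creator", None, "Kim Covil"),
    ("DC.Title", None, "qDC in HTML"),
]
print(group_values(metas, "DC.Creator"))
```

Here the two DC.Creator elements describe two different parties, and the components of the first stay associated with a single creator.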
Some implementors have used numeric suffixes (DC.Creator.1, DC.Creator.2
etc.) and similar techniques in order to overcome the grouping
limitations. In general, such methods will be very
implementation-specific and only useful in a local context. We do not attempt
to propose a regularisation of these techniques here.
Attempting to record complex knowledge according to the fully
qualified and structured DC datamodel [DC-datamodel]
using HTML is not advisable. HTML has incomplete expressive ability
in comparison with the complete qualified Dublin Core model, which
is why new notations using RDF and XML are being developed. Note
that fragments of XML can currently be embedded in HTML-4 [HTML4]
documents, by using a restricted syntax in order to hide content
from interpretation by current HTML clients or browsers [RDF-in-HTML].
However, there are almost no current applications which can parse
and use metadata embedded in this way. Future versions of HTML
are expected to overcome these limitations by allowing general
XML documents to be included [XHTML].
4. Recommendation
The HTML syntax shown here contains many of the components required
to represent the qualified DC model, while remaining fully compatible
with the HTML-4 [HTML4] standard. It offers
a recording method compatible with HTML tools such as browsers
and metadata harvesters. While tools to make full use of the qualified
information may not be widely available yet, metadata providers
may use the qDC-HTML syntax to record rich information in the
interim. Since the requirements of the semantic model for qualified
DC are largely captured by the notation and usage described here,
users may be confident that tools can be built to migrate the
metadata into other notations preserving full semantics, so their
investment in capturing rich information at this stage will be
worthwhile.
This extension to DCHTML [DCHTML] adds
no additional top-level elements or prefixes to those defined
for use as values of the HTML <meta> element's NAME
in the DCHTML document. Rather, it merely offers ways in which
these might be further refined and enriched, consistent with both
HTML-4 syntax and the qualified DC model, to enhance the description
and discovery of resources. Resource descriptions utilising the
qualification mechanisms discussed here should therefore be declared
with reference to the same DC schema definition as basic Dublin
Core descriptions, currently http://purl.org/dc/elements/1.0/.
A reference document for this extension to the DC schema dealing
with the qualified DC model (perhaps this document) should be
made available, and linked from the canonical schema. This will
allow the current standard schema link in the HTML <head>,
<link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">,
to also define documents using qDC-HTML.
5. Examples
The following examples are snippets from within the <HEAD>
element of HTML-4 documents. In complete documents, these should
be preceded by the schema declaration
<link rel="schema.DC" href="http://purl.org/dc/elements/1.0/">.
Diagrams of the models corresponding to many of the examples,
together with alternative encodings using XML and XML-RDF are
also available [DCmodel-guide].
<META NAME="DC.Creator" SCHEME="DCSV"
CONTENT="name.given:Simon; name.family:Cox; employer:CSIRO;
height:177 cm">
<META NAME="DC.Language" SCHEME="RFC1766"
CONTENT="en-AU">
<META NAME="DC.Contributor.illustrator" SCHEME="vCard"
CONTENT="fn:Simon Cox; org:CSIRO">
<META NAME="DC.Date.created" SCHEME="ISO8601"
CONTENT="1999-04-21">
<META NAME="DC.Date.revised" SCHEME="ISO8601"
CONTENT="1999-04-28">
<META NAME="DC.Relation.isBasedOn" SCHEME="URL"
CONTENT="http://www.foo.bar/explication.html">
<META NAME="DC.Relation.isFormatOf" SCHEME="URL"
CONTENT="http://www.foo.bar/explanation.doc">
<META NAME="DC.Relation.hasFormat" SCHEME="URL"
CONTENT="http://www.foo.bar/explanation.pdf">
<META NAME="DC.Format.media" SCHEME="MIME"
CONTENT="text/html">
<META NAME="DC.Format.size" CONTENT="27 kB">
<META NAME="DC.Format.media" SCHEME="MIME"
CONTENT="image/gif">
<META NAME="DC.Format.size" CONTENT="14 kB">
<META NAME="DC.Format.size" SCHEME="DCSV"
CONTENT="rows:200; cols:450">
An extended example using all the qualifying and structuring
components discussed here is:
<META NAME="DC.Identifier" SCHEME="URL" CONTENT="http://www.agcrc.csiro.au/projects/3018CO/metadata/agls/metadata_model.gif">
<META NAME="DC.Title" LANG="en" CONTENT="Diagram of data model for AGLS">
<META NAME="DC.Date.Created" SCHEME="ISO8601" CONTENT="1999-03-12">
<META NAME="DC.Creator" SCHEME="DCSV" CONTENT="Name.Given:Simon; Name.Family:Cox; Employer:CSIRO Exploration and Mining; Contact:39 Fairway, Nedlands, W.A.">
<META NAME="DC.Contributor.reviewer" SCHEME="vCard" CONTENT="fn:Renato Iannella; org:DSTC; email:renato@dstc.edu.au">
<META NAME="DC.Format.size" SCHEME="DCSV" CONTENT="cols:600; rows:350">
<META NAME="DC.Format.media" SCHEME="MIME" CONTENT="image/gif">
<META NAME="DC.Relation.isBasedOn" LANG="en" CONTENT="Figure 1 from AGLS manual version 1.0">
6. Acknowledgments
Renato Iannella suggested the "colon-syntax"
for structured values, and John Kunze suggested moving it into
a separate specification.
7. References
[DC-datamodel]
Dublin Core Data Model Working Group mail archive http://www.mailbase.ac.uk/lists/dc-datamodel/archive.html
[DCHTML]
J. Kunze 1999 Encoding Dublin Core Metadata in HTML http://www.ietf.org/rfc/rfc2731.txt
[DCMI]
Dublin Core Metadata Initiative, OCLC, Dublin Ohio. http://purl.org/dc/
[DCmodel-guide]
S. Cox, 1999. A Guide to the Dublin Core datamodel with some notations
for recording Dublin Core metadata http://www.agcrc.csiro.au/projects/3018CO/metadata/dc-guide/.
[DCRFC]
S. Weibel, J. Kunze, C. Lagoze, M. Wolf 1998. Dublin Core Metadata
for Resource Discovery. RFC2413 http://info.internet.isi.edu/in-notes/rfc/files/rfc2413.txt
[DCSV]
S. Cox, R. Iannella, 1999. A syntax for writing a list of labelled
values in a text string http://www.agcrc.csiro.au/projects/3018CO/metadata/dcsv/
http://purl.org/dc/documents/notes-cox-19990430.htm
[HTML4]
Dave Raggett, Arnaud Le Hors, Ian Jacobs, 1998, HTML 4.0 Specification
http://www.w3.org/TR/REC-html40/
[ISBN]
International Standard Book Number for example, see http://www.nlc-bnc.ca/isbn/e-isbn.htm
[ISO8601]
M. Wolf and C. Wicksteed, 1997, Date and Time Formats, http://www.w3.org/TR/NOTE-datetime
[LCSH]
Library of Congress (USA) Subject Headings For information, follow
links from http://lcweb.loc.gov/catdir/
[MIME]
List of registered content types (MIME types) ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types
as required by N. Borenstein, N. Freed, 1993 MIME (Multipurpose
Internet Mail Extensions) Part One: Mechanisms for Specifying
and Describing the Format of Internet Message Bodies RFC1521 http://info.internet.isi.edu/in-notes/rfc/files/rfc1521.txt
[qDC-RDF]
E. Miller, P. Miller, D. Brickley, 1999. Guidance on expressing
the Dublin Core within the Resource Description Framework (RDF)
http://www.ukoln.ac.uk/interop-focus/activities/dc/datamodel/
[RDF-in-HTML]
This uses the most compact form of XML-RDF [RDF-syntax], in which
all the data occurs as attribute values. In this form several
important capabilities are not available, such as multiple (repeated)
values. For an example, see Figure 5 in S.J.D. Cox and K.D. Covil,
"A web-based geological information system using metadata",
Proc. 3rd IEEE META-DATA Conference, http://computer.org/conferen/proceed/meta/1999/papers/7/cox_covil.html
[RDF-syntax]
Ora Lassila, Ralph Swick, 1999 Resource Description Framework
(RDF) Model and Syntax Specification http://www.w3.org/TR/REC-rdf-syntax/
[RFC1766]
H. Alvestrand, 1995 Tags for the Identification of Languages.
See also Codes for the representation of names of languages, ISO
639:1988 http://www.oasis-open.org/cover/iso639a.html.
See also Codes for the representation of names of countries, ISO
3166:1993 http://www.oasis-open.org/cover/country3166.html.
[URI]
T. Berners-Lee, R. Fielding, L Masinter, 1998 Uniform Resource
Identifiers (URI): Generic Syntax RFC2396 http://info.internet.isi.edu/in-notes/rfc/files/rfc2396.txt
T. Berners-Lee, L. Masinter, and M. McCahill, 1994 Uniform Resource
Locators, RFC1738 http://info.internet.isi.edu/in-notes/rfc/files/rfc1738.txt.
T. Berners-Lee, 1994 Universal Resource Identifiers in WWW: A
Unifying Syntax for the Expression of Names and Addresses of Objects
on the Network as used in the World-Wide Web, RFC1630 http://info.internet.isi.edu/in-notes/rfc/files/rfc1630.txt.
[vCard]
F. Dawson, T. Howes, 1998 vCard MIME Directory Profile RFC2426
http://info.internet.isi.edu/in-notes/rfc/files/rfc2426.txt
See also F. Dawson, P. Hoffman, 1998 The vCard v3.0 XML DTD http://www.ietf.org/internet-drafts/draft-dawson-vcard-xml-dtd-03.txt
[XHTML]
Steven Pemberton and many others, 1999 XHTML 1.0: The Extensible
HyperText Markup Language http://www.w3.org/TR/WD-html-in-xml/
See also Dave Raggett, HyperText Markup Language Activity Statement
http://www.w3.org/MarkUp/Activity.html
[XML]
Extensible Markup Language http://www.w3.org/XML/