
| Creator: | Simon Cox |
|---|---|
| Creator: | Renato Iannella |
| Contributor: | Andy Powell |
| Contributor: | Andrew Wilson |
| Date Issued: | 2005-07-25 |
| Identifier: | http://dublincore.org/documents/2005/07/25/dcmi-dcsv/ |
| Is Replaced By: | http://dublincore.org/documents/2006/02/13/dcmi-dcsv/ |
| Latest version: | http://dublincore.org/documents/dcmi-dcsv/ |
| Status of document: | This is a DCMI Proposed Recommendation. From 2005-07-25 to 2005-10-10, the status of this revision was incorrectly shown as "DCMI Recommendation". |
| Description of document: | This document describes a method for recording lists of labelled values in a text string, called Dublin Core Structured Values, with the label DCSV. The notation is intended for structured information within attribute values in Dublin Core metadata descriptions. |
It is often highly desirable to be able to encode or serialise values within a plain-text string. Some generic methods are in common use. Inheriting conventions from natural languages, commas (,) and semi-colons (;) are frequently used as list separators. Similarly, comma-separated-values (CSV) and tab-separated-values (TSV) are common export formats from spreadsheet and database software, with line-feeds separating rows or tuples. Dots (.) and dashes (-) are sometimes used to imply hierarchies, particularly in thesaurus applications. The eXtensible Markup Language [XML] provides one general solution, using tags contained within angle brackets (<, >) to indicate the structure.
To allow the recording of generic Structured Values, we introduce the Dublin Core Structured Values (DCSV) encoding scheme.
This document describes a particular method for structuring simple string values within a DCMI description. Here, we distinguish between two types of substring within a value string: componentLabels and componentValues. A componentLabel is the name of the type of a componentValue, and a componentValue is the data itself. Furthermore, we allow a complete value string to be disaggregated into a set of components, each of which has its own componentValue and, optionally, its own componentLabel. A value composed of components in this way is called a structured value.
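Punctuation characters are used in recording a structured value as follows:
- the equals-sign (=) separates a componentLabel from its componentValue;
- the semi-colon (;) separates successive components;
- the dot, full-stop or period (.) within a componentLabel indicates a sub-component, that is, a level of hierarchy.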
Each componentLabel and each componentValue is itself a text string. The intention is that a componentLabel will be a word or code naming the value component. componentLabels may be absent. In that case the entire substring delimited by semi-colons (;), or by the start or end of the string, comprises the componentValue.
The following patterns show how structured values may be recorded in strings using DCSV:
```
"u1; u2; u3"
"cA=v1"
"cA=v1; cB.part1=v2; cB.part2=v3"
"cA=v1; u2; u3"
```
where u1, u2 and u3 are unlabelled components, cA and cB are the componentLabels of Structured Value components, part1 and part2 are sub-components of cB, and v1, v2 and v3 are componentValues of specific components.
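As an illustration only, an application might map the hierarchy implied by dotted componentLabels such as cB.part1 and cB.part2 onto a nested structure. The Python sketch below assumes that the components have already been parsed into (componentLabel, componentValue) pairs. The function name nest_components is hypothetical. Treating repeated or conflicting labels as errors is a choice of this sketch, not a DCSV rule.

```python
from __future__ import annotations


def nest_components(pairs: list[tuple[str, str]]) -> tuple[dict, list[str]]:
    """Group already-parsed (componentLabel, componentValue) pairs by dotted label.

    Assumptions of this sketch (not part of DCSV itself):
    - each dot in a label introduces one level of nesting;
    - unlabelled components (empty label) are returned separately, in order;
    - a repeated label, or a label used both as a value and as a parent, is an error.
    """
    tree: dict = {}
    unlabelled: list[str] = []
    for label, value in pairs:
        if not label:
            unlabelled.append(value)
            continue
        *parents, leaf = label.split(".")
        node = tree
        for part in parents:
            # Create the intermediate level on first use, then descend into it.
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{part!r} is used both as a value and as a parent")
            node = child
        if leaf in node:
            raise ValueError(f"repeated or conflicting label {label!r}")
        node[leaf] = value
    return tree, unlabelled


tree, unlabelled = nest_components([("cA", "v1"), ("cB.part1", "v2"), ("cB.part2", "v3")])
print(tree)  # {'cA': 'v1', 'cB': {'part1': 'v2', 'part2': 'v3'}}
```

Unlabelled components are kept in a separate list, which preserves their order without inventing labels for them.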
The use of specific punctuation characters in DCSV coded values means that care must be exercised if these characters are to appear within the content (either componentLabels or componentValues) of the components. For DCSV, therefore, when an equals-sign (=) or a semi-colon (;) is required within a componentValue, the character is escaped using a backslash, appearing as \= or \; respectively. The dot, full-stop or period (.) needs no escaping. Within a componentLabel a dot indicates hierarchy; within a componentValue it has the conventional meaning for the context. This method of escaping special characters largely preserves readability and the ability to enter DCSV coded metadata value strings easily with a text editor if required. Software written to process DCSV coded values must make the necessary substitutions.
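A minimal encoder applying this escaping rule might look like the following Python sketch. The function names escape_value and encode_dcsv are hypothetical. Two choices are assumptions, made so that the output can be decoded unambiguously: escaping the backslash itself, and rejecting punctuation in componentLabels.

```python
from __future__ import annotations

from typing import Optional


def escape_value(text: str) -> str:
    """Backslash-escape the characters that DCSV treats as punctuation.

    Escaping the backslash itself is an assumption of this sketch; the
    escaping rule names only = and ;.
    """
    return "".join("\\" + ch if ch in "\\=;" else ch for ch in text)


def encode_dcsv(components: list[tuple[Optional[str], str]]) -> str:
    """Join (componentLabel, componentValue) pairs into one DCSV string.

    A label of None (or "") produces an unlabelled component.
    """
    parts = []
    for label, value in components:
        if label and any(ch in label for ch in "=;\\"):
            # Labels are meant to be simple words or codes, so this sketch
            # rejects punctuation in them rather than escaping it.
            raise ValueError(f"unsupported character in label {label!r}")
        escaped = escape_value(value)
        parts.append(f"{label}={escaped}" if label else escaped)
    return "; ".join(parts)


print(encode_dcsv([("cA", "v1"), ("cB.part1", "a=b; c"), (None, "u3")]))
# cA=v1; cB.part1=a\=b\; c; u3
```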
As there is no explicit grouping mechanism, DCSV can only be used to record a list. DCSV is only intended to be used for relatively simple structured values.
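A simple method can be used to parse metadata values recorded according to the DCSV scheme. For a single value recorded using the DCSV scheme:
1. protect each backslash-escaped character, so that an escaped equals-sign or semi-colon is not mistaken for punctuation;
2. split the string into components at each remaining semi-colon;
3. split each component at its first remaining equals-sign into a componentLabel and a componentValue; a component with no equals-sign is an unlabelled componentValue;
4. strip leading and trailing whitespace, and restore the escaped characters in each componentLabel and componentValue.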
The following Perl program reads a DCSV coded string entered on stdin, and prints a formatted version of the structured result. This code is provided for demonstration purposes only and contains no error-checking.
```perl
#!/usr/local/bin/perl
use strict;
use warnings;

print "Enter string to be parsed:\n";
my $string = join('', <STDIN>);
chomp $string;
print "\nString to be parsed is [$string]\n";
# First escape % characters as %37% (37 is the ASCII code of %), so that
# literal % signs cannot be confused with the placeholders created next
$string =~ s/%/"%".unpack('C',"%")."%"/eg;
# Next change \ escaped characters to %d% where d is the character's ASCII code
$string =~ s/\\(.)/"%".unpack('C',$1)."%"/eg;
print "\nEscaped string is [$string]\n";
# Now split the string into components at the remaining (unescaped) semi-colons
my @components = split(/;/, $string);
print "\nComponents:\n";
foreach my $component (@components) {
    # skip empty components, e.g. between two adjacent semi-colons
    next unless $component =~ /\S/;
    # split at the first unescaped = only; any later = belongs to the value
    my ($label, $value) = split(/=/, $component, 2);
    # if there is no =, copy contents of $label into $value and empty $label
    # (defined, not truth, so that values such as "" or "0" are kept)
    if (!defined $value) {
        $value = $label;
        $label = '';
    }
    # strip leading and trailing whitespace from label and value strings
    for ($label, $value) {
        s/^\s+//;
        s/\s+$//;
    }
    # convert % escaped characters back in label and value strings
    $label =~ s/%(\d+)%/pack('C',$1)/eg;
    $value =~ s/%(\d+)%/pack('C',$1)/eg;
    print "Component Label [$label] has Component Value [$value]\n";
}
```
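For comparison, the same procedure can be sketched in Python by scanning the string one character at a time. This removes the need for the %-placeholder substitution used in the Perl program. It is a sketch under stated assumptions rather than a reference implementation. The function name parse_dcsv is hypothetical, and dropping empty components is a choice of the sketch.

```python
from __future__ import annotations


def parse_dcsv(text: str) -> list[tuple[str, str]]:
    """Split a DCSV string into (componentLabel, componentValue) pairs.

    Unlabelled components get an empty label. As in the Perl example, each
    component is split at its first unescaped =, and surrounding whitespace
    is trimmed. Dropping empty components (e.g. after a trailing ;) is an
    assumption of this sketch.
    """
    components: list[tuple[str, str]] = []
    label: str | None = None  # set when the component's first unescaped = is seen
    buf: list[str] = []

    def finish() -> None:
        value = "".join(buf).strip()
        if label is not None:
            components.append((label.strip(), value))
        elif value:
            components.append(("", value))

    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Escape: take the next character literally
            # (a lone trailing backslash is kept as-is).
            buf.append(next(chars, "\\"))
        elif ch == "=" and label is None:
            label, buf = "".join(buf), []
        elif ch == ";":
            finish()
            label, buf = None, []
        else:
            buf.append(ch)
    finish()
    return components


print(parse_dcsv(r"cA=v1; cB.part1=a\=b\; c; u3"))
# [('cA', 'v1'), ('cB.part1', 'a=b; c'), ('', 'u3')]
```

Escapes are resolved as the string is scanned, so a literal % in the input needs no special handling.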
This document uses the following terms:
John Kunze encouraged the original authors to write up this proposal formally. Kim Covil wrote the Perl code. Eric Miller nagged regarding the overlap with XML. Steve Tolkin convinced the original authors to switch to =.
[DCMI]
Dublin Core Metadata Initiative, OCLC, Dublin, Ohio.
http://dublincore.org/
[DCMIAM]
A. Powell, M. Nilsson, A. Naeve, P. Johnston, 2004, DCMI Abstract Model
http://dublincore.org/documents/abstract-model/
[HTML4]
Dave Raggett, Arnaud Le Hors, Ian Jacobs, 1999, HTML 4.01 Specification
http://www.w3.org/TR/html4/
[Profiles]
DCMI Box - specification of the spatial limits of a place, and methods for encoding this in a text string
http://dublincore.org/documents/dcmi-box/
DCMI Point - a point location in space, and methods for encoding this in a text string
http://dublincore.org/documents/dcmi-point/
DCMI Period - specification of the limits of a time interval, and methods for encoding this in a text string
http://dublincore.org/documents/dcmi-period/
[Q-DC-HTML]
S. Cox, 2000, Recording qualified Dublin Core metadata in HTML
http://dublincore.org/documents/dcq-html/
[RDF-in-HTML]
This uses the most compact form of RDF/XML [RDF/XML], in which all the data occurs as attribute values. In this form several important capabilities are not available, such as multiple (repeated) values. For an example, see Figure 5 in S.J.D. Cox and K.D. Covil, "A web-based geological information system using metadata", Proc. 3rd IEEE META-DATA Conference
http://computer.org/conferen/proceed/meta/1999/papers/7/cox_covil.html
[RDF/XML]
D. Beckett, 2004, RDF/XML Syntax Specification (Revised)
http://www.w3.org/TR/rdf-syntax-grammar/
[URI]
T. Berners-Lee, R. Fielding, L. Masinter, 1998, Uniform Resource Identifiers (URI): Generic Syntax, RFC2396
http://www.ietf.org/rfc/rfc2396.txt
T. Berners-Lee, L. Masinter, M. McCahill, 1994, Uniform Resource Locators (URL), RFC1738
http://www.ietf.org/rfc/rfc1738.txt
T. Berners-Lee, 1994, Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web, RFC1630
http://www.ietf.org/rfc/rfc1630.txt
[vCard]
F. Dawson, T. Howes, 1998, vCard MIME Directory Profile, RFC2426
http://www.ietf.org/rfc/rfc2426.txt
[W3C-DTF]
M. Wolf, C. Wicksteed, 1997, Date and Time Formats
http://www.w3.org/TR/NOTE-datetime
[XHTML]
Steven Pemberton and many others, 1999, XHTML 1.0: The Extensible HyperText Markup Language
http://www.w3.org/TR/xhtml1/
See also Dave Raggett, HyperText Markup Language Activity Statement
http://www.w3.org/MarkUp/Activity.html
[XML]
Extensible Markup Language
http://www.w3.org/XML/
Copyright © 1995-2013 DCMI. All Rights Reserved.