
| Creator: | Simon Cox |
|---|---|
| Creator: | Renato Iannella |
| Contributor: | Andy Powell |
| Contributor: | Andrew Wilson |
| Contributor: | Pete Johnston |
| Contributor: | Thomas Baker |
| Date Issued: | 2006-02-13 |
| Identifier: | http://dublincore.org/documents/2006/02/13/dcmi-dcsv/ |
| Replaces: | http://dublincore.org/documents/2005/07/25/dcmi-dcsv/ |
| Is Replaced By: | http://dublincore.org/documents/2006/04/10/dcmi-dcsv/ |
| Latest version: | http://dublincore.org/documents/dcmi-dcsv/ |
| Status of document: | This is a DCMI Proposed Recommendation. |
| Description of document: | This document describes a method for recording simple structured data in a text string, or structured value string. This method is referred to for historical reasons as DCSV (which originally meant "Dublin Core Structured Value"). |
| Revision note: | 2006-02-13. After approval of the DCMI Abstract Model [DAM] as a DCMI Recommendation in March 2005, the DCMI Usage Board undertook a review of the DCSV syntax specification and of the related specifications for the encoding schemes DCMI Box, DCMI Point, and DCMI Period, with the goal of revising their language for conformance with the Abstract Model. A summary of the changes made can be found in the document "Revision of legacy DCSV specifications". As of 2005, the DCMI Abstract Model supports the representation of complex structures, such as those encoded in DCSV-syntax-based encoding schemes, as "related descriptions". The DCMI Usage Board encourages implementers to consider the longer-term consequences for interoperability of packaging structured information in parsable DCSV-encoded string values as opposed to conveying that information in related descriptions using other syntax encodings. |
It is often desirable to encode or serialise simple structured data within a text string. Some generic methods are in common use. Borrowing conventions from natural languages, commas (,) and semi-colons (;) are frequently used as list separators. Similarly, comma-separated values (CSV) and tab-separated values (TSV) are common export formats from spreadsheet and database software, with line feeds separating rows or tuples. Dots (.) and dashes (-) are sometimes used to imply hierarchies, particularly in thesaurus applications. The eXtensible Markup Language [XML] provides a more general solution, using tags contained within angle brackets (<, >) to indicate structure.
This document describes a particular method for encoding simple structured data within a value string. In the DCMI Abstract Model [DAM], a value string is defined as "a simple string that represents the value of a property". Value strings encoded according to the method described in this document are referred to here as structured value strings.
(Note that for historical reasons, the method itself is still referred to here as the DCSV Syntax, or DCSV. "DCSV" originally stood for "Dublin Core Structured Value", a legacy concept from circa 1997 which no longer has a place in today's DCMI Abstract Model [DAM].)
The DCSV Syntax allows a structured value string to be parsed into a set of components. To represent this set of components, the syntax distinguishes between two types of substring within the structured value string -- componentLabels and componentValues. A componentLabel is the name of a component within the structured data, and a componentValue is the data itself.
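Punctuation characters are used in encoding a structured value string as follows: equals signs (=) separate a componentLabel from its componentValue; semicolons (;) separate the (optionally labelled) components from one another; and dots (.) within a componentLabel indicate hierarchical structure, if required.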
The componentLabels and the componentValues themselves each consist of a text string. The intention is that the componentLabel will be a word or code corresponding to the name of the component. The componentLabels may be absent, in which case the entire substring delimited by semi-colons (;) or the end of the string comprises a componentValue.
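As a rough illustration of how components might be written out, the following Python sketch joins (label, value) pairs into a structured value string, using an equals sign between a label and its value and a semicolon between components. The function name `encode_components`, the use of `None` for an absent label, and the "; " separator with a trailing space are assumptions made for this example; no escaping is attempted here.

```python
from typing import List, Optional, Tuple

# A component is a (label, value) pair; the label is None when it is absent.
Component = Tuple[Optional[str], str]


def encode_components(components: List[Component]) -> str:
    """Join components into a structured value string (no escaping performed)."""
    parts = []
    for label, value in components:
        # Labelled components are written as label=value, unlabelled ones as the bare value.
        parts.append(value if label is None else f"{label}={value}")
    return "; ".join(parts)


print(encode_components([(None, "u1"), (None, "u2"), (None, "u3")]))
# u1; u2; u3
print(encode_components([("cA", "v1"), (None, "u2"), (None, "u3")]))
# cA=v1; u2; u3
```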
The following patterns show how structured information about a resource may be recorded in strings using DCSV:
"u1; u2; u3"
"cA=v1"
"cA=v1; cB.part1=v2; cB.part2=v3"
"cA=v1; u2; u3"
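where cA, cB.part1, and cB.part2 are componentLabels (the dots indicating that part1 and part2 are parts of cB), v1, v2, and v3 are the componentValues of labelled components, and u1, u2, and u3 are componentValues of unlabelled components.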
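One possible reading of dotted labels such as cB.part1 and cB.part2 is as paths into a nested structure. The Python sketch below groups already-separated (label, value) pairs into nested dictionaries by splitting each label on dots. The function name `nest_components` and the choice of dictionaries are assumptions for illustration only; the sketch also assumes that no label is used both as a leaf and as a parent.

```python
from typing import Any, Dict, List, Tuple


def nest_components(pairs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Group labelled components into nested dicts, splitting labels on dots."""
    tree: Dict[str, Any] = {}
    for label, value in pairs:
        *parents, leaf = label.split(".")
        node = tree
        for name in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return tree


print(nest_components([("cA", "v1"), ("cB.part1", "v2"), ("cB.part2", "v3")]))
# {'cA': 'v1', 'cB': {'part1': 'v2', 'part2': 'v3'}}
```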
The use of specific punctuation characters in DCSV-encoded value strings means that care must be exercised if these characters are to be used directly within strings which comprise the content (either componentLabels or componentValues) of the components. For DCSV, therefore, when an equals sign (=) or a semicolon (;) is required within the componentValue, the character is escaped using a backslash, appearing as \= or \;. There should be no ambiguity regarding the dot, full-stop, or period (.) within strings: when it is part of a componentLabel, a dot indicates some hierarchy; when part of a componentValue, it has the conventional meaning for the context. This method of escaping special characters largely preserves readability and the ability to enter DCSV-encoded metadata value strings easily using a text editor if required. Software written to process DCSV-encoded value strings must make the necessary substitutions.
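The substitutions involved can be sketched in a few lines of Python. The helper names `escape_dcsv` and `unescape_dcsv` are hypothetical, and the sketch handles only the two characters named above; how a literal backslash in a value should be treated is not stated here, so it is left untouched.

```python
def escape_dcsv(text: str) -> str:
    """Backslash-escape the equals signs and semicolons in a component value."""
    return text.replace("=", "\\=").replace(";", "\\;")


def unescape_dcsv(text: str) -> str:
    """Reverse escape_dcsv by dropping the backslash before = or ;."""
    return text.replace("\\=", "=").replace("\\;", ";")


raw = "a=b; c"
escaped = escape_dcsv(raw)
print(escaped)
# a\=b\; c
print(unescape_dcsv(escaped) == raw)
# True
```

Escaping is applied to each componentValue before the components are joined, and reversed only after the string has been split, so that the separators added during encoding are never confused with characters in the content.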
Note that DCSV is only intended to be used for relatively simple structured information about resources.
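A simple method can be used to parse metadata value strings encoded according to the DCSV syntax. For a single DCSV-encoded value string: split the string into components at each unescaped semicolon (;); split each component at its first unescaped equals sign (=) into a componentLabel and a componentValue, treating a component with no equals sign as an unlabelled componentValue; and finally replace each escaped character with the character itself.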
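As a minimal sketch of such a parser, the Python function below scans the string once, treating a backslash as escaping the character that follows it, a semicolon as the end of a component, and the first equals sign in a component as the end of its label. The name `parse_dcsv`, the use of `None` for a missing label, the trimming of surrounding whitespace, and the skipping of empty unlabelled components are choices made for this example rather than requirements of the syntax.

```python
from typing import List, Optional, Tuple


def parse_dcsv(text: str) -> List[Tuple[Optional[str], str]]:
    """Parse a structured value string into (label, value) pairs.

    The label is None for an unlabelled component.
    """
    components: List[Tuple[Optional[str], str]] = []
    label: Optional[str] = None
    buffer: List[str] = []

    def finish_component() -> None:
        nonlocal label, buffer
        value = "".join(buffer).strip()
        # Skip components that are entirely empty (e.g. after a trailing ";").
        if label is not None or value:
            components.append((label, value))
        label, buffer = None, []

    i = 0
    while i < len(text):
        char = text[i]
        if char == "\\" and i + 1 < len(text):
            # Escaped character: keep it literally and skip the backslash.
            buffer.append(text[i + 1])
            i += 2
            continue
        if char == ";":
            finish_component()
        elif char == "=" and label is None:
            # The first unescaped "=" ends the label; later ones stay in the value.
            label = "".join(buffer).strip()
            buffer = []
        else:
            buffer.append(char)
        i += 1
    finish_component()
    return components


print(parse_dcsv("cA=v1; cB.part1=v2; cB.part2=v3"))
# [('cA', 'v1'), ('cB.part1', 'v2'), ('cB.part2', 'v3')]
print(parse_dcsv(r"note=a\=b\; c; u2"))
# [('note', 'a=b; c'), (None, 'u2')]
```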
The following Perl program reads a DCSV-encoded string entered on stdin and prints a formatted version of the structured result. This code is provided for demonstration purposes only and contains no error-checking.
```perl
#!/usr/local/bin/perl
use strict;

print "Enter string to be parsed:\n";
my $string = join('', <STDIN>);
print "\nString to be parsed is [$string]\n";

# First escape % characters
$string =~ s/%/"%".unpack('C',"%")."%"/eg;
# Next change \ escaped characters to %d% where d is the character's ASCII code
$string =~ s/\\(.)/"%".unpack('C',$1)."%"/eg;
print "\nEscaped string is [$string]\n";

# Now split the string into components
my @components = split(/;/, $string);
print "\nComponents:\n";
foreach my $component (@components) {
    my ($label, $value) = split(/=/, $component, 2);
    # if there is no = copy contents of $label into $value and empty $label
    if (!defined $value) {
        $value = $label;
        $label = '';
    }
    # strip whitespace from label string
    $label =~ s/^\s*(\S+)\s*$/$1/;
    # convert % escaped characters back in label string
    $label =~ s/%(\d+)%/pack('C',$1)/eg;
    # convert % escaped characters back in value string
    $value =~ s/%(\d+)%/pack('C',$1)/eg;
    print "Component Label [$label] has Component Value [$value]\n";
}
```
This document uses the following terms:
John Kunze encouraged the original authors to write up their proposal formally, resulting in the first DCSV specification of July 2000. Kim Covil wrote the Perl code. Eric Miller nagged regarding overlap with XML. Steve Tolkin convinced the original authors to switch to =.
[DAM]
A. Powell, M. Nilsson, A. Naeve, P. Johnston, 2005, DCMI Abstract Model
http://dublincore.org/documents/abstract-model/.
[XML]
Extensible Markup Language
http://www.w3.org/XML/.
Copyright © 1995-2013 DCMI. All Rights Reserved.