DC-6 Resource Type and Format breakout

The focus of the breakout was

(1) examining how the DCQ structured data model, with its element and value qualifiers, can be used for the Type and Format elements in DC, and

(2) teasing out the proper semantics of Type, in particular clarifying its relationship with Format.

This breakout group had the advantage, or possibly the baggage, of corresponding to an existing DC working group. Thus, a good deal of background work was available. However, while many of those present had a strong background in some of the issues because of this, a number of people at the breakout were relatively new to the issues and had to be brought up to speed as quickly as possible.

The working group had in particular been wrestling with proposals for controlled vocabularies or lists of terms to be used as recommended in the RFC:

"For the sake of interoperability, Type|Format should be selected from a list that is currently under development in the workshop series."

Format

For Format it has generally been accepted that digital resources can be adequately described using IMT or MIME types. This obviates the need for the DC community to maintain a list or vocabulary.

One outstanding issue, however, is the occasional need to indicate nested formats, exemplified by compressed formats which in turn contain data in another special format. A further form of nesting is exemplified by a CD-ROM (ie a piece of plastic) containing an ISO-#### digital data stream, which in turn contains files of various MIME types. The problem here really comes down to application of the 1:1 principle.
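One way to picture the nesting problem is to give each layer its own description with a single Format value and to link the layers by containment. The sketch below does that in Python. It is an illustration only, not something the breakout agreed on: the `Description` class, the `parts` relation, the file names, and the reading of the 1:1 principle as "one description per layer" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Description:
    """Hypothetical record: one description of one resource, with one Format."""
    title: str
    format: str  # an Internet Media Type string
    parts: List["Description"] = field(default_factory=list)


def format_chain(desc: Description, depth: int = 0) -> List[str]:
    """Walk the containment tree and list each layer's format, indented by depth."""
    lines = ["  " * depth + f"{desc.title}: {desc.format}"]
    for part in desc.parts:
        lines.extend(format_chain(part, depth + 1))
    return lines


# Illustrative data: a gzip-compressed tar archive holding two files.
# "application/x-tar" is a conventional, unregistered media type label.
archive = Description(
    title="report.tar.gz",
    format="application/gzip",
    parts=[
        Description(
            title="report.tar",
            format="application/x-tar",
            parts=[
                Description("report.html", "text/html"),
                Description("figure1.png", "image/png"),
            ],
        )
    ],
)

for line in format_chain(archive):
    print(line)
```

Under this reading no single Format value has to carry more than one format; the cost is that a harvester must follow the containment links to learn what is inside the outermost package.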

However, with the extension of DC to also cover non-digital resources, the solution is less clear. In the generalised usage, the Format element can be used in the sense of "form of instantiation for operational purposes". This should cover the "museum's" requirements for a "physical description". For this usage there are a number of vocabularies or coding systems available, such as the Getty Art and Architecture Thesaurus Physical Media. The particular vocabulary used for any instance may be indicated using the DCQ value qualifier, also known as "scheme".

Another outstanding issue for the Format element is the requirement to encode size information in addition to media. Size clearly represents a particular usage of the Format element which probably requires the element to be qualified, ie a Format "type" of "size", in contrast to another Format "type" of "media" or similar. Furthermore, for many resources several different measurements of size can be made, so capturing this in a machine-readable way will require some structuring or qualification of the element and content. This was not explored in the breakout.
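The combination of an element qualifier (a Format "type" of "media" or "size") and a value qualifier ("scheme") can be sketched as a small data structure. The following Python is a rough illustration under stated assumptions: the `FormatStatement` class, the `measure` and `unit` fields, and the sample values are hypothetical and were not proposed in the breakout, which did not explore the structuring of size.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FormatStatement:
    """Hypothetical qualified Format statement."""
    qualifier: str                 # element qualifier ("type"): "media" or "size"
    value: str                     # the element content
    scheme: Optional[str] = None   # value qualifier naming the vocabulary used
    measure: Optional[str] = None  # assumed field: which size measurement this is
    unit: Optional[str] = None     # assumed field: unit of the measurement


def sizes(record: List[FormatStatement]) -> Dict[str, Tuple[int, str]]:
    """Collect every size measurement as measure -> (amount, unit)."""
    return {
        s.measure: (int(s.value), s.unit)
        for s in record
        if s.qualifier == "size"
    }


# Illustrative record for a digital image; all values are made up.
record = [
    FormatStatement("media", "image/jpeg", scheme="IMT"),
    FormatStatement("size", "245760", measure="file-size", unit="bytes"),
    FormatStatement("size", "1024", measure="width", unit="pixels"),
    FormatStatement("size", "768", measure="height", unit="pixels"),
]

print(sizes(record))
```

Keeping the measurement name and unit in separate fields is one way to make several sizes of the same resource machine-readable; packing them into the content string by convention would be an alternative.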

Most outstanding questions related to the Format element appear to be largely engineering issues which can be accommodated by convention within the DCQ structure.

Type

Discussions around Type were much more contentious. In particular, there is still a lot of confusion about the distinction between Type and Format.

The generalised definition of Format given above ("form of instantiation for operational purposes") appears to successfully capture a useful semantic for Format, and thus imply what Type does not cover.

The existing definition of Type refers to the "genre" of the resource, and has been generally understood well in relation to information resources.

Genre terms such as "novel", "poem", etc are quoted in the definition of Type in the current RFC. Several very rich vocabularies are available for these from the communities dealing with bibliographic information, and also for education and art. Terms from these vocabularies may be used for Type, with the source indicated using a DCQ value qualifier or "scheme". Of course, the use of domain-specific vocabularies will limit semantic interoperability.

However, the notion of "genre" is more problematic in the expanded universe to be described by DC which includes things other than information resources.

A "minimalist" list of values for Type (the cross-domain 8 or XD8) has been developed by the working group, to fulfill the requirement from the RFC quoted above, as follows:

text
image
sound
dataset
software
interactive
event
physical object

This list has the benefit of attempting to be fully cross-domain, and is also simple (short!) enough to be generally used. However, an inspection of the XD8 reveals that it is strongly biased, with much finer granularity for information resources in the first six items (the I6), and then a rump of two ("event" and "physical object") for all other resources (the O2).
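The split of the XD8 into the I6 and the O2 can be made concrete with a few lines of Python. This is only a sketch: the `classify_type` function and the labels it returns are hypothetical, and the term lists are copied from the list above.

```python
# The eight terms as listed by the working group.
I6 = ("text", "image", "sound", "dataset", "software", "interactive")
O2 = ("event", "physical object")
XD8 = I6 + O2


def classify_type(value: str) -> str:
    """Say which half of the XD8 a Type value falls in (hypothetical helper)."""
    term = value.strip().lower()
    if term in I6:
        return "information resource"
    if term in O2:
        return "other resource"
    # The list has no fall-through value such as "other", so reject the term.
    raise ValueError(f"{value!r} is not in the XD8 list")


print(classify_type("Image"))
print(classify_type("physical object"))
```

A value such as "service" raises `ValueError` here, which mirrors the absence of a fall-through term in the list.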

However, there were questions about

(a) whether the granularity in the I6, though finer than in the O2, was in fact useful for any particular real-world application,
(b) whether the specific terms in the O2 are comprehensive (unlikely, especially in view of the absence of a fall-through value of "other"),
(c) what the basis was for privileging the O2 over other possible non-information-resource terms (special pleading was suggested!), and
(d) (in particular) whether the property used to distinguish between the I6 terms was followed through in the rest of the list.

A number of other requirements were also raised as requiring additional attention. These included

(1) the need to distinguish between "collections" and "items" (particularly important for archivists): since more-or-less all resources can occur in either form, this appears to be a distinction approximately orthogonal to other genre indications, and thus may be accommodated through DCQ qualification of the Type element with a "type=aggregation-level", in contrast to the "type=genre";

(2) needs for Type values of "abstract work", "service", "moving-image".
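The suggestion in (1) that aggregation level is roughly orthogonal to genre can be illustrated by giving a resource two Type statements, each with a different element qualifier. The Python below is a minimal sketch: the tuple layout, the `describe` helper, and the two aggregation-level values are assumptions for the example, and the genre terms are simply the XD8 list.

```python
from itertools import product
from typing import List, Tuple

GENRES = ["text", "image", "sound", "dataset",
          "software", "interactive", "event", "physical object"]
AGGREGATION_LEVELS = ["item", "collection"]  # assumed pair of values


def describe(genre: str, level: str) -> List[Tuple[str, str, str]]:
    """Return two qualified Type statements as (element, qualifier, value)."""
    if genre not in GENRES or level not in AGGREGATION_LEVELS:
        raise ValueError("unknown genre or aggregation level")
    return [
        ("Type", "genre", genre),
        ("Type", "aggregation-level", level),
    ]


# Because the two qualifiers are independent, every pairing is permitted.
combinations = list(product(GENRES, AGGREGATION_LEVELS))
print(len(combinations))
print(describe("image", "collection"))
```

Treating the two as separate qualified statements avoids multiplying the genre list with terms such as "image collection" or "text collection".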

The specific requirements could be seen in some cases to emanate mainly from specific communities. In fact, recognising the functional needs of specific domains is often the key to understanding the basis for arguments around the Type element. Different communities have different views of the world driven by their particular requirements, which may in fact require different solutions, equally valid in context.

Clearly some work remains to be done in cleaning up the semantics of the Type element, even before proceeding to recommendations and the engineering issues. It is likely that some pragmatic compromise and "rules-of-thumb" will be needed.


The problems in both Format and Type appear to have been brought on by the generalisation of DC to describe non-digital resources and, in particular, non-information resources. A particular consequence of this has been to muddy the 1:1 issue, which appears in particular to confuse the use of the Type and Format elements.


Dr Simon Cox - Australian Geodynamics Cooperative Research Centre
CSIRO Exploration & Mining, PO Box 437, Nedlands, WA 6009 Australia
T: +61 8 9389 8421 F: +61 8 9389 1906 Simon.Cox@dem.csiro.au
http://www.ned.dem.csiro.au/SimonCox/
