----------------------------------------------------------------------
2008-11-14 Question posed off-list

    I tried my best to generate XML in the correct format, but it is not
    working very well. It may be that my knowledge of XML is not as good
    as it should be. Is this the correct way to introduce an ISBN in DC?

       <epdcx:statement
    epdcx:propertyURI="http://purl.org/dc/elements/1.1/identifier"
    epdcx:valueURI="URI:ISBN:0552996009" />

    If not, have you got any examples that I could follow?

----------------------------------------------------------------------
One draft answer

    1. Any identifier can be used with dc:identifier and dcterms:identifier

    2.  If the identifier is defined as a URI, including URLs and
    URNs, the syntax encoding scheme dcterms:URI can be specified.
    Identifiers that have been registered as URN namespaces
    <http://www.iana.org/assignments/urn-namespaces/> (e.g.  ISBN,
    ISSN, UUID) are distinguished by that namespace (e.g.
    URN:ISBN:0-395-36341-1).  HTTP URIs cannot be easily
    distinguished (e.g.  it cannot be determined automatically that
    http://dx.doi.org/10.1000/186 is a DOI and not just a URL).

    3. If the identifier is not a URI, a syntax encoding scheme should be
    specified, e.g. somenamespace:GOVDOC, to indicate the context that the
    identifier is related to.
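
    A minimal Python sketch of the distinction drawn in point 2, assuming
    identifiers arrive as plain strings. The function names and the
    "http-uri" / "unrecognised" labels are invented for illustration; the
    sketch only shows that a URN carries its namespace in the string
    itself, while an HTTP URI does not say whether it is a DOI, a handle
    or an ordinary URL.

```python
from urllib.parse import urlsplit


def urn_namespace(identifier):
    """Return the lower-cased URN namespace identifier, or None if not a URN."""
    parts = identifier.split(":", 2)
    if len(parts) == 3 and parts[0].lower() == "urn" and parts[1] and parts[2]:
        return parts[1].lower()
    return None


def describe(identifier):
    """Give a coarse label for an identifier string (labels are hypothetical)."""
    nid = urn_namespace(identifier)
    if nid is not None:
        return "urn:" + nid
    if urlsplit(identifier).scheme in ("http", "https"):
        # Could be a DOI, a handle or a plain URL; the string alone cannot tell.
        return "http-uri"
    return "unrecognised"


for example in ("urn:ISBN:0552996009", "http://dx.doi.org/10.1000/186", "0552996009"):
    print(example, "->", describe(example))
```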

----------------------------------------------------------------------
2008-10-21 Question posed

    I need to use your schema to generate XML to upload to SWORD, but
    I need to differentiate between different types of identifiers:
    DOI, ISBN, URI, EISSN, GOVDOC......

    http://dublincore.org/2008/01/14/dcelements.rdf#identifier

    Could you please tell me how to do it. Does your schema allow that?

----------------------------------------------------------------------
2008-11-14 Pete draft answer

    I think your example is using the eprints DC XML format [1]. This isn't
    a format owned/maintained by DCMI, but more on that below.

    Using that XML format, in your example

    >    <epdcx:statement
    > epdcx:propertyURI="http://purl.org/dc/elements/1.1/identifier"
    >  epdcx:valueURI="urn:ISBN:0552996009" />

    your statement is "saying":

    "The described resource is-identified-by a second resource (the value)
    which in turn is identified by the URI urn:ISBN:0552996009"
    (note: I think the URI scheme should be "urn", not "uri").

    It's perhaps easier to discuss this using the DC-Text format [2], rather
    than an XML format. Using DC-Text this would be represented

    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    DescriptionSet (
      Description (
        Statement (
          PropertyURI ( dc:identifier )
          ValueURI ( <urn:ISBN:0552996009> )
        )
      )
    )

    What I think you intend to "say" is

    "The described resource is-identified-by the URI urn:ISBN:0552996009"

    i.e. rather than using a second "thing" as value, you want to use the
    URI-as-literal as value. So, again using the DC-Text syntax, this would
    be represented

    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    DescriptionSet (
      Description (
        Statement (
          PropertyURI ( dc:identifier )
          LiteralValueString ( "urn:ISBN:0552996009"
            SyntaxEncodingSchemeURI ( xsd:anyURI )
          )
        )
      )
    )

    And using the eprints DC XML format you would represent this as

    <epdcx:statement
     epdcx:propertyURI="http://purl.org/dc/elements/1.1/identifier">
      <epdcx:valueString 
          epdcx:sesURI="http://www.w3.org/2001/XMLSchema#anyURI">
          urn:ISBN:0552996009</epdcx:valueString>
    </epdcx:statement>
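
    For anyone generating such statements programmatically, here is a
    minimal Python sketch using the standard library's ElementTree. The
    namespace URI bound to the epdcx prefix below is an assumption (it
    does not appear in this thread and should be checked against the
    eprints DC XML specification [1]), and identifier_statement is a
    hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "epdcx" prefix; verify against the spec.
EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"

ET.register_namespace("epdcx", EPDCX)


def identifier_statement(value, ses_uri=XSD_ANYURI):
    """Build a statement whose value is a literal string, not a second resource."""
    statement = ET.Element(
        "{%s}statement" % EPDCX,
        {"{%s}propertyURI" % EPDCX: DC_IDENTIFIER},
    )
    value_string = ET.SubElement(
        statement,
        "{%s}valueString" % EPDCX,
        {"{%s}sesURI" % EPDCX: ses_uri},
    )
    # Set the text without surrounding whitespace so the literal is exact.
    value_string.text = value
    return statement


xml_text = ET.tostring(identifier_statement("urn:ISBN:0552996009"), encoding="unicode")
print(xml_text)
```

    The serialised element carries its own namespace declaration, so it
    would still need to be placed inside a description and description
    set to form a complete document.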

    As I say above, the eprints DC XML format is not a DCMI-owned/maintained
    XML format. It was developed by the owners of the Scholarly Works
    Application Profile, because at the time SWAP was developed, DCMI did
    not have an XML format available for representing DC description sets
    (disclaimer: I wrote the eprints DC-XML spec, though I'm no longer
    directly involved in the maintenance of SWAP). There is a slight
    problem with eprints DC XML (which makes discussions like this one
    rather more complicated than they should be!), because eprints DC-XML is
    actually based on a version of the DCMI Abstract Model which has been
    superseded, and that version of the DCAM did not adequately distinguish
    literal from non-literal values.

    DCMI has recently published a proposed recommendation for a new XML
    format, called DC DS XML [3], which _is_ based on the current version of
    the DCAM, and it may be helpful to look at that format rather than at
    eprints DC XML, though I should emphasise that it is likely to change in
    the near future in the light of comments received on the current
    version.

    [1] http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_DC_XML
    [2] http://dublincore.org/documents/2007/12/03/dc-text/
    [3] http://dublincore.org/documents/2008/09/01/dc-ds-xml/

----------------------------------------------------------------------
2008-05-22 Re: Identifier from Douglas Campbell To: dc-identifiers@jiscmail.ac.uk
    Subject: Re: DC, OAI and identifying the digital object

    I'd like to cover four issues, though probably not provide many
    answers... :-(

    1. Resolvability of identifiers

    Getting theoretical for a bit...

    dc:identifier is a place to capture any, and all, identifiers for the
    resource - "An unambiguous reference to the resource within a given
    context".  The question is, what resource is being identified? 

    Typically we would consider the following to be all identifiers for a
    book in our library collection:
    * ISBN 1234567890
    * Digitised books database record 333
    * http://digitised.com/did/333
    * OPAC catalogue record 888 (about the book)
    * http://opac.com/bibid/888
    * OPAC catalogue record 999 (about the digitised book)
    * http://opac.com/bibid/999
    * http://handle.net/1/123

    On one level, they ARE all identifiers for the same thing, so could all
    go in dc:identifier.  But following a lot of deeper analysis within the
    Semantic Web community, we are realising there are four independent (but
    related) resources here:
    - a physical book
    - a digitised version of the book
    - a catalogue record (describing our holding of the physical book)
    - a catalogue record (describing our holding of the digitised version of
    the book)

    I believe this is the point of view Mikael has when he warns about the
    confusion caused by mixing up object and record identifiers.

    Section 3 of the "Architecture of the World Wide Web, Volume One" 2004
    [1] states:
    a) Agents may use a URI to access the referenced resource; this is
    called dereferencing the URI.  Access may take many forms, including
    retrieving a representation of the resource, adding or modifying a
    representation of the resource, and deleting some or all representations
    of the resource.
    b) A representation is data that encodes information about resource
    state. Representations do not necessarily describe the resource, or
    portray a likeness of the resource, or represent the resource in other
    senses of the word "represent".

    But this is fairly silent on what is acceptable as a representation,
    especially for an offline resource.  So it would seem acceptable to me
    if, say, http://handle.net/1/123 resolved to a catalogue record (which
    contains links to the digitised version).

    However, there has been some thinking subsequently which DOES consider
    what happens when a URI points to offline things/concepts [2].  But this
    is within the Semantic Web context and doesn't seem to have reached a
    conclusion yet.  Though it does suggest you should have different URIs
    for the basic identifier, the page describing it, the metadata
    describing it, etc.  And to be careful that dereferencing a URI does
    actually return a representation of the thing that URI identifies (eg.
    which of the above four book resources).  One option it suggests is to
    redirect using an HTTP 303 code - I note the http://hdl.handle.net/
    resolver currently uses the HTTP 302 (moved temporarily) code.
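
    To make the 302 versus 303 distinction concrete, here is a small
    Python sketch that labels a status code along the lines discussed
    above. The label names are invented for illustration, and treating
    303 as the only redirect that signals "this is a description of the
    thing, not the thing itself" is one reading of [2], not a rule taken
    from the HTTP specification.

```python
from http import HTTPStatus


def redirect_kind(status):
    """Label an HTTP status code for the purposes of this discussion."""
    code = HTTPStatus(status)
    if code == HTTPStatus.SEE_OTHER:
        # 303: the target is another document, e.g. a description page.
        return "see-other"
    if 300 <= code < 400:
        # Other redirects (such as 302) do not carry that extra hint.
        return "plain-redirect"
    if 200 <= code < 300:
        # A representation was served directly from the URI.
        return "direct"
    return "no-representation"


for status in (302, 303):
    print(status, HTTPStatus(status).phrase, redirect_kind(status))
```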

    To get back to the matter in hand...

    I guess the question is whether it is acceptable at all to include in
    dc:identifier a URI (eg. handle) that resolves to a metadata
    record/description page, not to the object?  In the case of an OAI-type
    repository, the handle IS an identifier for the digital version of the
    paper, the problem is dereferencing the handle doesn't return a
    representation of the paper, it typically provides a description page
    that contains a link to the representation of the paper.

    I don't have an answer to my question.  The above OAI repository
    behaviour seems intuitive for us humans so we'd tend to say yes, but we
    need to be aware this is incompatible with the Semantic Web direction.

    2. The 'main' URI

    You are interested in which of multiple identifiers is resolvable in an
    online context, in particular the 'best' resolvable one.  This is quite
    a specialist requirement because you happen to be online, e.g. would we
    also need a "best" identifier for offline objects? 

    The refinement/subproperty seems to be the most appropriate direction,
    such as Mikael suggests.  At the risk of saying "here's one I prepared
    earlier", we defined a set similar to Mikael's list back in 2002 which
    included pid, digital object [location], and local identifier [3],
    though thus far we've really only used "local".  The Collection
    Description Application Profile also has something in a similar area -
    cld:isLocatedAt and cld:isAccessedVia [4], though interestingly these
    refine dc:relation, not dc:identifier.

    Defining some refinements to dc:identifier useful to multiple domains is
    certainly something the DCMI Identifiers Community may want to consider
    further.  Any support from others on the list for pursuing this?

    3. Simple DC is here to stay

    I sympathise with Mikael's desire for developers to move away from the
    restrictions of DCMES, but the reality is we're stuck with it, on a large
    scale.  I am coming to realise that the things that establish a huge
    support are the extremely simple things, that usually have all sorts of
    failings, eg:
    - hyperlinks (<a href>) - there is no typing (XLink solves this but
    hasn't really taken off)
    - tagging - inconsistency and untyped (but much more popular than
    controlled vocabularies)
    - wikis - unstructured and untyped information
    - DCMES - ambiguity of types of values, eg. dates, identifiers

    No one is trying to stop people using hyperlinks or tagging or
    non-semantic wikis [OK, there are some people, but it looks futile to
    me], so we shouldn't discourage people from using DCMES.  Useful,
    specialist extensions have been built on all these (XLink, Pete's DC
    tagging, semantic wikis, DC qualifiers), but they will always remain
    niche due to the extra effort involved.

    We need a parallel stream to improve the workability of DCMES
    implementations.  Self-describing values may be a first step...

    4. Self-describing values

    I have been toying with this idea for a while.  Potential alternative
    names for this concept: self-describing values, self-encoding values,
    decodable values?

    Basically it is a principle that instead of defining the encoding scheme
    separate to the value, wherever possible it is embedded into the value
    itself as a namespace prefix, preferably using an officially registered
    URI namespace (not a made up one) so it is globally unique (eg.
    "nlnz:334" doesn't mean much to any system outside the National
    Library of NZ).   In DCMI Abstract Model [5] terminology, a non-literal
    surrogate with separate vocabulary encoding scheme URI is replaced with
    a value URI wherever possible.

    So instead of:

    <dc:type xsi:type="dcterms:DCMIType">StillImage</dc:type>
    <dc:format xsi:type="dcterms:IMT">image/jpeg</dc:format>
    <dc:subject>
       <dcam:memberOf rdf:resource="http://lcsh.info/"/>
       <rdf:value>Science</rdf:value>
    </dc:subject>
    <dc:identifier>
       <dcam:memberOf rdf:resource="http://www.isbn.org/"/>
       <rdf:value>1234567890</rdf:value>
    </dc:identifier>
    <dc:identifier>
       <dcam:memberOf rdf:resource="http://www.natlib.govt.nz/"/>
       <rdf:value>EP/1994/2454/10-F</rdf:value>
    </dc:identifier>

    use

    <dc:type>http://purl.org/dc/dcmitype/StillImage</dc:type>
    <dc:format>http://www.isi.edu/in-notes/iana/assignments/media-types/image/jpeg</dc:format>
    <dc:subject>http://lcsh.info/sh85118553#concept</dc:subject>
    <dc:identifier>urn:isbn:1234567890</dc:identifier>
    <dc:identifier>urn:nbn:nz:wtu:EP%2F1994%2F2454%2F10-F</dc:identifier>

    It makes the data less readable to humans, but allows DCMES data to be
    more machine-friendly - if the data contains a colon (:), then parse it
    to see if it starts with the namespace for encoding schemes you know
    about.
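
    The colon-and-prefix check described above could look roughly like
    this in Python. The prefix table and its labels are assumptions
    chosen to match the sample values in this message; a real consumer
    would maintain its own list of schemes it knows about.

```python
# Hypothetical table of prefixes this consumer knows, with display labels.
KNOWN_PREFIXES = {
    "urn:isbn:": "ISBN",
    "urn:nbn:": "NBN",
    "http://purl.org/dc/dcmitype/": "DCMI Type",
    "http://lcsh.info/": "LCSH",
}


def decode_value(value):
    """Return (scheme label, remainder), or (None, value) if nothing matches."""
    if ":" not in value:
        return (None, value)
    # Simplification: prefixes are compared case-insensitively, which suits
    # the "urn:isbn:" part but is looser than URI rules for HTTP paths.
    lowered = value.lower()
    for prefix in sorted(KNOWN_PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix):
            return (KNOWN_PREFIXES[prefix], value[len(prefix):])
    return (None, value)


for sample in ("urn:isbn:1234567890", "nlnz:334", "Science"):
    print(sample, "->", decode_value(sample))
```

    Values with an unknown prefix, such as "nlnz:334", fall through
    unchanged, which is the behaviour you would want for a made-up
    namespace that only one system understands.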

    This doesn't solve the question of which is the best URL to access the
    object, but it may help weed out ones that aren't.

    Apologies for going slightly off topic here...  Part of my motivation
    for pursuing self-encoding values is to make metadata records really
    simple, but still rich and decodable. 

    In Matapihi [6], we currently require partner organisations to supply
    their metadata converted into a fairly complex RDF XML representation
    [yes, I know the RDF is incorrect, but it pre-dates the DCMI
    recommendation].
    This complexity has been quite a hurdle for small organisations with no
    technical staff.  I'd like to be able to offer an alternative that is
    more straight-forward.

    It seems to me, people can grasp the concept of one XML element tag with
    one data value inside, but when you start adding attributes and embedded
    <rdf:value> elements, etc., etc. it all gets too complicated.
    So I am aiming to collect encoding scheme information within the data
    values, as can be seen in the above before/after samples.  It will be
    interesting to see if this improves the situation.  One key
    prerequisite will be to start defining more namespaces (many encoding
    schemes don't have any defined).
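
    Going the other way, a partner's (scheme, value) pairs could be
    folded into self-describing values with something like the Python
    sketch below. The scheme keys ("isbn", "nbn-nz-wtu", "dcmitype") are
    hypothetical names for illustration; the prefixes are the ones used
    in the samples above. Percent-encoding the value keeps characters
    such as "/" from being mistaken for part of the namespace.

```python
from urllib.parse import quote

# Hypothetical scheme keys mapped to the prefixes used in the samples above.
SCHEME_PREFIXES = {
    "isbn": "urn:isbn:",
    "nbn-nz-wtu": "urn:nbn:nz:wtu:",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def self_describing(scheme, value):
    """Embed the encoding scheme in the value as a prefix."""
    # safe="" makes quote() percent-encode "/" as well as other reserved
    # characters; letters, digits, "-", ".", "_" and "~" are left alone.
    return SCHEME_PREFIXES[scheme] + quote(value, safe="")


print(self_describing("isbn", "1234567890"))
print(self_describing("nbn-nz-wtu", "EP/1994/2454/10-F"))
```

    A scheme that is missing from the table raises KeyError in this
    sketch, which mirrors the prerequisite noted above: the approach
    only works for encoding schemes that have a namespace defined.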

    Thanx,
    Douglas Campbell
    National Library of New Zealand

    [1] http://www.w3.org/TR/webarch/
    [2] http://www.w3.org/TR/cooluris/
    [3] http://www.natlib.govt.nz/dr/drterms.html
    [4] http://dublincore.org/groups/collections/collection-application-profile/#colcldisAccessedVia
    [5] http://dublincore.org/documents/abstract-model/
    [6] http://matapihi.org.nz/

    >>> Mikael Nilsson <mikael@NILSSON.NAME> 21/05/08 03:37 >>>
    > Many repositories are now using multiple identifiers in each metadata
    > record.  This may include a handle for the metadata record, an ISBN, a
    > local library catalogue record number and a DOI to the published
    > version.  All these are valid identifiers and are useful for the
    > repository to store and use for its own purposes.  (Individual handles
    > for each object attached to the record generally use the DC:relation
    > element).

    Sounds like a reasonable scenario...

    > "Consumers" of repository records for other services often use simple
    > DC as the harvested format because it is the lowest common
    > denominator.  A key requirement of the "consumers" is to link back to
    > the digital object or the record for the digital object.  For example,
    > the Australasian Digital Theses Program service (which I manage), uses
    > OAI-PMH to harvest sets of simple DC metadata theses records from
    > members' repositories.  After loading, the central service uses the
    > harvested identifier to provide a hypertext link to the local
    > repository record so a user can click through to the PDF.
    >
    > However, simple DC does not allow consumer services to readily
    > distinguish between DC:identifiers.  In particular, which resolves to
    > the local record.  My preference goes slightly beyond this as I would
    > like to know which resolves to a record that has the object attached,
    > i.e. it identifies the record for the object not just a source or
    > original metadata record that does not have an object attached.

    It seems to me that the functional requirements you mention are in
    inescapable conflict - "simple DC" (=using only properties from the
    DCMES) does not support a feature that you need. Something needs to give
    here: either DCMES is not enough, or you need to make do with what you
    have.

    > Alternatively, identifiers could be self-describing such as
    > urn:isbn:9781591583066, urn:doi:10.1108/00242530810865484.  Could we
    > add ooi for "original object identifier" (doi is taken)?  With such
    > self-description, consumer services could filter, configure,
    > manipulate or resolve with safety and confidence for any repository.

    There are multiple issues here.

    1. It seems to me that you are seeing records using dc:identifier for
    both the object identifier and identifiers for the record. That may
    be one of the things creating confusion - you won't know what you get
    when you click, the object or just a description. That's not an ideal
    situation. The definition is pretty clear on this point: dc:identifier
    is intended to give identifiers for the object only.

    This practice, in my opinion, should be strongly discouraged, as it
    severely undermines the possibility of using the data reliably in many
    machine-processing situations. In essence, it lowers the quality of
    the metadata significantly.

    2. Whether a particular URI actually resolves to a digital object
    isn't really a feature of the URI itself (not even all http: URIs
    denote digital objects, see for example http://purl.org/dc/terms/creator
    ...), so I'd hesitate before encoding such information into a URN scheme.

    3. There are numerous other problems with the idea of setting up and
    maintaining a URN scheme, but I will leave that discussion open.

    To me, it seems like you are lacking features in the metadata model. I
    can see how there might be a need for refinements of the dc:identifier
    property. For example, I can imagine subproperties like

    ex:resourceLocation (value type: URI), for web locations

    ex:localIdentifier (value type: string), for identifiers that are not
    very useful outside the scope of a particular application.

    ex:globallyUniqueIdentifier (value type: URI), for unique identifiers
    that don't resolve to the object, but are still global in scope

    etc....

    I would support it if the DC-identifiers community decided to develop a
    set of such refinements to dc:identifier.
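
    As a rough Python illustration of how a consumer might use such
    refinements: the ex: property names are the hypothetical ones listed
    above, and the record layout (a list of property/value pairs) is an
    assumption made for the sketch. It also shows what is lost when the
    same record is reduced to plain dc:identifier.

```python
DC_IDENTIFIER = "dc:identifier"

# Hypothetical refinements from the list above, each refining dc:identifier.
REFINEMENTS = {
    "ex:resourceLocation": DC_IDENTIFIER,
    "ex:localIdentifier": DC_IDENTIFIER,
    "ex:globallyUniqueIdentifier": DC_IDENTIFIER,
}


def link_target(statements):
    """Return the first ex:resourceLocation value, or None if there is none."""
    for prop, value in statements:
        if prop == "ex:resourceLocation":
            return value
    return None


def dumb_down(statements):
    """Replace each refinement with the property it refines (simple DC)."""
    return [(REFINEMENTS.get(prop, prop), value) for prop, value in statements]


record = [
    ("ex:globallyUniqueIdentifier", "urn:isbn:9781591583066"),
    ("ex:localIdentifier", "888"),
    ("ex:resourceLocation", "http://digitised.com/did/333"),
]

print(link_target(record))
print(link_target(dumb_down(record)))
```

    With the refinements in place the consumer can pick the link
    directly; after reduction to simple DC all three values are plain
    dc:identifier statements and the consumer is back to guessing.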

    Also, I'd encourage the system developers to stop limiting themselves to
    the DCMES. Your example shows clearly that the use cases require more,
    so it's difficult to see why that restriction is still in place.

    The alternative, as you describe, is to develop a carefully crafted set
    of heuristics. It seems clear to me that such a method will not scale
    very well.