
Makx, Stu,

A quick-and-dirty analysis of files ending in ".rdf" (i.e.,
the metadata files) on http://dublincore.org makes it immediately
clear why searches on the metadata yield such terrible results:

-- A total of 744 items on http://dublincore.org/ have metadata.

-- Of these 744 items, 212 are in the snapshot of the dublincore.org Web site that
   was archived in February 2001 (see
   http://dublincore.org/usage/meetings/2004/03/ISSUES/WEBSITE/no-index-archive.html).
   The entire tree http://dublincore.org/archives/ should
   probably be excluded from indexing -- this step alone would
   probably improve the quality of the metadata search by nearly
   30% (212 of 744 records)!

-- Of the remaining 532 items, 323 describe historical versions and/or
   historical materials for particular workshops or conferences (see
   http://dublincore.org/usage/meetings/2004/03/ISSUES/WEBSITE/no-index.html).
   These should not be indexed.

-- Of the remaining 209 items, 74 should definitely not be indexed, for
   various reasons (see http://dublincore.org/usage/meetings/2004/03/ISSUES/WEBSITE/no.html).
   For example:

   -- They are at a level of granularity too fine for indexing
      (e.g., the biographies of BoT members, which can be
      discovered from the http://dublincore.org/about/ page).

   -- They are in the http://dublincore.org/advisoryboard/ or
      http://dublincore.org/trustees/
      trees, which should not be discoverable through a public
      metadata search of the DCMI Web site.

-- There is a large category of things that should definitely
   not be indexed because they are obsolete or superseded, but
   which nonetheless form part of the DCMI historical record
   and therefore should be discoverable by other means (see
   http://dublincore.org/usage/meetings/2004/03/ISSUES/WEBSITE/maybe.html).
   For at least some of these resources there may not be a
   citation path to public Web pages. I am not sure what
   to do about these. One idea would be to create a page
   http://dublincore.org/archives/obsolete-documents.html,
   linked from http://dublincore.org/archives/, where one could
   simply list these documents without any sort of explanation
   or maintenance required. Perhaps Harry or Lance could use my
   file as a basis for doing this.

-- This leaves just 79 resources that should be indexed today (see
   http://dublincore.org/usage/meetings/2004/03/ISSUES/WEBSITE/yes.html).
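
The exclusions and tallies above can be sketched in a few lines of Python.
This is only an illustration, not the script used for the analysis: the
names EXCLUDED_PREFIXES and is_excluded are made up here, and the count for
the obsolete-but-historical category is not stated above, so it is derived
on the assumption that the categories are disjoint and together cover all
744 items.

```python
from urllib.parse import urlparse

# Trees named above as unsuitable for the public metadata search.
# (Hypothetical constant; the real exclusion mechanism is not specified.)
EXCLUDED_PREFIXES = ("/archives/", "/advisoryboard/", "/trustees/")


def is_excluded(url: str) -> bool:
    """Return True if the URL's path falls under one of the excluded trees."""
    path = urlparse(url).path
    return path.startswith(EXCLUDED_PREFIXES)


# Counts stated in the message.
TOTAL = 744
ARCHIVED_SNAPSHOT = 212
HISTORICAL = 323
DEFINITELY_NOT = 74
TO_INDEX = 79

after_archive = TOTAL - ARCHIVED_SNAPSHOT            # 532, as stated
after_historical = after_archive - HISTORICAL        # 209, as stated
after_definite = after_historical - DEFINITELY_NOT   # not stated in the message

# Assumption: whatever is neither excluded nor indexed belongs to the
# obsolete-but-historical category; this figure is derived, not reported.
implied_obsolete = after_definite - TO_INDEX

# Share of all records that sit in the archived snapshot.
archive_share = ARCHIVED_SNAPSHOT / TOTAL
```

For instance, is_excluded("http://dublincore.org/trustees/") is true, while
is_excluded("http://dublincore.org/about/") is false. A prefix test on the
URL path is the simplest rule that matches the "exclude the entire tree"
wording; the item-by-item judgments in the lists above would still need
hand-maintained files.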

I noted the following problems:

    http://dublincore.org/documents/dcmi-ieee-mou/ -- does not have a date!
    http://dublincore.org/documents/dcmi-structure/ -- is actively unhelpful
    http://dublincore.org/groups/admin/
    http://dublincore.org/groups/agents/ -- is this group still active and is Stu still the chair?
    http://dublincore.org/groups/biz/ -- is this still active?
    http://dublincore.org/groups/kernel/ -- is this still active?
    http://dublincore.org/resources/bibliography/ -- is this being maintained?
    http://dublincore.org/links/ -- why does Admin Core imply it is a "DCMI" document?
    http://dublincore.org/meetings/ -- is this being maintained?
    http://dublincore.org/news/communications/deliverables.shtml -- is this still being maintained?
    http://dublincore.org/news/newsletter/ -- looks like this is no longer being maintained?
    http://dublincore.org/news/projects.shtml -- looks like this is no longer being maintained
    http://dublincore.org/sitemap.shtml -- is this generated (and updated) automatically?
    http://dublincore.org/templates/rdf/example.shtml -- bad link!

Tom