Notes on FOAF and DC
This document addresses an action on DanBrickley and TomBaker to "investigate and assess feasibility of using FOAF for DC agent descriptions".
FOAF and the draft DC Agents requirements
Background
The Dublin Core community have periodically addressed the topic of Agent description for over a decade. While the original 15 DC terms were historically regarded as being for the description of (broadly) "document-like" objects, the need in the DC community to describe agents associated with those objects has been clear since the early days of the initiative. For example, at the Dublin Core Workshop in Helsinki, in October 1997, UKOLN/AHDS work on deploying DC was presented. Their report, "Discovering Online Resources. Unifying Resource Discovery Metadata for the Humanities: An Application Based Upon the Dublin Core", articulated some common expectations from the metadata community at the time. For example, properties such as "personalName", "corporateName", "affiliation", "email", and constructs for postal addresses, phone numbers, fax, etc. were required. It was generally recognised that these kinds of property were both useful and needed, yet not really properties of the core "document-like" object being described in some Dublin Core record.
This tension between application needs and practical scoping of the DC was central to DC discussions and decisions over the last decade. Dublin Core took as a scoping constraint the ideal that community extensions and refinements should in some sense be "optional" and "dumb-downable", so that all applications and users of DC would at least share some common base understanding of a DC-based record. If DC were to take a very liberal approach to extensibility, allowing community refinements/qualifiers for DC terms which really encoded properties of various related resources (buildings, telephones, people), the core value of DC could be endangered. Intuitively, the phone number of some contributor to some document is not always usefully considered to be a direct property of that document. To avoid "polluting" the values of the 15 core DC terms with such related-but-different metadata, the DC community articulated the "1 to 1" principle, which was an attempt to formalise this constraint to guide the development and community extension of Dublin Core.
At the same time that these discussions were taking place, W3C's RDF initiative was maturing. First publicly presented at the October 1997 DC Workshop in Helsinki, RDF provided a clean and formally-grounded model for describing arbitrary properties of arbitrary things, using a data model that made clear their inter-relationships and allowed for many potential ambiguities to be avoided.
From 2000, the FOAF project began experimenting with the use of RDF for describing people and agents in the Web. Unlike DC, organizationally FOAF was an informal collaboration amongst members of the RDF and Semantic Web developer community. While standards-based, it had a more experimental style and adopted the strategy of trying various terms in public to see which were adopted, and documenting them often only in retrospect. FOAF was designed as a pure RDF vocabulary, and as such was expected to be used alongside other RDF vocabularies such as DC. The vocabulary evolution model of FOAF was itself derived from the DC experience; in particular, the decision to version at a term level rather than at the namespace level came after observing the deployment issues around DC's move from a 1.0 to a 1.1 namespace. The initially ad-hoc approach to FOAF deployment has created some legacy around some parts of the FOAF design, in particular the properties for structured names.
As of May 2007, the FOAF specification defines 12 classes and 53 properties. Each of these is assigned a "term status" of "stable", "unstable" or "testing", corresponding to a set of (currently loosely defined) expectations about likelihood of further change. This term status vocabulary is also used at W3C by the SKOS project within the Semantic Web Deployment Working Group, and in future versions of FOAF is likely to be complemented by use of the OWL "Deprecated" construct, for terms (e.g. "geekcode") whose status in the spec is likely to be downplayed due to lack of adoption or frivolous nature. Discussions are in progress, facilitated by W3C's Semantic Web Coordination Group, regarding the problem of long-term persistence of the FOAF namespace, and in particular the xmlns.com domain name that it depends upon. Mechanisms being discussed include sharing of the DNS registrar password with a group of advisors, periodic "heartbeat" progress reporting to bodies such as W3C SWCG and DCMI, and offsite archival of relevant documentation.
The current version of the FOAF specification (in HTML and RDF) can be found at http://xmlns.com/foaf/spec/. At each revision, a dated snapshot is published and archived. Within each version of the specification, hypertext anchors are available for "permalinks" to each description of each FOAF term. This design allows for the vocabulary to evolve gradually, with a historical record of former documentation always being available. A full CVS version history of the files is also kept.
The remainder of this document assesses the utility of the FOAF specification with respect to the DCMI community's draft Agent requirements document.
1. Background/Discussion
There is some ambiguity with this issue. The principal question is whether we are trying to ‘describe’ agents or ‘identify’ them.
Dublin Core is focussed on discovery, yet discovery is a hard concept to use for vocabulary scoping, since any property of any related object can in principle aid discovery. In particular, as noted in the Agents requirements, the identification of agents through the provision of agent descriptions can support discovery. FOAF similarly combines a concern for agent description with agent discovery. In particular, the FOAF discussions and technical design have had a strong emphasis on the need for formally grounded, flexible and pluralistic approaches to agent identification. To accomplish this, FOAF makes use of the W3C OWL language, and defines certain FOAF properties as "inverse functional" properties. For example, the foaf:homepage, foaf:mbox and foaf:isPrimaryTopicOf properties are considered "inverse functional". Technically, this means that there can be at most one thing in the world that has any given value for one of these properties. Terms such as foaf:homepage, foaf:weblog and various instant messaging properties address the problem of identifying "modern" Web users; the foaf:mbox_sha1sum term is a quirky but widely used mechanism for indirectly identifying people in terms of a number derived unambiguously from the address of a mailbox they are the primary owner of. In addition, the foaf:primaryTopic property and its inverse, foaf:isPrimaryTopicOf, allow for the indirect identification of agents through describing them in terms such as "the person that is the primary topic of the document whose URI is http://en.wikipedia.org/wiki/Isambard_Kingdom_Brunel".
Using such techniques, FOAF provides an approach to identification-by-description that is (a) formally grounded, in terms of the semantics provided by OWL and RDFS; (b) extensible, in that the same techniques can be used with new terms as they emerge, whether in FOAF or from other namespaces; and (c) pluralistic, in that a typical FOAF description can use any combination of identifiers and reference-by-description techniques. The FOAF identification approach is designed to be consistent with Web Architecture and allows for (but does not require) the use of URI identifiers for people and other agents. The document "identifying things in FOAF" describes the approach taken in a little more detail.
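As a rough illustration of identification-by-description, the sketch below merges descriptions that share a value for an identifying property. It is a minimal sketch under stated assumptions, not FOAF tooling: descriptions are plain Python dicts with one value per property, the example.org addresses and the helper names (mbox_sha1sum, merge_by_ifp) are hypothetical, and foaf:mbox_sha1sum is computed as the SHA-1 hex digest of the full mailto: URI.

```python
import hashlib

# Properties treated as identifying in this sketch (an assumption made for
# illustration; real data should take this list from the vocabulary itself).
IFPS = ("foaf:mbox_sha1sum", "foaf:homepage", "foaf:isPrimaryTopicOf")


def mbox_sha1sum(mailbox_uri: str) -> str:
    """SHA-1 hex digest of a mailbox URI such as 'mailto:ikb@example.org'."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


def merge_by_ifp(descriptions):
    """Group the indexes of descriptions that share any identifying value."""
    parent = list(range(len(descriptions)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (property, value) -> index of the first description using it
    for i, desc in enumerate(descriptions):
        for prop in IFPS:
            if prop in desc:
                key = (prop, desc[prop])
                if key in seen:
                    parent[find(i)] = find(seen[key])
                else:
                    seen[key] = i

    groups = {}
    for i in range(len(descriptions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data: three partial descriptions of one person, plus another.
WIKI = "http://en.wikipedia.org/wiki/Isambard_Kingdom_Brunel"
descriptions = [
    {"foaf:name": "I. K. Brunel",
     "foaf:mbox_sha1sum": mbox_sha1sum("mailto:ikb@example.org")},
    {"foaf:name": "Isambard Kingdom Brunel",
     "foaf:mbox_sha1sum": mbox_sha1sum("mailto:ikb@example.org"),
     "foaf:isPrimaryTopicOf": WIKI},
    {"foaf:name": "Isambard Brunel", "foaf:isPrimaryTopicOf": WIKI},
    {"foaf:name": "John Smith", "foaf:homepage": "http://example.org/~jsmith/"},
]
merged = merge_by_ifp(descriptions)
```

Note that the first and third descriptions share no property value directly; they end up in the same group only because the second description links them, which is the pluralistic aspect of the approach.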
The DC Agent requirements document goes on to note that "Agent descriptions, therefore, serve two purposes: description and identification", and lists the following specific purposes of identifying agents:
- disambiguate different agents who have shared or similar attributes (such as name, etc);
- recognise when agents are the same, despite appearing to be different, for example different presentations of the same name, pseudonyms, etc.;
- contact the correct agent associated with a resource;
- and collocate all the works of any specific agent.
FOAF addresses these requirements, to the extent possible given some specific dataset. Basic identity reasoning can be conducted purely by following the semantics of the OWL constructs used (inverse functional etc). Richer (and less formally guaranteed) disambiguation strategies can also be used. Two FOAF descriptions, for example, might each mention a person called "John Smith" who was born on the same day, and who works for the same corporation. FOAF allows this commonality to be expressed, yet doesn't offer any formal guarantee that the descriptions are in fact describing the same person. This is perhaps likely, even probable, yet not implied by the meaning of the terms used in the description. W3C's RDF querying language, SPARQL, can be used to express matches such as these, for example finding entries in a database of people descriptions where properties such as name, birthday and workplace match.
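The weaker, heuristic kind of matching can be sketched in a few lines. The example below is plain Python rather than SPARQL, and rests on assumptions: records are dicts keyed by a hypothetical record id, and the properties compared (foaf:name, foaf:birthday, foaf:workplaceHomepage) are one plausible choice for "name, birthday and workplace". The result is a list of candidates only; nothing here implies the records describe the same person.

```python
from collections import defaultdict

# Properties compared in this sketch (an illustrative choice, not a rule).
MATCH_KEYS = ("foaf:name", "foaf:birthday", "foaf:workplaceHomepage")


def candidate_matches(people):
    """Return groups of record ids whose values agree on every MATCH_KEYS property."""
    buckets = defaultdict(list)
    for record_id, props in people.items():
        # Records missing any of the compared properties are left out.
        if all(key in props for key in MATCH_KEYS):
            buckets[tuple(props[key] for key in MATCH_KEYS)].append(record_id)
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]


# Hypothetical records.
people = {
    "a": {"foaf:name": "John Smith", "foaf:birthday": "03-14",
          "foaf:workplaceHomepage": "http://example.com/"},
    "b": {"foaf:name": "John Smith", "foaf:birthday": "03-14",
          "foaf:workplaceHomepage": "http://example.com/"},
    "c": {"foaf:name": "John Smith", "foaf:birthday": "07-01",
          "foaf:workplaceHomepage": "http://example.com/"},
    "d": {"foaf:name": "John Smith"},
}
candidates = candidate_matches(people)
```

Unlike a match on an inverse functional property, a candidate pair found this way should be treated as a hint for further checking rather than as an identity statement.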
The Agent requirements draft argues:
So the resource description/discovery community needs an agent core because the DC element set does not allow a sufficiently precise description of an agent to support the above functions.
A FOAF perspective here might be slightly different; a little more "meta". Rather than relying only on a core of properties for describing and identifying people, we also need some higher-level strategies, such as the use of OWL's "inverse functional" mechanism, to allow additional properties from other parties to be acknowledged as uniquely identifying. Having said that, a lot can be done with the basic properties defined in FOAF; in particular, foaf:primaryTopic and its inverse foaf:isPrimaryTopicOf can link a person to the ID of a document that is known to identify them.
2. Scope
This document aims to set out the requirements and the metadata elements needed for unambiguously describing OR identifying the agents associated with resources. Agent descriptions may be contained within DC metadata records, or linked to the DC metadata records for particular resources as an associated metadata description. It is not within the scope of this document to consider the issue of where agent descriptions should be located. The functional requirements set out in this document will form the basis for development of a core set of metadata elements for describing agents.
These constraints are consistent with the FOAF design. As an RDF vocabulary, FOAF descriptions can be mixed, partitioned and inter-linked quite freely.
For the purposes of this document agents are defined as persons (author, publisher, sculptor, editor, director, etc.) or groups (organization, corporation, library, orchestra, country, federation, etc.) that have a role in the lifecycle of a resource.
FOAF defines a term, foaf:Agent, as well as a short, non-exhaustive list of sub-classes of Agent. These are:
* foaf:Person
* foaf:Organization
* foaf:Group
FOAF does not currently define detailed terms such as "sculptor". Instead, the expectation is that lists such as the MARC relator terms would be used.
FOAF does define one specific relationship in this area: foaf:maker (and an inverse, foaf:made). The foaf:maker property relates something to a foaf:Agent that foaf:made it. The FOAF specification currently recommends that dc:creator be used only for simple string values. This recommendation should be updated as the DCAM and RDF encoding are finalised. There is an entry in the FOAF wiki on the motivation for defining foaf:maker; briefly, it was created to ensure a simple, regular construct that did not have as many deployment variations as dc:creator, to lower the burden on applications that encounter the property.
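The relationship between foaf:maker and foaf:made can be illustrated with a small sketch. It assumes statements are held as (subject, property, object) tuples, and the ex: identifiers and the with_inverses helper are hypothetical. The dc:creator statement carries a simple string value, in line with the recommendation mentioned above, and is left untouched.

```python
# Inverse pairs known to this sketch.
INVERSES = {"foaf:maker": "foaf:made", "foaf:made": "foaf:maker"}


def with_inverses(triples):
    """Return the input statements plus the inverse of each maker/made statement."""
    result = set(triples)
    for subject, prop, obj in triples:
        if prop in INVERSES:
            result.add((obj, INVERSES[prop], subject))
    return result


# Hypothetical data: a report made by an agent, with a plain-string creator.
triples = {
    ("ex:report", "foaf:maker", "ex:alice"),
    ("ex:report", "dc:creator", "Alice Example"),
}
closed = with_inverses(triples)
```

An application that only understands one direction of the relationship can run a step like this once and then query in whichever direction is convenient.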
We also point out the constraints of the various data protection acts which ensure that there is only a limited amount of data that can legally be recorded about persons. So dates and location may be problematic for living people unless their explicit permission to include such data is obtained.
As an RDF vocabulary (rather than e.g. an XML format), FOAF does not make mandatory the inclusion of any particular information. It defines the meaning of terms, rather than the required content of documents. Consequently it can be used differently in different institutional or legal settings.
3. Entities
We define two classes of agents in this document: 1. Person: an individual human being, living or dead; and 2. Group: a set, either existing or defunct, of individual entities acting collectively.
These correspond well to foaf:Person and foaf:Group. In FOAF, a Group is a group of Agents rather than necessarily of Persons. Furthermore, a Group is itself an Agent, and can therefore be used (where appropriate) within FOAF descriptions wherever an Agent is expected. FOAF provides some technical machinery (again based on W3C OWL) for characterising the membership criteria for a Group based on the properties of its members (defined using RDF terms). This aspect of FOAF is likely to evolve to make better use of new technology under development at W3C (e.g. RIF rules, SPARQL queries, OWL 1.1).
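The point that a Group is itself an Agent, and may contain other Agents including Groups, can be sketched as follows. This is an illustration under assumptions: the class hierarchy is hard-coded as a dict, each resource has a single type, and the ex: names and helper functions are hypothetical.

```python
# Sub-class relationships among the FOAF agent classes.
SUBCLASS_OF = {
    "foaf:Person": "foaf:Agent",
    "foaf:Organization": "foaf:Agent",
    "foaf:Group": "foaf:Agent",
}


def is_a(cls, ancestor):
    """True if cls is ancestor or a (transitive) sub-class of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def valid_group(members, types):
    """True if every member is typed as some kind of foaf:Agent."""
    return all(is_a(types.get(member), "foaf:Agent") for member in members)


# Hypothetical typing information.
types = {
    "ex:alice": "foaf:Person",
    "ex:strings": "foaf:Group",
    "ex:cello": "ex:Instrument",
}
ok = valid_group(["ex:alice", "ex:strings"], types)   # a Person and a nested Group
bad = valid_group(["ex:alice", "ex:cello"], types)    # an instrument is not an Agent
```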
4. Attributes
Each class of entity has associated with it a set of attributes or characteristics that serve to identify that entity unambiguously from all other entities of either class.
FOAF, as an RDF vocabulary, has the notion of "property" at its heart. In RDF, properties are defined in terms of the classes they make sense to be used with, rather than a class defining in any exhaustive or centralised way the list of properties/attributes it expects. In practical terms, we can read "attribute" in the Agent requirements as "property" in the FOAF/RDF sense with no loss of meaning.
4.1 Attributes of a Person
This document defines the attributes of a person as the following:
- identifier
- name
- dates
- title
- affiliation
- location
- email
- other information
4.1.1 Identifier
A scheme, numeric or alphabetic, or a combination of the two, used to identify unambiguously a specific individual agent. No such schemes yet exist. This element will allow for the use of such schemes when and if they are developed.
4.1.2 Name
The name or names by which the person is known, including alternative names.
4.1.3 Dates
May include date of the person's birth and/or death, or floruit dates (ie. an indication of the period in which the person was known to be active in a given field of endeavour).
4.1.4 Title
A word or phrase used to identify the rank, office, nobility, honour, etc. of the person.
4.1.5 Affiliation
The name of the organization, institution, company, or other body with which the person was or is associated, or by whom the person was employed or contracted.
4.1.6 Location
Information about the person’s principal area of residence over time. Context may be indicated by the use of appropriate qualifiers (for example: Lived in Canberra 1991-2005).
4.1.7 Email
Email address or addresses currently assigned to the person at the time of the description.
4.1.8 Other Information
Any additional significant information about the person that is needed to unambiguously identify that person.
4.2 Attributes of a Group
This document defines the attributes of a group as the following:
- legal number
- name
- jurisdiction
- location
- dates
- web site
- other information
4.2.1 Legal number
Any official number assigned by a public authority that is used to identify the group.
4.2.2 Name
Names by which the group is or was known. May include other forms of the name and changes of name over time.
4.2.3 Jurisdiction
The legal name of the judicial and administrative entity which has jurisdiction over the territory in which the group operates.
4.2.4 Location
The place from which the group operated.
4.2.5 Dates
Dates indicating the period the group operated. May include such things as date of founding and dissolution, date of legal mandate establishing the group, etc.
4.2.6 Web Site
The http address of the world wide web site operated by the group.
4.2.7 Other Information
Any additional significant information about the group that is needed to unambiguously identify that group.
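For readers who prefer to see the two attribute lists as record structures, the sketch below writes them as Python dataclasses. The class names, field types, and the choice of which fields are optional or repeatable are assumptions made for illustration; the requirements text only names the attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonDescription:
    """Attributes of a person, following sections 4.1.1 to 4.1.8."""
    names: List[str]                                        # 4.1.2 Name
    identifier: Optional[str] = None                        # 4.1.1 Identifier
    dates: Optional[str] = None                             # 4.1.3 Dates
    title: Optional[str] = None                             # 4.1.4 Title
    affiliations: List[str] = field(default_factory=list)   # 4.1.5 Affiliation
    locations: List[str] = field(default_factory=list)      # 4.1.6 Location
    emails: List[str] = field(default_factory=list)         # 4.1.7 Email
    other_information: Optional[str] = None                 # 4.1.8 Other Information


@dataclass
class GroupDescription:
    """Attributes of a group, following sections 4.2.1 to 4.2.7."""
    names: List[str]                                        # 4.2.2 Name
    legal_number: Optional[str] = None                      # 4.2.1 Legal number
    jurisdiction: Optional[str] = None                      # 4.2.3 Jurisdiction
    location: Optional[str] = None                          # 4.2.4 Location
    dates: Optional[str] = None                             # 4.2.5 Dates
    web_site: Optional[str] = None                          # 4.2.6 Web Site
    other_information: Optional[str] = None                 # 4.2.7 Other Information


# Hypothetical records.
person = PersonDescription(names=["John Smith"], locations=["Lived in Canberra 1991-2005"])
group = GroupDescription(names=["Example Orchestra"], web_site="http://example.org/")
```

In an RDF setting no such fixed record is needed, since each attribute would be a property that can be stated or omitted independently.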