----------------------------------------------------------------------
Date:     Mon, 22 Aug 2005 16:49:09 +1200
Reply-To: Douglas Campbell
Sender:   DCMI Date Working Group
From:     Douglas Campbell
Subject:  Re: Registering ISO 8601
To:       DC-DATE@JISCMAIL.AC.UK

Eric, et al.,

My apologies for only contributing to this list in fits-and-starts, but here's another "fit"... ;-)

As you probably know, I do not support the registration of the whole of ISO 8601 as an encoding scheme, so [luckily] I'm not about to relitigate the email sent to the UB. To add further weight to that email, I would point out that DCMI has a goal of "simplicity of creation and maintenance" (the only reference I could find to this was in section 1.2 of the Usage Guide [1]) - I think we need to find an easier-to-use solution than registering the whole of ISO 8601.

So, looking forward, it appears there are three main contenders:

1. ISO 8601 profile(s)
2. XML Schema dates
3. Um, keep looking...?

I'd like to try attacking the issue from a different angle: If we adopted 8601 profiles or XML Schema, which, if any, of our requirements could be solved?

Warning, long post ahead - I decided to set aside some time for analysis of ISO 8601 and XML Schema dates. I apologise that it's in an email (so harder to read) but I didn't realise I would end up doing all this when I started. But go ahead, jump straight down to the conclusions part at the end of this post now [I know you were going to anyway]... :-)

To revise our requirements from the workplan:

a) B.C.E. dates [DC-Lib; DC CLD WG]
b) Questionable dates [DC-Lib]
c) Approximate dates [DC-Lib]
d) Open-ended date ranges [DC CLD WG]
e) Non-Gregorian dates
f) Large dates (e.g., geologic periods, astronomical time)
g) Soft termini (i.e. the outer bounds for one or more termini is known or can be associated with a known period, but one or both of the exact boundaries of the event referenced are not known)
h) Elapsed time less than date range interval (i.e.
   the duration is less than the complete interval between two termini, as in an intermittent activity)

And other requirements I can remember:

i) short format without hyphens, eg. YYYYMMDD [DC-Lib]
j) filename-friendly eg. no slash [Douglas Campbell]
k) accuracy to fractions of a second [I just thought of it]
l) date ranges (normal, not open-ended) [DC-General?] - this isn't explicitly in our list but should be as it's one of the reasons the DC-Date WG restarted - I have always contended that these are valid with "W3CDTF" encoded dates anyway, but others disagree

I now have a copy of ISO 8601:2004. It would appear the following would be possible using ISO 8601:

1. Continuation of the W3CDTF format [obviously]

2. Compact format (requirement i) - using 8601's "basic" format

3. Date ranges (requirement l) - we could now explicitly cater for them

4. File-friendly (requirement j) - section 4.4.2 notes that "In certain application areas a double hyphen is used as a separator instead of a solidus [slash]" - I'm not sure if that means we can use it, but I quite like the idea of "2001-01-03--2005-06-10". Many people find the slash confusing and don't intuitively think it means a range. This reminds me of the Liberty Date Recommendation [2] which decided to replace the T with an underscore just because it was "found visually obscurring and distracting by many users"! eg "2000-02-14_18:42"

5. Large dates (requirement f) - by extending the number of digits allowed in the year component, including positive/negative - though this is only usable where dates can be converted to a specific year. Also, the number of digits in the year component needs to be predetermined so we may need a profile for 4 digits (normal), 6 digits, and 8 digits...?

6.
   BCE dates (requirement a) - by using the positive/negative component of point (5), though some care may be needed in encoding BCE dates (I seem to remember a discussion that 8601 "-0008" may not be 8 BCE?). It also makes a "+" mandatory for "ordinary" CE dates

7. Dates prior to 1583 - years 0000 to 1582 are only allowed to be used in 8601 with mutual agreement

8. Accuracy to fractions of a second (requirement k) - we would need to specify how many decimal places, and all places must be included when a fraction is used, eg. "...T03:59:59,030" (comma is 8601's preferred separator)

9. Some approximate dates (requirement c) - these are only possible to the extent an 8601 date can officially be truncated, eg "1999-06" means some time in June 1999, "1999" means sometime in 1999, "19" means sometime in the 1900s - it's not possible to use "195" meaning the 1950s, or to encode outside the scope of a single unit (century/year/month/day) in a single date expression eg. "sometime in 1920-1930"

10. Maybe open-ended dates (requirement d) - um, this would be seriously bending 8601, but what if we said a date plus a zero-length duration meant open-ended, eg. "1995/P0Y"???? This would clash with valid representations of "0 years from 1995", but a statement of zero years seems to mean very little to me

11. Other 8601 extensions deemed useful - eg. Week number (1995-W02-5), ordinal day in the year (1995-102), duration (1995/P2Y - a period of 2 years beginning from 1995), or abstract duration as suggested by Pete for dcterms:extent (P0Y2M - 2 months)

Of our main 8 requirements (a-h), 8601 profiles can really only satisfy 2 (or so). I should point out that I don't think that is a reason to abandon it, just that it can only be part of the solution.
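
As a rough illustration of point 9, the sketch below expands a reduced-accuracy date into the interval it implies. It is only a sketch under stated assumptions: the function name implied_interval is hypothetical, the input uses the hyphenated form without a sign prefix, and Python's date type cannot represent year 0000 or BCE years, so those raise an error here.

```python
import calendar
from datetime import date


def implied_interval(value):
    """Return (first_day, last_day) implied by a reduced-accuracy date.

    Accepts "CC" (century), "YYYY", "YYYY-MM" or "YYYY-MM-DD".
    Hypothetical helper for illustration; signs and years outside
    0001-9999 are not handled.
    """
    parts = value.split("-")
    if not all(p.isdigit() for p in parts):
        raise ValueError("unsupported date expression: %r" % value)
    head = parts[0]
    if len(parts) == 1 and len(head) == 2:
        # "19" is read as the century 1900-1999, as in the email.
        first_year = int(head) * 100
        return date(first_year, 1, 1), date(first_year + 99, 12, 31)
    if len(head) != 4 or len(parts) > 3:
        # A 3-digit "decade" such as "195" is not a permitted truncation.
        raise ValueError("unsupported date expression: %r" % value)
    year = int(head)
    if len(parts) == 1:
        return date(year, 1, 1), date(year, 12, 31)
    month = int(parts[1])
    if len(parts) == 2:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, int(parts[2]))
    return day, day
```

Anything wider than a single unit, such as "sometime in 1920-1930", has no single-expression form, which is why the function has no case for it.
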
8601 profiles can be developed fairly quickly (it is just configuring from existing specifications rather than dreaming up the specifications) - aside from the naming part ;-) I think we should make them (as others have noted) as we have been unsuccessful in finding existing ones. Who knows, they may become widely adopted since there doesn't seem to be much competition out there!

I'll move on to the XML Schema soon, but first, a cut at the new profiles...

NB: I'm assuming we'll incorporate as many "by mutual agreement" options as possible/necessary as we are defining a mutual agreement.

1. DCDATE (DCDTF?) - the basic date format most people will use, based on W3CDTF with some 8601 extensions.

   Format sample:
   +YYYY-MM-DDThh:mm:ss,sss+hh:mm--+YYYY-MM-DDThh:mm:ss,sss+hh:mm

   Notes
   * Allow years 0000 to 9999
   * Plus or minus prefix so can represent BCE dates (NB: the plus sign is mandatory for CE dates)
   * Can add fractions of a second (I chose 3 digits arbitrarily)
   * Can drop lower order components (as per W3CDTF) for reduced accuracy (eg. +1999-06), additionally can reduce year to "+YY" indicating century
   * "--" for ranges separator (instead of forward slash), though could be confusing (for human readers only) if the second date begins with a minus?

2. DCDATE-COMPACT - same as DCDATE with separators removed.

   Format sample:
   +YYYYMMDDThhmmsssss+hhmm--+YYYYMMDDThhmmsssss+hhmm

3. DCDATE-LARGE - same as DCDATE with extra year places (total 6 digits).

   Format sample:
   +YYYYYY-MM-DDThh:mm:ss,sss+hh:mm--+YYYYYY-MM-DDThh:mm:ss,sss+hh:mm

   Notes
   * I have arbitrarily chosen 6 digits which may cover a lot of common large dates?
   * We may need yet another profile called "very large" with, say, 10 digits???
   * Would we also need DCDATE-LARGE-COMPACT for 6 digits but without hyphens?
   * I left the full accuracy down to fractions of seconds so it is possible to encode all your dates within this single scheme

For each of these above we would need to consider if we wanted to add in alternatives like week number, ordinal day, or duration (including the zero duration as a hack for open-endedness).

We could also look at a profile for duration, but this is a lower priority.

These three or so profiles would give us a breather by solving some of the more basic date encoding problems. I realise it could result in a whole lot of date data being generated that could be inconsistent with our "final final solution", but we have no date in sight for that - in the meantime at least dates will be being created that comply with a major international Standard.

On the XML Schema front, I have to admit I haven't really used these so it's just from reading the specifications [3]...

xsd:duration
  * basically an ISO 8601 duration eg "P8Y" (8 years) or "PT3H" (3 hours)
  * appears to extend it by allowing negative duration

xsd:dateTime
  * says "inspired" by 8601
  * basically looks like an 8601 date (with hyphens and plus/minus prefix) but the number of year digits is completely variable (minimum of 4) and the + prefix for CE dates is not needed (actually it's prohibited)
  * all components must be present down to the second (fractions of seconds are optional)
  * the timezone (which is optional) must include the minutes

xsd:time
  * basically the part of an 8601 date after the "T", including optional fractions of seconds and optional timezone specifier

xsd:date
  * basically the part of an 8601 date before the "T" including optional minus prefix
  * adds an optional timezone specifier at the end eg "1991-12-05+02:00"

xsd:gYearMonth
  * basically an 8601 year and month with 4+ year digits and optional minus prefix
  * adds an optional timezone specifier at the end

xsd:gYear
  * basically an 8601 year with 4+ year digits and optional minus prefix
  * adds an optional timezone specifier at the
    end

xsd:gMonthDay
  * described as "a set of one-day long, annually periodic instances", eg. "3rd of May"
  * no minus prefix allowed as the missing year is represented using a hyphen, eg. "--MM-DD"
  * adds an optional timezone specifier at the end

xsd:gDay
  * described as "a set of one-day long, monthly periodic instances", eg. "the 15th of the month"
  * no minus prefix allowed as the missing year/month is represented using hyphens, eg. "---DD"
  * adds an optional timezone specifier at the end

xsd:gMonth
  * described as "a set of one-month long, yearly periodic instances", eg. Halloween is in "the month of October"
  * no minus prefix allowed as the missing year is represented using hyphens, eg. "--MM"
  * adds an optional timezone specifier at the end

So, XML Schema dates offer a similar array of options to ISO 8601 profiles.

The nice things they offer are
  * any number of digits for the year (4+)
  * use minus for BCE dates, but don't need plus sign for CE dates

The limitations are basically around the reduced accuracy options:
  * no 2 digit century option
  * offers year/month/day level accuracy options, but a date plus time must be complete down to the seconds level (ie. can't just go to the hour or minute)

So if we were to use them, I would anticipate we'd use:
  * xsd:dateTime (YYYY-MM-DDThh:mm:ss)
  * xsd:date (YYYY-MM-DD)
  * xsd:gYearMonth (YYYY-MM)
  * xsd:gYear (YYYY)

Maybe we'd use xsd:duration for dcterms:extent?

Since XML Schema dates are closely related to ISO 8601 dates, they only solve around the same number of our requirements:

a) BCE dates
c) Approximate dates (only to the extent of a single component (year/month/day) in a single date)
f) Large dates
k) accuracy to fractions of a second

However, they do not, by themselves, offer date ranges or the short format (no hyphens).
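
To make the four candidate XML Schema types more concrete, here is a minimal sketch that checks which of them a string's shape matches. The regular expressions follow the description above (4+ year digits, optional minus, no plus, seconds required, timezone with minutes) plus two details taken from the XML Schema datatypes specification itself: the fraction separator is a period, and "Z" is accepted as a timezone. The names PATTERNS and matching_types are hypothetical, and this checks shape only (month 13 would pass), so it is not a substitute for a real schema validator.

```python
import re

# Optional timezone: "Z" or a signed offset that must include minutes.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
# Year: optional minus, at least four digits, never a plus sign.
_YEAR = r"-?\d{4,}"

PATTERNS = {
    "xsd:dateTime": re.compile(
        _YEAR + r"-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ
    ),
    "xsd:date": re.compile(_YEAR + r"-\d{2}-\d{2}" + _TZ),
    "xsd:gYearMonth": re.compile(_YEAR + r"-\d{2}" + _TZ),
    "xsd:gYear": re.compile(_YEAR + _TZ),
}


def matching_types(value):
    """Return the names of the candidate types whose lexical shape fits."""
    return [name for name, pattern in PATTERNS.items()
            if pattern.fullmatch(value)]
```

A value such as "2005-08-22T16:49" matches nothing, which shows the "must be complete down to the seconds" limitation noted above, and "+1999" is rejected because the plus prefix is prohibited.
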

CONCLUSIONS

ISO 8601 profiles:
  * need to be defined ...and named ;-)
  * probably a standard general-purpose profile plus specialist profiles for particular/advanced usages [samples defined above]
  * catering for BCE and large dates requires placeholders/symbols to be added to "ordinary" dates (eg. leading zeros and plus sign prefix)
  * offers (closed) date ranges
  * offers reduced accuracy levels from seconds up to centuries
  * offers a short format option (eg. YYYYMMDD)
  * may be able to encode open-ended dates by bending it a little (eg. YYYY-MM-DD/P0Y)
  * may be able to offer file-friendly format using double-hyphens instead of the slash in ranges

XML Schema dates:
  * are already defined so could be deployed quickly
  * offer an elegant method for BCE and large dates (just add digits or add a minus prefix as necessary)
  * do not offer date ranges
  * do not offer century, hour, or minute levels of reduced accuracy
  * do not offer a short format (eg. YYYYMMDD)
  * personally I'm not sure about the appropriateness of using a format designed for a particular markup language over an international Standard

Both:
  * can only solve 2-3 of our 8 main requirements

Next steps:
  * Are the suggested ISO 8601 profiles practical? usable?
  * Start multiple threads about naming these profiles ;-]
  * Given the low number of requirements 8601/XML Schema solve, is either really worth pursuing?
  * Should we consider following the XML Schema and Pete's "Open Date Range Format" [4] route of using ISO 8601 only as an "inspiration"?? I guess we could make atomic (single) date expressions ISO 8601 compliant but build our own structure (syntax) around them for ranges, approximations, questionability, etc.??? This may be able to account for most of the requirements apart from non-Gregorian dates (which I think is worth parking to one side for now).

My thoughts:
  * I quite like the idea of specifying ISO 8601 profiles - they can pick up where W3CDTF left off (this route has a precedent) and may well be picked up quite widely as there seems to be a lack of competition!
  * But then I also like the idea of a new schema using ISO 8601 atomic dates (best of both worlds) - but this will take a lot longer as it will require (generate) much more debate
  * We need to keep the DCMI goal of "simplicity of creation and maintenance" front of mind

Douglas Campbell
National Library of New Zealand

[1] http://dublincore.org/documents/usageguide/#whatis
[2] http://www.lyberty.com/meta/date_format.html
[3] http://www.w3.org/TR/xmlschema-2/
[4] http://www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf/

----------------------------------------------------------------------
Date:     Thu, 25 Aug 2005 13:13:03 +1200
Reply-To: Douglas Campbell
Sender:   DCMI Date Working Group
From:     Douglas Campbell
Subject:  Re: Registering ISO 8601
To:       DC-DATE@JISCMAIL.AC.UK

Misha,

As Kelly noted, comma and period are valid, with a preference for comma. They don't state the reason for that preference, but I assume it is because that is more common globally (as is the 24 hour clock apparently)??

ISO 8601:2004 has a number of places where the number of digits is variable (indicated by underlining the character/position that can have zero or more digits) eg. years or decimal fractions. It's tricky to interpret whether the number of digits must be always present or is just a maximum...

Section 3.5 - Expansion
"By mutual agreement of the partners in information interchange, it is permitted to expand the component identifying the calendar year, which is otherwise limited to four digits. This enables reference to dates and times in calendar years outside the range supported by complete representations, i.e. before the start of the year [0000] or after the end of the year [9999]."

Section 3.6 - Leading zeros
"If a time element in a defined representation has a defined length, then leading zeros shall be used as required."

Section 3.7 - Mutual agreement
"Some of the representations identified in this International Standard are only allowed by mutual agreement of the partners in information interchange. Such agreement should ensure that fields in which the representation may occur are not allowed to hold other representations that cannot be unambiguously distinguished from the agreed representation."

Section 5 - Date and time format representations
"Underlining of characters in a date and time format representation, to represent zero or more of the underlined characters in the derived date and time representation (in accordance with 3.4.2), is only permitted if, at the time of interchange of the date and time format representation, the number of characters in the derived date and time representation is not known."

Section 4.2.2.4 - Representations with decimal fraction
"If the magnitude of the number is less than unity, the decimal sign shall be preceded by two zeros in accordance with 3.6."
"The interchange parties, dependent upon the application, shall agree the number of digits in the decimal fraction. The format shall be [hhmmss,ss], [hhmm,mm] or [hh,hh] as appropriate (hour minute second, hour minute, and hour, respectively), with as many digits as necessary following the decimal sign. A decimal fraction shall have at least one digit."
[NB: in this quote the last s, m, or h in square brackets is underlined.]

Unfortunately the example only includes a single decimal place. So I can't work out whether the size can vary dynamically or leading (or trailing) zeros must be used. I think there is an ISO 8601 discussion list - maybe we should ask the question there?

I went looking for the XML Schema 1.1 draft but I couldn't find it - just a list of requirements from 2003 and a public discussion listserv?

My comments about XML vs ISO probably stem from my [failed] attempt to use XLink within RDF. I had been trying to define rich links within dc:relation in RDF/XML using XLink, but it was pointed out to me that XLink is defined relative to the XML syntax not (necessarily) at a higher/semantic level (eg. RDF). Since then I've been more careful about considering what context a particular thing is relevant to, especially as Dublin Core metadata can exist in many different contexts/syntaxes other than XML.

Thanx,
Douglas

>>> Misha Wolf 22/08/05 21:19:05 >>>

Hi Douglas,

Thanks for a very thorough analysis. A few quick notes ...

When we wrote W3CDTF in 1997, we included decimal fractions of a second, using the period (".") as the decimal separator. I'm sure that was allowed at the time by ISO 8601. From what you write, they seem to have changed their minds since then. Are you quite sure about that?

We also allowed an arbitrary number of digits after the decimal separator. Why are you proposing that we choose some fixed number of digits? Ditto for years.

Have you, by any chance, looked at the XML Schema 1.1 draft? I haven't studied it in any detail, but I have noted that quite a lot of work has gone into the date/time sections.

While I wouldn't suggest that we use XML Schema 1.0, due mainly to the lack of a compact syntax (requested by the Library folks), my eyebrows went up when reading your:

> personally I'm not sure about the appropriateness of using a
> format designed for a particular markup language over an
> international Standard

First of all, it is unusual to call XML "a particular markup language", as it is a foundation for the creation of markup languages. AFAIK, it is only the 2nd widely adopted foundation markup language. Furthermore, I think that XML has been adopted as an ISO standard, though I don't know the number and searching the ISO site and Google for "XML ISO standard" generates too many hits.

Misha

----------------------------------------------------------------------
Date:     Thu, 25 Aug 2005 15:34:54 -0400
Reply-To: "Rebecca S. Guenther"
Sender:   DCMI Date Working Group
From:     "Rebecca S. Guenther"
Subject:  Re: Registering ISO 8601
To:       DC-DATE@JISCMAIL.AC.UK

Just a reality check here. I think we said we would separate the 2 issues:

1. Use of the ISO 8601 basic notation (Douglas calls it "short form")
2. Various expressions of dates that are not covered by ISO 8601

This all started with the DC Library Application Profile (yes, that was the requirement), because there are numerous places in library metadata that use the ISO 8601 basic format in millions of records and we wanted a way to say that the encoding scheme used was that alternative specified in ISO 8601 rather than the more eye-readable one with hyphens (which is specified in the profile called W3CDTF).

My proposal to the Usage Board at the time asked for just the notation called ISO 8601 basic and left open the question of other kinds of dates that were needed. I proposed to call it "iso8601" and be specific in the definition that all that is meant is that the encoding uses the alternative also known as ISO 8601 basic. That maybe wasn't a good idea because that is when we got into all the issues of all the other things implied by saying ISO 8601. So maybe at the time I should have called it something else and maybe there would have been some mechanism to specify that my encoding scheme is that YYYYMMDD format. But that was like 3-4 years ago and we're still discussing it.

So leaving aside all the other kinds of dates we want to be able to express, can we please just think of a new name for what I mean as the first step? Or shall I pursue this some other way and not through this DC-Date group? I know I'm not the only one who needs this.

----------------------------------------------------------------------
Date:     Thu, 25 Aug 2005 21:41:10 +0100
Reply-To: Misha Wolf
Sender:   DCMI Date Working Group
From:     Misha Wolf
Subject:  Re: Registering ISO 8601
To:       DC-DATE@JISCMAIL.AC.UK

Hi Rebecca,

First of all, it is good that we did not define some subset of ISO 8601 and call it "iso8601", as this would be extremely misleading, given the widespread use, by DC communities, of quite a different subset of ISO 8601.

Thinking of a name is not the only issue, though. As has been highlighted by the recent correspondence, there are lots of other issues, eg:

- may durations be used?
- which decimal separator is used?
- is a timezone compulsory?
- in those places where the standard allows a variable number of digits, how many digits may be used?

So, I suggest there are (all together) these tasks:

1. Within the ISO 8601 space:
   - disregarding the compact/verbose form, we should decide the answers to the various questions
   - we should then spin off two profiles of ISO 8601 which differ *only* in whether they use the compact or the verbose form

   I say this because it doesn't make sense to, eg:
   - use a period as a separator in the verbose scheme, but a comma in the compact scheme
   - allow 4-digit years in the verbose scheme, but 17-digit years in the compact scheme

2. Outside of the ISO 8601 space, we should solve the remaining problems.

Misha
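
The idea of two profiles that differ only in compact versus verbose form can be sketched as a mechanical conversion: if every other decision (year digits, decimal separator, timezone) is shared, the compact form is just the verbose form with its hyphens and colons removed. The function name to_compact is hypothetical, and the assumed verbose shape (4-digit year, optional time with seconds, optional fraction and timezone) is an illustration rather than an agreed profile.

```python
import re

# Assumed verbose shape: YYYY-MM-DD, optionally followed by
# Thh:mm:ss, an optional fraction, and an optional timezone.
_VERBOSE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2}):(\d{2})([.,]\d+)?(Z|[+-]\d{2}:\d{2})?)?"
)


def to_compact(verbose):
    """Convert a verbose (hyphenated) date or date-time to compact form."""
    match = _VERBOSE.fullmatch(verbose)
    if match is None:
        raise ValueError("not in the assumed verbose form: %r" % verbose)
    year, month, day, hour, minute, second, fraction, zone = match.groups()
    result = year + month + day
    if hour is not None:
        # The decimal separator is copied unchanged, so both forms
        # always agree on period versus comma.
        result += "T" + hour + minute + second + (fraction or "")
        if zone is not None:
            # Keep the sign of the offset, drop only the colon.
            result += zone.replace(":", "")
    return result
```

Because the conversion never touches the digits or the decimal sign, a mismatch such as a period in one scheme and a comma in the other cannot arise.
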