Tools for Complex and Ambiguous Dates at SFO Museum

This is a blog post by aaron cope that was published on January 14, 2021. It was tagged golang, tools and edtf.

timetable: Lyon - Saint Exupéry Airport. Paper, ink. Gift of the William Hough Collection, SFO Museum Collection. 2009.108.042

Today, we are pleased to announce the release of four tools for working with Extended DateTime Format (EDTF) dates. Three of the four tools are general-purpose utilities that build on one another. The fourth is an SFO Museum-specific application of these tools and is being shared in the spirit of generosity and as a reference implementation for how these tools might be adapted for other use-cases.

EDTF is a standard for representing complex, and sometimes ambiguous, dates. It was developed under the umbrella of the Library of Congress and describes itself this way:

EDTF defines features to be supported in a date/time string, features considered useful for a wide variety of applications

Date and time formats are specified in ISO 8601, the International Standard for the representation of dates and times. ISO 8601-2004 provided basic date and time formats; these were not sufficiently expressive to support various semantic qualifiers and concepts than many applications find useful. For example, although it could express the concept “the year 1984”, it could not express “approximately the year 1984”, or “we think the year is 1984 but we’re not certain”. These as well as various other concepts had therefore often been represented using ad hoc conventions; EDTF provides a standard syntax for their representation.

Further, 8601 is a complex specification describing a large number of date/time formats, in many cases providing multiple options for a given format. Thus a second aim of EDTF is to restrict the supported formats to a smaller set.

EDTF functionality has now been integrated into ISO 8601-1:2019 and ISO 8601-2:2019, the latest revisions of ISO 8601, published in March 2019.

go-edtf

A pioneer in avionics, United was the first transportation company to install a two-way, coast-to-coast teletype service. Set up in 1938, it linked stations all along the Main Line from New York to San Francisco. A radio relay system was also used to track airplane movement. As an airplane passed radio receivers located along each route segment, a signal was automatically transmitted to the ground station’s flight dispatch clock. The dispatcher would then move a pin with the corresponding flight number one position closer to the center of the clock board to plot the flight’s progress. See "Flying the Main Line: A History of United Air Lines", on display, post-security, in Terminal 3. - Instagram, September 19, 2017

The first tool we’ve released is a Go language package for validating and parsing EDTF date strings. It implements all three “compliance” levels defined in the 2019 EDTF specification and includes a comprehensive test suite.

It also defines a custom data structure called an EDTFDate to parse and expose the semantics encoded in a date string. An EDTFDate is composed of two “date spans” (a start and an end), each of which contains a “date range” with a lower and an upper date. For example, the EDTF string “1950-06-XX/2004-07-03”, which denotes a date “sometime between June, 1950 and July 3, 2004”, is represented with the following date ranges and date spans:

EDTF component  Span   Range  DateTime              Timestamp
1950-06-XX      Start  Lower  1950-06-01T00:00:00Z  -618105600
1950-06-XX      Start  Upper  1950-06-30T23:59:59Z  -615513601
2004-07-03      End    Lower  2004-07-03T00:00:00Z  1088812800
2004-07-03      End    Upper  2004-07-03T23:59:59Z  1088899199

While “1950-06-XX/2004-07-03” is a good and compact way to represent a complex date structure, there aren’t any databases (yet) that know how to parse EDTF strings or make them searchable. By decomposing an EDTF string into inner and outer ranges of dates and timestamps (specifically Unix timestamps) we are able to easily store and query these values in any database that can index numeric values.

Additionally, it is a model that allows people to query for records using EDTF strings as their input parameters. For example, if someone wanted to search for records during the “1950s” their query term would be the EDTF date string “195X”. The outer limits of that date are -631152000 and -315619201 (1950-01-01T00:00:00Z and 1959-12-31T23:59:59Z, respectively). Assuming a database that has been indexed with numeric columns for “start” and “end” dates, the query for records in “the 1950s” becomes a simple pair of lesser-than and greater-than comparisons that test whether a record’s date range overlaps the decade. A query like this would capture the record above whose date is encoded as “1950-06-XX/2004-07-03”, because that record starts before the 1950s end and ends after they begin:

SELECT * FROM records WHERE start <= -315619201 AND end >= -631152000

Which is a lot easier than teaching your database what “the 1950s” are.
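
As a rough sketch of the arithmetic involved, the following Go program computes the outer bounds of “195X” and compares them with the range of the “1950-06-XX/2004-07-03” record. The span type and the decadeBounds, contains and overlaps functions are hypothetical names made up for this example and are not part of go-edtf. Two comparisons are shown because they answer different questions: whether a record falls entirely inside the decade, or whether it merely overlaps it.

```go
package main

import (
	"fmt"
	"time"
)

// span is a hypothetical stand-in for a record's indexed "start" and "end"
// columns, both stored as Unix timestamps.
type span struct {
	Start int64
	End   int64
}

// decadeBounds returns the outer bounds of a decade such as "195X", given the
// first year of that decade. Hypothetical helper; not part of go-edtf.
func decadeBounds(firstYear int) span {
	lower := time.Date(firstYear, time.January, 1, 0, 0, 0, 0, time.UTC)
	// One second before the first second of the next decade.
	upper := lower.AddDate(10, 0, 0).Add(-time.Second)
	return span{Start: lower.Unix(), End: upper.Unix()}
}

// contains reports whether record falls entirely inside query.
func contains(query, record span) bool {
	return record.Start >= query.Start && record.End <= query.End
}

// overlaps reports whether record and query share at least one second.
func overlaps(query, record span) bool {
	return record.Start <= query.End && record.End >= query.Start
}

func main() {
	fifties := decadeBounds(1950) // "195X"

	// Outer bounds of "1950-06-XX/2004-07-03", taken from the table above.
	record := span{Start: -618105600, End: 1088899199}

	fmt.Println(fifties.Start, fifties.End)
	fmt.Println("contained:", contains(fifties, record))
	fmt.Println("overlaps: ", overlaps(fifties, record))
}
```

The record begins in June 1950 but runs until 2004, so it overlaps the 1950s without being contained by them. Which comparison is appropriate depends on what a search for “the 1950s” is meant to return.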

The EDTF specification does all the work of defining the rules and semantics for encoding complex and ambiguous dates into well-defined and structured strings, and the go-edtf package does the work of decomposing those strings into values and flags that can be manipulated by computers.

The EDTF specification has a number of rules for expressing uncertainty, precision and other properties describing a date. These details are out of scope for this blog post but all of these properties are surfaced by the code in the go-edtf package. How those properties are stored or used in a given setting are decisions left to the organization consuming EDTF date strings.

go-edtf-wasm

Token: Braniff International. Plastic, ink. Gift of the Captain John B. Russell Family, SFO Museum Collection. 2012.147.135

As the name suggests the go-edtf tools are written in the Go programming language. Not everyone wants to, or can, use Go for their programming needs so we have released a second tool to expose the core functionality of the go-edtf package as WebAssembly binaries.

WebAssembly (wasm) is a standard that defines the rules for exporting code written in one programming language so that it can be safely consumed and executed by another programming language. Not all programming languages support building or running wasm binaries yet but one that does is Go. Another is JavaScript, which can load and run wasm binaries in all the major web browsers.

The second tool we’ve released is called go-edtf-wasm and it is code for building wasm binaries that expose specific pieces of functionality from the go-edtf package. Today there is only one wasm binary. It exports a parse_edtf function for parsing an EDTF string into a JSON-encoded EDTFDate object, as described above. Here’s an abbreviated example of how we might load the parse.wasm binary and then use it later on in a web application to parse an EDTF date entered in a form:

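// Assumes Go's wasm_exec.js support script has already been loaded;
// it defines the Go class used here.
const go = new Go();

let mod, inst;

// Fetch and instantiate the binary, then start the Go program so that
// the parse_edtf function becomes available to JavaScript.
WebAssembly.instantiateStreaming(fetch("/wasm/parse.wasm"), go.importObject).then(
    async result => {
        mod = result.module;
        inst = result.instance;
        await go.run(inst);
    }
);

// Later on, once the binary has loaded, parse the value of a form field.
var edtf_el = document.getElementById("edtf");
var edtf_str = edtf_el.value;

var rsp = parse_edtf(edtf_str);
var edtf_date = JSON.parse(rsp);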

And just like that, all the code that we’ve written in Go is available to tools not written in Go. A tool like parse.wasm can be useful for doing client-side validation of EDTF dates before submitting them to a search endpoint or for converting those dates into their numeric (or datetime) equivalents and passing those to servers that aren’t able to run the native go-edtf code themselves.

We are just getting started investigating the different uses for WebAssembly binaries and are excited to see where they take us.

go-edtf-http

Pocket calendar: Qantas Airways, 1954. Paper, ink. Gift of Thomas G. Dragges, SFO Museum Collection. 2015.160.748

The third tool we are releasing is go-edtf-http, which provides convenience methods for creating Go-language HTTP “handlers” that expose functionality defined in go-edtf as machine-readable web services. For example, here is the simplest implementation of that, a minimal API-over-HTTP web server, followed by a few example requests:

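package main

import (
	"log"
	"net/http"

	"github.com/sfomuseum/go-edtf-http/api"
)

func main() {

	// Create one handler per service, stopping if any of them fails.
	api_parse_handler, err := api.ParseHandler()
	if err != nil {
		log.Fatal(err)
	}
	api_valid_handler, err := api.IsValidHandler()
	if err != nil {
		log.Fatal(err)
	}
	api_matches_handler, err := api.MatchesHandler()
	if err != nil {
		log.Fatal(err)
	}

	mux := http.NewServeMux()
	mux.Handle("/api/parse", api_parse_handler)
	mux.Handle("/api/valid", api_valid_handler)
	mux.Handle("/api/matches", api_matches_handler)

	// ListenAndServe only returns if the server fails.
	log.Fatal(http.ListenAndServe("localhost:8080", mux))
}

$> curl -s 'http://localhost:8080/api/valid?edtf=Jan,%203%201985'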
false

$> curl -s 'http://localhost:8080/api/valid?edtf=1985-01-03'
true

$> curl -s 'http://localhost:8080/api/matches?edtf=1985-01-03/1987'
{"level":0,"feature":"Time Interval"}
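
The same services can be called from code as well as from curl. What follows is a minimal sketch of a Go client for the /api/valid endpoint, assuming it answers with the plain-text body true or false as in the output above. The isValid and newStubServer names are hypothetical, and the stub server is only a stand-in for a running go-edtf-http server so that the example is self-contained; it does not validate EDTF.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// isValid asks the /api/valid endpoint at baseURL whether s is a valid EDTF
// string. Hypothetical client helper; not part of go-edtf-http.
func isValid(baseURL string, s string) (bool, error) {
	// url.Values takes care of escaping spaces, commas and slashes.
	q := url.Values{}
	q.Set("edtf", s)

	rsp, err := http.Get(baseURL + "/api/valid?" + q.Encode())
	if err != nil {
		return false, err
	}
	defer rsp.Body.Close()

	if rsp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", rsp.Status)
	}

	body, err := io.ReadAll(rsp.Body)
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(body)) == "true", nil
}

// newStubServer returns a test server that answers "true" for exactly one
// hard-coded string. It is a stand-in for demonstration purposes only.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("edtf") == "1985-01-03" {
			fmt.Fprint(w, "true")
			return
		}
		fmt.Fprint(w, "false")
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	for _, s := range []string{"Jan, 3 1985", "1985-01-03"} {
		ok, err := isValid(srv.URL, s)
		fmt.Println(s, ok, err)
	}
}
```

Against a real server the base URL would be something like http://localhost:8080, matching the address used in the example above.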

go-sfomuseum-edtf

Logbook: Stanley Henry Page. Paper, ink. Gift of Charles Page, SFO Museum Collection. 2010.174.023

The fourth and final tool we’ve released is specific to SFO Museum. It is provided “as-is” in a spirit of sharing and generosity and as an example of how all the other tools can be combined into an application tailor-fit for an organization (SFO Museum). Like the other tools, go-sfomuseum-edtf is written in the Go programming language and has two parts:

  • Code and tools for converting strings written using SFO Museum’s rules and conventions for encoding dates into EDTF strings and EDTFDate objects.
  • A simple web application that allows people to enter SFO Museum date strings and convert them into EDTF strings and EDTFDate objects. The application also exposes this functionality via machine-readable web services and uses the code in the go-edtf-http package for functionality that isn’t specific to SFO Museum.

SFO Museum has an existing practice, a muscle-memory, around how dates are represented in our collection and it is unrealistic, impractical and impolite to suddenly ask everyone to adopt EDTF dates. Overall, EDTF might be a better representation for dates and if everyone adopted it we, as a community, could make “short work” of many interoperability problems involving time, but it is unlikely that EDTF will be adopted immediately by most organizations.

Until then we need tools to meet the existing practice half-way and to be able to translate the dates already in use into their EDTF equivalents. That is what go-sfomuseum-edtf aims to do for SFO Museum. Here are some of the patterns that SFO Museum uses for dates and their EDTF representations:

SFO Museum date  EDTF date
04/1972          1972-04
c. 03/12/1984    1984-12-~03
early 1970s      1970-01/1970-04
c. early 1950s   1950-~01/1950-~04
Early 1960s      1960-01/1960-04
1930s            193X
c 1980s          ~198X-01-01/~198X-12-31
c. 2018- 2020    ~2018/~2020
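
To give a sense of what this kind of translation can look like, here is a minimal sketch that handles just two of the patterns above (“04/1972” and “1930s”) with regular expressions. It is not the go-sfomuseum-edtf implementation: the toEDTF function and both patterns are hypothetical, written only for this illustration, and every other pattern in the table is reported as unrecognized.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches a two-digit month and four-digit year, like "04/1972".
	reMonthYear = regexp.MustCompile(`^(0[1-9]|1[0-2])/(\d{4})$`)

	// Matches a decade, like "1930s".
	reDecade = regexp.MustCompile(`^(\d{3})0s$`)
)

// toEDTF converts a date string into an EDTF string for the two patterns
// above. The boolean is false when the input is not recognized.
// Hypothetical function for illustration; not part of go-sfomuseum-edtf.
func toEDTF(s string) (string, bool) {

	if m := reMonthYear.FindStringSubmatch(s); m != nil {
		// m[1] is the month, m[2] is the year: "04/1972" becomes "1972-04".
		return m[2] + "-" + m[1], true
	}

	if m := reDecade.FindStringSubmatch(s); m != nil {
		// m[1] holds the first three digits: "1930s" becomes "193X".
		return m[1] + "X", true
	}

	return "", false
}

func main() {
	for _, s := range []string{"04/1972", "1930s", "c. 2018- 2020"} {
		edtf, ok := toEDTF(s)
		fmt.Printf("%q -> %q (recognized: %t)\n", s, edtf, ok)
	}
}
```

Returning an explicit “not recognized” result, rather than guessing, makes it easier to spot date patterns that the code doesn’t handle yet.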

And here’s a screenshot of the web application we built to convert SFO Museum dates into EDTF dates:

The tool itself is available on the Mills Field website at: https://millsfield.sfomuseum.org/edtf/

This tool exists 1) to demonstrate how we can use EDTF dates without asking staff to enter EDTF dates or change their practice and 2) to show staff how to translate SFO Museum dates into EDTF dates when it’s necessary. I also anticipate it will be useful for testing new date strings we adopt or when we discover patterns that aren’t handled by the code yet.

Logbook: Stanley Henry Page. Paper, ink. Gift of Charles Page, SFO Museum Collection. 2010.174.018

The Extended DateTime Format may not be the “single biggest” improvement, or change, to the cultural heritage sector in recent memory but I believe that once more people realize what it is and what it does, and learn about the tools that exist for working with EDTF dates, it will be a big deal. It is a big deal because these tools represent a bridge between the necessary ambiguity of representing time and the practical mechanics of searching for things by time in a… well, timely manner.

The sfomuseum/go-edtf packages are not the first or only software implementations of EDTF but we hope they will be a valuable contribution and another step towards building a common toolbox of small focused tools by and for the cultural heritage community.