Using WebAssembly to parse EDTF date strings using a Go library in Python

This is a blog post by aaron cope that was published on February 16, 2023. It was tagged python, golang, edtf and webassembly.

Negative: San Francisco International Airport (SFO), road construction. Negative. Transfer from San Francisco International Airport, SFO Museum Collection. 2011.032.2564

This is a short technical blog post to note the release of the py-edtf Python library for parsing date strings formatted using the Library of Congress’ Extended Date/Time Format (EDTF), which was formalized as part of the ISO 8601 standard in 2019. What is notable about this release is that none of the code that actually parses EDTF strings runs as native Python code. Instead, the py-edtf package invokes code written in the Go programming language, which has been compiled into a WebAssembly binary and bundled with the rest of the package’s source code.
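Bundling a compiled binary with a Python package usually means shipping it as package data and reading its bytes at runtime before handing them to a WebAssembly runtime. The sketch below shows one way to do that with the standard library only. It is not taken from py-edtf: the helper names `read_bundled_file` and `looks_like_wasm` are hypothetical, and no particular WebAssembly runtime is assumed. The header check relies on the fact that every WebAssembly binary module begins with the bytes `\0asm` followed by the format version (1) as a little-endian 32-bit integer.

```python
from importlib import resources

# Every WebAssembly binary module starts with the magic bytes "\0asm"
# followed by the format version (1) encoded as a little-endian u32.
WASM_MAGIC = b"\x00asm"
WASM_VERSION_1 = b"\x01\x00\x00\x00"


def read_bundled_file(package: str, name: str) -> bytes:
    """Read a data file shipped inside an installed Python package (hypothetical helper)."""
    return resources.files(package).joinpath(name).read_bytes()


def looks_like_wasm(data: bytes) -> bool:
    """Cheap sanity check on the module preamble before loading it into a runtime."""
    return data[:4] == WASM_MAGIC and data[4:8] == WASM_VERSION_1
```

In a package that ships, say, a hypothetical `parser.wasm` next to its `__init__.py`, a call like `read_bundled_file("example_pkg", "parser.wasm")` would return the raw module bytes; loading and calling into them is left to whichever WebAssembly runtime the package depends on.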

We wrote about the underlying Go package for parsing EDTF strings in the Tools for Complex and Ambiguous Dates at SFO Museum blog post. We also wrote about invoking that Go code from JavaScript in the Reverse-Geocoding in Time at SFO Museum blog post. Now it’s possible to do the same from Python code. Under the hood there’s quite a lot going on, but using the Python library to parse EDTF strings is as easy as:

import edtf.parser

p = edtf.parser.Parser()
dt = p.parse("2004-06-XX/2004-07-03")
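To make the example string concrete: in EDTF, `XX` marks unspecified digits, so `2004-06-XX/2004-07-03` is an interval that starts on some unspecified day in June 2004 and ends on July 3, 2004. The sketch below handles only that one feature (an unspecified day) and computes the earliest and latest calendar days such an interval could cover. It is not the py-edtf or go-edtf implementation, and the function names `bounds` and `interval_bounds` are hypothetical.

```python
import calendar
from datetime import date


def bounds(edtf_date: str) -> tuple[date, date]:
    """Earliest and latest possible day for 'YYYY-MM-DD', where DD may be 'XX'.

    Covers only the unspecified-day case; this is not a full EDTF parser.
    """
    year_s, month_s, day_s = edtf_date.split("-")
    year, month = int(year_s), int(month_s)
    if day_s == "XX":
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, int(day_s))
    return exact, exact


def interval_bounds(edtf_interval: str) -> tuple[date, date]:
    """Widest span an 'start/end' interval could cover."""
    start, end = edtf_interval.split("/")
    return bounds(start)[0], bounds(end)[1]
```

For the string above, `interval_bounds("2004-06-XX/2004-07-03")` gives June 1, 2004 through July 3, 2004. A real EDTF parser must also handle uncertain and approximate dates (`?`, `~`), open or unknown interval endpoints and much more, which is part of the case for reusing one well-tested implementation across languages.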

We are releasing this package alongside the go-edtf and go-edtf-wasm packages (the latter is used to create the WebAssembly binary bundled with the Python code) in a spirit of generosity. We believe that widespread adoption of EDTF by the cultural heritage sector would benefit everyone, but we recognize that it is a complex specification to interpret in code and that not everyone is in a position to support code written in the Go programming language. The py-edtf package allows SFO Museum to share the code it has written in Go with others who may prefer to write their code in Python. Being able to share code across languages using the WebAssembly binary format matters because it embodies both the theory and the practice of “small, focused tools”, by and for the cultural heritage sector, that we’ve written about previously:

The cultural heritage sector needs as many small, focused tools as it can produce. It needs them in the long-term to finally reach the goal of a common infrastructure that can be employed sector-wide. It needs them in the short-term to develop the skill and the practice required to make those tools successful. We need to learn how to scope the purpose of and our expectations of any single tool so that we can be generous of, and learn from, the inevitable missteps and false starts that will occur along the way.

WebAssembly is a nascent technology, so it is still a bit rough around the edges, not uniformly supported across the wide array of programming languages in use, and definitely not yet suitable for certain kinds of complex applications. However, when it does work, as in the case of parsing and validating EDTF date strings, it points to a whole new set of exciting possibilities.

Negative: San Francisco International Airport (SFO), runway construction. Negative. Transfer from San Francisco International Airport, SFO Museum Collection. 2011.032.0520