The Accession Numbers Project

This is a blog post by aaron cope that was published on December 30, 2021. It was tagged golang, swift, ios and accessionnumbers.

Boarding pass: Air France, Concorde. Paper, ink. Gift of G. Robert “Bob” Hamrdla, SFO Museum Collection. 2011.066.057

The goal of the “Accession Numbers” project is to compile a catalog of machine-readable patterns for identifying and extracting accession numbers in arbitrary bodies of text for as many museums and cultural heritage organizations as possible. The project is a continuation of experimental work, started at Cooper Hewitt in 2014, that used optical character recognition (OCR) software to extract accession numbers from photographs of wall labels. Here’s a quote from a blog post describing that work:

Conveniently, accession numbers are so unlike any other element on a wall label as to be almost instantly recognizable. If we can piggy-back on Tesseract to do the hard work of converting pixels in to words then it’s pretty easy to write custom code to look at that text and extract things that look like accession numbers. And the thing about an accession number is that it’s the identifier for the thing a person is looking at in the museum.

Full disclosure: I wrote that blog post. Although all the work in that post was focused on wall labels at Cooper Hewitt, we imagined it was an approach that could be extended easily enough to other museums. Due to time and other circumstances it was an idea that germinated but then remained dormant until this year, when Apple announced the introduction of Live Text in iOS 15. What features like Live Text, and Google’s Lens product before it, do is take care of all the hard work of performing OCR on a photograph, even (and especially) when that photograph was taken at a funny angle or in poor lighting conditions.

The operating system can take a photograph as its input and return all the text it finds in that image. That is pretty exciting in and of itself, but it becomes especially interesting when we observe that:

…accession numbers are so unlike any other element on a wall label as to be almost instantly recognizable.

The reason this is interesting is that people take photographs of wall labels in museums, usually with their phones. If that phone’s operating system is capable of extracting text without the need for a person to install any additional software, then it opens up a whole world of avenues for taking those pictures of wall labels and connecting them to the online counterparts of the objects they describe.

Importantly, nearly all of these avenues are open for any museum to pursue, and there is value and benefit in developing tools and applications that can be adopted, and adapted, by as many organizations as possible. Even if SFO Museum chose to approach these opportunities selfishly, there is value in ensuring that anything we build for ourselves also works in other museums. Everyone who passes through SFO and SFO Museum is coming from or going to somewhere else. There are a lot of museums in all those somewhere elses, and it would be pretty great if anything we developed for reading wall labels in our galleries also worked in the galleries of the places our visitors are traveling to.

We’ve been investigating some of those avenues and they are discussed in detail below.

accession-numbers

Boarding pass: Royal Nepal Airlines. Paper, ink. Anonymous gift, SFO Museum Collection. 2010.013.002.

If we are trying to develop tools that are not specific to any one organization, the first piece of the puzzle is a catalog of machine-readable patterns for identifying and extracting accession numbers. That’s what the sfomuseum/accession-numbers repository is. It doesn’t provide any functionality itself but aims to be a common reference for accession numbers that any application can use in an automated fashion.

It is organized as a collection of JSON-encoded files whose names match the domain of the institution they describe. The simplest definition files contain organization_name and organization_url properties as well as one or more patterns (and tests to validate each pattern) stored in a patterns array. This approach tries to reflect the reality that no two organizations will have the same needs or concerns in how they choose accession numbers, and that trying to derive a universal pattern applicable to all wall labels is almost guaranteed to end in failure.

Here is an abbreviated version of the rijksmuseum.nl.json data file for the Rijksmuseum in Amsterdam:

{
    "organization_name": "Rijksmuseum",
    "organization_url": "https://www.rijksmuseum.nl",
    "object_url": "https://www.rijksmuseum.nl/en/collection/{accession_number}",
    "whosonfirst_id": -1,
    "patterns": [
        {
            "label": "common",
            "pattern": "((?:\\-?(?:[A-Z]{1,}|(?:\\d[A-Z]{1,}))\\-[0-9A-Z]+(?:(?:\\-[0-9A-Z]+){1,})?))",
            "tests": {
                "AK-BR-325": [
                    "AK-BR-325"
                ],
                "2RP-F-2001-7-1020-41": [
                    "2RP-F-2001-7-1020-41"
                ],
                "Title(s) Ovoid vase with a red glaze\\nObject type vase\\nObject number AK-BR-325\\nDescription Eivormige vaas van porselein met een dunne nek met trompetvormige mond, bedekt met een monochroom, licht gecraqueleerd rood (sang de boeuf) glazuur. Binnen de voetring deels geglazuurd; liprand is wit. Monochromen.": [
                    "AK-BR-325"
                ],
                "This is an object\\nGift of Important Donor\\nAK-NM-13198-C\\n\\nThis is another object\\nAnonymouts Gift\\nV-570\\n2RP-F-2001-7-1020-41\\nBronze": [
                    "AK-NM-13198-C",
                    "V-570",
                    "2RP-F-2001-7-1020-41"
                ]
            }
        }
    ]
}

As mentioned, the schema for definition files requires that all records contain organization_name, organization_url and patterns properties.
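To make the shape of a definition file a little more concrete, here is a minimal Go sketch that decodes one and checks it. It is an illustration, not the schema or the validation code used by the accession-numbers repository: the struct fields are inferred from the Rijksmuseum example above, and the hypothetical Validate method checks the three required properties and then runs each pattern against its tests. It assumes that a pattern’s first capture group, when there is one, holds the accession number, and that a test passes only when the extracted values match the expected list exactly and in order.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"slices"
)

// Pattern mirrors one entry in a definition file's "patterns" array.
type Pattern struct {
	Label   string              `json:"label"`
	Pattern string              `json:"pattern"`
	Tests   map[string][]string `json:"tests"`
}

// Definition mirrors the properties shown in the Rijksmuseum example.
type Definition struct {
	OrganizationName string    `json:"organization_name"`
	OrganizationURL  string    `json:"organization_url"`
	ObjectURL        string    `json:"object_url"`
	WhosOnFirstID    int64     `json:"whosonfirst_id"`
	Patterns         []Pattern `json:"patterns"`
}

// Validate (hypothetical) checks the required properties, then confirms that
// each pattern compiles and extracts exactly the expected values, in order,
// from every test input.
func (d *Definition) Validate() error {
	if d.OrganizationName == "" || d.OrganizationURL == "" || len(d.Patterns) == 0 {
		return errors.New("organization_name, organization_url and patterns are required")
	}
	for _, p := range d.Patterns {
		re, err := regexp.Compile(p.Pattern)
		if err != nil {
			return fmt.Errorf("pattern %q does not compile: %w", p.Label, err)
		}
		// Assumption: the first capture group, if any, holds the accession number.
		group := 0
		if re.NumSubexp() > 0 {
			group = 1
		}
		for input, expected := range p.Tests {
			var got []string
			for _, m := range re.FindAllStringSubmatch(input, -1) {
				got = append(got, m[group])
			}
			if !slices.Equal(got, expected) {
				return fmt.Errorf("pattern %q: input %q: got %v, want %v", p.Label, input, got, expected)
			}
		}
	}
	return nil
}

func main() {
	// An abbreviated Rijksmuseum definition; "\n" in the JSON decodes to a real newline.
	raw := `{
  "organization_name": "Rijksmuseum",
  "organization_url": "https://www.rijksmuseum.nl",
  "object_url": "https://www.rijksmuseum.nl/en/collection/{accession_number}",
  "patterns": [{
    "label": "common",
    "pattern": "((?:\\-?(?:[A-Z]{1,}|(?:\\d[A-Z]{1,}))\\-[0-9A-Z]+(?:(?:\\-[0-9A-Z]+){1,})?))",
    "tests": {
      "AK-BR-325": ["AK-BR-325"],
      "Gift of Important Donor\nAK-NM-13198-C\nV-570\n2RP-F-2001-7-1020-41": ["AK-NM-13198-C", "V-570", "2RP-F-2001-7-1020-41"]
    }
  }]
}`

	var def Definition
	if err := json.Unmarshal([]byte(raw), &def); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	if err := def.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("valid definition for", def.OrganizationName)
}
```

Running this prints “valid definition for Rijksmuseum”. Changing any expected value in the tests map would make Validate report which input failed.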

More complex definition files may include one or more properties containing URI Templates (RFC 6570) that can be used to derive URLs referencing detailed information about an object. These might be an object_url property for deriving the web page for an object on an institution’s website or oembed_profile or iiif_manifest properties for retrieving a machine-readable oEmbed profile or IIIF Manifest for that object.
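As a rough illustration of how an object_url template could be filled in, here is a Go sketch that handles only RFC 6570 Level 1 expressions such as {accession_number}: each variable is replaced by its value, with every byte outside the unreserved set percent-encoded. The Expand function is hypothetical and the example.org template is made up for the demonstration; a definition file that used the operators from higher template levels (for example {+var} or {?var}) would need a complete RFC 6570 implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// simpleExpr matches Level 1 expressions such as {accession_number}. This
// sketch does not handle operators (+, #, /, ?, ...) from higher levels.
var simpleExpr = regexp.MustCompile(`\{([A-Za-z0-9_]+)\}`)

// pctEncode percent-encodes every byte outside the unreserved set
// (ALPHA, DIGIT, "-", ".", "_", "~"), as simple string expansion requires.
func pctEncode(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z') || ('0' <= c && c <= '9') ||
			c == '-' || c == '.' || c == '_' || c == '~' {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "%%%02X", c)
		}
	}
	return b.String()
}

// Expand (hypothetical) fills in a Level 1 URI template. Undefined variables
// expand to the empty string, as RFC 6570 specifies.
func Expand(template string, vars map[string]string) string {
	return simpleExpr.ReplaceAllStringFunc(template, func(expr string) string {
		name := expr[1 : len(expr)-1] // strip the surrounding braces
		return pctEncode(vars[name])
	})
}

func main() {
	rijks := "https://www.rijksmuseum.nl/en/collection/{accession_number}"
	fmt.Println(Expand(rijks, map[string]string{"accession_number": "AK-BR-325"}))
	// prints https://www.rijksmuseum.nl/en/collection/AK-BR-325

	example := "https://example.org/objects/{accession_number}"
	fmt.Println(Expand(example, map[string]string{"accession_number": "2000.058.1185 a c"}))
	// prints https://example.org/objects/2000.058.1185%20a%20c
}
```

Encoding matters because accession numbers like “2000.058.1185 a c” contain spaces, which cannot appear unescaped in a URL.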

Definition files may also include an optional whosonfirst_id property pointing to the Who’s On First record, and geographical information, for that institution. These data could then be used to automatically determine which definition file to use when extracting accession numbers from a text based on a person’s location.

go-accession-numbers

Boarding pass and ticket jacket: TWA (Trans World Airlines). Paper, ink. Gift of Jan Boelke, SFO Museum Collection. 2002.029.013

This is a Go language package for parsing accession-number definition files and using them to extract identifiers from text. For example:

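package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/sfomuseum/go-accession-numbers"
)

func main() {

	// Open and decode a definition file from the accession-numbers repository.
	r, err := os.Open("fixtures/sfomuseum.json")

	if err != nil {
		log.Fatalf("Failed to open definition file, %v", err)
	}

	defer r.Close()

	var def *accessionnumbers.Definition

	dec := json.NewDecoder(r)
	err = dec.Decode(&def)

	if err != nil {
		log.Fatalf("Failed to decode definition file, %v", err)
	}

	texts := []string{
		"2000.058.1185 a c",
		"This is an object\nGift of Important Donor\n1994.18.175\n\nThis is another object\nAnonymouts Gift\n1994.18.165 1994.18.199a\n2000.058.1185 a c\nOil on canvas",
	}

	// Extract every accession number the definition's patterns find in each text.
	for _, t := range texts {

		matches, err := accessionnumbers.ExtractFromText(t, def)

		if err != nil {
			log.Fatalf("Failed to extract accession numbers, %v", err)
		}

		for _, m := range matches {
			fmt.Printf("%s (%s)\n", m.AccessionNumber, m.OrganizationURL)
		}
	}
}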

This would yield:

2000.058.1185 a c (https://sfomuseum.org/)
1994.18.175 (https://sfomuseum.org/)
1994.18.165 (https://sfomuseum.org/)
1994.18.199a (https://sfomuseum.org/)
2000.058.1185 a c (https://sfomuseum.org/)

This package is used by the twilio-handler application, described below, which makes accession number extraction available via SMS messaging.

swift-accession-numbers

Boarding pass: Pan American World Airways. Paper, ink. Gift of Barnaby Conrad III, SFO Museum Collection. 2001.038.150.

This is a Swift language package for parsing accession-number definition files and using them to extract identifiers from text. For example:

	import Foundation
	import AccessionNumbers

	let url = URL(fileURLWithPath: "/usr/local/data/sfomuseum.org.json")
	let text = "This is an object\nGift of Important Donor\n1994.18.175\n\nThis is another object\nAnonymouts Gift\n1994.18.165 1994.18.199a\n2000.058.1185 a c\nOil on canvas"

	do {
		// Load and decode a definition file from the accession-numbers repository
		let data = try Data(contentsOf: url)
		let def = try JSONDecoder().decode(Definition.self, from: data)

		// One or more candidate definitions to test the text against
		let candidates = [def]

		let rsp = ExtractFromText(text: text, definitions: candidates)

		switch rsp {
		case .failure(let error):
			print("Failed to extract accession numbers: \(error)")
		case .success(let matches):
			for m in matches {
				print("\(m.accession_number) (\(m.organization))")
			}
		}
	} catch {
		print("Failed to load definition file: \(error)")
	}

This package is used by the ios-label-whisperer application, described below, which makes accession number extraction available as an iOS application.

twilio-handler

Boarding pass: Korean Air Lines. Paper, ink. Gift of the Frank J. Lichtanski Collection, SFO Museum Collection. 2011.061.0324.

The twilio-handler application provides an HTTP server that listens for, and responds to, SMS messages containing text with accession numbers to extract, sent to phone numbers operated by Twilio’s phone-number-as-a-service platform. When you create a phone number with Twilio, people can send that number an SMS message; Twilio forwards the message to an HTTP endpoint (in this case the twilio-handler server) and then sends that server’s response back as an SMS reply.

This server can be run locally or using the AWS Lambda + API Gateway pattern. For example, this is how it would work locally, sending it a Twilio-style message:

$> ./bin/twilio-handler -definition-uri 'file:///usr/local/sfomuseum/accession-numbers/data/sfomuseum.org.json'
2021/12/20 12:01:11 Listening on http://localhost:8080

$> curl -X POST -H 'Content-type: application/x-www-form-urlencoded' -d 'Body=Hello world 1994.18.175' http://localhost:8080
The following objects were found in that text:
https://collection.sfomuseum.org/objects/1994.18.175

Here’s what that looks like with a real phone number when combined with the iOS Live Text functionality integrated into the messaging application:

When you scan a wall label with your camera the operating system extracts the text from that image and inserts it as the body of your message.

When you send the message to the phone number operated by Twilio, it relays the message to an instance of the twilio-handler server, which extracts all the accession numbers it finds and then returns a list of object URLs (using the object_url URI template described above). That response is then sent back to your phone. Because the message contains URLs, the operating system tries to “unfurl” or “expand” each URL into a short preview.

The twilio-handler application is designed to work with any definition file in the accession-numbers repository so long as it contains an object_url URI template.

ios-label-whisperer

Boarding pass: Air China. Paper, ink. Gift of the William Hough Collection, SFO Museum Collection. 2009.108.164

The Label Whisperer is an experimental iOS prototype application that allows a person to scan wall labels from any organization cataloged in the accession-numbers repository. Scanned accession numbers can be “collected” to an on-device database and, depending on the functionality described in that organization’s definition file, used to load object web pages.

So far, all of the work on Label Whisperer has focused on the underlying “plumbing” rather than design or user experience, so it is still very rough around the edges. As with most iOS applications there are a lot of details to account for, and this remains a “margins of the day” project, so things don’t always happen quickly. The goal is not specifically to build an iOS application that SFO Museum publishes (but maybe!) so much as to understand what is possible and, importantly, to prove that the definition files in the accession-numbers repository can be used to build an iOS application. Here’s a short video showing everything I’ve just described.

In an ideal world, the Label Whisperer application is something that a group of museums and cultural heritage organizations could collaborate on and maintain, making it available to all of our visitors as a common platform for collecting objects across all of our collections. If that’s interesting to you please get in touch!

“Fake Accession Number APIs”

Boarding pass: Pan American World Airways. Paper, ink. Gift of David A. Abercrombie in memory of Stanley A. Abercrombie, SFO Museum Collection. 2001.039.433

One of the things this work has revealed is that lots of museums don’t allow individual object web pages to be accessed using an accession number. You can usually specify an accession number as part of a general search, and the query will return the object in question, but it’s not possible to link directly to that object using its accession number. Many cultural heritage organizations publish their collections as open data, and those data usually contain both an object’s accession number and the ID associated with that object’s web page. With that in mind, I’ve been working on a side-project called “Fake Accession Number APIs” to create tools that use those open data resources to index, query and resolve accession numbers to their online counterparts.

For example, here’s how you would index the open data published by the National Gallery of Art (NGA):

$> bin/import \
	-database-uri 'sql://sqlite3?dsn=accessionumbers.db' \
	-source-uri nga:// \
	/usr/local/data/nga/opendata/data/objects.csv

# Time passes...

$> sqlite3 accessionumbers.db 
SQLite version 3.36.0 2021-06-18 18:58:49
Enter ".help" for usage hints.
sqlite> SELECT COUNT(object_id) FROM accession_numbers;
136612

This is how you would look up a given accession number from the NGA collection using the index you’ve just created:

$> bin/lookup \
	-database-uri 'sql://sqlite3?dsn=accessionumbers.db' \
	-source-uri nga:// \
	1994.59.10
	
89682

There is also a tool that provides an HTTP server (run locally or using the AWS Lambda + API Gateway pattern) which can resolve and redirect an accession number to its online resource.

$> ./bin/server -database-uri 'sql://sqlite3?dsn=accessionumbers.db'
2021/11/30 12:20:36 Listening for requests on http://localhost:8080

$> curl -s -I 'http://localhost:8080/redirect/?source-uri=nga://&accession-number=1994.59.10'

HTTP/1.1 302 Found
Content-Type: text/html; charset=utf-8
Location: https://www.nga.gov/collection/art-object-page.89682.html
Date: Mon, 20 Dec 2021 06:56:23 GMT

The fake-accession-number-apis project shouldn’t need to exist. However, until the day it is no longer necessary, it can serve as a low-cost and low-maintenance bridge that enables a collection’s online resources to work with tools derived from the accession-numbers project. Those tools can’t know about online resource identifiers because those identifiers aren’t printed on wall labels.

Photograph: San Francisco International Airport (SFO), United Airlines. Photograph. SFO Museum Collection. 2011.068.121

The ability for our visitors to connect their in-gallery experience of our collection objects with their online and digital counterparts using simple, readily available and familiar tools like their camera phone or their messaging application feels like an important step forward for the sector. All of these efforts are works in progress and we are sharing them now in a spirit of generosity. We welcome your comments and suggestions for how to improve things. If your museum or institution isn’t already cataloged in the accession-numbers repository we would love for you to tell us about it.

A special thanks to Bruce Wyman, who has so far done the lion’s share of the work to compile patterns and tests for the catalog of museums and cultural heritage institutions in the accession-numbers repository.