Airplanes and Walruses – The permanent collection of the SFO Aviation Museum and Library
A healthy slice of the permanent collection of the SFO Aviation Museum and Library is now available for browsing on the Mills Field website. This includes a little more than 23,000 object records of which 18,000 have images. This is still only a small part of the museum’s total holdings and we hope to get all, or most, of the remaining objects online shortly. Like all the images on the Mills Field website, every image from the permanent collection is “zoomable”.
If you’ve been following along with this weblog, you may have noticed that until now most of the work on the Mills Field website has focused on almost everything except the collection itself. We’ve done a lot of work around modeling SFO and its architecture over time, modeling airports, airlines and aircraft, thinking about maps and about flight data and building out our image processing pipeline. We’ve spent a lot of time on the subject of place, generally.
The reason is that these are the things that every object in the collection intersects with. They are the supporting cast to the objects themselves, the boundaries that give the collection its shape. They are the things that people may already know and recognize, which become entry points in to the collection for those who aren’t familiar with it or are unsure how to start exploring it. Each is an avenue to objects in the collection and, in turn, every object becomes a jumping off point to other parts of the collection.
Did you know that the SFO Aviation Museum and Library, shown below, is a reproduction of the airport’s 1930s passenger lobby, shown above?
We did not necessarily need to focus on the things that orbit the collection before the collection itself, but in doing so we have ensured that they don’t get forgotten along the way. Focusing on these things ensures that the collection has a nest of relations and connections in to which it can land (airport puns, notwithstanding). It has also allowed us to work through the underlying thinking and the infrastructure details governing the Mills Field website, testing those ideas and assumptions with a number of small, independent, but very much related, pieces, adjusting to circumstances and gotchas along the way.
Some of this work was hinted at in last year’s blog post Surface Areas – Photos and Depictions on the Mills Field Website. In that blog post, I wrote:
Depictions are the things in the photographs themselves. For example, this photo of the Pacific Coast League: The West Coast’s Major League 1903-1957 exhibition is also a picture of the gallery, the terminal, the building and the airport itself where the exhibition was mounted.
That is the next step for the collection: adding depictions, and relationships, between each object and all the other things in the collection it holds hands with. For example, we’ve already updated the pointers for objects in the collection associated with airlines like Pacific Southwest Airlines:
And Air Canada:
And aircraft, like the Airbus A320 (and Airbus the company):
And of course, airports like SFO:
We’ve also made these associations for JFK the airport, the Boeing 747 and 707 aircraft and by extension Boeing the company, Qantas Airways, the Super Bay Hangar at SFO, the Sikorsky-61 helicopter which was used by San Francisco Helicopter Airlines to shuttle passengers from SFO to the heliport in downtown San Francisco as well as San Francisco itself.
The goal is to associate unambiguous identifiers with objects in the collection to make it faster to find and group specific items (all the objects involving a particular aircraft) and to make it easier to perform complex queries across facets (all the objects involving that aircraft but belonging to a specific airline and of a certain medium).
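As a rough illustration of what those identifiers make possible, here is a minimal sketch of grouping and faceted filtering. It is not the Mills Field implementation: the `ObjectRecord` shape, the `query` helper and the two aircraft IDs are hypothetical placeholders. Only the airline ID and the object IDs are taken from the matcher output quoted later in this post.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectRecord:
    """Hypothetical record shape: one object plus the IDs of things it depicts."""
    object_id: int
    medium: str
    airlines: set = field(default_factory=set)
    aircraft: set = field(default_factory=set)

PSA = 1159284973        # airline ID quoted in this post
L1011 = 9001            # placeholder aircraft ID, not a real identifier
DC9_SUPER_80 = 9002     # placeholder aircraft ID, not a real identifier

records = [
    ObjectRecord(1511924223, "photograph", {PSA}, {L1011}),
    ObjectRecord(1511926231, "slide", {PSA}, {DC9_SUPER_80}),
    ObjectRecord(1511923085, "timetable", {PSA}),
]

def query(records, airline=None, aircraft=None, medium=None):
    """Return the IDs of objects that satisfy every facet that was supplied."""
    hits = []
    for record in records:
        if airline is not None and airline not in record.airlines:
            continue
        if aircraft is not None and aircraft not in record.aircraft:
            continue
        if medium is not None and record.medium != medium:
            continue
        hits.append(record.object_id)
    return hits

# Everything involving one airline, then narrowed by aircraft and medium.
print(query(records, airline=PSA))
print(query(records, airline=PSA, aircraft=L1011, medium="photograph"))
```

Because every facet is an identifier rather than a string, narrowing a query is just a matter of adding another membership test.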
We want to automate this process as much as possible so we’re doing this slowly, to begin, in order to determine how best to automate the process with the fewest robot-generated mistakes.
It’s important to remember that museums, and museum cataloging, have been around for a lot longer than computers and databases. Some of these problems may sound straightforward, or even uninteresting, to people who live and breathe modern computing systems. The reality for museums is that the metadata I’ve been talking about has never been recorded in a structured model outside of narrative text. It was written down and structured in a way that allowed museums to operate in a time before computers.
SFO Museum began life in 1980 so it has always been closer to the world of computers and databases than some other museums. It enjoys a mix of unambiguous pointers and structured data, as well as narrative language, to describe an object’s properties. SFO Museum is fortunate because it established strict conventions and good practices for consistently identifying things in those texts. There are inevitable variations, and the occasional unfortunate spelling mistake, but for many things, we can simply map a known string to a stable identifier and look for that string in an object’s title or description.
For example, mapping "Pacific Southwest Airlines (PSA)" to airline ID 1159284973:
> python ./depicts.py
MATCH 1511923085 timetable: Pacific Southwest Airlines (PSA)
MATCH 1511924223 photograph: San Francisco International Airport (SFO), Pacific Southwest Airlines (PSA), Lockheed L-1011 TriStar
MATCH 1511928103 timetable: Pacific Southwest Airlines (PSA)
MATCH 1511928105 timetable: Pacific Southwest Airlines (PSA)
MATCH 1511924849 children's in-flight activity kit: Pacific Southwest Airlines (PSA), PSA in Flight Fun + Pack
MATCH 1511924237 photograph: San Francisco International Airport (SFO), airfield
MATCH 1511929463 ticket jacket: Pacific Southwest Airlines (PSA)
MATCH 1511929415 baggage destination tag: Pacific Southwest Airlines (PSA)
MATCH 1511926231 slide: Pacific Southwest Airlines (PSA), Douglas DC-9 Super 80, San Francisco International Airport (SFO)
...
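The source of depicts.py isn’t shown here, so what follows is only a sketch of the general idea: keep a table of known labels and their stable IDs, and report every label that appears verbatim in an object’s text. The `KNOWN_LABELS` table and the `find_depictions` helper are hypothetical names, and the last sample record is made up to show a non-match.

```python
# Hypothetical lookup table: known label -> stable identifier.
# Only the PSA entry comes from this post.
KNOWN_LABELS = {
    "Pacific Southwest Airlines (PSA)": 1159284973,
}

def find_depictions(text, labels=KNOWN_LABELS):
    """Return the sorted IDs of every known label found verbatim in text."""
    return sorted(id_ for label, id_ in labels.items() if label in text)

# (object ID, title) pairs. The first two are copied from the output above;
# the third is a made-up record that should not match.
objects = [
    (1511923085, "timetable: Pacific Southwest Airlines (PSA)"),
    (1511929463, "ticket jacket: Pacific Southwest Airlines (PSA)"),
    (0, "photograph: Airbus A320"),
]

for object_id, title in objects:
    ids = find_depictions(title)
    if ids:
        print("MATCH", object_id, title, ids)
```

A plain substring test like this is exactly as strict as the cataloging conventions it relies on: a spelling variation in the label simply produces no match.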
This is not a fool-proof solution. We try to match airports in titles and descriptions by searching for their three-letter IATA codes wrapped in parentheses. Unfortunately we’ve discovered that sometimes those same three letters might also be an airline’s three-letter ICAO designator.
For example, does (SAL) represent El Salvador International Airport (SAL) or South African Airways (SAL)? Stable and unambiguous identifiers, right?
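One way to surface this kind of collision, rather than silently picking a winner, is to look each parenthesized code up in both tables and flag any code that appears in more than one. This is a sketch under assumptions: the two small tables below are illustrative and only mirror the example in this post, and `classify_codes` and `ambiguous_codes` are hypothetical helpers.

```python
import re

# Three uppercase letters wrapped in parentheses, e.g. "(SFO)".
CODE_IN_PARENS = re.compile(r"\(([A-Z]{3})\)")

# Illustrative tables only, following the example in this post.
AIRPORT_CODES = {
    "SFO": "San Francisco International Airport",
    "SAL": "El Salvador International Airport",
}
AIRLINE_CODES = {
    "SAL": "South African Airways",
}

def classify_codes(text):
    """Map each parenthesized code in text to its (kind, name) candidates."""
    results = {}
    for code in CODE_IN_PARENS.findall(text):
        candidates = []
        if code in AIRPORT_CODES:
            candidates.append(("airport", AIRPORT_CODES[code]))
        if code in AIRLINE_CODES:
            candidates.append(("airline", AIRLINE_CODES[code]))
        results[code] = candidates
    return results

def ambiguous_codes(text):
    """Return the codes in text that could refer to more than one thing."""
    found = classify_codes(text)
    return sorted(code for code, candidates in found.items() if len(candidates) > 1)

print(ambiguous_codes("photograph: (SFO), (SAL)"))
```

Codes flagged this way can be set aside for a person to review instead of being assigned automatically.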
Does the phrase "Black and white photographic negative depicting ground-level airside view of San Francisco International Airport (SFO) Heliport" really depict “San Francisco International Airport (SFO)”?
As you can see in the images above, we found out the hard way! Both problems have since been fixed but this is why we want to be able to assign stable and permanent identifiers for the relationships between things in the collection. We want to have an unambiguous way to refer to things that are independent of any particular textual representation.
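For what it’s worth, one possible guard against the heliport case is to refuse a label match when it is immediately followed by a word that changes its meaning. This is not necessarily how the problem was fixed on the Mills Field website; it is a sketch using a regular expression negative lookahead, and `build_matcher` and its `excluded_suffixes` parameter are hypothetical.

```python
import re

def build_matcher(label, excluded_suffixes=()):
    """Compile a pattern matching label unless one of the suffixes follows it."""
    pattern = re.escape(label)
    if excluded_suffixes:
        alternatives = "|".join(re.escape(s) for s in excluded_suffixes)
        # Negative lookahead: fail if whitespace and an excluded word come next.
        pattern += rf"(?!\s+(?:{alternatives})\b)"
    return re.compile(pattern)

sfo = build_matcher("San Francisco International Airport (SFO)", ["Heliport"])

heliport_text = (
    "Black and white photographic negative depicting ground-level airside "
    "view of San Francisco International Airport (SFO) Heliport"
)
airfield_text = "photograph: San Francisco International Airport (SFO), airfield"

print(bool(sfo.search(heliport_text)))   # the heliport is not the airport
print(bool(sfo.search(airfield_text)))   # a plain mention still matches
```

The catch is that every exclusion has to be discovered and listed by hand, which is another argument for recording the relationship once as an identifier.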
It is tempting to think we can achieve better results using natural language processing (NLP), machine-learning or some combination of both. It’s unclear to me, though, whether we could realistically do so in any acceptable timeframe.
I did some quick, preliminary tests with a handful of NLP toolkits and at least one of them thought SFO is a person. The question of whether or not SFO is a person would definitely be an interesting dinner-time conversation but for now let’s agree that it’s not. Ultimately, all the tools displayed similar or equivalent quirks parsing the metadata in our collection, suggesting that I would spend as much, or more, time correcting for errors as I would trying to match fixed labels or patterns in text. Importantly, whatever the number of mistakes either approach yields, nothing about the NLP or machine-learning approaches suggests that they would result in more correct answers.
It may be that our work, today, is to bridge past museum practice with the present so that it might become the training set(s) to power the NLP and machine-learning software of the future. The work, today, is to build lots of tools that follow the maxim of “good enough is perfect” for teasing out the relationships between things in our collection and to develop the processes for quickly spotting and fixing errors when they occur.
Here’s an old screenshot of an earlier iteration of some of these tools to add depictions to Flickr photos. The next phase of the work around the collection will be to build new interfaces and improve existing ones for making these associations, targeted first at staff and gradually opening them up to friends of the Museum.
As with everything else, the metadata for the collection, the classifications used to sort and gather collection objects and the images of collection objects are published as open data through the sfomuseum-data GitHub organization:
There’s a lot more to write about the collection but we’ll save that for future blog posts. You can start exploring the collection by browsing the categories and subcategories it has been organized in to or by seeing where the random button takes you now that it includes objects from the collection.
In closing, I’ll leave you with my new favourite object from the collection: