Who's On First at SFO Museum

This is a blog post by aaron cope that was published on August 28, 2018. It was tagged architecture, opendata, whosonfirst, sfo and maps.

Today, we are happy to announce the first release of historical building footprints and interior spaces, including galleries and public art, at the San Francisco International Airport (SFO) as an openly licensed dataset.

The data spans the years 1954 through 2018 and is published under the Linux Foundation’s Community Data License Agreement - Permissive 1.0 (CDLA). In the coming weeks and months we will both improve and update the data, adding similar records for the years preceding 1954.

We are publishing this data in five groupings. Three of them contain spatial data and descriptive properties about architectural elements at the airport and their surroundings: architectural features, public art and something called “Who’s On First” (which is described below). The remaining two groupings are metadata: they describe the placetypes and properties (in the data) that are specific to SFO and SFO Museum.

These data are not an “application” but rather the building blocks with which many different applications might be built. Importantly, they are building blocks that allow a variety of applications to be built while still remaining interoperable, providing a common framework for everyone to talk about the airport, the museum and our collection in the same way.

Architecture

This data is published in raw form on GitHub: https://github.com/sfomuseum-data/sfomuseum-data-architecture. It is also available as a SQLite database from: https://millsfield.sfomuseum.org/distributions/sqlite/.

Architecture data consists of the following distinct placetypes: buildings, terminals, boarding areas, gates and galleries.

Buildings

As of this writing there is only “one” building included in the dataset: the SFO terminal complex. SFO has changed and evolved many times over the years, and the data contains an atomic record for each instance, with pointers to the “SFO” that it replaced or was replaced by.

The SFO of today won’t be the SFO of tomorrow and we believe that it’s important to be able to unambiguously refer to a particular incarnation of the airport at a moment in time with the confidence that the details won’t change. Trying to speak about SFO in the 1970s is tricky when your only referent is the SFO of 2018.

There are lots of other buildings at the airport. If you’ve visited SFO recently you may have noticed that there’s a new hotel under construction facing the A terminal, in almost the same spot where the old Hilton Hotel stood in the 1960s. We plan to add both of those hotels, and all the other buildings that have defined the airport complex over the years, but the terminal building itself seems like the best place to start.

You can see all the buildings at: https://millsfield.sfomuseum.org/architecture/buildings/

Terminals

In 1954 there was a single terminal at SFO. In 1963 there were two. By 1979 another terminal had been added, and in 2018 there are four terminals, every one of which has been altered or reconfigured in some way over the years. As with buildings, we’ve created stable and permanent identifiers for each terminal as it was built or significantly changed, with a pointer to the SFO (the building) it was part of.

You can see all the terminals at: https://millsfield.sfomuseum.org/architecture/terminals/

Boarding Areas

Boarding areas are to terminals as terminals are to buildings. They have expanded and contracted and even swapped names with one another since 1954. For a short time there were two “Boarding Area A”s, one domestic and one international. As with the terminals, each boarding area has a pointer to its relative hierarchy (terminal, building) at that point in history.

You can see all the boarding areas at: https://millsfield.sfomuseum.org/architecture/boardingareas/
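To make the “pointer” idea concrete, here is a minimal Python sketch that walks up from a boarding area to its terminal and building. It assumes each record carries a `wof:parent_id` property naming its immediate container (the Who’s On First parent pointer, where `-1` is used for an unknown parent); the IDs and names below are hypothetical placeholders, not real SFO Museum records.

```python
# Hypothetical records keyed by made-up IDs. wof:parent_id is the WOF parent
# pointer; sfomuseum:placetype values follow the examples in this post.
RECORDS = {
    1: {"wof:name": "SFO (example)", "sfomuseum:placetype": "building", "wof:parent_id": -1},
    2: {"wof:name": "Terminal (example)", "sfomuseum:placetype": "terminal", "wof:parent_id": 1},
    3: {"wof:name": "Boarding Area (example)", "sfomuseum:placetype": "boardingarea", "wof:parent_id": 2},
}


def ancestors(record_id: int, records: dict) -> list[str]:
    """Return the placetypes of a record's containers, nearest first."""
    chain = []
    seen = {record_id}
    parent = records[record_id].get("wof:parent_id", -1)
    # Stop at an unknown parent (-1), a record we don't have, or a cycle.
    while parent in records and parent not in seen:
        seen.add(parent)
        chain.append(records[parent]["sfomuseum:placetype"])
        parent = records[parent].get("wof:parent_id", -1)
    return chain


print(ancestors(3, RECORDS))  # ['terminal', 'building']
```

Because every historical instance gets its own record, the same walk answers “which terminal and which SFO was this boarding area part of at the time?” without any date arithmetic.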

Galleries

Early work plotting SFO Museum galleries on our historic map imagery. Not seen is the recently opened video arts gallery but it's definitely there!

The gallery spaces are where the airport and the museum meet!

In 2018 there are 20 active gallery spaces at the airport but since the museum first opened in 1980 there have been over 50 such spaces.

We’ve documented all of them and associated each one with its relevant boarding area, terminal and building. Contemporary gallery spaces include precise location information while older galleries include approximate data, usually the centroid of their containing boarding area or terminal.
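The “centroid of the containing boarding area or terminal” fallback is easy to sketch. The function below computes the area-weighted centroid of a simple (non-self-intersecting) polygon with the shoelace formula, treating longitude and latitude as planar coordinates, which is reasonable at the scale of a single building. It is an illustration under those assumptions, not necessarily the method SFO Museum used to derive its gallery points.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) pairs; the first point may be repeated at the
    end, as in a GeoJSON linear ring. Coordinates are treated as planar.
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Cx = sum / (6 * A) and A = area2 / 2, so divide by 3 * area2.
    return (cx / (3 * area2), cy / (3 * area2))
```

For a closed GeoJSON ring such as `[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]` this returns `(1.0, 1.0)`; the result works the same for clockwise and counter-clockwise rings because the sign of the area cancels out.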

If you’re starting to see a pattern in the relationship between the different placetypes you might be able to guess what comes after, or is “contained” by, the galleries. We’re saving that for a future release but it’s definitely on the way…

You can see all the galleries at: https://millsfield.sfomuseum.org/architecture/galleries/

Gates

Only 2018-era gates (and relationships) are available in this release. As time and circumstances permit we will add historical gate information.

Additionally, the location information for these gates in this data release reflects the Federal Aviation Administration’s (FAA) idea of where a gate is, which is not where most people think a gate is located. The FAA considers the location of a gate to be the point where a jetway meets the door of an airplane. Passengers tend to think that a gate is located at the other end of the jetway, where it meets the airport itself.

Don’t worry, though: The airport (SFO) knows where all of these “different” gates are located and that data will be included in a future update to the data that SFO Museum publishes.

When you think about it, jetways are their own unique kind of architectural space at an airport. It seems a little silly to try to model them along the same lines as terminals or boarding areas, if only because they move around so much. That said, once we have stable coordinate data for both ends of a jetway there will be more room to imagine, or visualize, these otherwise liminal spaces.

You can see all the gates at: https://millsfield.sfomuseum.org/architecture/gates/

Public Art

All of the SFO Museum public art GeoJSON data, converted into an ESRI Shapefile using the go-whosonfirst-shapefile tool, converted back into GeoJSON using the ogr2ogr tool and then displayed on the geojson.io website... phew!

This data is published in raw form on GitHub: https://github.com/sfomuseum-data/sfomuseum-data-publicart. It will be available as a SQLite database shortly.

SFO Museum has over 100 works of public art, located throughout the airport, purchased through the San Francisco Arts Commission (SFAC).

The public art dataset is a lens onto the world of open data publishing. The data was sourced from records published in the Who’s On First California venues dataset, which were themselves imported into Who’s On First (WOF) from the SF Civic Art Collection data file on the DataSF open data portal. SFO Museum has taken the WOF data and updated it with accurate location data and museum- and airport-specific metadata. We will contribute our changes back to the WOF venue dataset, because we want these works to be shared far and wide, but we plan for our public art dataset to be the “source of truth” from now on.

You might be reading this and thinking “What is this thing called Who’s On First that keeps being mentioned?” All your questions are answered in the following sections.

You can see all of the public art at: https://millsfield.sfomuseum.org/publicart/

The rest of the world (aka “Who’s On First”)

This data is published in raw form on GitHub: https://github.com/sfomuseum-data/sfomuseum-data-whosonfirst. It is also available as a SQLite database from: https://millsfield.sfomuseum.org/distributions/sqlite/.

All of SFO Museum’s datasets hold hands with the Who’s On First gazetteer.

If you are not familiar with the term “gazetteer” it’s basically just a giant phonebook of places, rather than people. Instead of creating a museum-specific record for the city of San Francisco or the state of California and so on, we’ve chosen to use the Who’s On First records for those places.

The entire Who’s On First dataset contains 26 million records, of which over 700,000 are administrative places. The sfomuseum-data-whosonfirst dataset is a repackaging of only those records that the SFO Museum datasets need to make sense. Currently that’s limited to a handful of places in San Francisco, California and the United States, but more places will be added as the overall work we’re doing progresses.

This dataset bundles the following placetypes: campuses, postal codes, localities, counties, regions and countries.

sfomuseum-properties and sfomuseum-placetypes

These data are published on GitHub at https://github.com/sfomuseum/sfomuseum-properties and https://github.com/sfomuseum/sfomuseum-placetypes respectively.

All of the datasets are described with a common set of metadata properties as well as any additional properties unique to the record in question or the source that published it.

These properties are denoted by using a {NAMESPACE}:{PREDICATE}={VALUE} syntax. For example: sfomuseum:placetype=terminal or sfomuseum:map_id=D12.
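In the GeoJSON itself the `{NAMESPACE}:{PREDICATE}` part is a property key and `{VALUE}` is its value, but the flattened string form is handy in documentation and queries. A minimal sketch of splitting it apart; the function name is ours and not part of any SFO Museum or WOF tool.

```python
def parse_property(s: str) -> tuple[str, str, str]:
    """Split a "{NAMESPACE}:{PREDICATE}={VALUE}" string into its three parts."""
    # Split on the first "=" only, so values may themselves contain "=".
    key, sep, value = s.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {s!r}")
    namespace, sep, predicate = key.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {s!r}")
    return namespace, predicate, value
```

For example, `parse_property("sfomuseum:map_id=D12")` gives `("sfomuseum", "map_id", "D12")`, and the namespace is what tells you which set of property descriptions to consult.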

The sfomuseum-properties data is where all the properties whose namespace is sfomuseum: are described as machine-readable documents that can be used to generate human-friendly documentation or to validate data using software.

The sfomuseum-placetypes data is where all the SFO Museum-designated placetypes (buildings, terminals, etc.) are described as machine-readable documents that can be used to generate human-friendly documentation or to validate data using software.

These data complement the whosonfirst-properties and whosonfirst-placetypes data respectively.
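As a rough illustration of the kind of validation these machine-readable descriptions make possible, here is a minimal sketch that checks a record’s `sfomuseum:placetype`. It does not read the actual sfomuseum-placetypes files (their format isn’t described here); instead it assumes a hard-coded set of names, of which only `terminal` and `boardingarea` appear in this post and the other spellings are guesses.

```python
# Hypothetical stand-in for the sfomuseum-placetypes data: "terminal" and
# "boardingarea" appear in this post, the other spellings are assumptions.
KNOWN_PLACETYPES = {"building", "terminal", "boardingarea", "gate", "gallery"}


def validate_placetype(properties: dict) -> list[str]:
    """Return a list of problems with a record's sfomuseum:placetype (empty if OK)."""
    errors = []
    placetype = properties.get("sfomuseum:placetype")
    if placetype is None:
        errors.append("missing sfomuseum:placetype")
    elif placetype not in KNOWN_PLACETYPES:
        errors.append(f"unknown sfomuseum:placetype {placetype!r}")
    return errors
```

Note that `"concourse"` would fail this check: it is a valid WOF placetype but not an SFO Museum one, which is exactly why the two namespaces are kept separate.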

“Who’s On First”

These SFO Museum metadata are relevant because if you look at the record for, say, Boarding Area B (2017) you’ll see that it is described as "a boarding area (sometimes called a concourse)".

Inspecting the raw data for that record reveals the following properties:

"properties": {
	"sfomuseum:placetype":"boardingarea",
	"wof:placetype":"concourse"
}	

So, which is it? The short answer is “both” which is an unsatisfying answer, not to mention a little confusing.

The longer answer is that every SFO Museum data record is also a Who’s On First (WOF) record, so in the eyes of WOF the boarding area is in fact a “concourse”.

The longest answer is that, more precisely, all the SFO Museum data is compatible and holds hands with WOF data but is not part of the core WOF datasets.

In order to play nicely with WOF, the record for Boarding Area B needs to assert that its WOF placetype is a “concourse”. In order for that same data to make sense in an SFO Museum context, it also states that its SFO Museum placetype is a “boarding area”.

I’ve mentioned Who’s On First (WOF) a couple of times already and said earlier that it is “a giant phonebook of places, rather than people”. An abbreviated version of WOF’s own “What is Who’s On First?” page states:

  • It is an openly licensed dataset. At its most restrictive, data is published under a Creative Commons By-Attribution (soon to be CDLA) license.
  • Every record in Who’s On First has a stable, permanent and unique numeric identifier. There are no semantics encoded in the IDs.
  • At rest, each record is stored as a plain-text GeoJSON file.
  • Files are stored in a nested hierarchy of directories derived from their IDs.
  • There is a common set of properties applied to all records, which may be supplemented by an arbitrary number of additional properties specific to that place.
  • There are a finite number of place types in Who’s On First and all records share a common set of ancestors. As with properties, any given record may have as complex a hierarchy as the circumstances demand but there is a shared baseline hierarchy across the entire dataset.
  • Individual records may have multiple geometries or multiple hierarchies and sometimes both.
  • Records may be updated or superseded, cessated or even deprecated. Once a record is created, though, it can never be removed or replaced.
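The “nested hierarchy of directories derived from their IDs” can be sketched in a few lines. As we understand the convention used by WOF tooling (for example go-whosonfirst-uri), the decimal ID is split into three-digit chunks, left to right, and each chunk becomes a directory; treat the exact layout as an assumption and check it against the repositories themselves.

```python
def id_to_relpath(wof_id: int) -> str:
    """Map a WOF ID to a nested relative path, e.g. 1234567 -> 123/456/7/1234567.geojson."""
    if wof_id < 0:
        # Negative values are used as sentinels (such as "unknown"), not as files.
        raise ValueError(f"not a record ID: {wof_id}")
    digits = str(wof_id)
    # Split the decimal ID into three-digit chunks, left to right.
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(chunks + [f"{digits}.geojson"])


print(id_to_relpath(1159396339))  # 115/939/633/9/1159396339.geojson
```

Because the path is derived from the ID alone, any tool can find a record’s file without consulting a separate index, and no directory ever grows to hold millions of files.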

Earlier this year, in a blog post about maps, we said:

Location and place, as you might imagine, are core to an airport. They are essential to any museum and museum collection, really, but for the sake of this blog post we’ll just say that location and place are core to our museum.

With that in mind we have chosen to use Who’s On First (WOF) to describe all the places, out there in the world, that intersect with the airport, the museum and our collection. We have also chosen to use WOF to model things inside the airport, the museum and our collection. The data model used by WOF, its flexibility in accommodating metadata unique to SFO Museum and the large body of tools for working with WOF data specifically and GeoJSON more generally make it an attractive choice.

Full disclosure: I have been, and continue to be, intimately involved with the Who’s On First project since its creation. In order to explain why I think WOF is relevant in a museum setting I need to take a brief detour and tell a short Cooper Hewitt story:

Way back in 2012, when the Cooper Hewitt collection website was being built, we took the limited geographic information (countries) associated with each object and “geocoded” it.

Geocoding is the name used to describe the process of taking a name or an address for a place and associating it with a latitude and longitude or some other fixed, stable identifier. Geocoding is the art of translating fuzzy, imprecise names or labels into something that a computer can refer to unambiguously.

We used the Flickr geocoding API to convert place names into “Where On Earth” (WOE) IDs. For example, the WOE ID for Germany is 23424829. One reason these IDs are useful is that they allow you to “list all the objects from Germany” without having to worry about spelling (“Germany” or “germany” or “Deutschland” or “Jermaniya” or “ປະເທດເຢຍລະມັນ” or “德国” and so on…) or the need to perform a potentially expensive spatial query. Instead, the problem of finding objects associated with a specific place is reduced to a simple key-value query where the value is a number, and computers and databases are really fast at finding things with numeric indices.
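Here is a minimal sketch of that reduction. The geocoding step is stubbed out with a hand-written lookup table (this is not the Flickr API), and the object IDs are hypothetical; the only real value is the WOE ID for Germany quoted above. Once names are resolved, “list all the objects from Germany” becomes a single dictionary lookup on an integer key.

```python
from collections import defaultdict

# Stand-in for a geocoder: these spellings all map to the WOE ID for Germany
# quoted above. The table itself and the object IDs below are hypothetical.
NAME_TO_WOEID = {"germany": 23424829, "deutschland": 23424829, "德国": 23424829}

OBJECTS = [("object-1", "Germany"), ("object-2", "Deutschland"), ("object-3", "德国")]


def build_index(objects):
    """Group object IDs under the numeric place ID their place name resolves to."""
    index = defaultdict(list)
    for object_id, place_name in objects:
        woeid = NAME_TO_WOEID.get(place_name.casefold())
        if woeid is not None:  # unresolved names are simply skipped here
            index[woeid].append(object_id)
    return index


index = build_index(OBJECTS)
print(index[23424829])  # ['object-1', 'object-2', 'object-3']
```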

Fun fact: 12521721 is the WOE ID for SFO and you can see all the geotagged Flickr photos taken at SFO by visiting https://flickr.com/places/SFO/ or https://flickr.com/places/12521721/.

Now you can see all the objects in the Cooper Hewitt collection grouped by country. So far, so good. Except that about a week later one of the curators came down to the digital offices and said: “You know that object you say is from Germany? Well, it was actually Poland at the time…” It might have actually been the other way around. I honestly don’t remember. Fundamentally, though, the issue is the same.

Where On Earth (WOE) is the gazetteer used by Flickr and a precursor to Who’s On First (WOF). One of the important differences between the two gazetteers is support for historical places. WOE is a “linear” gazetteer that only focuses on the present moment. WOF, on the other hand, starts from the notion that while places change over time there are real and important use cases for unambiguously referring to a place at a particular moment in time.

The metadata for the offending object from Germany was never updated because WOE doesn’t know about 19th-century Poland. This situation can only be described as “not great” when it comes to museum collections.

Historical data

Here’s another example taken from Stephen Epps’ blog post Tackling Space and Time in Who’s On First. This animation depicts the three distinct instances of the former “Yugoslavia” during the 20th century and its evolution from one nation state to seven, from the mid-1990s to the present.

There are two important conceptual devices at play here:

Phase shifts

The first device is the notion of a “phase shift”. A phase shift describes when a given place ceases to exist or is replaced by a new place. For example, when the Kingdom of Yugoslavia became the Socialist Federal Republic of Yugoslavia in 1945.

It’s not always clear what constitutes a “phase shift” or why and that’s really a much larger philosophical question than we have time to address here. What is important for our purposes is describing the relationships that make up a phase shift.

WOF does this by ensuring that every record has a wof:supersedes and a wof:superseded_by property, with pointers to the other constituent records in a phase shift.

For example, the Socialist Federal Republic of Yugoslavia, which existed between 1945 and 1991, superseded the Kingdom of Yugoslavia and was then superseded by Croatia, Slovenia and a now-smaller version of its former self:

"properties": {
	"wof:supersedes": [
		1108955789		// Kingdom of Yugoslavia (1918-45)
	],
	"wof:superseded_by": [
		1108955787,		// Socialist Federal Republic of Yugoslavia (1991-06~ - 1991-09~)
		85633779,		// Slovenia (1991-06~/1992-01~)
		85633229		// Croatia (1991-05~/1991-06~)
	]
}
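Because each record only points at its immediate neighbours, answering “what replaced this place, as of today?” means following `wof:superseded_by` pointers until you reach records that nothing has replaced. A minimal sketch over a simplified graph; the IDs are hypothetical placeholders, not the real Yugoslavia records above, and the real graph has more links.

```python
# Simplified, hypothetical phase-shift graph: each ID lists the records that
# replaced it (its wof:superseded_by); an empty list means "still current".
SUPERSEDED_BY = {
    1: [2],        # a "kingdom"
    2: [3, 4, 5],  # a "federal republic" that split three ways
    3: [],
    4: [],
    5: [],
}


def current_successors(record_id: int, graph: dict) -> set[int]:
    """Follow wof:superseded_by pointers until reaching records nobody has replaced."""
    result, stack, seen = set(), [record_id], set()
    while stack:
        rid = stack.pop()
        if rid in seen:  # guard against cycles in bad data
            continue
        seen.add(rid)
        successors = graph.get(rid, [])
        if successors:
            stack.extend(successors)
        else:
            result.add(rid)
    return result


print(sorted(current_successors(1, SUPERSEDED_BY)))  # [3, 4, 5]
```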

Another example might be the “phase shift” that SFO underwent in 1979 with the addition of North Terminal and then again in 1981 with the addition of Boarding Area E:

"properties": {
	"wof:supersedes": [ 1159396339 ],	// "SFO (1979-81)" supersedes "SFO (1974-79)"
	"wof:superseded_by": [ 1159396327 ]	// "SFO (1979-81)" is superseded by "SFO (1981-83)"
}
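Since every phase shift is recorded from both sides, a useful sanity check is that each `wof:superseded_by` pointer is matched by a `wof:supersedes` pointer on the other record, and vice versa. A minimal sketch, using hypothetical IDs for three consecutive instances of SFO (the chain is truncated after the third record):

```python
def check_phase_shifts(records: dict) -> list[str]:
    """Report supersedes/superseded_by pointers that lack a matching back-pointer."""
    problems = []
    for rid, props in records.items():
        for newer in props.get("wof:superseded_by", []):
            if rid not in records.get(newer, {}).get("wof:supersedes", []):
                problems.append(f"{rid} -> {newer}: missing wof:supersedes back-pointer")
        for older in props.get("wof:supersedes", []):
            if rid not in records.get(older, {}).get("wof:superseded_by", []):
                problems.append(f"{rid} <- {older}: missing wof:superseded_by back-pointer")
    return problems


# Hypothetical IDs standing in for three consecutive "instances" of SFO.
RECORDS = {
    10: {"wof:name": "SFO (1974-79)", "wof:supersedes": [], "wof:superseded_by": [11]},
    11: {"wof:name": "SFO (1979-81)", "wof:supersedes": [10], "wof:superseded_by": [12]},
    12: {"wof:name": "SFO (1981-83)", "wof:supersedes": [11], "wof:superseded_by": []},
}

print(check_phase_shifts(RECORDS))  # []
```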

The phase shifts described in our data only track major changes to the buildings or the terminals that passengers might experience: the opening or closing of a boarding area, for instance, but not behind-the-scenes changes like modifications to the baggage-handling infrastructure.

The “Extended DateTime Format” (EDTF)

Early work at mapping out and visualizing the relationships between all the terminals and boarding areas and "instances" of SFO.

The second conceptual device is the use of the Library of Congress’ Extended DateTime Format (EDTF) to describe dates. EDTF’s strength lies in its rich syntax to describe imprecise, uncertain and sometimes disputed dates. Did you notice the funny squiggles (~) at the end of some dates in the examples above? The EDTF specification states that:

The characters ‘?’ and ‘~’ are used to mean “uncertain” and “approximate” respectively, and in combination, i.e. ‘?~’, to mean “uncertain” as well as “approximate”.

Sometimes, as in the case of the former Yugoslavia, there is active debate about when a place began or ceased to exist. Did Slovenia become newly independent in June 1991 or January 1992? People have debated, and will continue to debate, those details, and sometimes simply noting the dispute is more useful than being another voice trying to arbitrate it.

"properties": {
	"edtf:cessation": "open",			// "open" means current or on-going in EDTF-speak
	"edtf:inception": "1991-06~/1992-01~"		// sometime around the dates between June 1991 and January 1992
}
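To show how little machinery the qualifiers need, here is a minimal Python sketch that handles only the EDTF forms that appear in this post: a year or year-month, trailing `?` and `~` qualifiers, `/` intervals and the `open` marker. It is not a full EDTF parser, and later revisions of the specification changed some of this syntax.

```python
from dataclasses import dataclass

OPEN = "open"  # "current or ongoing", as in the edtf:cessation value above


@dataclass(frozen=True)
class EDTFDate:
    value: str         # the date with qualifiers removed, e.g. "1991-06"
    uncertain: bool    # a trailing "?" was present
    approximate: bool  # a trailing "~" was present


def parse_date(s: str) -> EDTFDate:
    """Strip trailing '?' / '~' qualifiers (in either order) and record them."""
    uncertain = approximate = False
    while s and s[-1] in "?~":
        if s[-1] == "?":
            uncertain = True
        else:
            approximate = True
        s = s[:-1]
    return EDTFDate(s, uncertain, approximate)


def parse_edtf(s: str):
    """Parse "open", a single qualified date, or a "start/end" interval."""
    s = s.strip()
    if s == OPEN:
        return OPEN
    if "/" in s:
        start, end = s.split("/", 1)
        return (parse_edtf(start), parse_edtf(end))
    return parse_date(s)
```

With this, `parse_edtf("1991-06~/1992-01~")` yields a pair of dates (`1991-06` and `1992-01`), both flagged as approximate, which is enough for an application to decide, say, to display “circa” rather than a hard date.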

Other times, circumstances don’t permit an exact answer. When was Boarding Area C rebuilt? We know that it was sometime after 1983 but also before 1988.

Most data, but especially cultural heritage data, reflect the art of managing imperfect or absent data and EDTF allows us to soften the sharp edges of that reality while saying something about a subject.

Ambiguity and absence

Approximate geometries

You may have noticed that some of the geometries, like the dates, for the SFO architecture data are rough and approximate. I like to refer to these geometries as “cardboard cutout” geometries.

We will update these geometries with accurate tracings in time (especially now that we have historic maps) but until then we are publishing them as a “good enough is perfect”, or at least “something is better than nothing”, solution.

"properties": {
	"mz:is_approximate": 1
}

Imperfect geometries are denoted by the mz:is_approximate=1 property and you should adjust your use of the data accordingly. The easiest way to think about approximate geometries is that they are more “there” than not.

For example, the old Boarding Area A is guaranteed to be located inside the SFO campus and not, say, across the highway in San Bruno, but it might not be exactly there.
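One way to “adjust your use of the data accordingly” is to keep flagged geometries out of anything that depends on exact shapes (measuring distances, say) while still using them for coarse questions such as which campus a place belongs to. A minimal sketch, assuming features are already-parsed GeoJSON dictionaries and that a missing flag means the geometry is not approximate:

```python
def split_by_precision(features: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split GeoJSON features into (precise, approximate) using mz:is_approximate."""
    precise, approximate = [], []
    for feature in features:
        flag = feature.get("properties", {}).get("mz:is_approximate", 0)
        # Assumption: a missing flag means "not approximate"; 1 marks a cardboard cutout.
        (approximate if flag == 1 else precise).append(feature)
    return precise, approximate
```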

“Null Terminal”

Some records in the data we’ve published belong to something called “Null Terminal”. Null Terminal is a MacGuffin and a play on the notion of Null Island, a fictional island located at geographic coordinates of 0.0, 0.0.

I like to think of “Null Island” as something that exists as a place without necessarily having spatial properties. For example, you might know that a venue is in a particular neighbourhood without knowing its exact coordinates. That venue might be considered to be “visiting Null Island”.

We use Null Terminal as a device to signal (or contain) places with uncertain geographies or locations that we can’t or won’t share with the public. For example, the entirety of the permanent collection or all the loan objects we’ve displayed over the years or individual works of public art that may not be on display at the moment.

Null Terminal isn't really a building. We might change its (SFO Museum) placetype in the future.

Null Terminal itself is part of the SFO campus but good luck trying to find it.

What’s next

We have been building a lot of scaffolding and plumbing to identify and promote those aspects of the airport that are considered “first-class objects” in our work and to ensure they have a stable and permanent (a “warm and fuzzy”) place to live on the internet.

What’s next is to continue folding in, and publishing, more and more of the museum collection. We’ve begun at the macro level (maps), narrowed the focus to the airport (historic maps) and narrowed it again to specific architecture, all the way down to galleries (this blog post).

What’s next is to find out what happens when the rest of the collection (remember, SFO Museum is a museum and a library and an aviation collection with objects, people and places from everywhere in the world) gets to play in that same sandbox.