Capturing flight data at SFO and SFO Museum

This is a blog post by aaron.cope that was published on January 18, 2019.

annual report: San Francisco International Airport (SFO), 1975/1977 [1 issue: 1975/1977], 1977, collection of SFO Museum, 2007.048.015

It’s a bit awkward and even a little tacky to start by quoting your past-self but here is a thing I said, way back in 2015, to frame everything that will follow in this blog post:

To collect a Nest thermostat absent of any data, what does that tell you? It tells you it’s a beautiful piece of industrial design [but] maybe the museum should start thinking about some way of keeping [the data that a Nest collects] alongside the object, and maybe [that data] doesn’t need to be privileged in the way the object is.

It is in this same spirit that we’re happy to announce that we have started to collect and publish flight data for all the arrivals and departures in and out of SFO.

This is not real-time or upcoming flight data. It is historical data compiled by harvesting flight data throughout the day, aggregating it overnight and finally publishing atomic records (yes, Who’s On First records) for every flight that graces our runways.

That’s interesting enough on the face of it but what we think is even more exciting is that every record contains pointers back to things already in the SFO Museum collection.

Things like the airlines that operated the flights, the gates they flew in and out of and the airports they went to or came from. With only a few exceptions all of the airlines and gates and airports that comprise any given flight, on any given day, all have a pre-existing relationship with the objects in our collection.

When we say that we want to “ensure that every aspect of a trip to SFO, and every facet of someone’s time spent in the airport, leads back to the museum’s collection” that definitely includes the flight you were on!

We’ll fix that one errant map tile covering Indonesia in the screenshot above shortly, I promise!

Each flight points to the specific instance of a relation (an airline, a gate, etc.) at that moment in time. This will become relevant as soon as the first phase of the Harvey Milk Terminal (or “new T1”) opens later this year. The meaning (or at least the context) of arriving at T1 in December 2019 won’t be the same as it was in January of the same year.

Eventually we will add aircraft and tail numbers as additional data becomes available. Piggybacking on our current data source has allowed us to get ahead of other work in progress (we’re at an airport after all so people are pretty busy keeping planes in the sky and stuff :-) and to imagine what working with flight data means in a museum context.

That’s why I started this post with that old quote about the Nest. When I look out across the airfield at the terminals I see a great big Nest, one that produces a lot more data than a home thermostat. SFO Museum doesn’t really collect “the airport” as “an object” per se but our mandate is to tell the story of the airport so the definition of what the airport “is” can be a little… squishy.

Since airplanes flying from one place to another are pretty central to an airport’s purpose it seems important to capture the data around those activities specifically as a way to help people understand what SFO has done in the past and how those efforts have changed over the years. To show, as a museum, how the evolution of air travel has affected the airport and the community that it is a part of.

This is also early and experimental work so in the short-term you should adjust your expectations accordingly. We are going to “figure this out by doing it” but that probably also means there will be some mistakes along the way.

Importantly, these data are not being accessioned into the collection yet. There is not a corresponding accession number for each flight on every day. It is reasonable to ask: Does it really make sense to collect every single flight? Even if it does, is this data better suited for a library or an archive, rather than the museum?

report: Mills Field Municipal Airport of San Francisco, Miscellaneous Data, 1928, Collection of SFO Museum, 2000.018.080 a c

Conveniently, SFO Museum is all three of these things so we’re well-positioned to craft an answer but they remain valid questions. Does this data really tell us anything in the moment and, if not, have museums ever collected these sorts of data as a bet on future-histories rather than as a reflection of the past?

In recent years a number of initiatives in the cultural heritage sector have, with the help of their audiences, compiled impressive catalogs of past data ranging from the Old Weather project to transcribe historic ship’s logs from the 19th and early 20th centuries to the New York Public Library’s What’s on the Menu? project which does the same for restaurants.

Where there are efforts to capture contemporary data, like the Galaxy Zoo project which asks participants to classify images of galaxies, they generally aren’t undertaken by a cultural heritage organization. Scientists do it all the time, collecting sample data, year over year, for future use and research, but do museums?

All things being equal are we even in a position to store and manage (and care for) the volume of data that an airport like SFO produces? In 2019, SFO is the 7th busiest airport in the United States so there are a lot of flights, and more keep coming every day.

The short and simple answer to the question of simply keeping all this data, from a technology standpoint, is “absolutely” but this remains uncharted territory for the cultural heritage sector whether it’s infrastructure or practice or policy. Collecting (as in little-C collecting and not capital-C museum-grade collecting) and publishing and working with flight data as part of the Mills Field website is our way to poke at some of these questions to help understand how we should answer them.

We have been gathering data for all of the flights in and out of SFO since January 01, 2019. We also have data going back through December 2018 which we’ll process shortly.

The ingest process hasn’t been fully automated yet. Things are generally updated daily but there aren’t fixed intervals for new data yet.

Flights can be filtered by year (/YYYY) or month (/YYYY/MM) or day (/YYYY/MM/DD). For example:
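As a rough sketch of how those three date suffixes compose (in Python, and not the code running behind the site): the helper name `date_filter_path` is hypothetical, and it only builds the suffix rather than guessing at a full URL.

```python
from datetime import date
from typing import Optional


def date_filter_path(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Build the /YYYY, /YYYY/MM or /YYYY/MM/DD suffix for a date filter."""
    if day is not None and month is None:
        raise ValueError("a day filter needs a month")
    if month is not None and day is not None:
        date(year, month, day)  # raises ValueError for impossible dates
    elif month is not None and not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")

    # Zero-pad each part so January is "01" rather than "1".
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}")
    return "/" + "/".join(parts)


print(date_filter_path(2019))         # /2019
print(date_filter_path(2019, 1))      # /2019/01
print(date_filter_path(2019, 1, 18))  # /2019/01/18
```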

In addition to unique records for each flight we also roll up all the flights for a given flight number. For example:

You can’t do any kind of filtering (dates, arriving, departing) on /flights/FLIGHT_NUMBER URLs yet. You can also see flights for a specific gate, optionally showing all the airports that flights have arrived from or are traveling to:

You can also reference gates by the actual gate number. In 2019 ID 11159157793 is the same as Gate D55 so all those links reference the same place. If gate D55 ever moves, or otherwise changes, that label might point to a different ID.
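One way to picture the difference between a gate label and a gate ID is a lookup that takes a date as well as a label. This is only a sketch: the label and ID are the ones mentioned above, but the table layout, the `resolve_gate` name and the validity dates are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# (label, valid_from, valid_until, SFO Museum ID).
# The label and ID come from the post; the validity dates are assumed.
GATE_LABELS = [
    ("D55", date(2019, 1, 1), None, 11159157793),
]


def resolve_gate(label: str, on: date) -> Optional[int]:
    """Return the ID a gate label pointed to on a given day, if known."""
    for gate_label, valid_from, valid_until, gate_id in GATE_LABELS:
        if gate_label != label.upper():
            continue
        # A row with no end date is treated as still current.
        if valid_from <= on and (valid_until is None or on < valid_until):
            return gate_id
    return None


print(resolve_gate("D55", date(2019, 1, 18)))  # 11159157793
print(resolve_gate("D55", date(2018, 6, 1)))   # None
```

If gate D55 were ever relocated, a second row with a new ID and a later start date would let the same label resolve differently depending on the day being asked about.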

It is possible to see all the unique airlines that have flown in or out of a gate too:

By default, we only show airlines with actual planes at the gates but you can also see all the other airlines that had code-sharing flights (with the plane-operating airline) out of a gate by appending /all to the URL. For example:

And, we can show all the flights for a given airline. For example, all the Air Canada flights, in and out of SFO:

Or even:

Remember 1159283597 is the same as AIR CANADA is the same as ACA is the same as AC. Those are the SFO Museum ID, ICAO callsign and code and IATA code respectively, each of which refers to Air Canada. I hope that someone, somewhere, is working on a way to do for airline codes and the ✈️ emoji what the Unicode Consortium worked out for countries and flag emojis.
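A minimal sketch of that equivalence, assuming nothing more than a hand-written alias table: every identifier listed above normalizes to the same SFO Museum ID. The `resolve_airline` helper and the table itself are hypothetical.

```python
AIR_CANADA_ID = 1159283597

# Every identifier the post lists for Air Canada, mapped to its SFO Museum ID.
AIRLINE_ALIASES = {
    "1159283597": AIR_CANADA_ID,  # SFO Museum ID
    "AIR CANADA": AIR_CANADA_ID,  # ICAO callsign
    "ACA": AIR_CANADA_ID,         # ICAO code
    "AC": AIR_CANADA_ID,          # IATA code
}


def resolve_airline(identifier) -> int:
    """Normalize an ID, callsign or code to an SFO Museum ID."""
    key = str(identifier).strip().upper()
    try:
        return AIRLINE_ALIASES[key]
    except KeyError:
        raise KeyError(f"unknown airline identifier: {identifier!r}") from None


for alias in (1159283597, "AIR CANADA", "ACA", "ac"):
    print(alias, "->", resolve_airline(alias))
```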

Flights for airlines can be filtered by date as well as by arrivals and departures:

Those results can be filtered again to list only those flights that an airline is operating themselves, using the undocumented ?principals=1 flag:
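Since the flag is just a query parameter, adding it to an existing URL can be sketched with Python's standard library. The `with_principals` helper is hypothetical and the URL below is a placeholder, not a real Mills Field address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_principals(url: str) -> str:
    """Return url with principals=1 set, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop any existing principals value so the flag is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "principals"
    ]
    query.append(("principals", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only.
print(with_principals("https://example.com/some/airline/flights/"))
```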

Although we are mentioning the flag here it will remain undocumented for the time-being because the combinatorial hoo-hah (not to mention copy writing) to handle date filters, arrivals and departures and code-shares starts to add up and, overall, is still pretty low on the list of priorities.

Oh, and flights in and out of airports (assuming one end of the journey is SFO) of course:

annual report: San Francisco International Airport (SFO), 2004 [1 issue: 2004], 2004, collection of SFO Museum, 2005.038.001 a b

There are syndication feeds describing the “shape” of the flight data for each of the last 15 days in both the RSS and Atom feed formats.

As always we are publishing this data as a SQLite database on our own web site and as raw Who’s On First-style records under the sfomuseum-data GitHub account. Because of the volume of flight data we are planning to bundle things in smaller monthly batches with the following naming convention: sfomuseum-data-flights-YYYY-MM. For example:
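The naming convention is simple enough to sketch in a couple of lines of Python. The `flights_repo_name` function is a hypothetical helper; it only formats a name and does not check whether a bundle for that month has been published.

```python
from datetime import date


def flights_repo_name(month: date) -> str:
    """Name of the monthly bundle that would hold flights for the given date."""
    return f"sfomuseum-data-flights-{month:%Y-%m}"


print(flights_repo_name(date(2019, 1, 18)))  # sfomuseum-data-flights-2019-01
```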

As mentioned earlier each flight is published as a Who’s On First-style (WOF) record where every feature’s geometry is a MultiPoint containing the centroid for the departure and arrival airports. The label geometry (or centroid) is the gate at SFO that the flight arrived at or left from. In the rare case when we are unable to determine a gate number we use SFO’s label centroid.

As with airlines and enterprises and then photos, we are continuing to overload the semantics of the WOF placetype property. The wof:placetype for a flight is said to be an event and the sfomuseum:placetype is, unsurprisingly, a flight. We have some ideas about how better to address this proliferation of made-up WOF placetypes but we’ll save that for another time.
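To make the shape described in the last two paragraphs a little more concrete, here is a stripped-down sketch of a flight feature in Python. It is not a real record: the coordinates are rough placeholders, the `flight_feature` helper is hypothetical, and the `lbl:longitude` and `lbl:latitude` property names are an assumption about how the label centroid is stored. Only the MultiPoint geometry, the gate-or-SFO fallback and the two placetype values come from the description above.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), as in GeoJSON


def flight_feature(departure: Point, arrival: Point,
                   gate: Optional[Point], sfo_label: Point) -> dict:
    """Assemble a minimal flight feature from airport and gate centroids."""
    # Fall back to SFO's own label centroid when the gate is unknown.
    label_lon, label_lat = gate if gate is not None else sfo_label
    return {
        "type": "Feature",
        "geometry": {
            "type": "MultiPoint",
            "coordinates": [list(departure), list(arrival)],
        },
        "properties": {
            "wof:placetype": "event",
            "sfomuseum:placetype": "flight",
            "lbl:longitude": label_lon,  # assumed property name
            "lbl:latitude": label_lat,   # assumed property name
        },
    }


# Placeholder coordinates, good enough for illustration only.
SFO = (-122.38, 37.62)
ELSEWHERE = (-73.74, 45.47)

feature = flight_feature(departure=SFO, arrival=ELSEWHERE, gate=None, sfo_label=SFO)
print(feature["geometry"]["type"], feature["properties"]["sfomuseum:placetype"])
```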

Please remember that this remains highly experimental work right now. If you are planning to try using it, for anything at all, understand that everything is in flux and subject to change and may still break along the way.

If you do end up playing with the data and build something you’d like to share please let us know. Enjoy and ✈️ ✈️ ✈️ !

negative: San Francisco International Airport (SFO), runway construction and airplane on approach, 1958, collection of SFO Museum, 2011.032.0468