Past flight data at SFO and SFO Museum (2006 - 2018)

This is a blog post by aaron cope that was published on March 20, 2020. It was tagged sfo, faa, flightdata, history, opendata, airplanes, airlines and airports.

annual report: San Francisco International Airport (SFO), 1990 [1 issue: 1990], Paper, ink. Transfer, SFO Museum Collection, 2001.057.001.001-.003

Today we’re happy to announce the availability of historical flight data in and out of SFO for the years 2006 through 2018. That brings the total number of flights published to just under 4.9 million! These data are available as raw GeoJSON files via the sfomuseum-data GitHub organization.

We’re also in the process of producing per-year SQLite distributions for these flight data, as we did for 2019, and will update this blog post with links when they are complete.

Pre-2019 data is sourced from the records of the airport’s Noise Abatement department with the following differences from more recent flight data:

  • Coordinate and elevation data for a flight’s takeoff or approach are not currently included. We have that data and will make it available soon. It involves some additional processing and we decided that it was worth publishing the data as-is, with pointers and coordinate data for arrival and departure airports, first.

  • We don’t know the tail numbers for any of these flights but we do know the airline and aircraft model for every flight from 2006 to 2018. For example, there have been 139,000 Boeing 747-400 flights in and out of SFO since 2006!

  • Code-sharing information between airlines is unknown.

  • Gate information is unknown, but we have ensured that each flight is associated with the SFO Terminal Complex as it existed at the time of arrival.

  • Some of the sfomuseum:flight_id property values will change to better reflect a flight’s departure date. The reasons why are discussed below.

negative: San Francisco International Airport (SFO), architectural diagram, Negative. Transfer, SFO Museum Collection. 2011.032.1972

The sfomuseum:flight_id property is a unique identifier that we use to key flights from external data sources with flights that have been recorded and published as sfomuseum-data-flights-YYYY-MM records. It is a compound identifier consisting of the departure date, a flag indicating whether the flight is arriving at or departing from SFO, the three-letter ICAO airline code and the flight number.

For example the key 20100127-A-CPA-872 identifies Cathay Pacific flight 872 which arrived at SFO, departing Hong Kong on January 27, 2010. The flight itself landed at SFO at 12:53 which means it would have left Hong Kong around nine or ten o’clock in the morning that “same day”. That’s the funny part about flying back to California from Asia. As the clock flies it only seems to take a couple of hours despite being in the air for twelve, or more, hours.
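To make the shape of the key concrete, here is a minimal sketch of how one might split a sfomuseum:flight_id value back into its four parts. It is not the code we use internally. The FlightId class and parse_flight_id function are hypothetical names, and the "D" flag for departures is an assumption, since the example above only shows "A" for an arrival.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class FlightId:
    flight_date: date   # the date component of the key
    direction: str      # "A" for arrival; "D" for departure is an assumption
    airline: str        # three-letter ICAO airline code, e.g. "CPA"
    number: str         # flight number, kept as text


def parse_flight_id(key: str) -> FlightId:
    """Split a key like 20100127-A-CPA-872 into its four components."""
    parts = key.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four '-' separated parts, got {key!r}")
    ymd, direction, airline, number = parts
    if direction not in ("A", "D"):
        raise ValueError(f"unexpected direction flag {direction!r}")
    if len(airline) != 3:
        raise ValueError(f"expected a three-letter airline code, got {airline!r}")
    return FlightId(datetime.strptime(ymd, "%Y%m%d").date(), direction, airline, number)


flight = parse_flight_id("20100127-A-CPA-872")
print(flight.flight_date, flight.direction, flight.airline, flight.number)
```

The flight number is kept as a string so that nothing is lost if a source ever pads it or adds a suffix.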

annual report: San Francisco International Airport (SFO), 1972/1973 [1 issue: 1972/1973], Paper, ink. Transfer, SFO Museum Collection, 2007.048.014

The problem is that because of distance and timezones we know that some flights arriving at SFO will have departed the previous day. The Noise Abatement data only has dates and times for arrivals at or departures from SFO. It does not record when a flight left another airport on its way to SFO. In the absence of that data, every sfomuseum:flight_id property for flights arriving at SFO was derived using the date the airplane operating that flight touched down in San Francisco.

It should be possible to calculate an approximate departure date and time using a combination of average air speed, great circle distance between airports and timezone information. There are enough variables in this scenario that there would probably still be some mistakes, namely an inaccurate departure date, but it would also still be closer to reality than what we’ve got now. At the same time it would require just enough work that, like the coordinate data for takeoffs and approaches, it seems more valuable to get the bulk and the broad strokes of the data published now and to revisit these details later. If anyone else would like to take up the challenge before we do we would welcome your contributions and pull requests!
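For anyone tempted to try, the calculation described above might be sketched roughly as follows. This is only an illustration under loud assumptions: a single cruising speed of 900 km/h for every aircraft, a spherical Earth, fixed UTC offsets in place of real timezone rules (so no daylight saving), and approximate coordinates for Hong Kong and New York. The function names are hypothetical and none of this exists in our tooling yet.

```python
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
ASSUMED_SPEED_KMH = 900.0   # assumption: one average speed for all flights
SFO = (37.62, -122.37)      # latitude, longitude


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) pairs."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def estimate_departure(arrival, origin, origin_utc_offset_hours, speed_kmh=ASSUMED_SPEED_KMH):
    """Estimate the local departure time at origin for a flight landing at SFO.

    arrival must be a timezone-aware datetime for the touchdown at SFO.
    """
    hours_aloft = great_circle_km(origin, SFO) / speed_kmh
    departure = arrival - timedelta(hours=hours_aloft)
    return departure.astimezone(timezone(timedelta(hours=origin_utc_offset_hours)))


PST = timezone(timedelta(hours=-8))   # fixed offset, ignores daylight saving
HKG = (22.308, 113.918)               # approximate coordinates
JFK = (40.641, -73.778)               # approximate coordinates

# Cathay Pacific 872, which landed at SFO at 12:53 on January 27, 2010.
print(estimate_departure(datetime(2010, 1, 27, 12, 53, tzinfo=PST), HKG, 8))

# A hypothetical red-eye from New York landing just after midnight.
print(estimate_departure(datetime(2010, 1, 27, 0, 30, tzinfo=PST), JFK, -5))
```

The second example is the case that matters for sfomuseum:flight_id: the estimated departure falls on the day before the arrival date. A more careful version would use real timezone rules and per-aircraft speeds, and would still be an estimate.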

annual report: San Francisco Public Utilities Commission, 1953/1954 [1 issue: 1953/1954], Paper, ink. Transfer from City and County of San Francisco, SFO Museum Collection, 1999.281.003

Is it a little surreal to be publishing and writing about historical flight data at a time when airplane travel is contracting, and in many places grinding to a halt, in order to stem the spread of the Covid-19 virus? It’s definitely something for sure but it also seems important to remember that these data help tell the story of the airport. Just as the impact of contemporary travel bans and airlines cutting back on routes will be visible in the flight data for 2020 what else might we understand about the history of SFO, and of commercial aviation, in earlier data?

Importantly, maybe we won’t (or can’t) see those stories today but that doesn’t mean that someone else won’t see something in the future. We are preemptively betting that the future will see the proverbial forest where the present, perhaps, can only see the trees.

“The phrase “designing for patience” is meant to reflect the reality that cultural heritage institutions no longer enjoy a monopoly on the general public’s attention. This shift away from the academy towards large commercial enterprises as well as the proliferation of smaller niche publishers, all developing and promoting ever more cultural production, has been underway for decades. In recent years, and particularly with introduction of the internet and low-cost mobile computing, the shift has picked up speed often being likened to a “firehose” in both its intensity and, increasingly, the inability to meaningfully make any sense of it.

“I believe that the broader mission of the cultural heritage sector, and the humanities generally, dictates that we should not be competing with the immediacy of the firehose. Instead we should use digital technologies to develop the infrastructure to ensure that our collections and our holdings are available and accessible and relevant long after the firehose has passed. To be confident enough to believe that people will revisit an idea, to be patient enough to wait for them and to be sustainable enough to make both a reality.”

Cope, Aaron. “Capacity Planning for Meaning.” MW20: MW 2020. Published February 21, 2020. Consulted March 19, 2020. https://mw20.museweb.net/paper/capacity-planning-for-meaning/


With that in mind, we’d also like to take this opportunity to announce that we are collecting and publishing the output of the Federal Aviation Administration’s (FAA) Airport Status API. We’ve actually been collecting these data since August, 2018 and recently made them public through the sfomuseum-data GitHub organization. The status API is polled every 15 minutes and archived if and when its output is different from the last recorded result. As I write this, there are nearly 15,000 status reports spanning two and a half years.
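The "archive only when something changed" step can be sketched in a few lines. This is a simplified illustration rather than our actual collector: it compares a hash of each report against the hash of the last one archived, the function names are hypothetical, and the fetching and 15-minute scheduling are left out entirely.

```python
import hashlib
import json


def report_digest(report: dict) -> str:
    """Hash a report in a canonical form so key order does not matter."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def archive_if_changed(report: dict, archive: list) -> bool:
    """Append (digest, report) to archive unless it matches the last entry.

    Returns True when the report was archived and False when it was skipped.
    """
    digest = report_digest(report)
    if archive and archive[-1][0] == digest:
        return False
    archive.append((digest, report))
    return True


archive = []
for polled in ({"delay": False}, {"delay": False}, {"delay": True}):
    print(archive_if_changed(polled, archive))
```

Only the first and third reports are kept in this run, because the second is identical to the one before it.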

Like flight data, these records are modeled as Who’s On First documents and FAA-specific data is stored in properties with an faa: prefix. For example:

{
  "id": 1528787713,
  "type": "Feature",
  "properties": {
    "edtf:cessation": "2020-02-21T16:22:54",
    "edtf:inception": "2020-02-21T16:22:54",
    "faa:credit": "http://weather.gov/",
    "faa:delay": false,
    "faa:delay_count": 0,
    "faa:last_update": "Last Updated on Feb 21 2020, 3:56 pm PST",
    "faa:temperature_c": 19.4,
    "faa:temperature_f": 67,
    "faa:temperature_raw": "67.0 F (19.4 C)",
    "faa:visibility": 10,
    "faa:wind_direction": "North",
    "faa:wind_raw": "North at 0.0",
    "faa:wind_speed": 0,
    "geom:area": 0,
    "geom:bbox": "0.000000,0.000000,0.000000,0.000000",
    "geom:latitude": 37.62,
    "geom:longitude": -122.37,
    "mz:is_current": 0,
    "sfomuseum:placetype": "weather",
    "sfomuseum:uri": "SFO/2020/02/21/20200221T162254.json",
    "src:geom": "flysfo",
    "wof:belongsto": [
      102527513, 
      102191575, 
      85633793, 
      102087579, 
      85922583, 
      102085387, 
      554784711, 
      85688637
    ],
    "wof:concordances": {
      "iata:code": "SFO",
      "icao:code": "KSFO"
    },
    "wof:country": "US",
    "wof:created": 1582302174,
    "wof:geomhash": "6e11b039d229c1f441fd06b875619d83",
    "wof:hierarchy": [
      {
        "campus_id": 102527513,
        "continent_id": 102191575,
        "country_id": 85633793,
        "county_id": 102087579,
        "locality_id": 85922583,
        "postalcode_id": 554784711,
        "region_id": 85688637
      }, 
      {
        "campus_id": 102527513,
        "continent_id": 102191575,
        "country_id": 85633793,
        "county_id": 102085387,
        "region_id": 85688637
      }
    ],
    "wof:id": 1528787713,
    "wof:lastmodified": 1582330976,
    "wof:name": "FAA status for SFO, 2020-02-21T16:22:54",
    "wof:parent_id": 102527513,
    "wof:placetype": "custom",
    "wof:repo": "sfomuseum-data-faa-2020",
    "wof:superseded_by": [],
    "wof:supersedes": []
  },
  "bbox": null,
  "geometry": {"coordinates":[-122.37,37.62],"type":"Point"}
}
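Because the FAA-specific values all share the faa: prefix, pulling them out of a record is straightforward. Here is a small sketch that uses a trimmed copy of the record above. The faa_properties function is a hypothetical helper, not part of any published tool.

```python
import json

# A trimmed copy of the record shown above.
SAMPLE = json.loads("""
{
  "id": 1528787713,
  "type": "Feature",
  "properties": {
    "faa:delay": false,
    "faa:delay_count": 0,
    "faa:temperature_c": 19.4,
    "faa:temperature_f": 67,
    "faa:wind_direction": "North",
    "faa:wind_speed": 0,
    "wof:name": "FAA status for SFO, 2020-02-21T16:22:54",
    "wof:repo": "sfomuseum-data-faa-2020"
  },
  "geometry": {"coordinates": [-122.37, 37.62], "type": "Point"}
}
""")


def faa_properties(feature: dict) -> dict:
    """Return the faa: properties of a record with the prefix removed."""
    prefix = "faa:"
    return {
        key[len(prefix):]: value
        for key, value in feature["properties"].items()
        if key.startswith(prefix)
    }


status = faa_properties(SAMPLE)
print(status["delay"], status["temperature_c"], status["wind_direction"])
```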

As mentioned, the data is available, and updated throughout the day, via the sfomuseum-data GitHub organization. There aren’t currently plans to publish these data as SQLite distributions but if there is a need or a desire we can revisit that decision.

We look forward to seeing the tales that might be fashioned from the stories these data tell.

timetable: TWA (Trans World Airlines), cargo, Paper, ink. Gift of the William Hough Collection, SFO Museum Collection, 2004.106.143