The Case of the Missing (Istanbul) Airport

This is a blog post by aaron cope that was published on August 01, 2022. It was tagged whosonfirst and golang.

Postcard: globe, Pan American Airways, Miami. Paper, ink. Gift of Thomas G. Dragges, SFO Museum Collection. 2015.166.1228

On the surface this is a blog post documenting the steps to add a new record (an airport) to a catalog of geographic places (the sfomuseum-data-whosonfirst GitHub repository). Scratching the surface, though, it’s really a blog post about how SFO Museum supplements and extends Who’s On First to meet the needs of our online efforts.

Much of what follows has been discussed in the Who’s on First at SFO Museum and SFO Museum, Who’s On First and Airports blog posts from 2018. To recap: SFO Museum publishes its openly licensed data as Who’s On First-style GeoJSON records. We do this because place is inherent to the airport (SFO) and to SFO Museum’s aviation collection. Who’s On First allows the museum to connect its data with a global database of places using a plain-text format (GeoJSON) that is universally supported by developer and GIS (geographic information system) tools alike.

We source data for places outside the airport campus directly from Who’s On First and we author and manage data inside the SFO campus modeling these records as though they were Who’s On First records. This allows the two datasets to “just work” together. It is possible to associate our efforts not just with the rest of the world but also with any other project using Who’s On First to identify places and without the hassle of determining whether our specific records should be part of the core Who’s On First dataset.

One class of places “outside” the SFO campus that we definitely care about is the other airports servicing flights in and out of our airport. You might think that if there were ever a class of places which already had canonical identifiers it would be airports. Without getting in to all the reasons why, except to point out that three-letter IATA airport codes are periodically re-used, I’ll just say that it’s often harder to distinguish one airport from another than it should be.

Model airplane: Turkish Airlines, Boeing 777-300ER. Plastic, paint, wood, metal. Transfer from City and County of San Francisco, Office of the Mayor, SFO Museum Collection. 2017.073.001 a d

To account for this problem we do a bunch of work behind the scenes to map airports, airlines and aircraft, regardless of how they choose to identify themselves, to their corresponding Who’s On First record (and ID). That’s fine so long as there is an existing Who’s On First record for every airport …but what if there isn’t? The number of new airports opening every year is pretty low but it’s often greater than zero. This turned out to be the case with Istanbul Airport which opened in 2018 and whose absence (in Who’s On First data) we only noticed recently. Oops.

This is what we did to rectify that absence, told in two parts. The first part documents how SFO Museum imports existing Who’s On First records in to its own data catalog. The second part documents what to do when there is a place SFO Museum wants to import but that doesn’t have a record in Who’s On First yet. These examples are, by and large, tailored to SFO Museum’s needs but we are sharing them in the spirit of generosity to demonstrate how we manage a custom dataset of places on top of the “core” Who’s On First data.

Adding a new airport to sfomuseum/sfomuseum-data-whosonfirst

Airmail flight cover: Qantas Empire Airways, Boeing 707. Paper, ink. Gift of Mrs. Siusiadh Rasmussen, SFO Museum Collection. 2003.013.001

As mentioned, airport data is sourced from the whosonfirst-data repositories stored on GitHub. The first thing to do, then, is figure out which repository a given airport record is stored in. You can use the Who’s On First Spelunker to look up this data.

Note that some airports are stored in the whosonfirst-data-admin-xy repository because at the time of the “great splitting of the whosonfirst-data-admin repository in to per-country repositories” that airport’s country of origin wasn’t able to be determined. Importing airports that are in the -admin-xy repository is a good opportunity to move the record in to the correct repository, but doing so is not strictly necessary.

Although data is sourced from the Who’s On First (WOF) project, SFO Museum maintains local copies of WOF records in its own sfomuseum-data-whosonfirst repository. We do this for two reasons. First, we only need a subset of the total number of WOF records so it’s easier to keep the records we care about in a dedicated repository. Second, we apply SFO Museum-specific properties to those records; changes which are not relevant or suitable to be merged in to the “core” WOF dataset.

The go-sfomuseum-whosonfirst package was written to provide tools for importing data from WOF in to the sfomuseum-data-whosonfirst repository. For example:

$> cd /usr/local/sfomuseum/go-sfomuseum-whosonfirst

$> ./bin/import-feature \
	-reader-uri 'github://whosonfirst-data/whosonfirst-data-admin-{COUNTRY}' \
	{ID} {ID} {ID}

The import-feature tool performs the following tasks:

  • It will retrieve the record for each (WOF) ID specified, as well as any relevant ancestors for that ID (region, country).
  • For each ID fetched it will create a corresponding JSON file in the sfomuseum-data-whosonfirst/properties folder. These JSON files are meant to contain any additional SFO Museum-specific properties or property values that should be overwritten (for example sfomuseum:placetype or wof:repo).
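
To make the second step a little more concrete, here is a minimal sketch (in Python, not the Go code the tool is actually written in) of what layering an SFO Museum properties file on top of a WOF record could look like. The apply_overrides function is hypothetical, and the override values shown (including "airport" for sfomuseum:placetype) are assumptions for illustration rather than the contents of a real properties file.

```python
import copy

def apply_overrides(feature, overrides):
    """Return a copy of a GeoJSON feature with override properties applied.

    Keys in `overrides` either add a new property or replace an existing one.
    The original feature is left untouched.
    """
    merged = copy.deepcopy(feature)
    merged.setdefault("properties", {}).update(overrides)
    return merged

# A pared-down stand-in for a record fetched from Who's On First.
wof_record = {
    "type": "Feature",
    "properties": {
        "wof:name": "Istanbul Airport",
        "wof:placetype": "campus",
        "wof:repo": "whosonfirst-data-admin-tr",
    },
    "geometry": {"type": "Point", "coordinates": [28.727778, 41.262222]},
}

# Assumed contents of the per-record SFO Museum properties file.
sfomuseum_properties = {
    "wof:repo": "sfomuseum-data-whosonfirst",  # overwrite an existing value
    "sfomuseum:placetype": "airport",          # add a museum-specific value
}

local_record = apply_overrides(wof_record, sfomuseum_properties)
```

The point of keeping the overrides in their own file is that the upstream record can be re-imported at any time and the museum-specific changes re-applied on top of it.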

Once the records have been imported make sure to commit them to the sfomuseum-data-whosonfirst repository:

$> cd /usr/local/data/sfomuseum/sfomuseum-data-whosonfirst
$> git add {NEW FILES}
$> git commit -m "Add ..." {NEW FILES}
$> git push origin main

Committing the changes is relevant to the go-sfomuseum-airfield package which provides pre-compiled lookup tables for things related to the SFO airfield (airlines, aircraft, airports). By default these tables are built by fetching the sfomuseum/sfomuseum-data-whosonfirst repository from GitHub. For example:

$> cd /usr/local/sfomuseum/go-sfomuseum-airfield
$> make compile
$> git commit -m "recompile data" .
$> git push origin main

Committing the changes (to the go-sfomuseum-airfield package) is also relevant because a lot of other tools that use those lookup tables build them on the fly by fetching the serialized tables over the wire from GitHub; for example flight data. Committing these lookup tables to GitHub and then retrieving them over the wire allows us to update airfield data without involving the time-consuming process of updating every other package that uses go-sfomuseum-airfield.
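
As a rough illustration of why a lookup table is more than a simple code-to-record dictionary, here is a small Python sketch of building one from a serialized list. Everything in it is an assumption made for the example: the field names (wof_id, iata_code, is_current), the made-up "ZZZ" and "YYY" codes and the build_lookup and find_current helpers are not the go-sfomuseum-airfield data model or API. In practice the serialized table would be fetched over the wire; here it is inlined so the example is self-contained.

```python
import json
from collections import defaultdict

# Stand-in for a serialized lookup table. All values are invented.
SERIALIZED = json.dumps([
    {"wof_id": 1, "name": "Example Airport (closed)", "iata_code": "ZZZ", "is_current": 0},
    {"wof_id": 2, "name": "Example Airport (new)", "iata_code": "ZZZ", "is_current": 1},
    {"wof_id": 3, "name": "Another Example Airport", "iata_code": "YYY", "is_current": 1},
])

def build_lookup(serialized):
    """Index rows by code, keeping every candidate because codes get re-used."""
    table = defaultdict(list)
    for row in json.loads(serialized):
        table[row["iata_code"]].append(row)
    return dict(table)

def find_current(table, code):
    """Return only the candidates for a code that are flagged as current."""
    return [row for row in table.get(code, []) if row["is_current"] == 1]

lookup = build_lookup(SERIALIZED)
candidates = lookup["ZZZ"]             # two airports share this code
current = find_current(lookup, "ZZZ")  # only one of them is current
```

Keeping a list of candidates per code, rather than a single record, is one way to cope with a code that has pointed to more than one airport over time.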

Adding a new airport to whosonfirst/whosonfirst-data-admin-*

Poster: Pan American World Airways, Routes of the Flying Clipper Ships. Paper, ink. Gift of the Pan Am Association, SFO Museum Collection. 2000.058.0206 a b

Sometimes (not often) there are new airports which haven’t been added to the Who’s On First (WOF) project yet. This is an example of how to create a basic record for such an airport, in this case Istanbul Airport in Turkey. The first step is to clone the whosonfirst-data-admin-tr repository:

$> git clone \
	--depth 1 \
	git@github.com:whosonfirst-data/whosonfirst-data-admin-tr.git \
	/usr/local/data/whosonfirst-data-admin-tr

The next step is to build a SQLite database, with relevant spatial tables, that we can use to perform “point-in-polygon” operations to determine the new airport’s parent and ancestors. Use the tools in the go-whosonfirst-sqlite-features-index package to create this database:

$> cd /usr/local/whosonfirst/go-whosonfirst-sqlite-features-index

$> ./bin/sqlite-index-features \
	-all \
	-timings \
	-dsn /usr/local/data/whosonfirst-data-admin-tr.db \
	/usr/local/data/whosonfirst-data-admin-tr

For Turkey it takes about 3-4 minutes to create the /usr/local/data/whosonfirst-data-admin-tr.db database. These spatially-enabled databases were previously discussed in the Reverse-Geocoding in Time at SFO Museum blog post, published in 2021.
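
If “point-in-polygon” is unfamiliar, the idea is simply to ask which known places contain a given coordinate. The Python sketch below uses the classic ray-casting test against two invented rectangular rings. It is not how the SQLite spatial database does its work, and the place names and rings are made up; it is only meant to show the shape of the question being asked.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a ring of (lon, lat) pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def find_containing(lon, lat, candidates):
    """Return the names of all candidate places whose ring contains the point."""
    return [name for name, ring in candidates if point_in_ring(lon, lat, ring)]

# Invented rings for illustration only; real records have detailed polygons.
candidates = [
    ("example-region-a", [(28.0, 40.5), (30.0, 40.5), (30.0, 41.7), (28.0, 41.7), (28.0, 40.5)]),
    ("example-region-b", [(32.0, 39.0), (33.0, 39.0), (33.0, 40.0), (32.0, 40.0), (32.0, 39.0)]),
]

# The coordinate used for Istanbul Airport elsewhere in this post.
parents = find_containing(28.727778, 41.262222, candidates)
```

A real spatial index also has to deal with interior rings, multi-polygons and choosing between several containing places by placetype, none of which is attempted here.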

Once the new spatial database has been created you need to reference it when invoking the wof-create tool, which is part of the whosonfirst/go-whosonfirst-exportify package. For example:

$> cd /usr/local/whosonfirst/go-whosonfirst-exportify

$> ./bin/wof-create \
	-writer-uri repo:///usr/local/data/whosonfirst-data-admin-tr \
	-resolve-hierarchy \
	-spatial-database-uri 'sqlite://?dsn=/usr/local/data/whosonfirst-data-admin-tr.db' \
	-geometry '{"type":"Point","coordinates":[28.727778,41.262222]}' \
	-string-property 'properties.wof:placetype=campus' \
	-string-property 'properties.wof:country=TR' \
	-string-property 'properties.wof:name=Istanbul Airport' \
	-string-property 'properties.wof:repo=whosonfirst-data-admin-tr' \
	-int-property 'properties.mz:is_current=1' \
	-string-property 'properties.edtf:inception=2018-10-29' \
	-string-property 'properties.edtf:cessation=..' \
	-string-property 'properties.src:geom=wikipedia'

This will create a new, and minimal, record for the Istanbul Airport which can then be updated by hand as necessary. For example, this preliminary record has been created with only a Point geometry for the airport rather than a polygon depicting the geometry of the airport’s campus.
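
For a sense of what “minimal” means, here is a Python sketch that turns the same geometry and path=value assignments in to a bare GeoJSON feature. It is an approximation written for this post and not the wof-create implementation: the build_record and set_path helpers are hypothetical, and the sketch does not mint a wof:id or resolve a hierarchy, both of which the real tool takes care of.

```python
import json

def set_path(record, path, value):
    """Assign value at a dotted path such as 'properties.wof:name'."""
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def build_record(geometry, string_props, int_props):
    """Build a bare feature from 'path=value' assignments."""
    record = {"type": "Feature", "properties": {}, "geometry": geometry}
    for assignment in string_props:
        # Split on the first '=' only so values like '..' survive intact.
        path, value = assignment.split("=", 1)
        set_path(record, path, value)
    for assignment in int_props:
        path, value = assignment.split("=", 1)
        set_path(record, path, int(value))
    return record

record = build_record(
    json.loads('{"type":"Point","coordinates":[28.727778,41.262222]}'),
    [
        "properties.wof:placetype=campus",
        "properties.wof:country=TR",
        "properties.wof:name=Istanbul Airport",
        "properties.wof:repo=whosonfirst-data-admin-tr",
        "properties.edtf:inception=2018-10-29",
        "properties.edtf:cessation=..",
        "properties.src:geom=wikipedia",
    ],
    ["properties.mz:is_current=1"],
)

print(json.dumps(record, indent=2))
```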

For testing and debugging purposes you can emit the new record to STDOUT by assigning the -writer-uri flag like this:

$> ./bin/wof-create \
	-writer-uri stdout:// \
	{OTHER OPTIONS}

Commit the new record (in this case to the whosonfirst-data-admin-tr repository) and then import it in to the sfomuseum-data-whosonfirst repository as described above.

Properties

Timetable: Turkish Airlines, summer schedule. Paper, ink, metal. Gift of Thomas G. Dragges, SFO Museum Collection. 2015.167.755

One reason we clone “core” Who’s On First records in to a sfomuseum-data-whosonfirst repository is that we append those records with a variety of SFO Museum-specific properties. The Who’s On First project aims to publish online and machine-readable definitions for any and all properties defined across its entire corpus. Given a property in a Who’s On First record it should be possible to easily derive the URL for its definition in the whosonfirst-properties repository. For example the URL for the wof:placetype property is:

Where wof:placetype becomes wof/placetype.json and the corresponding definition file looks like this:

{
    "id": 1158807975,
    "name": "placetype",
    "prefix": "wof",
    "description": "Represents a common, common-optional, or optional string value of what WOF calls a record's administrative division type.",
    "type": "string"
}
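
The mapping from a property name to its definition file is mechanical enough to write down. This Python sketch derives only the relative path (prefix, then name, then .json); the definition_path function is hypothetical and the base URL of the repository is deliberately left out.

```python
def definition_path(prop):
    """Map a 'prefix:name' property to its relative definition file path."""
    prefix, sep, name = prop.partition(":")
    if not sep or not prefix or not name:
        raise ValueError(f"expected 'prefix:name', got {prop!r}")
    return f"{prefix}/{name}.json"

path = definition_path("wof:placetype")
```

The same rule applies to museum-specific properties, so sfomuseum:placetype would resolve to sfomuseum/placetype.json in whichever properties folder defines it.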

These property definition files build on work that the Cooper Hewitt proposed, in 2013, for cultural heritage organizations to publish a common “glossary.json” file describing the keys, labels and properties they use to describe their collections.

It’s long been something of a holy grail of museums, and cultural heritage institutions, to imagine that we can define a common set of metadata standards that we will all use and unlock a magic (pony) world of cross-institutional search and understanding. The shortest possible retort to this idea is: Yes, but no.

We can (and should) try to standardize on those things that are common between institutions. However it is the differences – differences in responsibilities; in bias and interpretation; in technical infrastructure – that distinguish institutions from one another. One need look no further than the myriad ways in which museums encode their collection data in API responses to see this reality made manifest.

I choose to believe that there are good and valid, and very specific, reasons why every institution does things a little differently and I don’t want to change that. What I would like, however, is some guidance.

In the same way that SFO Museum layers its own data on top of Who’s On First records we have created a separate collection of SFO Museum-specific property definition files stored in the sfomuseum-properties repository to supplement properties already described in the whosonfirst-properties repository. We manage things using the index-properties tool in the go-whosonfirst-properties package. For example, we can index all the properties in all the sfomuseum-data-* repositories storing only those not already found in the whosonfirst-properties/properties folder (and excluding properties starting with misc:) like this:

$> ./bin/index-properties \
	-exclude 'misc\:.*' \
	-alternate /usr/local/whosonfirst/whosonfirst-properties/properties \
	-properties /usr/local/sfomuseum/sfomuseum-properties \
	/usr/local/data/sfomuseum-data-*
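
In spirit, what that command is asked to do is a set difference: every property seen in our records, minus the excluded ones, minus the ones Who’s On First has already defined. The Python sketch below shows only that filtering logic. It is not the index-properties implementation, it does not write any definition files, and the undefined_properties function and sample property names (such as misc:note) are invented for the example.

```python
import re

def undefined_properties(records, already_defined, exclude_pattern):
    """List property names that still need a definition file.

    records          iterable of GeoJSON features
    already_defined  set of names defined elsewhere (the "alternate" folder)
    exclude_pattern  regular expression for names to skip entirely
    """
    exclude = re.compile(exclude_pattern)
    found = set()
    for record in records:
        for key in record.get("properties", {}):
            if exclude.match(key):
                continue
            if key in already_defined:
                continue
            found.add(key)
    return sorted(found)

# Invented sample data for illustration.
records = [
    {"properties": {"wof:name": "Istanbul Airport", "sfomuseum:placetype": "airport"}},
    {"properties": {"wof:placetype": "campus", "misc:note": "scratch value"}},
]
already_defined = {"wof:name", "wof:placetype"}

to_define = undefined_properties(records, already_defined, r"misc:.*")
```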

As of this writing many of the SFO Museum property definition files are incomplete, lacking human-readable descriptions. Those will follow in time. An incomplete record is not ideal but it is better than no record at all. An incomplete record, having announced its presence, can be updated and improved. It is difficult to say the same of a record whose existence isn’t recorded anywhere. Like, for instance, the no-longer missing record for Istanbul Airport or all the records in the sfomuseum-data-flights-* repositories that pointed to the wrong airport and which have since been updated.

Software mentioned in this blog post

There’s a lot of code working behind the scenes to make all of this possible. Here is the list of software packages explicitly referenced in this blog post: