Geotagging at SFO Museum: Protomaps, search and reverse-geocoding

This is a blog post by aaron cope that was published on May 03, 2021. It was tagged golang, geotagging, whosonfirst, geocoding, reverse-geocoding, placeholder and maps.

See more calendars in "The #YearsFlyBy: Airline Calendars" on display, pre-security, in the Aviation Museum. Twitter, June 09, 2017

It’s been a while, a whole year in fact, since I’ve written anything about the geotagging project at SFO Museum. The final post – “part 11” – alluded to in all the other posts still hasn’t been written yet, at least not formally. The argument that I started laying out in parts 1 through 10 and that was meant to be wrapped up in part 11 has instead been addressed in other subsequent posts like Small focused tools and iOS Multi-screen Starter Kit.

Negative: San Francisco International Airport (SFO), cargo operations. Negative. Transfer, SFO Museum Collection. 2011.032.0895.

The argument was also at the center of a talk I delivered at MCN 2020, titled The elephant in the room: Build it or buy it? During that talk I said:

I believe we can’t afford to not tackle the staffing problem because some amount of in-house development is the only way we will make our digital initiatives sustainable beyond any single capital fund-raising cycle.

But there is also a need to change the way we do that in-house, and in-sector, development.

Historically, one of the things that has characterized software projects built to be used by more than a single institution is that they are overly broad. They try to do too much and account for too many customizations necessary to account for too many different institutions. They are too much of everything and not enough of anything.

We need more small, focused tools that don’t try to be all things to all people, all the time.

As a very practical first step to finding a way out of the build-versus-buy hole we’ve dug for ourselves we need to develop a new practice for the way we build the tools we use internally so that they might be split in to smaller pieces that can be shared by as many people as possible externally.

This idea of small, focused tools, of recognizing when and where work can be split in to smaller, shareable pieces, has been at the heart of the work of the geotagging project. Looking back on the earlier posts it’s easy to feel overwhelmed by the number of different moving pieces being described.

The bad news is that, when you look closely, there really are that many moving pieces when it comes to something like geotagging photos. Commercial ventures spend a lot of time hiding those complexities from their customers, and those are valuable efforts, but too often they come with strings attached, usually in the form of vendor lock-in and an over-bundling of features and functionality.

In designing his soup tureen and ladle, Chris Knight stated his intention to “take a breath and move away from [his] confrontational work to make a simple, functional piece of silverware” and “a small comment on the cross-cultural nature of modern society in the millennium…all nationalities have their soup.” Instagram, January 29, 2019

The good news is that nearly all of those moving pieces, from the underlying data to the tools to operate on those data, are within the reach of cultural heritage as low-cost and open-source alternatives to the commercial offerings. We may still need to stitch those pieces together to meet the needs of a specific institution but at least they are within reach now.

That stitching together, and the building and designing of a toolkit of parts which facilitates that stitching together, should be the work of anyone who says the words “digital technologies” and “cultural heritage” right now. It is extra work, for sure, but I don’t see how the cultural heritage sector will escape the corner it’s painted itself into, when it comes to the cost of developing and maintaining its digital initiatives, without doing that extra work.

Negative: San Francisco Airport, 25th anniversary celebration. Negative. Transfer, SFO Museum Collection. 2011.032.0289.

The geotagging application, implemented in the go-www-geotag package, has been designed as a kind of layer-cake with those goals in mind:

  • Something simple - What is the simplest application to complete the act of geotagging, independent of where the data being geotagged comes from or is published to?
  • Something complex - How does that simpler application need to be built to enable more complicated workflows, like allowing for a variety of input and output sources?
  • Something specific – How does the complex application evolve to meet the specific needs and goals of SFO Museum?

This post will discuss three updates to the go-www-geotag package at the “something simple” layer. As it happens these changes don’t actually have anything to do with geotagging photos. These are all features and functionalities that could apply broadly to the requirements of geotagging anything. They are:

  • An updated user-interface and improvements to how search (or “geocoding”) results are displayed. These should be considered as improvements on what was done before rather than the finished project. The changes you see now reflect some of the ongoing fluidity in the requirements and demands necessitated by the rest of the application.
  • Support for the Protomaps mapping libraries. Protomaps, a project by Brandon Liu, is the distillation of about fifteen years’ worth of different efforts related to online mapping into one handy toolkit. It is early days for the project but its potential for the cultural heritage sector is very, very exciting. Brandon’s post Should web maps be centralized services? is definitely worth reading and touches on many of the same issues we’ve discussed about maps on this blog.
  • Support for reverse-geocoding things that have been geotagged. Reverse-geocoding is a subject I’ve been writing about frequently these days and I am pleased to say that those efforts have now been integrated in to the larger geotagging project. This is a concrete example of “smaller, shareable pieces” being stitched together in to a bespoke application.

The go-www-geotag package contains a tool called server that launches a web application for geotagging photos. It can be started like this:

$> bin/server \
	-map-renderer protomaps \
	-protomaps-tile-url file:///usr/local/data/pmtiles/sfo.pmtiles \
	-enable-oembed \
	-oembed-endpoints 'https://millsfield.sfomuseum.org/oembed/?url={url}&format=json' \
	-enable-point-in-polygon \
	-spatial-database-uri 'sqlite://?dsn=/usr/local/data/sfomuseum-architecture.db'
	
2021/04/28 13:14:41 Listening on http://localhost:8080

If you visit http://localhost:8080 in your web browser you’ll see something like this:

In this configuration the server application is a simple three-pane interface for geotagging images.

  • The left-hand panel contains a map and a Leaflet.GeotagPhoto camera control for defining a focal point and a field of view. The Leaflet.GeotagPhoto extension, originally developed by NYPL Labs, is the first “smaller, shareable piece” that comprises the geotagging application. We wrote about it in Geotagging Photos at SFO Museum, Part 1 – Setting the Stage.

  • The middle panel contains controls for loading images and for filtering and selecting reverse-geocoding results (that are updated as the camera’s focal point changes).

  • The right-hand panel contains serialized GeoJSON data representing information about the camera’s focal point and field of view as well as any reverse-geocoding data, if present, encoded in the wof:parent_id and wof:hierarchy properties. These properties are pointers to specific geographic locations encoded as Who’s On First records.
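To make the right-hand panel a little more concrete, here is a minimal Go sketch of reading the two Who’s On First properties back out of a GeoJSON Feature. The sample document is hypothetical: its geometry, its IDs and its overall shape are illustrative assumptions rather than the exact output of the server application, and parseFeature is a name invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Feature is a pared-down GeoJSON Feature. Only the two Who's On First
// properties mentioned above are decoded; everything else is ignored.
type Feature struct {
	Type       string     `json:"type"`
	Properties Properties `json:"properties"`
}

// Properties holds the reverse-geocoding pointers. By Who's On First
// convention wof:parent_id is a single ID and wof:hierarchy is a list of
// dictionaries mapping "{placetype}_id" keys to IDs.
type Properties struct {
	ParentID  int64              `json:"wof:parent_id"`
	Hierarchy []map[string]int64 `json:"wof:hierarchy"`
}

// sample is a hypothetical document. The IDs are for illustration only.
const sample = `{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-122.38, 37.62]},
  "properties": {
    "wof:parent_id": 102527513,
    "wof:hierarchy": [
      {"campus_id": 102527513, "region_id": 85688637, "country_id": 85633793}
    ]
  }
}`

// parseFeature decodes a GeoJSON Feature from raw bytes.
func parseFeature(body []byte) (*Feature, error) {
	var f Feature
	if err := json.Unmarshal(body, &f); err != nil {
		return nil, err
	}
	return &f, nil
}

func main() {
	f, err := parseFeature([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("parent:", f.Properties.ParentID)
	for _, h := range f.Properties.Hierarchy {
		fmt.Println("hierarchy:", h)
	}
}
```

Anything that can read JSON can consume these records, which is part of the appeal of copy-pasting plain GeoJSON as the simplest possible "publishing" step.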

Here’s a reminder of what an earlier version of the user interface looked like:

The server application as configured does not have any way to publish data outside of copy-pasting the raw GeoJSON data in to another document.

Related: Geotagging at SFO Museum, Part 3 – What Is the Simplest Thing?, Geotagging at SFO Museum, Part 6 – Writers, Geotagging at SFO Museum, Part 7 – Custom Writers and Geotagging at SFO Museum, part 9 – Publishing Data.

Walking through the parameters, step-by-step, the first thing being configured is the toolchain for rendering map tiles. In this example we’re using Protomaps.js which renders map tiles using a single “PMTiles” database file, specified in the -protomaps-tile-url flag. These database files can be hosted locally or on a remote file-server including cloud storage services like Amazon’s S3.

	-map-renderer protomaps \
	-protomaps-tile-url file:///usr/local/data/pmtiles/sfo.pmtiles \

The sfo.pmtiles database file contains all the geographic data necessary to render maps for the area around SFO. It is only 3MB in size and doesn’t require any specialized software to produce individual map tiles. These are some of the reasons I am so excited about Protomaps, particularly for the cultural heritage sector. It offers the practical means for actively preserving map data associated with projects and, importantly, a way to preserve historical snapshots of that data. This is especially important to our museum where the physical footprint of the “SFO” in SFO Museum changes every couple of years.
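The reason a single file on a plain file-server is enough is that, as I understand the PMTiles design, a client only ever asks for byte ranges of that file. The following Go sketch shows the server side of that exchange using nothing but the standard library. The rangeHandler and fetchRange names, and the sixteen bytes standing in for a PMTiles database, are made up for this example. This is not how go-http-protomaps is implemented.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// rangeHandler serves an in-memory blob using http.ServeContent, which
// honours HTTP Range headers. Any static file-server that does the same
// should be able to host a single-file tile database.
func rangeHandler(data []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "example.pmtiles", time.Time{}, bytes.NewReader(data))
	})
}

// fetchRange asks the handler for bytes start through end (inclusive) and
// returns the HTTP status code and the bytes that came back.
func fetchRange(h http.Handler, start, end int) (int, []byte) {
	req := httptest.NewRequest(http.MethodGet, "/example.pmtiles", nil)
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.Bytes()
}

func main() {
	// Stand-in data; a real PMTiles file would be read from disk or S3.
	data := []byte("0123456789abcdef")

	status, body := fetchRange(rangeHandler(data), 4, 7)
	fmt.Println(status, string(body))
}
```

A successful partial read comes back as a 206 (Partial Content) response containing only the requested bytes.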

You can use the Protomaps bundle/download tool to create your own, customized, Protomaps database files.

Related: Protomaps: A new way to make maps with OpenStreetMap.

The second set of options will enable controls in the user interface for loading images from a remote OEmbed endpoint. The URI in the -oembed-endpoints flag is where the application will resolve image requests.

	-enable-oembed \
	-oembed-endpoints 'https://millsfield.sfomuseum.org/oembed/?url={url}&format=json' \

Related: Geotagging at SFO Museum, Part 5 – Images
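The {url} placeholder in the -oembed-endpoints value is a template. As a rough sketch, and assuming simple string substitution with a query-escaped value (which may not be exactly what go-www-geotag does internally), expanding it in Go looks like this. The expandEndpoint function and the example.org image URL are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// expandEndpoint replaces the {url} placeholder in an OEmbed endpoint
// template with the query-escaped URL of the thing being requested.
func expandEndpoint(template string, target string) string {
	return strings.ReplaceAll(template, "{url}", url.QueryEscape(target))
}

func main() {
	template := "https://millsfield.sfomuseum.org/oembed/?url={url}&format=json"

	// A made-up URL, used only to show the escaping.
	fmt.Println(expandEndpoint(template, "https://example.org/photos/1?size=b"))
}
```

Escaping matters here because the target URL may carry its own query string, which would otherwise be confused with the parameters of the endpoint itself.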

The final set of flags enables “reverse geocoding” (sometimes called “point-in-polygon”) lookups. These happen whenever you move the camera’s focal point and are performed using the go-whosonfirst-spatial packages, reading data stored in a SQLite database specified by the -spatial-database-uri flag.

	-enable-point-in-polygon \
	-spatial-database-uri 'sqlite://?dsn=/usr/local/data/sfomuseum-architecture.db'
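The value of the -spatial-database-uri flag is a URI whose scheme names the kind of database and whose dsn query parameter says where to find it. Here is a small Go sketch of pulling those two pieces apart with the standard library. The dsnFromURI function is invented for this example and is not part of the go-whosonfirst-spatial packages.

```go
package main

import (
	"fmt"
	"net/url"
)

// dsnFromURI splits a database URI like "sqlite://?dsn=/path/to/file.db"
// in to its scheme and the value of its "dsn" query parameter.
func dsnFromURI(uri string) (string, string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", "", err
	}

	dsn := u.Query().Get("dsn")
	if dsn == "" {
		return "", "", fmt.Errorf("missing dsn parameter in %q", uri)
	}

	return u.Scheme, dsn, nil
}

func main() {
	scheme, dsn, err := dsnFromURI("sqlite://?dsn=/usr/local/data/sfomuseum-architecture.db")
	if err != nil {
		panic(err)
	}
	fmt.Println(scheme, dsn)
}
```

Encoding configuration as a URI keeps the command-line surface small: a different kind of database could be selected by changing the scheme rather than by adding new flags.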

The SQLite databases are created using the wof-sqlite-index-features tool in the go-whosonfirst-sqlite-features-index package, reading data from one or more Who’s On First style repositories of data. For example, this is how the sfomuseum-architecture.db database would be created reading data from the sfomuseum-data-architecture repository:

$> ./bin/wof-sqlite-index-features \
	-all \
	-dsn /usr/local/data/sfomuseum-architecture.db \
	/usr/local/data/sfomuseum-data-architecture/
	
2021/04/22 12:49:41 time to index paths (1) 2.702893585s

As you move the camera around the map the list of places containing the camera’s location is updated in the center panel. When you select one of them, metadata about that place, defined in the SQLite database, is appended to the structured data recorded in the right-hand panel.
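For readers who have not met “point-in-polygon” before, the core test is small enough to sketch in Go. This is the textbook ray-casting approach applied to a single ring. It is not the code in the go-whosonfirst-spatial packages, which have to deal with things this sketch ignores (database lookups, interior rings, multi-polygons). The Place type, the placesContaining function and the rough bounding box standing in for SFO are all hypothetical.

```go
package main

import "fmt"

// Point is a coordinate pair: X is longitude, Y is latitude.
type Point struct{ X, Y float64 }

// Place is a named polygon with a single exterior ring and no holes.
type Place struct {
	Name string
	Ring []Point
}

// contains reports whether p falls inside ring using ray casting: count how
// many edges a horizontal ray starting at p crosses. An odd count is inside.
func contains(ring []Point, p Point) bool {
	inside := false
	j := len(ring) - 1
	for i := 0; i < len(ring); i++ {
		a, b := ring[i], ring[j]
		// Only edges that straddle the ray's latitude can be crossed.
		if (a.Y > p.Y) != (b.Y > p.Y) &&
			p.X < (b.X-a.X)*(p.Y-a.Y)/(b.Y-a.Y)+a.X {
			inside = !inside
		}
		j = i
	}
	return inside
}

// placesContaining returns the names of every place whose ring contains p.
func placesContaining(places []Place, p Point) []string {
	var names []string
	for _, pl := range places {
		if contains(pl.Ring, p) {
			names = append(names, pl.Name)
		}
	}
	return names
}

func main() {
	// A crude rectangle, not the real footprint of the airport.
	places := []Place{{
		Name: "SFO (approximate)",
		Ring: []Point{{-122.40, 37.60}, {-122.35, 37.60}, {-122.35, 37.64}, {-122.40, 37.64}},
	}}

	fmt.Println(placesContaining(places, Point{-122.38, 37.62}))
	fmt.Println(placesContaining(places, Point{-73.99, 40.67}))
}
```

The first lookup falls inside the rectangle and the second, a point in Brooklyn, does not, which is the same distinction the center panel makes every time the camera moves.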

Related: Reverse-Geocoding in Time at SFO Museum

Here’s another example of how the server application might be configured to run:

$> bin/server \
	-map-renderer tangramjs \
	-nextzen-apikey {NEXTZEN_APIKEY} \
	-enable-oembed \
	-oembed-endpoints 'https://millsfield.sfomuseum.org/oembed/?url={url}&format=json' \
	-enable-point-in-polygon \
	-spatial-database-uri 'sqlite://?dsn=/usr/local/data/sfomuseum-architecture.db' \
	-enable-placeholder \
	-placeholder-endpoint http://localhost:3000

Visiting http://localhost:8080 in your web browser, you’ll see something like this:

There are two important differences in the left-hand panel containing the map. The first is that the map tiles look different. That’s because we are using the Tangram.js library to render the map. The map data itself is provided by the Nextzen project.

	-map-renderer tangramjs \
	-nextzen-apikey {NEXTZEN_APIKEY} \

Enabling the use of maps using Tangram and Nextzen is done by assigning the -map-renderer and -nextzen-apikey flags respectively. In order to use Nextzen tiles you’ll need to provide a valid Nextzen developer API key which can be created at https://developers.nextzen.org/.

Related: Maps (and map tiles) at SFO Museum and More recent old maps (and the shapes in the details).

The second difference is that the map has a search box in the upper right-hand corner. Searching for places (also called “geocoding”) is assumed to be handled by the Placeholder search engine. In future releases of the go-www-geotag tool other search engines may also be supported. The Placeholder service is not bundled with the go-www-geotag package so it’s assumed to have been set up and run separately. We’ve written about doing that in the Using the Placeholder Geocoder at SFO Museum blog post.

	-enable-placeholder \
	-placeholder-endpoint http://localhost:3000

Enabling and specifying the Placeholder endpoint are handled by the -enable-placeholder and -placeholder-endpoint flags, respectively. This is what the geocoding interface used to look like:

Related: Using the Placeholder Geocoder at SFO Museum and Geotagging at SFO Museum, Part 4 – Search.
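As a sketch of what talking to the -placeholder-endpoint involves, here is how a search URL might be assembled in Go. The /parser/search path and the text parameter reflect my understanding of the Placeholder HTTP API and should be checked against the version of the service you are running. The placeholderSearchURL function is a name made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// placeholderSearchURL builds a search request URL for a Placeholder
// service. The "/parser/search" path and "text" parameter are assumptions
// about the Placeholder API, not something defined by go-www-geotag.
func placeholderSearchURL(endpoint string, text string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}

	u.Path = "/parser/search"

	q := url.Values{}
	q.Set("text", text)
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	uri, err := placeholderSearchURL("http://localhost:3000", "Gowanus")
	if err != nil {
		panic(err)
	}
	fmt.Println(uri)
}
```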

In this example we are searching for the Gowanus neighbourhood in Brooklyn. In the first example, using Protomaps to render map tiles, geocoding for “Gowanus” and moving the map to New York would yield an empty map since the sfo.pmtiles PMTiles database only contains data for the area around SFO.

In this second example we are using Nextzen map tiles which have global coverage but if you look closely at the screenshot above you’ll see that now the reverse-geocoding (or “point in polygon”) queries don’t yield any results. That’s because the database we are using for reverse-geocoding only contains data from the sfomuseum-data-architecture repository.

There’s a database for geocoding, a database for reverse-geocoding and a database for cartographic representations. Each of these databases is necessary and this is one of the reasons why applications that involve “geography” can be so complicated.

The reverse-geocoding databases are made using the wof-sqlite-index-features tool which we can use to create a new database of architectural elements at SFO and all the neighbourhoods (and microhoods) in the United States. For example:

$> ./bin/wof-sqlite-index-features \
	-timings \
	-all \
	-iterator-uri 'git://?include=properties.wof:placetype=(neighbourhood|microhood)&include=properties.sfomuseum:placetype=.*&include_mode=ANY' \
	-dsn /usr/local/data/us-neighbourhoods-sfo.db \
	https://github.com/whosonfirst-data/whosonfirst-data-admin-us.git \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git

The wof-sqlite-index-features tool does the work of building the database but the -iterator-uri flag specifies where and which data to include in that database. In this example we are specifying that the data will come from one or more Git repositories and that only records with a wof:placetype property of “neighbourhood” or “microhood” or records with any sfomuseum:placetype property should be included:

git:// \
	?include=properties.wof:placetype=(neighbourhood|microhood) \
	&include=properties.sfomuseum:placetype=.* \
	&include_mode=ANY
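To illustrate the semantics of those include filters, here is a short Go sketch that parses the iterator URI and applies the same kind of test to a record’s properties. It is an approximation written for this post and not the code used by the Who’s On First iterator packages: the filter type, the parseFilters and matches functions and the flat map of string properties are simplifying assumptions, as is the choice to require every filter to match when include_mode is anything other than ANY.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// filter pairs a property name with the pattern its value must match.
type filter struct {
	property string
	pattern  *regexp.Regexp
}

// parseFilters extracts every "include" filter and the "include_mode"
// from an iterator URI.
func parseFilters(uri string) ([]filter, string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, "", err
	}
	q := u.Query()

	var filters []filter
	for _, inc := range q["include"] {
		// Each value looks like "properties.{name}={regular expression}".
		path, expr, ok := strings.Cut(inc, "=")
		if !ok {
			return nil, "", fmt.Errorf("invalid include filter %q", inc)
		}
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, "", err
		}
		filters = append(filters, filter{
			property: strings.TrimPrefix(path, "properties."),
			pattern:  re,
		})
	}
	return filters, q.Get("include_mode"), nil
}

// matches reports whether a record should be included. In "ANY" mode one
// matching filter is enough; otherwise (an assumption) all must match.
func matches(props map[string]string, filters []filter, mode string) bool {
	hits := 0
	for _, f := range filters {
		if v, ok := props[f.property]; ok && f.pattern.MatchString(v) {
			hits++
		}
	}
	if mode == "ANY" {
		return hits > 0
	}
	return hits == len(filters)
}

func main() {
	uri := "git://?include=properties.wof:placetype=(neighbourhood|microhood)" +
		"&include=properties.sfomuseum:placetype=.*&include_mode=ANY"

	filters, mode, err := parseFilters(uri)
	if err != nil {
		panic(err)
	}

	fmt.Println(matches(map[string]string{"wof:placetype": "neighbourhood"}, filters, mode))
	fmt.Println(matches(map[string]string{"wof:placetype": "locality"}, filters, mode))
}
```

A neighbourhood record passes, a locality record without any sfomuseum:placetype property does not, and any record carrying an sfomuseum:placetype property passes regardless of its wof:placetype.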

The data themselves are downloaded directly from GitHub:

	https://github.com/whosonfirst-data/whosonfirst-data-admin-us.git \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git

This new database will take a little while to create because the whosonfirst-data-admin-us repository is very large.

11:11:19.951935 [wof-sqlite-index-features] STATUS time to index geojson (1079) : 189.07048ms
11:11:19.952052 [wof-sqlite-index-features] STATUS time to index supersedes (1079) : 94.258935ms
11:11:19.952056 [wof-sqlite-index-features] STATUS time to index spr (1079) : 536.995955ms

...time passes

11:41:20.060021 [wof-sqlite-index-features] STATUS time to index geometry (58231) : 13.182348266s
11:41:20.060025 [wof-sqlite-index-features] STATUS time to index properties (58231) : 35.27526872s
11:41:20.060029 [wof-sqlite-index-features] STATUS time to index all (58231) : 31m0.139303097s
2021/04/29 11:42:09 time to index paths (2) 31m50.004958231s

The exact amount of time it takes to create the database will depend on your computer. The important point is that we’ve created a custom database of openly-licensed geographic places using two different sources, filtering the data from both.

Related: Reverse-Geocoding in Time at SFO Museum and Updating (and reverse-geocoding) GPS EXIF metadata.

Once created we can spin up the server tool again, specifying the new database in the -spatial-database-uri flag:

$> bin/server \
	-map-renderer tangramjs \
	-nextzen-apikey {NEXTZEN_APIKEY} \
	-enable-oembed \
	-oembed-endpoints 'https://millsfield.sfomuseum.org/oembed/?url={url}&format=json' \
	-enable-point-in-polygon \
	-spatial-database-uri 'sqlite://?dsn=/usr/local/data/us-neighbourhoods-sfo.db' \
	-enable-placeholder \
	-placeholder-endpoint http://localhost:3000

Now when we perform our geocoding query for “Gowanus” and the map jumps to Brooklyn there are both map tiles and reverse-geocoding results for that location:

I chose to use Gowanus as an example to highlight that these parts of the geotagging application are designed to be flexible and project-agnostic. SFO Museum doesn’t have anything in its collection related to the neighbourhood of Gowanus but we do have a lot of objects related to a lot of places outside of the airport campus that need to be queried and visualized.

“Terra-Techne” consists of six suspended “tectonic plates,” each representing a different continent. From the departures level, passengers look up to see multiple iconic California landscapes suspended from the underside of each continent. From the mezzanine level, the viewer looks down on a different visual narrative; each continent is topped with a terra cotta “circuit board.” Each circuit board depicts a different design in the evolution of the silicon chip. Instagram, January 24, 2020.

As far as the “small, shareable pieces” go, two of these three new features are standalone pieces of code that can be integrated in to other projects. Support for Protomaps is handled by the go-http-protomaps middleware package.

And support for reverse-geocoding is made possible using the go-whosonfirst-spatial packages.

The improved interface elements for querying the Placeholder geocoding service are still bundled with the go-www-geotag package but will be spun out in to a standalone Leaflet plugin shortly.

An Aircraft Marshaller or signal person is a person who visually communicates directions between the pilot of an aircraft and ground crew in order to direct the aircraft into a particular space or position. Twitter, August 02, 2019

Next steps for the geotagging project will be centered on an investigation of whether there are practical ways to encode or publish geotagging data that don’t require writing data to remote services needing credentialed logins. This includes:

  • Encoding geotagging information back in to the photo being geotagged using the tools described in the Updating (and reverse-geocoding) GPS EXIF metadata post.

  • Maintaining an abstracted “database” of geotagged images inside the web browser or the server application itself that could emit multiple (geotagged) records as a CSV or GeoJSON FeatureCollection file.

  • Adding support for “dragging and dropping” photos in to the application to start the geotagging workflow. Related to this would be changes to allow any image URL to be retrieved for geotagging. Related still would be changes to allow importing images to geotag using a CSV file or some other structured data file.

If some or all of these things are feasible then it will be worth investigating whether the application, inclusive of all its data files, can be bundled and deployed as a standalone and containerized (AWS) Lambda function and made public via an (AWS) API Gateway function, similar to the approach we took with the demonstration applications described in the Reverse-Geocoding in Time at SFO Museum post.

The thinking is that small and bespoke geotagging applications, with bounded geographic requirements yielding small to manageable amounts of (Protomaps) tile and reverse-geocoding data, could be configured to consume CSV files worth of data that are easily exported from a collections management database and then used to produce new CSV files with more data easily imported in to a collections management database.
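None of this exists yet, but the CSV-in, CSV-out idea is simple enough to sketch in Go with the standard library. Everything here is hypothetical: the column names, the coords type, the appendCoords function and the coordinates themselves are assumptions made for illustration, keyed on the first column of the incoming file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// coords is a hypothetical geotagging result for a single record.
type coords struct{ Lat, Lon float64 }

// appendCoords reads CSV data, looks up each row by its first column and
// writes the rows back out with latitude and longitude columns appended.
// Rows without a geotagging result get empty values.
func appendCoords(in string, lookup map[string]coords) (string, error) {
	rows, err := csv.NewReader(strings.NewReader(in)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(rows) == 0 {
		return "", fmt.Errorf("empty CSV input")
	}

	var out strings.Builder
	w := csv.NewWriter(&out)

	if err := w.Write(append(rows[0], "latitude", "longitude")); err != nil {
		return "", err
	}

	for _, row := range rows[1:] {
		lat, lon := "", ""
		if c, ok := lookup[row[0]]; ok {
			lat = strconv.FormatFloat(c.Lat, 'f', 6, 64)
			lon = strconv.FormatFloat(c.Lon, 'f', 6, 64)
		}
		if err := w.Write(append(row, lat, lon)); err != nil {
			return "", err
		}
	}

	w.Flush()
	return out.String(), w.Error()
}

func main() {
	in := "accession_number,title\n2011.032.0895,Cargo operations\n"

	// Placeholder coordinates, not a real geotagging result.
	lookup := map[string]coords{"2011.032.0895": {Lat: 37.62, Lon: -122.38}}

	out, err := appendCoords(in, lookup)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The point is less the code than the shape of the exchange: a file that a collections management database can already export goes in, and the same file with a few extra columns comes out.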

Containerized Lambda functions may contain up to 10GB worth of data which probably isn’t enough for a geotagging application that needs to support global coverage but 10GB is still a lot of room – a lot of world – to work with. These applications won’t be completely self-contained until the Placeholder server can be bundled with everything else but, today, that can still be tomorrow’s problem.

Brochure: TWA (Trans World Airlines), Paris. Paper, ink. Gift of the Paul Edward Fair Estate, SFO Museum Collection. 2006.059.378.