Sweet spots between the extremes

This is a blog post by aaron.cope that was published on November 07, 2018.

time converter: Philippine Air Lines, Collection of SFO Museum, 2002.011.043

This is a technical blog post about map tiles, caching, third-party services, so-called “serverless” computing and sustainability. It’s also about improvements to open-source software for managing all of that stuff.

If you’ve been following this weblog recently you’ve seen that as we add more and more elements related to the collection we’ve also been adding more and more places at the same time. Places out there in the real world where people live and go about their lives. And since every place demands a map that means we’re adding even more and more.. and still more map tiles to the Mills Field website.

In the Maps (and map tiles) at SFO Museum blog post we discussed how we’re using a combination of the openly-licensed Nextzen map tiles, custom rendering software and Amazon Web Services’ (AWS) infrastructure to produce and host the maps you see on this site.

We make a point of caching the custom tiles we derive from the source Nextzen data as they are requested. Fresh tiles, though, can sometimes take a while to render, particularly in dense areas like big cities with a lot of map data to process. A lot of airports are near big cities and, since we care a lot about airports, this is not an abstract problem for us.
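The cache-on-request pattern described above can be sketched roughly as follows. This is a minimal Python illustration and not the go-rasterzen code itself: the `cached_tile` function, the `render` callback and the `{z}/{x}/{y}.png` directory layout are all assumptions made for the example.

```python
import os


def cached_tile(root, z, x, y, render):
    """Return the bytes for a tile, rendering and caching it on first request.

    `render` is a hypothetical callback taking (z, x, y) and returning PNG bytes.
    The {root}/{z}/{x}/{y}.png layout is an assumption, not go-rasterzen's.
    """
    path = os.path.join(root, str(z), str(x), f"{y}.png")

    # Cache hit: serve the stored tile without rendering anything.
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return fh.read()

    # Cache miss: do the (possibly slow) rendering once, then store the result.
    body = render(z, x, y)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(body)
    return body
```

The first request for a tile pays the full rendering cost and every later request reads the stored file, which is why slow-to-render tiles in dense areas are the ones worth preparing ahead of time.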

airmail flight cover: Pan American World Airways, Malaysia, Collection of SFO Museum, 2012.149.2336 a b

To address this issue we’ve added a new command-line tool to the go-rasterzen code that allows us to pre-seed all of the tiles, at all of the different zoom levels, needed to show a place on the map.

We’ve named this tool rasterzen-seed. It can be configured to process tiles locally or to invoke an instance of the rasterd tool configured as an AWS Lambda function, allowing all the heavy lifting to happen on a computer in Amazon’s data centers.

Here is how you would run the tool to pre-render all the tiles between zoom levels 2 and 16, locally on your own computer, for the bounding box containing the larger SFO campus, storing the output in a folder named cache:

$> ./bin/rasterzen-seed -mode extent -extent '-122.405228 37.604481 -122.355044 37.645194' \
	-min-zoom 2 -max-zoom 16 -nextzen-apikey {NEXTZEN_APIKEY} \
	-fs-cache -fs-root cache \
	-seed-worker local -seed-png \
	-timings
	
15:50:58.758870 [rasterzen-seed] STATUS Time to seed tile (16/10489/25361) 948.46835ms
15:50:58.802463 [rasterzen-seed] STATUS Time to seed tile (16/10488/25364) 1.600181332s
... and so on

And here’s how you would do the same thing but off-loading all of the work to an AWS Lambda function:

$> ./bin/rasterzen-seed -mode extent -extent '-122.405228 37.604481 -122.355044 37.645194' \
	-min-zoom 2 -max-zoom 16 -nextzen-apikey {NEXTZEN_APIKEY} \
	-seed-worker lambda -lambda-function {LAMBDA_FUNCTION} \
	-lambda-dsn 'credentials={AWS_CREDENTIALS} region={AWS_REGION}' \
	-fs-cache -fs-root cache \
	-seed-png \
	-timings
	
15:50:58.987440 [rasterzen-seed] STATUS Time to seed tile (16/10484/25364) 4.284314644s
15:50:59.010505 [rasterzen-seed] STATUS Time to seed tile (15/5246/12684) 4.306041647s
15:50:59.010518 [rasterzen-seed] STATUS Time to seed tile (15/5242/12685) 1.222798064s
... and so on
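Whichever worker does the rendering, seeding an extent comes down to listing every tile that covers the bounding box at each zoom level. Here is a minimal Python sketch of that step using the standard Web Mercator ("slippy map") tile formula. It is not taken from go-rasterzen: the function names are hypothetical, and it assumes the extent is given as min-longitude, min-latitude, max-longitude, max-latitude and that tiles are addressed as zoom/x/y.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) Web Mercator tile containing a point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern edge or near the poles stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_extent(min_lon, min_lat, max_lon, max_lat, min_zoom, max_zoom):
    """Yield (zoom, x, y) for every tile covering the extent at every zoom level."""
    for z in range(min_zoom, max_zoom + 1):
        # Tile rows count down from the north, so the southern edge has the larger y.
        x0, y1 = lonlat_to_tile(min_lon, min_lat, z)
        x1, y0 = lonlat_to_tile(max_lon, max_lat, z)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield (z, x, y)


# The SFO campus extent and zoom range used in the examples above.
sfo_tiles = set(
    tiles_for_extent(-122.405228, 37.604481, -122.355044, 37.645194, 2, 16)
)
```

Under those assumptions the tiles named in the log output above, for example 16/10489/25361, fall inside the computed set. The number of tiles roughly quadruples with each additional zoom level, which is where most of the rendering time goes.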

Going forward there are plans to integrate Amazon’s queuing service into the workflow so that even more of the process can be handled independently of local computing resources.

The rasterzen-seed tool can be configured to store tiles in a variety of cache sources but a cache source is not required. If you don’t specify any caching sources (the examples above specify one with the -fs-cache and -fs-root flags) then tiles will be rendered and cached (or not) depending on how your Lambda function is configured, typically to an (AWS) S3 bucket.

negative: San Francisco International Airport (SFO), noise abatement equipment, Collection of SFO Museum, 2011.032.2445

You might do this if you want to serve PNG tiles but don’t want to suffer the sometimes very long wait times generating raster tiles in production or you don’t want to deal with the hoop-jumping around AWS Lambda / API Gateway integrations, particularly when it involves images. Instead you could use rasterzen-seed to pre-render all your tiles ahead of time and serve them as static files from an S3 bucket configured as a website.

Maps are about places and as we’ve discussed in earlier blog posts SFO Museum uses and extends the Who’s On First gazetteer (WOF) to describe the places it cares about. By design, the go-rasterzen code doesn’t know anything about the Who’s On First gazetteer. All it knows about are individual map tiles and geographic extents.

We’ve built a second tool on top of the rasterzen-seed tool for pre-rendering tiles for an index of Who’s On First documents. It’s called… wof-rasterzen-seed.

The wof-rasterzen-seed tool can consume any index supported by the go-whosonfirst-index package but for our purposes the most common index is a Git repository containing Who’s On First records. Here’s how we would seed all the tiles between zoom levels 5 and 8 for every place in the sfomuseum-data-architecture repository:

$> ./bin/wof-rasterzen-seed -nextzen-apikey {NEXTZEN_APIKEY} \
	-seed-worker lambda -lambda-function {LAMBDA_FUNCTION} \
	-lambda-dsn 'credentials={AWS_CREDENTIALS} region={AWS_REGION}' \
	-min-zoom 5 -max-zoom 8 -timings -seed-all \
	-mode repo /usr/local/data/sfomuseum-data-architecture

The only difference from the rasterzen-seed tool is the change in the semantics of the -mode command line flag.

Under the hood wof-rasterzen-seed handles all the work of reading Who’s On First documents and calculating the bounding boxes that are used to determine which map tiles to fetch. For example, wof-rasterzen-seed will generate separate extents for the continental US, Hawaii and Puerto Rico rather than a single bounding box that combines all three places, since that would contain… a lot of water.
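To illustrate why separate extents matter, here is a minimal Python sketch that computes one bounding box per polygon of a GeoJSON geometry instead of a single box around everything. How wof-rasterzen-seed actually derives its extents is not shown in this post, so treat this as an assumption-laden illustration: the `extents_for_geometry` function is hypothetical, only Polygon and MultiPolygon geometries are handled, and interior rings and the antimeridian are ignored.

```python
def extents_for_geometry(geometry):
    """Return one (min_lon, min_lat, max_lon, max_lat) tuple per polygon.

    `geometry` is a GeoJSON-style dict of type Polygon or MultiPolygon.
    This is a hypothetical helper, not part of wof-rasterzen-seed.
    """
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("unsupported geometry type: %s" % geometry["type"])

    extents = []
    for polygon in polygons:
        # The first ring of a GeoJSON polygon is its exterior boundary.
        exterior = polygon[0]
        lons = [pt[0] for pt in exterior]
        lats = [pt[1] for pt in exterior]
        extents.append((min(lons), min(lats), max(lons), max(lats)))
    return extents


# Two small, widely separated squares standing in for a multi-part place.
example = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]],
        [[[10, 10], [11, 10], [11, 12], [10, 12], [10, 10]]],
    ],
}
parts = extents_for_geometry(example)
```

Seeding each entry of `parts` separately covers only the two squares. A single combined box from (0, 0) to (11, 12) would also cover all of the empty space between them.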

architectural drawing: San Francisco Airport and Treasure Island Airport, vicinity map, Collection of SFO Museum, 1999.046.012

We hope you find rasterzen-seed (and wof-rasterzen-seed) as useful as we do. They are tools that allow us to take advantage of third-party services without going “all in” and running the risk of getting trapped.

The map tiles produced by the Nextzen project are born out of the hard experiences of similar past services shutting down or being commercialized so there is little immediate chance of them going away. Likewise it’s almost impossible, in 2018, to imagine Amazon disappearing and the promises and benefits of so-called “cloud” and “serverless” computing are both real and tangible.

Tomorrow always remains a mystery, though. Whether it’s a lapsed domain name renewal taking a service offline, a change in pricing or terms of use making a service unaffordable, or a service simply being sunsetted, building on top of “platforms” is always risky. Risky and often prohibitively expensive, depending on how hard it is, or how long it takes, to mitigate a service you rely on no longer being so reliable.

master plan and report: San Francisco International Airport (SFO), Dreyfuss & Blackford, Quinton Engineers, Ltd., Collection of SFO Museum, 2002.040.002

Pre-caching tiles doesn’t necessarily mean we’ll have a copy of the entire planet if Nextzen closes its doors but it does mean that we should have copies of the tiles that are immediately relevant to our work. Likewise, processing and rendering those tiles locally might take a lot longer than doing the same work in Amazon’s “cloud” but that is very different than not being able to do that work at all because the logic for doing so can’t be untangled from a third party’s infrastructure.

There are sweet spots between all of these extremes and one of the goals with tools like go-rasterzen is to help find them. These are small steps, for sure, but it is forward momentum all the same.