Geotagging at SFO Museum, part 9 – Publishing Data

This is a blog post by aaron cope that was published on May 07, 2020. It was tagged sfo, collection, geotagging, oauth2 and golang.

menu: Air France. Paper, ink. Gift of Adan Wong, SFO Museum Collection. 2006.034.004.

This is the ninth of an 11-part blog post about geotagging photos in the SFO Museum collection. In the last post I wrote:

We’ve since built a third geotagging application that sits on top of the first two (go-www-geotag and go-www-geotag-whosonfirst) that is tailored exclusively to the needs of SFO Museum. Following the naming conventions we’ve adopted to date it is called go-www-geotag-sfomuseum and I’ll write more about it in the future.

This post is about the go-www-geotag-sfomuseum application. It adds support for authenticating users and publishing data to any service that supports the OAuth2 standard. Wikipedia describes OAuth2 as:

[A]n open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites but without giving them the passwords.

This is as good and succinct a definition of OAuth2 as any, but it’s still a little hard to make concrete. So when Wikipedia says:

…as a way for Internet users to grant websites or applications access to their information on other websites but without giving them the passwords.

Think of it as:

…a way for me to grant the go-www-geotag-sfomuseum application access to my stuff on GitHub but without giving the go-www-geotag-sfomuseum application the password I use to log in to github.com.

GitHub though is just one of many services with support for OAuth2, so we’ve written our code to be specific to the standard and not any one implementation.

For the purposes of this post we’re using GitHub to demonstrate our work because we already publish all our open data on GitHub, so it is useful for our geotagging application to be able to write directly to their API. In the future we might update our geotagging application to talk to SFO Museum’s own OAuth2 and API endpoints.

menu translator: TWA (Trans World Airlines). Paper, ink. Gift of Thomas G. Dragges, SFO Museum Collection. 2014.095.205.

Before we get too far in to the details it is important to distinguish a service’s support for OAuth2 from that same service’s application programming interface (API):

  • OAuth2 only concerns itself with the mechanics of authenticating a user, confirming that they’d like to allow another application to do stuff on their behalf and generating the necessary credentials for that application to do so.
  • An API defines the “stuff an application can do” on a user’s behalf and typically requires valid OAuth2 credentials to ensure a request (to do stuff) is legitimate.

For the purposes of the go-www-geotag-sfomuseum application we want to:

  • Register our geotagging application as a valid GitHub OAuth application. The details of this are out of scope for this post but GitHub has excellent documentation on the subject.
  • Authenticate all users using the GitHub OAuth2 endpoint.
  • Retrieve and store a valid GitHub OAuth2 “access token” that can be used to publish geotagging data using the GitHub API endpoint.

The go-www-geotag-sfomuseum application doesn’t know anything about the GitHub API or even that it’s talking to GitHub. It only knows about OAuth2 services (also called providers) and access tokens that it will pass to a writer.Writer instance, as described in part six and part seven of this series, which is expected to handle API requests.

Here is how the different pieces fit together today:

In future releases we’d like to make that relationship tree look like this:

The goal would be to allow other groups who are building their own custom geotagging applications to take advantage of the OAuth2 plumbing we’ve developed for SFO Museum’s geotagging application. It’s a little soon for that and one way to explain why is to demonstrate what’s necessary to start the SFO Museum geotagging application.

It looks something like this:

$> cd go-www-geotag-sfomuseum
$> go build -mod vendor -o bin/server cmd/server/main.go

$> bin/server \
	-nextzen-apikey {NEXTZEN_API_KEY} \
	-enable-placeholder \
	-placeholder-endpoint {PLACEHOLDER_API_KEY} \
	-enable-oembed \
	-oembed-endpoints 'https://millsfield.sfomuseum.org/oembed/?url={url}' \
	-enable-writer \
	-writer-uri 'whosonfirst://?writer={whosonfirst_writer}&reader={whosonfirst_reader}&update=1&source=sfomuseum' \
	-whosonfirst-writer-uri 'githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/' \
	-whosonfirst-reader-uri 'githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/' \
	-enable-oauth2 \
	-oauth2-scopes 'user,repo' \
	-oauth2-client-id "constant://?val={OAUTH2_CLIENT_ID}&decoder=string" \
	-oauth2-client-secret "constant://?val={OAUTH2_SECRET}&decoder=string" \
	-oauth2-cookie-uri "constant://?val=debug&decoder=string" \
	-server-uri 'mkcert://localhost:8080'
	
2020/05/05 11:42:37 Checking whether mkcert is installed. If it is not you may be prompted for your password (in order to install certificate files)
2020/05/05 11:42:40 Listening on https://localhost:8080

That’s a lot to take in but the first thing to notice is that there isn’t really anything that’s inherently SFO Museum specific. The default configurations and bundled functionality (for example the WhosOnFirstGeotagWriter writer described in part seven of this series) reflect the needs of the museum but that’s about it. This is why I am hopeful about eventually decoupling most of the functionality in the go-www-geotag-sfomuseum application in to a standalone go-www-geotag-oauth2 package.

menu: KLM (Royal Dutch Airlines). Paper, ink. Gift of Thomas G. Dragges, SFO Museum Collection. 2014.095.193.

Until then, let’s take a look at the first set of new flags from the example above:

-writer-uri 'whosonfirst://?writer={whosonfirst_writer}&reader={whosonfirst_reader}&update=1&source=sfomuseum' \
-whosonfirst-writer-uri 'githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/' \
-whosonfirst-reader-uri 'githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/'

The -writer-uri flag uses a URI-based syntax to distinguish different writers and query parameters to describe their properties. Sometimes those properties might be other URI-style descriptors. If that’s the case those URI-style properties need to be encoded so they don’t get mistaken for other properties in the root -writer-uri URI.

For example:

-writer-uri 'whosonfirst://?writer=githubapi%3A%2F%2Fsfomuseum-data%2Fsfomuseum-data-collection%3Faccess_token%3D%7Baccess_token%7D%26prefix%3Ddata&reader=githubapi%3A%2F%2Fsfomuseum-data%2Fsfomuseum-data-collection%3Faccess_token%3D%7Baccess_token%7D%26prefix%3Ddata&update=1&source=sfomuseum'

It can be hard to read or write URL-encoded strings and the optional -whosonfirst-writer-uri and -whosonfirst-reader-uri flags are included as a convenience so that you don’t have to.

If they are present and not-empty they will be automatically encoded and used to replace the {whosonfirst_writer} and {whosonfirst_reader} placeholder strings in the value of the -writer-uri flag.
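To make the substitution concrete, here is a minimal sketch of how the two convenience flags might be folded in to the -writer-uri value. The expandWriterURI function is a hypothetical name used for illustration; it is not taken from the go-www-geotag-sfomuseum code and the application may well do this differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// expandWriterURI URL-encodes the reader and writer URIs and substitutes them
// for the {whosonfirst_writer} and {whosonfirst_reader} placeholders.
// Empty values are skipped, leaving their placeholders untouched.
// This is an illustrative helper, not the application's actual function.
func expandWriterURI(writerURI, wofWriter, wofReader string) string {
	if wofWriter != "" {
		writerURI = strings.ReplaceAll(writerURI, "{whosonfirst_writer}", url.QueryEscape(wofWriter))
	}
	if wofReader != "" {
		writerURI = strings.ReplaceAll(writerURI, "{whosonfirst_reader}", url.QueryEscape(wofReader))
	}
	return writerURI
}

func main() {
	tmpl := "whosonfirst://?writer={whosonfirst_writer}&reader={whosonfirst_reader}&update=1&source=sfomuseum"
	gh := "githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/"
	fmt.Println(expandWriterURI(tmpl, gh, gh))
}
```

Note that the {access_token} placeholder inside the nested URIs is encoded along with everything else (as %7Baccess_token%7D), so it survives untouched until it is needed.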

I’ll discuss the githubapi:// URIs in the example above, and their access_token={access_token} placeholder strings, later on in this post.

menu: China Airlines. Paper, ink, fabric. Gift of Thomas G. Dragges, SFO Museum Collection. 2001.080.104.

The second set of flags tells the application to enable support for OAuth2 and to define the scope of that support as user and repo. These are arbitrary values defined by the OAuth2 provider which, in this case, is GitHub, and we are asking that our OAuth2 application be allowed to validate the user and have access to repositories the user controls.

-enable-oauth2 \
-oauth2-scopes 'user,repo'

Speaking of GitHub, you may have noticed that there’s almost nothing in these new flags that indicates we want to use it as an OAuth2 provider. How does the geotagging application know to use GitHub as an OAuth2 provider?

It knows because go-www-geotag-sfomuseum assigns default values for the -oauth2-auth-url and -oauth2-token-url flags in its own application code. Here’s an abbreviated example from the go-www-geotag-sfomuseum/app/flags.go library:

func AssignSFOMuseumFlags(fs *flag.FlagSet) error {
	// Errors returned by fs.Set are ignored here for brevity.
	fs.Set("enable-map-layers", "true")
	fs.Set("oauth2-auth-url", "https://github.com/login/oauth/authorize")
	fs.Set("oauth2-token-url", "https://github.com/login/oauth/access_token")
	return nil
}

This is an example of what I mean when I say the go-www-geotag-sfomuseum application reflects the needs and preferences of the museum. We know that for our purposes we’re only ever going to publish to a single OAuth2 provider so there is no harm in assigning fixed values for these flags.
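For readers who haven’t worked with OAuth2 before, here is a rough sketch of how an authorization URL, a client ID and a list of scopes typically come together in the address a user is sent to. It uses only the standard library and follows the generic OAuth2 authorization request; the authorizeURL function and the state value are assumptions made for illustration and this is not how the go-http-oauth2 package is written.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// authorizeURL builds the URL a user is redirected to in order to authorize
// an application. scopes is a comma-separated list, as in the -oauth2-scopes
// flag, and is converted to the space-delimited form used in OAuth2 requests.
// state is an opaque value the application checks when the user returns.
func authorizeURL(authURL, clientID, scopes, state string) (string, error) {
	u, err := url.Parse(authURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("response_type", "code")
	q.Set("client_id", clientID)
	q.Set("scope", strings.Join(strings.Fields(strings.ReplaceAll(scopes, ",", " ")), " "))
	q.Set("state", state)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The client ID and state values here are placeholders.
	u, err := authorizeURL("https://github.com/login/oauth/authorize", "CLIENT_ID", "user,repo", "random-state")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

Because the authorization URL is just another parameter, nothing in the sketch is specific to GitHub, which is the same property the default flag values above rely on.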

menu: United Air Lines. Paper, ink. Gift of David A. Abercrombie, in memory of Stanley A. Abercrombie, SFO Museum Collection. 2001.039.151.

The third set of flags is where the OAuth2 application credentials are assigned, as well as something called -oauth2-cookie-uri which I’ll cover in a moment.

-oauth2-client-id "constant://?val={OAUTH2_CLIENT_ID}&decoder=string" \
-oauth2-client-secret "constant://?val={OAUTH2_SECRET}&decoder=string" \
-oauth2-cookie-uri "constant://?val=debug&decoder=string"

The value for each of these flags is a URI string whose syntax is defined by the Go Cloud Development Kit (CDK) runtimevar package which describes itself as:

…an easy and portable way to watch runtime configuration variables. Subpackages contain driver implementations of runtimevar for various services, including Cloud and on-prem solutions. You can develop your application locally using filevar or constantvar, then deploy it to multiple Cloud providers with minimal initialization reconfiguration.

Why use runtimevar style URI strings for these flags and not others? First and foremost to distinguish the sensitive nature of these flags from the rest. In the example above the value for each flag is a plain-text string which might be okay for local development. In a production environment we might want to store and retrieve these values using Amazon’s AWS Parameter Store service which gates access to a limited set of roles or applications.

Should the go-www-geotag applications standardize on the use of runtimevar style URI strings internally for all command line flags? Probably and that probably means automatically converting values that don’t have a known URI scheme (or prefix) to use the constant://?val= syntax.
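As a sketch of what that automatic conversion might look like: the ensureRuntimevarURI function below is hypothetical, and the list of schemes it recognizes is an assumption chosen for this example rather than anything the go-www-geotag applications define.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// knownSchemes lists the runtimevar URI schemes that are passed through
// untouched. The list is an assumption made for this sketch.
var knownSchemes = map[string]bool{
	"constant":      true,
	"file":          true,
	"awsparamstore": true,
}

// ensureRuntimevarURI returns value unchanged if it already starts with a
// known scheme and otherwise wraps it in the constant://?val= syntax.
func ensureRuntimevarURI(value string) string {
	if i := strings.Index(value, "://"); i > 0 && knownSchemes[value[:i]] {
		return value
	}
	return "constant://?val=" + url.QueryEscape(value) + "&decoder=string"
}

func main() {
	fmt.Println(ensureRuntimevarURI("debug"))
	fmt.Println(ensureRuntimevarURI("constant://?val=debug&decoder=string"))
}
```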

The details of this effort are a little more complicated than they might seem on the surface and out of scope for this post. The Go CDK also defines a separate abstraction layer around secrets and related services, distinct from “runtime variables”, that bears investigation. For these reasons the use of runtimevar URIs is limited to the flags described above.

menu: Air New Zealand, Business Class. Paper, ink, metal. Gift of Adan Wong, SFO Museum Collection. 2006.034.010.

The fourth and final flag is the -server-uri flag and looks like this:

-server-uri 'mkcert://localhost:8080'

The go-www-geotag applications use the go-http-server package for serving web applications because it offers a simple interface for running applications in different environments. The mkcert:// URI scheme is described as:

A thin wrapper to invoke the mkcert tool to generate locally signed TLS certificate and key files. Once created this implementation will invoke the tls:// scheme with the files create by mkcert.

The documentation for the mkcert tool says that:

Using certificates from real certificate authorities (CAs) for development can be dangerous or impossible (for hosts like example.test, localhost or 127.0.0.1), but self-signed certificates cause trust errors. Managing your own CA is the best solution, but usually involves arcane commands, specialized knowledge and manual steps. mkcert automatically creates and installs a local CA in the system root store, and generates locally-trusted certificates.

If all of this talk about certificates and authorities and localhost is brand new to you the Let’s Encrypt project’s “Certificates for localhost” document is a good place to start.

Using the mkcert:// prefix in the -server-uri flag will automate the work necessary for our geotagging application to run, locally, using an encrypted connection:

2020/05/05 11:42:40 Listening on https://localhost:8080

menu: Finnair, Business Class. Paper, ink, metal. Gift of Ian H. Dally of Auckland, New Zealand, SFO Museum Collection. 2007.086.059.

To understand why we go to all this trouble I need to explain how the OAuth2 integration works in the go-www-geotag-sfomuseum application. OAuth2 support is provided using the go-http-oauth2 package. In broad strokes it exports three web application “handlers”:

  • The first sends a user to the OAuth2 provider to authenticate and authorize their account with the go-www-geotag-sfomuseum application.
  • The second handles authorization responses from the OAuth2 provider and stores that authorization (in the form of an “access token”) in an encrypted cookie on the user’s browser.
  • The third is a “middleware” handler, like those discussed in parts two, three and eight of this series, that ensures the presence and validity of the encrypted access token cookie. If the cookie is missing the user is redirected to the first handler causing them to be sent to the OAuth2 provider to reauthorize access to their account.

Here’s an abbreviated version of what that looks like in code:

import (
	"context"
	"flag"
	"net/http"

	geotag_app "github.com/sfomuseum/go-www-geotag/app"
	sfom_app "github.com/sfomuseum/go-www-geotag-sfomuseum/app"
	oauth2_flags "github.com/sfomuseum/go-http-oauth2/flags"
	oauth2_www "github.com/sfomuseum/go-http-oauth2/www"
)

// The imports for the flags and sfom_api packages, and all error
// handling, are omitted from this abbreviated example.

func main() {

	fs, _ := geotag_app.CommonFlags()
	sfom_app.AppendSFOMuseumFlags(fs)

	flags.Parse(fs)
	sfom_app.AssignSFOMuseumFlags(fs)

	ctx := context.Background()
	oauth2_opts, _ := oauth2_flags.OAuth2OptionsWithFlagSet(ctx, fs)

	// The editor and writer handlers are both wrapped in the middleware
	// that ensures a valid access token cookie is present.
	editor_handler, _ := geotag_app.NewEditorHandler(ctx, fs)
	editor_handler = oauth2_www.EnsureOAuth2TokenCookieHandler(oauth2_opts, editor_handler)

	writer_handler, _ := sfom_api.WriterHandler(...)
	writer_handler, _ = geotag_app.AppendCrumbHandler(ctx, fs, writer_handler)
	writer_handler = oauth2_www.EnsureOAuth2TokenCookieHandler(oauth2_opts, writer_handler)

	// These handlers send the user to the OAuth2 provider and process its response.
	auth_handler := oauth2_www.OAuth2TokenCookieAuthorizeHandler(oauth2_opts)
	token_handler := oauth2_www.OAuth2AccessTokenCookieHandler(oauth2_opts)

	mux := http.NewServeMux()
	mux.Handle("/", editor_handler)
	mux.Handle("/writer", writer_handler)
	mux.Handle("/signin/", auth_handler)
	mux.Handle("/auth/", token_handler)
}
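The internals of the middleware handler aren’t shown above, so here is a minimal sketch of the general pattern using only the standard library. It is not the go-http-oauth2 implementation: ensureCookie, okHandler and probe are hypothetical names, and the sketch only checks that a cookie is present where the real handler also has to decrypt and validate it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ensureCookie wraps next so that requests lacking the named cookie are
// redirected to signinPath instead of reaching next.
func ensureCookie(name, signinPath string, next http.Handler) http.Handler {
	fn := func(rsp http.ResponseWriter, req *http.Request) {
		if _, err := req.Cookie(name); err != nil {
			http.Redirect(rsp, req, signinPath, http.StatusFound)
			return
		}
		next.ServeHTTP(rsp, req)
	}
	return http.HandlerFunc(fn)
}

// okHandler stands in for the editor or writer handler.
func okHandler() http.Handler {
	return http.HandlerFunc(func(rsp http.ResponseWriter, req *http.Request) {
		fmt.Fprint(rsp, "ok")
	})
}

// probe sends a GET request for "/" to h, attaching a cookie called
// cookieName if it is not empty, and returns the status code and the
// Location header of the response.
func probe(h http.Handler, cookieName string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if cookieName != "" {
		req.AddCookie(&http.Cookie{Name: cookieName, Value: "opaque"})
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	h := ensureCookie("t", "/signin/", okHandler())
	fmt.Println(probe(h, ""))  // no cookie: redirected to the sign-in handler
	fmt.Println(probe(h, "t")) // cookie present: the request is passed through
}
```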

Encrypted cookies are handled using the go-http-cookie package, which uses the secretbox and memguard packages for locking and unlocking the OAuth2 access token itself. The -oauth2-cookie-uri flag mentioned above is how you configure the encrypted cookie.

Encrypted cookies are defined using their own URI-style syntax in the form of:

encrypted://?name={NAME}&secret={SECRET}&salt={SALT}

For example:

encrypted://?name=c&secret=MqDptuvnXwyODSaw7NTtiDYsp9KlLKAR&salt=P2eJxT9L5sJAeMxLKaOgIfF89NkDdvoJ

And then finally, because we are using runtimevar URI strings for OAuth2 cookie URIs:

-oauth2-cookie-uri "constant://?val=encrypted%3A%2F%2F%3Fname%3Dc%26secret%3DMqDptuvnXwyODSaw7NTtiDYsp9KlLKAR%26salt%3DP2eJxT9L5sJAeMxLKaOgIfF89NkDdvoJ&decoder=string"

Unlike the -writer-uri flag described above there is still no convenience flag for defining the ?val= parameter for OAuth2 cookie URIs as a separate non-URL-encoded string.

If the value of the OAuth2 cookie URI is the string “debug” then the application will automatically configure an encrypted cookie URI with random values. This can be useful for testing things but remember that you will need to clear your cookies (for localhost:8080) every time you restart the geotagging application because otherwise you won’t be able to decrypt the last cookie that the application set.
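Here is a sketch of what generating such a throwaway cookie URI might involve. The debugCookieURI and randomString functions are hypothetical, and the 32-character lengths simply mirror the example values above; I am not describing the actual requirements of the go-http-cookie package.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/url"
)

// randomString returns n hexadecimal characters (n should be even) read
// from the operating system's cryptographic random source.
func randomString(n int) (string, error) {
	b := make([]byte, n/2)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// debugCookieURI builds an encrypted:// cookie URI with a random secret and
// salt. The values are different every time the function is called, which is
// why cookies set before a restart can no longer be decrypted.
func debugCookieURI(name string) (string, error) {
	secret, err := randomString(32)
	if err != nil {
		return "", err
	}
	salt, err := randomString(32)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("encrypted://?name=%s&secret=%s&salt=%s", url.QueryEscape(name), secret, salt), nil
}

func main() {
	uri, err := debugCookieURI("c")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Wrap the cookie URI in the runtimevar constant:// syntax used by the flag.
	fmt.Println("constant://?val=" + url.QueryEscape(uri) + "&decoder=string")
}
```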

menu: United Air Lines. Paper, ink. Gift of David A. Abercrombie, in memory of Stanley A. Abercrombie, SFO Museum Collection. 2001.039.148.

Should we be storing a user’s OAuth2 access token in a cookie, encrypted or not? Absolutely not if the cookie is unencrypted, definitely not in a production environment and probably not for local development if the local environment is not using an encrypted connection.

In a production environment there are better, more secure, places to store an access token. It is still necessary to store some kind of cookie or session variable to distinguish one user from another but that can be the key which looks up an access token in a database with restricted access. If and when we migrate our geotagging application to a production environment we’ll need to create three new go-http-oauth2 handlers that interact with a more secure storage layer for OAuth2 access tokens.
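To illustrate the idea of a session key that looks up an access token stored somewhere else, here is a small sketch. The TokenStore interface and the in-memory implementation are hypothetical and stand in for a database with restricted access; nothing like this exists in the go-http-oauth2 package today.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoToken is returned when there is no access token for a session key.
var ErrNoToken = errors.New("no access token for session")

// TokenStore is a hypothetical interface for keeping access tokens on the
// server, so that the browser only ever holds an opaque session key.
type TokenStore interface {
	Get(sessionKey string) (string, error)
	Set(sessionKey, accessToken string) error
}

// memoryStore is an in-memory stand-in for a database with restricted access.
type memoryStore struct {
	mu     sync.RWMutex
	tokens map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{tokens: make(map[string]string)}
}

func (s *memoryStore) Get(sessionKey string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	token, ok := s.tokens[sessionKey]
	if !ok {
		return "", ErrNoToken
	}
	return token, nil
}

func (s *memoryStore) Set(sessionKey, accessToken string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tokens[sessionKey] = accessToken
	return nil
}

func main() {
	var store TokenStore = newMemoryStore()
	store.Set("session-123", "example-token")
	fmt.Println(store.Get("session-123"))
	fmt.Println(store.Get("unknown-session"))
}
```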

In the meantime we’ve been able to use the cookie-based storage to demonstrate how the go-http-oauth2 package should work with a web application without knowing the details of the underlying web application. We’ve also been able to develop a geotagging application that publishes data to an OAuth2 endpoint without having to set up, configure and maintain a separate database for OAuth2 access tokens.

Whether or not it’s overkill to require an encrypted connection for locally run web applications is an open question. Given that this is an application which should be deployed using encrypted connections in a production environment it’s useful to be able to mirror production conditions as much as possible locally. There is also the cautionary tale of the Firesheep plugin, first released in 2010, before encrypted connections for websites were the norm:

Firesheep was an extension for the Firefox web browser that used a packet sniffer to intercept unencrypted session cookies from websites such as Facebook and Twitter. The plugin eavesdropped on Wi-Fi communications, listening for session cookies. When it detected a session cookie, the tool used this cookie to obtain the identity belonging to that session. The collected identities (victims) are displayed in a side bar in Firefox. By clicking on a victim’s name, the victim’s session is taken over by the attacker.

In 2020 do you really need to worry about other applications eavesdropping on HTTP traffic for locally run services? Probably and probably not. Probably not because if it’s happening on your computer you might have bigger problems to attend to first. Probably because, from a technical perspective, it’s entirely possible and not very hard to do.

Locally created and signed certificates, like the ones produced by the mkcert tool above, will only go so far in protecting you from these sorts of eavesdropping attacks. It’s why the OAuth2 access token cookie is also encrypted separately and why the geotagging application also employs time-sensitive “crumbs” to mitigate cross-site request forgery attacks.
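For anyone unfamiliar with the idea of a time-sensitive crumb, here is a minimal sketch of one common way to build one: a timestamp signed with an HMAC that is only accepted for a limited period. This is a generic illustration and not the crumb implementation used by the geotagging application. The function names are hypothetical and timestamps are passed in as Unix seconds to keep the example easy to follow.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sign returns a hex-encoded HMAC-SHA256 of the session key and timestamp.
func sign(secret []byte, sessionKey, ts string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(sessionKey + ":" + ts))
	return hex.EncodeToString(mac.Sum(nil))
}

// newCrumb returns "<unix timestamp>:<signature>", binding the crumb to a
// session key and to the time it was created.
func newCrumb(secret []byte, sessionKey string, nowUnix int64) string {
	ts := strconv.FormatInt(nowUnix, 10)
	return ts + ":" + sign(secret, sessionKey, ts)
}

// validCrumb reports whether crumb was signed with secret for sessionKey and
// is no more than ttlSeconds old.
func validCrumb(secret []byte, sessionKey, crumb string, ttlSeconds, nowUnix int64) bool {
	ts, sig, ok := strings.Cut(crumb, ":")
	if !ok {
		return false
	}
	created, err := strconv.ParseInt(ts, 10, 64)
	if err != nil {
		return false
	}
	// hmac.Equal compares in constant time.
	if !hmac.Equal([]byte(sig), []byte(sign(secret, sessionKey, ts))) {
		return false
	}
	age := nowUnix - created
	return age >= 0 && age <= ttlSeconds
}

func main() {
	secret := []byte("example-secret")
	now := time.Now().Unix()
	crumb := newCrumb(secret, "session-123", now)
	fmt.Println(validCrumb(secret, "session-123", crumb, 300, now+60))  // still fresh
	fmt.Println(validCrumb(secret, "session-123", crumb, 300, now+600)) // too old
}
```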

menu: SAS (Scandinavian Airlines System). Paper, ink. Gift of John J. Larish, SFO Museum Collection. 2007.002.024.

In part seven of this series I wrote:

The reader={READER} and writer={WRITER} parameters in the -writer-uri flag are themselves URIs for creating abstract reader and writer implementations specific to the Who’s On First project, the details of which are out of scope for this post. In this example we’re saying “read WOF data from a local directory” and “write WOF data to the console”.

In this post we want, instead, to say “read and write WOF data from the sfomuseum-data-collection GitHub repository using the go-reader-github and go-writer-github packages respectively”.

That’s what the -whosonfirst-writer-uri and -whosonfirst-reader-uri flags, in the earlier examples, signal. Instances of go-reader-github and go-writer-github both require a valid OAuth2 access token when they are created. Since we don’t know the value of the access token when our geotagging application starts we instead represent the access token using a placeholder string that gets swapped out on demand.

For example:

githubapi://sfomuseum-data/sfomuseum-data-collection?access_token={access_token}&prefix=data/

Normally the geotagging application creates a writer.Writer instance, as discussed in parts six and seven of this series, when the application is started. In the go-www-geotag-sfomuseum application we wait until there is a request to publish data before creating a new writer.Writer instance, swapping out the placeholder text with the value of the current OAuth2 access token.

Here’s an abbreviated example in code:

func WriterHandler(wr_uri string) (http.Handler, error) {

	fn := func(rsp http.ResponseWriter, req *http.Request) {

		defer req.Body.Close()

		ctx := req.Context()
		token, _ := oauth2_www.GetOAuth2TokenContext(req)

		wr_u, _ := url.Parse(wr_uri)

		if wr_u.Scheme == "whosonfirst" {

			wr_q := wr_u.Query()

			wr_reader := wr_q.Get("reader")
			wr_writer := wr_q.Get("writer")

			wr_reader, _ = url.QueryUnescape(wr_reader)
			wr_writer, _ = url.QueryUnescape(wr_writer)

			wr_reader = strings.Replace(wr_reader, "{access_token}", token.AccessToken, -1)
			wr_writer = strings.Replace(wr_writer, "{access_token}", token.AccessToken, -1)

			wr_reader = url.QueryEscape(wr_reader)
			wr_writer = url.QueryEscape(wr_writer)

			wr_q.Set("reader", wr_reader)
			wr_q.Set("writer", wr_writer)

			wr_uri = fmt.Sprintf("whosonfirst://?%s", wr_q.Encode())
		}

		wr, _ := writer.NewWriter(ctx, wr_uri)
		
		uid, _ := sanitize.GetString(req, "id")
		geotag_f, _ := geotag.NewGeotagFeatureWithReader(req.Body)
		wr.WriteFeature(ctx, uid, geotag_f)

		return
	}

	h := http.HandlerFunc(fn)
	return h, nil
}

The access token itself is stored in the HTTP request’s context instance, having been assigned by the EnsureOAuth2TokenCookieHandler middleware handler (in an earlier example above). This allows our geotagging application to remain unconcerned with the details of how or where access tokens are stored. This enables us to change the implementation details around those access tokens without changing any code in our geotagging application.
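The pattern of passing a value from a middleware handler to the handler it wraps, by way of the request context, can be sketched with the standard library alone. The names below (withToken, tokenFromRequest, echoToken and serve) are hypothetical and this is not the go-http-oauth2 code; in particular the token here is passed in directly rather than being read from an encrypted cookie.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenKey is an unexported type so that other packages cannot collide with
// or overwrite the context value.
type tokenKey struct{}

// withToken stores token in the request context before calling next. A real
// middleware would first retrieve the token from wherever it is stored.
func withToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(rsp http.ResponseWriter, req *http.Request) {
		ctx := context.WithValue(req.Context(), tokenKey{}, token)
		next.ServeHTTP(rsp, req.WithContext(ctx))
	})
}

// tokenFromRequest returns the token stored in the request context, if any.
func tokenFromRequest(req *http.Request) (string, bool) {
	token, ok := req.Context().Value(tokenKey{}).(string)
	return token, ok
}

// echoToken stands in for the writer handler. It only knows how to ask the
// request for a token, not where the token came from.
func echoToken() http.Handler {
	return http.HandlerFunc(func(rsp http.ResponseWriter, req *http.Request) {
		token, ok := tokenFromRequest(req)
		if !ok {
			http.Error(rsp, "missing token", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(rsp, token)
	})
}

// serve sends a POST request for "/writer" to h and returns the status code
// and body of the response.
func serve(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/writer", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(serve(withToken("example-token", echoToken())))
	code, _ := serve(echoToken())
	fmt.Println(code)
}
```

Swapping the cookie-based storage for something else would only mean changing the middleware; the wrapped handler would carry on calling the same lookup function.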

To wrap things up here are a series of screenshots demonstrating this work, starting with the network traffic for the authentication flow:

Upon loading the application, no encrypted access token cookie is found, so I am redirected first to the /signin page and then on to GitHub. Since I am already logged in to GitHub and have authorized the geotagging application I am sent directly back to the /auth page which finishes up the authorization process, stores an encrypted access token cookie and then redirects me back to the / page where I began.

Here’s a screenshot of me geotagging a photo taken from the roof of the Terminal Building while it was under construction. It looks exactly like the application described in part five of this series:

Here’s a screenshot I took while debugging things to remind myself of the outstanding work we need to do to better trap and report error conditions. For example, a 401 Bad credentials response from the GitHub API should not be interpreted as WRITE OKAY by the geotagging application:

And here’s a screenshot of that geotagging information written out as a Who’s On First document, as described in part seven of this series, and successfully published to GitHub:

And finally, here’s the result of all that work on the Mills Field website with a map of the same vantage point sixty-six years later:

In the next post we’ll discuss some of our efforts to try and shield museum staff from most, if not all, of the complexity that publishing data has introduced.