go-iiif version 2.1

This is a blog post by aaron cope that was published on December 05, 2019. It was tagged aws, golang and iiif.

negative: San Francisco Airport, meteorological equipment, c. 1931, Collection of SFO Museum, 2011.032.0124

Today, I am happy to announce that go-iiif version 2.1.0 has been released (although version 2.1.0 has already been superseded by version 2.1.1 to address a bug). A quick refresher:

IIIF is an acronym for the International Image Interoperability Framework, a project driven by public institutions and private companies in the cultural heritage sector to produce common standards and interfaces (APIs) for accessing and working with collections material.

go-iiif is software written in the Go programming language that implements the IIIF Image API. SFO Museum has been using go-iiif to process the images in its collection. We’ve written about IIIF before in the following blog posts:

Version 2.1 of go-iiif introduces two relatively minor but important features: Custom URI functions and support for fsnotify filesystem notifications.


Custom URIs

negative: San Francisco International Airport (SFO), aerial, 1961, Collection of SFO Museum, 2011.032.0688

go-iiif version 2.1 introduces the notion of a URI “function”, which is an optional piece of user-defined code that converts a string into a go-iiif-uri URI object. The default URI function simply parses a raw string into a URI based on its scheme and the available drivers that are loaded. Custom URI functions allow you to perform additional steps between a raw input string and a final URI.
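To make the idea concrete, here is a minimal sketch of the pattern using only the standard library. It is not the go-iiif-uri code: the URI interface, the URIFunc type, the fileURI driver and the drivers table are all simplified, hypothetical stand-ins. It shows a default function that dispatches on the scheme and a custom function that performs one extra step before delegating to it.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// URI is a simplified, hypothetical stand-in for the go-iiif-uri URI interface.
type URI interface {
	String() string
}

// URIFunc converts a raw string into a URI.
type URIFunc func(raw string) (URI, error)

// fileURI is a hypothetical driver for the "file" scheme.
type fileURI struct{ path string }

func (f fileURI) String() string { return "file://" + f.path }

// drivers maps a scheme to the constructor registered for it.
var drivers = map[string]func(u *url.URL) (URI, error){
	"file": func(u *url.URL) (URI, error) { return fileURI{path: u.Path}, nil },
}

// defaultURIFunc parses raw and picks a driver based on its scheme.
func defaultURIFunc(raw string) (URI, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	newURI, ok := drivers[u.Scheme]
	if !ok {
		return nil, fmt.Errorf("no driver for scheme %q", u.Scheme)
	}
	return newURI(u)
}

// customURIFunc adds one step (treating bare filenames as file:// URIs)
// between the raw input string and the final URI.
func customURIFunc(raw string) (URI, error) {
	if !strings.Contains(raw, "://") {
		raw = "file:///" + strings.TrimLeft(raw, "/")
	}
	return defaultURIFunc(raw)
}

func main() {
	var f URIFunc = customURIFunc
	u, err := f("1511248153_10795.tif")
	fmt.Println(u, err)
}
```

Because both functions share the URIFunc signature, a tool that accepts one can accept the other without knowing which steps happen inside.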

The use of URI strings to encode image paths and specific processing instructions is incredibly powerful. On the other hand they can be problematic as filenames: Some operating systems disallow the use of (URI) schemes as prefixes; others aren’t able to determine a file’s content type if its name contains query parameters; no matter what, they are long and a bit ugly.

SFO Museum processes its images in what is essentially a bucket brigade of specialized tools that are chained together using signals and triggers. While there is nothing AWS-specific about this workflow, we use the S3 service to store images and S3 triggers to invoke Lambda functions for processing those images. It looks something like this:

We need to store files with names like 1511248153_10795.tif that will be interpreted as file:///1511248153_10795.tif but ultimately need to be converted into URIs like idsecret:///1511248153_10795.tif?id=1511248153 before they are handed off to the code that finally manipulates each image.

Here’s an annotated example of our sfom-iiif-process tool which defines a custom URI function for doing just that. Error handling has been removed for the sake of brevity.

We begin with import statements for loading the Go packages we want to use and a main() function that is the entrypoint for our tool:

package main

import (
	"context"
	"errors"
	"fmt"
	iiifuri "github.com/go-iiif/go-iiif-uri"
	_ "github.com/go-iiif/go-iiif/native"
	iiiftools "github.com/go-iiif/go-iiif/tools"
	"github.com/whosonfirst/go-whosonfirst-mimetypes"
	"path/filepath"
	"strconv"
	"strings"
)

func main() {

Next, we define a function for parsing a raw string into a new go-iiif-uri.URI instance. Inside this function the first thing we do is try to parse the raw string into any kind of URI instance before we start working with it.

Assuming it’s something we recognize we then use the value of its Origin() method as the starting point for creating a new URI instance:

	uri_func := func(raw_uri string) (iiifuri.URI, error) {

		u, _ := iiifuri.NewURI(raw_uri)
		origin := u.Origin()

We perform a little bit of extra sanity-checking to ensure this file “smells or at least looks” like an image:

		ext := filepath.Ext(origin)
		ext_ok := false

		for _, t := range mimetypes.TypesByExtension(ext) {

			if strings.HasPrefix(t, "image/") {
				ext_ok = true
				break
			}
		}

		if !ext_ok {
			return nil, errors.New("Not an image")
		}

Then we parse the filename looking for a specific pattern, one that allows us to convert the first part of that pattern into a 64-bit integer ID:

		parts := strings.Split(origin, "_")

		if len(parts) != 2 {
			return nil, errors.New("Invalid URI string")
		}

		str_id := parts[0]
		id, _ := strconv.ParseInt(str_id, 10, 64)

Remember, this is what works for us and doesn’t need to be how you do things. The point of the URI function is to allow for bespoke customization as the circumstances require.

Finally we use the (64-bit integer) ID and the origin filename to create a new IdSecretURI and use it as the response value to our custom URI function:

		str_uri := fmt.Sprintf("%s?id=%d", origin, id)
		str_uri = iiifuri.NewIdSecretURIString(str_uri)

		return iiifuri.NewIdSecretURI(str_uri)
	}

Back in main() we pass our custom function to NewProcessToolWithURIFunc, a new function in this release, which returns a (processing) tool instance that we run as usual:

	tool, _ := iiiftools.NewProcessToolWithURIFunc(uri_func)
	tool.Run(context.Background())
}
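Since error handling was removed from the example above, here is a rough, standard-library-only sketch of the same filename checks with the errors handled. It is an illustration rather than our production code: the idSecretQuery function is a hypothetical name, and the hard-coded imageExtensions table is an assumption standing in for the go-whosonfirst-mimetypes lookup.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// imageExtensions is a hypothetical, hard-coded stand-in for the
// extension-to-mimetype lookup used in the post.
var imageExtensions = map[string]bool{
	".gif": true, ".jpg": true, ".jpeg": true,
	".png": true, ".tif": true, ".tiff": true,
}

// idSecretQuery turns a name like "1511248153_10795.tif" into
// "1511248153_10795.tif?id=1511248153", reporting every failure
// instead of discarding it.
func idSecretQuery(origin string) (string, error) {

	ext := strings.ToLower(filepath.Ext(origin))

	if !imageExtensions[ext] {
		return "", errors.New("Not an image")
	}

	parts := strings.Split(origin, "_")

	if len(parts) != 2 {
		return "", errors.New("Invalid URI string")
	}

	// The first part of the filename must be a 64-bit integer ID.
	id, err := strconv.ParseInt(parts[0], 10, 64)

	if err != nil {
		return "", fmt.Errorf("invalid ID %q: %w", parts[0], err)
	}

	return fmt.Sprintf("%s?id=%d", origin, id), nil
}

func main() {

	names := []string{"1511248153_10795.tif", "process.json", "abc_10795.tif"}

	for _, name := range names {
		str_uri, err := idSecretQuery(name)
		fmt.Println(name, str_uri, err)
	}
}
```

The string returned on success is the same value that the example above passes to iiifuri.NewIdSecretURIString.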


fsnotify, Lambda and “sustainable” triggers

flight information packet: Continental Airlines, c. 1985, Collection of SFO Museum, 2011.126.462

The other addition in go-iiif version 2.1 is the ability to run both the process and tileseed tools in “fsnotify” mode.

fsnotify is the name of a Go package that provides “cross-platform file system notifications” for applications. In a go-iiif context these notifications act as triggers, allowing us to automatically invoke image processing when a file is added to a folder.

Under the hood, when the code receives an fsnotify.Create event for a path it passes that path to the URI function defined for that tool. If the function returns a valid go-iiif-uri URI then it is processed. Here’s an abbreviated version of what’s happening:

if event.Op == fsnotify.Create {

	abs_path, _ := filepath.Abs(event.Name)

	rel_path := strings.Replace(abs_path, root, "", 1)
	rel_path = strings.TrimLeft(rel_path, "/")

	// 't' here is the go-iiif Tool instance returned by
	// iiiftools.NewProcessToolWithURIFunc(uri_func) or
	// iiiftools.NewProcessTool() <-- which uses iiiftools.DefaultURIFunc()

	u, err := t.URIFunc(rel_path)

	if err != nil {
		log.Printf("Failed to parse path '%s' (%s)', %s\n", rel_path, abs_path, err)
		continue
	}

	err = ProcessMany(ctx, process_opts, u)

	if err != nil {
		log.Printf("Failed to process '%s' ('%s'), %s", rel_path, u, err)
		continue
	}
}
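The relative path in the snippet above is derived by removing the root folder from the absolute path. As a sketch of one alternative, and not what go-iiif does, the standard library's filepath.Rel can do the same job while also rejecting paths that fall outside the watched folder. The relativeToRoot function is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relativeToRoot returns path relative to root using forward slashes,
// or an error if path is not inside root.
func relativeToRoot(root string, path string) (string, error) {

	abs_root, err := filepath.Abs(root)

	if err != nil {
		return "", err
	}

	abs_path, err := filepath.Abs(path)

	if err != nil {
		return "", err
	}

	rel_path, err := filepath.Rel(abs_root, abs_path)

	if err != nil {
		return "", err
	}

	// A result that starts with ".." points outside of the root folder.
	if rel_path == ".." || strings.HasPrefix(rel_path, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("'%s' is not inside '%s'", abs_path, abs_root)
	}

	return filepath.ToSlash(rel_path), nil
}

func main() {

	root := "/usr/local/data/media/pending"

	rel_path, err := relativeToRoot(root, "/usr/local/data/media/pending/1511248157_10801.tif")
	fmt.Println(rel_path, err)
}
```

The value returned on success is what would be handed to the tool's URI function.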

Using our sfom-iiif-process tool as an example, all we need to do to enable filesystem notifications is pass in the -mode fsnotify flag at startup. For example:

$> go run -mod vendor cmd/sfom-iiif-process-image/main.go \
	-config-source 'file:///usr/local/sfomuseum/go-sfomuseum-iiif/docs/config' \
	-instructions-source 'file:///usr/local/sfomuseum/go-sfomuseum-iiif/docs/config' \
	-report -report-source 'file:///usr/local/data/media/pending' \
	-mode fsnotify	
	
2019/11/26 13:27:19 Watching '/usr/local/data/media/pending'

Now imagine that we use the Go Cloud blob package to write an image named 1511248157_10801.tif into the /usr/local/data/media/pending folder.

The first thing we see is a temporary file (created by the Go Cloud package), so we can skip it.

2019/11/26 13:27:47 fileblob353713886 file:///fileblob353713886
2019/11/26 13:27:47 Failed to parse path 'fileblob353713886' (/usr/local/data/media/pending/fileblob353713886)', Not an image

Next is a file called 1511248157_10801.tif.attrs. These .attrs files are created by the Go Cloud package and are not images so we can safely skip them too:

2019/11/26 13:27:48 1511248157_10801.tif.attrs file:///1511248157_10801.tif.attrs
2019/11/26 13:27:48 Failed to parse path '1511248157_10801.tif.attrs' (/usr/local/data/media/pending/1511248157_10801.tif.attrs)', Not an image

Hey look, an image! This is the file that will be used to produce the derivative images defined in our (processing) “instructions” file:

2019/11/26 13:27:48 1511248157_10801.tif file:///1511248157_10801.tif

Followed by a whole lot of not-images:

2019/11/26 13:27:50 1511248157.geojson file:///1511248157.geojson
2019/11/26 13:27:50 Failed to parse path '1511248157.geojson' (/usr/local/data/media/pending/1511248157.geojson)', Not an image
2019/11/26 13:27:50 1511248157.geojson.attrs file:///1511248157.geojson.attrs
2019/11/26 13:27:50 Failed to parse path '1511248157.geojson.attrs' (/usr/local/data/media/pending/1511248157.geojson.attrs)', Not an image
2019/11/26 13:27:50 process.json file:///process.json
2019/11/26 13:27:50 Failed to parse path 'process.json' (/usr/local/data/media/pending/process.json)', Not an image
2019/11/26 13:27:50 process.json.attrs file:///process.json.attrs
2019/11/26 13:27:50 Failed to parse path 'process.json.attrs' (/usr/local/data/media/pending/process.json.attrs)', Not an image

The changes to support fsnotify events have also led to the addition of a new -report-source flag.

This flag tells the processing code where to store a machine-readable report detailing which files were created. Those files used to be stored alongside the derivative images but sometimes those derivative images are created in a nested directory tree and the underlying fsnotify package does not currently support watching directories recursively.

You may have noticed, in the diagram above, that files written to the box labeled “reports” trigger a separate process called the sfom-iiif-process-report tool. Nothing about the fsnotify package is go-iiif specific so it can be used with any files. In our use case we use it to watch for newly created processing reports and then use those files to update the metadata associated with our image records.

The processing and tile-seeding tools have long supported a -mode lambda flag which allows them to be invoked as (AWS) Lambda functions. This allows us to process lots of images using Amazon’s servers, using triggers when files are copied to S3, for a fraction of the cost that it would take to do the same work on dedicated servers.
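For readers unfamiliar with S3 triggers: the event handed to a Lambda function is a JSON document listing the objects that changed. The sketch below is not the go-iiif Lambda code. It uses only the standard library to pull the object keys out of such a payload, and the createdKeys function and the sample payload are made up for illustration. Keys arrive URL-encoded in these events, so they are decoded before use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// s3Event models only the fields of an S3 event notification
// that this sketch needs.
type s3Event struct {
	Records []struct {
		EventName string `json:"eventName"`
		S3        struct {
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// createdKeys returns the decoded keys of all newly created objects
// listed in an S3 event payload.
func createdKeys(payload []byte) ([]string, error) {

	var ev s3Event

	err := json.Unmarshal(payload, &ev)

	if err != nil {
		return nil, err
	}

	keys := make([]string, 0, len(ev.Records))

	for _, r := range ev.Records {

		// Skip anything that is not a "file was added" event.
		if !strings.HasPrefix(r.EventName, "ObjectCreated:") {
			continue
		}

		key, err := url.QueryUnescape(r.S3.Object.Key)

		if err != nil {
			return nil, err
		}

		keys = append(keys, key)
	}

	return keys, nil
}

func main() {

	// A made-up payload containing a single newly created object.
	payload := []byte(`{"Records":[{"eventName":"ObjectCreated:Put","s3":{"object":{"key":"1511248157_10801.tif"}}}]}`)

	keys, err := createdKeys(payload)
	fmt.Println(keys, err)
}
```

Each key plays the same role as the relative path in fsnotify mode: it is a raw string that can be passed to a URI function.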

The -mode fsnotify flag makes it possible to develop and test both simple and complex S3-trigger-and-Lambda based workflows locally before deploying them to AWS. This alone makes it a useful feature in the short term.

Longer-term, and importantly, it also means the workflows we develop aren’t inextricably bound to Amazon services. Knowing that we don’t have to use AWS and knowing that there is an alternative avenue for accomplishing the same work in the future, should we ever need it, goes a long way towards making it easier for us to want to use AWS in the present.
