SFO Museum | Mills Field

Asserting claims: Cryptographically signing the digital artifacts we produce

Why is SFO Museum signing the vector embeddings that we’re producing? To provide an additional guarantee that the data we are publishing is, in fact, the data we published. Vector embeddings are, by their nature, basically impossible to “spot check”. Their size, shape and volume lend themselves to subtle, often imperceptible, corruption, whether those changes are introduced by malice or negligence. By providing digital signatures for these embeddings we offer a means to verify that the data have not been tampered with after they’ve been downloaded from our website.

This is a blog post by aaron cope. It was published on June 15, 2026 and tagged golang, embeddings, pgp, c2pa and rustlang.

Shared vector embeddings updates

There is a lot of ground to cover in this blog post: The Met publishing their own vector embeddings, SFO Museum publishing 1152-dimension vector embeddings for its images, SFO Museum producing 1152-dimension vector embeddings for NGA and MoMA, and a whole bunch of updates to the tooling used to generate and query vector embeddings targeting local-first and consumer-grade hardware.

This is a blog post by aaron cope. It was published on May 27, 2026 and tagged roboteyes, machine-learning, collection, duckdb, golang, parquet, bleve, embeddings, aws, s3 and s3vectors.

OEmbeddings - What is the least amount of metadata necessary for shared vector embeddings?

This is a blog post describing a proposal for a set of common attributes to include with shared vector embeddings. These common attributes are meant to be the least amount of metadata necessary to provide a simple preview and suitable attribution for the item (an image or text) for which vector embeddings have been produced.

This is a blog post by aaron cope. It was published on April 15, 2026 and tagged roboteyes, machine-learning, collection, oembeddings, embeddings, golang and parquet.

Shared cross-institutional vector embeddings – how we might get there

We are proposing a simple‑is‑best approach to sharing vector embeddings of our collections, a step that moves us closer to realizing the long‑standing ‘holy grail’ of cross‑institutional collections search through vector‑based image similarity.

This is a blog post by aaron cope. It was published on April 06, 2026 and tagged roboteyes, machine-learning, collection, duckdb, golang, parquet and embeddings.

Updates (and additions) to machine-learning tools running on consumer hardware

These are not “silver bullet” tools. Rather, they endeavour to be part of a set of building blocks for creating an infrastructure that preserves and guarantees the cultural heritage sector some agency in our work.

This is a blog post by aaron cope. It was published on February 10, 2026 and tagged swift, roboteyes, machine-learning, collection, duckdb, golang and embeddings.

Similar object images derived using the MobileCLIP computer-vision models

Like a lot of things involving machine-learning, the image similarity results, while not always right, aren’t necessarily wrong either. In the same vein as searching collections by color, this “fuzzy” and imprecise space presents a whole new avenue for browsing collections and making visible objects that would otherwise get lost in the crowd.

This is a blog post by aaron cope. It was published on January 09, 2026 and tagged swift, roboteyes, machine-learning, collection, mobileclip and embeddings.