In Making a film color DNA I described how I was using both color strips and labelled content to find videos that were duplicates or similar. I’m already using Elasticsearch to disambiguate films using labels and object tracking artefacts identified by the Google Video Intelligence API. The next step is to enhance that similarity matching by using the color strips of a film as fingerprints.
Color strips
Here’s an example of a few films with their color strip underneath them. The length of the strip is normalized: we always end up with the same number of pixel columns, regardless of the length of the film. The sampling rate is a function of the film duration and the target number of samples in a strip. Each column in a strip is the average of the colors of the frames it represents.
Some example color strips
How to make color strips
If the frame rate is 25fps and the film is 1 minute long, there will be 1500 frames in the film. I’ve set a standard target of 256 samples. Each column in the strip will therefore represent about 6 frames. Each column is 3 pixels wide and 75 pixels high. The final strip size will come out at 768 x 75 (or thereabouts, depending on the precise frame rate vs length). This color strip then gives a fairly unique fingerprint for each film.
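As a sanity check on those numbers, the sampling arithmetic can be sketched like this (using the 256-sample target and the 3 x 75 pixel column size quoted above):

```javascript
// Derive color strip dimensions from a film's frame rate and duration.
const TARGET_SAMPLES = 256; // columns per strip
const COLUMN_WIDTH = 3;     // pixels
const COLUMN_HEIGHT = 75;   // pixels

const stripDimensions = ({ fps, durationSeconds }) => {
  const frames = Math.round(fps * durationSeconds);
  return {
    frames,
    framesPerColumn: frames / TARGET_SAMPLES, // ~6 for a 1 minute film at 25fps
    width: TARGET_SAMPLES * COLUMN_WIDTH,     // 768 pixels
    height: COLUMN_HEIGHT                     // 75 pixels
  };
};

console.log(stripDimensions({ fps: 25, durationSeconds: 60 }));
// → { frames: 1500, framesPerColumn: 5.859375, width: 768, height: 75 }
```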
If you want to make your own, you can find a script for generating them in Making a film color DNA.
What are vector embeddings and vector databases
In tandem with the explosion in ML, you’ll also have seen a proliferation of dedicated vector databases, and regular databases being hastily equipped with vector capabilities.
A vector embedding is simply an array of numbers representing an item in a multi-dimensional space. In a 2-dimensional model, for example, you could use income and age as the two dimensions, plot people as points on a graph, and then use a similarity function to find the people most like each other – the ones whose points are closest together.
Of course, in a multi dimensional space it becomes harder to envision how that would look, and this is where the indexing and searching capabilities of vector databases come in.
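To make that 2-dimensional example concrete, here’s a minimal sketch using Euclidean distance as the similarity function (the people and values are invented for illustration):

```javascript
// Each person is a 2-dimensional vector: [income, age].
const people = {
  alice: [52000, 34],
  bob: [51000, 36],
  carol: [95000, 61]
};

// Euclidean (l2) distance between two vectors - smaller means more similar.
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));

// Find the name of the nearest neighbor to a query vector.
const nearest = (query, candidates) =>
  Object.entries(candidates)
    .map(([name, vec]) => ({ name, score: distance(query, vec) }))
    .sort((a, b) => a.score - b.score)[0].name;

console.log(nearest([50000, 35], people)); // → bob
```

Note that the income dimension dwarfs the age dimension here because the raw scales differ wildly, which is exactly why vector values need normalizing before distance comparisons are meaningful.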
Elastic Search
I’m already using Elasticsearch (ES) in my application. ES has mature and highly capable vector indexing and searching built in, so it was a perfect platform for my color strip adventures. ES calls this k-nearest neighbor (kNN) search – which finds the k nearest vectors to a query vector, as measured by a similarity metric.
Generating vectors
The task now is to find a way of searching for duplicate or similar color strips.
Creating vectors
Initially I planned to use the Hugging Face SentenceTransformer models, but I couldn’t get any useful results from them, and they were far too slow for the many thousands of images I needed to process. My use case is really quite simple – create searchable vectors from colors – no object recognition or NLP required.
I experimented a bit (after learning Python which I wasn’t at all familiar with), and found I could create my own vectors, load them to Elastic and search on them. In the end I settled on 3 measures – 2 color spaces and a color difference triangulation from 3 reference colors.
- rgb values
- oklab values
- the distance from 3 reference colors
I had read various articles about using rgb alone; I found it worked well for exact duplicates but was poor at detecting similarity. However, all 3 of these measures, used separately or together, gave good searchable vectors for a variety of use cases.
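As a sketch of the third measure, here’s one way the distance-from-reference-colors idea could work. The reference colors and the plain rgb distance here are my own assumptions for illustration – a proper color difference metric (such as chroma’s deltaE) could be swapped in:

```javascript
// Plain rgb Euclidean distance - a simple stand-in for a
// perceptual color difference metric.
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));

// Three fixed reference colors (assumed - any well-separated trio works).
const REFS = [
  [255, 0, 0], // ref0
  [0, 255, 0], // ref1
  [0, 0, 255]  // ref2
];

// Triangulate a color by its distance from each reference -
// similar colors produce similar triples of distances.
const triangulate = (rgb) => REFS.map((ref) => distance(rgb, ref));

console.log(triangulate([255, 128, 0])); // orange: near red, further from green and blue
```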
Bin sizes
I wanted to match film strips that were from different cuts of the same film. Some may have had slight chunks edited out, or had slightly different colors due to different compression, frame rates and/or encoding schemes.
To balance speed and accuracy I went for 64 bins per color strip (so 4 strip columns per bin) – which ends up at (3 measures x 3 values x 64 bins) = 576 values, organized as 9 vectors of 64 dimensions each.
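The arithmetic behind that figure:

```javascript
// 3 measures (rgb, oklab, reference distances), each with 3 values,
// gives 9 vector fields per film; each field is a 64-dimension embedding.
const MEASURES = 3;
const VALUES_PER_MEASURE = 3;
const BINS = 64;

const fields = MEASURES * VALUES_PER_MEASURE; // 9 vector fields
const totalValues = fields * BINS;            // 576 values in total per strip

console.log(fields, totalValues); // → 9 576
```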
Color extraction
Having done the initial experiments using Python, I switched over to Node, since all the other applications in my project are in Node. I used chroma for color manipulation and get-pixels to extract the color profiles of my color strips. If you are familiar with this site you may already know that I’ve done many articles on chroma, so it was the natural choice.
I’m not going to put all the code in this article, but here’s the main app for creating the vectors and loading them to Elastic so you can get a flavor.
Behind the scenes my API returns every film in my database which has a color strip attached (the actual strip image is on Google Cloud storage). It then creates vectors of each of the 9 fields and bulk loads them into Elastic.
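I’m not reproducing the loader here, but the shape of an Elasticsearch bulk payload is worth sketching – alternating action and document entries, one pair per film. The index name and document shape below are assumptions for illustration:

```javascript
// Build a bulk payload: one action entry plus one document entry per film.
// Each document carries the film's vector fields.
const buildBulkOperations = (films, index = 'film-strips') =>
  films.flatMap((film) => [
    { index: { _index: index, _id: film.id } },
    {
      // each field is a 64-dimension vector embedding
      r: film.vectors.r,
      g: film.vectors.g,
      b: film.vectors.b
      // ...plus the oa, ol, ob, ref0, ref1, ref2 fields
    }
  ]);

const ops = buildBulkOperations([
  { id: 'film-1', vectors: { r: [0.1], g: [0.2], b: [0.3] } }
]);
console.log(ops.length); // → 2 - one action entry plus one document entry
```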
Normalizing color values
Here’s the main code for compressing the image, then bucketizing and normalizing the color space values into bins. For the ‘l2_norm’ similarity metric it is important that we normalize the vector values (which initially have different scales and ranges). For simplicity I’m normalizing to the range 0..1.
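A minimal sketch of that bucketize-and-normalize step (my own simplified version, not the app’s actual code): average the columns that fall into each bin, then min-max scale each field’s 64 values into 0..1 so that l2_norm compares like with like.

```javascript
// Average a strip's column values into a fixed number of bins.
const bucketize = (columns, bins = 64) => {
  const perBin = columns.length / bins;
  return Array.from({ length: bins }, (_, i) => {
    const slice = columns.slice(
      Math.floor(i * perBin),
      Math.floor((i + 1) * perBin)
    );
    return slice.reduce((sum, v) => sum + v, 0) / slice.length;
  });
};

// Min-max normalize a vector into the 0..1 range.
const normalize = (values) => {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (max === min ? 0 : (v - min) / (max - min)));
};

// e.g. 256 red-channel column averages reduced to a 64-value vector
const redColumns = Array.from({ length: 256 }, (_, i) => i);
const vector = normalize(bucketize(redColumns));
console.log(vector.length); // → 64
```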
Vector values
After this exercise, each of the 9 fields has an array of 64 values, each of which defines an attribute of the average color across a time period of the film. These are the vector embeddings that Elasticsearch will use for nearest neighbor matching.
For example, the r field will be an array of values between 0 and 1 which define the percentage ‘redness’ (using the rgb code) of the average color in each bin. A bin is a number of columns from the film strip, and each column is a summary of a slice of time (a collection of frames) from the original film.
Similarly, the ol field will be an array of values between 0 and 1 which represent the luminance of the bin color using the oklab color space. I’ve found this color space to be most effective for expressing the perception of color – see Find nearest matches in color schemes with Apps Script for some color space comparisons.
And so on for the other fields.
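For reference, here’s a minimal sRGB to oklab conversion using Björn Ottosson’s published matrices (in the app itself, a library like chroma can handle this):

```javascript
// Convert an sRGB channel (0..1) to linear light.
const toLinear = (c) =>
  c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;

// sRGB (0..1 per channel) to oklab [L, a, b].
const srgbToOklab = ([r, g, b]) => {
  const [lr, lg, lb] = [r, g, b].map(toLinear);
  const l = Math.cbrt(0.4122214708 * lr + 0.5363325363 * lg + 0.0514459929 * lb);
  const m = Math.cbrt(0.2119034982 * lr + 0.6806995451 * lg + 0.1073969566 * lb);
  const s = Math.cbrt(0.0883024619 * lr + 0.2817188376 * lg + 0.6299787005 * lb);
  return [
    0.2104542553 * l + 0.793617785 * m - 0.0040720468 * s, // L - lightness
    1.9779984951 * l - 2.428592205 * m + 0.4505937099 * s, // a - green/red axis
    0.0259040371 * l + 0.7827717662 * m - 0.808675766 * s  // b - blue/yellow axis
  ];
};

console.log(srgbToOklab([1, 1, 1])); // white: L ~1, a ~0, b ~0
```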
Example vector values
A typical vector array might look something like this – there’d be 9 of these for each color strip.
[0.03413116703044943,0.07524576334039826,0.3792905685405672,0.6504033193065378,0.39155182083516554,0.3900236095338954,.....]
We can visualize these like this. Our example 1 minute film (1500 frames) would have been reduced to 9 arrays of 64 bins, each bin representing about 23 frames – just under 1 second of film.
Elastic search mappings
Here are the 3 x 3 vector fields, grouped by measure. Each of these 9 properties holds a 64-value vector embedding. The “l2_norm” is the similarity metric I’m asking ES to use – essentially the distance between the target (the embedding of the film I’m looking for a duplicate of) and each of the candidates in the ES database.
{
duplicate: ['r', 'g', 'b'],
perception: ['oa', 'ol', 'ob'],
distance: ['ref0', 'ref1', 'ref2']
}
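In Elasticsearch terms, each of those 9 fields is declared as a dense_vector. A sketch of the mapping for one field (the index layout and field naming here follow my reading of the setup above; the same mapping would be repeated for the other 8 fields):

```json
{
  "mappings": {
    "properties": {
      "r": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "similarity": "l2_norm"
      }
    }
  }
}
```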
The search
Once indexed, doing a search on Elasticsearch is straightforward.
- target – contains the vector embeddings of the film for which we want to find potential matches
- args.imageMax – the number of candidates elastic should consider as part of the search corpus
- field – each of the up to 9 fields (r,g,b,ol,oa,ob,ref0,ref1,ref2) to be used for similarity matches
- query_vector – the vector embedding of the target to match the field‘s values against.
- k – the number of candidates to look for (they’ll be sorted according to nearness score)
- size,offset – can be used for paging purposes if you’re dealing with a large k value.
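Putting those parameters together, a kNN search body looks roughly like this (the field and values are illustrative, the 64-value query_vector is truncated to three values, and I’m assuming args.imageMax maps to num_candidates):

```json
{
  "knn": {
    "field": "r",
    "query_vector": [0.034, 0.075, 0.379],
    "k": 10,
    "num_candidates": 100
  },
  "size": 10,
  "from": 0
}
```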
The result
The next step was to enhance the UI so it could do color strip comparisons and find matches. So far I’ve only encoded about 15% of my films; as I bring the rest on, we should see even closer matches.
Finding duplicates
Here’s a duplicate with 80% certainty. Looking at the films in more detail, I can see that although they are the same film, one has been edited and is a few seconds shorter. Bucketizing the strips made it possible to catch duplicates across different edits of the same film.
Comparison with AI labelling
Using the labelling from the Video Intelligence API I also detect the same duplication, but with a little less accuracy. However – the search time using the color strip is just a second or two. The (much more complex) label search method takes about 20 seconds.
Using alternative vector searching
Doing the color search on just the rgb vector values still finds the duplicate – this time with 74% confidence rather than the 80% given by all 3 measures.

Using color perception only (the oklab color space) gives a 91% confidence of a match.
Similar settings
Here’s another example. This time we not only caught an exact duplicate, but also another with the same kitchen setting even though it’s a different film.
Similar looks
Notice how using perception only can identify films that have a similar look, even though their color strips are quite different.
Films from the same family
This time we’ve found 3 duplicates.
But if we widen the confidence level of interest a bit, we notice another film that looks like it might be a duplicate – but it only has 32% confidence. In fact it’s a completely different film, made by the same company to a ‘formula’ as part of a campaign. So we now have another way of grouping films that are part of the same family.
Summary
Delving into the details of vector embeddings has been a great learning experience for me. As a bonus, the end result is a really effective way to find films that are more or less duplicates (helping to clean up my database), and also another route to finding ‘films like this one’.
My films are generally short ads, so the method might not be as effective for longer films with large edits.
In future posts I’ll share some more of the code and conclusions once I’ve loaded a lot more data.
Links
- Making a film color DNA
- Google Video Intelligence API film labelling
- Totally Unscripted special show on film disambiguation with Video Intelligence API
- Color scales, custom schemes and proxies with Apps Script
- Content oriented color mixing with Apps Script
- Find nearest matches in color schemes with Apps Script