In Making a film color DNA I described how I was using both color strips and labelled content to find videos that were duplicates or similar. I’m already using Elasticsearch to disambiguate films using labels and object tracking artefacts identified by the Google Video Intelligence API. The next step is to enhance that similarity matching by using the color strips of a film as fingerprints.
Color strips
Here’s an example of a few films with their color strip underneath them. The length of the strip is normalized: we always end up with the same number of pixel columns, regardless of the length of the film. The sampling rate is a function of the film duration and the target number of samples in a strip. Each column in a strip is the average of the colors of the frames it represents.
Some example color strips
How to make color strips
If the frame rate is 25fps and the film is 1 minute long, there will be 1500 frames in the film. I’ve set a standard target of 256 samples. Each column in the strip will therefore represent about 6 frames. Each column is 3 pixels wide and 75 pixels high. The final strip size will come out at 768 x 75 (or thereabouts, depending on the precise frame rate vs length). This color strip then gives a fairly unique fingerprint for each film.
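As a sanity check on those numbers, the sampling arithmetic can be sketched like this (using the 256-sample target and the 3 x 75 pixel column size quoted above):

```javascript
// Derive color strip dimensions from a film's frame rate and duration.
const TARGET_SAMPLES = 256; // columns per strip
const COLUMN_WIDTH = 3;     // pixels
const COLUMN_HEIGHT = 75;   // pixels

const stripDimensions = ({ fps, durationSeconds }) => {
  const frames = Math.round(fps * durationSeconds);
  return {
    frames,
    framesPerColumn: frames / TARGET_SAMPLES, // ~6 for a 1 minute film at 25fps
    width: TARGET_SAMPLES * COLUMN_WIDTH,     // 768 pixels
    height: COLUMN_HEIGHT                     // 75 pixels
  };
};

console.log(stripDimensions({ fps: 25, durationSeconds: 60 }));
// → { frames: 1500, framesPerColumn: 5.859375, width: 768, height: 75 }
```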
If you want to make your own, you can find a script for generating them in Making a film color DNA.
What are vector embeddings and vector databases
In tandem with the explosion in ML, you’ll also have seen a proliferation of dedicated vector databases, and regular databases being hastily equipped with vector capabilities.
A vector embedding is simply an array of numbers representing an item in a multi-dimensional space. In a 2-dimensional model, for example, you could use income and age as the two dimensions, plot people as points on a graph, and then use a similarity function to find the people most like each other – the ones whose points are closest together.
Of course, in a multi dimensional space it becomes harder to envision how that would look, and this is where the indexing and searching capabilities of vector databases come in.
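To make that 2-dimensional example concrete, here’s a minimal sketch using Euclidean distance as the similarity function (the people and values are invented for illustration):

```javascript
// Each person is a 2-dimensional vector: [income, age].
const people = {
  alice: [52000, 34],
  bob: [51000, 36],
  carol: [95000, 61]
};

// Euclidean (l2) distance between two vectors - smaller means more similar.
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));

// Find the name of the nearest neighbor to a query vector.
const nearest = (query, candidates) =>
  Object.entries(candidates)
    .map(([name, vec]) => ({ name, score: distance(query, vec) }))
    .sort((a, b) => a.score - b.score)[0].name;

console.log(nearest([50000, 35], people)); // → bob
```

Note that the income dimension dwarfs the age dimension here because the raw scales differ wildly, which is exactly why vector values need normalizing before distance comparisons are meaningful.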
Elastic Search
I’m already using Elasticsearch (ES) in my application. ES has mature and highly capable vector indexing and searching built in, so it was a perfect platform for my color strip adventures. ES calls this k-nearest neighbor (kNN) search – which finds the k nearest vectors to a query vector, as measured by a similarity metric.
Generating vectors
The task now is to find a way of searching for duplicate or similar color strips.
Creating vectors
Initially I planned to use the Hugging Face SentenceTransformer models, but I couldn’t get any useful results from them, and they were far too slow for the many thousands of images I needed to process. My use case is really quite simple – create searchable vectors from colors – no object recognition or NLP required.
I experimented a bit (after learning Python which I wasn’t at all familiar with), and found I could create my own vectors, load them to Elastic and search on them. In the end I settled on 3 measures – 2 color spaces and a color difference triangulation from 3 reference colors.
- rgb values
- oklab values
- the distance from 3 reference colors
I had read various articles about using rgb alone; I found it worked well for exact duplicates but was poor at detecting similarity. However, all 3 of these measures, used separately or together, gave good searchable vectors for a variety of use cases.
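As a sketch of the third measure, here’s one way the distance-from-reference-colors idea could work. The reference colors and the plain rgb distance here are my own assumptions for illustration – a proper color difference metric (such as chroma’s deltaE) could be swapped in:

```javascript
// Plain rgb Euclidean distance - a simple stand-in for a
// perceptual color difference metric.
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));

// Three fixed reference colors (assumed - any well-separated trio works).
const REFS = [
  [255, 0, 0], // ref0
  [0, 255, 0], // ref1
  [0, 0, 255]  // ref2
];

// Triangulate a color by its distance from each reference -
// similar colors produce similar triples of distances.
const triangulate = (rgb) => REFS.map((ref) => distance(rgb, ref));

console.log(triangulate([255, 128, 0])); // orange: near red, further from green and blue
```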
Bin sizes
I wanted to match film strips that were from different cuts of the same film. Some may have had slight chunks edited out, or had slightly different colors due to different compression, frame rates and/or encoding schemes.
To balance speed and accuracy I went for 64 bins per color strip (so 4 strip columns per bin) – which ends up at (3 measures x 3 values x 64 bins) = 576 values, organized as 9 vectors of 64 dimensions each.
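The arithmetic behind that figure:

```javascript
// 3 measures (rgb, oklab, reference distances), each with 3 values,
// gives 9 vector fields per film; each field is a 64-dimension embedding.
const MEASURES = 3;
const VALUES_PER_MEASURE = 3;
const BINS = 64;

const fields = MEASURES * VALUES_PER_MEASURE; // 9 vector fields
const totalValues = fields * BINS;            // 576 values in total per strip

console.log(fields, totalValues); // → 9 576
```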
Color extraction
Having done the initial experiments using Python, I switched over to Node, since all the other applications in my project are in Node. I used chroma for color manipulation and get-pixels to extract the color profiles of my color strips. If you are familiar with this site you may already know that I’ve done many articles on chroma, so it was the natural choice.
I’m not going to put all the code in this article, but here’s the main app for creating the vectors and loading them to Elastic so you can get a flavor.
Behind the scenes my API returns every film in my database which has a color strip attached (the actual strip image is on Google Cloud storage). It then creates vectors of each of the 9 fields and bulk loads them into Elastic.
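I’m not reproducing the loader here, but the shape of an Elasticsearch bulk payload is worth sketching – alternating action and document entries, one pair per film. The index name and document shape below are assumptions for illustration:

```javascript
// Build a bulk payload: one action entry plus one document entry per film.
// Each document carries the film's vector fields.
const buildBulkOperations = (films, index = 'film-strips') =>
  films.flatMap((film) => [
    { index: { _index: index, _id: film.id } },
    {
      // each field is a 64-dimension vector embedding
      r: film.vectors.r,
      g: film.vectors.g,
      b: film.vectors.b
      // ...plus the oa, ol, ob, ref0, ref1, ref2 fields
    }
  ]);

const ops = buildBulkOperations([
  { id: 'film-1', vectors: { r: [0.1], g: [0.2], b: [0.3] } }
]);
console.log(ops.length); // → 2 - one action entry plus one document entry
```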
Normalizing color values
Here’s the main code for compressing the image, then bucketizing and normalizing the color space values into bins. For the ‘l2_norm’ similarity metric it is important that we normalize the vector values (which initially have different scales and ranges). For simplicity I’m normalizing to the range 0..1.
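A minimal sketch of that bucketize-and-normalize step (my own simplified version, not the app’s actual code): average the columns that fall into each bin, then min-max scale each field’s 64 values into 0..1 so that l2_norm compares like with like.

```javascript
// Average a strip's column values into a fixed number of bins.
const bucketize = (columns, bins = 64) => {
  const perBin = columns.length / bins;
  return Array.from({ length: bins }, (_, i) => {
    const slice = columns.slice(
      Math.floor(i * perBin),
      Math.floor((i + 1) * perBin)
    );
    return slice.reduce((sum, v) => sum + v, 0) / slice.length;
  });
};

// Min-max normalize a vector into the 0..1 range.
const normalize = (values) => {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (max === min ? 0 : (v - min) / (max - min)));
};

// e.g. 256 red-channel column averages reduced to a 64-value vector
const redColumns = Array.from({ length: 256 }, (_, i) => i);
const vector = normalize(bucketize(redColumns));
console.log(vector.length); // → 64
```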
Vector values
After this exercise, each of the 9 fields has an array of 64 values, each of which defines an attribute of the average color across a time period of the film. These are the vector embeddings that Elasticsearch will use for nearest neighbor matching.
For example, the r field will be an array of values between 0 and 1 which define the percentage ‘redness’ (using the rgb code) of the average color in each bin. A bin is a number of columns from the film strip, and each column is a summary of a slice of time (a collection of frames) from the original film.
Similarly, the ol field will be an array of values between 0 and 1 which represent the luminance of the bin color using the oklab color space. I’ve found this color space to be most effective for expressing the perception of color – see Find nearest matches in color schemes with Apps Script for some color space comparisons.
And so on for the other fields.
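For reference, here’s a minimal sRGB to oklab conversion using Björn Ottosson’s published matrices (in the app itself, a library like chroma can handle this):

```javascript
// Convert an sRGB channel (0..1) to linear light.
const toLinear = (c) =>
  c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;

// sRGB (0..1 per channel) to oklab [L, a, b].
const srgbToOklab = ([r, g, b]) => {
  const [lr, lg, lb] = [r, g, b].map(toLinear);
  const l = Math.cbrt(0.4122214708 * lr + 0.5363325363 * lg + 0.0514459929 * lb);
  const m = Math.cbrt(0.2119034982 * lr + 0.6806995451 * lg + 0.1073969566 * lb);
  const s = Math.cbrt(0.0883024619 * lr + 0.2817188376 * lg + 0.6299787005 * lb);
  return [
    0.2104542553 * l + 0.793617785 * m - 0.0040720468 * s, // L - lightness
    1.9779984951 * l - 2.428592205 * m + 0.4505937099 * s, // a - green/red axis
    0.0259040371 * l + 0.7827717662 * m - 0.808675766 * s  // b - blue/yellow axis
  ];
};

console.log(srgbToOklab([1, 1, 1])); // white: L ~1, a ~0, b ~0
```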
Example vector values
A typical vector array might look something like this – there’d be 9 of these for each color strip.
[0.03413116703044943,0.07524576334039826,0.3792905685405672,0.6504033193065378,0.39155182083516554,0.3900236095338954,.....]
We can visualize these like this. Our example 1 minute film (1500 frames) would have been reduced to 9 arrays of 64 bins, each bin representing about 23 frames – just under 1 second of film.
Elastic search mappings
Here are the 3 x 3 vector fields, grouped by measure. Each of these 9 properties holds a 64-value vector embedding. The “l2_norm” is the similarity metric I’m asking ES to use – essentially the distance between the target (the embedding of the film I’m looking for a duplicate of) and each of the candidates in the ES database.
{
duplicate: ['r', 'g', 'b'],
perception: ['oa', 'ol', 'ob'],
distance: ['ref0', 'ref1', 'ref2']
}
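In Elasticsearch terms, each of those 9 fields is declared as a dense_vector. A sketch of the mapping for one field (the index layout and field naming here follow my reading of the setup above; the same mapping would be repeated for the other 8 fields):

```json
{
  "mappings": {
    "properties": {
      "r": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "similarity": "l2_norm"
      }
    }
  }
}
```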
The search
Once indexed, doing a search on Elasticsearch is straightforward.
- target – contains the vector embeddings of the film for which we want to find potential matches
- args.imageMax – the number of candidates elastic should consider as part of the search corpus
- field – each of the up to 9 fields (r,g,b,ol,oa,ob,ref0,ref1,ref2) to be used for similarity matches
- query_vector – the vector embedding of the target to match the field‘s values against.
- k – the number of candidates to look for (they’ll be sorted according to nearness score)
- size,offset – can be used for paging purposes if you’re dealing with a large k value.
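Putting those parameters together, a kNN search body looks roughly like this (the field and values are illustrative, the 64-value query_vector is truncated to three values, and I’m assuming args.imageMax maps to num_candidates):

```json
{
  "knn": {
    "field": "r",
    "query_vector": [0.034, 0.075, 0.379],
    "k": 10,
    "num_candidates": 100
  },
  "size": 10,
  "from": 0
}
```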
The result
The next step was to enhance the UI so it could do color strip comparisons and find matches. So far I’ve only encoded about 15% of my films; as I bring the rest on, we should see even closer matches.
Finding duplicates
Here’s a duplicate with 80% certainty. Looking at the films in more detail, I can see that although they are the same film, one has been edited and is a few seconds shorter. Bucketizing the strips made it possible to catch duplicates across different edits of the same film.
Comparison with AI labelling
Using the labelling from the Video Intelligence API I also detect the same duplication, but with a little less accuracy. However – the search time using the color strip is just a second or two. The (much more complex) label search method takes about 20 seconds.
Using alternative vector searching
Doing the color search on just the rgb vector values still finds the duplicate – this time with 74% confidence rather than the 80% given by all 3 measures.

Using color perception only (the oklab color space) gives a 91% confidence of a match.
Similar settings
Here’s another example. This time we not only caught an exact duplicate, but also another with the same kitchen setting even though it’s a different film.
Similar looks
Notice how using perception only can identify films that have a similar look, even though their color strips are quite different.
Films from the same family
This time we’ve found 3 duplicates.
But if we widen the confidence level of interest a bit, we notice another film that looks like it might be a duplicate – but it only has 32% confidence. In fact it’s a completely different film, made by the same company to a ‘formula’ as part of a campaign. So we now have another way of grouping films that are part of the same family.
Summary
Delving into the details of vector embeddings has been a great learning experience for me. As a bonus, the end result is a really effective way to find films that are more or less duplicates (helping to clean up my database), and also another route to finding ‘films like this one’.
My films are generally short ads, so the method might not be as effective for longer films with large edits.
In future posts I’ll share some more of the code and conclusions once I’ve loaded a lot more data.
Links
- Making a film color DNA
- Google Video Intelligence API film labelling
- Totally Unscripted special show on film disambiguation with Video Intelligence API
- Color scales, custom schemes and proxies with Apps Script
- Content oriented color mixing with Apps Script
- Find nearest matches in color schemes with Apps Script