Some time ago I described how to use the Video Intelligence (VI) API to create the ‘dna’ of a film by using VI labels. You can then use these fingerprints to find films that are likely to be copies, or similar to other films. We did a Totally Unscripted show a couple of years ago to describe the technique. You can see the video here.

In this post I’ll show a technique for creating a different kind of DNA. It can enhance film disambiguation alongside the content DNA built from VI labels, and it can also find films whose shots look similar. Color is often used to set the atmosphere or feel of a film, so we can find films that share an atmosphere in a more subtle way than simply comparing content. This article is limited to showing how to create color ‘strips’ of a film. I’ll write another later to show how these strips can be compared with each other for similarities.

Content DNA

First though, let’s recap on content dna. Using the VI API we can analyze a film and pick up these kinds of labels.

These can be used to navigate through a film. Here for example, I’ve noticed that we have a frame label of ‘gym’, and clicked on it to jump to the part of the film with a gym. The ‘dna’ of the frame labels is shown on the right, color coded for each label. You can imagine that if we had a duplicate, or a film with similar content, the dna signature would be similar.

For disambiguation, we also use ‘object tracking’. This follows a given item’s movement through the film. For example, if we had 2 films, both with a dog and a balloon making the same relative movement, there’s a good chance they are the same film, especially if backed up by other coincident object tracking matches.

Color strips

Here’s a music video with a color strip dna just below the film, as well as content labelling.

Just as we used the content dna to navigate the film, we can also use the color tracking dna to navigate by clicking on a color in the strip.

I clicked on the yellowish section to get here.

How to make color strips

Here’s a shell script that copies a file from Cloud Storage, and finds the entry in my database by using the md5 of the file. Because I want all the strips to be the same length regardless of the length of the film, I have to normalize the sample rate to take a fixed number of sample images, and average the color of each sampled image down to a single image file. Then we combine the image files into a strip, load it to Cloud Storage, and update my database with the url of the color strip.
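To make the normalization concrete, here is a minimal sketch of the arithmetic. The function name sample_interval is made up for illustration, and it uses awk rather than bc only so that it runs without extra installs. It works out how many seconds apart the samples need to be for a fixed number of images.

```shell
#!/bin/bash
# sample_interval DURATION_SECONDS IMAGES   (hypothetical helper, for illustration only)
# Prints the number of seconds between samples needed to get IMAGES samples
# from a film lasting DURATION_SECONDS.
sample_interval() {
  awk -v s="$1" -v n="$2" 'BEGIN { printf "%.6f\n", s / n }'
}

# a 128 second film sampled 256 times needs one sample every 0.5 seconds
# sample_interval 128 256
# a one hour film sampled 256 times needs one sample every 14.0625 seconds
# sample_interval 3600 256
```

The sampling frame rate is then 1 divided by this interval, so the 128 second film is sampled at 2 frames per second, while the one hour film is sampled roughly once every 14 seconds.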

You’ll need bc and ffmpeg

Here’s the shell script

#!/bin/bash
# film name
FILM=$1
# the api key to access my api to update my database
APIKEY=$2

# some of this may not be relevant for you
BUCKET="gs://MY_BUCKET/"
PFX="MY_DESTINATION_GCS_FOLDER/"
FX="MY_VIDEO_SOURCE_GCS_FOLDER/"
API="https://MY_API_ENDPOINT"


if [ -z "${FILM}" ]; then
echo "Arg 1 should be the film name"
exit 1
fi
echo "...doing film ${FILM}"
if [ -z "${APIKEY}" ]; then
echo "Arg 2 should be the api key"
exit 1
fi

## get the md5 of the film
## eg 0006c7fe42adaa132c21324e121c304c.mp4 = 0006c7fe42adaa132c21324e121c304c
MD5="${FILM%.*}"
echo "...working on md5 ${MD5}"

# this is my small app to pick up the ID in my database that matches the film MD5
# you'll have to replace this however you are going to record the image color strip produced
FILMMASTER=$(node index.mjs -v false -a ${API} -k ${APIKEY} -m ${MD5})
echo "...FILMMASTER ${FILMMASTER}"

## get film from gcs
gsutil cp "${BUCKET}${FX}${FILM}" ./

# get the current frame rate
# ffprobe gets installed with ffmpeg
# avg_frame_rate comes back as a fraction such as 25/1 or 30000/1001, so let bc evaluate it
FPS=$(ffprobe -v error -select_streams v:0 -show_entries stream=avg_frame_rate -of default=noprint_wrappers=1:nokey=1 "${FILM}" | bc -l)

if [ -z "${FPS}" ]; then
echo "Couldn't get the frame rate"
exit 1
fi

# now get the duration in secs
S=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "${FILM}" | bc -l)
if [ -z "${S}" ]; then
echo "Couldn't get the duration"
exit 1
fi

# we want a standard number of images no matter the fps or duration
# so in other words change the frame rate
# FPS now holds the seconds between samples (duration in secs/number of images required)
# ffmpeg is given fps=1/FPS later on to get the new frame rate
IMAGES=256
FPS=$(echo "$S / $IMAGES" | bc -l)

echo "...${FILM} (seconds:${S} new FPS 1/${FPS} for ${IMAGES} images)"
FOLDER="strips/"
STRIP=$(basename "${FILM}")

echo "...result will be in ${FOLDER}${STRIP}"
SCRATCH="./tmp/"
PREFIX="${MD5}"-
EXT=".png"

# make the series of images
# %d is the image sequence number that ffmpeg fills in
BITS="${SCRATCH}${PREFIX}%d${EXT}"
FINAL="${FOLDER}${STRIP}${EXT}"

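# make sure the working folders exist
mkdir -p "${SCRATCH}" "${FOLDER}"
# clean the previous stuff in case there's any around
# the * has to be outside the quotes for the shell to expand it
rm -f -- "${SCRATCH}${PREFIX}"*"${EXT}"
rm -f -- "${FINAL}"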

# this will cut the film into a standard number of images and take the average color
ffmpeg -i "${FILM}" -hide_banner -loglevel error -vf tblend=all_mode=average,scale=1:1,fps=1/${FPS} "${BITS}" < /dev/null

# number of files created
# count them rather than reading the number from the last name, as ls sorts 99 after 256
NFILES=$(ls "${SCRATCH}${PREFIX}"*"${EXT}" | wc -l | tr -d ' ')
echo "...${NFILES} created in ${SCRATCH}"

# get the files created and combine
ffmpeg -i "${BITS}" -hide_banner -loglevel error -filter_complex scale=3:75,tile="${NFILES}x1",avgblur=sizeX=2 -update true "${FINAL}" < /dev/null

# clean tmp files
rm "${SCRATCH}"*
echo "...final result is ${FINAL}"

# now move to gcs
# PFX already ends with a /
DEST="${BUCKET}${PFX}"
gsutil cp "${FINAL}" "${DEST}"

# delete the film
rm "${FILM}"

# do the mutation
# this is my small app to update my database
# you'll have some other way of recording the result location on cloud storage
node index.mjs -v true -a ${API} -k ${APIKEY} -f ${FILMMASTER} -s ${STRIP}${EXT}
strip.sh
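One detail worth spelling out: ffprobe reports avg_frame_rate as a fraction, such as 25/1 or 30000/1001, rather than as a decimal. As a rough sketch, a helper like this could turn it into a decimal. The name rate_to_decimal is hypothetical, and it uses awk instead of bc purely to keep the example self-contained.

```shell
#!/bin/bash
# rate_to_decimal FRACTION   (hypothetical helper, for illustration only)
# Converts an ffprobe style rate such as 30000/1001 to a decimal with 3 places.
# Prints nothing and returns 1 if the denominator is zero (for example 0/0).
rate_to_decimal() {
  awk -v r="$1" 'BEGIN {
    n = split(r, parts, "/")
    # a plain number with no slash is treated as number/1
    den = (n > 1) ? parts[2] + 0 : 1
    if (den == 0) exit 1
    printf "%.3f\n", parts[1] / den
  }'
}

# rate_to_decimal 30000/1001
# rate_to_decimal 25/1
```

For 30000/1001 this prints 29.970, and for 25/1 it prints 25.000.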

Batching

I have many thousands of films to process, so I need a script that compares all the video files I have on GCS with all the image files, and selects those that don’t yet have a color strip prepared. Here’s how:

BUCKET="gs://MY_BUCKET/"
gsutil ls "${BUCKET}MY_VIDEO_FOLDER" > films.lst
sed -E "s/.*\///g" films.lst | sort > vids.lst
gsutil ls "${BUCKET}MY_IMAGE_FOLDER" | sed -E "s/\.png$//" | sed -E "s/.*\///g" | sort > strips.lst
## find vids without strips
## -23 keeps only the lines that are unique to vids.lst
comm -23 vids.lst strips.lst > work.lst
mks.sh – video files without strips
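The comparison can be tried out locally without touching GCS. This is a small sketch, assuming two plain text lists: one of video names, and one of strip names, where each strip is named after its video with .png appended, as strip.sh does. The function name missing_strips is made up for the example. It uses comm -23, which prints only the lines unique to the first list, so a strip with no matching video is not reported.

```shell
#!/bin/bash
# missing_strips VIDEO_LIST STRIP_LIST   (hypothetical helper, for illustration only)
# VIDEO_LIST: one video file name per line, for example abc.mp4
# STRIP_LIST: one strip file name per line, for example abc.mp4.png
# Prints the videos that have no matching strip.
missing_strips() {
  local tmp
  tmp=$(mktemp) || return 1
  # drop the trailing .png so strip names line up with video names
  sed -E 's/\.png$//' "$2" | sort > "$tmp"
  # comm needs sorted input, and -23 keeps only lines unique to the video list
  sort "$1" | comm -23 - "$tmp"
  rm -f -- "$tmp"
}

# missing_strips vids.lst strips.lst > work.lst
```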

Processing the batch

Since there are new films arriving all the time, I usually select out a chunk at a time with something like this

sh mks.sh && head -n 100 work.lst > w.lst

Then, run the strip.sh for the selected chunk

KEY=MY_API_KEY
while IFS= read -r LINE; do
echo "${LINE}"
bash strip.sh "${LINE}" "${KEY}"
done < w.lst
work.sh – process the list of videos
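With thousands of films, some runs are bound to fail, so it can be handy to know which ones did. Here is one possible variation on the loop, as a sketch only. The process_list function is hypothetical; it runs whatever command you give it for each line of the list and reports the lines where that command failed. Stdin is redirected from /dev/null for each run because ffmpeg reads from stdin, and would otherwise swallow the rest of the list.

```shell
#!/bin/bash
# process_list LIST CMD [ARGS...]   (hypothetical helper, for illustration only)
# Runs CMD ARGS LINE for every non empty line in LIST.
# Prints "FAILED <line>" for each run that fails, and returns 1 if any did.
process_list() {
  local list="$1"
  shift
  local line status=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    # keep the command away from the list that the loop is reading
    if ! "$@" "$line" < /dev/null; then
      echo "FAILED ${line}"
      status=1
    fi
  done < "$list"
  return "$status"
}

# process_list w.lst bash strip.sh > failed.lst
```

Note that the extra arguments come before the film name here, so strip.sh would need its api key supplied some other way, or a small wrapper to reorder the arguments.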

A few more

Here are a few more color strips from random ads.