The problem with rate limits

You’ve most likely hit the problem of rate-limited APIs at some point. You make a request and it gets refused because you’ve made too many requests in some time period. Fair enough. There are many techniques for dealing with this (you’ll find various approaches elsewhere on this site), but they usually kick in after the event: you try something, it fails, you wait a bit, you try again, and you repeat until it works. Some APIs have the decency to tell you how long to wait before trying again (Vimeo is a good example of this). However, all this can be a real pain when:

  • The requests you are making are asynchronous and uncontrollable
  • You are using a paid-for API and have to pay for usage even when a request is rejected
  • There are multiple layers of rate limiting, for example a per-minute rate and a monthly limit too, so retries to get around the per-minute rate are added to your monthly usage
  • You are using pubsub, which will publish another attempt just as you are trying to deal with the previous rate-limit failure, so you process that one too, and it all spirals into a recursive set of activities that achieve nothing at all.
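For contrast, here is a minimal sketch of the after-the-event approach: call, fail with a 429, wait, try again. The fake API, its `retryAfterMs` hint and the `callWithRetry` helper are all hypothetical; real APIs report the wait in their own way (often an HTTP `Retry-After` header). Note that every rejected call still counts as a request.

```javascript
// Hypothetical sketch of reactive rate-limit handling: retry on 429,
// honouring a "wait this long" hint when the API provides one.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical API that rejects its first `failuresBeforeSuccess` calls with a 429
function makeFakeApi(failuresBeforeSuccess, retryAfterMs) {
  let calls = 0;
  return {
    get calls() {
      return calls; // every call counts, including the rejected ones
    },
    async request() {
      calls += 1;
      if (calls <= failuresBeforeSuccess) {
        const err = new Error("rate limited");
        err.status = 429;
        err.retryAfterMs = retryAfterMs; // some APIs say how long to wait
        throw err;
      }
      return "ok";
    },
  };
}

// Retry on 429 only; use the API's hint if present, else double a fallback delay
async function callWithRetry(fn, { maxAttempts = 5, baseDelayMs = 10 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxAttempts) throw err;
      await sleep(err.retryAfterMs ?? baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Usage: two rejections, then success
const api = makeFakeApi(2, 5);
callWithRetry(() => api.request()).then((result) => {
  console.log(result, api.calls); // ok 3
});
```

The two rejected calls are exactly the usage you pay for, or burn against a monthly quota, in the scenarios listed above.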

An approach

Being able to queue requests and feed them in at a controlled rate is a great way to avoid rate-limit failures before they happen, rather than dealing with them afterwards. My go-to for all matters asynchronous is Sindre Sorhus, who has some awesome repositories for this kind of thing. In this case, I’m using p-queue as the basis for an asynchronous queuing solution.
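p-queue handles this kind of throttling through its `concurrency`, `interval` and `intervalCap` options. To show the idea without the dependency, here is a simplified, hypothetical `ThrottledQueue` built on fixed time windows. It is a sketch of the technique, not p-queue's implementation, and p-queue's exact timing behaviour may differ.

```javascript
// Simplified, hypothetical stand-in for p-queue-style throttling: at most
// `concurrency` tasks run at once, and at most `intervalCap` tasks start in
// any fixed `interval` (ms) window. Not p-queue's actual implementation.
class ThrottledQueue {
  constructor({ concurrency = 1, interval = 1000, intervalCap = Infinity } = {}) {
    this.concurrency = concurrency;
    this.interval = interval;
    this.intervalCap = intervalCap;
    this.pending = []; // tasks waiting to start
    this.running = 0; // tasks currently in flight
    this.windowStart = 0; // when the current interval window opened
    this.startedInWindow = 0; // tasks started in the current window
    this.timer = null; // wake-up scheduled for when the window closes
  }

  // Queue an async task; the returned promise settles with the task's outcome
  add(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.pending.length > 0 && this.running < this.concurrency) {
      const now = Date.now();
      if (now - this.windowStart >= this.interval) {
        // A new window has opened, so reset the per-interval count
        this.windowStart = now;
        this.startedInWindow = 0;
      }
      if (this.startedInWindow >= this.intervalCap) {
        // Cap reached: come back when the current window closes
        if (!this.timer) {
          this.timer = setTimeout(() => {
            this.timer = null;
            this.drain();
          }, this.interval - (now - this.windowStart));
        }
        return;
      }
      const { task, resolve, reject } = this.pending.shift();
      this.running += 1;
      this.startedInWindow += 1;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running -= 1;
          this.drain();
        });
    }
  }
}

// Usage: one task at a time, at most 2 starts in any 100 ms window
const queue = new ThrottledQueue({ concurrency: 1, interval: 100, intervalCap: 2 });
const t0 = Date.now();
const jobs = [1, 2, 3, 4, 5].map((n) =>
  queue.add(async () => {
    console.log(`job ${n} started at ~${Date.now() - t0} ms`);
    return n * 10;
  })
);
Promise.all(jobs).then((results) => console.log(results)); // [ 10, 20, 30, 40, 50 ]
```

Requests wait in the queue instead of being fired and rejected, so nothing is spent on calls that were always going to fail.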

The problem

My app and its backend system use the Google Video Intelligence API to analyze films uploaded by users, firstly to catalogue their content for searching, and secondly to disambiguate them. A film may be a copy or an edit of a film uploaded by someone else, so I need a way of matching films that are similar to each other to avoid duplicating metadata. An analyzed film has this kind of searchable information
These expand out into details like this, for example
This not only allows direct navigation to points in the film where that content was detected, but also makes the film content searchable, for example
Finally, along with object tracking information, these can form part of the disambiguation process for new films being loaded.

The Video Intelligence API

Here’s the workflow
The Video Intelligence API and this workflow tick all the boxes of a problem API for rate limiting:
  • The processing runs as a long-running API task (a black box) that either works or doesn’t. If it fails with a 429 (per-minute rate limit) error, the run is wasted: you have to start again, and pay again. At over $1 a minute, this can really mount up.
  • Pubsub requests could arrive at any time. If there are multiple failures, the same request may be delivered multiple times, making things even worse.
  • Everything is asynchronous.
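To put a number on that, here is a back-of-envelope sketch. It assumes every failed run is billed in full, as described above, and uses a placeholder price of $1 per minute; the `billedCost` helper and the 90-minute film are illustrative only, not real pricing.

```javascript
// Hypothetical helper: total billed for one film when `failedRuns` runs were
// rejected with a 429 before one finally succeeded. Assumes every run is billed
// in full at `pricePerMinute` (a placeholder, not a real price list).
function billedCost(videoMinutes, failedRuns, pricePerMinute) {
  const runs = failedRuns + 1; // the wasted runs plus the one that succeeds
  return videoMinutes * runs * pricePerMinute;
}

// A 90-minute film that hits the per-minute limit twice before succeeding
console.log(billedCost(90, 2, 1)); // 270
```

Two-thirds of that spend buys nothing, and pubsub redeliveries that trigger extra runs only push `failedRuns` higher.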


Here’s where a queue comes in handy. The p-queue code almost worked straight out of the box, except that I also needed deduplication to discard repeated pubsub requests when pubsub became impatient waiting for a message ack. The characteristics I need are:


  • Limit the number of runs started in a given interval
  • Limit the number of concurrent runs
  • Deduplicate requests already queued up to do the same thing, either because of a failure retry or because the same film was submitted multiple times
  • Add logging


I made a wrapper for p-queue to add the extra stuff I needed. Here’s the code
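As a rough illustration of the deduplication and logging parts, here is a dependency-free sketch. It is not the author's wrapper and does not use p-queue: `DedupQueue`, its `add(digest, action)` method and the `dupIsError` option are hypothetical names, and to stay short it runs one task at a time.

```javascript
// Hypothetical sketch: DedupQueue, add(), dupIsError and the log messages are
// illustrative names, not the author's wrapper or p-queue's API.
class DedupQueue {
  constructor({ name = "queue", dupIsError = false, log = console.log } = {}) {
    this.name = name;
    this.dupIsError = dupIsError; // true: a duplicate is suspicious; false: it is routine
    this.log = log;
    this.inFlight = new Set(); // digests of items queued or running
    this.tail = Promise.resolve(); // end of the serial chain (concurrency 1)
  }

  // Queue action() unless an item with the same digest is already queued or running.
  // Resolves to { skipped: true } for a duplicate, else { skipped: false, result }.
  add(digest, action) {
    if (this.inFlight.has(digest)) {
      const msg = `${this.name}: duplicate ${digest} skipped`;
      this.log(this.dupIsError ? `ERROR ${msg}` : msg);
      return Promise.resolve({ skipped: true });
    }
    this.inFlight.add(digest);
    this.log(`${this.name}: queued ${digest} (${this.inFlight.size} in flight)`);

    const run = this.tail.then(async () => {
      const started = Date.now();
      try {
        const result = await action();
        this.log(`${this.name}: finished ${digest} in ${Date.now() - started} ms`);
        return { skipped: false, result };
      } catch (err) {
        this.log(`${this.name}: failed ${digest}: ${err.message}`);
        throw err;
      } finally {
        this.inFlight.delete(digest); // the same work may be queued again later
      }
    });
    this.tail = run.catch(() => {}); // a failed item must not stall the ones behind it
    return run;
  }
}

// Usage: the second request for the same film and action is skipped
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const viQueue = new DedupQueue({ name: "vi" });
Promise.all([
  viQueue.add("analyze:f123", () => wait(20).then(() => "labels for f123")),
  viQueue.add("analyze:f123", () => wait(20).then(() => "never runs")),
  viQueue.add("analyze:f456", () => wait(20).then(() => "labels for f456")),
]).then((outcomes) => console.log(outcomes.map((o) => o.skipped))); // [ false, true, false ]
```

Releasing the digest when an item finishes (or fails) means a genuine retry later on is accepted, while copies that arrive while the work is still pending are dropped.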

Of note are the use of a digest to identify each queue insertion so that duplicates can be detected, and the option to treat a duplicate either as a cause for concern or as part of normal operation. In the VI processor itself, it’s a straightforward asynchronous queue with a concurrency of one: when one run finishes, the next starts.
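The digest only has to be stable for the same work on the same film. Here is a minimal sketch using Node's built-in `crypto` module, assuming a hypothetical request shape of `{ filmId, action, features }`; the feature names are Video Intelligence feature identifiers used purely as example values. Sorting the features means listing them in a different order doesn't produce a different digest.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical request shape: { filmId, action, features }.
// Features are sorted so their order does not change the digest.
function requestDigest({ filmId, action, features = [] }) {
  const canonical = JSON.stringify([action, filmId, [...features].sort()]);
  return createHash("sha256").update(canonical).digest("hex");
}

const a = requestDigest({ filmId: "f123", action: "analyze", features: ["LABEL_DETECTION", "SHOT_CHANGE_DETECTION"] });
const b = requestDigest({ action: "analyze", filmId: "f123", features: ["SHOT_CHANGE_DETECTION", "LABEL_DETECTION"] });
const c = requestDigest({ filmId: "f456", action: "analyze", features: ["LABEL_DETECTION"] });
console.log(a === b, a === c); // true false
```

Building the canonical form from explicit fields, rather than hashing the raw message, keeps incidental differences (property order, extra metadata) from defeating duplicate detection.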

Initialize it like this

Add items to the queue like this, passing a digest that uniquely identifies what this request is doing with which film, to be used as a duplicate detector. When the queue item (action()) is finally resolved, the pubsub message is acked (according to the returned consume property), and it’s all over. If the item is skipped, it means it’s already in the queue, so we don’t want to tell pubsub to stop sending messages in case the queued version subsequently fails.
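The ack decision can be sketched with the queue and the message stubbed out. `handleMessage`, `queueAdd` and `analyze` are hypothetical names; `message.data` and `message.ack()` mirror the shape of a Pub/Sub client message, but everything else here is an assumption for illustration.

```javascript
// Hypothetical sketch: queueAdd(digest, action) is assumed to resolve to
// { skipped: true } for a duplicate, or { skipped: false, result } where
// result is whatever action() returned (e.g. { consume: true }).
async function handleMessage(message, queueAdd, analyze) {
  const request = JSON.parse(message.data.toString());
  const digest = `${request.action}:${request.filmId}`; // what is being done to which film
  const outcome = await queueAdd(digest, () => analyze(request));

  if (outcome.skipped) {
    // Already queued: don't ack, in case the queued copy fails and needs redelivery
    return "skipped";
  }
  if (outcome.result && outcome.result.consume) {
    message.ack(); // done with this film, so stop redelivery
    return "acked";
  }
  return "left for redelivery"; // not acked, so the message can be delivered again
}

// In-memory stand-ins so the sketch runs without Pub/Sub or the Video Intelligence API
const inFlight = new Set();
async function stubQueueAdd(digest, action) {
  if (inFlight.has(digest)) return { skipped: true };
  inFlight.add(digest);
  try {
    return { skipped: false, result: await action() };
  } finally {
    inFlight.delete(digest);
  }
}
const fakeMessage = (body) => ({
  data: Buffer.from(JSON.stringify(body)),
  acked: false,
  ack() {
    this.acked = true;
  },
});
const pretendAnalyze = async () => ({ consume: true }); // pretend the analysis succeeded

// A message and its impatient redelivery arrive together
const first = fakeMessage({ action: "analyze", filmId: "f123" });
const redelivered = fakeMessage({ action: "analyze", filmId: "f123" });
Promise.all([
  handleMessage(first, stubQueueAdd, pretendAnalyze),
  handleMessage(redelivered, stubQueueAdd, pretendAnalyze),
]).then((states) => console.log(states, first.acked, redelivered.acked));
// [ 'acked', 'skipped' ] true false
```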

Bulk processing

Normally only the occasional film needs processing, but you may want to run an operation that analyzes thousands of films. In this case, we don’t want to leave duplicates and queueing to the processor, because pubsub will be going crazy waiting for its messages to be acked while they all sit in the queue waiting to be processed one by one. Instead, we need a queue for the queue, which only provokes an analysis request according to some schedule. We can use the same pq module to accomplish this.

This time we want to provoke at most ‘intervalCap’ requests in any ‘interval’. In my case, this turns out to be 1 every 120 seconds.
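With an interval cap, the time a bulk run takes is easy to estimate. A back-of-envelope sketch, assuming windows run back to back and ignoring processing time: the k-th item (counting from 0) starts no earlier than floor(k / intervalCap) × interval. The `startOffsetSeconds` helper and the 1,000-film figure are illustrative only.

```javascript
// Hypothetical helper: earliest start offset (seconds) of the k-th item (0-based)
// when at most `intervalCap` items may start per `intervalSeconds` window.
function startOffsetSeconds(k, intervalCap, intervalSeconds) {
  return Math.floor(k / intervalCap) * intervalSeconds;
}

// At 1 film every 120 seconds, the last of 1,000 films is submitted after:
const last = startOffsetSeconds(999, 1, 120);
console.log(last, (last / 3600).toFixed(1)); // 119880 33.3  (seconds, hours)
```

So a cap of 1 every 120 seconds turns a thousand-film job into well over a day of submissions, which is exactly why it is worth running off the desktop.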

The task this time is to send a request to pubsub to process another film. The processor will either do it right away or add it to the queue, but because the bulk updater is itself throttling requests, the processor receives them in an orderly enough way to deal with them tidily.
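A minimal sketch of that queue-for-a-queue submitter, assuming a hypothetical `publish` function that sends one pubsub message to the processor (it stands in for, and is not, the Pub/Sub client API). The defaults mirror 1 every 120 seconds; the usage shortens the interval so it finishes quickly.

```javascript
// Hypothetical sketch: submitInBatches and publish are illustrative names;
// publish stands in for whatever sends a pubsub message to the processor.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Publish requests for `filmIds`, at most `intervalCap` at a time, pausing
// `intervalMs` between batches (defaults mirror "1 every 120 seconds").
async function submitInBatches(filmIds, publish, { intervalCap = 1, intervalMs = 120000 } = {}) {
  for (let i = 0; i < filmIds.length; i += intervalCap) {
    const batch = filmIds.slice(i, i + intervalCap);
    await Promise.all(batch.map((filmId) => publish({ action: "analyze", filmId })));
    // Pause before the next batch, but not after the last one
    if (i + intervalCap < filmIds.length) await sleep(intervalMs);
  }
}

// Usage with a shortened interval and a fake publisher that just records timings
const t0 = Date.now();
const sent = [];
submitInBatches(
  ["f1", "f2", "f3"],
  async (msg) => {
    sent.push(`${msg.filmId} at ~${Date.now() - t0} ms`);
  },
  { intervalCap: 1, intervalMs: 50 }
).then(() => console.log(sent));
```

The pause starts after each batch's publishes resolve, so the real gap is the interval plus publish latency; for a bulk job that is harmless, and it errs on the side of staying under the limit.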

Running bulk

Just to finish this topic: all the stages, including my backend database and GraphQL API, run as Kubernetes deployments, but the bulk processing submitter itself can also run as a Kubernetes job, which gets it off your desktop. My bulk processor is a Node app, so I can create an image, push it to the container registry and kick it off with:

And that’s it – handling asynchronous tasks and avoiding rate limits before they happen.

