Dealing with rate limited services

In Running GmailApp in parallel we found that throwing additional parallel executors at a rate limited service doesn't help much: the service will only let you do so much at the same time. However, Running things in parallel using HTML service is a sequential scheduler as well as a parallel orchestrator, so we can split the job up and run sequential sets of parallel threads to eventually get the job done. Head and tail janitor processes carry forward the work that has already been done, or still needs to be done, to a reduction process ready for the next set. In this way we get through all the work, as long as we keep each thread below 6 minutes.


As usual we'll create profiles and some new .map executors for this.

The profiles

The first task is to get all the threads matching the search text in your email. 
function logEmailsProfileSets() {
  var profile = [];
  
  // get the matching threads
  profile.push( [
    {
      "name": "GET THREADS",
      "functionName": "getTheThreads",
      "skip": false,
      "debug":false,
      "options": {
        "searchText": "The Excel Liberation forum has moved to a Google+ community"
      }
    }
  ]);

Next, a reduction to bring all the results together. As usual we'll use the common function reduceTheResults(), which all the examples use for this purpose. The reason for a reduction is that we might later split the getTheThreads() function into parallel execution tasks, and we'd need to combine the results.
  // next reduce the messages to one
  var profileReduction = [];
  profileReduction.push({
    "name": "reduction",
    "functionName":"reduceTheResults",
    "options":{
    }
  });


Now we need to split the threads into some number of sets, and within each set, a number of threads to be worked on in parallel. First the head janitor function. His role is to carry forward any work that has already been done in previous sets.
  // get and process all the messages
  var SETS = 4;
  var CHUNKS = 2;
  
  // a set is a collection of threads that can run together, including a head janitor and tail janitor function to mop up
  for (var j=0; j < SETS ; j++) {
    var profileSet = [];
    var janitor =  {
      "name": "HEAD JANITOR:" + j,
      "functionName":"janitor",
      "skip":false,
      "debug":false,
      "options": {
         "setIndex": j,
         "sets":SETS,
         "janitor": "head"
       } 
    };
    profileSet.push(janitor);

Now the threads within each set
    // now the threads
    for (var i =0; i <CHUNKS;i++ ) {
      profileSet.push ( {
        "name": "MESSAGES:" + i,
        "functionName":"getMessageSets",
        "skip":false,
        "debug":false,
        "options": {
          "setIndex": j,
          "sets":SETS,
          "threads":CHUNKS,
          "index": i
        } 
      });
    }

And then the tail janitor - his job will be to carry forward the work that needs to be done in subsequent sets
    // now the tail janitor
    var janitor = cUseful.clone(janitor);
    janitor.name = "TAIL JANITOR:" + j;
    janitor.options.janitor = "tail";
    profileSet.push(janitor);


We need a reduction between each set. 
    // add this set to the list, then do a reduction
    profile.push(profileSet , profileReduction);
      
  }

Now we need a special reduction that will expand any record with multiple recipients into separate records, one per recipient.
  profile.push([{
    "name": "reduction",
    "functionName":"reduceTheEmailToResults",
    "debug":true,
    "options":{
    }
  }]);

Finally we'll need to log the results

  // finally log the results
  profile.push( [{
    "name": "LOG",
    "functionName": "logTheResults",
    "skip": false,
    "debug": false,
    "options": {
      "driver": "cDriverSheet",
      "clear": true,
      "parameters": {
        "siloid": "emails",
        "dbid": "1yTQFdN_O2nFb9obm7AHCTmPKpf5cwAd78uNQJiCcjPk",
        "peanut": "bruce"
      }
    }
  }]);

  return profile;
  
  
}
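To check the shape of the profile before running it, it can help to list the stage names it should contain. The sketch below is a hypothetical helper (describeStages is not part of the library) that mirrors the loop above, assuming each set is followed by exactly one reduction.

```javascript
// Hypothetical helper, for illustration only: lists the task names in each
// stage of a profile shaped like logEmailsProfileSets(), assuming one
// reduction stage after every set.
function describeStages(sets, chunks) {
  var stages = [['GET THREADS']];
  for (var j = 0; j < sets; j++) {
    // each set is one stage: head janitor, the parallel chunks, tail janitor
    var set = ['HEAD JANITOR:' + j];
    for (var i = 0; i < chunks; i++) {
      set.push('MESSAGES:' + i);
    }
    set.push('TAIL JANITOR:' + j);
    // the set is followed by a reduction stage
    stages.push(set, ['reduction']);
  }
  // the special recipient-expanding reduction, then the logging stage
  stages.push(['reduction'], ['LOG']);
  return stages;
}

var stages = describeStages(4, 2);
```

With 4 sets of 2 chunks this gives 11 sequential stages, and each set stage holds 4 tasks that can run side by side.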

The only change we need to make now is to call this function to set up the run profile
function showSidebar() {
   
   // kicking off the sidebar executes the orchestration
   libSidebar('asyncService',ADDONNAME, logEmailsProfileSets() );
 
}
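To picture what happens to a profile once it is handed over, here is a simplified synchronous model. It is not the real orchestrator: runProfile and the three demo executors are made up for illustration, and the way the previous stage's output is passed on (an array of objects with a results property) is inferred from how the executors in this article read their second argument.

```javascript
// Simplified model: stages run one after another; the tasks inside a stage
// would run in parallel in the real thing, here they simply run in turn.
function runProfile(profile, executors) {
  return profile.reduce(function (previous, stage) {
    return stage.map(function (task) {
      return {
        name: task.name,
        results: executors[task.functionName](task.options, previous)
      };
    });
  }, []);
}

// hypothetical executors, standing in for getTheThreads and friends
var executors = {
  makeNumbers: function (options) {
    var out = [];
    for (var i = 0; i < options.count; i++) out.push(i);
    return out;
  },
  doubleChunk: function (options, previous) {
    // each task works on its own slice of the previous stage's results
    var data = previous[0].results;
    var start = Math.round(options.index / options.threads * data.length);
    var finish = Math.round((options.index + 1) / options.threads * data.length);
    return data.slice(start, finish).map(function (d) { return d * 2; });
  },
  concatResults: function (options, previous) {
    // a reduction joins the outputs of all tasks in the previous stage
    return previous.reduce(function (p, c) { return p.concat(c.results); }, []);
  }
};

var demoProfile = [
  [{ name: 'MAKE', functionName: 'makeNumbers', options: { count: 5 } }],
  [
    { name: 'DOUBLE:0', functionName: 'doubleChunk', options: { index: 0, threads: 2 } },
    { name: 'DOUBLE:1', functionName: 'doubleChunk', options: { index: 1, threads: 2 } }
  ],
  [{ name: 'REDUCE', functionName: 'concatResults', options: {} }]
];

var finalStage = runProfile(demoProfile, executors);
```

The point of the model is the data flow: a map stage splits the previous results between its tasks, and a reduction stage puts them back together as a single result for the next stage to read.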

The executors

Now we need to write the couple of new functions mentioned in these profiles. They will be called by the HTML service orchestration.

This one will search the email for threads that match the search term - this is the same as the one used in Running GmailApp in parallel

function getTheThreads(options) {
  return cUseful.rateLimitExpBackoff( function() {
    return GmailApp.search(options.searchText).map(function(d) {
      return d.getId();
    });
  });
}
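The call above is wrapped in cUseful.rateLimitExpBackoff so that a rate limit error leads to a wait and a retry rather than a failure. The sketch below shows the general idea of exponential backoff only. It is a hypothetical helper, not the cUseful implementation: retryWithBackoff, its option names, and the injected sleep function are all assumptions, and it retries on every error, which a real helper would probably want to narrow.

```javascript
// Hypothetical retry helper, for illustration only.
// sleep is injected so the sketch can run outside Apps Script;
// inside Apps Script, Utilities.sleep could be passed instead.
function retryWithBackoff(fn, options) {
  var maxAttempts = options.maxAttempts || 5;
  var baseMs = options.baseMs || 1000;
  var sleep = options.sleep;
  for (var attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      // give up once the attempts are used up
      if (attempt >= maxAttempts) throw err;
      // wait baseMs, 2*baseMs, 4*baseMs ... plus up to 100ms of random jitter
      sleep(baseMs * Math.pow(2, attempt - 1) + Math.round(Math.random() * 100));
    }
  }
}

// usage: a function that fails twice before succeeding
var calls = 0;
var waits = [];
var answer = retryWithBackoff(function () {
  calls++;
  if (calls < 3) throw new Error('rate limited');
  return 'done after ' + calls + ' calls';
}, {
  maxAttempts: 5,
  baseMs: 10,
  sleep: function (ms) { waits.push(ms); } // record the waits instead of sleeping
});
```

Doubling the wait each time is what keeps a crowd of parallel executors from hammering the service at the same moment again and again.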

The next one will be run in parallel, processing chunks of the total messages. It differs from the one in Running GmailApp in parallel, since we now have the concept of 'sets'. We're also postponing the duplication of records with multiple recipients in the to: list until the final reduction, rather than doing it here.
/**
 * do a chunk of message processing - this one is expecting to do it in sets
 * @param {object} options describes what to do 
 * @param {object} reduceResults this would contain results from a previous stage if present
 * @return {object} test data to pass on to next stage
 */
function getMessageSets (options,reduceResults) {
  
  var data = reduceResults[0].results;
  
  // peel off the data that is to do with this set
  var setStart = Math.round(options.setIndex/options.sets * data.length);
  var setFinish = Math.round((options.setIndex+1)/options.sets * data.length);
  var setData = data.slice(setStart, setFinish);
  
  
  // we'll only do a section of data belonging to this thread
  var start = Math.round(options.index/options.threads * setData.length);
  var finish = Math.round((options.index+1)/options.threads * setData.length) ;
  if (finish < start ) throw 'invalid getmessagesets indices start ' + start + ' finish ' + finish + ' data length ' + setData.length;
  
  var threadData = setData.slice(start, finish);
  

  // work with that slice of messages
  return threadData.map ( function (c) {
    // for later decrypt testing, we'll include everything
    try {
      return cUseful.rateLimitExpBackoff(function () {
        
        return GmailApp.getThreadById(c).getMessages().map(function(e) {
            return {to:e.getTo(),subject:e.getSubject(),dateSent:e.getDate().toString(),from:e.getFrom()};
          });
        
      },1000);
    }
    catch (err) {
      throw 'error in getting id ' + JSON.stringify(c) +  ' start ' + start + ' finish ' + finish + ' data length ' + threadData.length + ':' + JSON.stringify(err) ;
    }

  });
  
}
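The two levels of slicing are easier to trust once you see that they tile the data exactly. The sketch below pulls out just the index arithmetic into a hypothetical helper (workerRange is not in the original code) and checks that, across all sets and chunks, every item is visited once and in order.

```javascript
// Hypothetical helper: the [start, finish) range of the full data that one
// worker handles, using the same rounding as getMessageSets.
function workerRange(length, setIndex, sets, index, threads) {
  // first the slice belonging to this set
  var setStart = Math.round(setIndex / sets * length);
  var setFinish = Math.round((setIndex + 1) / sets * length);
  var setLength = setFinish - setStart;
  // then the part of that slice belonging to this worker
  return {
    start: setStart + Math.round(index / threads * setLength),
    finish: setStart + Math.round((index + 1) / threads * setLength)
  };
}

var ids = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'];
var SETS = 4;
var CHUNKS = 2;
var covered = [];

for (var j = 0; j < SETS; j++) {
  for (var i = 0; i < CHUNKS; i++) {
    var range = workerRange(ids.length, j, SETS, i, CHUNKS);
    covered = covered.concat(ids.slice(range.start, range.finish));
  }
}
```

Because the finish of one slice is computed with the same formula as the start of the next, there are no gaps or overlaps even when the data length does not divide evenly.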

Here's the modified final reduction - expanding the to: recipient field
/**
 * reduce the results from a previous mapping exercise - this is special because we'll end up with a different number of results
 * @param {object} options describes what to do 
 * @param {object} mapResults this would contain results from a previous stage if present
 * @return {array.*} test data to pass on to next stage
 */
function reduceTheEmailToResults(options, mapResults) {

  // we'll have all the results here so consolidate, also expanding out the comma separated to fields

  var results = mapResults.reduce ( function (p,c) {
      (Array.isArray (c.results) ? c.results : [c.results]).forEach (function(d) {
        
        d.forEach (function(e) {
          cUseful.arrayAppend(p, e.to.split(",").map(function(f) {
            return {to:f,subject:e.subject,dateSent:e.dateSent,from:e.from};
          }));
        });
       
      });
      return p;
  },[]);
  
  return results;

}
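To try the expansion step without Gmail or the cUseful library, here is a stand-alone sketch that uses only built-in array methods. The function name expandRecipients and the sample data are made up, it assumes each stage result is already an array of per-thread message arrays, and the trim() of each recipient is an addition that the original does not have.

```javascript
// Stand-alone sketch of the recipient expansion, for illustration only.
function expandRecipients(mapResults) {
  return mapResults.reduce(function (p, c) {
    // c.results holds one array of messages per thread
    c.results.forEach(function (thread) {
      thread.forEach(function (message) {
        // one output record per comma separated recipient
        message.to.split(',').forEach(function (recipient) {
          p.push({
            to: recipient.trim(), // trimming is an addition, not in the original
            subject: message.subject,
            dateSent: message.dateSent,
            from: message.from
          });
        });
      });
    });
    return p;
  }, []);
}

// made-up sample: two threads, the first message has two recipients
var sample = [{
  results: [
    [{ to: 'a@example.com, b@example.com', subject: 'hello', dateSent: 'Mon', from: 'c@example.com' }],
    [{ to: 'd@example.com', subject: 'bye', dateSent: 'Tue', from: 'c@example.com' }]
  ]
}];

var expanded = expandRecipients(sample);
```

Two messages go in and three records come out, which is why this step belongs in a final reduction rather than in one of the mapping executors.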

In this run we got 725 seconds of processing over 386 seconds of elapsed time, and managed to process something we wouldn't have been able to inside a 6 minute limit, even with parallel execution.



Remember these tips.
  • It doesn't help to throw lots of executors at a rate limited service - most of the time will be spent waiting. Running scheduled sets will eventually get it done.
  • Your execution processes should never change the number of records. The end of a set should always have the same number of records as it started with - execution threads are .map() operations.
  • If your final result does need to change the number of records, that should be done in a final reduction.
For the code you need to set this all up, see Parallel implementation and getting started


Authorization

GmailApp needs authorization, so it's worth running a small test first to force an authorization dialog, as well as to test your executor functions. That can easily be done by emulating a couple of execution steps in a simple function like this.


function smallEmailTest() {
  var messages = getTheThreads({searchText:'something bizarre'});
  Logger.log(getMessageSets ({setIndex:0,sets:1,index:0,threads:2},[{results:messages}]));
}


For more on this topic, see Running things in parallel using HTML service. For more snippets like this see Google Apps Scripts snippets
