In Running GmailApp in parallel we found that throwing additional parallel executors at a rate-limited service doesn’t help much; the service will only let you do so much at the same time. However, the approach described in Running things in parallel using HTML service is a sequential scheduler as well as a parallel orchestrator, so we can split the job up and run sequential sets of parallel threads to eventually get the job done. Head and tail janitor processes carry forward the work that has already been done, or that still needs to be done, to a reduction process, ready for the next set. In this way we get through all the work, as long as each thread finishes in under 6 minutes.
As usual we’ll create profiles and some new .map executors for this.
The profiles
function logEmailsProfileSets() {
  var profile = [];

  // get the matching threads
  profile.push([
    {
      "name": "GET THREADS",
      "functionName": "getTheThreads",
      "skip": false,
      "debug": false,
      "options": {
        "searchText": "The Excel Liberation forum has moved to a Google+ community"
      }
    }
  ]);
Next, a reduction to bring all the results together. As usual we’ll use the common function reduceTheResults(), which we’ve used for this in all the examples. The reason for a reduction is that we might later split the getTheThreads() function into parallel execution tasks, and we’d then need to combine the results.
  // next reduce the messages to one
  var profileReduction = [];
  profileReduction.push({
    "name": "reduction",
    "functionName": "reduceTheResults",
    "options": {}
  });
Now we need to split the threads into some number of sets, and within each set, a number of threads to be worked on in parallel. First the head janitor function. His role is to carry forward any work that has already been done in previous sets.
  // get and process all the messages
  var SETS = 4;
  var CHUNKS = 2;

  // a set is a collection of threads that can run together,
  // including a head janitor and tail janitor function to mop up
  for (var j = 0; j < SETS; j++) {
    var profileSet = [];
    var janitor = {
      "name": "HEAD JANITOR:" + j,
      "functionName": "janitor",
      "skip": false,
      "debug": false,
      "options": {
        "setIndex": j,
        "sets": SETS,
        "janitor": "head"
      }
    };
    profileSet.push(janitor);
Now the threads within each set
    // now the threads
    for (var i = 0; i < CHUNKS; i++) {
      profileSet.push({
        "name": "MESSAGES:" + i,
        "functionName": "getMessageSets",
        "skip": false,
        "debug": false,
        "options": {
          "setIndex": j,
          "sets": SETS,
          "threads": CHUNKS,
          "index": i
        }
      });
    }
And then the tail janitor – his job will be to carry forward the work that needs to be done in subsequent sets
    // now the tail janitor (the set itself is added to the profile once, in the next step)
    var tailJanitor = cUseful.clone(janitor);
    tailJanitor.name = "TAIL JANITOR:" + j;
    tailJanitor.options.janitor = "tail";
    profileSet.push(tailJanitor);
We need a reduction between each set.
    // add this set to the list, then do a reduction
    profile.push(profileSet, profileReduction);
  }
Now we need a special reduction that will blow out cases where we have multiple recipients into separate records
  profile.push([{
    "name": "reduction",
    "functionName": "reduceTheEmailToResults",
    "debug": true,
    "options": {}
  }]);
Finally we’ll need to log the results
  // finally log the results
  profile.push([{
    "name": "LOG",
    "functionName": "logTheResults",
    "skip": false,
    "debug": false,
    "options": {
      "driver": "cDriverSheet",
      "clear": true,
      "parameters": {
        "siloid": "emails",
        "dbid": "1yTQFdN_O2nFb9obm7AHCTmPKpf5cwAd78uNQJiCcjPk",
        "peanut": "bruce"
      }
    }
  }]);

  return profile;
}
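To see the order of work this profile describes, it can help to list only the task names. The sketch below is not part of the orchestration library. describeProfileLayout is a hypothetical helper that mirrors the pushes in the code above, assuming each inner array is a stage whose tasks run in parallel, that stages run one after another, and that each set is added once, followed by its reduction.

```javascript
// Hypothetical helper (not part of the library): lists task names stage by stage.
// Assumption: each inner array is one stage; its tasks run in parallel,
// and the stages themselves run sequentially.
function describeProfileLayout(sets, chunks) {
  var layout = [["GET THREADS"]];
  for (var j = 0; j < sets; j++) {
    var stage = ["HEAD JANITOR:" + j];
    for (var i = 0; i < chunks; i++) {
      stage.push("MESSAGES:" + i);
    }
    stage.push("TAIL JANITOR:" + j);
    // one set of parallel tasks, then a reduction before the next set
    layout.push(stage, ["reduction"]);
  }
  // the final expanding reduction, then the logging stage
  layout.push(["reduction"], ["LOG"]);
  return layout;
}

var layout = describeProfileLayout(4, 2);
layout.forEach(function (stage, index) {
  console.log(index + ": " + stage.join(" | "));
});
```

With SETS = 4 and CHUNKS = 2, each set stage holds four tasks: a head janitor, two message tasks and a tail janitor.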
The only change we need to make now is to call this function to set up the run profile
function showSidebar() {
  // kicking off the sidebar executes the orchestration
  libSidebar('asyncService', ADDONNAME, logEmailsProfileSets());
}
The executors
function getTheThreads(options) {
  return cUseful.rateLimitExpBackoff(function () {
    return GmailApp.search(options.searchText).map(function (d) {
      return d.getId();
    });
  });
}
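The executor above leans on cUseful.rateLimitExpBackoff to retry when the service pushes back. For readers who have not met the idea, here is a minimal sketch of exponential backoff in general. It is not the cUseful implementation. retryWithBackoff and its options are hypothetical names, it retries on any error rather than only on rate-limit errors, and the sleep function is passed in so the sketch can run outside Apps Script (inside Apps Script, Utilities.sleep could be supplied).

```javascript
// Hypothetical helper, NOT the cUseful implementation: retries fn with
// exponentially growing waits (plus a little random jitter) between attempts.
function retryWithBackoff(fn, options) {
  var opts = options || {};
  var maxAttempts = opts.maxAttempts || 5;
  var baseMs = opts.baseMs || 500;
  var sleep = opts.sleep; // assumption: a function taking milliseconds
  for (var attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      // give up once the attempt budget is spent
      if (attempt >= maxAttempts) throw err;
      // wait baseMs, 2*baseMs, 4*baseMs ... plus up to baseMs of jitter
      var waitMs = baseMs * Math.pow(2, attempt - 1) + Math.round(Math.random() * baseMs);
      sleep(waitMs);
    }
  }
}

// usage: a function that fails twice before succeeding
var backoffDelays = [];
var backoffCalls = 0;
var backoffResult = retryWithBackoff(function () {
  backoffCalls++;
  if (backoffCalls < 3) throw new Error("rate limited");
  return "ok";
}, {
  baseMs: 100,
  sleep: function (ms) { backoffDelays.push(ms); } // records waits instead of sleeping
});
```

Each failed attempt roughly doubles the wait, which is why adding more parallel executors to a rate-limited service mostly adds waiting time.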
The next executor will be run in parallel, processing chunks of the total messages. It differs from the one in Running GmailApp in parallel, since we now have the concept of ‘sets’. We’re also postponing the duplication of records with multiple recipients in the to: list until the final reduction, rather than doing it here.
/**
 * do a chunk of message processing - this one is expecting to do it in sets
 * @param {object} options describes what to do
 * @param {object} reduceResults this would contain results from a previous stage if present
 * @return {object} test data to pass on to next stage
 */
function getMessageSets (options, reduceResults) {
  var data = reduceResults[0].results;

  // peel off the data that is to do with this set
  // (the rounding must cover the whole expression so adjacent sets share a boundary)
  var setStart = Math.round(options.setIndex / options.sets * data.length);
  var setFinish = Math.round((options.setIndex + 1) / options.sets * data.length);
  var setData = data.slice(setStart, setFinish);

  // we'll only do a section of data belonging to this thread
  var start = Math.round(options.index / options.threads * setData.length);
  var finish = Math.round((options.index + 1) / options.threads * setData.length);
  if (finish < start) {
    throw 'invalid getmessagesets indices start ' + start + ' finish ' + finish + ' data length ' + setData.length;
  }
  var threadData = setData.slice(start, finish);

  // work with that slice of messages
  return threadData.map(function (c) {
    // for later decrypt testing, we'll include everything
    try {
      return cUseful.rateLimitExpBackoff(function () {
        return GmailApp.getThreadById(c).getMessages().map(function (e) {
          return {to: e.getTo(), subject: e.getSubject(), dateSent: e.getDate().toString(), from: e.getFrom()};
        });
      }, 1000);
    } catch (err) {
      throw 'error in getting id ' + JSON.stringify(c) + ' start ' + start + ' finish ' + finish +
        ' data length ' + threadData.length + ':' + JSON.stringify(err);
    }
  });
}
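The two-level slicing is easier to follow with small numbers. The sketch below pulls the index arithmetic into standalone functions. sliceBounds and sliceForThread are hypothetical names, and the rounding is applied to the whole boundary expression, which is what the intended logic appears to be. It then checks that 10 made-up thread ids, split into 4 sets of 2 threads, are each handled exactly once.

```javascript
// Hypothetical helpers illustrating the slicing arithmetic only (no Gmail calls).
function sliceBounds(position, parts, length) {
  return {
    start: Math.round(position / parts * length),
    finish: Math.round((position + 1) / parts * length)
  };
}

function sliceForThread(data, options) {
  // first narrow down to the set, then to this thread's share of the set
  var setBounds = sliceBounds(options.setIndex, options.sets, data.length);
  var setData = data.slice(setBounds.start, setBounds.finish);
  var threadBounds = sliceBounds(options.index, options.threads, setData.length);
  return setData.slice(threadBounds.start, threadBounds.finish);
}

// usage: made-up ids, 4 sets of 2 threads
var sampleIds = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"];
var visited = [];
for (var setIndex = 0; setIndex < 4; setIndex++) {
  for (var index = 0; index < 2; index++) {
    var share = sliceForThread(sampleIds, { setIndex: setIndex, sets: 4, index: index, threads: 2 });
    visited = visited.concat(share);
  }
}
console.log(visited.join(","));
```

Because the finish boundary of one slice is computed by the same expression as the start boundary of the next, the slices neither overlap nor leave gaps.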
Here’s the modified final reduction – expanding the to: recipient field
/**
 * reduce the results from a previous mapping exercise - this is special because we'll end up with a different number of results
 * @param {object} options describes what to do
 * @param {object} mapResults this would contain results from a previous stage if present
 * @return {array.*} test data to pass on to next stage
 */
function reduceTheEmailToResults(options, mapResults) {
  // we'll have all the results here so consolidate,
  // also expanding out the comma separated to fields
  var results = mapResults.reduce(function (p, c) {
    (Array.isArray(c.results) ? c.results : [c.results]).forEach(function (d) {
      d.forEach(function (e) {
        cUseful.arrayAppend(p, e.to.split(",").map(function (f) {
          return {to: f, subject: e.subject, dateSent: e.dateSent, from: e.from};
        }));
      });
    });
    return p;
  }, []);
  return results;
}
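Here is a small worked example of that expansion, using made-up messages and no library calls. expandRecipients is a hypothetical stand-in for the reduction above: it uses a plain push in place of cUseful.arrayAppend, and it also trims each address, which is an addition of this sketch and not something the original does.

```javascript
// Hypothetical stand-in for the expanding reduction, using plain JavaScript only.
function expandRecipients(mapResults) {
  return mapResults.reduce(function (all, stage) {
    var threads = Array.isArray(stage.results) ? stage.results : [stage.results];
    threads.forEach(function (messages) {
      messages.forEach(function (message) {
        message.to.split(",").forEach(function (recipient) {
          all.push({
            to: recipient.trim(), // trimming is an addition in this sketch
            subject: message.subject,
            dateSent: message.dateSent,
            from: message.from
          });
        });
      });
    });
    return all;
  }, []);
}

// usage: two threads, one of which has a message with two recipients
var sampleMapResults = [{
  results: [
    [{ to: "a@example.com, b@example.com", subject: "Hello", dateSent: "Mon", from: "c@example.com" }],
    [{ to: "d@example.com", subject: "Again", dateSent: "Tue", from: "c@example.com" }]
  ]
}];
var expanded = expandRecipients(sampleMapResults);
console.log(expanded.length + " records from 2 messages");
```

Two messages go in and three records come out, which is why this step belongs in a final reduction and not in a mapping stage.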
Here’s a snap of the run. We got 725 seconds of processing done in 386 seconds of elapsed time, and managed to process something we wouldn’t have been able to inside a 6 minute limit, even with parallel execution.
Remember these tips:
- It doesn’t help to throw lots of executors at a rate-limited service, since most of the time will be spent waiting. Running scheduled sets will eventually get it done.
- Your execution processes should never change the number of records. A set should always end with the same number of records as it started with, because execution threads are .map() operations.
- If your final result does need to change the number of records, it should be done in a final reduction.
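The "same number of records" rule can be checked with a small simulation. The janitor() function named in the profile is not shown in this article, so everything below is hypothetical: janitorSketch is only one way a janitor could behave (head passes on records from earlier sets, tail passes on records for later sets), and processSetSketch stands in for the message threads of a set.

```javascript
// Hypothetical sketch: the real janitor() is not shown in this article.
function setBounds(options, length) {
  return {
    start: Math.round(options.setIndex / options.sets * length),
    finish: Math.round((options.setIndex + 1) / options.sets * length)
  };
}

// head carries forward what earlier sets already did; tail carries forward what is still to do
function janitorSketch(options, reduceResults) {
  var data = reduceResults[0].results;
  var bounds = setBounds(options, data.length);
  return options.janitor === "head" ? data.slice(0, bounds.start) : data.slice(bounds.finish);
}

// stand-in for the parallel message threads of one set: a pure .map() over the set's records
function processSetSketch(options, reduceResults) {
  var data = reduceResults[0].results;
  var bounds = setBounds(options, data.length);
  return data.slice(bounds.start, bounds.finish).map(function (id) {
    return { processed: id };
  });
}

function runSetsSketch(ids, sets) {
  var data = ids.slice();
  var counts = [];
  for (var j = 0; j < sets; j++) {
    var previous = [{ results: data }];
    var head = janitorSketch({ setIndex: j, sets: sets, janitor: "head" }, previous);
    var body = processSetSketch({ setIndex: j, sets: sets }, previous);
    var tail = janitorSketch({ setIndex: j, sets: sets, janitor: "tail" }, previous);
    data = head.concat(body, tail); // the reduction between sets
    counts.push(data.length);
  }
  return { data: data, counts: counts };
}

var run = runSetsSketch(["t0", "t1", "t2", "t3", "t4", "t5", "t6"], 3);
console.log(run.counts.join(","));
```

The record count stays constant after every set, while the processed portion grows from the front of the list.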
For the code you need to set this all up, see Parallel implementation and getting started
Authorization
function smallEmailTest() {
  var messages = getTheThreads({searchText: 'something bizarre'});
  Logger.log(getMessages({index: 0, threads: 2}, [{results: messages}]));
}
For more snippets like this see Google Apps Scripts snippets