There’s no getting away from the fact that Apps Script is slower than the equivalent client-based JavaScript processing. It is fundamentally synchronous in implementation, and also has limits on processing time and a host of other quotas. For a cloud-based, free service that’s about extending Drive capabilities rather than being scalable in the manner of Google App Engine, I suppose that’s normal. But let’s see if we can at least subvert these two things:

  • get over the 6 minute maximum execution time for Apps Script
  • run things in parallel

There are other ways to approach parallel running, such as taking advantage of the client’s processing capability. If you are able to do that, then you should take a look at Running things in parallel using HTML service. The method in this post uses timed triggers, which are not very controllable.

Latest news – There’s a new and improved version of this over on Orchestration of Apps Scripts – parallel threads and defeating quotas

I figured that if I implemented a rudimentary Map/Reduce capability that could split a meaty task into multiple chunks, run them all at the same time on separate threads, then bring the results together for final processing, I could achieve these two goals. The TriggerBuilder service is key to this, but it’s rather difficult to control execution. Specifically, consider this innocent-looking sentence taken from the documentation.

Specifies the duration (in milliseconds) after the current time that the trigger will run. (plus or minus 15 minutes).

Plus or minus 15 minutes… (why specify in milliseconds?)

In any case, let’s press on and see what we have here. What follows is a primer for one way to orchestrate parallel tasks, or maybe you should take a look at Parallel process orchestration with HtmlService, which is another way of busting the 6 minute limit – less complex and more reliable, but less geekworthy.

TriggerHappy

Libraries

I provide a library (cTriggerHappy) – MuIOvLUHIRpRlID7V_gEpMqi_d-phDA33, which you can include or fork as you prefer. Another library you need in your application is Database abstraction with google apps script, which is Mj61W-201_t_zC9fJg1IzYiz3TLx7pV4j. If you are forking the library, it needs Database abstraction with google apps script and Using named locks with Google Apps Scripts.

How to set up

This is fairly extreme scripting, so it’s a little complex. You should first take a look at the primer slides and start with a copy of an example application.

The control object.

This is used to manage orchestration, and specifies various things, including some setup so you can use Database abstraction with google apps script. Although you could probably use any of the supported back end databases, I recommend Google Drive for the data, and a spreadsheet for logging and reporting. Here’s an example control function, which you should tailor to your own environment. There are 5 data types, each of which could be held in an independent data store if required.
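Here’s a sketch of the shape such a control function might take. The contents of each descriptor depend on how you’ve set up Database abstraction with google apps script, and apart from logAccess (mentioned below) the property names here are my assumptions – check them against the library source.

// a sketch of a control function - tailor to your own environment.
// apart from logAccess, the property names are assumptions, and each
// descriptor should be filled in as required by
// Database abstraction with google apps script.
function getControl() {
  return {
    jobAccess:    { /* a Drive-based store for job data */ },
    taskAccess:   { /* a Drive-based store for task data */ },
    reduceAccess: { /* a Drive-based store for reduce data */ },
    resultAccess: { /* a Drive-based store for result data */ },
    logAccess:    { /* a spreadsheet store for logging and reporting */ }
  };
}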

Other control parameters.

For testing, you should run with control.triggers set to false, then change it to true when everything looks good. When false, no triggers are generated; instead, the process is run sequentially, inline.

Next, there’s the number of milliseconds to wait between a trigger’s creation and its execution. The TriggerBuilder choreography seems to be a little more solid if you wait a bit before starting execution of a trigger.

Debugging can be tricky with detached processes, so there’s a parameter that allows logging material to be written to the store described in logAccess:{}.

By default, TriggerHappy will attempt to create as many parallel threads as are needed to run everything at once. This might cause some quota problems, so you can set control.threads to some number other than 0 to limit the number of parallel processes. When one completes, others will be generated as required.

There’s also the number of milliseconds to wait between trigger creations. The TriggerBuilder choreography seems to be a little more solid if you wait a bit between creating triggers.

Another parameter is a unique script ID that allows multiple scripts to use the same database. Triggers associated with a given script ID will only execute the tasks they are meant to.

Finally, there are the names of the 3 functions that will be triggered to do the map, reduce and process stages.
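Put together, the behavioral part of the control object might look something like this sketch, sitting alongside the data store descriptors in getControl(). Only triggers and threads (and the logAccess store) are named in this post; the other property names are assumptions for illustration.

// a sketch of the behavioral control parameters described above.
// apart from triggers and threads, the property names are assumptions.
  triggers: false,                     // false: run inline for testing; true: use timed triggers
  waitStart: 2000,                     // (assumed name) ms between a trigger's creation and its execution
  log: true,                           // (assumed name) write debugging material to the logAccess store
  threads: 0,                          // 0: as many threads as needed; n: limit to n at a time
  waitBetween: 1000,                   // (assumed name) ms to wait between trigger creations
  scriptId: 'myScript',                // (assumed name) unique id so several scripts can share a database
  taskFunction: 'myTaskFunction',      // (assumed names) the 3 functions that will be
  reduceFunction: 'myReduceFunction',  //   triggered to do the map, reduce and
  processFunction: 'myProcessFunction' //   process stages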

Splitting up the work

Each job needs to be split into work packages called tasks. These tasks must be able to run in any order and need to be independent of each other. Here’s an example that splits a job into 5 chunks.
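A sketch of how that might look is below. The .saveTask(), .triggerTasks() and .log() methods are described next; how the library handle is created, and the chunk parameters themselves, are assumptions for illustration.

// a sketch of splitting a job into 5 chunks of work.
// how the library handle is created is an assumption - check the library source.
function splitJob() {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  triggerHappy.log('splitting job into 5 chunks');
  for (var chunk = 0; chunk < 5; chunk++) {
    // each task carries whatever parameters your taskFunction will need
    triggerHappy.saveTask({ chunk: chunk, howMany: 100 });
    triggerHappy.log('saved task for chunk ' + chunk);
  }
  // set off the scheduling of all the saved tasks
  triggerHappy.triggerTasks();
}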

The .saveTask() method allows you to pass any parameters you want; they will be available to your taskFunction. Note I’m using the .log() method regularly to report progress in the log.

The .triggerTasks() method sets off the whole business of scheduling tasks to be mapped.

The taskFunction

This is the map stage. Tasks will be scheduled to run each of the chunks of work, and the taskFunction is the one that gets called for each chunk. This is where you would execute the point of your application. In this example, I’m generating various random objects, controlled by the values I passed to each chunk when I split the tasks in the first place. Depending on the setting of control.threads, all or some of these tasks will be triggered to run simultaneously. Additional threads will be initiated as required until there are no more tasks needing to be dealt with.

Note that there are a few mandatory requirements here, sketched in the example after this list.

    • create an object with handleCode, handleError, and task properties. Fill the task property with something to do with the .somethingToMap() method.
    • store the result in an array, in the same object
    • signal any errors if necessary
    • signal when complete
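Here’s a sketch that follows those requirements. Only .somethingToMap() and the handleCode, handleError and task properties are named in this post; the internal shape of the task and the completion call are assumptions to be checked against the library.

// a sketch of a taskFunction - the map stage, called once per chunk.
// names other than .somethingToMap(), handleCode, handleError and task
// are assumptions.
function myTaskFunction() {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  // the mandatory result object, with the task property filled
  // by the .somethingToMap() method
  var result = {
    handleCode: 0,                       // non-zero signals an error
    handleError: '',
    task: triggerHappy.somethingToMap(), // pick up a chunk of work to map
    results: []                          // store this chunk's results here
  };
  try {
    // the point of your application - here, generate random objects
    // controlled by the values saved with this chunk
    var params = result.task.params;     // (assumed shape of the task)
    for (var i = 0; i < params.howMany; i++) {
      result.results.push({ chunk: params.chunk, value: Math.random() });
    }
  } catch (err) {
    result.handleCode = -1;              // signal any errors if necessary
    result.handleError = err.toString();
  }
  triggerHappy.taskComplete(result);     // signal when complete (assumed name)
}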

The reduceFunction

This is the reduce stage. A reduce will automatically be scheduled once all the mapping tasks of the job are complete. It’s a fairly straightforward process: all the independent results of the mapping tasks are combined into a single result, and your reduce function will almost certainly use the provided .reduce() method, although you could do some special processing if you really wanted to.
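A minimal sketch might look like this; aside from the .reduce() and .log() methods, which this post names, everything here is an assumption about the library’s shape.

// a sketch of a reduceFunction - combines the results of the mapped tasks.
// only .reduce() and .log() are named in this post; the rest is assumed.
function myReduceFunction(job) {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  triggerHappy.log('reducing results for job');
  // combine the independent results of each mapping task into a single result
  return triggerHappy.reduce(job);
}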

The processFunction

Once the reduce function has completed, you can continue and finish the work. In our example, the random objects that we created in each of the chunks have been combined by taskReduce, and now the whole thing can be written to a sheet.

Note that there are a few mandatory requirements here too, sketched in the example after this list.

    • if there is anything to do, this will return the reduced data.
    • signal that we are done
    • clean up all triggers when done – very important to avoid running out of trigger space
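Here’s a sketch covering those three requirements. The trigger cleanup uses the standard ScriptApp service; the method names for fetching the reduced data and signalling completion are assumptions, as is the shape of the reduced data.

// a sketch of a processFunction - the final stage after the reduce.
// getReduced() and jobComplete() are assumed names; the trigger cleanup
// uses the standard ScriptApp service.
function myProcessFunction(job) {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  // if there is anything to do, this returns the reduced data
  var reduced = triggerHappy.getReduced(job);
  if (reduced) {
    // write the combined random objects to a sheet
    var sheet = SpreadsheetApp.openById('your-results-spreadsheet-id').getSheets()[0];
    var rows = reduced.results.map(function (r) {  // (assumed shape)
      return [r.chunk, r.value];
    });
    sheet.getRange(1, 1, rows.length, 2).setValues(rows);
  }
  triggerHappy.jobComplete(job);  // signal that we are done
  // clean up all triggers - very important to avoid running out of trigger space
  ScriptApp.getProjectTriggers().forEach(function (t) {
    ScriptApp.deleteTrigger(t);
  });
}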

Debugging

As mentioned, debugging is tricky. It’s better to have a function that runs your scripts serially on a subset of data before moving on to running them by triggers.

Set control.triggers = false, then create a function like the one below. This will run through the mapping of tasks one by one, then the reduce function, then the processing function. Once you reliably get the result you want, you can move on to trying it in parallel by setting control.triggers = true;
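Something like this minimal sketch will do, assuming the splitJob() function from earlier; with control.triggers = false, nothing here creates any triggers.

// a sketch of a serial debugging run. With control.triggers = false,
// the .triggerTasks() call inside splitJob() runs the mapping of tasks
// one by one, then the reduce function, then the processing function,
// all inline in this single execution.
function debugRun() {
  splitJob();
}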

Logging

A .log() method is provided to allow you to log whatever you want. Various logging is done by default, but you can add your own; for example:
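Something like this line, added inside the taskFunction sketch from earlier, would do; it assumes .log() accepts a free-form string.

// report progress from inside a chunk (assumes .log() takes a string)
triggerHappy.log('chunk ' + params.chunk + ': generated ' + result.results.length + ' objects');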

Here’s an example of a fragment of a log file – with triggering disabled


Now the same thing with triggering enabled


Reporting

It’s sometimes useful to take a look inside the orchestration files. If you’ve used Drive as your database, you can just open them. However, there is also a .report() method to give a summary view.
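For instance, something like this sketch; whether .report() returns text or writes somewhere directly isn’t covered here, so this assumes it returns something loggable.

// dump a summary view of the orchestration files.
// assumes .report() returns something that can be logged.
function showReport() {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  Logger.log(triggerHappy.report());
}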


Keys and instances

Each task, job and reduction has a unique key, which will help you track down problems if you need to. You’ll also notice an instance id on the logger. Each triggered task has a unique instance id so you can track its progress in the logger. Note that this is independent of the trigger ID, which is allocated by GAS. The instance id is used in both triggered and inline operation.

Cleaning up

TriggerHappy does not automatically clean up its files, since it may be that the reduce data, or even the individual task data, needs to be reused. I’m also considering enabling a rescheduler so that entire jobs can be run multiple times – that would mean the job files could also be useful. However, if you don’t need any of that, there are pre-baked methods for cleaning up. This function will clear everything.
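A sketch of such a cleanup function follows; the .cleanup() name is an assumption (check the library for the real pre-baked method names), while the trigger deletion uses the standard ScriptApp service.

// a sketch of clearing everything once a job's data is no longer needed.
// .cleanup() is an assumed name - check the library for the real one.
function cleanEverything() {
  var triggerHappy = new cTriggerHappy.TriggerHappy(getControl());
  triggerHappy.cleanup();  // remove job, task, reduce, result and log data
  // and clear out any leftover triggers
  ScriptApp.getProjectTriggers().forEach(function (t) {
    ScriptApp.deleteTrigger(t);
  });
}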

Summary

This approach is probably not for everyone, but it does exercise a number of interesting ideas, such as Using named locks with Google Apps Scripts, triggers, and of course the concept of using multiple threads – since it is the cloud, after all. I have found that triggers are a little fragile, and that work executed in the context of a trigger runs more slowly than the same task run as a regular script.

Here’s a substantial example that copies from one database format to another – convertingfromscriptb

The Library Code

For more like this see Google Apps Scripts Snippets
For help and more information join our forum, follow the blog, follow me on Twitter