There are other ways to approach parallel running, such as taking advantage of the client's processing capability. If you are able to do that, you should take a look at Running things in parallel using HTML service. The method in this post uses timed triggers, which are not very controllable.
Latest news: there's a new and improved version of this over on Orchestration of Apps Scripts - parallel threads and defeating quotas.

I figured that if I implemented a rudimentary Map/Reduce capability that could split a meaty task into multiple chunks, run them all at the same time on separate threads, then bring the results together for final processing, I could achieve these two goals. The TriggerBuilder service is key to this, but it's rather difficult to control execution. Consider this innocent-looking sentence taken from the documentation:

Specifies the duration (in milliseconds) after the current time that the trigger will run. (plus or minus 15 minutes)

Plus or minus 15 minutes... (why specify in milliseconds?) In any case, let's press on and see what we have here. Here's a primer for a way to orchestrate parallel tasks, or maybe you should take a look at Parallel process orchestration with HtmlService, which is another way of busting the 6 minute limit. It's less complex and more reliable, but less geekworthy.

Libraries
I provide a library (cTriggerHappy) - MuIOvLUHIRpRlID7V_gEpMqi_d-phDA33 - which you can include or fork as you prefer. Another library you need in your application is Database abstraction with google apps script, which is Mj61W-201_t_zC9fJg1IzYiz3TLx7pV4j. If you are forking the library, it also needs Database abstraction with google apps script and Using named locks with Google Apps Scripts.

How to set up
This is fairly extreme scripting, so it's a little complex. You should first take a look at the primer slides and start with a copy of an example application.

The control object
This is used to manage orchestration, and specifies various things, including some setup so you can use Database abstraction with google apps script. Although you could probably use any of the supported back end databases, I recommend Google Drive for the data, and a spreadsheet for logging and reporting.
Here's an example control function, which you should tailor to your own environment. There are 5 data types, each of which could be held in an independent data store if required.

```javascript
// this is the orchestration package for a piece of work that will be split into tasks
// it describes where to store itself, and keeps track of all the chunks
// it can be stored in any of the back end databases described in
// http://ramblings.mcpher.com/Home/excelquirks/dbabstraction
// this example is using google drive
// this identifies this script and the functions it will run
function getControl () {
  return {
    script: {
      id: "1A7lJCKs1KFlj20fBqXjQFne0IhWV0ZpKcYrsYulwxvu__rSZBFnIJPwJ",
      reduceFunction: 'workReduce',
      taskFunction: 'workMap',
      processFunction: 'workProcess'
    },
    taskAccess: {
      siloId: 'tasks.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    logAccess: {
      siloId: 'thappylog',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    reductionAccess: {
      siloId: 'reductions.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    jobAccess: {
      siloId: 'jobs.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    reportAccess: {
      siloId: 'thappyreport',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    triggers: true,
    delay: 5000,
    enableLogging: true,
    threads: 0,
    stagger: 1000
  };
}
```

Other control parameters

triggers: true
For testing, you should run with this set to false, then change it to true when everything looks good. With false, no triggers are generated; instead the process can be run sequentially inline.

delay: 5000
This is the number of milliseconds to wait between trigger creation and execution.
The TriggerBuilder choreography seems to be a little more solid if you wait a bit before starting execution of a trigger.

enableLogging: true
Debugging can be tricky with detached processes. This allows logging material to be written to the store described in logAccess.

threads: 0
TriggerHappy will attempt to create as many parallel threads as are needed to run everything at once. This might cause quota problems, so you can set this to some number other than 0 to limit the number of parallel processes. When one completes, others will be generated as required.
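To picture how the threads cap behaves, here is a minimal plain-JavaScript sketch. This is illustrative only, not the library's internals; the function name and argument shapes are my own assumptions.

```javascript
// Illustrative sketch of capping parallel work, not cTriggerHappy internals.
// With threads === 0 everything pending is launched at once; otherwise only
// enough tasks to fill the free slots are launched, and the rest wait until
// a running task completes.
function tasksToLaunch(pendingTasks, runningCount, threads) {
  if (threads === 0) return pendingTasks.slice();
  var freeSlots = Math.max(threads - runningCount, 0);
  return pendingTasks.slice(0, freeSlots);
}
```

So with threads: 3 and 2 tasks already running, only one more task would be triggered; the rest follow as slots free up.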
stagger: 1000
This is the number of milliseconds to wait between trigger creations. The TriggerBuilder choreography seems to be a little more solid if you wait a bit between creating triggers.

script.id: "1A7lJCKs1KFlj20fBqXjQFne0IhWV0ZpKcYrsYulwxvu__rSZBFnIJPwJ"
This is a unique script ID that allows multiple scripts to use the same database. Triggers associated with the given script ID will only execute the tasks they are meant to.

script.reduceFunction, taskFunction, processFunction
The names of the 3 functions that will be triggered to do the reduce, map and process stages.

Splitting up the work
Each job needs to be split into work packages called tasks. These tasks should be able to run in any order and need to be independent of each other. Here's an example that splits a job into 5 chunks.

```javascript
function splitJobIntoTasks () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  // i'm splitting the work into chunks
  tHappy.log(null, 'starting to split', 'splitJobIntoTasks');
  tHappy.init ();
  var nChunks = 5;
  for (var i = 0; i < nChunks; i++) {
    // this is the results package for each task chunk and where to store itself
    // change this to the storage of your choice, and add any parameters
    // you need to the parameters object
    tHappy.saveTask ({
      index: i,
      something: 'some user values',
      numObs: tHappy.randBetween(20, 100)
    });
  }
  // launch everything
  tHappy.log(null, 'finished splitting');
  tHappy.triggerTasks ();
  tHappy.log(null, 'triggering is done', 'splitJobIntoTasks');
  return nChunks;
}
```

The .saveTask() method allows you to pass any parameters you want; they will be available to your taskFunction. Note that I'm using the .log() method regularly to report progress in the log. The .triggerTasks() call sets off the whole business of scheduling tasks to be mapped.

the taskFunction
This is the map stage. Tasks will be scheduled to run each of the chunks of work. The taskFunction is the one that gets called for each chunk.
This is where you would execute the point of your application. In this example, I'm generating various random objects, controlled by values I passed to each chunk when I split the job up in the first place. Depending on the setting of control.threads, all or some of these tasks will be triggered to run simultaneously. Additional threads will be initiated as required until there are no more tasks needing to be dealt with. Note that there are a few mandatory requirements here: in particular, the data produced by the work, and any error, must be stored in the result package.

```javascript
result.data = obs;
result.handleError = err;
```
```javascript
function workMap() {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  // your result goes here - first find something to do
  var result = {data: null, handleCode: 0, handleError: '', task: tHappy.somethingToMap()};
  // if anything to do
  if (result.task) {
    tHappy.log(null, 'starting mapping for job ' + result.task.jobKey + '/' +
      result.task.taskIndex + ' task ' + result.task.key, 'workMap');
    var ob = generateRandomObject(10);
    var obs = [];
    try {
      // this is the work - for illustration use the params
      for (var i = 0; i < result.task.params.numObs; i++) {
        obs.push(generateRandomValues(ob));
      }
      // store the result and status
      result.data = obs;
    }
    catch (err) {
      // store the error
      result.handleCode = TASK_STATUS.FAILED;
      result.handleError = err;
      tHappy.log (null, err, 'workMap');
      throw (err);
    }
    // update task status
    tHappy.finished (result);
    tHappy.log(null, ' finished mapping');
  }
  return {handleError: result.handleError, handleCode: result.handleCode};

  function generateRandomObject (n) {
    var ob = {};
    for (var i = 0; i < n; i++) {
      ob['x' + i] = null;
    }
    return ob;
  }

  function generateRandomValues (ob) {
    return Object.keys(ob).reduce(function (p, c) {
      p[c] = tHappy.arbitraryString(tHappy.randBetween(5, 20));
      return p;
    }, {});
  }
}
```

the reduceFunction
This is the reduce stage. A reduce will automatically be scheduled once all the mapping tasks of the job are completed. It's a fairly straightforward process: all the independent results of the mapping tasks are combined into a single result. Your reduce function will almost certainly use the provided .reduce() method, although you could do some special processing if you really wanted to.
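Conceptually, the reduce stage just merges the independent per-chunk result arrays into one array, in task order. Here's a minimal plain-JavaScript sketch of that idea; it is not the library's .reduce() implementation (which also deals with storage and status), and the names used are my own.

```javascript
// Illustrative only: combine the independent results of several mapping
// tasks into a single array, ordered by taskIndex. The real .reduce()
// method also reads and writes the back end stores; this shows the merge.
function combineTaskResults(taskResults) {
  return taskResults
    .sort(function (a, b) { return a.taskIndex - b.taskIndex; })
    .reduce(function (all, task) { return all.concat(task.data); }, []);
}
```

For example, two chunks arriving out of order, {taskIndex: 1, data: [3, 4]} and {taskIndex: 0, data: [1, 2]}, combine to [1, 2, 3, 4].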
```javascript
function workReduce () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  // bring all the results together
  tHappy.log(null, 'starting reduction', 'workReduce');
  tHappy.reduce();
  tHappy.log(null, 'finishing reduction', 'workReduce');
}
```

the processFunction
Once the reduce function has completed, you can continue and finish the work. In our example, the random objects that we created in each of the chunks have been combined by workReduce, and now the whole thing can be written to a sheet. Note that there are a few mandatory requirements here; in particular, you must pick up the reduced data like this.

```javascript
var reduced = tHappy.somethingToProcess ();
```
```javascript
function workProcess() {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  // all is over, we get the reduced data and do something with it
  var reduced = tHappy.somethingToProcess ();
  tHappy.log(null, 'starting processing for job ' +
    (reduced ? JSON.stringify(reduced) : ' - but nothing to do'), 'workProcess');
  if (reduced) {
    // do something with the data - for this example we're going to copy it to a spreadsheet
    var sheetHandler = new cDataHandler.DataHandler (
      'thappytest',
      cDataHandler.dhConstants.DB.SHEET,
      undefined,
      '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY');
    if (!sheetHandler.isHappy()) {
      throw ('failed to get handler for sheet processing');
    }
    // delete current sheet data
    tHappy.handledOk(sheetHandler.remove());
    // add new data
    tHappy.handledOk(sheetHandler.save(reduced.result));
    // mark it as processed
    tHappy.processed(reduced);
    // we'll use the logger too
    tHappy.log(null, 'finished processing', 'workProcess');
    // clean up any triggers now that we know we're done
    tHappy.cleanupAllTriggers();
  }
}
```

Debugging
As mentioned, debugging is tricky. It's better to have a function that runs your scripts on a subset of data serially before moving on to running by triggers. Set control.triggers = false, then create a function like the one below. This will run through the mapping of the tasks one by one, then the reduce function, then the processing function. Once you reliably get the result you want, you can move on to trying it in parallel by setting control.triggers = true.

```javascript
function endTest () {
  // divide up the work
  var control = getControl();
  var n = splitJobIntoTasks();
  if (control.triggers) {
  }
  else {
    // this is just a direct end to end test - no triggers
    // do each of the tasks
    for (var i = 0; i < n; i++) {
      workMap();
    }
    // reduce
    workReduce();
    // do something with the result
    workProcess();
  }
}
```

Logging
A .log() method is provided to allow you to log whatever you want.
Various logging is done by default, but you can add your own, for example tHappy.log(null, 'finishing reduction', 'workReduce'); The original post shows example log file fragments, first with triggering disabled, then the same thing with triggering enabled.

Reporting
It's sometimes useful to take a look inside the orchestration files. If you've used Drive as your database, you can just open them. However, there is also a .report() method to give a summary view.

```javascript
function report () {
  // need this for each function that might be triggered
  new cTriggerHappy.TriggerHappy (getControl()).report();
}
```

Keys and instances
Each task, job and reduction has a unique key. This will help you track down problems if you need to. You'll also notice an instance id on the logger. Each triggered task has a unique instance id so you can track its progress in the logger. Note that this is independent of the trigger ID, which is allocated by Apps Script. The instance id is used in both triggered and inline operation.

Cleaning up
TriggerHappy does not automatically clean up its files. It may be that the reduce data, or even the individual task data, needs to be reused. I'm also considering enabling a rescheduler so that entire jobs can be run multiple times; that would mean the job files could also be useful. However, if you don't need any of that, there are pre-baked methods for cleaning everything up.