As the scriptDB deprecation date approaches, you may have decided which database to move to and be making plans for the switchover. The problem is that if you have many thousands of records you might hit the execution time quota. You could certainly split the job up, but as noted in Comparing all back ends performance, the most you’ll get out of scriptDB querying is about 2,000 records a minute, and if your back end is parse.com (about the fastest), the most you can write is about 5,000 a minute. So with more than about 8,000 records (roughly 4 minutes of reading plus 1.6 minutes of writing) you’ll probably hit the 6 minute execution limit, and running a very large conversion serially could tie you up for a while.
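A rough back-of-the-envelope check, using the approximate throughput figures above (real rates will vary with record size and network conditions):

function estimateSerialMinutes (records) {
  // approximate throughputs from Comparing all back ends performance
  var scriptDbReadsPerMinute = 2000;
  var parseWritesPerMinute = 5000;
  return records / scriptDbReadsPerMinute + records / parseWritesPerMinute;
}

// estimateSerialMinutes(8000) gives about 5.6 minutes - right up against the 6 minute quota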

In Parallel processing in Apps Script I showed how you could create a simple map/reduce method to run a number of processes at once. Here’s how you can split up your work and run many threads at the same time. We are going to use Database abstraction with google apps script to read and write to the database, as well as the control and log structures required by Parallel processing in Apps Script.
Parallel processing with unpredictable sequencing is not an easy subject, but once you figure it out, it may be useful for all sorts of longer running processes.
Libraries
You’ll need these libraries

cTriggerHappy:

MuIOvLUHIRpRlID7V_gEpMqi_d-phDA33

which you can include or fork as you prefer. Another library you need in your application is Database abstraction with google apps script, which is

Mj61W-201_t_zC9fJg1IzYiz3TLx7pV4j

Starting off

You should read Parallel processing in Apps Script for the details of how this works, but in summary –

  • Your job is split into a number of chunks
  • Each of these chunks is run independently and simultaneously – running each chunk is called ‘mapping’
  • An additional run called ‘reduce’ is scheduled and run when all the mapping is finished. This consolidates the results of each of the mappings into a single result.
  • A final run called ‘process’ is scheduled and run. This does any post processing required once all the data is consolidated by the reduce function.
These split, map, reduce and process functions are all pretty much boilerplate. In our example, we want to take a large scriptDB database and convert it to a parse.com database. Using Database abstraction with google apps script, this can normally be accomplished in a few lines of code.

Something like this.

var scriptHandler = new cDataHandler.DataHandler ('random', cDataHandler.dhConstants.DB.SCRIPTDB,  undefined, 'myddb', ScriptDb.getMyDb());
  assert(scriptHandler.isHappy(), 'unable to get scriptdb handler','handler');

  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler ( 'random', cDataHandler.dhConstants.DB.PARSE, undefined, 'myParse', JSON.parse(userStore.getProperty("parseKeys")));
  assert(handler.isHappy(), 'unable to get parse handler','handler');

  var result = handler.save ( scriptHandler.query().data);
  assert (result.handleCode >= 0, result, 'writing to parse');

The problem starts when you hit quota limits on the time to read and write. Parse is much faster than scriptDB (see Comparing all back ends performance), but you can still hit the processing time limit doing both. We could split the work into bits like this – it would be the easy route.

var result = handler.save ( scriptHandler.query(undefined, {limit:1000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:1000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:2000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:3000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

 ... etc

But we’d need to run each of these separately, and looking after that could take a long time. Using the boilerplate from Parallel processing in Apps Script, we can instead do something like this.

The split

We’ll set some maximum chunk size that should be run in one go, then use the boilerplate below – this deletes all the current parse.com data, gets the size of the current scriptDB and lines up a number of map jobs to run.

function splitJobIntoTasks () {
   
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  
  // get the scriptdb to see how much work there is
  var handler = getTheScriptDb();

  var result = handler.count();
  assert (result.handleCode >=0, result, 'counting scriptdb');
  var toDo = result.data[0].count;
  var total = toDo;
  var maxChunkSize = 1000;

  tHappy.log (null, 'deleting parse data', 'splitJobIntoTasks');
  var parseHandler = getTheParse();
  var result = parseHandler.remove();
  assert (result.handleCode >= 0, result, 'deleting parse data');
  
  // i'm splitting the work in chunks
  tHappy.log (null, 'starting to split','splitJobIntoTasks');
  tHappy.init ();
  var chunkNumber;
  for  (chunkNumber=0 ; toDo > 0 && maxChunkSize ; chunkNumber++) {
    var n = Math.min(toDo, maxChunkSize);
    tHappy.saveTask ( {
      index:chunkNumber,
      skip:total-toDo,
      limit:n 
    });
    toDo-=n;
  }

  tHappy.log(null, 'finished splitting');
  tHappy.triggerTasks ();
  tHappy.log(null, 'triggering is done','splitJobIntoTasks');
  return chunkNumber;
}
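
To kick off a migration you just run the split yourself; everything after that is scheduled and triggered by cTriggerHappy. A minimal entry point sketch (the startMigration name is just an illustration, not part of the library):

function startMigration () {
  // queue up the map tasks - cTriggerHappy triggers workMap, workReduce and workProcess from there
  var chunks = splitJobIntoTasks();
  Logger.log('queued ' + chunks + ' map tasks');
}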

The map

Some number of these will be scheduled to run simultaneously – each one will run a chunk of work as defined by tHappy.saveTask() during the split phase.

function workMap() {

     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  
  try {
    // your result goes here
    var result = {data:null,handleCode:0,handleError:'',task:tHappy.somethingToMap()}; 
    // first find something to do
    
    
    // if anything to do 
    if (result.task) {
      
      tHappy.log( null, 'starting mapping for job ' +  result.task.jobKey + '/' + result.task.taskIndex +  ' task ' + result.task.key  ,'workMap');
      
      // get the scriptDB
      var handler = getTheScriptDb();
      
      // get the parse.com
      var parseHandler = getTheParse();
      
      // get the data segment
      var sData = handler.query(undefined, {skip:result.task.params.skip, limit:result.task.params.limit});
      assert (sData.handleCode >=0 && sData.data.length === result.task.params.limit, sData, ' getting scriptdb data');
      tHappy.log (null, 'got ' + sData.data.length + ' records from scriptDB starting at ' + result.task.params.skip, 'workMap');
      result.data = [result.task.params.skip,sData.data.length];
      
      // write it to parse
      var pData = parseHandler.save(sData.data);
      assert (pData.handleCode >=0 , result, ' writing parse data');
      tHappy.log (null, 'wrote ' + sData.data.length + ' records to parse starting at ' + result.task.params.skip, 'workMap');
      
      tHappy.finished (result);
      tHappy.log(null, 'finished mapping');
    }
  }
  catch (err) {
    tHappy.log(null, err, 'mapping failure');
    throw err;
  }
  
  return {handleError: result.handleError, handleCode: result.handleCode};

}

The reduction

In this example, there’s nothing much to do, since the map operations were independent of each other. This just consolidates the results and triggers a process. To ensure workflow integrity, a reduce must always follow a set of maps, even though in this case it’s not doing a lot.

function workReduce () {
     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  try {
    // bring all the results together
    tHappy.log(null, 'starting reduction','workReduce');
    tHappy.reduce();
    tHappy.log(null, 'finishing reduction','workReduce');
  }
  catch (err) {
    tHappy.log(null, err, 'reduction failure');
    throw err;
  }
}

The process

Again, not too much to do. We’ll just check that parse.com ended up with the expected number of records.

function workProcess() {

     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  
  // all is over, we get the reduced data and do something with it.
  try {
    var reduced = tHappy.somethingToProcess ();
    tHappy.log( null, 'starting processing for job ' + (reduced ? JSON.stringify(reduced) : ' - but nothing to do'),'workProcess');
    if (reduced) {
     
      // we'll just report on the parse records
      var parseHandler = getTheParse();
      var pData = parseHandler.count();
      assert (pData.handleCode >=0,pData,'counting parse data');
      
      tHappy.log (null, 'parse contains ' + pData.data[0].count + ' records', 'workProcess');
      
    
      // mark it as processed
      tHappy.processed(reduced);
      
      // we'll use the logger too
      tHappy.log( null, 'finished processing','workProcess');
      
      // clean up any triggers since we know we're done
      tHappy.cleanupAllTriggers();
      
    }
  }
  catch (err) {
    tHappy.log(null, err, 'processing failure');
    throw err;
  }
}

Some utilities

I created a couple of functions that are used a number of times here. Note that I’ve specified a siloId (‘random’) in this example. If you are already using Database abstraction with google apps script, then your scriptDB will have this. If not, you can just leave it as null and it will take the whole scriptDB. Note also that you’ll need to have set up your parse credentials in your property store (see DriverParse) – there’s a setup sketch below.
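
As a one-off, those credentials could be stored like this, assuming the property is a JSON string saved under the key "parseKeys" as getTheParse() below expects – the exact fields depend on your DriverParse setup, so treat the field names here as placeholders:

function oneOffStoreParseKeys () {
  // store the parse.com credentials once so getTheParse() can pick them up later
  // the field names below are illustrative - see DriverParse for what your version expects
  PropertiesService.getScriptProperties().setProperty("parseKeys", JSON.stringify({
    applicationID: "your-parse-application-id",
    restAPIKey: "your-parse-rest-api-key"
  }));
}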

function getTheScriptDb() {

  var handler = new cDataHandler.DataHandler (
    'random',                        
    cDataHandler.dhConstants.DB.SCRIPTDB,      
    undefined,
    'myddb',
    ScriptDb.getMyDb());
  
  assert(handler.isHappy(), 'unable to get scriptdb handler','handler');
  
  return handler;

}
function getTheParse() {

  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler (
    'random',                        
    cDataHandler.dhConstants.DB.PARSE,      
    undefined,
    'myParse',
    JSON.parse(userStore.getProperty("parseKeys")));
  
  assert(handler.isHappy(), 'unable to get parse handler','handler');
  
  return handler;

}
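
These examples also lean on a small assert() helper that isn’t part of either library. A minimal sketch that matches the way it’s called above (a test, something to report, and a context string) could be:

function assert (test, message, context) {
  // throw with some context if the test fails - message is often a handler result object
  if (!test) {
    throw new Error('assertion failure in ' + context + ': ' + JSON.stringify(message));
  }
  return test;
}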

The log

You’ll notice tHappy.log() being called a few times. It’s tough to debug simultaneous triggered tasks, but .log() can be used to report the progress of each of the various threads in a single place. Here’s a typical log. You’ll see that pretty much all 6 mapping operations were running simultaneously, and about 15 minutes’ worth of processing was done in about 6.

when comment triggerId triggerFunc instance
30 July 2014 15:01:24 BST deleting parse data splitJobIntoTasks gyhy8q1zp1
30 July 2014 15:01:24 BST starting to split splitJobIntoTasks gyhy8q1zp1
30 July 2014 15:01:46 BST finished splitting gyhy8q1zp1
30 July 2014 15:01:47 BST triggered 0 5.32857E+18 workMap gyhy8q1zp1
30 July 2014 15:01:50 BST triggered 1 8.87382E+18 workMap gyhy8q1zp1
30 July 2014 15:01:51 BST triggered 2 1.49364E+17 workMap gyhy8q1zp1
30 July 2014 15:01:52 BST triggered 3 2.6416E+18 workMap gyhy8q1zp1
30 July 2014 15:01:53 BST triggered 4 3.33949E+18 workMap gyhy8q1zp1
30 July 2014 15:01:54 BST triggered 5 6.50636E+18 workMap gyhy8q1zp1
30 July 2014 15:01:54 BST triggered 6 tasks from 6 (thread limit 0) triggerTasks gyhy8q1zp1
30 July 2014 15:01:55 BST triggering is done splitJobIntoTasks gyhy8q1zp1
30 July 2014 15:02:03 BST starting mapping for job ujhy8q22zl/0 task gahy8q25i5 workMap ewhy8q2t4t
30 July 2014 15:02:07 BST starting mapping for job ujhy8q22zl/1 task uqhy8q27tj workMap sghy8q2vlq
30 July 2014 15:02:07 BST got 1600 records from scriptDB starting at 0 workMap ewhy8q2t4t
30 July 2014 15:02:12 BST starting mapping for job ujhy8q22zl/2 task hvhy8q2a79 workMap dphy8q2z39
30 July 2014 15:02:13 BST got 1600 records from scriptDB starting at 1600 workMap sghy8q2vlq
30 July 2014 15:02:17 BST got 1600 records from scriptDB starting at 3200 workMap dphy8q2z39
30 July 2014 15:02:17 BST starting mapping for job ujhy8q22zl/3 task ruhy8q2c55 workMap mbhy8q347g
30 July 2014 15:02:22 BST got 1600 records from scriptDB starting at 4800 workMap mbhy8q347g
30 July 2014 15:02:52 BST starting mapping for job ujhy8q22zl/4 task hahy8q2eal workMap qhhy8q3su0
30 July 2014 15:02:59 BST got 1600 records from scriptDB starting at 6400 workMap qhhy8q3su0
30 July 2014 15:04:21 BST wrote 1600 records to parse starting at 0 workMap ewhy8q2t4t
30 July 2014 15:04:32 BST triggered 5 7.6271E+17 workMap ewhy8q2t4t
30 July 2014 15:04:33 BST triggered 1 tasks from 1 (thread limit 0) triggerTasks ewhy8q2t4t
30 July 2014 15:04:34 BST finished mapping ewhy8q2t4t
30 July 2014 15:04:47 BST wrote 1600 records to parse starting at 1600 workMap sghy8q2vlq
30 July 2014 15:04:49 BST wrote 1600 records to parse starting at 3200 workMap dphy8q2z39
30 July 2014 15:04:56 BST wrote 1600 records to parse starting at 4800 workMap mbhy8q347g
30 July 2014 15:05:05 BST wrote 1600 records to parse starting at 6400 workMap qhhy8q3su0
30 July 2014 15:05:19 BST starting mapping for job ujhy8q22zl/5 task fkhy8q2gwe workMap bmhy8q6l55
30 July 2014 15:05:21 BST got 100 records from scriptDB starting at 8000 workMap bmhy8q6l55
30 July 2014 15:05:25 BST wrote 100 records to parse starting at 8000 workMap bmhy8q6l55
30 July 2014 15:05:50 BST finished mapping dphy8q2z39
30 July 2014 15:05:52 BST finished mapping sghy8q2vlq
30 July 2014 15:05:53 BST finished mapping mbhy8q347g
30 July 2014 15:06:05 BST triggered 0 2.8631E+18 workReduce bmhy8q6l55
30 July 2014 15:06:06 BST finished mapping bmhy8q6l55
30 July 2014 15:06:21 BST starting reduction workReduce cnhy8q8ew2
30 July 2014 15:06:45 BST triggered 0 9.03126E+18 workProcess cnhy8q8ew2
30 July 2014 15:06:47 BST finishing reduction workReduce cnhy8q8ew2
30 July 2014 15:07:25 BST starting processing for job {"key":"fshy8q8lgc","jobKey":"ujhy8q22zl","result":[0,1600,1600,1600,3200,1600,4800,1600,6400,1600,8000,100],"start":1406729189532,"finish":1406729189532,"handleCode":0,"handleError":""} workProcess cshy8q9e99
30 July 2014 15:07:27 BST parse contains 8100 records workProcess cshy8q9e99
30 July 2014 15:07:34 BST finished processing workProcess cshy8q9e99

The control

This can use any of the back ends supported by Database abstraction with google apps script for the control files and logs, but I prefer using Drive for control files and a sheet for the log file. You can set all that up here. You’ll need to create and specify the Drive folders and a spreadsheet to use for logging if you are using the defaults below.

function getControl () {
  return {
    script: {
      id: "1i5qxNh1qs_-b6hKRMY_W1MR4nnwuyhHQG6Y9tSINrgyqFQW6qH-GMP__",
      reduceFunction: 'workReduce',
      taskFunction:'workMap',
      processFunction:'workProcess'
    },
    taskAccess: {
      siloId:  'migrateTasks.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    logAccess: {
      siloId:  'thappylog',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    reductionAccess: {
      siloId:  'migrateReductions.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    jobAccess: {
      siloId:  'migrateJobs.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    reportAccess: {
      siloId:  'thappyreport',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    triggers: true,
    delay:5000,
    enableLogging:true,
    threads:0,
    stagger:1000, 
    killStage:false,
    debugGrabbing:false
  };
}

The report

Getting a report on the orchestration database or Drive files can sometimes be helpful. You can get one like this, and it will end up wherever you’ve set control.reportAccess to save to.

function report () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  tHappy.report();
}

Orchestration

One of the biggest challenges in all this was the locking mechanism between multiple instances of the same script. I’m using the library from Using named locks with Google Apps Scripts for all locking. Whether or not to log debugging of locking attempts can be switched on and off with the control.debugGrabbing property. It’s used like this. Take a look at this log output if things don’t seem to be going as expected.

function grabControl(who, f) {

  if (control_.debugGrabbing) {
    self.log(null, who, 'trying to grab ' + new Date().getTime());
  }

  var info;
  var r = new cNamedLock.NamedLock(undefined, undefined, true)
    .setKey('thappy', control_.jobAccess)
    .protect(who + '_' + instance_, function (lock) {
      if (control_.debugGrabbing) {
        info = ':lock-' + JSON.stringify(lock.getInfo());
        self.log(null, who, 'grabbed ' + new Date().getTime() + info);
      }
      return f();
    }).result;

  if (control_.debugGrabbing) {
    self.log(null, who, 'ungrabbed ' + new Date().getTime() + info);
  }
  return r;
}

You can see the kind of output you’ll get below, where you can follow the interplay between tasks as they negotiate who does what next. Note that you should use the timestamp when examining the log – since it is also protected by locking – and locks are not guaranteed to be granted in the order they are asked for, since deadlocks are avoided by using a random retry interval.

30 July 2014 14:17:26 BST triggerTasks grabbed 1406726246451:lock-{"id":"zry8ohifj","key":"y9DckumDse6i72+k381bjg==","when":1406726246378,"who":"triggerTasks_sphy8ogu96","expires":1406726336378} sphy8ogu96
30 July 2014 14:17:28 BST triggered 0 8.3724E+18 workMap sphy8ogu96
30 July 2014 14:17:30 BST triggered 1 5.64626E+18 workMap sphy8ogu96
30 July 2014 14:17:32 BST triggered 2 1.39194E+17 workMap sphy8ogu96
30 July 2014 14:17:34 BST triggered 3 8.84058E+18 workMap sphy8ogu96
30 July 2014 14:17:36 BST triggered 4 4.85608E+18 workMap sphy8ogu96
30 July 2014 14:17:37 BST somethingtomap trying to grab 1406726257242 xphy8ohqvu
30 July 2014 14:17:37 BST triggered 5 1.16435E+17 workMap sphy8ogu96
30 July 2014 14:17:38 BST triggered 6 tasks from 6 (thread limit 8) triggerTasks sphy8ogu96
30 July 2014 14:17:38 BST somethingtomap trying to grab 1406726258689 kdhy8ohs01
30 July 2014 14:17:39 BST somethingtomap grabbed 1406726259858:lock-{"id":"wwy8ohsj1","key":"y9DckumDse6i72+k381bjg==","when":1406726259720,"who":"somethingtomap_kdhy8ohs01","expires":1406726349720} kdhy8ohs01
30 July 2014 14:17:38 BST triggerTasks ungrabbed 1406726258959:lock-{"id":"zry8ohifj","key":"y9DckumDse6i72+k381bjg==","when":1406726246378,"who":"triggerTasks_sphy8ogu96","expires":1406726336378} sphy8ogu96
30 July 2014 14:17:41 BST triggering is done splitJobIntoTasks sphy8ogu96
30 July 2014 14:17:43 BST somethingtomap ungrabbed 1406726263004:lock-{"id":"wwy8ohsj1","key":"y9DckumDse6i72+k381bjg==","when":1406726259720,"who":"somethingtomap_kdhy8ohs01","expires":1406726349720} kdhy8ohs01
30 July 2014 14:17:43 BST starting mapping for job afhy8ogwfv/0 task hfhy8ogzkv workMap kdhy8ohs01
30 July 2014 14:17:45 BST got 1000 records from scriptDB starting at 0 workMap kdhy8ohs01
30 July 2014 14:18:18 BST wrote 1000 records to parse starting at 0 workMap kdhy8ohs01
30 July 2014 14:18:19 BST TriggerHappy.finished trying to grab 1406726299207 kdhy8ohs01
30 July 2014 14:18:20 BST TriggerHappy.finished grabbed 1406726300263:lock-{"id":"ufy8oinqo","key":"y9DckumDse6i72+k381bjg==","when":1406726300130,"who":"TriggerHappy.finished_kdhy8ohs01","expires":1406726390130} kdhy8ohs01
30 July 2014 14:18:27 BST TriggerHappy.finished ungrabbed 1406726307820:lock-{"id":"ufy8oinqo","key":"y9DckumDse6i72+k381bjg==","when":1406726300130,"who":"TriggerHappy.finished_kdhy8ohs01","expires":1406726390130} kdhy8ohs01
For more on this, see Parallel processing in Apps Script and Database abstraction with google apps script.