As the scriptDB deprecation date approaches, you may have decided which database to move to and be making plans for the switchover. The problem is that if you have many thousands of records, you might hit the execution-time quota. You could certainly split the work up, but as noted in Comparing all back ends performance, the most you'll get out of scriptDB querying is about 2000 records a minute, and if your back end is parse.com (about the fastest), the most you can write is about 5000 a minute. So if you have more than about 8000 records you'll probably hit the 6 minute limit. Running a very large conversion in serial could tie you up for a while.
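Those throughput figures make it easy to estimate whether a migration will fit in a single run. Here's a rough sketch – the rates and quota are the approximate figures quoted above, not guarantees, and real throughput will vary:

```javascript
// Back-of-the-envelope migration time estimate, using the rough
// throughput figures quoted above (these will vary in practice).
var READ_PER_MIN = 2000;   // approx scriptDB query throughput
var WRITE_PER_MIN = 5000;  // approx parse.com write throughput
var QUOTA_MINUTES = 6;     // Apps Script execution time limit

function estimateMinutes(records) {
  // reading and writing happen in the same run, so the times add up
  return records / READ_PER_MIN + records / WRITE_PER_MIN;
}

function fitsInOneRun(records) {
  return estimateMinutes(records) <= QUOTA_MINUTES;
}
```

At these rates, 8000 records takes about 5.6 minutes – uncomfortably close to the limit, which is why anything bigger needs splitting up.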
Parallel processing with unpredictable sequencing is not an easy subject, but once you figure it out, it may be useful for all sorts of longer running processes.
Libraries
You’ll need these libraries
cTriggerHappy:
MuIOvLUHIRpRlID7V_gEpMqi_d-phDA33
which you can include or fork as you prefer. Another library you need in your application is Database abstraction with google apps script, which is
Mj61W-201_t_zC9fJg1IzYiz3TLx7pV4j
Starting off
You should read Parallel processing in Apps Script for the detail of how this works, but in summary –
- Your job is split into a number of chunks.
- Each chunk is run independently and simultaneously – each of these runs is called a 'map'.
- An additional run called 'reduce' is scheduled to run when all the mapping is finished. This consolidates the results of each of the maps into a single result.
- A final run called 'process' is scheduled and run. This does any post-processing required once all the data has been consolidated by the reduce function.
Something like this.
function migrate() {
  var scriptHandler = new cDataHandler.DataHandler ('random',
    cDataHandler.dhConstants.DB.SCRIPTDB,
    undefined,
    'myddb',
    ScriptDb.getMyDb());
  assert(scriptHandler.isHappy(), 'unable to get scriptdb handler', 'handler');

  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler ('random',
    cDataHandler.dhConstants.DB.PARSE,
    undefined,
    'myParse',
    JSON.parse(userStore.getProperty("parseKeys")));
  assert(handler.isHappy(), 'unable to get parse handler', 'handler');

  var result = handler.save (scriptHandler.query().data);
  assert (result.handleCode >= 0, result, 'writing to parse');
}
The problem starts when you hit quota limits on the time to read and write. Parse is much faster than scriptDB (see Comparing all back ends performance), but you can still soon hit the processing time limits doing both. We could split it into bits like this – this would be the easy route.
var result = handler.save (scriptHandler.query(undefined, {limit:1000}).data);
assert (result.handleCode >= 0, result, 'writing to parse');

var result = handler.save (scriptHandler.query(undefined, {limit:1000, skip:1000}).data);
assert (result.handleCode >= 0, result, 'writing to parse');

var result = handler.save (scriptHandler.query(undefined, {limit:1000, skip:2000}).data);
assert (result.handleCode >= 0, result, 'writing to parse');

var result = handler.save (scriptHandler.query(undefined, {limit:1000, skip:3000}).data);
assert (result.handleCode >= 0, result, 'writing to parse');

... etc
But we'd need to run each one separately, and looking after that could take a long time. Using the boilerplate from Parallel processing in Apps Script, we can do something like this.
The split
We'll set some maximum chunk size that should be run in one go, then use the boilerplate below – it deletes all the current parse.com data, gets the size of the current scriptDB, and lines up a number of map jobs to run.
function splitJobIntoTasks () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());

  // get the scriptdb to see how much work there is
  var handler = getTheScriptDb();
  var result = handler.count();
  assert (result.handleCode >= 0, result, 'counting scriptdb');
  var toDo = result.data[0].count;
  var total = toDo;
  var maxChunkSize = 1000;

  tHappy.log (null, 'deleting parse data', 'splitJobIntoTasks');
  var parseHandler = getTheParse();
  var result = parseHandler.remove();
  assert (result.handleCode >= 0, result, 'deleting parse data');

  // i'm splitting the work in chunks
  tHappy.log (null, 'starting to split', 'splitJobIntoTasks');
  tHappy.init ();

  var chunkNumber;
  for (chunkNumber = 0; toDo > 0 && maxChunkSize; chunkNumber++) {
    var n = Math.min(toDo, maxChunkSize);
    tHappy.saveTask ({ index: chunkNumber, skip: total - toDo, limit: n });
    toDo -= n;
  }

  tHappy.log (null, 'finished splitting', 'splitJobIntoTasks');
  tHappy.triggerTasks ();
  tHappy.log (null, 'triggering is done', 'splitJobIntoTasks');
  return chunkNumber;
}
The Map
Some number of these will be scheduled to run simultaneously – each one will run a chunk of work as defined by tHappy.saveTask() during the split phase.
function workMap() {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  try {
    // your result goes here - first find something to do
    var result = {
      data: null,
      handleCode: 0,
      handleError: '',
      task: tHappy.somethingToMap()
    };
    // if anything to do
    if (result.task) {
      tHappy.log (null, 'starting mapping for job ' + result.task.jobKey +
        '/' + result.task.taskIndex + ' task ' + result.task.key, 'workMap');

      // get the scriptDB and parse.com handlers
      var handler = getTheScriptDb();
      var parseHandler = getTheParse();

      // get the data segment
      var sData = handler.query(undefined, {
        skip: result.task.params.skip,
        limit: result.task.params.limit
      });
      assert (sData.handleCode >= 0 && sData.data.length === result.task.params.limit,
        sData, ' getting scriptdb data');
      tHappy.log (null, 'got ' + sData.data.length +
        ' records from scriptDB starting at ' + result.task.params.skip, 'workMap');
      result.data = [result.task.params.skip, sData.data.length];

      // write it to parse
      var pData = parseHandler.save(sData.data);
      assert (pData.handleCode >= 0, result, ' writing parse data');
      tHappy.log (null, 'wrote ' + sData.data.length +
        ' records to parse starting at ' + result.task.params.skip, 'workMap');

      tHappy.finished (result);
      tHappy.log (null, 'finished mapping', 'workMap');
    }
  }
  catch (err) {
    tHappy.log (null, err, 'mapping failure');
    throw err;
  }
  return {handleError: result.handleError, handleCode: result.handleCode};
}
The reduction
In this example there's not much to do, since the map operations were independent of each other; this step just consolidates the results and triggers a process. To ensure workflow integrity, a reduce must always follow a set of maps, even though in this case it's not doing a lot.
function workReduce () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  try {
    // bring all the results together
    tHappy.log (null, 'starting reduction', 'workReduce');
    tHappy.reduce();
    tHappy.log (null, 'finishing reduction', 'workReduce');
  }
  catch (err) {
    tHappy.log (null, err, 'reduction failure');
    throw err;
  }
}
The process
Again, not too much to do. We’ll just check that parse.com ended up with the expected number of records.
function workProcess() {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  // all is over - we get the reduced data and do something with it
  try {
    var reduced = tHappy.somethingToProcess ();
    tHappy.log (null, 'starting processing for job ' +
      (reduced ? JSON.stringify(reduced) : ' - but nothing to do'), 'workProcess');
    if (reduced) {
      // we'll just report on the parse records
      var parseHandler = getTheParse();
      var pData = parseHandler.count();
      assert (pData.handleCode >= 0, pData, 'counting parse data');
      tHappy.log (null, 'parse contains ' + pData.data[0].count + ' records', 'workProcess');

      // mark it as processed
      tHappy.processed (reduced);
      tHappy.log (null, 'finished processing', 'workProcess');

      // clean up any triggers - we know we're done
      tHappy.cleanupAllTriggers();
    }
  }
  catch (err) {
    tHappy.log (null, err, 'processing failure');
    throw err;
  }
}
Some utilities
I created a couple of functions that are used a number of times here. Note that I've specified a siloId ('random') in this example. If you are already using Database abstraction with google apps script, then your scriptDB will have this. If not, you can just leave it as null and it will take the whole scriptDB. Note also that you'll need to have set up your parse.com credentials in your property store (see DriverParse).
function getTheScriptDb() {
  var handler = new cDataHandler.DataHandler ('random',
    cDataHandler.dhConstants.DB.SCRIPTDB,
    undefined,
    'myddb',
    ScriptDb.getMyDb());
  assert(handler.isHappy(), 'unable to get scriptdb handler', 'handler');
  return handler;
}

function getTheParse() {
  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler ('random',
    cDataHandler.dhConstants.DB.PARSE,
    undefined,
    'myParse',
    JSON.parse(userStore.getProperty("parseKeys")));
  assert(handler.isHappy(), 'unable to get parse handler', 'handler');
  return handler;
}
The log
You'll notice tHappy.log() being called a few times. It's tough to debug simultaneous triggered tasks, but .log() can be used to report the progress of each of the various threads in a single place. Here's a typical log extract. You'll see that pretty much all 6 mapping operations were running simultaneously, and about 15 minutes' worth of processing was done in about 6.
30 July 2014 15:07:34 BST | triggered 1 tasks from 1 (thread limit 0)
The control
This can use any of the Database abstraction with google apps script supported backends for control files and logs, but I prefer using Drive for control files and Sheets for the log file. You can set all that up here. You'll need to create and specify Drive folders and a spreadsheet to use for logging if you are using the defaults below.
function getControl () {
  return {
    script: {
      id: "1i5qxNh1qs_-b6hKRMY_W1MR4nnwuyhHQG6Y9tSINrgyqFQW6qH-GMP__",
      reduceFunction: 'workReduce',
      taskFunction: 'workMap',
      processFunction: 'workProcess'
    },
    taskAccess: {
      siloId: 'migrateTasks.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    logAccess: {
      siloId: 'thappylog',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    reductionAccess: {
      siloId: 'migrateReductions.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    jobAccess: {
      siloId: 'migrateJobs.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    reportAccess: {
      siloId: 'thappyreport',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    triggers: true,
    delay: 5000,
    enableLogging: true,
    threads: 0,
    stagger: 1000,
    killStage: false,
    debugGrabbing: false
  };
}
The report
Getting a report of the orchestration database or Drive files can sometimes be helpful. You can get one like this, and it will end up wherever you've set control.reportAccess to save to.
function report () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  tHappy.report();
}
Orchestration
One of the biggest challenges of all this was the locking mechanism between multiple instances of the same script. I'm using the library from Using named locks with Google Apps Scripts for all locking. Whether or not to log debugging of locking attempts can be switched on and off with the control.debugGrabbing property. It's used like this. Take a look at this log output if things don't seem to be going as expected.
function grabControl (who, f) {
  if (control_.debugGrabbing) {
    self.log (null, who, 'trying to grab ' + new Date().getTime());
  }
  var info;
  var r = new cNamedLock.NamedLock(undefined, undefined, true)
    .setKey('thappy', control_.jobAccess)
    .protect(who + '_' + instance_, function (lock) {
      if (control_.debugGrabbing) {
        info = ':lock-' + JSON.stringify(lock.getInfo());
        self.log (null, who, 'grabbed ' + new Date().getTime() + info);
      }
      return f();
    })
    .result;
  if (control_.debugGrabbing) {
    self.log (null, who, 'ungrabbed ' + new Date().getTime() + info);
  }
  return r;
}
You can see the kind of output you'll get below, where you can follow the game between the tasks as each tries to control who does what next. Note that you should use the timestamp when examining the log – since the log is also protected by locking, it's not guaranteed that locks are granted in the order they are asked for, because deadlocks are avoided by using a random retry interval.
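The random retry idea can be sketched as below. This is a simplified illustration of the technique only, not the actual cNamedLock internals; tryToGrab and sleep are hypothetical stand-ins, and in Apps Script you would pass Utilities.sleep as the sleep function.

```javascript
// Simplified sketch of lock acquisition with randomized backoff.
// A random wait between retries stops two script instances from
// retrying in lockstep and starving each other indefinitely.
function acquireWithBackoff(tryToGrab, sleep, maxAttempts) {
  for (var attempt = 0; attempt < maxAttempts; attempt++) {
    if (tryToGrab()) return true;                 // got the lock
    // wait somewhere between 500ms and 2s before trying again
    sleep(500 + Math.floor(Math.random() * 1500));
  }
  return false;                                   // gave up
}
```

In Apps Script you'd call this as acquireWithBackoff(tryFn, Utilities.sleep, 5). cNamedLock wraps this kind of loop up with lock ownership and expiry; this is just the retry skeleton.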
For help and more information join our forum, follow the blog or follow me on Twitter