Converting from scriptDB to parse.com - multiple simultaneous threads

As the scriptDB deprecation date approaches, you may have decided which database to move to and be making plans for the switchover. The problem is that if you have many thousands of records, you might hit the execution time quota. You could certainly split the work up, but as noted in Comparing all back ends performance, the most you'll get out of scriptDB querying is about 2000 records a minute, and if your back end is parse.com (about the fastest), the most you can write is about 5000 a minute. So if you have more than about 8000 records you'll probably hit the 6 minute limit. Running a very large conversion in serial could tie you up for a while.
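Plugging those rates into some quick arithmetic shows where the limit bites (the rates are just the rough figures quoted above, not exact quotas):

```javascript
// Rough throughput figures: ~2000 scriptDB reads/min, ~5000 parse.com writes/min.
// Each record therefore costs 1/2000 + 1/5000 minutes of execution time.
function maxRecordsInQuota(readsPerMin, writesPerMin, quotaMinutes) {
  var minutesPerRecord = 1 / readsPerMin + 1 / writesPerMin;
  return Math.floor(quotaMinutes / minutesPerRecord);
}

// With a 6 minute execution limit, a serial conversion tops out around 8500 records.
var serialLimit = maxRecordsInQuota(2000, 5000, 6);
```

That's where the "more than about 8000 records" figure comes from.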

In Parallel processing in Apps Script I showed how you could create a simple map/reduce method to run a number of processes at once. Here's how you can split up your work and run many threads at the same time. We are going to use Database abstraction with google apps script to read and write to the databases, as well as the control and log structures required by Parallel processing in Apps Script.

Parallel processing with unpredictable sequencing is not an easy subject, but once you figure it out, it may be useful for all sorts of longer running processes.

Libraries

You'll need these libraries 

cTriggerHappy - MuIOvLUHIRpRlID7V_gEpMqi_d-phDA33, which you can include or fork as you prefer. Another library you need in your application is Database abstraction with google apps script, which is Mj61W-201_t_zC9fJg1IzYiz3TLx7pV4j.

Starting off

You should read Parallel processing in Apps Script for the detail of how this works, but in summary -
  • Your job is split into a number of chunks
  • Each of these chunks is run independently and simultaneously - this is called 'mapping'
  • An additional run called 'reduce' is scheduled and run when all the mapping is finished. This consolidates the results of each of the mappings into a single result.
  • A final run called 'process' is scheduled and run. This does any post-processing required once all the data has been consolidated by the reduce function. 
These split, map, reduce and process functions are all pretty much boilerplate. In our example, we want to take a large scriptDB database and convert it to a parse.com database. Using Database abstraction with google apps script this can normally be accomplished in a few lines of code. 

Something like this. 


  var scriptHandler = new cDataHandler.DataHandler ('random', cDataHandler.dhConstants.DB.SCRIPTDB, undefined, 'myddb', ScriptDb.getMyDb());
  assert(scriptHandler.isHappy(), 'unable to get scriptdb handler','handler');

  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler ( 'random', cDataHandler.dhConstants.DB.PARSE, undefined, 'myParse', JSON.parse(userStore.getProperty("parseKeys")));
  assert(handler.isHappy(), 'unable to get parse handler','handler');

  var result = handler.save ( scriptHandler.query().data);
  assert (result.handleCode >= 0, result, 'writing to parse');

The problem starts when you hit quota limits on the time to read and write. Parse is much faster than scriptDB (see Comparing all back ends performance), but you can still soon hit the processing time limits doing both.  We could split it into bits like this - this would be the easy route.

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:1000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:2000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

  var result = handler.save ( scriptHandler.query(undefined, {limit:1000,skip:3000} ).data);
  assert (result.handleCode >= 0, result, 'writing to parse');

 ... etc

But we'd need to run each chunk separately, and babysitting that could take a long time. Using the boilerplate from Parallel processing in Apps Script, we can do something like this.

The split

We'll set some maximum chunk size that should be run in one go, then use the boilerplate below - this deletes all the current parse.com data, gets the size of the current scriptDB, and lines up a number of map jobs to run.

function splitJobIntoTasks () {
   
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  
  // get the scriptdb to see how much work there is
  var handler = getTheScriptDb();

  var result = handler.count();
  assert (result.handleCode >=0, result, 'counting scriptdb');
  var toDo = result.data[0].count;
  var total = toDo;
  var maxChunkSize = 1000;

  tHappy.log (null, 'deleting parse data', 'splitJobIntoTasks');
  var parseHandler = getTheParse();
  var result = parseHandler.remove();
  assert (result.handleCode >= 0, result, 'deleting parse data');
  
  // i'm splitting the work in chunks
  tHappy.log (null, 'starting to split','splitJobIntoTasks');
  tHappy.init ();
  var chunkNumber;
  for  (chunkNumber=0 ; toDo > 0 && maxChunkSize ; chunkNumber++) {
    var n = Math.min(toDo, maxChunkSize);
    tHappy.saveTask ( {
      index:chunkNumber,
      skip:total-toDo,
      limit:n 
    });
    toDo-=n;
  }

  tHappy.log(null, 'finished splitting');
  tHappy.triggerTasks ();
  tHappy.log(null, 'triggering is done','splitJobIntoTasks');
  return chunkNumber;
}
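Stripped of the cTriggerHappy plumbing, the chunking loop above reduces to a pure function (a sketch for illustration, not part of the library):

```javascript
// Split a job of 'total' records into {index, skip, limit} task descriptions,
// each no bigger than maxChunkSize - the same arithmetic as splitJobIntoTasks().
function makeChunks(total, maxChunkSize) {
  var chunks = [], toDo = total;
  for (var chunkNumber = 0; toDo > 0; chunkNumber++) {
    var n = Math.min(toDo, maxChunkSize);
    chunks.push({ index: chunkNumber, skip: total - toDo, limit: n });
    toDo -= n;
  }
  return chunks;
}

// 8100 records in chunks of 1600 gives 6 tasks, the last of just 100 records -
// which is exactly the pattern you'll see in the log further down.
var tasks = makeChunks(8100, 1600);
```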

The Map

Some number of these will be scheduled to run simultaneously - each one will run a chunk of work as defined by tHappy.saveTask() during the split phase.

function workMap() {

     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl());
  
  try {
    // your result goes here
    var result = {data:null,handleCode:0,handleError:'',task:tHappy.somethingToMap()}; 
    // first find something to do
    
    
    // if anything to do 
    if (result.task) {
      
      tHappy.log( null, 'starting mapping for job ' +  result.task.jobKey + '/' + result.task.taskIndex +  ' task ' + result.task.key  ,'workMap');
      
      // get the scriptDB
      var handler = getTheScriptDb();
      
      // get the parse.com
      var parseHandler = getTheParse();
      
      // get the data segment
      var sData = handler.query(undefined, {skip:result.task.params.skip, limit:result.task.params.limit});
      assert (sData.handleCode >=0 && sData.data.length === result.task.params.limit, sData, ' getting scriptdb data');
      tHappy.log (null, 'got ' + sData.data.length + ' records from scriptDB starting at ' + result.task.params.skip, 'workMap');
      result.data = [result.task.params.skip,sData.data.length];
      
      // write it to parse
      var pData = parseHandler.save(sData.data);
      assert (pData.handleCode >=0 , result, ' writing parse data');
      tHappy.log (null, 'wrote ' + sData.data.length + ' records to parse starting at ' + result.task.params.skip, 'workMap');
      
      tHappy.finished (result);
      tHappy.log(null, 'finished mapping');
    }
  }
  catch (err) {
    tHappy.log(null, err, 'mapping failure');
    throw err;
  }
  
  return {handleError: result.handleError, handleCode: result.handleCode};

}

The reduction

In this example, there's nothing much to do, since the map operations were independent of each other. This just consolidates the results and triggers a process. To ensure workflow integrity, a reduce must always follow a set of maps, even though in this case it's not doing a lot.
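Conceptually, the reduction just concatenates each task's [skip, count] result pair into the single result array that eventually shows up in the processing log (this is a sketch of the idea, not cTriggerHappy's internals):

```javascript
// Each map task reported its result as [skip, count]; a reduction
// concatenates them into one flat array.
function consolidate(taskResults) {
  return taskResults.reduce(function (all, r) { return all.concat(r); }, []);
}

// Sum the count entries (every second element) to cross-check the total.
function totalWritten(reducedResult) {
  var sum = 0;
  for (var i = 1; i < reducedResult.length; i += 2) sum += reducedResult[i];
  return sum;
}

var reduced = consolidate([[0, 1600], [1600, 1600], [3200, 1600],
                           [4800, 1600], [6400, 1600], [8000, 100]]);
```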

function workReduce () {
     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  try {
    // bring all the results together
    tHappy.log(null, 'starting reduction','workReduce');
    tHappy.reduce();
    tHappy.log(null, 'finishing reduction','workReduce');
  }
  catch (err) {
    tHappy.log(null, err, 'reduction failure');
    throw err;
  }
}

The process

Again, not too much to do. We'll just check that parse.com ended up with the expected number of records. 

function workProcess() {

     
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  
  // all is over, we get the reduced data and do something with it.
  try {
    var reduced = tHappy.somethingToProcess ();
    tHappy.log( null, 'starting processing for job ' + (reduced ? JSON.stringify(reduced) : ' - but nothing to do'),'workProcess');
    if (reduced) {
     
      // we'll just report on the parse records
      var parseHandler = getTheParse();
      var pData = parseHandler.count();
      assert (pData.handleCode >=0,pData,'counting parse data');
      
      tHappy.log (null, 'parse contains ' + pData.data[0].count + ' records', 'workProcess');
      
    
      // mark it as processed
      tHappy.processed(reduced);
      
      // we'll use the logger too
      tHappy.log( null, 'finished processing','workProcess');
      
      // clean up any triggers we know we're done
      tHappy.cleanupAllTriggers();
      
    }
  }
  catch (err) {
    tHappy.log(null, err, 'processing failure');
    throw err;
  }
}

Some utilities


I created a couple of functions that are used a number of times here. Note that I've specified a siloId ('random') in this example. If you are already using Database abstraction with google apps script, then your scriptDB will have this. If not, you can just leave it as null and it will take the whole scriptDB. Note also that you'll need to have set up your parse credentials in your property store (see DriverParse).

function getTheScriptDb() {

  var handler = new cDataHandler.DataHandler (
    'random',                        
    cDataHandler.dhConstants.DB.SCRIPTDB,      
    undefined,
    'myddb',
    ScriptDb.getMyDb());
  
  assert(handler.isHappy(), 'unable to get scriptdb handler','handler');
  
  return handler;

}
function getTheParse() {

  var userStore = PropertiesService.getScriptProperties();
  var handler = new cDataHandler.DataHandler (
    'random',                        
    cDataHandler.dhConstants.DB.PARSE,      
    undefined,
    'myParse',
    JSON.parse(userStore.getProperty("parseKeys")));
  
  assert(handler.isHappy(), 'unable to get parse handler','handler');
  
  return handler;

}

The log

You'll notice tHappy.log() being called a few times. It's tough to debug triggered simultaneous tasks, but .log() can be used to report the progress of each of the various threads in a single place. Here's a typical log. You'll see that pretty much all 6 mapping operations were running simultaneously, and about 15 minutes of processing time was done in about 6.

when | comment | triggerId | triggerFunc | instance
30 July 2014 15:01:24 BST | deleting parse data | | splitJobIntoTasks | gyhy8q1zp1
30 July 2014 15:01:24 BST | starting to split | | splitJobIntoTasks | gyhy8q1zp1
30 July 2014 15:01:46 BST | finished splitting | | | gyhy8q1zp1
30 July 2014 15:01:47 BST | triggered 0 | 5.32857E+18 | workMap | gyhy8q1zp1
30 July 2014 15:01:50 BST | triggered 1 | 8.87382E+18 | workMap | gyhy8q1zp1
30 July 2014 15:01:51 BST | triggered 2 | 1.49364E+17 | workMap | gyhy8q1zp1
30 July 2014 15:01:52 BST | triggered 3 | 2.6416E+18 | workMap | gyhy8q1zp1
30 July 2014 15:01:53 BST | triggered 4 | 3.33949E+18 | workMap | gyhy8q1zp1
30 July 2014 15:01:54 BST | triggered 5 | 6.50636E+18 | workMap | gyhy8q1zp1
30 July 2014 15:01:54 BST | triggered 6 tasks from 6 (thread limit 0) | | triggerTasks | gyhy8q1zp1
30 July 2014 15:01:55 BST | triggering is done | | splitJobIntoTasks | gyhy8q1zp1
30 July 2014 15:02:03 BST | starting mapping for job ujhy8q22zl/0 task gahy8q25i5 | | workMap | ewhy8q2t4t
30 July 2014 15:02:07 BST | starting mapping for job ujhy8q22zl/1 task uqhy8q27tj | | workMap | sghy8q2vlq
30 July 2014 15:02:07 BST | got 1600 records from scriptDB starting at 0 | | workMap | ewhy8q2t4t
30 July 2014 15:02:12 BST | starting mapping for job ujhy8q22zl/2 task hvhy8q2a79 | | workMap | dphy8q2z39
30 July 2014 15:02:13 BST | got 1600 records from scriptDB starting at 1600 | | workMap | sghy8q2vlq
30 July 2014 15:02:17 BST | got 1600 records from scriptDB starting at 3200 | | workMap | dphy8q2z39
30 July 2014 15:02:17 BST | starting mapping for job ujhy8q22zl/3 task ruhy8q2c55 | | workMap | mbhy8q347g
30 July 2014 15:02:22 BST | got 1600 records from scriptDB starting at 4800 | | workMap | mbhy8q347g
30 July 2014 15:02:52 BST | starting mapping for job ujhy8q22zl/4 task hahy8q2eal | | workMap | qhhy8q3su0
30 July 2014 15:02:59 BST | got 1600 records from scriptDB starting at 6400 | | workMap | qhhy8q3su0
30 July 2014 15:04:21 BST | wrote 1600 records to parse starting at 0 | | workMap | ewhy8q2t4t
30 July 2014 15:04:32 BST | triggered 5 | 7.6271E+17 | workMap | ewhy8q2t4t
30 July 2014 15:04:33 BST | triggered 1 tasks from 1 (thread limit 0) | | triggerTasks | ewhy8q2t4t
30 July 2014 15:04:34 BST | finished mapping | | | ewhy8q2t4t
30 July 2014 15:04:47 BST | wrote 1600 records to parse starting at 1600 | | workMap | sghy8q2vlq
30 July 2014 15:04:49 BST | wrote 1600 records to parse starting at 3200 | | workMap | dphy8q2z39
30 July 2014 15:04:56 BST | wrote 1600 records to parse starting at 4800 | | workMap | mbhy8q347g
30 July 2014 15:05:05 BST | wrote 1600 records to parse starting at 6400 | | workMap | qhhy8q3su0
30 July 2014 15:05:19 BST | starting mapping for job ujhy8q22zl/5 task fkhy8q2gwe | | workMap | bmhy8q6l55
30 July 2014 15:05:21 BST | got 100 records from scriptDB starting at 8000 | | workMap | bmhy8q6l55
30 July 2014 15:05:25 BST | wrote 100 records to parse starting at 8000 | | workMap | bmhy8q6l55
30 July 2014 15:05:50 BST | finished mapping | | | dphy8q2z39
30 July 2014 15:05:52 BST | finished mapping | | | sghy8q2vlq
30 July 2014 15:05:53 BST | finished mapping | | | mbhy8q347g
30 July 2014 15:06:05 BST | triggered 0 | 2.8631E+18 | workReduce | bmhy8q6l55
30 July 2014 15:06:06 BST | finished mapping | | | bmhy8q6l55
30 July 2014 15:06:21 BST | starting reduction | | workReduce | cnhy8q8ew2
30 July 2014 15:06:45 BST | triggered 0 | 9.03126E+18 | workProcess | cnhy8q8ew2
30 July 2014 15:06:47 BST | finishing reduction | | workReduce | cnhy8q8ew2
30 July 2014 15:07:25 BST | starting processing for job {"key":"fshy8q8lgc","jobKey":"ujhy8q22zl","result":[0,1600,1600,1600,3200,1600,4800,1600,6400,1600,8000,100],"start":1406729189532,"finish":1406729189532,"handleCode":0,"handleError":""} | | workProcess | cshy8q9e99
30 July 2014 15:07:27 BST | parse contains 8100 records | | workProcess | cshy8q9e99
30 July 2014 15:07:34 BST | finished processing | | workProcess | cshy8q9e99

The control

This can use any of the Database abstraction with google apps script supported backends for control files and logs, but I prefer using Drive for control files and Sheets for the log file. You can set all that up here. You'll need to create and specify Drive folders and a spreadsheet to use for logging if you are using the defaults below.

function getControl () {
  return {
    script: {
      id: "1i5qxNh1qs_-b6hKRMY_W1MR4nnwuyhHQG6Y9tSINrgyqFQW6qH-GMP__",
      reduceFunction: 'workReduce',
      taskFunction:'workMap',
      processFunction:'workProcess'
    },
    taskAccess: {
      siloId:  'migrateTasks.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    logAccess: {
      siloId:  'thappylog',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    reductionAccess: {
      siloId:  'migrateReductions.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    jobAccess: {
      siloId:  'migrateJobs.json',
      db: cDataHandler.dhConstants.DB.DRIVE,
      driverSpecific: '/datahandler/driverdrive/tasks',
      driverOb: null
    },
    reportAccess: {
      siloId:  'thappyreport',
      db: cDataHandler.dhConstants.DB.SHEET,
      driverSpecific: '12pTwh5Wzg0W4ZnGBiUI3yZY8QFoNI8NNx_oCPynjGYY',
      driverOb: null
    },
    triggers: true,
    delay:5000,
    enableLogging:true,
    threads:0,
    stagger:1000, 
    killStage:false,
    debugGrabbing:false
  };
}

The report


Getting a report of the orchestration database or drive files can sometimes be helpful. You can get one like this, and it will end up wherever you've set control.reportAccess to save to.

function report () {
  // need this for each function that might be triggered
  var tHappy = new cTriggerHappy.TriggerHappy (getControl()); 
  tHappy.report();
}

Orchestration

One of the biggest challenges of all this was the locking mechanism between multiple instances of the same script. I'm using the library from Using named locks with Google Apps Scripts for all locking. Whether or not to log debugging of locking attempts can be switched on and off with the control.debugGrabbing property. It's used like this. Take a look at this log output if things don't seem to be going as expected.

function grabControl(who, f) {

  if (control_.debugGrabbing) {
    self.log(null,who,'trying to grab ' + new Date().getTime());
  }
  var info;
  var r = new cNamedLock.NamedLock(undefined,undefined,true).setKey('thappy' , control_.jobAccess).protect(who+'_'+instance_, function (lock) {
    if (control_.debugGrabbing) {
      info = ':lock-' + JSON.stringify(lock.getInfo());
      self.log(null,who,'grabbed ' + new Date().getTime() + info);
    }
    return f();
  }).result;

  if (control_.debugGrabbing) {
    self.log(null,who,'ungrabbed ' + new Date().getTime() + info);
  }
  return r;
}

You can see the kind of output you'll get below, where you can follow the game between each task as it tries to control who does what next. Note you should use the timestamp to examine the log - since the log is also protected by locking, it's not guaranteed that locks are granted in the order they are asked for, since deadlocks are avoided by using a random retry interval.
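The random retry idea can be sketched like this (an illustration of the approach, not cNamedLock's actual code - tryAcquire and sleepFn are hypothetical stand-ins for the library's lock attempt and Utilities.sleep):

```javascript
// Retry acquiring a lock with a randomized wait between attempts, so two
// contenders that collide once are unlikely to collide again on the next try.
// tryAcquire is any function returning true on success; sleepFn abstracts
// Utilities.sleep so the sketch can be exercised outside Apps Script.
function acquireWithRandomRetry(tryAcquire, maxAttempts, baseMs, sleepFn) {
  for (var attempt = 0; attempt < maxAttempts; attempt++) {
    if (tryAcquire()) return true;
    // random jitter between baseMs and 2*baseMs breaks lock-step retries
    sleepFn(baseMs + Math.floor(Math.random() * baseMs));
  }
  return false;
}
```

In the real library the lock attempt is a write to a shared store; here it's abstracted away so the retry logic stands alone.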

when | comment | triggerId | triggerFunc | instance
30 July 2014 14:17:26 BST | triggerTasks | | grabbed 1406726246451:lock-{"id":"zry8ohifj","key":"y9DckumDse6i72+k381bjg==","when":1406726246378,"who":"triggerTasks_sphy8ogu96","expires":1406726336378} | sphy8ogu96
30 July 2014 14:17:28 BST | triggered 0 | 8.3724E+18 | workMap | sphy8ogu96
30 July 2014 14:17:30 BST | triggered 1 | 5.64626E+18 | workMap | sphy8ogu96
30 July 2014 14:17:32 BST | triggered 2 | 1.39194E+17 | workMap | sphy8ogu96
30 July 2014 14:17:34 BST | triggered 3 | 8.84058E+18 | workMap | sphy8ogu96
30 July 2014 14:17:36 BST | triggered 4 | 4.85608E+18 | workMap | sphy8ogu96
30 July 2014 14:17:37 BST | somethingtomap | | trying to grab 1406726257242 | xphy8ohqvu
30 July 2014 14:17:37 BST | triggered 5 | 1.16435E+17 | workMap | sphy8ogu96
30 July 2014 14:17:38 BST | triggered 6 tasks from 6 (thread limit 8) | | triggerTasks | sphy8ogu96
30 July 2014 14:17:38 BST | somethingtomap | | trying to grab 1406726258689 | kdhy8ohs01
30 July 2014 14:17:39 BST | somethingtomap | | grabbed 1406726259858:lock-{"id":"wwy8ohsj1","key":"y9DckumDse6i72+k381bjg==","when":1406726259720,"who":"somethingtomap_kdhy8ohs01","expires":1406726349720} | kdhy8ohs01
30 July 2014 14:17:38 BST | triggerTasks | | ungrabbed 1406726258959:lock-{"id":"zry8ohifj","key":"y9DckumDse6i72+k381bjg==","when":1406726246378,"who":"triggerTasks_sphy8ogu96","expires":1406726336378} | sphy8ogu96
30 July 2014 14:17:41 BST | triggering is done | | splitJobIntoTasks | sphy8ogu96
30 July 2014 14:17:43 BST | somethingtomap | | ungrabbed 1406726263004:lock-{"id":"wwy8ohsj1","key":"y9DckumDse6i72+k381bjg==","when":1406726259720,"who":"somethingtomap_kdhy8ohs01","expires":1406726349720} | kdhy8ohs01
30 July 2014 14:17:43 BST | starting mapping for job afhy8ogwfv/0 task hfhy8ogzkv | | workMap | kdhy8ohs01
30 July 2014 14:17:45 BST | got 1000 records from scriptDB starting at 0 | | workMap | kdhy8ohs01
30 July 2014 14:18:18 BST | wrote 1000 records to parse starting at 0 | | workMap | kdhy8ohs01
30 July 2014 14:18:19 BST | TriggerHappy.finished | | trying to grab 1406726299207 | kdhy8ohs01
30 July 2014 14:18:20 BST | TriggerHappy.finished | | grabbed 1406726300263:lock-{"id":"ufy8oinqo","key":"y9DckumDse6i72+k381bjg==","when":1406726300130,"who":"TriggerHappy.finished_kdhy8ohs01","expires":1406726390130} | kdhy8ohs01
30 July 2014 14:18:27 BST | TriggerHappy.finished | | ungrabbed 1406726307820:lock-{"id":"ufy8oinqo","key":"y9DckumDse6i72+k381bjg==","when":1406726300130,"who":"TriggerHappy.finished_kdhy8ohs01","expires":1406726390130} | kdhy8ohs01


You want to learn Google Apps Script?

Learning Apps Script (and transitioning from VBA) is covered comprehensively in my book, Going Gas - from VBA to Apps script. All formats are available from O'Reilly, Amazon and all good bookshops. You can also read a preview on O'Reilly.

If you prefer video style learning, I also have two courses available, also published by O'Reilly:
Google Apps Script for Developers and Google Apps Script for Beginners.


For help and more information join our forum, follow the blog, or follow me on twitter.