Optimizing access to sheet values

Why so slow?

Note - this article was written when Apps Script first came out. Since that time, the speed of Sheets access has increased significantly, but I leave this article here for interest.

The Data Manipulation Classes that I am migrating in this project - see Data manipulation Classes in Google Apps Script, are highly efficient in Excel and are much faster than working directly in the Excel object model. However, you do have to get the data in there in the first place.


It is well known that, in Excel, reading a whole bunch of cell values in one go is much faster than accessing them individually. In Google apps script, this is even more so the case, since the overhead associated with a client/server request for data and the internet speed (as opposed to local computer speed) will introduce an unacceptable delay in most ports from Excel to Google Apps. I wondered if there was a way to introduce local caching, so if any sheet was ever accessed, all its values would be transferred locally and any further accesses would be to the cached copy. I got results that showed this was 70 times faster. You can find the code (its a work in progress) and test data in the VBA to Google Apps Script Roadmapper project download See also this blogpost on optimizing google apps scripts getValues() for details. You can also find the code for a timer, and the test results for this in Apps Script timer collection



Caching of Google Docs Sheet Values

Here are the functions to allow caching, but first, some examples of how to use them. The complete code (work in progress) can be found here - Google Apps Script Roadmapper code

This reads in all the values in a couple of sheets, and shows the number of rows and columns. You don't need to care whether this was an actual read from the server or whether this was a cache access.
 // read a couple of worksheets
 DebugPrint (sheetCache("isbn").getRowCount());
 DebugPrint (sheetCache("inputdata").getColumnCount());

A forEach() function is provided for enumeration. For every cell in the worksheet, your function is called, passing the value, row and column index.
 // print them out
 sheetCache("isbn").forEach( function(v,r,c) {
   DebugPrint (v,r,c);
   }
 );
 
This will return a single value at the row, column specified. If it is outside the range of the sheet, and Empty() value is returned 
DebugPrint ('outside range of sheet',     sheetCache("isbn").getValue(50,50));

This is interesting, since getValues() only returns an array the shape of the used range of the sheet. However, in Excel, you will get an array the size of the range you speficy, padded with empties. This emulates the Excel approach
 DebugPrint ('values overlap edge of range',
     sheetCache("isbn").getValues
       (Sheets("isbn").getRange("e3:m8"))[0].length);
}

What about other cacheable values

So why not use this same technique for every cacheable value ( getFontSizes(), getHorizontalAlignments() etc..) . See Google Apps Script Range object for the full list.

Turns out this is a trivial matter, since javaScript treats everything as an object, including functions, you can just pass the particular function you want called, and keep a separate cache going for each one. For example, this will execute getBackGroundColors() instead of getValues() to populate the cache, and return the background color at row rn, column cn of worksheet wn. Any cacheable method can be used, they all work the same way as getValues(), and you will see the same kind of performance gain that we have already seen.

sheetCache(wn,'getBackgroundColors').getValue(rn,cn)

Cache identification

The function sheetCache(object) takes a range, a worksheet, or a worksheet name. It will always return the cache associated with that worksheet or implied worksheet.  Here is the code. Note that the default method is getValues, but this could be any cacheable method that returns values for the whole sheet, such as getComments, getFontColors etc.

function sheetCache(ob,method) {
  return createSheetCache().getCache(ob,fixOptional (method,'getValues'));
}

How the method is executed

As mentioned, the 2nd (optional .. default is getValues) argument specifies the method that would retrieve the values or other objects in bulk from the sheet to fill up the cache. A separate cache is maintained for each method that is invoked.  Here is the getValues() that is executed within the caching class.  Note how the google apps script native method fills up the cache this.xValues = r[this.xMethod](); if it is the first time accessed it if it needs to be refreshed. It is that simple.

cCacheItem.prototype.getValues = function(r) {
  if (this.dirty() || !this.xValues) {
    var r = wholeSheet(this.xName);
    DebugAssert(r[this.xMethod], this.xMethod + ' method does not exist for range' + sad(r));
    this.xValues = r[this.xMethod]();
    this.setDirty(false);
  }
  return isUndefined (r) ? this.xValues : this.getValuesOfRange(r);
}


Values by range

By default, cache.getValues() will return all the values of the selected method in the used worksheet. However,  you can force an array of values to match the shape of a range, by passing a range argument to getValues(). For example this will return a block of values for e3:m8, and will pad them out if this range goes beyond the used range of the sheet.

 sheetCache("isbn").getValues
       (Sheets("isbn").getRange("e3:m8")))

Writing to cache

You can also write to cache and commit it when finished. Any  method that has both a set and get (for getValue and setValue) will use the same cache. This means that any updates you make with set, but have not yet committed, will be returned on subsequent gets. Note that if you have formulas, you would need to commit, and mark the cache as dirty so that subsequent getValues would return the recalculated value.

Here is an example from the Google Script Color Ramp project, where the background colors are written to cache.

  var cacheColor = sheetCache(wr,"getBackgroundColors");
  for (var i = 0; i < rampList.length ;i++ ){
    for (var m = 0; m < npoints ; m++) {
      cacheColor.setValue(rampLibraryHex (rampList[i], 0, npoints -1, m),i+1,m+1);
    }
  }
  cacheColor.close();

How this works

the .setValue() method makes changes to the cache, extending it if necessary, and marking it as needing to be flushed at some point

cCacheItem.prototype.setValue = function (value,rn,cn) {
  return (this.extend (rn,cn).setTouched().xValues[rn-1][cn-1] = value);
};

the .commit() method writes the cache back to the spreadsheet, and can even be used (via the optional argument),  to instead copy the cache back to different worksheet (an easy way to clone the values in a sheet).

cCacheItem.prototype.commit = function (optOut) {
  var oPlace = this.xParent.getWn(fixOptional(optOut, this.xName));
  if ( this.touched()  || oPlace != this.xName ) {
    // maybe writing to a different place
    var r = vResize(wholeSheet(oPlace),this.getRowCount(),this.getColumnCount());
    var m = 'set' + Mid(this.xMethod,4) ;
    DebugAssert(r[m], m + ' method does not exist for range' + sad(r));
    r[m](this.xValues);
    // if this is the same sheet as cache then its now clean
    if (oPlace == this.xName)this.setTouched(false);
  }
  return this;
};

the .close() method commits any changes and voids the cache

cCacheItem.prototype.close = function () {
  //commit changes and kill the cache
  this.commit();
  this.quit();
  return null;
};

the .quit() method voids the cache

cCacheItem.prototype.quit = function () {
  // abandon changes and kill the cache
  this.xValues =null;
  this.xParent.caches().remove(this.xKey);
  return null;
};

Scalability

I gave some thought to scalability (would this be practical with 000's of cells in a sheet?). In the end, I figured that google Docs sheets are anyway pretty small by nature, and if it was not scalable for cache approach, it probably wouldn't be a candidate for scripting anyway. I'll look into this a little more later. 

Code


Take a look at  From VBA to Google Apps Script for more like this.. In the meantime why not join our forum,follow the blog or follow me on twitter to ensure you get updates when they are available.  

Transitioning is covered more comprehensively in my my book, Going Gas - from VBA to Apps script, available All formats are available now from O'Reilly,Amazon and all good bookshops. You can also read a preview on O'Reilly.





Comments