Update analytics batch code and the 6 minute limit

Each night I run a scheduled task that takes the Google Analytics data for this Google Site and matches it up to the site's pages. At the time of writing I have 1000 different pages in analytics to match to over 500 pages on this site.

You can take a copy of it here.

Avoiding the 6 minute limit.

As usual with this much content, I have to worry about the 6 minute execution time limit. There are a number of approaches I could have taken to cut the work into chunks, such as Parallel processing in Apps Script, but this process is a recursive one and it's kind of complicated to cut up. The thing that takes the most time is fetching the G+ counts for each page and for the domain name variants of that page. For example, sites.google.com/a/mcpher.com/abc is actually the same page as ramblings.mcpher.com/abc and www.mcpher.com/abc, but G+ sometimes (strangely, not always) sees them as different pages.

So that's 2000 URL fetches right there. Allow 0.3 seconds for each and that's 600 seconds, so I've already blown the 360 second limit, without even accounting for having to back off when I hit the URLFetch per second quota.
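For illustration, here's a minimal sketch of what backing off on a quota error could look like. It is not the code this site uses: withBackoff, its parameters and the wait times are names and values I've made up for the example, and the sleep function is passed in so the sketch also runs outside Apps Script.

```javascript
// Hypothetical helper (not from the original code): retry `action`,
// waiting roughly twice as long after each failure.
// `sleep` takes milliseconds and is injected by the caller.
function withBackoff(action, sleep, maxAttempts) {
  var attempts = maxAttempts || 5;
  for (var attempt = 0; attempt < attempts; attempt++) {
    try {
      return action();
    } catch (err) {
      // give up once the final attempt has failed
      if (attempt === attempts - 1) {
        throw err;
      }
      // wait 1s, 2s, 4s ... plus up to 1s of random jitter
      var waitMs = Math.pow(2, attempt) * 1000 + Math.round(Math.random() * 1000);
      sleep(waitMs);
    }
  }
}

// Assumed usage inside Apps Script:
// var response = withBackoff(
//   function () { return UrlFetchApp.fetch(url); },
//   function (ms) { Utilities.sleep(ms); }
// );
```

Note that this sketch retries on any error, not just quota errors, and every wait counts against the 6 minute limit, which is one more reason to keep the fetching out of the main run.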

Luckily caching can come to the rescue. I use the method described in Extract the plus one counts from a page to get the G+ data, but I first run that section of the overall process a few times on its own, knowing that it will probably run out of execution time. Each run fills up the cache with results, and reading from cache is about 10 times as fast as fetching, so by the time I run the real process all the G+ results are already in cache and everything finishes comfortably within the quotas.
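The cache-first idea can be sketched like this. It's only an illustration under my own assumptions, not the actual plus one code: getCachedCount, the 'plusone_' key prefix and makeMemoryCache are hypothetical names. The cache argument just needs get(key) and put(key, value, seconds), which is the shape of what CacheService.getScriptCache() returns in Apps Script.

```javascript
// Hypothetical cache-first lookup (names are assumptions, not the site's code).
// `fetchCount` is whatever function really goes and gets the count for a URL.
function getCachedCount(url, cache, fetchCount) {
  var key = 'plusone_' + url;           // assumed key scheme
  var cached = cache.get(key);          // null when the key is not in cache
  if (cached !== null) {
    return parseInt(cached, 10);        // cache hit: no URL fetch needed
  }
  var count = fetchCount(url);          // cache miss: do the slow fetch
  cache.put(key, String(count), 21600); // keep it for 6 hours
  return count;
}

// In-memory stand-in so the sketch can run outside Apps Script.
function makeMemoryCache() {
  var store = {};
  return {
    get: function (key) {
      return Object.prototype.hasOwnProperty.call(store, key) ? store[key] : null;
    },
    put: function (key, value, seconds) {
      store[key] = value;               // expiry is ignored in this stand-in
    }
  };
}
```

A warm-up run that times out part way through still leaves everything it managed to fetch in cache, so each repeat has less to do. The 21600 seconds used here is the longest expiry the Apps Script cache accepts, so with this sketch the warm-up runs would need to happen within 6 hours of the real one.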

Here's the snippet that I run a few times


function preCachePlusOnes() {
  // if we do this first & separately we can limit execution time
  // all this does is get the site and populate cache so that when we run the real thing it picks it up from cache
  // can be scheduled to run a couple of times to make sure it picks up everything

  // get the parameters for this site
  var options = cSiteStats.getOptions('ramblings');

  // this is the site i'm working with
  var site = SitesApp.getSite(options.domain, options.site);

  // get all the pages on the site
  var root = getPages(site);

  // add plus 1 counts
  addPlusOneCounts(root, options, true);
}

So my scheduled triggers look like this


Here's what runs less regularly, and populates the database with all the statistics



For help and more information join our forum, follow the blog or follow me on Twitter.

