Extract the plus one counts from a page

When implementing Using the gplus api in Apps Script, I noticed that there didn't seem to be an easy way to get the number of plus ones for a given URL. I knew it must be possible, though, because the g+ button includes an option to show a count. I'm currently working on Displaying analytics data on site pages and wanted to combine Google Analytics with Google Plus counts for that.

Scraping the source

When you dig into the source of a page that has a g+ button with the count enabled, you find this:
 
<div id="aggregateCount" class="Oy">41</div>

So all you have to do is find out how this is triggered. While researching I came across this blog post by Helmut Granda, and what you need is this:

https://plusone.google.com/u/0/_/+1/fastbutton?url=http://ramblings.mcpher.com&count=true

So now all that's necessary is to scrape the web source to pull out the aggregateCount div. There's not even any need for OAuth, as would probably be required if I were using the G+ API. I originally planned to use XmlService, but it's a trivial regex problem (and anyway it's a hack), so I decided to use a regex instead. Here's the code. I'm using Backing off on rate limiting, since I'm going to be using this snippet in bulk and want to avoid rate limit problems.

// thanks to http://www.helmutgranda.com/2011/11/01/get-a-url-google-count-via-php/ for his PHP hack for the idea.

/**
 * @param {string} url the url to get the plus 1 count for
 * @return {number} the plus one count
 */
function getPlusOneCount(url) {
  // this is a hack based on how the plus 1 button with count on a page gets its data.
  // the g+ api doesn't seem to have this capability, so here's a workaround
  var query = "https://plusone.google.com/u/0/_/+1/fastbutton?count=true&url=" + encodeURIComponent(url);

  // do an exp backoff in case this is called loads of times
  var result = cUseful.rateLimitExpBackoff(function () {
    return UrlFetchApp.fetch(query);
  });

  // find the result and return - XML service is unable to parse, so I'll just use a regex
  // [^>]* keeps the match inside the one opening tag
  var match = /<div[^>]*id=["']aggregateCount["'][^>]*>\s*(\d+)\s*</i.exec(result.getContentText());

  // exec returns null when there's no match, so check that before looking at the capture
  if (!match) {
    Logger.log("no g+ found for " + url);
    return 0;
  }
  return Number(match[1]);
}
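
The regex step can be tried on its own, without fetching anything. This is a minimal sketch rather than part of the original snippet: extractAggregateCount is a name made up for illustration, and the sample markup is the div shown earlier on this page.

```javascript
/**
 * Pull the +1 count out of the fastbutton HTML.
 * (hypothetical helper, not part of the original snippet)
 * @param {string} html the page source returned by the fetch
 * @return {number} the count, or 0 if no aggregateCount div is found
 */
function extractAggregateCount(html) {
  // [^>]* keeps the match inside a single opening tag
  var match = /<div[^>]*id=["']aggregateCount["'][^>]*>\s*(\d+)\s*</i.exec(html);
  // exec() returns null when nothing matches, so guard before indexing
  return match ? Number(match[1]) : 0;
}

// sample markup taken from the button source shown earlier
var sample = '<div id="aggregateCount" class="Oy">41</div>';
var count = extractAggregateCount(sample);
var missing = extractAggregateCount('<div class="Oy"></div>');
```

Here count comes out as 41 and missing as 0. Keeping the parsing separate from the fetch makes it easier to adjust if Google change the button markup.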

Using sharedCount

As pointed out by fellow GDE Martin Hawksey, there is a sharedCount API for getting stuff like this. It also has the advantage of getting stats from Facebook, LinkedIn etc., and hopefully will still work if Google change their button layout. The disadvantage is that it's another API and another API key to worry about. It has a free plan of 10,000 a day, which is fine for batch use cases like mine, but might run out of steam if used live on a popular web site.

I did a test on a few hundred pages and it came up with the same gPlus results (luckily). The sharedCount API was a bit slower on average, and much more variable in performance.

ms        gplus hack   sharedCount
average   442.13       508.49
max       653          6271
min       32222

Here's the code for the sharedCount version. You'll need to store your sharedCount API key in your script properties and pass it through. You can pick up the count from the GooglePlusOne property of the returned object.

function getSharedCount(url, key) {

  var query = "http://free.sharedcount.com/?apikey=" + key + "&url=" + encodeURIComponent(url);

  // do an exp backoff in case this is called loads of times
  var result = cUseful.rateLimitExpBackoff(function () {
    return UrlFetchApp.fetch(query);
  }, undefined, undefined, undefined, true);

  // the response is JSON, so just parse it and return the whole object
  return JSON.parse(result.getContentText());
}
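
Both fetch functions wrap the call in cUseful.rateLimitExpBackoff. For anyone not using that library, the general idea of exponential backoff can be sketched as below. This is an illustration under my own assumptions, not the library's implementation: withBackoff, its parameters and its defaults are made up, and it retries on any error, whereas a real version would probably only retry rate limit errors.

```javascript
/**
 * Retry a function, waiting exponentially longer between attempts.
 * (hypothetical helper, not the cUseful implementation)
 * @param {function} fn the work to attempt
 * @param {function} sleep called with the number of ms to wait
 * @param {number} [maxTries=5] attempts before giving up
 * @param {number} [baseMs=500] wait before the first retry
 * @return {*} whatever fn returns
 */
function withBackoff(fn, sleep, maxTries, baseMs) {
  maxTries = maxTries || 5;
  baseMs = baseMs || 500;
  for (var attempt = 0; attempt < maxTries; attempt++) {
    try {
      return fn();
    } catch (err) {
      // out of attempts: let the caller see the last error
      if (attempt === maxTries - 1) throw err;
      // wait doubles on each retry, plus up to 1s of jitter to avoid lockstep retries
      sleep(baseMs * Math.pow(2, attempt) + Math.round(Math.random() * 1000));
    }
  }
}
```

In Apps Script the work would be something like function () { return UrlFetchApp.fetch(query); } and the sleep would be function (ms) { Utilities.sleep(ms); }. Passing the sleep in makes the helper easy to test without actually waiting.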


Exceeded time limits

Sooner or later you'll run into quota exceeded problems if you are doing bulk analysis, so one option is to use cache. I use my own cachehandler library so I don't need to worry about creating keys, but you can use the vanilla cache service if you want.
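
If you go the vanilla route, the pattern might look something like this. It's a sketch under assumptions: cachedOrCompute is a made-up name, and the cache is passed in as any object with get(key) and put(key, value, seconds), which is the shape of what CacheService.getScriptCache() returns in Apps Script.

```javascript
/**
 * Return a cached value, or compute and store it.
 * (hypothetical helper for illustration)
 * @param {object} cache anything with get(key) and put(key, value, seconds)
 * @param {string} key the cache key
 * @param {function} compute called only on a cache miss
 * @param {number} [seconds=14400] how long to keep the entry (4 hours)
 * @return {string} the cached or freshly computed value, as a string
 */
function cachedOrCompute(cache, key, compute, seconds) {
  var hit = cache.get(key);
  // a miss comes back as null; compare explicitly so any stored string counts as a hit
  if (hit !== null && hit !== undefined) return hit;
  // the cache stores strings, so convert before putting
  var value = String(compute());
  cache.put(key, value, seconds || 60 * 60 * 4);
  return value;
}
```

In Apps Script you would pass CacheService.getScriptCache() as the cache and the query URL as the key. The cache service limits key length, so a very long query URL may need shortening or hashing first.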

When I use this in bulk mode, I schedule a pre-run whose only job is to populate cache. If it takes too long, I just run it a few times. Then, when I run the task that really does the processing, all the plus one data is already in cache. Here's the difference it makes to both the hack and the sharedCount version.

ms        gplus hack   sharedCount
average   41.95        44.78
max       150          141
min       9            10

So using cache pretty much equalizes the methods, but more importantly it is roughly 10 times as fast.

Here's the cache version for each method. Both need a cache set up at the beginning:
// keep plus ones in cache for a few hours
var cache = new cCacheHandler.CacheHandler(60*60*4,'sitePlusOnes');

sharedCount
function getSharedCount(url, key, optCache) {

  // we'll use cache if we can since these calls take up to a second to deal with
  var query = "http://free.sharedcount.com/?apikey=" + key + "&url=" + encodeURIComponent(url);

  var r;
  if (optCache) {
    r = cache.getCache(query);
  }
  if (!r) {
    // do an exp backoff in case this is called loads of times
    var result = cUseful.rateLimitExpBackoff(function () {
      return UrlFetchApp.fetch(query);
    }, undefined, undefined, undefined, true);

    // the response is JSON text - cache it as is and parse it on the way out
    r = result.getContentText();
    if (optCache) {
      cache.putCache(r, query);
    }
  }

  return JSON.parse(r);
}
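
As mentioned earlier, the count is in the GooglePlusOne property of the object that getSharedCount returns. A small helper that tolerates a missing property might look like this. It's only a sketch: plusOnesFromShared is a made-up name and the payload below is invented for illustration, with GooglePlusOne the only property name taken from the text above.

```javascript
/**
 * Read the +1 count from a parsed sharedCount response.
 * (hypothetical helper for illustration)
 * @param {object} shared the object returned by getSharedCount
 * @return {number} the GooglePlusOne value, or 0 if it is absent or not numeric
 */
function plusOnesFromShared(shared) {
  var n = shared ? Number(shared.GooglePlusOne) : NaN;
  return isNaN(n) ? 0 : n;
}

// made-up payload for illustration - a real response carries other networks too
var example = JSON.parse('{"GooglePlusOne": 41}');
var plusOnes = plusOnesFromShared(example);
```

With that payload, plusOnes is 41, and an empty or missing object gives 0.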

G+ hack
function getPlusOneCount(url, optCache) {
  // this is a hack based on how the plus 1 button with count on a page gets its data.
  // the g+ api doesn't seem to have this capability, so here's a workaround

  // we'll use cache if we can since these calls take up to a second to deal with
  var query = "https://plusone.google.com/u/0/_/+1/fastbutton?count=true&url=" + encodeURIComponent(url);

  var r;
  if (optCache) {
    r = cache.getCache(query);
  }
  if (!r) {
    // do an exp backoff in case this is called loads of times
    var result = cUseful.rateLimitExpBackoff(function () {
      return UrlFetchApp.fetch(query);
    }, undefined, undefined, undefined, true);

    // find the result - XML service is unable to parse, so I'll just use a regex
    // [^>]* keeps the match inside the one opening tag
    var match = /<div[^>]*id=["']aggregateCount["'][^>]*>\s*(\d+)\s*</i.exec(result.getContentText());

    // exec returns null when there's no match, so check that before looking at the capture
    if (!match) {
      Logger.log("no g+ found for " + url);
    }
    r = match ? match[1] : '0';
    if (optCache) {
      cache.putCache(r, query);
    }
  }
  return Number(r);
}
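
With the cached versions in place, the pre-run described earlier could be sketched like this. It's an illustration rather than my actual scheduled task: warmCache, its time budget and the injected clock are assumptions made up for the example.

```javascript
/**
 * Populate the cache for as many urls as fit in a time budget.
 * (hypothetical helper for illustration)
 * @param {string[]} urls pages to look up
 * @param {function} getCount e.g. function (u) { return getPlusOneCount(u, true); }
 * @param {number} budgetMs stop starting new lookups after this many ms
 * @param {function} [now] clock returning ms; defaults to Date.now
 * @return {number} how many urls were processed in this run
 */
function warmCache(urls, getCount, budgetMs, now) {
  now = now || Date.now;
  var start = now();
  var done = 0;
  for (var i = 0; i < urls.length; i++) {
    // out of time: leave the rest for the next run
    if (now() - start > budgetMs) break;
    getCount(urls[i]);
    done++;
  }
  return done;
}
```

Set the budget comfortably below the script execution limit. If a run stops early, running it again is cheap, because the urls it already handled are answered from cache and the loop gets further each time.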

For more snippets like this see Google Apps Scripts snippets




