BigQuery and the Execution API

The data for BigQuiz app is held in BigQuery. In Loading large JSON datasets into BigQuery with Apps Script I showed how you could maintain data in BigQuery with Apps Script, but now when I come to use it, I find that I want to abstract away access to it. Here's why.
  • I might change from BigQuery to something else
  • I'd like to only write the logic for getting data just once. My front-end is in HtmlService with an Apps Script webapp for now, but I might change that and want to create new apps on different platforms. By using the Execution API, I only need to figure out how to interact with that from whatever the front end platform is. That leaves me free to change how the back end works, or even to migrate to a different back end with no front end changes
  • The webapp is going to run 'as the user' since I need to get access to the user's data via the People API, but the bigquery access will run as me. This also allows me to keep any billing associated with BigQuery in a billing enabled project separate to the webapp project, and also allows me to use a service account for that part of the puzzle. This also avoids the need for BigQuery scope in anything other than the API executable.

Try the App here.

The bigQuery part.

This is a standalone project that will be called via the execution API. As usual, I first need to create a function to run once using goa to manage a service account. This will create my credentials, which I've downloaded from the console project that holds my bigquery data. I've also created an API key in that project for billing

function oneOffSetting() { 
  
  // service account for cloud vision
  // provoke drive auth DriveApp.getFiles()
  cGoa.GoaApp.setPackage (App.globals.oauth.props , 
   cGoa.GoaApp.createServiceAccount (DriveApp , {
     packageName: App.globals.oauth.packageName,
     fileId:'0B92ExLh4POiZSnRPNmkwX0N6SGM',
     scopes : cGoa.GoaApp.scopesGoogleExpand (['cloud-platform.read-only','bigquery']),
     service:'google_service',
     apiKey:'AIzaxxxxxxxxxiIPOQJ3HB8'
   }));
 
}

QueryUtils namespace

Next up I needed a couple of reusable functions through which to access BigQuery. I decided not to use the Apps Script advanced service, but instead to use the JSON REST API. I actually find it easier than the advanced service,  it allows me better control over exponential backoff, and  more importantly, I can use a service account pointing to a different project where I have all my cloud billing centralized. 

There is actually only one exposed method in this namespace
QueryUtils.query (apiKey , accessToken , preojectId , sqlString);

and it takes care of normalizes the paged, data returned from BigQuery into an array of JavaScript objects.  Note in particular the use of exponential backoff while waiting for the query data to complete loading (this happens asynchronously). This avoids the unnecessary sleeping recommended in the developers guide. The lookahead function tells expBackoff about things that might happen that it's worth retrying for. Aside from that, the code is fairly self explanatory.
    var queryJob = cUseful.Utils.expBackoff ( function () {
      return bigQueryFetch_ ( apiKey, accessToken,  projectId, sqlString);
    }, {
      lookahead:function(response , attempt) {
        return !response.jobComplete;
      }
    });

The code for QueryUtils.

/**
*@namespace QueryUtils
*runs server side and handles all database querying
*/
var QueryUtils = (function (ns) {
  
  /**
  * do a query
  * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
  * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
  * @param {string} projectId the project id
  * @param {string} sqlString the string
  * @return {[object]} the results as array of javacript objects
  */
  ns.query = function (apiKey, accessToken, projectId , sqlString) {
    
    
    // we can use exp backoff to get results of job
    var queryJob = cUseful.Utils.expBackoff ( function () {
      return bigQueryFetch_ ( apiKey, accessToken,  projectId, sqlString);
    }, {
      lookahead:function(response , attempt) {
        return !response.jobComplete;
      }
    });
    
    // now retrieve the result
    var result = nextQueryResults_(apiKey , accessToken , queryJob);
    
    // Get all the rows of results.
    var rows = result.rows;
    while (result.pageToken) {
      result = nextQueryResults_ (apiKey, accessToken, result);
      Array.protoype.push.apply (rows,result.rows);
    }
    
    // the headers
    var headers = result.schema.fields.map(function(field) {
      return field.name;
    });
    
    // the rows to an array of obs
    return (rows || []).map(function (d) {
      var idx = 0;
      return headers.reduce(function (p,c) {
        p[c] = d.f[idx].v;
        idx++;
        return p;
      },{});
    });
    
  };
  
  /**
  * take care of returning the pageifed data frm a query
  * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
  * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
  * @param {object} queryResult the query result resource
  * @param {object} the next set of results
  */
  function nextQueryResults_ (apiKey , accessToken,  queryResult ) {
    
    var response = cUseful.Utils.expBackoff ( function () {
      return UrlFetchApp.fetch (
        "https://www.googleapis.com/bigquery/v2/projects/" + 
        queryResult.jobReference.projectId + 
        "/queries/" + 
        queryResult.jobReference.jobId + 
        "?apiKey=" + apiKey + (queryResult.pageToken ? 
        ("&pageToken=" + queryResult.pageToken) : ''),  {
        headers:{
          Authorization:"Bearer " + accessToken
        },
          method:"GET"  
      });
    });
    return JSON.parse(response.getContentText());
  }; 
  
  /**
  * do initial query
  * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
  * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
  * @param {string} projectId the project id
  * @param {string} sqlString the string
  */
  function bigQueryFetch_ (apiKey, accessToken, projectId, sqlString) {
    
    var body = {
      "kind": "bigquery#queryRequest",
      "query": sqlString
    };
    
    var result = cUseful.Utils.expBackoff (function () {
      return UrlFetchApp.fetch ( 
        "https://www.googleapis.com/bigquery/v2/projects/" + projectId + "/queries?apiKey=" + apiKey , {
          method:"POST",
          headers:{
            Authorization:"Bearer " + accessToken
          },
          payload:JSON.stringify(body),
          contentType:"application/json"
       });
    });
  
    return JSON.parse(result.getContentText());
  
   };
                  
   return ns;
}) (QueryUtils || {});


The execution part

This is what gets called via the ExecutionAPI. There are two types of calls. 

The first is to return a summary of all the categories in the BigQuery Table. That provides the input for this selection back in the front end

Here's the code - of course cache is used wherever possible. Querying BigQuery with the QueryUtils namespace is just a matter of constructig an approriate SQL string.

/**
 * will do a bigquery to return categories
 * @param {number} maxCats maximum categories
 * @param {boolean} noCache whether to suppress cache
 * @return {object} the result
 */
function getCategories (maxCats,noCache) {
  App.init();
  var ag = App.globals.bigQuery;
  
  var sqlString =  'SELECT COUNT(*) as num,category FROM' + 
    ' [' + ag.dataStore + '.' + ag.table + ']' + 
      ' GROUP BY category' +
      ' ORDER BY num DESC' +
      ' LIMIT ' + maxCats;
  // use cache if allowed
  var cache = CacheService.getScriptCache();
  var cacheKey = cUseful.Utils.keyDigest (sqlString);
  var data = noCache ? null : cache.get(cacheKey);
  var result = data ? JSON.parse(data) : doQuery_ (sqlString);

  // always write it to cache regardless
  cache.put(cacheKey, JSON.stringify(result),60*60*6);
  return result;
  
}


The other kind of query is one that returns a random question, along with a number of other answers from the same category which can be used as multiple choice selections given a question. This produces this dialog back in the front end.



Here's the code. The selected answers are also shuffled into random order. Note also that duplicate answers are removed to help avoid repetition, and a number of questions are returned in a chunk to reduce the number of calls that need to be made.
 * will be called by any script wanting some questions
 * @param {string} category the category
 * @param {number} numAnswers number of answers to provide
 * @param {number} chunkSize the number of questions to get
 * @return {object] the result
 */
function getQuestions (category, numAnswers, chunkSize) {
  App.init();
  var ag = App.globals.bigQuery;
  
  // get a number of questions plus a few more in case there are duplicate answers
  var sqlString =  'SELECT *, rand() as rand FROM' + 
    ' [' + ag.dataStore + '.' + ag.table + ']' + 
    ' WHERE category = "' + category +'"' +
    ' ORDER BY rand' +
    ' LIMIT ' + Math.ceil(numAnswers * chunkSize * 1.2);

  // do the query
  var data  = doQuery_ (sqlString);
  
  // get rid of anything with duplicate answers
  // its possible that this will return less than chunk size
  // but the caller shoudl be asking for more than they need anyway
  data = data.filter(function (d,i,a) {
    return !a.slice(0,i).some(function(e) {
      return e.answer === d.answer;
    });
  });
  
  // now separate into separate questions - first select a chunk of questions
  var questions = data.splice(0,chunkSize);
  
  // use the rest for multiple choice potential answers
  return {
    category:category,
    questions:questions.map (function (d) {
      return {
        picked:d,
        answers:shuffleArray([d].concat(data.splice(0,numAnswers - 1))).map(function (d) {
          return d.answer;
        })
      };
    })
  };
    
  
  // from http://stackoverflow.com/questions/2450954/how-to-randomize-shuffle-a-javascript-array
  function shuffleArray(array) {
    for (var i = array.length - 1; i > 0; i--) {
      var j = Math.floor(Math.random() * (i + 1));
      var temp = array[i];
      array[i] = array[j];
      array[j] = temp;
    }
    return array;
  }
}

and this submits both types of queries, or can be used from the execution API for adhoc queries.

function doQuery_ (sqlString) {  
  var ag = App.globals.bigQuery;
  return QueryUtils.query (
    App.goa.getProperty("apiKey"), 
    App.goa.getToken(),
    ag.projectId,
    sqlString);

}

The Settings


And of course the app settings are in their own namespace too.
/**
* used to control the App functions
* and hold globals values
* @namespace App
*/
var App = (function (ns) {
  
  ns.globals = {
    bigQuery: {
      projectId:'lateral-command-416',
      dataStore:'jeopardydata',
      table:'questions'
    },
    oauth: {
      packageName:'bigquery_serviceaccount',
      props:PropertiesService.getScriptProperties()
    }
  };
  
  ns.init = function () {
    ns.goa = cGoa.GoaApp.createGoa(
      ns.globals.oauth.packageName, 
      ns.globals.oauth.props
    ).execute();
  
  };
  
  return ns;
}) (App || {});


Once published as an API executable, that's all there is to it. I now have a fully delegated way to get data from BigQuery.


Calling the execution API from BigQuiz app


The BigQuiz app calls these two functions when it needs data.  These two functions are part of its Server namespace.
  /**
  * given a category get a selection of random questions
  * @param {string} category the category
  * @param {number} numberOfQuestions how many questions to use for answer multiple choice
  * @param {number} [chunkSize=1] the number of question chunks
  * @param {[object]} questions
  */
  ns.getQuestions = function (category , numberOfQuestions,chunkSize) {
    return Execution.run (
      'getQuestions',
      [category,numberOfQuestions,chunkSize || 1],
      false
    ).response.result;
  };
  

  /**
  * get existing categories
  * @param {number} maxCats number of cats to get
  * @param {[string]} categories
  */
  ns.getCategories = function (maxCats,nocache) {
  
    var execResponse = Execution.run (
      'getCategories',
      [maxCats,nocache],
      false
    );
    
    return execResponse.response.result;
    
  };

It uses the Execution namespace to handle the conversation. 
/**
* calls executionAPI
* @namespace Execution
*/
var Execution = (function(ns) {
  
  ns.run = function (functionName , args, devMode ) {
    var payload =  JSON.stringify({
      "function":functionName,
      "devMode":devMode,
      "parameters":args
    });
    
    // now call the remote project
    var result = UrlFetchApp
    .fetch ( 
      "https://script.googleapis.com/v1/scripts/" + 
      ns.getGoa().getProperty("executionProject") + ":run", {
      method:"POST",
      payload: payload,
      contentType: "application/JSON",
      headers: {
      "Authorization":"Bearer " + ns.getToken()
      }
    });
    
    return JSON.parse(result.getContentText());
  };
  
  ns.getGoa = function () {
    return cGoa.make(
      Demo.PACKAGE_RUN.name, 
      Demo.PACKAGE_RUN.props
    );
  };
  
  ns.getToken = function () {
    var goa = ns.getGoa();
    if (!goa.hasToken()) {
      throw 'there is no token available - did you do a consent dialog?';
    }
    return goa.getToken();
  };
  
  return ns;
})(Execution || {});

Note also that an OAuth2 authorization is needed with the project it's targetting. After setting up some credentials in the target project, BigQuiz needs a one off setting like this. Unfortunately you can't use a service account with the execution API, so this is handled as a dialog in the BigQuiz webapp startup.
  cGoa.GoaApp.setPackage (PropertiesService.getScriptProperties() , 
    cGoa.GoaApp.createPackageFromFile (DriveApp , {
      packageName: Demo.PACKAGE_RUN.name,
      fileId:'0B92ExLh4POiZdVZKdHVEUjBMaDg',
      scopes : cGoa.GoaApp.scopesGoogleExpand ([
        'drive',
        'script.external_request',
        'script.scriptapp',
        'script.storage'
      ]),
      service:'google',
      executionProject:'MrH2Hz-cZpO-bGH1gT1NAVKi_d-phDA33'
  }));






The code

You'll find the code on github for both  BigQuiz and quizExecution.


For more like this, see Google Apps Scripts snippets. Why not join our forumfollow the blog or follow me on twitter to ensure you get updates when they are available. 




Comments