BigQuery and the Execution API - Desktop Liberation

The data for BigQuiz app is held in BigQuery. In Loading large JSON datasets into BigQuery with Apps Script I showed how you could maintain data in BigQuery with Apps Script, but now when I come to use it, I find that I want to abstract away access to it. Here’s why.

I might change from BigQuery to something else
I’d like to only write the logic for getting data just once. My front-end is in HtmlService with an Apps Script webapp for now, but I might change that and want to create new apps on different platforms. By using the Execution API, I only need to figure out how to interact with that from whatever the front end platform is. That leaves me free to change how the back end works, or even to migrate to a different back end with no front end changes
The webapp is going to run ‘as the user‘ since I need to get access to the user’s data via the People API, but the bigquery access will run as me. This also allows me to keep any billing associated with BigQuery in a billing enabled project separate to the webapp project, and also allows me to use a service account for that part of the puzzle. This also avoids the need for BigQuery scope in anything other than the API executable.

Try the App here.

Page Content hide

1 The bigQuery part.

2 QueryUtils namespace

3 The code for QueryUtils

4 The Settings

5 Calling the execution API from BigQuiz app

6 The code

The bigQuery part.

This is a standalone project that will be called via the execution API. As usual, I first need to create a function to run once using goa to manage a service account. This will create my credentials, which I’ve downloaded from the console project that holds my bigquery data. I’ve also created an API key in that project for billing.

function oneOffSetting() {
// provoke drive auth DriveApp.getFiles()
  cGoa.GoaApp.setPackage (
    App.globals.oauth.props ,
    cGoa.GoaApp.createServiceAccount ( DriveApp ,{ 
        packageName: App.globals.oauth.packageName,
        fileId:'0B92ExLh4POiZSnRPNmkwX0N6SGM, 
        scopes : cGoa.GoaApp.scopesGoogleExpand (['cloud-platform.read-only','bigquery']),
        service:'google_service',
        apiKey:'AIzaxxxxxxxxxiIPOQJ3HB8'
    }));
}

once off setup goa

QueryUtils namespace

Next up I needed a couple of reusable functions through which to access BigQuery. I decided not to use the Apps Script advanced service, but instead to use the JSON REST API. I actually find it easier than the advanced service, it allows me better control over exponential backoff, and more importantly, I can use a service account pointing to a different project where I have all my cloud billing centralized.

There is actually only one exposed method in this namespace

QueryUtils.query (apiKey , accessToken , preojectId , sqlString);

and it takes care of normalizing the paged data returned from BigQuery into an array of JavaScript objects. Note in particular the use of exponential backoff while waiting for the query data to complete loading (this happens asynchronously). This avoids the unnecessary sleeping recommended in the developers guide. The lookahead function tells expBackoff about things that might happen that it’s worth retrying for. Aside from that, the code is fairly self explanatory.

var queryJob = cUseful.Utils.expBackoff(
  function () {
    return bigQueryFetch_(apiKey, accessToken, projectId, sqlString);
  },
  {
    lookahead: function (response, attempt) {
      return !response.jobComplete;
    }
  }
);

lookahead wait for jobcomplete response

The code for QueryUtils

/***@namespace QueryUtils
 *runs server side and handles all database querying
 */
var QueryUtils = (function (ns) {
  /**
   * do a query
   * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
   * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
   * @param {string} projectId the project id
   * @param {string} sqlString the string
   * @return {[object]} the results as array of javacript objects
   */
  ns.query = function (apiKey, accessToken, projectId, sqlString) {
    // we can use exp backoff to get results of job
    var queryJob = cUseful.Utils.expBackoff(
      function () {
        return bigQueryFetch_(apiKey, accessToken, projectId, sqlString);
      },
      {
        lookahead: function (response, attempt) {
          return !response.jobComplete;
        }
      }
    );
    // now retrieve the result
    var result = nextQueryResults_(apiKey, accessToken, queryJob);
    // Get all the rows of results.
    var rows = result.rows;
    while (result.pageToken) {
      result = nextQueryResults_(apiKey, accessToken, result);
      Array.protoype.push.apply(rows, result.rows);
    }
    // the headers
    var headers = result.schema.fields.map(function (field) {
      return field.name;
    });
    // the rows to an array of obs
    return (rows || []).map(function (d) {
      var idx = 0;
      return headers.reduce(function (p, c) {
        p = d.f[idx].v;
        idx  ;
        return p;
      }, {});
    });
  };
  /**
   * take care of returning the pageifed data frm a query
   * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
   * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
   * @param {object} queryResult the query result resource
   * @param {object} the next set of results
   */
  function nextQueryResults_(apiKey, accessToken, queryResult) {
    var response = cUseful.Utils.expBackoff(function () {
      return UrlFetchApp.fetch(
        "https://www.googleapis.com/bigquery/v2/projects/"  
          queryResult.jobReference.projectId  
          "/queries/"  
          queryResult.jobReference.jobId  
          "?apiKey="  
          apiKey  
          (queryResult.pageToken ? "&pageToken="   queryResult.pageToken : ""),
        {
          headers: {
            Authorization: "Bearer "   accessToken
          },
          method: "GET"
        }
      );
    });
    return JSON.parse(response.getContentText());
  }
  /**
   * do initial query
   * @param {string} apiKey for billing - apikey should belong to owner fo bigquery project
   * @param {string} accessToken this is a service token to allow this execution api to run as owner of bigquery project
   * @param {string} projectId the project id
   * @param {string} sqlString the string
   */
  function bigQueryFetch_(apiKey, accessToken, projectId, sqlString) {
    var body = {
      kind: "bigquery#queryRequest",
      query: sqlString
    };
    var result = cUseful.Utils.expBackoff(function () {
      return UrlFetchApp.fetch(
        "https://www.googleapis.com/bigquery/v2/projects/"  
          projectId  
          "/queries?apiKey="  
          apiKey,
        {
          method: "POST",
          headers: {
            Authorization: "Bearer "   accessToken
          },
          payload: JSON.stringify(body),
          contentType: "application/json"
        }
      );
    });
    return JSON.parse(result.getContentText());
  }
  return ns;
})(QueryUtils || {});

queryutils

The execution part

This is what gets called via the ExecutionAPI. There are two types of calls. The first is to return a summary of all the categories in the BigQuery Table. That provides the input for this selection back in the

Here’s the code – of course cache is used wherever possible. Querying BigQuery with the QueryUtils namespace is just a matter of constructig an approriate SQL string.

/**
 * will do a bigquery to return categories
 * @param {number} maxCats maximum categories
 * @param {boolean} noCache whether to suppress cache
 * @return {object} the result
 */
function getCategories(maxCats, noCache) {
  App.init();
  var ag = App.globals.bigQuery;
  var sqlString =
    "SELECT COUNT(*) as num,category FROM"  
    " ["  
    ag.dataStore  
    "."  
    ag.table  
    "]"  
    " GROUP BY category"  
    " ORDER BY num DESC"  
    " LIMIT "  
    maxCats;
  // use cache if allowed
  var cache = CacheService.getScriptCache();
  var cacheKey = cUseful.Utils.keyDigest(sqlString);
  var data = noCache ? null : cache.get(cacheKey);
  var result = data ? JSON.parse(data) : doQuery_(sqlString);
  // always write it to cache regardless
  cache.put(cacheKey, JSON.stringify(result), 60 * 60 * 6);
  return result;
}

The other kind of query is one that returns a random question, along with a number of other answers from the same category which can be used as multiple choice selections given a question.

The selected answers are also shuffled into random order. Note also that duplicate answers are removed to help avoid repetition, and a number of questions are returned in a chunk to reduce the number of calls that need to be made.

/* will be called by any script wanting some questions
 * @param {string} category the category
 * @param {number} numAnswers number of answers to provide
 * @param {number} chunkSize the number of questions to get
 * @return {object] the result
 */
function getQuestions(category, numAnswers, chunkSize) {
  App.init();
  var ag = App.globals.bigQuery;
  // get a number of questions plus a few more in case there are duplicate answers
  var sqlString =
    "SELECT *, rand() as rand FROM"  
    " ["  
    ag.dataStore  
    "."  
    ag.table  
    "]"  
    ' WHERE category = "'  
    category  
    '"'  
    " ORDER BY rand"  
    " LIMIT "  
    Math.ceil(numAnswers * chunkSize * 1.2);
  // do the query
  var data = doQuery_(sqlString);
  // get rid of anything with duplicate answers
  // its possible that this will return less than chunk size
  // but the caller shoudl be asking for more than they need anyway
  data = data.filter(function (d, i, a) {
    return !a.slice(0, i).some(function (e) {
      return e.answer === d.answer;
    });
  });
  // now separate into separate questions - first select a chunk of questions
  var questions = data.splice(0, chunkSize);
  // use the rest for multiple choice potential answers
  return {
    category: category,
    questions: questions.map(function (d) {
      return {
        picked: d,
        answers: shuffleArray([d].concat(data.splice(0, numAnswers - 1))).map(
          function (d) {
            return d.answer;
          }
        )
      };
    })
  };
  // from http://stackoverflow.com/questions/2450954/how-to-randomize-shuffle-a-javascript-array
  function shuffleArray(array) {
    for (var i = array.length - 1; i > 0; i--) {
      var j = Math.floor(Math.random() * (i   1));
      var temp = array[i];
      array[i] = array[j];
      array[j] = temp;
    }
    return array;
  }
}
//and this submits both types of queries, or can be used from the execution API for adhoc queries.
function doQuery_(sqlString) {
  var ag = App.globals.bigQuery;
  return QueryUtils.query(
    App.goa.getProperty("apiKey"),
    App.goa.getToken(),
    ag.projectId,
    sqlString
  );
}

get random question

The Settings

And of course the app settings are in their own namespace too.

/*** used to control the App functions* and hold globals values* @namespace App
*/
var App = (function (ns) {
  ns.globals = {
    bigQuery: {
      projectId: "lateral-command-416",
      dataStore: "jeopardydata",
      table: "questions"
    },
    oauth: {
      packageName: "bigquery_serviceaccount",
      props: PropertiesService.getScriptProperties()
    }
  };
  ns.init = function () {
    ns.goa = cGoa.GoaApp.createGoa(
      ns.globals.oauth.packageName,
      ns.globals.oauth.props
    ).execute();
  };
  return ns;
})(App || {});

settings

Once published as an API executable, that’s all there is to it. I now have a fully delegated way to get data from BigQuery

Calling the execution API from BigQuiz app

The BigQuiz app calls these two functions when it needs data. These two functions are part of its Server namespace.

/**
 * given a category get a selection of random questions
 * @param {string} category the category
 * @param {number} numberOfQuestions how many questions to use for answer multiple choice  * @param {number} [chunkSize=1] the number of question chunks
 * @param {[object]} questions
 */
ns.getQuestions = function (category, numberOfQuestions, chunkSize) {
  return Execution.run(
    "getQuestions",
    [category, numberOfQuestions, chunkSize || 1],
    false
  ).response.result;
};
/**  * get existing categories  * @param {number} maxCats number of cats to get  * @param {[string]} categories
 */
ns.getCategories = function (maxCats, nocache) {
  var execResponse = Execution.run("getCategories", [maxCats, nocache], false);
  return execResponse.response.result;
};
/** Note also that an OAuth2 authorization is needed with the project it's targetting.
 * After setting up some credentials in the target project, BigQuiz needs a one off setting like this.
 * Unfortunately you can't use a service account with the execution API, so this is handled as a
 * dialog in the BigQuiz webapp startup.
 */
cGoa.GoaApp.setPackage(
  PropertiesService.getScriptProperties(),
  cGoa.GoaApp.createPackageFromFile(DriveApp, {
    packageName: Demo.PACKAGE_RUN.name,
    fileId: "xxxxx",
    scopes: cGoa.GoaApp.scopesGoogleExpand([
      "drive",
      "script.external_request",
      "script.scriptapp",
      "script.storage"
    ]),
    service: "google",
    executionProject: "MrH2Hz-cZpO-bGH1gT1NAVKi_d-phDA33"
  })
);