Github as an Apps Script cache platform

This is another plugin for the Apps Script library with plugins for multiple backend cache platforms, this time letting us use Github as a back end for caching large objects across platforms.

Benefits of using Github

It’s possible to use a github repo as the back end for a caching service, and by taking this approach we can share data between Apps Script and multiple platforms. It also means you can retrieve the data with the git CLI. It works in exactly the same way as all the other backends. I could have used a gist for this, but even a secret gist is still visible to anyone who knows the URL. Using a regular git repo means you can make it private and share it using the tools already built into Github.

How to use

There’s a little bit of setup to do first as we need to use the Github API.

Create a repo and a github oauth application

First create an empty repo to use (or use an existing one). Next, go to the github settings/developer section and register a new oauth app, then pick up a client secret and client id. You can put anything in the callback url for now – we’ll come back to that later.

register github oauth2 app

Setup your Apps Script App to connect to github.

You’ll need to add the cGoa library, then create and run this function using the client id and secret you got from github.

library id: 1v_l4xN3ICa0lAW315NQEzAHPSoNiFdWHsMEwj2qA5t9cgZ5VWci2Qxv2

You only need to run it once, and you can delete it when done. It’s just creating the mechanism to be able to get a github oauth token when required.

/**
* this stores the credentials for the service in properties
* it should be run once, then deleted
*/
function oneOffStore () {
  // this one connects to github
  var propertyStore = PropertiesService.getScriptProperties();
  cGoa.GoaApp.setPackage (propertyStore, {
    clientId: "xxxxx",
    clientSecret: "xxxxx",
    scopes: [
      'repo',
    ],
    service: 'github',
    packageName: 'gitgoa'
  });
}
run once then delete

Create a webapp to connect to github

This is another one off activity. You need to connect github to your app, so you need a simple webapp to do that. Create this function and deploy as a webapp, and kick it off.

function doGet(e) {

  const goa = cGoa.GoaApp.createGoa('gitgoa', PropertiesService.getScriptProperties()).execute(e);

  if (goa.needsConsent()) {
    return goa.getConsent();
  }

  // if we get here it's time for your webapp to run and we should have a token, or thrown an error somewhere
  if (!goa.hasToken()) throw 'something went wrong with goa';
  return HtmlService.createHtmlOutput("You're all set. You can close this tab");

}
one off connection to github

Get the redirect URI and paste it into the github developer console

This will allow you to finish off your github oauth application setup and register your app as genuine. Once you’ve copied this in and saved, you can hit start. It’ll go through a github dialog, then say  ‘You’re all set. You can close this tab’. We’re done, and you can delete the deployment. Goa has all it needs to be able to talk to github and deal with the access token refresh dance in the future.

get github oauth callback

Create the crusher

Just as in the other examples, it starts with a store being passed to the crusher library to be initialized. The plugin is used in exactly the same way as the Google platform plugins – like this. I’m using a repo called ‘-crusher-store’, and I want all files to be in the path ‘store’ (the prefix).

const goa = cGoa.make('gitgoa', PropertiesService.getScriptProperties())
const crusher = new bmCrusher.CrusherPluginGitService().init({
  tokenService: () => goa.getToken(),
  prefix: 'store',
  fetcher: UrlFetchApp.fetch,
  repo: '-crusher-store',
  owner: 'brucemcpherson',
  uselz: true
})
Initialize the crusher

The plugin

The exact same methods introduced in Apps script library with plugins for multiple backend cache platforms will now operate on Github instead of the other platforms. Here’s a refresher.

Writing

All writing is done with a put method.

crusher.put(key, data[,expiry])
write some data

put takes 3 arguments

  • key – a string with some key to store this data against
  • data – it’s automatically converted to and from objects, so there’s no need to stringify anything.
  • expiry – optionally provide a number of seconds after which the data should expire.
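The put/get round trip can be sketched in plain JavaScript, with JSON standing in for the library’s internal serialization and a Map standing in for the Github store (both are assumptions for illustration):

```javascript
// Sketch of automatic serialization: objects are stringified on put
// and restored on get, so the backing store only ever holds strings.
const fakeStore = new Map();

const put = (key, data) => {
  // objects are detected and stringified automatically
  fakeStore.set(key, typeof data === 'string' ? data : JSON.stringify(data));
};

const get = (key) => {
  const str = fakeStore.get(key);
  if (str === undefined) return null;
  try {
    return JSON.parse(str); // restore objects to their original state
  } catch (e) {
    return str; // plain strings come back as-is
  }
};

put('profile', { name: 'anne', visits: 3 });
console.log(get('profile').visits); // 3
```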

Reading

const data = crusher.get (key)
retrieving data

get takes 1 argument

  • key – the string you put the data against. get will restore the data to its original state.

Removing

crusher.remove(key)
removing an item

Expiry

Github doesn’t support automatic expiry, but keys with the same name will be overwritten, and any expired items will be treated as if they don’t exist.
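Since Github can’t expire items itself, expiry has to be checked at read time. A hypothetical sketch of that check (not the library’s actual record layout):

```javascript
// Hypothetical expiry handling for a store with no native expiration:
// keep an expiry timestamp alongside the value, and treat expired
// entries as missing on read.
const store = new Map();

const put = (key, value, expirySecs) => {
  const expires = expirySecs ? Date.now() + expirySecs * 1000 : null;
  // writing to the same key simply overwrites the previous entry
  store.set(key, { value, expires });
};

const get = (key, now = Date.now()) => {
  const entry = store.get(key);
  if (!entry) return null;
  // an expired item is treated as if it doesn't exist
  if (entry.expires && now > entry.expires) return null;
  return entry.value;
};

put('short-lived', 'hello', 60);
console.log(get('short-lived'));                         // 'hello'
console.log(get('short-lived', Date.now() + 61 * 1000)); // null
```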

Here’s what some store entries look like in Github GraphQL explorer. You can see that one of the entries has been distributed across multiple keys to deal with the maximum value size in Github. Getting the value will restore it to its original state.

crusher on github
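The spreading of a large item across multiple keys can be pictured like this (a simplified sketch – the real library also writes a header record pointing at the chunks, which isn’t shown):

```javascript
// Simplified chunking sketch: split a long string into fixed-size pieces,
// each stored under a derived key, and reassemble on read.
const chunkSize = 4; // the plugin's real default for github is ~500k

const split = (key, str) =>
  Array.from({ length: Math.ceil(str.length / chunkSize) }, (_, i) => ({
    key: `${key}_${i}`, // hypothetical key naming scheme
    chunk: str.slice(i * chunkSize, (i + 1) * chunkSize)
  }));

const join = (parts) => parts.map(p => p.chunk).join('');

const parts = split('bigitem', 'aquickbrownfox');
console.log(parts.length);                     // 4
console.log(join(parts) === 'aquickbrownfox'); // true
```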

Fingerprint optimization

Since it’s possible that an item will spread across multiple physical records, we want a way of avoiding rewriting (or decompressing) them if nothing has changed. Crusher keeps a fingerprint of the contents of the compressed item. When you write something and it detects that the data you want to write has the same fingerprint as what’s already stored, it doesn’t bother to rewrite the item.

However, if you’ve specified an expiry time, then it will be rewritten so as to update its expiry. There’s a catch though. If your chosen store supports its own automatic expiration (as in the CacheService), then the new expiration won’t be applied. Sometimes this behavior is what you want, but it does mean a subtle difference between different stores.

You can disable this behavior altogether when you initialize the crusher.


const crusher = new bmCrusher.CrusherPluginGitService().init({
  tokenService: () => goa.getToken(),
  prefix: 'store',
  fetcher: UrlFetchApp.fetch,
  repo: '-crusher-store',
  owner: 'brucemcpherson',
  uselz: true,
  respectDigest: false
})
Always rewrite store even if the data has not changed

Formats

Crusher writes all data compressed and base64 encoded, so the mime type will be text, and it will need to be read by bmCrusher to make sense of it.

Notes on github API

If you take a look at the code for the API, you may notice that it uses both the Github GraphQL and REST APIs. This is because not all mutations are supported in the GraphQL API yet, so I had to use the REST one for some methods. If Github expands its API at some point, I may update the plugin to be fully GraphQL.
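For example, reads go through a GraphQL query for the blob’s text, while writes and deletes use the REST contents endpoint. Building the read payload looks like this (adapted from the plugin’s getQuery, with a hypothetical repo, owner, prefix, and key):

```javascript
// Build the GraphQL payload the plugin POSTs to read a blob's text.
// The repo/owner/key values below are examples only.
const getQuery = ({ repo, owner, prefix, key, getContent = false }) => ({
  query: `query ($repo: String!, $owner: String!, $expression: String) {
    repository(owner: $owner, name: $repo) {
      object(expression: $expression) {
        ... on Blob {
          oid
          ${getContent ? 'text' : ''}
        }
      }
    }
  }`,
  // a git revision expression: path relative to HEAD
  variables: { repo, owner, expression: 'HEAD:' + prefix + key }
});

const payload = getQuery({
  repo: '-crusher-store',
  owner: 'brucemcpherson',
  prefix: 'store/',
  key: 'mykey',
  getContent: true
});
console.log(payload.variables.expression); // 'HEAD:store/mykey'
```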

Compression

All data is compressed when written to stores, no matter the back end. Initially, when the stores were intended for the Apps Script platform only, I was using a type of compression only available on Apps Script, but now that we can go across platforms, we need to use a different form of compression.

I’ve added a polyfill to Apps Script so it supports both, but by default it will use the Apps Script native method. If you intend to share data outside Apps Script, it’s best to use lz compression. You select it with the “uselz” property when initializing the crusher as below. There’s no real penalty in always using lz, irrespective of the back end, but I leave it as an option for backwards compatibility.

const crusher = new bmCrusher.CrusherPluginGitService().init({
  tokenService: () => goa.getToken(),
  prefix: 'store',
  fetcher: UrlFetchApp.fetch,
  repo: '-crusher-store',
  owner: 'brucemcpherson',
  uselz: true
})
selecting lz compression with uselz

Plugin code

This plugin is already implemented in the bmCrusher library so you don’t need to do any of this, but I reproduce it here in case you are interested in seeing how to write a bmCrusher plugin for some other backend.


// plugins for Squeeze service
// the 'store' in this case is the full name of a repo eg brucemcpherson/cGoa
function CrusherPluginGitService() {

  // writing a plugin for the Squeeze service is pretty straightforward.
  // you need to provide an init function which sets up how to init/write/read/remove objects from the store
  // this example is for Github
  const self = this;

  // these will be specific to your plugin
  let _settings = null;
  let _fetcher = null;

  // the prefix is the path in the repo to hold stuff like this
  const fixPrefix = (prefix) => prefix ? (prefix + "/").replace(/\/+/g, '/').replace(/\/+$/, '/') : ''

  // standard function to check store is present and of the correct type
  function checkStore() {
    if (!_settings.repo) throw "You must provide the repo to use";
    if (!_settings.owner) throw "You must provide the owner of the repo to use";
    if (!_settings.chunkSize) throw "You must provide the maximum chunksize supported";
    if (!_settings.prefix) throw 'The prefix is the path in the repo to start storing data at';
    if (!_settings.tokenService || typeof _settings.tokenService !== 'function') throw 'There must be a tokenservice function that returns an oauth token';
    if (!_settings.fetcher || typeof _settings.fetcher !== 'function') throw 'There must be a fetch function that can do a urlfetch (url,options)';
    return self;
  }

  const getQuery = ({ store, key, getContent = false }) => {
    const { repo, owner, prefix } = store
    const expression = "HEAD:" + prefix + key
    return {
      query: `query ($repo: String! , $owner: String!, $expression: String) {
        repository(owner: $owner, name: $repo) {
          object(expression: $expression) {
            ... on Blob {
              oid
              ${getContent ? 'text' : ''}
            }
          }
        }
      }`,
      variables: {
        repo,
        owner,
        expression
      }
    }
  }

  /**
  * @param {object} settings these will vary according to the type of store
  */
  self.init = function (settings) {

    _settings = settings || {};
    _settings.prefix = fixPrefix(_settings.prefix)

    const store = {
      rest: 'https://api.github.com/',
      gql: 'https://api.github.com/graphql',
      prefix: _settings.prefix,
      owner: _settings.owner,
      repo: _settings.repo
    }

    // set default chunksize for github (500k)
    _settings.chunkSize = _settings.chunkSize || 500000;

    // respecting the digest can reduce the number of chunks written, but may return stale data
    _settings.respectDigest = Utils.isUndefined(_settings.respectDigest) ? false : _settings.respectDigest;

    // must have a cache service and a chunksize, and the store must be valid
    checkStore();

    // initialize the fetcher
    _fetcher = new Fetcher(_settings).got

    // now initialize the squeezer
    self.squeezer = new Squeeze.Chunking()
      .setStore(store)
      .setChunkSize(_settings.chunkSize)
      .funcWriteToStore(write)
      .funcReadFromStore(read)
      .funcRemoveObject(remove)
      .setRespectDigest(_settings.respectDigest)
      .setCompressMin(_settings.compressMin)
      .setUselz(_settings.uselz || false)
      // the prefix is handled in the store, so we can ignore it here
      .setPrefix('');

    // export the verbs
    self.put = self.squeezer.setBigProperty;
    self.get = self.squeezer.getBigProperty;
    self.remove = self.squeezer.removeBigProperty;
    return self;

  };

  // return your own settings
  function getSettings() {
    return _settings;
  }

  /**
  * remove an item
  * @param {object} store whatever you initialized store with
  * @param {string} key the key to remove
  * @return {object} whatever you like
  */
  function remove(store, key) {
    checkStore();
    const url = getUrl(store, key)

    // we need to get the sha in case it's an update rather than a new entry
    const getItem = _fetcher(url)

    const sha = getItem && getItem.success && getItem.data && getItem.data.sha

    // prepare the data
    const body = {
      message: `bmcrusher:${key}`,
      sha
    }

    const result = _fetcher(url, {
      method: 'DELETE',
      payload: JSON.stringify(body),
      headers: {
        accept: 'application/vnd.github.v3+json'
      }
    })
    return result

  }

  const getUrl = (store, key) => {
    const { repo, owner, prefix } = store
    return store.rest + `repos/${owner}/${repo}/contents/${prefix}/${key}`.replace(/\/+/g, '/')
  }

  /**
  * write an item
  * @param {object} store whatever you initialized store with
  * @param {string} key the key to write
  * @param {string} str the string to write
  * @param {number} expiry time in secs .. ignored in github
  * @return {object} whatever you like
  */
  function write(store, key, str = '', expiry) {
    checkStore();
    const url = getUrl(store, key);

    // we need to get the sha in case it's an update rather than a new entry
    const getItem = _fetcher(url)
    const sha = getItem && getItem.success && getItem.data && getItem.data.sha

    // prepare the data
    const body = {
      content: Utilities.base64Encode(str),
      message: `bmcrusher:${key}`,
      sha
    }

    const result = _fetcher(url, {
      payload: JSON.stringify(body),
      method: 'PUT',
      contentType: "text/plain",
      headers: {
        accept: 'application/vnd.github.v3+json'
      }
    })

    if (!result.success) {
      throw new Error(result.content)
    }
    return result.data;
  }

  const getGql = (store, key) => {

    const payload = JSON.stringify(getQuery({ store, key, getContent: true }))

    const result = _fetcher(store.gql, {
      payload,
      method: 'POST',
      contentType: "application/json"
    })
    return result
  }

  /**
  * read an item
  * @param {object} store whatever you initialized store with
  * @param {string} key the key to read
  * @return {object} whatever you like
  */
  function read(store, key) {
    checkStore();
    const result = getGql(store, key)

    const data = result && result.success && result.data && result.data.data
    return data && data.repository && data.repository.object && data.repository.object.text

  }

}
github bmcrusher plugin

Links

bmCrusher

library id: 1nbx8f-kt1rw53qbwn4SO2nKaw9hLYl5OI3xeBgkBC7bpEdWKIPBDkVG0

Github: https://github.com/brucemcpherson/bmCrusher

Scrviz: https://scrviz.web.app/?repo=brucemcpherson/bmCrusher

Apps script library with plugins for multiple backend cache platforms

cGoa

library id: 1v_l4xN3ICa0lAW315NQEzAHPSoNiFdWHsMEwj2qA5t9cgZ5VWci2Qxv2

Github: https://github.com/brucemcpherson/cGoa

Scrviz: https://scrviz.web.app/?repo=brucemcpherson/cGoa

How fast can you get OAuth2 set up in Apps Script