Apps Script V8: Arraybuffers and Typed arrays, endianness and views

In Apps Script V8: Arraybuffers and Typed arrays we had a first look at how, in V8, you can process and map binary data to JavaScript types for easy access. In Apps Script V8: Multiple script files, classes and namespaces, we also looked at V8 classes. This article digs a little deeper and looks at how to extend classes, and how to apply that to handling data structures presented as an array of bytes, as you’d have to do when dealing with binary data from a file.

Motivation

Image files are a good example of the complicated structures you might need to dig into, so let’s create a standard image class, with extensions for the JPG, GIF, PNG and BMP variants. From that, we’ll be able to extract the width and height. (A much simpler way of doing this would be to look at the imageMediaMetadata property of the response from a Drive API query, but there’s really no fun in that.)

Why you might need this

If you are dealing with Binary files or streaming, ArrayBuffers and Typed arrays complement the Apps Script Blob utilities.
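
One practical wrinkle when moving from a Blob to a typed array: as far as I know, blob.getBytes() in Apps Script returns signed bytes (-128 to 127), whereas file format documentation talks in unsigned values (0 to 255). The sketch below uses a hand-made array as a stand-in for a real blob (that array is an assumption for illustration) and shows that wrapping the bytes in a Uint8Array takes care of the conversion.

```javascript
// stand-in for blob.getBytes(): the first 4 bytes of a PNG file as signed bytes
// (hypothetical data, no real blob is involved here)
const signedBytes = [-119, 80, 78, 71]

// Uint8Array stores each value modulo 256, so -119 becomes 137 (0x89)
const unsigned = new Uint8Array(signedBytes)

// the underlying ArrayBuffer can be shared with a DataView or other typed arrays
const firstByte = new DataView(unsigned.buffer).getUint8(0)
console.log(Array.from(unsigned), firstByte.toString(16))
```

This is the same conversion the base class constructor relies on when it calls new Uint8Array(bytes).buffer.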

The base class

Each of our image types follows the same principle, but the details vary. We’ll use a base class (AImg), which will be common across all image types, and a class extension that tweaks the specifics for each image type. Here’s the constructor:

class AImg {
  constructor (bytes) {
    // confirm endianness
    this._littleEndian = new Uint8Array(new Uint16Array([0xbbee]).buffer)[0] === 0xee;
    if (!this._littleEndian) {
      throw 'woah - apps script should be little endian!!'
    }
    // create a buffer and populate it with the initial bytes 
    this._buffer = new Uint8Array(bytes).buffer
    this._type = 'unknown'
    
    // use this view to extract numbers
    this._view = new DataView(this._buffer)
  
    // which are generally 16 bit ones
    this._bits = 16
  }

There are a few things to expand upon here.

Endianness

Machine architectures, especially in the early days, varied in the way that numbers are stored. Those of you who grew up having to juggle between mainframes, minis and then Intel will probably be familiar with the complication of the byte order of numbers, but nowadays it’s pretty standard. However, some of these image file formats were created a long time ago, and the order in which they store bytes internally reflects the machines their creators were using at the time. This is a little problem you have to be aware of when dealing with binary data.

See if you can figure out what this is doing.

this._littleEndian = new Uint8Array(new Uint16Array([0xbbee]).buffer)[0] === 0xee;
if (!this._littleEndian) {
  throw 'woah - apps script should be little endian!!'
}

The two main types of ‘endianness’ (there used to be others) are ‘big endian’ and ‘little endian’. The term refers to whether the most significant byte comes first (big endian) or last (little endian).

Here’s how to test.

Create an array with a single 2-byte (16-bit) number and get that as a buffer:
```
new Uint16Array([0xbbee]).buffer
```
Convert that buffer to a 1-byte array, and take the first element:
```
new Uint8Array(new Uint16Array([0xbbee]).buffer)[0]
```

If the value of that first byte matches the least significant part of the 16-bit number we first thought of, then the machine architecture is little endian.

this._littleEndian = new Uint8Array(new Uint16Array([0xbbee]).buffer)[0] === 0xee;
if (!this._littleEndian) {
  throw 'woah - apps script should be little endian!!'
}
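
To make the byte layout a little more visible, here is a minimal sketch that pulls the test out into a standalone function and then reads the same two bytes through a DataView with each byte order. The name isLittleEndian is a hypothetical helper for illustration and is not part of the AImg class.

```javascript
// hypothetical helper, not part of the AImg class
const isLittleEndian = () => {
  // one 16-bit number occupies two bytes in the buffer
  const buffer = new Uint16Array([0xbbee]).buffer
  // look at the same two bytes individually
  const [first, second] = new Uint8Array(buffer)
  // little endian: the least significant byte (0xee) is stored first
  return first === 0xee && second === 0xbb
}

// a DataView reads with an explicit byte order, whatever the machine uses
const endianView = new DataView(new Uint16Array([0xbbee]).buffer)
const asLittle = endianView.getUint16(0, true)
const asBig = endianView.getUint16(0, false)
console.log(isLittleEndian(), asLittle.toString(16), asBig.toString(16))
```

On a little endian machine the little endian read gives back the original number, and the big endian read gives it with its two bytes swapped.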

Dataviews

In Apps Script V8: Arraybuffers and Typed arrays I used regular ArrayBuffer syntax like

this._id = new Int32Array(this._buffer, 0 ,1)

to map buffer offsets to types of data. However, we now have to deal with the endianness of the binary data potentially being different from that of the machine architecture processing it, so we need another mechanism. That’s where data views come in. A data view is defined like this, and provides a ‘window’ onto the buffer from which data can be extracted as various types.

this._view = new DataView(this._buffer)

Extracting a 32-bit number from a buffer, taking account of endianness:

getValue (offset) {
  return this.view.getUint32(offset, this._littleEndian )
}

Since the size of numbers will vary between file types, we can generalize this a bit with

  getValue (offset) {
    return this.view[`getUint${this._bits}`](offset, this._littleEndian )
  }

Where this._bits will be set for each image type.
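
As a rough illustration of what that second argument does, the sketch below reads the same four bytes with different sizes and byte orders. The function readUint is a hypothetical standalone version of getValue, and the bytes are made up. Note that this naming trick only works for 8, 16 and 32 bits, since those are the unsigned getters a DataView provides under that naming pattern.

```javascript
// hypothetical standalone version of getValue
const readUint = (view, offset, bits, littleEndian) =>
  view[`getUint${bits}`](offset, littleEndian)

// four made-up bytes, as they might appear in a file
const demoView = new DataView(new Uint8Array([0x01, 0x02, 0x03, 0x04]).buffer)

const le16 = readUint(demoView, 0, 16, true)   // bytes 01 02 read as 0x0201
const be16 = readUint(demoView, 0, 16, false)  // bytes 01 02 read as 0x0102
const be32 = readUint(demoView, 0, 32, false)  // bytes 01 02 03 04 read as 0x01020304
console.log(le16, be16, be32) // 513 258 16909060
```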

Getters and methods

Here’s the complete base class with its getters and methods.

class AImg {

  constructor (bytes) {

    // confirm endianness
    this._littleEndian = new Uint8Array(new Uint16Array([0xbbee]).buffer)[0] === 0xee;
    if (!this._littleEndian) {
      throw 'woah - apps script should be little endian!!'
    }
    // create a buffer and populate it with the initial bytes 
    this._buffer = new Uint8Array(bytes).buffer
    this._type = 'unknown'
    
    // use this view to extract numbers
    this._view = new DataView(this._buffer)
  
    // which are generally 16 bit ones
    this._bits = 16
  }

  bufferToString (buf) {
    const b = new Uint8Array(buf)
    // terminated by \0 if it's shorter than the length of the allocated buffer
    const len = b.indexOf(0)
    // slice the Uint8Array copy (b), so this works whether buf is a typed array or an ArrayBuffer
    return String.fromCharCode.apply(null, b.slice(0, len < 0 ? b.length : len))
  }
  // convert to hex and leading '0' fill
  bufferToHex (buf) { 
    return Array.from(buf).map(f=>f.toString(16).padStart(2,'0')).join('')
  }
  // return the current buffer as an array of bytes
  get bytes () {
    return Array.from(new Uint8Array(this.buffer))
  }
  get buffer () {
    return this._buffer
  }
  // the version is the file identifier
  get version () {
    return this.bufferToString(this._version)
  }
  // the view can be reused to overlay the buffer for different types
  get view () {
    return this._view
  }
  // use this to extract a number
  getValue (offset) {
    return this.view[`getUint${this._bits}`](offset, this._littleEndian )
  }
  get width (){
    return this.getValue (this._widthOffset) 
  }
  get height () {
    return this.getValue (this._heightOffset) 
  }
  get type () {
    return this._type
  }


  // validate that file ident is indeed what is expected for that type of file
  checkType ({ident, type, value = null, hex = true}) {
    // convenience to allow ident to be supplied in either hex or string
    // normally we're checking against the version, but this is to enable ad hoc checks
    if(hex) {
      value = this.bufferToHex(value || this._version)
    } else {
      // this.version is already a string so no need to convert it
      value = value || this.version
    }

    if (ident !== value) {
      throw `${value} is not ${ident} (${type})`
    } else {
      this._type = type
    }
  }
}

Checktype

These image files are generally identified by a signature of some kind. The checkType method validates that the signature (usually mapped to this._version) is indeed the expected one.
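
To see what those signatures look like in practice, here is a small sketch that converts the opening bytes of a PNG, a BMP and a GIF file into the hex or text form that gets compared against the ident. The helpers toHex and toText are hypothetical standalone equivalents of bufferToHex and bufferToString, written here only for illustration.

```javascript
// hypothetical helpers mirroring bufferToHex and bufferToString
const toHex = (bytes) =>
  Array.from(bytes).map(b => b.toString(16).padStart(2, '0')).join('')

const toText = (bytes) => {
  const b = Array.from(bytes)
  const end = b.indexOf(0) // stop at a \0 terminator if there is one
  return String.fromCharCode(...b.slice(0, end < 0 ? b.length : end))
}

// the first bytes of a PNG, a BMP and a GIF file
const pngStart = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])
const bmpStart = new Uint8Array([0x42, 0x4d])
const gifStart = new Uint8Array([0x47, 0x49, 0x46, 0x38, 0x39, 0x61])

console.log(toHex(pngStart))  // 89504e470d0a1a0a
console.log(toHex(bmpStart))  // 424d
console.log(toText(gifStart)) // GIF89a
```

Hex is the safer choice when a signature contains bytes that are not printable characters, as the PNG one does.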

Class extensions

Extending a class means taking one already defined class, and making a new class, adding new stuff and/or overriding existing properties or methods in the original. We’ll create 4 new classes

ABmp, AGif, APng and AJpg

Mainly, we just need a constructor which sets up the parameters that differ between image types. These will generally refer to the offset to find the height and width.

Note the syntax for defining an extension, and also that the constructor of an extension first needs to call super (args). This executes the constructor of the base class it’s based on before continuing on with its own constructor tasks. It’s important that you always do this.
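
Stripped of the image details, the mechanics look something like the sketch below. Base and Extended are hypothetical class names used only to show the pattern: the extension calls super first, then overrides the settings the base constructor established, and inherits the getter without redefining it.

```javascript
// hypothetical classes, only to show the mechanics of extends and super
class Base {
  constructor (bytes) {
    this._buffer = new Uint8Array(bytes).buffer
    this._bits = 16
    this._littleEndian = true
  }
  get settings () {
    return `${this._bits} bit, ${this._littleEndian ? 'little' : 'big'} endian`
  }
}

class Extended extends Base {
  constructor (bytes) {
    // must come first: 'this' is not usable in the extension until super has run
    super(bytes)
    // now override what the base constructor set up
    this._bits = 32
    this._littleEndian = false
  }
}

const baseSettings = new Base([1, 2]).settings
const extendedSettings = new Extended([1, 2]).settings
console.log(baseSettings)     // 16 bit, little endian
console.log(extendedSettings) // 32 bit, big endian
```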

ABmp

class ABmp extends AImg {
  constructor (bytes) {
    super (bytes)
    this._version = new Uint8Array(this.buffer,0, 2 )
    this._widthOffset = 18
    this._heightOffset = 22

    this.checkType({
      ident:'424d', 
      type: 'bmp' 
    })
  }
}
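
One thing worth knowing about these offsets: in the common BITMAPINFOHEADER layout, the width and height fields are 32-bit signed little endian numbers, and the height can be negative for a top-down bitmap. The sketch below builds a made-up 26-byte header (hypothetical data, not a complete BMP file) to show that a 16-bit read of the low bytes gives the same answer for ordinary positive dimensions below 65536.

```javascript
// build a made-up 26-byte BMP header: just enough to hold the width and height
const bmpBytes = new Uint8Array(26)
const bmpView = new DataView(bmpBytes.buffer)
bmpBytes.set([0x42, 0x4d], 0)    // 'BM' signature
bmpView.setUint32(14, 40, true)  // size of a BITMAPINFOHEADER
bmpView.setInt32(18, 300, true)  // width, little endian
bmpView.setInt32(22, 150, true)  // height, little endian

// 16-bit read, as ABmp does: picks up the low 2 bytes of each field
const width16 = bmpView.getUint16(18, true)
const height16 = bmpView.getUint16(22, true)

// 32-bit signed read of the full fields
const width32 = bmpView.getInt32(18, true)
const height32 = bmpView.getInt32(22, true)
console.log(width16, height16, width32, height32) // 300 150 300 150
```

If you expect very large or top-down bitmaps, reading the full 32-bit signed value and taking the absolute value of the height would be the more cautious approach.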

AGif

class AGif extends AImg {
  constructor (bytes) {
    super (bytes)
    // specific to a GIF
    this._version = new Uint8Array(this.buffer,0, 6 )
    this._widthOffset = 6
    this._heightOffset = 8

    this.checkType({
      ident:'GIF89a', 
      type: 'gif',
      hex: false 
    })
  }
}
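
The GIF layout can be tried out without a real file. This sketch builds a made-up 10-byte header (hypothetical data) and reads the dimensions back as 16-bit little endian numbers. It also shows one possible way to accept the older 'GIF87a' signature, which the class above would reject as written; whether you need that depends on the files you expect.

```javascript
// a made-up 10-byte GIF header: 6 signature bytes, then width and height
const gifBytes = new Uint8Array(10)
gifBytes.set(Array.from('GIF89a').map(c => c.charCodeAt(0)), 0)
const gifView = new DataView(gifBytes.buffer)
gifView.setUint16(6, 168, true) // width, little endian
gifView.setUint16(8, 160, true) // height, little endian

// read it back the way AGif would
const signature = String.fromCharCode(...gifBytes.slice(0, 6))
const gifWidth = gifView.getUint16(6, true)
const gifHeight = gifView.getUint16(8, true)

// an optional, more tolerant check covering both GIF versions
const isGif = ['GIF87a', 'GIF89a'].includes(signature)
console.log(signature, gifWidth, gifHeight, isGif) // GIF89a 168 160 true
```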

APng

The PNG file has a couple of specifics:

it’s big endian
the numbers are 32 bit rather than 16 bit

class APng extends AImg {
  constructor (bytes) {
    super (bytes)
    // starts with a PNG signature
    this._version = new Uint8Array(this.buffer,0, 8 )

    // PNG is big endian, so we need to use a view to swap the bytes
    this._littleEndian = false
    this._widthOffset = 16
    this._heightOffset = 20
    this._bits = 32
    // this is the signature of a png file
    this.checkType({
      ident:'89504e470d0a1a0a', 
      type: 'png'
    })
  }

}
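
To see why the byte order matters here, the following sketch builds the first 24 bytes of a made-up PNG (signature, IHDR chunk length, the 'IHDR' chunk name, then width and height) and reads the width both ways. The data is hypothetical and is not a complete, valid PNG file.

```javascript
// the first 24 bytes of a made-up PNG file
const pngBytes = new Uint8Array(24)
const pngView = new DataView(pngBytes.buffer)
pngBytes.set([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], 0) // signature
pngView.setUint32(8, 13, false)              // IHDR data length, big endian
pngBytes.set([0x49, 0x48, 0x44, 0x52], 12)   // chunk name 'IHDR'
pngView.setUint32(16, 280, false)            // width, big endian
pngView.setUint32(20, 280, false)            // height, big endian

// the correct, big endian reads
const pngWidth = pngView.getUint32(16, false)
const pngHeight = pngView.getUint32(20, false)

// the same 4 bytes read as little endian give a very different number
const wrongWidth = pngView.getUint32(16, true)
console.log(pngWidth, pngHeight, wrongWidth.toString(16)) // 280 280 18010000
```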

AJpg

The JPG is the most complex of the group, as it’s divided into blocks, with each block holding information about what it is and how big it is. This means that we have to skip through the file looking for the block (known as SOF0) that contains the width and height. It’s also big endian. As with the other classes, the objective is to find the offsets of the width and height so we can use a data view to extract those values.

class AJpg extends AImg {
  constructor (bytes) {
    super (bytes)
    this._littleEndian = false
    const type = 'jpg'
    const data = new Uint8Array(this.buffer)
    
    // first find the position of the SOI marker
    let blockIndex = data.findIndex((f,i,a)=>f===0xff && a[i+1] === 0xd8)
    if(blockIndex === -1) throw new Error('couldnt find SOI in jpg file')

    // check the type
    const jfifIndex = blockIndex + 6
    const jfif = this.bufferToString(data.slice(jfifIndex, jfifIndex + 4))
    this.checkType({
      ident:'JFIF', 
      type,
      hex: false,
      value: jfif
    })
    // skip the SOI as it's a different format from the rest
    blockIndex += (this.view.getUint16(blockIndex+4, this._littleEndian) + 4)
   
    // now we're sure we have a jpeg
    // we need to keep skipping blocks till we find the SOF0 block, identified by this marker byte
    const sof0Marker = 0xc0
    while (blockIndex < data.length) {
      const blockSize = this.view.getUint16(blockIndex+2, this._littleEndian) + 2
      const startMarker = this.view.getUint8(blockIndex)
      const marker = this.view.getUint8(blockIndex+1)
      if (startMarker !== 0xff) {
        throw new Error(`expected a block marker ff but found ${startMarker.toString(16)}`)
      }
     
      if (marker === sof0Marker) {
        // SOF0 layout: ff c0, length (2 bytes), precision (1 byte), height (2 bytes), width (2 bytes)
        this._heightOffset = blockIndex + 5
        this._widthOffset = blockIndex + 7
        // force an exit
        blockIndex = data.length +1
      } else {
        // skip this block
        blockIndex += blockSize
      }

    }

  }
}
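
The block-skipping idea can be tried on its own with a few made-up bytes. The sketch below is a simplified, hypothetical helper (findSegment is not part of the classes above) that walks from one block to the next using each block's length field, and returns the offset of the first block carrying the marker byte you ask for. It assumes the data starts with SOI and does not attempt to handle the compressed image data that follows the start-of-scan block.

```javascript
// hypothetical helper: offset of the first block with the wanted marker byte, or -1
const findSegment = (bytes, wanted) => {
  const data = new Uint8Array(bytes)
  const view = new DataView(data.buffer)
  // a JPEG starts with the 2-byte SOI marker ff d8, which has no length field
  if (data[0] !== 0xff || data[1] !== 0xd8) throw new Error('no SOI at start of data')
  let index = 2
  // each following block is: ff, marker byte, 2-byte big endian length (which counts itself)
  while (index + 4 <= data.length) {
    if (data[index] !== 0xff) throw new Error(`expected ff at ${index}`)
    if (data[index + 1] === wanted) return index
    index += 2 + view.getUint16(index + 2, false)
  }
  return -1
}

// made-up data: SOI, an APP0 block with a 4-byte payload, then an SOF0 block with no payload
const fakeJpg = [
  0xff, 0xd8,
  0xff, 0xe0, 0x00, 0x06, 0x4a, 0x46, 0x49, 0x46,
  0xff, 0xc0, 0x00, 0x02
]
const app0At = findSegment(fakeJpg, 0xe0)
const sof0At = findSegment(fakeJpg, 0xc0)
const missing = findSegment(fakeJpg, 0xdb)
console.log(app0At, sof0At, missing) // 2 10 -1
```

Once the offset of the SOF0 block is known, the dimension fields sit at fixed positions relative to it, which is what the class stores as its width and height offsets.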

Using the classes

That’s the hard bit done. Now it’s very simple to use them to get the width/height info from Image files. Here’s a selector that will pick the correct class depending on the mime-type of a blob.

const getImg = (blob) => {
  const type = blob.getContentType()
  const bytes = blob.getBytes()
  switch (type) {
    case 'image/gif':
      return new AGif(bytes)
    case 'image/bmp':
      return new ABmp(bytes)
    case 'image/jpeg':
    case 'image/jpg':
      return new AJpg(bytes)
    case 'image/png':
      return new APng(bytes)
    default:
      throw new Error(`unknown type ${type}`)   
  }
}

Get a file from Drive and pass its blob to the selector:

const getFileInfo = (id) => {
  const file =  DriveApp.getFileById(id)
  const blob = file.getBlob()
  return {
    file,
    img: getImg(blob)
  }
}

Finally, an example for each

const getPng = () => {
  const {img} = getFileInfo('0B92ExLh4POiZSEl6d3lDc2xWSnc')
  console.log(`${img.width} ${img.height} ${img.type}`)
} // 280 280 png

const getJpg = () => {
  const {img} = getFileInfo('0B92ExLh4POiZV2QzQVEzUTRIOWc')
  console.log(`${img.width} ${img.height} ${img.type}`)
} // 407 640 jpg

const getGif = () => {
  const {img} = getFileInfo('1qOvw2VKX7CjQy7PmerbPC55sZrCgv3s9')
  console.log(`${img.width} ${img.height} ${img.type}`)
} //168 160 gif

const getBmp = () => {
  const {img} = getFileInfo('1-gfFSwZgcl7xowKjih9jfERquYS_yIiL')
  console.log(`${img.width} ${img.height} ${img.type}`)
} // 300 150 bmp

Summary

In an ideal world, a couple of simple setters would be all that’s needed to update the width and height to rescale the file, but sadly it’s not as simple as that with image files. There are already ways of doing that using Apps Script APIs and libraries, so that’s for another day and another article.