Complex aynchronous web services

Here's where it gets really interesting. Let's say that we need to get some large number of spreadsheets from Google Docs. We want to do it asynchronously, and we also know that google docs will fail if it receives to many requests, or it's too busy. So we have to orchestrate and deal with all of the following.
  • Getting the schema to know which sheets exists
  • Getting some unknown number of sheets and populating the respective Excel sheets when they return
  • Throttle failures on Google Docs. We'll use exponential backoff  and SetTimer and VBA to deal with that
  • Ensuring that things don't slip out of memory when the calling procedure exits. 
  • JSON and Google Wire deciphering to interpret the returned data
All of this will be promise based, which should make the whole thing digestible. As input we'll use the university tables that were used in the javacript app, University Rankings visualized


The calling app

I've generalized this so it can be used to retrieve any Google Spreadsheet. Here's  the shell. You can substitute in the key for any Google Spreadsheet. The only thing of consequence here is that the import process is kicked off asynchronously, and control will immediately be returned to the caller, which will then exit.

NOTE:  with the new Google Sheets, this method has been largely deprecated. There is a complete revamp of this here Get Data From Google Docs. You can also check out the Sheets apps from Apps built with data abstraction, which can both read and write from Excel to Sheets, but this post may be useful since it is about asynchronicity more than the utility of the app itself.

Public Sub asynchDocs()
    Dim url As String, key As String
    key = "0At2ExLh4POiZdFd0YUhpZVRPUGxFcW85X2xkMm1vY2c"
    url = "https://spreadsheets.google.com/feeds/worksheets/" + key + "/public/basic?alt=json"
    
    ' kick this off in the background
    doDocsWithBackoff url
    '
    ' we can go and do something else now
    Debug.Print "Im doing something else - have a nice day"
    
End Sub

The promise orchestrator

Here's the code that kicks everything off. Let's walk through

Public Sub doDocsWithBackoff(url As String)
    Dim b As cBackoff, p As cPromise, prs As cPromise, _
        process As cDocsImporter, doneSchema As cDeferred, doneSheets As cDeferred
        
    ' always to this for a fresh start - it will clear up any memory/timers from previously
    clearRegister
    
    ' this class knows how to get google spreadsheet
    Set process = New cDocsImporter
    Set doneSchema = New cDeferred
    
    ' first phase, we get the schema, and release up memory
    Set b = New cBackoff
    b.execute(url) _
        .done(process, "getSchema", doneSchema) _
        .fail process, "show"
    
    ' although this will be done synchronously, using a promise means we can know its done
    Set prs = deleteAllTheSheets
    
    ' we'll have resolved doneSchema when the previous .done is executed
    ' we'll need another deferred to tell us when all this is over
    Set doneSheets = New cDeferred
    when(Array(prs, doneSchema.promise)) _
        .done(process, "getSheets", doneSheets, url) _
        .fail process, "show"

    ' now we can clear everything up
    doneSheets.promise _
        .done(register, "teardown") _
        .done(process, "teardown") _
        .fail process, "show"

End Sub

clearRegister

When executing this asynchronously, one of the issues is to stop things disappearing from memory when the module that initiates them goes out of scope. One way is create a tangle of public variables - but that will be hard to manage, even harder to clear out, and impossible when the task is data driven as it is in this case. The promise framework maintains a register of objects that are going to be needed later, and keeps them in memory. This simply clears out any hanging around from previous asynch sessions. It should be the first thing called. 

    ' always to this for a fresh start - it will clear up any memory/timers from previously
    clearRegister

callBack class

One of the limitations of VBA is that you can only callback class members. I recommend that you centralize all specific methods needed for a particular project  in a class like this - in my case I've created a class called cDocsImporter that will know all about how to handle google Docs Data. 

    ' this class knows how to get google spreadsheet
    Set process = New cDocsImporter

getting the schema

Here's the first deferred class - doneSchema. It 's promise will be used as a signal that we've managed to get (or failed to get) schema data from Google Docs. This is the definition of all the sheets in the Spreadsheet. 

    Set doneSchema = New cDeferred
  

getting schema data using cBackoff 

The cBackoff class knows how to get data asynchronously - it uses Promises in VBA too - and also how to deal with throttling using an exponential backoff algorithm. For that it needs to use SetTimer and VBA. The cBackoff will kick off retrieval data from the given url, and immediately give control back. It returns a promise. 

When that gets resolved (within the cBackoff class), it will execute  b.done() by calling process.getSchema(). The data retrieved by the cBackoff class will get passed to process.getSchema() also. If the cBackoff class is rejected (meaning it failed to get data), it will execute b.fail() by calling process.show() to report whatever the cause of the failure was.

Finally it will resolve (or reject) doneSchema. We can use doneSchema.promise() as a signal shortly to know that we've finished getting the schema.
  
    ' first phase, we get the schema, and release up memory
    Set b = New cBackoff
    b.execute(url) _
        .done(process, "getSchema", doneSchema) _
        .fail process, "show"
  

using promises for synchronous activities

The use of promises is not limited to asynchronous activities.  You can use it to detect whether something has happened successfully or failed. You can also start synchronously, but later convert to asynchronous without having to modify the code structure. In this case we need to execute something that Excel doesnt know how to do asynchronously - delete all the sheets in the workbook. Since we have a lull in activities whilst waiting for the schema to come back, this is the perfect time. 
    ' although this will be done synchronously, using a promise means we can know its done
    Set prs = deleteAllTheSheets

Here is the deleteallthesheets function. Note that it creates and resolves a promise in line, since the activity is synchronous, and returns a promise. 

Private Function deleteAllTheSheets() As cPromise
    ' its possible to use promises for synchronous tasks too
    Dim d As cDeferred
    Set d = New cDeferred
    ' dont want complaints
    Application.DisplayAlerts = False
    
    ' delete all but one sheet
    While sheets.Count > 1
      sheets(1).Delete
    Wend
    Application.DisplayAlerts = True
    
    d.resolve (Array("sheets deleted"))
    Set deleteAllTheSheets = d.promise
End Function

The reason we use a promise here is that next we'll want to retrieve each of the spreadsheets from withing the Google Workbook. However we don't want to start doing that until both the schema has finished being retrieved and the old sheets have been deleted. It is almost certain that the sheet deletion would be finished being executed before the Schema was returned anyway, but we can use both promises now before continuing.

Using When

You can test that multiple things have completed using when(). In our case we want to check that both the sheets are finished being deleted and the schema has been retrieved before starting to get data back.
    ' we'll have resolved doneSchema when the previous .done is executed
    ' we'll need another deferred to tell us when all this is over
    Set doneSheets = New cDeferred
    when(Array(prs, doneSchema.promise)) _
        .done(process, "getSheets", doneSheets, url) _
        .fail process, "show"

When actually returns a promise. This means that you can use .done() and .fail(). All the promises need to be successful for when().done() to execute. If any one fails, then when().fail() will be executed instead. The promises to be evaluated are presented as an array(promise1,promise2,...promisen). When() will receive a signal when each of those promises is either resolved or rejected. In our case we want both of array(prs, doneSchema.promise()) to be successfully resolved before continuing. We'll also need another promise. This time to signal that we've got all the data. process.getSheets() will resolve (or reject) doneSheets when all the data has been retrieved and the sheets populated.  Aside from taking multiple promises, When() will pass on the data from the last promise in the array - in this case the data that was signalled by doneScheme.resolve(data).

Clearing up

Since the procedure will have long exited by now, its up to you to clean up the objects that will have been forced to remain in memory beyond their regular scope. We can detect when everything is completed either by using When() . done() or fail(). In our case, we know that doneSheets.promise() could only have been resolved if everything else had been done, so we can simply test for the doneSheets.promise().done() signal
    ' now we can clear everything up
    doneSheets.promise _
        .done(register, "teardown") _
        .done(process, "teardown") _
        .fail process, "show"

It is good practice to create a .tearDown method for classes in order to properly remove it from memory. In many cases, setting its reference to Nothing will not work (where classes have dual links to each other), leading to a memory leak. In our case, we've managed to retain them in memory by creating a link to them in a public register. Register.teardown() will not only remove the link, but also execute each objects teardown method (if it has one). Doing this at the end of everything will recover the memory used during all the asynchronous activity.

The cDocsImporter class

All the hard work interpreting google docs formats is done by calling back various methods in this class. There's nothing much new in here, json and google docs importing having been covered elsewhere in this site, but here's how it all fits together, and how the promises are rejected or resolved.

Option Explicit

Public Sub getSchema(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called when I get the schema.
    ' the schema will be the 2nd argument of a(), 1st will be the url
    Dim c As cJobject
    Set c = getJsonDataFromArray(a)
    ' register it so it gets cleaned up later
    keepInRegister c, "schema", "cJobject"
    ' we'll pass back the parsed schema
    defer.resolve (Array(c))
End Sub
Public Sub storeData(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called each time I have some data from the vizAPI
    ' the schema will be the 2nd argument of a(), 1st will be the url
    
    Dim c As cJobject, job As cJobject, jr As cJobject, jc As cJobject, _
        s As String, joc As cJobject, r As Range, w As Worksheet
    Set c = getJsonDataFromArray(a, eDeserializeGoogleWire)
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    ' add a worksheet
    Set w = Worksheets.add()
    w.name = CStr(args(LBound(args)))
    Set r = w.Cells(1, 1)
    '
    'put the data
    '
    Set jr = c.find("rows")
    Set jc = c.find("cols")
    
    ' here's the column headings
    '
    If Not jc Is Nothing Then
        For Each job In jc.children
            s = job.child("label").value
            If s = vbNullString Then
                s = job.child("id").value
            End If
            r.Offset(, job.childIndex - 1).value = s
        Next job
    End If
    '
    ' and these are the rows
    '
    If Not jr Is Nothing Then
        For Each job In jr.children
            For Each joc In job.child("c").children
                r.Offset(job.childIndex, joc.childIndex - 1).value = joc.child("v").value
            Next joc
        Next job
    End If
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
    ' release the cJobject memory
    c.tearDown
    ' signal that all is good
    defer.resolve (a)
End Sub
Public Sub getSheets(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called when I want to get all the data
    ' the schema will be the 1st argument of a()
    '
    
    Dim c As cJobject, cj As cJobject, job As cJobject, _
        joc As cJobject, s As String, u() As String, url As String, i As Long, _
        d() As cPromise, b As cBackoff, def As cDeferred, testing As Boolean, maxU As Long
        
    ' limit processing for debugging by setting this true
    testing = False
    
    Set c = a(LBound(a))
    Set cj = c.find("feed.entry")
    If cj Is Nothing Then
        defer.reject (Array("could not find feed urls in schema entries"))
        Exit Sub
    End If
    ' these will be the urls for each sheet to get
    maxU = cj.children.Count
    If (testing) Then maxU = 1
    ReDim u(0 To maxU - 1)
    ReDim d(0 To maxU - 1)
    ReDim t(0 To maxU - 1)
    
    If (arrayLength(u) = 0) Then
        defer.reject (Array("there were no sheets in the schema"))
        Exit Sub
    End If
    
    ' extract the vizapi links
    For Each job In cj.children
        If job.childIndex > maxU Then Exit For
        url = vbNullString
        For Each joc In job.child("link").children
            s = joc.toString("rel")
            ' we're going to use the link for viz api
            If InStr(1, s, "visualizationApi") Then
                url = joc.toString("href")
            End If
        Next joc
        If (url = vbNullString) Then
            defer.reject (Array("couldnt find the vizapi link in the schema"))
            Exit Sub
        End If
        ' store it
        u(job.childIndex - 1) = url
        If (job.child("title.$t") Is Nothing) Then
            defer.reject (Array("couldnt find the tiles node in the schema"))
            Exit Sub
        End If
        t(job.childIndex - 1) = job.toString("title.$t")
    Next job
    ' when we get here, we have to set up a promise for each of the urls and get the data
    For i = LBound(u) To UBound(u)
        Set b = New cBackoff
        Set def = New cDeferred
        Set d(i) = def.promise
        b.execute(u(i)) _
            .done(Me, "storeData", def, Array(t(i))) _
            .fail Me, "show"
    
    Next i
    'signal the urls have been processed
    when(d) _
        .done(Me, "resolve", defer) _
        .fail Me, "reject", defer
        
End Sub
Public Sub resolve(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    defer.resolve (Array("urls done"))
End Sub
Public Sub reject(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    defer.resolve (Array("urls done"))
End Sub

Public Sub tearDown(a As Variant, Optional defer As cDeferred, Optional args As Variant)

End Sub
Private Sub class_initialize()
    keepInRegister Me, , "cProcessData"
End Sub

Private Function getJsonDataFromArray(a As Variant, _
        Optional t As eDeserializeType = eDeserializeNormal) As cJobject
    ' this will organize the data returned from a standard httpdeferred
    Dim c As cJobject, s As String
    Set c = New cJobject
    If t = eDeserializeGoogleWire Then
        s = cleanGoogleWire(CStr(a(LBound(a) + 1)))
    Else
        s = CStr(a(LBound(a) + 1))
    End If
    
    With c.init(Nothing)
        .add "url", CStr(a(LBound(a)))
        .add("data").append JSONParse(s, t)
    End With
    Set getJsonDataFromArray = c
End Function

Public Sub show(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    Dim i As Long
    Debug.Print "show:";
    For i = LBound(a) To UBound(a)
        Debug.Print a(i)
    Next i
    Debug.Print ""
End Sub

getSchema method

This will be called when it's time to extract the schema from the data returned by the cBackoff class. Every method that is likely to be called by a .done() or a .fail() should have the same argument list. 

    a As Variant - this is the data that will have been signalled in .resolve(array(data))  and will be an array of whatever
    Optional defer As cDeferred - this can be used to signal completion of this task through a promise
    Optional args As Variant - this is another array that this time can be passed via .done()


Public Sub getSchema(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called when I get the schema.
    ' the schema will be the 2nd argument of a(), 1st will be the url
    Dim c As cJobject
    Set c = getJsonDataFromArray(a)
    ' register it so it gets cleaned up later
    keepInRegister c, "schema", "cJobject"
    ' we'll pass back the parsed schema
    defer.resolve (Array(c))
End Sub

The json of the schema get parsed and creates a cJobject . We register it so that it doesnt slip out of memory, but also so that it can be easily cleaned up later (cjobject has a teardown class that will be executed by register.teardown()). Finally we resolve the work and pass on the cJobject containing the schema data.

getSheets method

Once getSchema (and deleteallthesheets) is resolved, this will be called to go and get each of the sheets referenced in the schema. Much f the code here is simply about converting google wire protocol. The interesting part is this section....

    ' when we get here, we have to set up a promise for each of the urls and get the data
    For i = LBound(u) To UBound(u)
        Set b = New cBackoff
        Set def = New cDeferred
        Set d(i) = def.promise
        b.execute(u(i)) _
            .done(Me, "storeData", def, Array(t(i))) _
            .fail Me, "show"
    
    Next i

Each sheet is retrieved asynchronously using a cBackoff, and its promise is stored in an array of promises. On each resolution, the storeData() method will be called to move the data into the Excel sheet. So there are a whole stream of data fetches and data stores all happening at the same time.

Public Sub getSheets(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called when I want to get all the data
    ' the schema will be the 1st argument of a()
    '
    
    Dim c As cJobject, cj As cJobject, job As cJobject, _
        joc As cJobject, s As String, u() As String, url As String, i As Long, _
        d() As cPromise, b As cBackoff, def As cDeferred, testing As Boolean, maxU As Long
        
    ' limit processing for debugging by setting this true
    testing = False
    
    Set c = a(LBound(a))
    Set cj = c.find("feed.entry")
    If cj Is Nothing Then
        defer.reject (Array("could not find feed urls in schema entries"))
        Exit Sub
    End If
    ' these will be the urls for each sheet to get
    maxU = cj.children.Count
    If (testing) Then maxU = 1
    ReDim u(0 To maxU - 1)
    ReDim d(0 To maxU - 1)
    ReDim t(0 To maxU - 1)
    
    If (arrayLength(u) = 0) Then
        defer.reject (Array("there were no sheets in the schema"))
        Exit Sub
    End If
    
    ' extract the vizapi links
    For Each job In cj.children
        If job.childIndex > maxU Then Exit For
        url = vbNullString
        For Each joc In job.child("link").children
            s = joc.toString("rel")
            ' we're going to use the link for viz api
            If InStr(1, s, "visualizationApi") Then
                url = joc.toString("href")
            End If
        Next joc
        If (url = vbNullString) Then
            defer.reject (Array("couldnt find the vizapi link in the schema"))
            Exit Sub
        End If
        ' store it
        u(job.childIndex - 1) = url
        If (job.child("title.$t") Is Nothing) Then
            defer.reject (Array("couldnt find the tiles node in the schema"))
            Exit Sub
        End If
        t(job.childIndex - 1) = job.toString("title.$t")
    Next job
    ' when we get here, we have to set up a promise for each of the urls and get the data
    For i = LBound(u) To UBound(u)
        Set b = New cBackoff
        Set def = New cDeferred
        Set d(i) = def.promise
        b.execute(u(i)) _
            .done(Me, "storeData", def, Array(t(i))) _
            .fail Me, "show"
    
    Next i
    'signal the urls have been processed
    when(d) _
        .done(Me, "resolve", defer) _
        .fail Me, "reject", defer
        
End Sub

Finally, we can use when() to resolve that we have completely finished all the retrievals, since we have an array of the all the promises that have been issued, each of which will have been resolved when the data from them has been transferred to excel by storeData. 

storeData method

This is the final step. Data from each sheet is passed here and converting to a cJobject. When done its promise will be resolved. These promises make up the list that getSheets() is waiting for before finally declaring victory.

Public Sub storeData(a As Variant, Optional defer As cDeferred, Optional args As Variant)
    ' this will be called each time I have some data from the vizAPI
    ' the schema will be the 2nd argument of a(), 1st will be the url
    
    Dim c As cJobject, job As cJobject, jr As cJobject, jc As cJobject, _
        s As String, joc As cJobject, r As Range, w As Worksheet
    Set c = getJsonDataFromArray(a, eDeserializeGoogleWire)
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    ' add a worksheet
    Set w = Worksheets.add()
    w.name = CStr(args(LBound(args)))
    Set r = w.Cells(1, 1)
    '
    'put the data
    '
    Set jr = c.find("rows")
    Set jc = c.find("cols")
    
    ' here's the column headings
    '
    If Not jc Is Nothing Then
        For Each job In jc.children
            s = job.child("label").value
            If s = vbNullString Then
                s = job.child("id").value
            End If
            r.Offset(, job.childIndex - 1).value = s
        Next job
    End If
    '
    ' and these are the rows
    '
    If Not jr Is Nothing Then
        For Each job In jr.children
            For Each joc In job.child("c").children
                r.Offset(job.childIndex, joc.childIndex - 1).value = joc.child("v").value
            Next joc
        Next job
    End If
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
    ' release the cJobject memory
    c.tearDown
    ' signal that all is good
    defer.resolve (a)
End Sub

Summary

Although conceptually this is a little harder to deal with than if this had been done synchronously, the use of promises helps  control the complex asynchronous orchestration. In fact  I can't imagine doing it without it now. See more about  Promises in VBA


For help and more information join our forum,follow the blog or follow me on twitter .

Comments