Acknowledgement

Much of the JavaScript and d3.js came from Mike Bostok’s site. If you use any of this material, please be sure to maintain the acknowledgement for his work.

There have been a few features added to this since this entry. See these blog posts – adding urls to Sankey link​s and nodes, inheriting data ​cell colors for Sankey links, and another Sankey example

These are implemented in the cDataSet workbook, downloadable from here.

If you are Google Sheets user, here’s an Add-on to create these directly in Sheets.

What are Sankey diagrams?

They are designed to show flows through a network, and are sometimes called flow diagrams. In this example we will take some Excel data and directly create an interactive Sankey diagram. This supposed to represent the breakdown of effort that a fictional company spends in various functions and roles. Here is a screenshot. You can try the interactive version here (note that this doesn’t work properly with IE8 and below)

 

Here’s a screenshot for those with older browsers.

sankey from excel

Data Structure

This kind of visualization can be thought of as being a column chart that also shows the connections between sections of each column. That means that each data item needs to describe

  • A source and target (which column sections are connected)
  • A value (which values transfer between each column section)
  • Labels (what to call each column section)

Now it may be that you want to give two columns the same name (let’s say this represented some kind of transformation where the labels before and after were the same), so it means that I needed to allow for an ID (to represent the source and target) and a label in the data

format. Here is a section of the input data that created the above, which you can find in the Sankey tab of the cDataSet.xlsm download 

data source for Sankey diagram

To create a Sankey diagram, you simply need to construct a table as above.

Tweaking the format

As in other projects on this site, the javaScript, css and html that does all this is stored as a series of parameter values in a parameter sheet- in this case the sankeyParameters sheet. This looks like this

tweaks to format Sankey diagram

Most of this you won’t need to touch, but you can tweak the style (these are css style definitions) and header parameters to affect the layout and content of the final web page. It is also possible to change the operation of the chart if you are comfortable with d3.js – you can find this in the chartCode parameter. The generated filename is set with the htmlName parameter.

Some directories on your machine may be read only to you – for example, if you have downloaded this and run it from the download directory on windows 8 you may not be able to create a file in that directory because of local permission settings. You should set this to a directory you have permission to create files in.

This is a single file with no dependencies (except to the d3.js library) and you can put it on a web site or mail it to someone as you wish. Note that IE8 and below may not work. Chrome and Firefox are good.

 

The VBA Code

 

All code is implemented in the cDataSet.xlsm download.  Here is the calling procedure to generate and execute the chart code.

Public Sub testSan()
    Dim ds As New cDataSet, cs As New cSankey, dsParam As New cDataSet, _
        js As String, content As String
    ' generate a sankey chart from excel data
    ' get parameters
    dsParam.populateData wholeSheet("sankeyparameters"), , , True, "Item", , True
    ' get data values as json
    js = cs.init(ds.populateData(wholeSheet("sankey"), , , , , , True)).jSon
    ' construct the executable file
    With dsParam
        content = .cell("titles", "value").toString
        content = content & _
            .cell("defaultStyles", "value").toString
        content = content & _
            .cell("sankeyStyles", "value").toString
        content = content & _
            .cell("header", "value").toString
        content = content & _
            .cell("sankeyCode", "value").toString
        content = content & _
            ""
        content = content & _
            .cell("chartCode", "value").toString
        With .cell("htmlname", "value")
            openNewHtml .toString, content
            If Not OpenUrl(.toString) Then
                MsgBox ("could not open " & .toString & " using default browser")
            End If
        End With
    End With
End Sub

I have introduced a new class – cSankey – which is reponsible for organizing the Excel data, and creating a suitable jSon representation as expected by the d3.js sankey plugin.

 

Here is the code for that class

' this is used to create sankey diagram input jSon data to D3.js plugin
Option Explicit
Private pUnique As cJobject
Private pDset As cDataSet
Private pIgnoreZero As Boolean
Public Property Get dSet() As cDataSet
    Set dSet = pDset
End Property
Public Property Get unique() As cJobject
    Set unique = pUnique
End Property
Public Property Get jSon(Optional optBeautify As Boolean = False) As String
    Dim s As String
    s = Trim(jObject.serialize(optBeautify))
    ' special tweak for json format needed- get rid of extra {}
    jSon = Mid(s, 2, Len(s) - 2)
End Property
Public Property Get jObject() As cJobject
    'return a jobject version for input to D3.js sankey diagram
    Dim cs As cJobject, dr As cDataRow, cc As cCell, _
            job As cJobject, nodes As cJobject, links As cJobject
    Set job = New cJobject
    Set job = job.init(Nothing)
    Set nodes = job.add("nodes").addArray
    For Each cs In pUnique.Children
        With nodes.add
            .add "name", cs.toString
        End With
    Next cs
    Set links = job.add("links").addArray
    For Each dr In pDset.rows
        If (dr.cell("Value").Value <> 0 Or Not pIgnoreZero) Then
            With links.add
                .add "source", getUniqueIndex(dr.cell("SourceID").toString)
                .add "target", getUniqueIndex(dr.cell("TargetID").toString)
                .add "value", dr.cell("Value").Value
            End With
        End If
    Next dr
    Set jObject = job
End Property
Private Function getUniqueIndex(s As String) As Long
    Dim job As cJobject
    Set job = pUnique.Find(makekey(s))
    If (job Is Nothing) Then
        getUniqueIndex = -1
    Else
        getUniqueIndex = job.childIndex - 1
    End If
End Function
Public Function init(ds As cDataSet, Optional optIgnore As Boolean = True) As cSankey
    Dim a As Variant, i As Long, cc As cCell, ct As Collection
    pIgnoreZero = optIgnore
    Set pUnique = New cJobject
    pUnique.init(Nothing).addArray
    Set pDset = ds
    ' populated with these fields
    If (ds.headingRow.validate(True, "SourceID", "TargetID", "SourceLabel", _
        "TargetLabel", "Value")) Then
        ' sort out the names - each source or target id needs a label
        ' these are all the source labels
        Set ct = ds.column("SourceID").uniqueValues(eSortAscending)
        For Each cc In ct
            pUnique.add makekey(cc.toString), ds.cell(cc.row, "SourceLabel").toString
        Next cc
        ' the missing ones we will use the target label
        Set ct = ds.column("TargetID").uniqueValues(eSortAscending)
        ' the unknown  ones need the ToLabel
        For Each cc In ct
            If (getUniqueIndex(cc.toString) = -1) Then
                  pUnique.add makekey(cc.toString), ds.cell(cc.row, "TargetLabel").toString
            End If
        Next cc
        Set init = Me
    End If
End Function

You can find out more about other d3.js visualizations from Excel here.