Simple implementation of R- melt

Hadley Wickham has added some data management capabilities to R which are pretty useful for getting data ready for tabulation or visualization. One of these is melt(). Melt can do lots of things, but one that caught my eye in particular was the ability to turn a tabular data representation in to a transactional one - a kind of unpivot. In other words to turn this



into this


Tony Hirst pointed this out in one of his posts about R, and it got me thinking that mashing up something like this (a kind of unpivot table) in both VBA and Google Apps Script would be pretty straightforward, since I have all the components already. 

Usage

So lets start with how to use it. At a minimum we need to say something like this. 

Public Sub testMelt()
    reshapeMelt "{'outputSheet':'meltOut','id':['id','time']}"
End Sub

This looks like a strange combination of javaScript and VBA - and I guess it is.  I'm trying to both minimize the work to make a google apps script version of everything so using jSon for specifying options is a good way to average things out.  If you need to recap,  I cover jSon arguments in Excel in this blog post.

reshapeMelt actually returns a cDataSet of the output data, so it can be used directly to chain to other procedures that know how to process these. I'll show and example of this in a later post.

Options

All options are specified in one jSon string (or as close to one as I can get within the confines of VBA syntax). The full range of options are as below. I'll no doubt be adding to this as a template for all default options for this topic.

{'complain':true, 
 'inputSheet':'meltOut',
 'variableColumn' : 'variable', 
 'valueColumn' : 'value', 
 'id':['id'] ,
 'outputSheet': 'rOutputData' , 
 'clearContents':true}

Walkthough

The arguments work like jQuery $.extend() in the sense that any arguments you do specify simply override the defaults. In this case "{'outputSheet':'meltOut','id':['id','time']}"  xx outputSheet specifies where to put the result, and  the ID area specifies the columns that need to be combined to produce a unique key for each generated transaction - just like in the rshape addon - melt(mydata, id=c("id","time")) 


The code

Most of this is just patching together existing capabilities - mainly cJobject and cDataSet. I had to add a few bits and pieces to cJobject but nothing major.  I'll post the Google Apps version shortly. 

Public Function reshapeMelt(options As String) As cDataSet
    
    ' this is a very basic start at vba implementation of Hadley Wickhams  melt(R)
    Dim jArgs As cJobject, ds As cDataSet, cj As cJobject, _
        r As Range, ws As Worksheet, dr As cDataRow, dsOut As cDataSet, _
        dc As cCell, dsre As cDataSet
        
    ' sort out the options
    Set jArgs = optionsExtend(options, rOptionDefaults)
    
    ' check for argument programming syntax error
    Debug.Assert Not jArgs Is Nothing
    With jArgs
        If .toString("inputsheet") = .toString("outputsheet") Then
            MsgBox ("Reading and writing to the same sheet - not allowed")
            Exit Function
        End If
    End With
    ' read input sheet
    Set ds = New cDataSet
    If ds.populateData _
        (wholeSheet(jArgs.toString("inputsheet")), , , , , , True) Is Nothing Then
        Exit Function
    End If
    
    ' check we have everything we need
    With jArgs
        For Each cj In .child("id").children
            If Not ds.headingRow.validate(.cValue("complain"), cj.toString) Then
                Exit Function
            End If
        Next cj
    ' check if output sheet exists?
        Set ws = sheetExists(.toString("outputSheet"), .cValue("complain"))
        If ws Is Nothing Then
            Exit Function
        End If
     ' good to go
        Set r = ws.Cells(1, 1)
        If .cValue("clearContents") Then
            ws.Cells.ClearContents
        End If
      ' make headings
        For Each cj In .child("id").children
            r.value = cj.value
            Set r = r.Offset(, 1)
        Next cj
        r.value = .toString("variableColumn")
        r.Offset(, 1).value = .toString("valueColumn")
        
        ' lets get that in a dataset for abstracted column access
        Set dsOut = New cDataSet
        dsOut.populateData ws.Cells.Resize(1, r.column + 1)
        
        ' now data
        Set r = dsOut.headingRow.Where.Offset(1).Resize(1, 1)
        For Each dr In ds.rows
            For Each dc In dr.columns
                
                ' need to generate a new row for each non ID cell
                If .child("id").valueIndex _
                        (ds.headings(dc.column).toString) = 0 Then
                    
                    ' the id fields
                    For Each cj In .child("id").children
                        r.Offset(, dsOut.headingRow.exists(cj.toString).column - 1).value = dr.value(cj.toString)
                    Next cj
                    
                    ' this variable value
                    r.Offset(, _
                        dsOut.headingRow.exists(.toString("valueColumn")).column - 1).value _
                            = dc.value
                    ' and its name
                    r.Offset(, _
                        dsOut.headingRow.exists(.toString("variableColumn")).column - 1).value _
                            = ds.headings(dc.column).value
                
                    Set r = r.Offset(1)
                End If
            Next dc

        Next dr
     End With
     ' send back what we just did
     Set dsre = New cDataSet
     Set reshapeMelt = dsre.populateData(dsOut.headingRow.Where.Resize(r.row - 1))
End Function

setting up the default options for re-use
Public Function rOptionDefaults() As String
    ' this sets up the defaults for all R related stuff
     rOptionDefaults = _
            "{'complain':true, 'inputSheet':'" & ActiveSheet.name & "'," & _
          "'variableColumn' : 'variable', 'valueColumn' : 'value', 'id':['id'] ," & _
          "'outputSheet': 'rOutputData' , 'clearContents':true}"
          
End Function

Modify jSon args to go as many levels deep as required for optional arguments
Public Function optionsExtend(givenOptions As String, _
            Optional defaultOptions As String = vbNullString) As cJobject
    Dim jGiven As cJobject, jDefault As cJobject, _
        jExtended As cJobject, cj As cJobject
    ' this works like $.extend in jQuery.
    ' given and default options arrive as a json string
    ' example -
    ' optionsExtend ("{'width':90,'color':'blue'}", "{'width':20,'height':30,'color':'red'}")
    ' would return a cJobject which serializes to
    ' "{width:90,height:30,color:blue}"
    Set jGiven = jSonArgs(givenOptions)
    Set jDefault = jSonArgs(defaultOptions)
    
    ' now we combine them
    If Not jDefault Is Nothing Then
        Set jExtended = jDefault
    Else
        Set jExtended = New cJobject
        jExtended.init Nothing
    End If
    
    ' now we merge that with whatever was given
    If Not jGiven Is Nothing Then
        jExtended.merge jGiven
    End If
    
    ' and its over
    Set optionsExtend = jExtended
End Function

a few more methods for cJobject to be able to tweak existing cJObjects
Public Function merge(mergeThisIntoMe As cJobject) As cJobject
    ' merge this cjobject with another
    ' items in merged with are replaced with items in Me
    Dim cj As cJobject, p As cJobject
    
    Set p = Me.find(mergeThisIntoMe.fullKey(False))
    
    If p Is Nothing Then
    ' i dont have it yet
        Set p = Me.append(mergeThisIntoMe)
    Else
    ' actually i do have it already
        If p.isArrayRoot Then
            ' but its an array - i need to get rid of it
            Set p = p.remove
            Set p = p.append(mergeThisIntoMe)
        Else
            p.value = mergeThisIntoMe.value
        End If
    End If
    ' now the other childreb tio merge in
    For Each cj In mergeThisIntoMe.children
       p.merge cj
    Next cj
    Set merge = Me

End Function
Public Function remove() As cJobject
    ' removes a branch
    Dim cj As cJobject, p As cJobject, i As Long
    
    Debug.Assert Not parent Is Nothing
    Debug.Assert parent.hasChildren
    
    parent.children.remove childIndex
    ' fix the childindices
    i = 0
    For Each cj In parent.children
        i = i + 1
        cj.childIndex = i
    Next cj
    Set remove = parent

End Function
Public Function append(appendThisToMe As cJobject) As cJobject
    ' append another object to me
    Dim cj As cJobject, p As cJobject
    If Not appendThisToMe.parent.isArrayRoot Then
        Set p = Me.add(appendThisToMe.key, appendThisToMe.value)
    Else
        Set p = Me.add(, appendThisToMe.value)
    End If
    If appendThisToMe.isArrayRoot Then p.addArray
    For Each cj In appendThisToMe.children
       p.append cj
    Next cj
    Set append = Me
End Function

and that's it - happy melting. You can download all this at Downloads in the cDataSet.xlsm. 

Why not join our forum,follow the blog or follow me on twitter to ensure you get updates when they are available. For more fooling around with jSon in VBA, take a look at jSon 


Comments