Why abstract anyway?

In a number of articles on this site and elsewhere, I have covered topics such as Getting data from Outlook Address Lists into Excel , Serializing Excel data for input to any Google visualizationExcel Generate High Quality RoadMaps, and several others, I’ve referred to data manipulation classes as building blocks and used them in the illustrating examples. Abstracting data from its location means that whatever code you write that will operate on some Excel data sets will be much cleaner than if you are constantly referring to particular columns or worksheets. This is especially important if the structure of your worksheet is likely to change, for example by introducing additional columns, or moving things around, which in the end, it probably will if your workbook is to have any kind of longevity. In this article I will demonstrate the use of abstraction classes for the Excel workbook object, all of which you can download for your own use.

What to download

All the examples contain all the classes needed for them to work and all projects can be found here.

If you just want the main utility classes used throughout this site, in the Downloads section cDataSet.xlsm includes everything you need including some examples.

If you are interested in further abstracting the data source, for example using SQL to populate your dataset, you may want to take a look at Execute SQL from Excel

What do I mean my abstracting?

The Excel tables capabilities from VBA 2007 certainly start to provide this capability, however in the real world we still have to deal with all kinds of Excel versions, so I have kept away from those for now. Additionally, once you have abstracted your data structure away from Excel, you can start to easily integrate other data sources such as JSON feeds , separate the business logic from the data access, and really start using Excel as a consolidation tools from multiple data sources. Let’s take a look at some typical VBA data manipulation type activities, using these new classes.

Consider the following code to populate a column of worksheet with some calculation based on other columns. You can see that this approach would continue to work even if the underlying worksheets or column positions changes, and there is no need to worry about how many rows in the “Orders” worksheet. It figures it out.

For Each dr In .Rows
dr.Cell(“Total”).Value = dr.Cell(“Quantity”).Value * dr.Cell(“Price”).Value
Next dr

Next look at this example of copying a few columns from one worksheet to another. This would copy just the four named columns, in the order given, from a dataset to another sheet, starting at the given output range. This can easily be done in just one line of code.

.bigCommit Range(“Summary!$a$1”), True, Array(“customer”, “contact”, “total”, “country”)

You can see from the above examples that data is referred to by its column heading, rather than its range. In a table with both row labels and column labels, it is possible to reference a single cell value, for example.

Msgbox ( “Total orders for Acme is “ & dSet.Cell(“Acme”,”Total”).toString)

How to set up data manipulation class structures.

Naturally we need to set all this up, but first a quick summary of these classes, you will see they are pretty much like their native Excel equivalents, and have a range of methods and properties that are also similar.

Class name Description
cDataSets A collection of cDataSet. Somewhat equivalent to an Excel Workbook
cDataSet The representation of structured data in a range, with column headings and optionally, row labels. Somewhat equivalent to an Excel Worksheet, except that there can be serveral cDataSets in a single worksheet
cDataRow A row of data in a cDataSet
cDataColumn A column of data in a cDataSet
cHeadingRow A special case of cDataRow that contains each of the column Headings in the cDataSet
cCell a single data cell.

Here is a simple main procedure using these, setting up the data structures, and doing something with them. The populate data method along with the start cell of the data is all that is typically required.

Public Sub mainExample()
Dim dSet As cDataSet

Set dSet = New cDataSet
With dSet
.populateData Range(“Orders!$a$1”), , , , , , True

If .Where Is Nothing Then
MsgBox (“No data to process”)
Else
Call dosomething(dSet)
End If
End With
End Sub

So lets put all that together, and do something with this new structure This is going to check that all the columns we need exist, do some calculation, write back the result, then copy a selection of those columns to another sheet. Note the use of the .Commit method. This is because the actual Excel sheet is not updated until a .Commit is executed. As well as simplifying rollback, this can make huge performance improvements, which I will demonstrate in a future article, using an automatic VBA profiler.

Private Sub dosomething(dSet As cDataSet)
    Dim dr As cDataRow
    With dSet
        ''... we can also check we have some mandatory fields if necessary
        If .HeadingRow.Validate(True, "Customer", "Country", "Total", "Quantity", "Price", "Contact") Then
            ' do some manipulation
            For Each dr In .Rows
                dr.Cell("Total").Value = dr.Cell("Quantity").Value * dr.Cell("Price").Value
            Next dr
            ' write the total column back, since we've updated it
            .Column("total").Commit
            ' now copy a couple of columns to another sheet
            .bigCommit Range("Summary!$a$1"), True, Array("customer", "contact", "total", "country")
        End If
    End With
End Sub

Aside from being very concise, this approach will continue to work as the structure and layout of the sheet changes. There is no need for dynamic named ranges or other mechanisms to figure out the size of the table – just a blank row to mark the end of the data.

A parameter sheet

Sometimes it’s useful to have a parameter sheet to provide treatment instructions depending on data contents. In this case we are going to develop a couple of different blocks of parameters, specified as simple Excel tables. The parameter tab contains 2 blocks, ‘Formats’ and ‘Country’. We are going to use our test data to look up values in this parameter table, copy them to the output table, and also set formats based on certain conditions.

In this case, then we are going to create 2 dataSets from one Worksheet, and we will be able access cells by specifying both the column name and row name.  For example dSets(“country”).cells(“France”,”Language”).Value will return “French”. In the previous example, we created a summary data table that now looks like this.

We are now going to create a new tab consisting of all this data, plus some from the parameter sheet, and format the cells based on the total expenditure according to the parameter sheet. This will give this output.
Here’s how.

Option Explicit
Public Sub parameterExample()
    Dim dSets As cDataSets
    Set dSets = New cDataSets
    With dSets
        .create
        ' input summary data
        .init Range("summary!$a$1"), , "summary", , , True
        ' parameter blocks
        .init wholeSheet("parameters"), , "formatparameters", True, "Formats"
        .init wholeSheet("parameters"), , "countryparameters", True, "Country"
    End With
    ' process
    CreateContacts dSets
End Sub
Private Sub CreateContacts(dSets As cDataSets)
    Dim dr As cDataRow, dSet As cDataSet, s As String
    With dSets.DataSet("summary")
        ''... we can also check we have some mandatory fields if necessary
        If .HeadingRow.Validate(True, "customer", "total", "country", "contact") Then
            ' now copy it all to another sheet
            .bigCommit Range("Contact!$a$1"), True
        Else
            Exit Sub
        End If
    End With
    ' we have copied the summary tab to the contacts tab - add to the datasets
    With dSets.init(Range("contact!$a$1"), , "contact", , , True)
        ' and add a couple of columns
        lastCell(.HeadingRow.Where).Offset(, 1).Value = "Language"
        lastCell(.HeadingRow.Where).Offset(, 2).Value = "Dial Code"
        lastCell(.HeadingRow.Where).Offset(, 3).Value = "Comments"
        .bigCommit
    End With
    ' now create the completed dataset with the new columns
    Set dSet = dSets.init(Range("contact!$a$1"), , "contactfinal", , , True)
    ' look up the country parameter data
    With dSets.DataSet("countryparameters")
        For Each dr In dSet.Rows
            dr.Cell("dial code", True).Value = .Cell(dr.Cell("country", True).Value, "dial code").Value
            dr.Cell("language", True).Value = .Cell(dr.Cell("country", True).Value, "language").Value
        Next dr
    End With
    dSet.bigCommit
    ' move in appropriate comment and clone the format
    With dSets.DataSet("formatparameters")
        For Each dr In dSet.Rows
            If dr.Cell("total", True).Value > 100 Then
                s = "big"
            Else
                s = "small"
            End If
            dr.Cell("comments").Commit (.Cell(s, "comments").Value)
            cloneFormat .Cell(s, "format").Where, dr.Where
        Next dr
    End With
End Sub
Private Sub cloneFormat(b As Range, a As Range)
    ' this probably needs additional properties copied over
    With a.Interior
        .Color = b.Interior.Color
    End With
    With a.Font
        .Color = b.Font.Color
        .Size = b.Font.Size
    End With
    With a
        .HorizontalAlignment = b.HorizontalAlignment
        .VerticalAlignment = b.VerticalAlignment
    End With
End Sub

Summary

You can read in more detail about each of the methods and properties implemented here. Aside from the benefits of abstraction, I encourage you to profile your code (you can find an automatic profiler here) to see how much more efficient it can be to manipulate in this way than to read and write cells in a large worksheet. The bigCommit in particular writes a whole sheet in one operation, rather than a cell at a time for example. For now, these functions only apply to Excel values (not formulas). You can download these examples and other stuff here. As always I appreciate your comments, questions or anything else on this forum or here .