Why abstract anyway?
In a number of articles on this site and elsewhere, I have covered topics such as Getting data from Outlook Address Lists into Excel , Serializing Excel data for input to any Google visualization, Excel Generate High Quality RoadMaps, and several others, I’ve referred to data manipulation classes as building blocks and used them in the illustrating examples. Abstracting data from its location means that whatever code you write that will operate on some Excel data sets will be much cleaner than if you are constantly referring to particular columns or worksheets. This is especially important if the structure of your worksheet is likely to change, for example by introducing additional columns, or moving things around, which in the end, it probably will if your workbook is to have any kind of longevity. In this article I will demonstrate the use of abstraction classes for the Excel workbook object, all of which you can download for your own use.
What to download
All the examples contain all the classes needed for them to work and all projects can be found here.
If you are interested in further abstracting the data source, for example using SQL to populate your dataset, you may want to take a look at Execute SQL from Excel
What do I mean my abstracting?
The Excel tables capabilities from VBA 2007 certainly start to provide this capability, however in the real world we still have to deal with all kinds of Excel versions, so I have kept away from those for now. Additionally, once you have abstracted your data structure away from Excel, you can start to easily integrate other data sources such as JSON feeds , separate the business logic from the data access, and really start using Excel as a consolidation tools from multiple data sources. Let’s take a look at some typical VBA data manipulation type activities, using these new classes.
Consider the following code to populate a column of worksheet with some calculation based on other columns. You can see that this approach would continue to work even if the underlying worksheets or column positions changes, and there is no need to worry about how many rows in the “Orders” worksheet. It figures it out.
For Each dr In .Rows
dr.Cell(“Total”).Value = dr.Cell(“Quantity”).Value * dr.Cell(“Price”).Value
Next look at this example of copying a few columns from one worksheet to another. This would copy just the four named columns, in the order given, from a dataset to another sheet, starting at the given output range. This can easily be done in just one line of code.
.bigCommit Range(“Summary!$a$1”), True, Array(“customer”, “contact”, “total”, “country”)
You can see from the above examples that data is referred to by its column heading, rather than its range. In a table with both row labels and column labels, it is possible to reference a single cell value, for example.
Msgbox ( “Total orders for Acme is “ & dSet.Cell(“Acme”,”Total”).toString)
How to set up data manipulation class structures.
Naturally we need to set all this up, but first a quick summary of these classes, you will see they are pretty much like their native Excel equivalents, and have a range of methods and properties that are also similar.
|cDataSets||A collection of cDataSet. Somewhat equivalent to an Excel Workbook|
|cDataSet||The representation of structured data in a range, with column headings and optionally, row labels. Somewhat equivalent to an Excel Worksheet, except that there can be serveral cDataSets in a single worksheet|
|cDataRow||A row of data in a cDataSet|
|cDataColumn||A column of data in a cDataSet|
|cHeadingRow||A special case of cDataRow that contains each of the column Headings in the cDataSet|
|cCell||a single data cell.|
Here is a simple main procedure using these, setting up the data structures, and doing something with them. The populate data method along with the start cell of the data is all that is typically required.
Public Sub mainExample()
Dim dSet As cDataSet
Set dSet = New cDataSet
.populateData Range(“Orders!$a$1”), , , , , , True
If .Where Is Nothing Then
MsgBox (“No data to process”)
So lets put all that together, and do something with this new structure This is going to check that all the columns we need exist, do some calculation, write back the result, then copy a selection of those columns to another sheet. Note the use of the .Commit method. This is because the actual Excel sheet is not updated until a .Commit is executed. As well as simplifying rollback, this can make huge performance improvements, which I will demonstrate in a future article, using an automatic VBA profiler.
Private Sub dosomething(dSet As cDataSet) Dim dr As cDataRow With dSet ''... we can also check we have some mandatory fields if necessary If .HeadingRow.Validate(True, "Customer", "Country", "Total", "Quantity", "Price", "Contact") Then ' do some manipulation For Each dr In .Rows dr.Cell("Total").Value = dr.Cell("Quantity").Value * dr.Cell("Price").Value Next dr ' write the total column back, since we've updated it .Column("total").Commit ' now copy a couple of columns to another sheet .bigCommit Range("Summary!$a$1"), True, Array("customer", "contact", "total", "country") End If End With End Sub
Aside from being very concise, this approach will continue to work as the structure and layout of the sheet changes. There is no need for dynamic named ranges or other mechanisms to figure out the size of the table – just a blank row to mark the end of the data.
A parameter sheet
Sometimes it’s useful to have a parameter sheet to provide treatment instructions depending on data contents. In this case we are going to develop a couple of different blocks of parameters, specified as simple Excel tables. The parameter tab contains 2 blocks, ‘Formats’ and ‘Country’. We are going to use our test data to look up values in this parameter table, copy them to the output table, and also set formats based on certain conditions.
In this case, then we are going to create 2 dataSets from one Worksheet, and we will be able access cells by specifying both the column name and row name. For example dSets(“country”).cells(“France”,”Language”).Value will return “French”. In the previous example, we created a summary data table that now looks like this.
We are now going to create a new tab consisting of all this data, plus some from the parameter sheet, and format the cells based on the total expenditure according to the parameter sheet. This will give this output.
Option Explicit Public Sub parameterExample() Dim dSets As cDataSets Set dSets = New cDataSets With dSets .create ' input summary data .init Range("summary!$a$1"), , "summary", , , True ' parameter blocks .init wholeSheet("parameters"), , "formatparameters", True, "Formats" .init wholeSheet("parameters"), , "countryparameters", True, "Country" End With ' process CreateContacts dSets End Sub Private Sub CreateContacts(dSets As cDataSets) Dim dr As cDataRow, dSet As cDataSet, s As String With dSets.DataSet("summary") ''... we can also check we have some mandatory fields if necessary If .HeadingRow.Validate(True, "customer", "total", "country", "contact") Then ' now copy it all to another sheet .bigCommit Range("Contact!$a$1"), True Else Exit Sub End If End With ' we have copied the summary tab to the contacts tab - add to the datasets With dSets.init(Range("contact!$a$1"), , "contact", , , True) ' and add a couple of columns lastCell(.HeadingRow.Where).Offset(, 1).Value = "Language" lastCell(.HeadingRow.Where).Offset(, 2).Value = "Dial Code" lastCell(.HeadingRow.Where).Offset(, 3).Value = "Comments" .bigCommit End With ' now create the completed dataset with the new columns Set dSet = dSets.init(Range("contact!$a$1"), , "contactfinal", , , True) ' look up the country parameter data With dSets.DataSet("countryparameters") For Each dr In dSet.Rows dr.Cell("dial code", True).Value = .Cell(dr.Cell("country", True).Value, "dial code").Value dr.Cell("language", True).Value = .Cell(dr.Cell("country", True).Value, "language").Value Next dr End With dSet.bigCommit ' move in appropriate comment and clone the format With dSets.DataSet("formatparameters") For Each dr In dSet.Rows If dr.Cell("total", True).Value > 100 Then s = "big" Else s = "small" End If dr.Cell("comments").Commit (.Cell(s, "comments").Value) cloneFormat .Cell(s, "format").Where, dr.Where Next dr End With End Sub Private Sub cloneFormat(b As Range, a As Range) ' this probably needs additional properties copied over With a.Interior .Color = b.Interior.Color End With With a.Font .Color = b.Font.Color .Size = b.Font.Size End With With a .HorizontalAlignment = b.HorizontalAlignment .VerticalAlignment = b.VerticalAlignment End With End Sub
You can read in more detail about each of the methods and properties implemented here. Aside from the benefits of abstraction, I encourage you to profile your code (you can find an automatic profiler here) to see how much more efficient it can be to manipulate in this way than to read and write cells in a large worksheet. The bigCommit in particular writes a whole sheet in one operation, rather than a cell at a time for example. For now, these functions only apply to Excel values (not formulas). You can download these examples and other stuff here. As always I appreciate your comments, questions or anything else on this forum or here .