You use a class to describe an object. An object is a collection of whatever you like, and encapsulates both the values and references (properties) and actions (methods) associated with that object. You need to understand classes and objects to follow along with most of the examples in this site, and this is not a tutorial. If you are a ‘class newbie’, you may want to first read Getting Started with Classes. In this section we are however going to build on a basic knowledge of classes to generate some classes that will be used throughout some of our example projects.
Classes to easily read a set of data with headings
One of the most common things we need to do when creating substantial VBA procedures is to go off and read a set of data, and process it in some way. Usually this data is structured as a table type format with headings in the first row, and the data ends at the last used row. This kind of repetitive coding , although easy, usually leads to very specific range and column references being built in to your code, and once you’ve done that, the next time you have to do something like it, you just start again. In this section we are going to provide a template to hold such data in a set of objects that are easily accessible and tightly coupled. Lets first look at what a typical VBA program would look like when we have implemented these classes. dSet.populateData(Range) is all that is required to set up your entire data structure, assuming that you have the standard table format with headings and rows of data.
Option Explicit Public Sub main() Dim dSet As cDataSet ' this will contain all our data Set dSet = New cDataSet ' just need to provide the range where data headings are With dSet .populateData Range("InputData!$a$1:$e$1") If .Where Is Nothing Then MsgBox ("No data to process") Else ''... we can also check we have some mandatory fields if necessary If .HeadingRow.Validate(True, "Activate", "Deactivate") Then ' do something with the data Call doSomething(dSet) End If End If End With End Sub
Given a place to find the headings, we are going to end up with all our data in a cDataSet object, which contains information about where it came from, what the headings are and so. The main benefits of this are
- You will access a data item through its heading name, rather than a specific spreadsheet column or row meaning that your underlying code can be independent of the physical spreadsheet layout.
- That item will always know which cell it came from in case you need to update it
- Accessing data in this object will be more efficient than continually going back and forwards to the spreadsheet cells, which can be quite slow
- Since this is a copy of the data, you don’t need to write back to the cells until you are finished – meaning that an undo capability is easy to implement
- You don’t need to write that boring code again to read through rows in a spreadsheet
- If the physical layout of the data changes, inserted or re-arranged columns you don’t need to change anything
- We can use this example to dive into classes…
As a matter of fact what we need to construct is a ‘mirror’ of many of the cell addressing capabilities that are part of the Excel range object, including Cell, Row, Column etc. Aside from the capabilities mentioned previously – you can separate the data location from what you want to do with it – this is a great example to learn about classes, and we will use all this in other examples to avoid getting sidetracked into what our input data actually needs to look like.
A concern would be that by adding an extra layer, we night be introducing a performance problem. As a matter of fact we will be improving performance in some cases, but to check that out, lets go through a stress test to exercise our class. Better to do that now, and avoid rather than react. If you haven’t read the section on performance testing now would be a good time. This gives you some downloadable tools to automatically measuring the time spent in the various procedures of your VBA procedures. The stress test on our Data Manipulation classes gives this results using the automatic profiler. The stress test is going to compare using these data manipulation classes, rather than access the spreadsheet cells directly, so we can see the performance benefit or hit in particular operations. Before that though, a quick overview on the most commonly used terms in these data manipulation classes
|The structure that will hold all the data items in the spreadsheet range defined by the heading range in dset.PopulateData
|set up the cDataSet structure and copy all the data from the spreadsheet cells
|A row of cells (cDataRow), or a row number, depending on context
|A Column of cells (cDataColumn), or a column number, depending on context
|The value contained in the referenced cCell
|Refresh the .Cell or range of .Cells with latest data from spreadsheet
|Write back the .Value to the appropriate spreadsheet cell or range of cells
|a cJobject structure that contains a hierarchical version of the dataset that can be easily serialized into jSon
The results of the stress test tells us something not only about these data manipulation classes but also about Excel behavior itself. A series of operations were repeated on a test data set of a few thousand rows of data. We want to compare the performance of those procedures that have ‘spreadsheet’ in the name- which read and write directly to the spreadsheet, versus the others which instead use these classes.
Firstly In normal Excel mode, with autoscreenupdate turned on, and calculation set to automatic
- cDataSet reading operations are 5 times faster than Excel, except where we address cells by their column name rather than their column number, where excel is 5 times faster.
- cDataSet writing operations range from 150-5000 times faster than Excel, depending on whether we access by column, row or name
- cDataSet commit operations are more or less the same as writing to the spreadsheet directly (since they have to write to the spreadsheet anyway), although interestingly, a mass commit of the entire dataset is actually 25% faster than excel directly. This is probably because the ranges are directly known and do not need to be incremented as per the direct write to excel.
When you turn off automatic calculation and screen updating in excel, screen updating makes almost no difference, but automatic calculation makes a big difference.
- cDataSet writing operations are still around 30 times faster than Excel, depending on whether we access by column, row or name
- cDataSet commit operations are very marginally slower than Excel directly but only a few %.
All this tells us that we would see very good performance gains using cDataSet rather than using Excel directly, in addition to the benefits mentioned earlier. A downloadable version of this stress test, is available on the downloads page. The next sections will dive into each of classes needed for this and how to use them.