You use a class to describe an object. An object is a collection of whatever you like, and encapsulates both the values and references (properties) and actions (methods) associated with that object. You need to understand classes and objects to follow along with most of the examples in this site, and this is not a tutorial. If you are a ‘class newbie’, you may want to first read Getting Started with Classes. In this section we are however going to build on a basic knowledge of classes to generate some classes that will be used throughout some of our example projects.
Classes to easily read a set of data with headings
Option Explicit Public Sub main() Dim dSet As cDataSet ' this will contain all our data Set dSet = New cDataSet ' just need to provide the range where data headings are With dSet .populateData Range("InputData!$a$1:$e$1") If .Where Is Nothing Then MsgBox ("No data to process") Else ''... we can also check we have some mandatory fields if necessary If .HeadingRow.Validate(True, "Activate", "Deactivate") Then ' do something with the data Call doSomething(dSet) End If End If End With End Sub
- You will access a data item through its heading name, rather than a specific spreadsheet column or row meaning that your underlying code can be independent of the physical spreadsheet layout.
- That item will always know which cell it came from in case you need to update it
- Accessing data in this object will be more efficient than continually going back and forwards to the spreadsheet cells, which can be quite slow
- Since this is a copy of the data, you don’t need to write back to the cells until you are finished – meaning that an undo capability is easy to implement
- You don’t need to write that boring code again to read through rows in a spreadsheet
- If the physical layout of the data changes, inserted or re-arranged columns you don’t need to change anything
- We can use this example to dive into classes…
Why?
Performance
Item | Meaning |
cDataSet | The structure that will hold all the data items in the spreadsheet range defined by the heading range in dset.PopulateData |
.populateData | set up the cDataSet structure and copy all the data from the spreadsheet cells |
.Cell | A cCell |
.Row | A row of cells (cDataRow), or a row number, depending on context |
.Column | A Column of cells (cDataColumn), or a column number, depending on context |
.Value | The value contained in the referenced cCell |
.Refresh | Refresh the .Cell or range of .Cells with latest data from spreadsheet |
.Commit | Write back the .Value to the appropriate spreadsheet cell or range of cells |
.jObect | a cJobject structure that contains a hierarchical version of the dataset that can be easily serialized into jSon |
The results of the stress test tells us something not only about these data manipulation classes but also about Excel behavior itself. A series of operations were repeated on a test data set of a few thousand rows of data. We want to compare the performance of those procedures that have ‘spreadsheet’ in the name- which read and write directly to the spreadsheet, versus the others which instead use these classes.
Conclusions
- cDataSet reading operations are 5 times faster than Excel, except where we address cells by their column name rather than their column number, where excel is 5 times faster.
- cDataSet writing operations range from 150-5000 times faster than Excel, depending on whether we access by column, row or name
- cDataSet commit operations are more or less the same as writing to the spreadsheet directly (since they have to write to the spreadsheet anyway), although interestingly, a mass commit of the entire dataset is actually 25% faster than excel directly. This is probably because the ranges are directly known and do not need to be incremented as per the direct write to excel.
When you turn off automatic calculation and screen updating in excel, screen updating makes almost no difference, but automatic calculation makes a big difference.
- cDataSet writing operations are still around 30 times faster than Excel, depending on whether we access by column, row or name
- cDataSet commit operations are very marginally slower than Excel directly but only a few %.