Optimizing the array formula

What can you learn here ?
  • Optimization and profiling
  • filling ranges using VBA
  • array formulas

Array formulas and large ranges

However, it really wasn't the best solution. What would have been better would have been to use VBA to automate the creation of formulas, and optimize the formulas so that they would be viable with large data sets. This article will walkthrough the optimization excercise.

The analysis

You can see the detail of analysis on the vba solution versus the array formula solution, as well as a description of the problem in Array formulas and large ranges

In the array formula version, the 2 biggest items were populating the .arrayformula property using a countifs array formula, and then the calculation speed. However, it is not necessary to use an array formula for the countifs operation if we know how to fill a range with VBA, which turns out to be a highly efficient operation. Making that small change gives us this result for various test sample sizes using the Automatic Profiling tools on this site. Clearly it makes a difference not only for execution, but also for population if we use an array or not. 

the data...

Profiling results and changing the formulae

The population of the Data Manipulation Classes and the creation of unique values are trivial, as is the MMULT array operation, so focusing first on the countifs operation we can move from 

    tableentries.FormulaArray = _
        "=COUNTIFS(" & list( _
            SAd(listcustomer), _
            SAd(tableHeadingCustomer), _
            SAd(listProduct), _
            SAd(tableheadingProduct)) & ")"

to this
    tableentries.Resize(1, 1).Formula = _
        "=COUNTIFS(" & list( _
            SAd(listcustomer), _
            SAd(tableHeadingCustomer, tableentries, True, True), _
            SAd(listProduct), _
            SAd(tableheadingProduct, tableentries, True, , True) _
            ) & ")"

this more or less entirely eliminates the setup time for the formula population, which was almost half the overall time - for the 50,000 test data set - 1500 seconds! Note that we have to make sure we have the $ syntax correct so that the fill operation increments for each cell. For more details on the sad() function,m see Formatting range addresses.

Now, we can focus on the calculation times for the workbook with these new versions of the formulae. 

Calculation profiling

Using the Execution Time Logging tools, we find that the recalculation times for the MMULT formulae is trivial and almost linear, and the countifs is exponential.

the data - elapsed time for calculation.