I previously covered a VBA solution for a problem that could have been solved using array formulas.
However, it really wasn’t the best solution. What would have been better would have been to use VBA to automate the creation of formulas, and optimize the formulas so that they would be viable with large data sets. This article will walk-through the optimization exercise.
You can see the detail of analysis on the vba solution versus the array formula solution, as well as a description of the problem in Array formulas and large ranges
In the array formula version, the 2 biggest items were populating the .arrayformula property using a countifs array formula, and then the calculation speed. However, it is not necessary to use an array formula for the countifs operation if we know how to fill a range with VBA, which turns out to be a highly efficient operation. Making that small change gives us this result for various test sample sizes using the Automatic Profiling tools on this site. Clearly it makes a difference not only for execution, but also for population if we use an array or not.
Profiling results and changing the formulae
The population of the Data Manipulation Classes and the creation of unique values are trivial, as is the MMULT array operation, so focusing first on the countifs operation we can move from
tableentries.FormulaArray = _
"=COUNTIFS(" & list( _
SAd(tableheadingProduct)) & ")"
tableentries.Resize(1, 1).Formula = _
"=COUNTIFS(" & list( _
SAd(tableHeadingCustomer, tableentries, True, True), _
SAd(tableheadingProduct, tableentries, True, , True) _
) & ")"
this more or less entirely eliminates the setup time for the formula population, which was almost half the overall time – for the 50,000 test data set – 1500 seconds! Note that we have to make sure we have the $ syntax correct so that the fill operation increments for each cell. For more details on the sad() function,m see Formatting range addresses.
Now, we can focus on the calculation times for the workbook with these new versions of the formulae.
Using the Execution Time Logging tools, we find that the recalculation times for the MMULT formulae is trivial and almost linear, and the countifs is exponential.
the data – elapsed time for calculation.
Although the end to end process time is now approximately 350 seconds against the original 3000 seconds, it is clearly still not a viable solution to have these kind of formulas in a workbook. The next step will be to see what we can do instead of countifs. If you want to read more about the detail of this problem, see Array formulas and large ranges.
Why not join our forum, follow the blog or follow me on twitter to ensure you get updates when they are available. You can also submit anything you want considered for publication on this site to our forum.