#### What can you learn here?

- array formulas VBA
- using VBA profiler
- using worksheet profiler

#### Are Array formulas viable with large ranges

On the Excel Blackbelts forum, someone asked a very interesting question about how to analyze relationships between customers and products. I came up with a solution based on VBA, which was used as the basis for the Create a heatmap in Excel article and related blog post. As usual this code is implemented in the arrayFormulas.xlsm workbook downloadable from Download Complete Projects

#### The problem scenario

The question was how to create a heatmap of associated products, starting from a list of pairs of customer and product number. At first this seemed like a job for some combination of array formulas involving countifs, sumproduct, but the user also said that the input data could be 100000 rows with as many as 30 products. This seemed intuitively too big for an array formula solution, so I pressed ahead with a VBA solution using the Data Manipulation Classes to speed up development.

#### Why not array formula ?

One of the group members, Peter Bartholomew came up with what I thought was a very elegant solution involving MMULT and COUNTIFS, and it got me wondering whether this could be automated and was it a viable solution to a large data problem. If you just want the answer – then yes, it can easily be automated and no, it’s not a viable solution. It’s not practical to use array formulas on very large datasets like this to solve this kind of problem. Read on for the analysis. However, I believe that if we can find a way to optimize this, a hybrid of VBA automation and formulae is probably the best solution. See Optimizing the array formula, and for a more viable solution see Leveraging sorted data with Match

#### Generating test data

The first problem was generating enough random valid test data. Here’s how. It’s not great data, since with a sample this size, most cells will end up about even by chance – but the main thing is it gives me volume

`Public Sub generateTestMatrixData()`

` Dim r As Range, n As Long, kr As Long, kc As Long, i As Long, j As Long, a() As String`

` Dim id As Long`

` kr = 100000`

` kc = 30`

` ReDim a(1 To kc)`

` For i = 1 To kc`

` a(i) = "a" & Format(i, "0#")`

` Next i`

` Set r = Sheets("matrixin").Range("a1")`

` id = 1`

` i = 0`

` While i < kr`

` aShuffle a`

` n = Int(kc * Rnd + 1)`

` id = id + 1`

` For j = 1 To n`

` i = i + 1`

` If i <= kr Then`

` r.Offset(i).Value = id`

` r.Offset(i, 1).Value = a(j)`

` Else`

` Exit For`

` End If`

` Next j`

` Wend`

`End Sub`

`Private Sub aShuffle(p() As String)`

` Dim i As Long, S As String, j As Long`

` Dim x As Long`

` x = UBound(p) - LBound(p) + 1`

` Randomize`

` For i = LBound(p) To UBound(p)`

` j = Int(x * Rnd + 1)`

` S = p(j)`

` p(j) = p(i)`

` p(i) = S`

` Next i`

`End Sub`

This gives me a 100000 rows that start like this

#### Target solution

In the screenshots following, I’m only looking at the first 10,000 rows for the sake of speed. If you are playing around with this downloaded sheet, I suggest you set calculation to manual. If you run the example it will do that anyway. The objective is to first transform the list into a list of unique customers down the side, with a list of potential products across the top, and a series of 0/1 in the table body that would indicate whether this product was used by this customer, like this The final array formula to do this is (entered as a CSE array formula) is (thanks Peter) `=COUNTIFS(MatrixIn!$A$2:$A$10001, tempmatrix!$A$2:$A$662, MatrixIn!$B$2:$B$10001, tempmatrix!$B$1:$AE$1)`

Next step is do matrix multiplication on this to create a table of product by product, which gives us this using this formula (and thanks again Peter) `=MMULT(TRANSPOSE( tempmatrix!$B$2:$AE$662),tempmatrix!$B$2:$AE$662)`

#### Adding heatmap

Next we’ll apply heatmapping ramp formats to that table. and lets do a couple of heatmap chart as well. A smaller sample size with more randomness, gives us a nicer chart of course

#### The performance analysis

There are 4 main stages to this process

- Populating – reading and organizing the input data
- Creating formulas to populate the temporary usage table
- Creating formulas for the matrix operation
- Calculating the worksheet.

Using Automatic Profiling tools which you can download on this website I ran various scenarios from 100 to 50,000 rows. Here are the calculation times in seconds. I didn’t even bother with 100,000 and you can see why from the graph below.

#### Code

Here is the code to automate all this `Sub matrixInExampleX()`

` Dim maxrows As Long`

` maxrows = 100`

` Dim dsIn As cDataSet, dsout As cDataSet`

` Dim cop As Collection, r As Range, coc As Collection`

` Dim cht As Chart`

` Set dsIn = New cDataSet`

` Set dsout = New cDataSet`

` ' sort it first`

` Application.Calculation = xlCalculationManual`

` Application.ScreenUpdating = False`

`worksheets("matrixin").Range("$A:$b").Sort _`

`Key1:=Worksheets("matrixin").Range("A1"), _`

`Key1:=Worksheets("matrixin").Range("b1"), _`

` Header:=xlYes`

` With dsIn`

` ' create a dset with input data`

` .populateData wholeSheet("matrixin"), , "matrixin", , , , True, , maxrows`

` ' make collections of customers and products that appear`

` Set cop = .Column("product").uniqueValues(eSortAscending)`

` Set coc = .Column("customer").uniqueValues(eSortNone)`

` ' create the output matrix`

`wholeSheet("matrixoutx").Cells.ClearContents`

`wholeSheet("tempmatrix").Cells.ClearContents`

` End With`

` 'get newly created output matrix into a dataset`

` dsout.populateData createMatrixFormulas(coc, cop, dsIn), _`

` , "matrixout"`

` Set dsIn = Nothing`

` Application.ScreenUpdating = True`

` Application.Calculate`

` ' create heatmap`

` Set cht = createSurfaceChart(dsout, "heatmap")`

` ''''Set cht = createSurfaceChart(dsOut, "heatmaptop", xlSurfaceTopView)`

` Set cht = Nothing`

` Set dsout = Nothing`

` Set dsIn = Nothing`

`End Sub`

`Private Function createMatrixFormulas(coc As Collection, cop As Collection, _`

` dsIn As cDataSet) As Range`

` Dim r As Range, wr As Range`

` Dim listcustomer As Range 'MatrixIn'!$A$2:$A$98685`

` Dim listProduct As Range 'MatrixIn'!$B$2:$B$98685`

` Dim tableentries As Range 'Sheet2'!$B$2:$J$35`

` Dim tableHeadingCustomer As Range 'Sheet2'!$A$2:$A$35`

` Dim tableheadingProduct As Range 'Sheet2'!$B$1:$J$1`

` Dim matrix As Range`

` Set wr = firstCell(wholeSheet("tempMatrix").Cells)`

` wr.Value = wr.Worksheet.Name`

` Set listProduct = dsIn.Column("product").where`

` Set listcustomer = dsIn.Column("customer").where`

` Set tableHeadingCustomer = _`

` wr.Offset(1).Resize(coc.Count, 1)`

` Set tableheadingProduct = _`

` wr.Offset(, 1).Resize(1, cop.Count)`

` Set tableentries = tableHeadingCustomer.Offset(, _`

` 1).Resize(coc.Count, cop.Count)`

` Set r = tableHeadingCustomer.Resize(1, 1)`

` For Each cc In coc`

` r.Value = cc.Value`

` Set r = r.Offset(1)`

` Next cc`

` Set r = tableheadingProduct.Resize(1, 1)`

` For Each cc In cop`

` r.Value = cc.Value`

` Set r = r.Offset(, 1)`

` Next cc`

` ' the temporary array`

` tableentries.FormulaArray = _`

` "=COUNTIFS(" & list( _`

` SAd(listcustomer), _`

` SAd(tableHeadingCustomer), _`

` SAd(listProduct), _`

` SAd(tableheadingProduct)) & ")"`

` ' the final table`

` Set matrix = firstCell(wholeSheet _`

` ("matrixOutx")).Resize(cop.Count, cop.Count).Offset(1, 1)`

` Set r = firstCell(matrix.Offset(-1))`

` r.Offset(, -1).Value = "matrix"`

` For Each cc In cop`

` r.Value = cc.Value`

` Set r = r.Offset(, 1)`

` Next cc`

` Set r = firstCell(matrix.Offset(, -1))`

` For Each cc In cop`

` r.Value = cc.Value`

` Set r = r.Offset(1)`

` Next cc`

` matrix.FormulaArray = _`

` "=MMULT(TRANSPOSE( " & SAd(tableentries) & ")," & _`

` SAd(tableentries) & ")"`

` Set createMatrixFormulas = wr.Resize(cop.Count + 1, _`

` cop.Count + 1)`

`End Function`

#### Conclusion

Whereas you can create a very elegant solution using array formulas, they quickly run out of steam. In the case of this example, the exponential nature of the execution time means you need to consider very carefully the final data set size when implementing. This article referenced a number of other items on this site such as Data Manipulation Classes , Automatic Profiling and Create a heatmap in Excel. You can download everything at Download Complete Projects. See also Optimizing the array formula for how to mitigate some of these problems. This example is contained in the arrayFormulas.xlsm workbook. Why not join our forum, follow the blog or follow me on twitter to ensure you get updates when they are available. You can also submit anything you want considered for publication on this site to our forum.