### Array formulas and large ranges

What can you learn here ?
• array formulas VBA
• using VBA profiler
• using worksheet profiler

#### Are Array formulas viable with large ranges

On the Excel Blackbelts forum, someone asked a very interesting question about how to analyze relationships between customers and products.

I came up with a solution based on VBA, which was used as the basis for the Create a heatmap in Excel article and related blog post. As usual this code is implemented in the arrayFormulas.xlsm workbook downloadable from Download Complete Projects

### The problem scenario

The question was how to create a heatmap of associated products, starting from a list of pairs of customer and product number. At first this seemed like a job for some combination of array formulas involving countifs, sumproduct, but the user also said that the input data could be 100000 rows with as many as 30 products. This seemed intuitively too big for an array formula solution, so I pressed ahead with a VBA solution using the Data Manipulation Classes to speed up development.

### Why not array formula ?

One of the group members, Peter Bartholomew came up with what I thought was a very elegant solution involving MMULT and COUNTIFS, and it got me wondering whether this could be automated and was it a viable solution to a large data problem. If you just want the answer - then yes, it can easily be automated and no, it's not a viable solution. It's not practical to use array formulas on very large datasets like this to solve this kind of problem. Read on for the analysis. However, I believe that if we can find a way to optimize this, a hybrid of VBA automation and formulae is probably the best solution. See Optimizing the array formula, and for a more viable solution see Leveraging sorted data with Match

#### Generating test data

The first problem was generating enough random valid test data. Here's how. It's not great data, since with a sample this size, most cells will end up about even by chance - but the main thing is it gives me volume.

Public Sub generateTestMatrixData()
Dim r As Range, n As Long, kr As Long, kc As Long, i As Long, j As Long, a() As String
Dim id As Long
kr = 100000
kc = 30
ReDim a(1 To kc)
For i = 1 To kc
a(i) = "a" & Format(i, "0#")
Next i
Set r = Sheets("matrixin").Range("a1")
id = 1
i = 0
While i < kr
aShuffle a
n = Int(kc * Rnd + 1)
id = id + 1
For j = 1 To n
i = i + 1
If i <= kr Then
r.Offset(i).Value = id
r.Offset(i, 1).Value = a(j)
Else
Exit For
End If
Next j
Wend

End Sub
Private Sub aShuffle(p() As String)
Dim i As Long, S As String, j As Long
Dim x As Long
x = UBound(p) - LBound(p) + 1

Randomize
For i = LBound(p) To UBound(p)
j = Int(x * Rnd + 1)
S = p(j)
p(j) = p(i)
p(i) = S
Next i

End Sub

This gives me a 100000 rows that start like this

#### Target solution

In the screenshots following, I'm only looking at the first 10,000 rows for the sake of speed. If you are playing around with this downloaded sheet, I suggest you set calculation to manual. If you run the example it will do that anyway.

The objective is to first transform the list into a list of unique customers down the side, with a list of potential products across the top, and a series of 0/1 in the table body that would indicate whether this product was used by this customer, like this

The final array formula to do this is (entered as a CSE array formula) is (thanks Peter)
=COUNTIFS(MatrixIn!\$A\$2:\$A\$10001, tempmatrix!\$A\$2:\$A\$662, MatrixIn!\$B\$2:\$B\$10001, tempmatrix!\$B\$1:\$AE\$1)

Next step is do matrix multiplication on this to create a table of product by product, which gives us this

using this formula (and thanks again Peter)
=MMULT(TRANSPOSE( tempmatrix!\$B\$2:\$AE\$662),tempmatrix!\$B\$2:\$AE\$662)

Next we'll apply heatmapping ramp formats to that table.

and lets do a couple of heatmap chart as well.

A smaller sample size with more randomness, gives us a nicer chart of course

### The performance analysis

There are 4 main stages to this process
1. Populating - reading and organizing the input data
2. Creating formulas to populate the temporary usage table
3. Creating formulas for the matrix operation
4. Calculating the worksheet.
Using Automatic Profiling tools which you can download on this website I ran various scenarios from 100 to 50,000 rows. Here are the calculation times in seconds. I didn't even bother with 100,000 and you can see why from the graph below.

### Code

Here is the code to automate all this
Sub matrixInExampleX()
Dim maxrows As Long
maxrows = 100
Dim dsIn As cDataSet, dsout As cDataSet
Dim cop As Collection, r As Range, coc As Collection
Dim cht As Chart

Set dsIn = New cDataSet
Set dsout = New cDataSet

' sort it first
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False

Worksheets("matrixin").Range("\$A:\$b").Sort _
Key1:=Worksheets("matrixin").Range("A1"), _
Key1:=Worksheets("matrixin").Range("b1"), _

With dsIn
' create a dset with input data
.populateData wholeSheet("matrixin"), , "matrixin", , , , True, , maxrows
' make collections of customers and products that appear
Set cop = .Column("product").uniqueValues(eSortAscending)
Set coc = .Column("customer").uniqueValues(eSortNone)
' create the output matrix
wholeSheet("matrixoutx").Cells.ClearContents
wholeSheet("tempmatrix").Cells.ClearContents
End With
'get newly created output matrix into a dataset
dsout.populateData createMatrixFormulas(coc, cop, dsIn), _
, "matrixout"

Set dsIn = Nothing
Application.ScreenUpdating = True
Application.Calculate
' create heatmap
Set cht = createSurfaceChart(dsout, "heatmap")
''''Set cht = createSurfaceChart(dsOut, "heatmaptop", xlSurfaceTopView)
Set cht = Nothing
Set dsout = Nothing
Set dsIn = Nothing

End Sub
Private Function createMatrixFormulas(coc As Collection, cop As Collection, _
dsIn As cDataSet) As Range
Dim r As Range, wr As Range

Dim listcustomer As Range   'MatrixIn'!\$A\$2:\$A\$98685
Dim listProduct As Range 'MatrixIn'!\$B\$2:\$B\$98685
Dim tableentries As Range   'Sheet2'!\$B\$2:\$J\$35
Dim matrix As Range
Set wr = firstCell(wholeSheet("tempMatrix").Cells)
wr.Value = wr.Worksheet.Name

Set listProduct = dsIn.Column("product").where
Set listcustomer = dsIn.Column("customer").where
wr.Offset(1).Resize(coc.Count, 1)
wr.Offset(, 1).Resize(1, cop.Count)
1).Resize(coc.Count, cop.Count)

For Each cc In coc
r.Value = cc.Value
Set r = r.Offset(1)
Next cc

For Each cc In cop
r.Value = cc.Value
Set r = r.Offset(, 1)
Next cc
' the temporary array
tableentries.FormulaArray = _
"=COUNTIFS(" & list( _
' the final table
Set matrix = firstCell(wholeSheet _
("matrixOutx")).Resize(cop.Count, cop.Count).Offset(1, 1)
Set r = firstCell(matrix.Offset(-1))
r.Offset(, -1).Value = "matrix"
For Each cc In cop
r.Value = cc.Value
Set r = r.Offset(, 1)
Next cc
Set r = firstCell(matrix.Offset(, -1))
For Each cc In cop
r.Value = cc.Value
Set r = r.Offset(1)
Next cc
matrix.FormulaArray = _
"=MMULT(TRANSPOSE( " & SAd(tableentries) & ")," & _