Page Content

#### What can you learn here?

• Optimization and profiling
• filling ranges using VBA
• using match to partition data Another approach to Array formulas and large Ranges

previously covered a VBA solution for a problem that could have been solved using array formulas. Even after Optimizing the array formula the result still wasn’t acceptable because of  the exponential time to calculate cells using countifs on large scale dataset. This article explores how you can use MATCH() to leverage sorted data to cut calculation times down to a fraction of their natural calculation times. You can download this workbook, arrayformulas.xlsm, from Downloads

#### What’s good enough?

In our example, through Optimizing the array formula we have reduced the calculation times by a factor of 10 for a medium sized (50,000 * 30) test data sample. Althttps://ramblings.mcpher.com/array-formulas-doing-lots-of-things-at-once/array-formulas-and-large-ranges/hough this sounds like a good result, it’s not good enough – since workbook calculation still takes 5 minutes, and run times are exponential rather than linear. So we need to find a way to get this down to under a second, and at the same time get to a predictable run time for bigger data sets. Here are the current calculation results using countifs for the cell call calculations. The problem here is that the bigger the data set, the more countifs has to search multiple times – so how to get a small, predictable cut of the data for each search. Our input data is sorted, and looks like this We are trying to create a matrix of 1/0 with customer down the side and product along the top that looks like the below so that we can perform MMULT() on it to get this If we could encourage an operation in the body of the 1/0 operation to only search in the part of the list for the customer relevant for that row, we could reduce the processing that needs to be done by a large factor. Luckily, MATCH(), has 3 modes And since our input data is sorted we can use a combination of 0 and -1 match options to partition the input data into small chunks. To help with that we are going to use a couple of helper columns to minimize the amount of recalculations – so the 1/0 tabel now looks like this – the first two columns showing the offset row number and length of the input data concerning the customer in column C The formula for cell B3is –`=MATCH(\$C3,OFFSET(MatrixIn!\$A\$2,\$A2+\$B2,0,31,1),0)+\$A2+\$B2-1` This is saying that we need to do an exact match on the customer number in cell C3, but we only need to look at the small strip of the input data that starts after the section belonging to the previous customer is finished. The length of the section is in cell A3, and can be calculated as follows`=MATCH(\$C3+1,OFFSET(MatrixIn!\$A\$2,\$B3,0,31,1),)-1` In this case we are going to count from the place that this customer’s section starts to the place when the data doesn’t belong to this customer any more (\$c3+1). Although most people use 0 as the 3rd parameter to MATCH(), the default behavior is to stop when we match >=  \$c3+1. These 2 simple formulas can be used to slice the input data into small manageable chunks. We now know which row each customer’s data starts on, and how many entries there are.  They can just be filled down since care has been taken with the use of \$ as to what gets incremented in a fill operation. (but note that the formula for cell c2 is a little different to all the others since we don’t have a previous customer to refer to)

#### How to replace the countifs functionality with MATCH.

Using these two helper columns, the body of the table formula is a piece of cake. This is cell D2 `=--ISNUMBER(MATCH(D\$1,OFFSET(MatrixIn!\$A\$2,\$B2,1,\$A2,1),0))` This is checking the section of the input data used by this customer and defined by columns A & B, to see whether there is a match with the product associated with this column, and converting the true/false returned by isnumber() to 1/0. Again, this can be filled across an down for the whole table body.

#### Efficiency

We started this exercise at 1500 seconds for a 50,000 * 30 matrix, then through Optimizing the array formula got that down to 300+ seconds. Incredibly, this approach gets us to .5 seconds – a 600x improvements. When you think of it though, we are searching on average about 15 rows per customer instead of 50,000 so actually it is not really surprising that we see a dramatic improvement. Not only that, but the time to calculate is linear. Doubling to 100,000 records actually takes not much more than double the time at 1.2 seconds Here are the results  #### Automating

We could just leave it at that, but a big data set like would be error prone to setup manually as well as taking a lot of time, so using some of the VBA approach used in Array formulas and large ranges, this is pretty straightforward to automate.Here is the code, which uses Data Manipulation Classes `Sub matrixmatchExample()``    Dim maxrows As Long``    maxrows = 100000``    Dim dsIn As cDataSet, dsout As cDataSet``    Dim cop As Collection, r As Range, coc As Collection``    Dim cht As Chart``    Set dsIn = New cDataSet``    Set dsout = New cDataSet``    ' sort it first``    Application.Calculation = xlCalculationManual``Application.ScreenUpdating = False``    Worksheets("matrixin").Range("\$A:\$b").Sort _``        Key1:=Worksheets("matrixin").Range("A1"), _``        Key2:=Worksheets("matrixin").Range("b1"), _``        header:=xlYes``    With dsIn``        ' create a dset with input data``        .populateData wholeSheet("matrixin"), , "matrixin", , , , True, , maxrows``        ' make collections of customers and products that appear``        Set cop = .Column("product").uniqueValues(eSortAscending)``        Set coc = .Column("customer").uniqueValues(eSortNone)``        ' create the output matrix``        wholeSheet("matrixout").Cells.ClearContents``        wholeSheet("tempmatrix").Cells.ClearContents``    End With``        'get newly created output matrix into a dataset``    dsout.populateData creatematchMatrixFormulas(coc, cop, dsIn), _``            , "matrixout"``    Set dsIn = Nothing``    Application.ScreenUpdating = True``    Application.Calculate``    Application.Calculation = xlCalculationAutomatic``    ' create heatmap``    'executeHeatMapScale dsout``    ''Set cht = createSurfaceChart(dsout, "heatmapx")``    Set cht = createSurfaceChart(dsout, "heatmap", xlSurfaceTopView)``    Set cht = Nothing``    Set dsout = Nothing``    Set dsIn = Nothing``End Sub` and to populate the workbook with the formulae discussed earlier. `Private Function creatematchMatrixFormulas(coc As Collection, cop As Collection, _``                dsIn As cDataSet) As Range``    Dim r As Range, wr As Range, cc As cCell``    Dim listCustomer As Range``    Dim listProduct As Range``    Dim tableentries As Range``    Dim tableHeadingCustomer As Range``    Dim tableheadingProduct As Range``    Dim helperStart As Range``    Dim helperLength As Range``    Dim matrix As Range``    Set wr = firstCell(wholeSheet("tempMatrix").Cells)``    ' these 2 columns will contain the start row and length of``    ' list of products used by each customer``    Set helperStart = wr.Resize(coc.Count, 1).Offset(1, 1)``    helperStart.Offset(-1).Cells(1, 1).Value = "Start"``    Set helperLength = helperStart.Offset(, -1)``    helperLength.Offset(-1).Cells(1, 1).Value = "Length"``    Set tableHeadingCustomer = helperStart.Offset(, 1)``    tableHeadingCustomer.Offset(-1).Cells(1, 1).Value = "Customer"``    Set tableheadingProduct = tableHeadingCustomer.Offset(-1, 1).Resize(1, cop.Count)``    Set tableentries = tableHeadingCustomer.Offset(, _``        1).Resize(coc.Count, cop.Count)``    Set listCustomer = dsIn.Column("customer").Where``    Set listProduct = dsIn.Column("product").Where``    Set r = tableHeadingCustomer.Resize(1, 1)``    For Each cc In coc``        r.Value = cc.Value``        Set r = r.Offset(1)``    Next cc``    Set r = tableheadingProduct.Resize(1, 1)``    For Each cc In cop``        r.Value = cc.Value``        Set r = r.Offset(, 1)``    Next cc``    'START``    ' the first cell is a little different``    '=MATCH(\$C2,MatrixIn!\$A\$2:\$A\$100001,0)-1``    helperStart.Cells(1, 1).Formula = _``        "=Match(" & list( _``            SAd(tableHeadingCustomer.Cells(1, 1), helperStart, , True), _``            SAd(listCustomer, helperStart), 0) & ")-1"``    '=MATCH(\$C3,OFFSET(MatrixIn!\$A\$2,\$B2+\$A2,0,31,1),0)+\$A2+\$B2-1``    With helperStart.Offset(1).Resize(helperStart.Rows.Count - 1)``        .Cells(1, 1).Formula = _``        "=Match(" & list( _``            SAd(tableHeadingCustomer.Cells(2, 1), helperStart, True, True), _``            "offset(" & list( _``                    SAd(listCustomer, helperStart, True), _``                    SAd(helperLength, helperStart, True, True) & "+" & _``                    SAd(helperStart, helperStart, True, True), _``                    0, cop.Count + 1, 1) & ")", 0) & ")+" & _``                    SAd(helperLength, helperStart, True, True) & "+" & _``                    SAd(helperStart, helperStart, True, True) & "-1"``        .FillDown``    End With``    'LENGTH``    '=MATCH(\$C2+1,OFFSET(MatrixIn!\$A\$2,\$B2,0,31,1),)-1``    helperLength.Cells(1, 1).Formula = _``        "=Match(" & list( _``            SAd(tableHeadingCustomer.Cells(1, 1), helperStart, True, True) & "+1", _``            "offset(" & list( _``                    SAd(listCustomer, helperStart, True), _``                    SAd(helperStart, helperStart, True, True), _``                    0, cop.Count + 1, 1) & ")") & ",)-1"``    helperLength.FillDown``    'table body``    '=--ISNUMBER(MATCH(D\$1,OFFSET(MatrixIn!\$A\$2,\$B2,1,\$A2,1),0))``    tableentries.Cells(1, 1).Formula = _``        "=--isnumber(match(" & list( _``            SAd(tableheadingProduct, tableentries, True, , True), _``            "offset(" & list( _``                SAd(listCustomer, tableentries, True), _``                SAd(helperStart, tableentries, True, True), _``                1, SAd(helperLength, tableentries, True, True), 1) & ")", 0 _``            ) & "))"``    tableentries.Resize(1).FillRight``    tableentries.FillDown``    ' the final table``    Set matrix = firstCell(wholeSheet _``        ("matrixOut")).Resize(cop.Count, cop.Count).Offset(1, 1)``    Set r = firstCell(matrix.Offset(-1))``    r.Offset(, -1).Value = "matrix"``    For Each cc In cop``        r.Value = cc.Value``        Set r = r.Offset(, 1)``    Next cc``    Set r = firstCell(matrix.Offset(, -1))``    For Each cc In cop``        r.Value = cc.Value``        Set r = r.Offset(1)``    Next cc``    matrix.FormulaArray = _``     "=MMULT(TRANSPOSE( " & SAd(tableentries) & ")," & _``            SAd(tableentries) & ")"``    Set creatematchMatrixFormulas = matrix.Offset(-1, -1).Resize(cop.Count + 1, _``            cop.Count + 1)``End Function`