Strings and garbage

VBA has a garbage collector that runs from time to time in order to free up memory that is no longer being used. in Objects and the garbage collector I showed how to teardown objects so they would become eligible for garbage collection.

There's another problem related to strings.  When you concatenate a string, it creates a new copy of the concatenated versions. 

Consider this loop

    s = vbnullstring
    for  each c in collect
        s = s & c.text
    next c
    debug.print len(s) , s

Each time that s is appended to, a new version of s gets created by copying the old s and and the next text to a new area of memory. That leaves the old s hanging around, waiting for the garbage collector to arrive to free up the memory. So 2 bad things are happening
  • We keep re-copying an every increasing amount of text 
  • The garbage collector interrupts everything to do its work according to some rule, and/or we gradually use up memory.

 

 
This chart shows 100 iterations of a process similar to the above. See how each iteration takes longer than the previous one. I ran this a few times and always got the same kind of pattern. I suspect the little spikes are the garbage collection kicking in, since they are followed by a short relief period. 

This can have a very significant effect on run times if the loops are large and the strings long.

The StringBuilder class

In VB there is a StringBuilder class which allegedly deals with this problem. It doesn't exist in VBA, so I created something similar with i've called the cStringChunker. Here is the result of the two methods together, with the StringChunker in red.


 The cStringChunker timings - shown in red-  barely register on this chart against the regular concatenation method.

Aside from being orders of magnitude faster , you can see that cStringChunker holds pretty consistent performance regardless of the string being concatenated. 

You'll see just a couple of red blips - this is when the cache it uses needs extending, and its possible that the garbage collector pays a visit.




How does it work

To make this possible, you have to be able to add characters in place to a string to encourage it not to replicate. I found the key to this in this wikibook article, from which I quote this.....

You can also use Mid$ on the left hand side of an assignment:

 Dim s As String
 s = "abcdef"
 Debug.Print s
 Mid$(s, 2, 3) = "xxx"
 Debug.Print s
 Mid$(s, 3, 1) = "abcdefgh"
 Debug.Print s
 Mid$(s, 2, 3) = "y"
 Debug.Print s
 
 abcdef
 axxxef
 axaxef
 ayaxef

Notice that when Mid$ is on the left it doesn't change the total length of the string it just replaces the specified number of characters. If the right hand side specifies fewer characters than are asked for, then only that number of characters is replaced; if it specifies more, only the number asked for is used.


So Mid() can be on the left. And it replaces characters in place inside a string. Therefore if you always have a string big enough to take the concatenation required, you will never need to make a new copy of it. Here is the code rewritten using the 
cStringChunker which is based on that principle.


    dim chunk as cStringChunker
    set chunk = new cStringChunker

    for  each c in collect
        chunk.add c.text
    next c
    debug.print chunk.size, chunk.content

A cache is maintained into which the added string is inserted. If the cache needs to be extended, it extends first by a modest amount, then increasing amounts depending on the current content size. That means it only very occassionally needs to go off and extend itself, avoiding the garbage collector's attention and unecessary copying, yet keeping the allocation modest for smaller string operations. 

Other useful things

With such a class we can  simplify some common string things, taking advantage of the efficiency of 'not having to make a copy'. Let's take this example - a common problem - making a comma separated list

One way to do it, would be this, but  as we've already established, it would keep generating bigger and bigger versions of s. We also have that pesky IF. And what if the first element was blank ? the result would have too few commas.

    dim s as string
    s = vbnullstring

    for  each c in collect
        if s <> vbnullstring then s = s & ","
        s = s & c.text
    next c
    debug.print s, len(s)

cStringChunker has a .chop(n) method, where n (default 1) characters will be discarded from the end of the string. Note that it also returns itself from the add and chop operations so that multiple methods can be added together.

    dim chunk as cStringChunker
    set chunk = new cStringChunker

    for  each c in collect
        chunk.add (c.text).add(",")
    next c
    debug.print chunk.size, chunk.chop.content

The cStringChunker class

Code is on github, and below. (to find out how to embed code in your blog like this, see Step by Step Gas Publisher)

cStringChunker class

Methods and properties

To find out how to automatically create documentation like this from your VBA modules see Automatic documentation

cStringChunker class

For more on this topic , see the excel liberation blog

For help and more information join our forumfollow the blogfollow me on twitter

You want to learn Google Apps Script?

Learning Apps Script, (and transitioning from VBA) are covered comprehensively in my my book, Going Gas - from VBA to Apps script, All formats are available from O'ReillyAmazon and all good bookshops. You can also read a preview on O'Reilly

If you prefer Video style learning I also have two courses available. also published by O'Reilly.
Google Apps Script for Developers and Google Apps Script for Beginners.



Comments