Manipulating strings in Excel.
As you will know, Excel very quickly runs of out steam as you try to do any kind of string manipulation with cells. Even very basic requirements cause you reach for VBA, but then you end up with very complex functions highly specific to each string processing task. Many tools nowadays, especially in the Unix/Linux world allow you to use “Regular Expressions” right as part of their basic functionality. This section will show you how to implement Regular Expressions (RegEX) that you can use both in your workbooks and in your VBA procedures to harness their string processing capabilities, as well as a way to start a RegEx library that you can build up with common tasks. The examples and procedures can all be downloaded. Note that there are many regexes already written on sites such as, http://regexlib.com . Before writing your own, I recommend you check to see if anyone has already done something like it. Many of the examples here come from such libraries.
What is a Regular Expression.
These are commonly called RegEX, which is how I will refer to them from now on. A regEX is a pattern that can be applied to a string, either to test or transform a string, depending on whether there is a match according to rules contained in that pattern. As with most things that come up through the Unix route, RegEX syntax is both powerful, and concise. This section is not supposed to be a tutorial on RegEx- there are thousands of them available on the internet. Just google "regular expression tutorial". Some of these examples have come from a great regex resource, http://regexlib.com/
Here's a few examples
Numeric [\0-9]
Numeric exactly 3 characters long [\0-9]{3}
Non Numeric [^\0-9]
So applying 'Numeric' to the string abc123 yields 123. Contrast that with a generalized formula in excel to do that.
RegEX can get also very complex - but the are expressing very complex validation concepts that would be complex in VBA and unimaginably so with Excel functions. Here's one that describes a valid UK post code, from regexlib.com, as are a few in this section.
(((^[BEGLMNS][1-9]\d?)|(^W[2-9])|(^(A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]| D[ADEGHLNTY]|E[HNX]|F[KY]| G[LUY]|H[ADGPRSUX]|I[GMPV]| JE|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]| O[LX]|P[AEHLOR]|R[GHM]|S[AEGKL-PRSTWY]|T[ADFNQRSW]|UB|W[ADFNRSV] |YO|ZE)\d\d?)|(^W1[A-HJKSTUW0-9])|(((^WC[1-2])|(^EC[1-4])|(^SW1))[ABEHMNPRVWXY]))(\s*)?([0-9][ABD-HJLNP-UW-Z]{2}))|(^GIR\s?0AA)Aside from extracting string components, you can also use regEX to transform a string.
This one will reduce multiple spaces inside a string to single spaces, when used in conjunction with the procedures discussed on this page
(\S+)\x20{2,}(?=\S+)and this one will format an ip number.
^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\. (\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$Although all that looks rather complex, there are millions of already written regEX on the web you can find for almost any purpose. The regEX implementation you can download here also provides the concept of a 'preset'. The idea here is that you add regEX patterns to your own personal library and refer to them by name rather than needing to reference the patterns themselves. This makes for easy maintenance and less typing errors.
So for example, you can setup a preset which translates to a regEX pattern, but in your formula you refer to a preset name. You'll see more of that later, but for example, you could set up a preset called IPNUMBER. The formula you would use in your worksheet would be
=rxString ( "ipnumber", A1) instead of
=rxString ( "^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$", A1) RegEX implementation.
There are 3 functions that a generally available in implementations of RegEX
- Test - Check if a string matches a RegEx Pattern
- Replace - Replace any matching string portion(s) with something else
- Execute - Apply a RegEx pattern to a string and return the result(s) so you can manipulate them.
Excel does not have an implementation of RegEX, but VBscript does, and that is available to VBA developers simply by including a reference to it. If you are writing VBA code then you can build that right into your procedures. However, you cannot directly access it as an Excel formula so in this case you need to wrap them in some VBA user defined functions to make them accessible. In this case, I have created both a wrapper class (cRegEXLib) and some wrapper functions that access that class. Details are at the end of this article and are available as a download.
examples
Note that the full regEX pattern and formulas may not be shown below in the interests of space, but you will find them in the example sheet in the download section.
These are the formulas for the above
Public Function rxTest(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean
Public Function rxTest(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean
Dim rx As cregXLib
' create a new regx
Set rx = rxMakeRxLib(sName)
rx.ignorecase = ignorecase
' extract the string that matches the requested pattern
rxTest = rx.getTest(s)
End Function
Public Function rxString(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean
Public Function rxString(sName As String, s As String, Optional ignorecase As Boolean = True) As String
Dim rx As cregXLib
' create a new regx
Set rx = rxMakeRxLib(sName)
rx.ignorecase = ignorecase
' extract the string that matches the requested pattern
rxString = rx.getString(s)
End Function
Public Function rxReplace(sName As String, sFrom As String, sTo as string Optional ignorecase As Boolean = True) As Boolean
Public Function rxReplace(sName As String, sFrom As String, sTo As String, Optional ignorecase As Boolean = True) As String
Dim rx As cregXLib
' create a new regx
Set rx = rxMakeRxLib(sName)
rx.ignorecase = ignorecase
' replace the string that matches the requested pattern
rxReplace = rx.getReplace(sFrom, sTo)
End Function
Public Function rxPattern(sName As String) As string
Public Function rxPattern(sName As String) As String
Dim rx As cregXLib
' create a new regx
Set rx = rxMakeRxLib(sName)
' just returnthe pattern
rxPattern = rx.Pattern
End Function
cRegEXLib class
Option Explicit' for building up a library of useful regex expressionsPrivate pName As StringPrivate pRegex As RegExpPublic Property Get Pattern() As String Pattern = pRegex.PatternEnd PropertyPublic Property Let Pattern(p As String) pRegex.Pattern = pEnd PropertyPublic Property Get Name() As String Name = pNameEnd PropertyPublic Property Let Name(p As String) pName = pEnd PropertyPublic Property Get ignorecase() As Boolean ignorecase = pRegex.ignorecaseEnd PropertyPublic Property Let ignorecase(p As Boolean) pRegex.ignorecase = pEnd PropertyPublic Property Get rGlobal() As Boolean rGlobal = pRegex.GlobalEnd PropertyPublic Property Let rGlobal(p As Boolean) pRegex.Global = pEnd PropertyPublic Sub init(sName As String, _ Optional spat As String = "", _ Optional bIgnoreSpaces As Boolean = True, _ Optional bIgnoreCase As Boolean = True, _ Optional bGlobal As Boolean = True) Dim s As String s = spat If bIgnoreSpaces Then s = Replace(s, " ", "") End If Set pRegex = New RegExp With pRegex .Pattern = s .ignorecase = bIgnoreCase .Global = bGlobal End With pName = sNameEnd SubPublic Function getString(sFrom As String) As String Dim mc As matchcollection, am As Match, rs As String Set mc = pRegex.Execute(sFrom) rs = "" For Each am In mc rs = rs & am.Value Next am getString = rsEnd FunctionFunction getReplace(sFrom As String, sTo As String) As String getReplace = pRegex.Replace(sFrom, sTo)End FunctionFunction getTest(sFrom As String) As Boolean getTest = pRegex.Test(sFrom)End FunctionPresets
Function rxMakeRxLib(sName As String) As cregXLib Dim rx As cregXLib, s As String Set rx = New cregXLib ' normally sname points to a preselected regEX ' if not known, silently assume its a regex pattern s = Replace(UCase(sName), " ", "") Select Case s Case "POSTALCODEUK" rx.init s, _ "(((^[BEGLMNS][1-9]\d?) | (^W[2-9] ) | ( ^( A[BL] | B[ABDHLNRST] | C[ABFHMORTVW] | D[ADEGHLNTY] | E[HNX] | F[KY] | G[LUY] | H[ADGPRSUX] | I[GMPV] |" & _ " JE | K[ATWY] | L[ADELNSU] | M[EKL] | N[EGNPRW] | O[LX] | P[AEHLOR] | R[GHM] | S[AEGKL-PRSTWY] | T[ADFNQRSW] | UB | W[ADFNRSV] | YO | ZE ) \d\d?) |" & _ " (^W1[A-HJKSTUW0-9]) | (( (^WC[1-2]) | (^EC[1-4]) | (^SW1) ) [ABEHMNPRVWXY] ) ) (\s*)? ([0-9][ABD-HJLNP-UW-Z]{2})) | (^GIR\s?0AA)" Case "POSTALCODESPAIN" rx.init s, _ "^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$" Case "PHONENUMBERUS" rx.init s, _ "^\(?(?<AreaCode>[2-9]\d{2})(\)?)(-|.|\s)?(?<Prefix>[1-9]\d{2})(-|.|\s)?(?<Suffix>\d{4})$" Case "CREDITCARD" 'amex/visa/mastercard rx.init s, _ "^((4\d{3})|(5[1-5]\d{2}))(-?|\040?)(\d{4}(-?|\040?)){3}|^(3[4,7]\d{2})(-?|\040?)\d{6}(-?|\040?)\d{5}" Case "NUMERIC" rx.init s, _ "[\0-9]" Case "ALPHABETIC" rx.init s, _ "[\a-zA-Z]" Case "NONNUMERIC" rx.init s, _ "[^\0-9]" Case "IPADDRESS" rx.init s, _ "^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" Case "SINGLESPACE" ' should take a replace value of "$1 " rx.init s, _ "(\S+)\x20{2,}(?=\S+)" Case "EMAIL" rx.init s, _ "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$" Case "EMAILINSIDE" rx.init s, _ "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b" Case Else rx.init "Adhoc", sName End Select Set rxMakeRxLib = rxEnd Function



