Manipulating strings in Excel.
As you will know, Excel very quickly runs of out steam as you try to do any kind of string manipulation with cells. Even very basic requirements cause you reach for VBA, but then you end up with very complex functions highly specific to each string processing task. Many tools nowadays, especially in the Unix/Linux world allow you to use “Regular Expressions” right as part of their basic functionality. This section will show you how to implement Regular Expressions (RegEX) that you can use both in your workbooks and in your VBA procedures to harness their string processing capabilities, as well as a way to start a RegEx library that you can build up with common tasks. The examples and procedures can all be downloaded. Note that there are many regexes already written on sites such as, http://regexlib.com . Before writing your own, I recommend you check to see if anyone has already done something like it. Many of the examples here come from such libraries. This section is a wrapper class for regexes to make them easier to use from Excel, but also as a library implementation to give complex regexes a name so you can reuse them without recreating the contents.
If’ you’d rather use some of the more complex features of regexes and use them directly in Excel, you’ll find some fairly extreme regexes in pinyin conversions, and an example of using simple regexes to do complicated tasks here Regex, scraping and visualizing
What is a Regular Expression.
These are commonly called RegEX, which is how I will refer to them from now on. A regEX is a pattern that can be applied to a string, either to test or transform a string, depending on whether there is a match according to rules contained in that pattern. As with most things that come up through the Unix route, RegEX syntax is both powerful, and concise. This section is not supposed to be a tutorial on RegEx- there are thousands of them available on the internet. Just google “regular expression tutorial”. Some of these examples have come from a great regex resource, http://regexlib.com/
Here’s a few examples
Numeric [\0-9] Numeric exactly 3 characters long [\0-9]{3} Non Numeric [^\0-9]
So applying ‘Numeric’ to the string abc123 yields 123. Contrast that with a generalized formula in excel to do that.
=MID(H1,MATCH(TRUE,ISNUMBER(--MID(H1,ROW(INDIRECT("$1:$"&LEN(H1))),1)),0),COUNT(--MID(H1,ROW(INDIRECT("$1:$"&LEN(H1))),1)))
RegEX can get also very complex – but the are expressing very complex validation concepts that would be complex in VBA and unimaginably so with Excel functions. Here’s one that describes a valid UK post code, from regexlib.com, as are a few in this section.
(((^[BEGLMNS][1-9]\d?)|(^W[2-9])|(^(A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]| D[ADEGHLNTY]|E[HNX]|F[KY]| G[LUY]|H[ADGPRSUX]|I[GMPV]| JE|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]| O[LX]|P[AEHLOR]|R[GHM]|S[AEGKL-PRSTWY]|T[ADFNQRSW]|UB|W[ADFNRSV] |YO|ZE)\d\d?)|(^W1[A-HJKSTUW0-9])|(((^WC[1-2])|(^EC[1-4])|(^SW1))[ABEHMNPRVWXY]))(\s*)?([0-9][ABD-HJLNP-UW-Z]{2}))|(^GIR\s?0AA)
Aside from extracting string components, you can also use regEX to transform a string.
This one will reduce multiple spaces inside a string to single spaces, when used in conjunction with the procedures discussed on this page
(\S+)\x20{2,}(?=\S+)
and this one will format an ip number.
^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\. (\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$
Although all that looks rather complex, there are millions of already written regEX on the web you can find for almost any purpose. The regEX implementation you can download here also provides the concept of a ‘preset’. The idea here is that you add regEX patterns to your own personal library and refer to them by name rather than needing to reference the patterns themselves. This makes for easy maintenance and less typing errors.
So for example, you can setup a preset which translates to a regEX pattern, but in your formula you refer to a preset name. You’ll see more of that later, but for example, you could set up a preset called IPNUMBER. The formula you would use in your worksheet would be
=rxString ( "ipnumber", A1)
instead of
=rxString ( "^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$", A1)
RegEX implementation.
There are 3 at least functions that a generally available in implementations of RegEX
-
- Test – Check if a string matches a RegEx Pattern
- Replace – Replace any matching string portion(s) with something else
- Execute – Apply a RegEx pattern to a string and return the result(s) so you can manipulate them.
Excel does not have an implementation of RegEX, but VBscript does, and that is available to VBA developers simply by including a reference to it. If you are writing VBA code then you can build that right into your procedures. However, you cannot directly access it as an Excel formula so in this case you need to wrap them in some VBA user defined functions to make them accessible. In this case, I have created both a wrapper class (cRegEXLib) and some wrapper functions that access that class. Details are at the end of this article and are available as a download.
Examples
Note that the full regEX pattern and formulas may not be shown below in the interests of space, but you will find them in the example sheet in the download section.
These are the formulas for the above
Public Function rxTest(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean
Public Function rxTest(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean Dim rx As cregXLib ' create a new regx Set rx = rxMakeRxLib(sName) rx.ignorecase = ignorecase ' extract the string that matches the requested pattern rxTest = rx.getTest(s) End Function
Public Function rxString(sName As String, s As String, Optional ignorecase As Boolean = True) As Boolean
Public Function rxString(sName As String, s As String, Optional ignorecase As Boolean = True) As String Dim rx As cregXLib ' create a new regx Set rx = rxMakeRxLib(sName) rx.ignorecase = ignorecase ' extract the string that matches the requested pattern rxString = rx.getString(s) End Function
Public Function rxReplace(sName As String, sFrom As String, sTo as string Optional ignorecase As Boolean = True) As Boolean
Public Function rxReplace(sName As String, sFrom As String, sTo As String, Optional ignorecase As Boolean = True) As String Dim rx As cregXLib ' create a new regx Set rx = rxMakeRxLib(sName) rx.ignorecase = ignorecase ' replace the string that matches the requested pattern rxReplace = rx.getReplace(sFrom, sTo) End Function
Public Function rxPattern(sName As String) As string
Public Function rxPattern(sName As String) As String Dim rx As cregXLib ' create a new regx Set rx = rxMakeRxLib(sName) ' just returnthe pattern rxPattern = rx.Pattern End Function
cRegEXLib class
Option Explicit ' for building up a library of useful regex expressions Private pName As String Private pRegex As RegExp Public Property Get Pattern() As String Pattern = pRegex.Pattern End Property Public Property Let Pattern(p As String) pRegex.Pattern = p End Property Public Property Get Name() As String Name = pName End Property Public Property Let Name(p As String) pName = p End Property Public Property Get ignorecase() As Boolean ignorecase = pRegex.ignorecase End Property Public Property Let ignorecase(p As Boolean) pRegex.ignorecase = p End Property Public Property Get rGlobal() As Boolean rGlobal = pRegex.Global End Property Public Property Let rGlobal(p As Boolean) pRegex.Global = p End Property Public Sub init(sName As String, _ Optional spat As String = "", _ Optional bIgnoreSpaces As Boolean = True, _ Optional bIgnoreCase As Boolean = True, _ Optional bGlobal As Boolean = True) Dim s As String s = spat If bIgnoreSpaces Then s = Replace(s, " ", "") End If Set pRegex = New RegExp With pRegex .Pattern = s .ignorecase = bIgnoreCase .Global = bGlobal End With pName = sName End Sub Public Function getString(sFrom As String) As String Dim mc As matchcollection, am As Match, rs As String Set mc = pRegex.Execute(sFrom) rs = "" For Each am In mc rs = rs & am.Value Next am getString = rs End Function Function getReplace(sFrom As String, sTo As String) As String getReplace = pRegex.Replace(sFrom, sTo) End Function Function getTest(sFrom As String) As Boolean getTest = pRegex.Test(sFrom) End Function
Presets
Function rxMakeRxLib(sName As String) As cregXLib Dim rx As cregXLib, s As String Set rx = New cregXLib ' normally sname points to a preselected regEX ' if not known, silently assume its a regex pattern s = Replace(UCase(sName), " ", "") Select Case s Case "POSTALCODEUK" rx.init s, _ "(((^[BEGLMNS][1-9]\d?) | (^W[2-9] ) | ( ^( A[BL] | B[ABDHLNRST] | C[ABFHMORTVW] | D[ADEGHLNTY] | E[HNX] | F[KY] | G[LUY] | H[ADGPRSUX] | I[GMPV] |" & _ " JE | K[ATWY] | L[ADELNSU] | M[EKL] | N[EGNPRW] | O[LX] | P[AEHLOR] | R[GHM] | S[AEGKL-PRSTWY] | T[ADFNQRSW] | UB | W[ADFNRSV] | YO | ZE ) \d\d?) |" & _ " (^W1[A-HJKSTUW0-9]) | (( (^WC[1-2]) | (^EC[1-4]) | (^SW1) ) [ABEHMNPRVWXY] ) ) (\s*)? ([0-9][ABD-HJLNP-UW-Z]{2})) | (^GIR\s?0AA)" Case "POSTALCODESPAIN" rx.init s, _ "^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$" Case "PHONENUMBERUS" rx.init s, _ "^\(?(?<AreaCode>[2-9]\d{2})(\)?)(-|.|\s)?(?<Prefix>[1-9]\d{2})(-|.|\s)?(?<Suffix>\d{4})$" Case "CREDITCARD" 'amex/visa/mastercard rx.init s, _ "^((4\d{3})|(5[1-5]\d{2}))(-?|\040?)(\d{4}(-?|\040?)){3}|^(3[4,7]\d{2})(-?|\040?)\d{6}(-?|\040?)\d{5}" Case "NUMERIC" rx.init s, _ "[\0-9]" Case "ALPHABETIC" rx.init s, _ "[\a-zA-Z]" Case "NONNUMERIC" rx.init s, _ "[^\0-9]" Case "IPADDRESS" rx.init s, _ "^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" Case "SINGLESPACE" ' should take a replace value of "$1 " rx.init s, _ "(\S+)\x20{2,}(?=\S+)" Case "EMAIL" rx.init s, _ "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$" Case "EMAILINSIDE" rx.init s, _ "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b" Case Else rx.init "Adhoc", sName End Select Set rxMakeRxLib = rx End Function
Summary
Why not join our forum, follow the blog or follow me on Twitter to ensure you get updates when they are available.