What is Regex?
- Regular Expressions
- a way to specify part of a string that matches something
- a way to check if a string contains something
Why Regex?
- More versatile than plain text functions such as an exact find or replace
- Universal - available in Tableau & Alteryx
- It is used for finding, removing or replacing parts of a string.
- “\w{3}” - 3 word-like characters - in “My cat is called Bob” it matches every run of 3 word characters next to each other, without overlapping: ‘cat’, ‘cal’, ‘led’, ‘Bob’
- String processing
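The “\w{3}” example above can be tried out with a quick sketch. Python's re module is used here purely as an assumption for illustration; Alteryx and Tableau have their own regex functions and may differ in detail.

```python
import re

text = "My cat is called Bob"

# \w{3} matches exactly three word characters in a row.
# Matches are found left to right and never overlap, so "called"
# gives "cal" and "led", while "My" and "is" are too short to match.
matches = re.findall(r"\w{3}", text)
print(matches)  # ['cat', 'cal', 'led', 'Bob']
```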
How to work well with Regex?
- Be vague enough to capture everything you want
- Build up expressions for trial and testing
Quantifiers - how many of something? (* = 0 or more, + = 1 or more, ? = 0 or 1, {n} = exactly n)
Special Characters:
.[]{}()\*+?|^$
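These characters have a special meaning in a pattern, so to match one of them literally it normally has to be escaped with a backslash. A minimal sketch, assuming Python's re module (the sample strings are made up for illustration):

```python
import re

# Unescaped, "." is a wildcard, so the pattern 5.00 also matches "5x00".
wildcard = re.search(r"5.00", "5x00")
print(wildcard is not None)  # True

# A backslash makes the following special character literal.
literal = re.search(r"5\.00", "5x00")
print(literal is None)  # True

# re.escape adds the backslashes to a whole piece of text.
pattern = re.escape("$5.00")
found = re.search(pattern, "Total: $5.00")
print(found.group())  # $5.00
```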
Character Classes:
. = any single character
\w = word character (letter, digit or underscore)
\d = digit
\s = whitespace
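A small sketch of what each character class picks out, assuming Python's re module (one difference to be aware of: in Python "." does not match a newline by default). The sample strings are made up for illustration.

```python
import re

# "." matches any single character, including spaces.
any_char = re.findall(r".", "a 1")
print(any_char)  # ['a', ' ', '1']

# "\w" matches letters, digits and underscore, but not spaces or punctuation.
word_chars = re.findall(r"\w", "a_1 !")
print(word_chars)  # ['a', '_', '1']

# "\d" matches one digit at a time.
digits = re.findall(r"\d", "10th May")
print(digits)  # ['1', '0']

# "\s" matches whitespace such as spaces and tabs.
spaces = re.findall(r"\s", "a b\tc")
print(spaces)  # [' ', '\t']
```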
Formula:
REGEX_CountMatches(String, Pattern)
How many times in this string does this pattern show up? Output is a number
REGEX_Replace(String, Pattern, Replace)
In this string, find this pattern and replace it with this string. Output is the string with the replacement applied
REGEX_Match(String, Pattern)
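Does this string match this pattern? The pattern is tested against the whole string, so add “.*” at the start and/or end to test whether the string merely contains it. Output is -1 or 0 (True or False)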
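To get a feel for the three formulas, here are rough Python analogues. The helper names count_matches, replace and is_match are hypothetical, not Alteryx functions. The sketch assumes REGEX_Match tests the whole string, and it does not model Alteryx's optional arguments or its -1/0 output (Python returns True/False).

```python
import re

def count_matches(string, pattern):
    """Rough analogue of REGEX_CountMatches: number of non-overlapping matches."""
    return sum(1 for _ in re.finditer(pattern, string))

def replace(string, pattern, replacement):
    """Rough analogue of REGEX_Replace: replace every match."""
    return re.sub(pattern, replacement, string)

def is_match(string, pattern):
    """Rough analogue of REGEX_Match, assuming the whole string must match."""
    return re.fullmatch(pattern, string) is not None

text = "My cat is called Bob and born 10th May. I have Many cats."

print(count_matches(text, r"\d"))    # 2
print(replace(text, r"\d+", "#"))    # My cat is called Bob and born #th May. I have Many cats.
print(is_match("Bob", r"B\w+"))      # True
print(is_match("Bob the cat", r"B\w+"))    # False: the pattern does not cover the whole string
print(is_match("Bob the cat", r"B\w+.*"))  # True: ".*" absorbs the rest
```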
Examples:
"My cat is called Bob and born 10th May. I have Many cats." - For reference
- “.” - Every separate character
- “.*” - Sentence as a whole
- “.+” - Sentence as a whole - one or more
- “\w” = alphanumeric and ‘_’ - each letter, digit or underscore, one at a time (not whitespace or punctuation)
- “\w+” - each word - greedy
- “\w{3}” = 3 alphanumerics - 3 word-like characters - in “My cat is called Bob” it matches every run of 3 word characters next to each other, without overlapping: ‘cat’, ‘cal’, ‘led’, ‘Bob’
- “B\w+” - words beginning with B
- “[A-Z]\w+” - square brackets hold a list (here a range) of characters to look for in the string - [A-Z] followed by \w+ finds words beginning with a capital letter
- “[A-Z]*\w+” - words beginning with 0 or more capitals
- “^\w+” - the first word - ‘^’ anchors the match to the start of the string
- “\w+$” - the last word - ‘$’ anchors the match to the end of the string (no match in the reference sentence, because it ends with ‘.’)
- “My” - the literal characters “My”
- “M.y” - M, exactly 1 of any character, then y - ‘May’
- “M..y” - M, exactly 2 of any character, then y - ‘Many’
- “M.+y” - M, 1 or more of any character, then y (greedy) - runs from the first M to the last y, so it takes nearly the sentence as a whole
- “M.+?y” - ‘?’ makes it not greedy - M, 1 or more of any character (as few as possible), then y - the first match stops at ‘May’
- “M\w*y” - M, 0 or more alphanumerics, y (greedy) - My, May, Many
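- “( )” - group - treats the enclosed pattern as one unit and captures what it matched; a quantifier placed after the group applies to the whole group and is greedy by default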
- “\s” - whitespace - one character - matches with all the whitespace
- “\d” - digit - 0-9
- “\d{2}” - exactly 2 digits in a row - e.g. a double-digit number such as ‘10’
- “\u” - uppercase letter
- “\u\u?\d\u?\s\d\u\u” - ‘?’ makes the preceding character optional - e.g. postcodes (because the format is distinctive)
- “\n” - New Line
- “.+” - one or more of anything (any character)
- “\S” - NOT whitespace - any single non-whitespace character
- “[^s]” - any single character that is not a lowercase ‘s’
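Several of the examples above can be checked against the reference sentence with a short sketch. This assumes Python's re module, which behaves the same way for these particular patterns but does not support everything in the list (for instance “\u” for uppercase is not available there).

```python
import re

text = "My cat is called Bob and born 10th May. I have Many cats."

# Greedy: from the first M to the last y in the sentence.
greedy = re.findall(r"M.+y", text)
print(greedy)  # ['My cat is called Bob and born 10th May. I have Many']

# Not greedy: each match stops at the nearest y that works.
lazy = re.findall(r"M.+?y", text)
print(lazy)  # ['My cat is called Bob and born 10th May', 'Many']

# Only word characters are allowed between M and y.
m_words = re.findall(r"M\w*y", text)
print(m_words)  # ['My', 'May', 'Many']

# Exactly one or two characters between M and y.
one_between = re.findall(r"M.y", text)
two_between = re.findall(r"M..y", text)
print(one_between, two_between)  # ['May'] ['Many']

# Words beginning with a capital letter (at least 2 characters long, so "I" is skipped).
capitalised = re.findall(r"[A-Z]\w+", text)
print(capitalised)  # ['My', 'Bob', 'May', 'Many']

# Anchors: first word, and last word (none here, the sentence ends with ".").
first_word = re.findall(r"^\w+", text)
last_word = re.findall(r"\w+$", text)
print(first_word, last_word)  # ['My'] []
```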