Regex, short for regular expressions, is a way of finding patterns in text. It can be used to check whether text matches a certain format, extract part of a string, replace unwanted characters, or split messy text into useful pieces.
Regex can look intimidating at first because it uses a lot of symbols, but most patterns are built from a fairly small set of building blocks. Once you understand what those building blocks mean, regex becomes much less mysterious and can be incredibly useful.
| Use Case | Example |
|---|---|
| Validate a format | Check whether an ID, postcode or email address follows the expected structure |
| Extract text | Pull the domain from an email address |
| Clean text | Remove punctuation, extra spaces or unwanted characters |
| Find repeated patterns | Identify duplicated words such as the the |
| Split strings | Separate dates, codes, names or categories into useful parts |
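To make these use cases concrete, here is a minimal sketch using Python's built-in re module, one of the tools listed below. The ID format (two capital letters and four digits), the email address and the messy sample text are hypothetical examples chosen for illustration, not rules from any real system.

```python
import re
from typing import List, Optional


# Validate a format: a hypothetical ID of two capital letters and four digits.
# re.fullmatch only succeeds if the pattern covers the whole string.
def is_valid_id(text: str) -> bool:
    return re.fullmatch(r"[A-Z]{2}\d{4}", text) is not None


# Extract text: capture everything after the @ in an email address.
def email_domain(email: str) -> Optional[str]:
    match = re.search(r"@([A-Za-z0-9.-]+)", email)
    return match.group(1) if match else None


# Clean text: drop punctuation, then collapse runs of whitespace into one space.
def clean(text: str) -> str:
    no_punctuation = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", no_punctuation).strip()


# Split strings: break a date apart on either a slash or a hyphen.
def split_date(text: str) -> List[str]:
    return re.split(r"[/-]", text)


print(is_valid_id("AB1234"))            # True
print(is_valid_id("AB12345"))           # False
print(email_domain("jo@example.com"))   # example.com
print(clean("Hello,   world!!  "))      # Hello world
print(split_date("15/01/2026"))         # ['15', '01', '2026']
```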
Regex can appear in lots of different tools and languages. The exact syntax can vary slightly, so it is always worth checking the documentation for the tool you are using, but the basic ideas are usually very similar.
| Tool | Where Regex Might Appear |
|---|---|
| Alteryx | REGEX_Match, REGEX_Replace and REGEX_CountMatches functions, plus the RegEx tool (Match, Parse, Replace and Tokenize methods) |
| Python | re.search(), re.findall(), re.sub(), re.match() |
| SQL | Functions such as REGEXP_LIKE, REGEXP_REPLACE or similar, depending on the database you are using |
| Tableau / Tableau Prep | Functions such as REGEXP_MATCH, REGEXP_EXTRACT, and REGEXP_REPLACE can be used to match, extract or replace text patterns |
| Text editors | Find and replace tools often support regex for more flexible searching |
| Power Query | Regex is not as directly built in as it is in some other tools, but similar text cleaning can often be done with text functions or custom approaches |
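Taking Python as an example, the four functions listed for it behave differently even with the same pattern. This sketch runs each one over an invented sentence; the main thing to notice is that re.match only tries the start of the string, while re.search scans all of it.

```python
import re

text = "Order 42 shipped on day 7"

# re.search: returns the first match found anywhere in the string, or None.
first = re.search(r"\d+", text)

# re.match: only tries to match at the very start of the string.
at_start_digits = re.match(r"\d+", text)   # None: the text starts with "Order"
at_start_word = re.match(r"Order", text)

# re.findall: returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.sub: replaces every match with new text.
masked = re.sub(r"\d+", "#", text)

print(first.group())          # 42
print(at_start_digits)        # None
print(at_start_word.group())  # Order
print(all_numbers)            # ['42', '7']
print(masked)                 # Order # shipped on day #
```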
Matching Different Types of Characters
At its simplest, regex matches characters. You can match exact text by typing the text you want to find. For example, the pattern below would find the letters “cat” in a string.
cat
Regex becomes more powerful when you use special characters to describe the type of character you are looking for. For example, you can search for any digit, any whitespace character, or any uppercase letter.
| Pattern | Meaning | Example Match |
|---|---|---|
| . | Any single character, usually except a new line | a, 7, ! |
| \d | Any digit | 0 to 9 |
| \w | Any word character, usually letters, numbers and underscore | A, 7, _ |
| \s | Any whitespace character | Space, tab or new line |
| \t | A tab character | A tab |
| \n | A new line character | A line break |
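One way to see these character types in action is to pull every match out of a sample string with Python's re.findall. The sample text is invented. The patterns are written as raw strings (r"...") so that Python passes the backslashes through to the regex engine unchanged.

```python
import re

sample = "Room 7B\tfloor_2!"

digits = re.findall(r"\d", sample)               # every single digit
word_chars = "".join(re.findall(r"\w", sample))  # letters, digits and underscore
whitespace = re.findall(r"\s", sample)           # the space and the tab
tabs = re.findall(r"\t", sample)                 # only the tab

# "." skips new lines by default; the re.DOTALL flag lets it match them too.
any_char = re.findall(r".", "a\nb")
any_char_dotall = re.findall(r".", "a\nb", flags=re.DOTALL)

print(digits)           # ['7', '2']
print(word_chars)       # Room7Bfloor_2
print(whitespace)       # [' ', '\t']
print(tabs)             # ['\t']
print(any_char)         # ['a', 'b']
print(any_char_dotall)  # ['a', '\n', 'b']
```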
Square brackets create a character set. This means “match one character from this set”. For example, you could use a set to match any vowel, any uppercase letter, or anything except a number.
One slightly confusing thing is that some symbols behave differently depending on where they are used. For example, ^ means “start of string” when it appears outside square brackets, but when it is the first character inside square brackets it means “not”.
Also, be careful not to use [A-z] in place of [A-Za-z], as it also includes the characters that sit between Z and a in ASCII/Unicode: the square brackets, the backslash, ^, _ and the backtick.
| Pattern | Meaning | Example Match |
|---|---|---|
| [A-Z] | Any uppercase letter | A, B, C |
| [a-z] | Any lowercase letter | a, b, c |
| [A-Za-z] | Any uppercase or lowercase letter | A, b, Z |
| [0-9] | Any digit from 0 to 9 | 4 |
| [aeiou] | Any vowel from the set | a, e, i |
| [^aeiou] | Anything except a lowercase vowel | E, Q, w, ! |
| [^0-9] | Anything except a digit | A, !, space |
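The sketch below checks a few of these sets in Python, including the [A-z] pitfall: because ASCII places the underscore (among other symbols) between Z and a, [A-z] quietly accepts it. The helper name fits is hypothetical, made up for this example.

```python
import re


def fits(char_set: str, char: str) -> bool:
    # True if the single character is accepted by the character set.
    return re.fullmatch(char_set, char) is not None


print(fits(r"[aeiou]", "e"))    # True
print(fits(r"[^aeiou]", "E"))   # True: only lowercase vowels are excluded
print(fits(r"[^0-9]", "5"))     # False: 5 is a digit
print(fits(r"[A-Za-z]", "_"))   # False: underscore is not a letter
print(fits(r"[A-z]", "_"))      # True: _ sits between Z and a in ASCII
print(fits(r"[a^b]", "^"))      # True: ^ only means "not" when it comes first
```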
Controlling How Many Characters Match
Once you have described what type of character you want, you often need to say how many of them should appear. These blocks are called quantifiers.
A useful way to think about regex is:
type of character + number of characters
For example, you might want exactly four digits, one or more letters, or an optional space.
| Pattern | Meaning | Example |
|---|---|---|
| + | One or more | \d+ matches 7 or 123 |
| * | Zero or more | A* matches no As, one A, or many As |
| ? | Optional / zero or one | colou?r matches color and colour |
| {3} | Exactly 3 | \d{3} matches exactly three digits |
| {2,4} | Between 2 and 4 | \d{2,4} matches two, three or four digits |
| {2,} | 2 or more | \d{2,} matches at least two digits |
You can combine character types and quantifiers to build useful patterns:
| Regex | Meaning |
|---|---|
| \d{4} | Four digits |
| [A-Z]{2} | Two uppercase letters |
| [A-Za-z]+ | One or more letters |
| \w+ | One or more word characters |
| \s? | An optional whitespace character |
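To see quantifiers and character types working together, here is a small Python sketch that tests some of the patterns above against whole strings with re.fullmatch. The test strings, the combined pattern [A-Z]{2}\s?\d{4} and the helper name full are hypothetical, chosen for illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    # re.fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(pattern, text) is not None


print(full(r"\d+", "123"))                   # True: one or more digits
print(full(r"\d+", ""))                      # False: + needs at least one
print(full(r"A*", ""))                       # True: * allows zero
print(full(r"colou?r", "color"))             # True: the u is optional
print(full(r"\d{2,4}", "12345"))             # False: five digits is too many
print(full(r"[A-Z]{2}\s?\d{4}", "AB 1234"))  # True: optional space present
print(full(r"[A-Z]{2}\s?\d{4}", "AB1234"))   # True: optional space absent
```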
Start, End and Word Boundaries
By default, regex often looks for a pattern anywhere in the string. For example, a pattern for four digits might find 2026 inside a longer piece of text.
That can be useful if you are extracting values, but it is less useful if you are validating whether the whole field matches a specific format. Anchors let you control where the match should happen.
| Pattern | Meaning | Example |
|---|---|---|
| ^ | Start of string | ^Hello matches text that starts with Hello |
| $ | End of string | world$ matches text that ends with world |
| \b | Word boundary | \bcat\b matches cat as a whole word |
This distinction is very useful when validating formats.
\d{4} finds four digits somewhere.
^\d{4}$ only matches if the whole string is exactly four digits long.
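In Python, the difference looks like this: re.search looks for the pattern anywhere, so it finds 2026 inside a longer string, while adding ^ and $ turns the same call into a whole-string check. The sample strings are invented. One Python-specific detail is also shown: $ can match just before a final newline, so re.fullmatch is slightly stricter.

```python
import re

text = "Report for 2026 Q1"

anywhere = re.search(r"\d{4}", text)        # finds 2026 inside the text
whole = re.search(r"^\d{4}$", text)         # None: the string holds more than four digits
whole_ok = re.search(r"^\d{4}$", "2026")    # the entire string is four digits

# $ also matches just before a final newline, so fullmatch is the stricter check.
trailing = re.search(r"^\d{4}$", "2026\n")  # still matches
strict = re.fullmatch(r"\d{4}", "2026\n")   # None

print(anywhere.group())  # 2026
print(whole)             # None
print(whole_ok.group())  # 2026
print(trailing is not None, strict is None)  # True True
```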
The word boundary pattern, \b, is useful when you want to match a whole word rather than a sequence of letters inside another word.
It does not match a letter, space or punctuation mark itself. Instead, it matches the position where a word character meets a non-word character, such as the edge between a word and a space, punctuation mark, or the start/end of the string.
For example, you might want to match cat as a complete word, but not the cat inside scatter or category.
| Regex | Matches | Does Not Match |
|---|---|---|
| \bcat\b | cat | scatter, category |
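A quick Python sketch of the difference, using an invented sentence. Note the raw strings: in a normal Python string, "\b" is a backspace character rather than a word boundary, so r"\bcat\b" is needed.

```python
import re

text = "The cat sat on a scatter cushion in category 5."

plain = re.findall(r"cat", text)                   # also finds cat inside scatter and category
whole_word = re.findall(r"\bcat\b", text)          # only the standalone word
at_end = re.findall(r"\bcat\b", "I fed the cat.")  # punctuation counts as a boundary

print(len(plain))       # 3
print(len(whole_word))  # 1
print(at_end)           # ['cat']
```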
Groups, OR and Backreferences
Round brackets (parentheses) can be used to group part of a regex pattern. Groups are useful when you want to apply logic to one section of a pattern, such as choosing between two options. They are also useful when you want to refer back to something you have already matched.
| Pattern | Meaning | Example |
|---|---|---|
| () | Creates a group | (cat) groups the word cat |
| \| | OR | cat\|dog matches cat or dog |
| \1 | Refers back to the first captured group | (\w+) \1 can find repeated words |
For example, the OR symbol lets you match one option or another.
I like (cats|dogs)
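This matches I like cats or I like dogs. The brackets matter here: the OR applies to everything on each side of it, so without them, I like cats|dogs would match either I like cats or just dogs.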
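In Python, the effect of the group can be checked with re.fullmatch. Because | has very low precedence, an ungrouped pattern splits into two whole alternatives, I like cats and dogs. The sentences below are invented examples.

```python
import re

grouped = r"I like (cats|dogs)"
ungrouped = r"I like cats|dogs"   # OR splits this into "I like cats" or "dogs"

print(bool(re.fullmatch(grouped, "I like dogs")))    # True
print(bool(re.fullmatch(ungrouped, "I like dogs")))  # False
print(bool(re.fullmatch(ungrouped, "dogs")))         # True

# The group also captures which option matched.
choice = re.search(grouped, "I like cats a lot").group(1)
print(choice)                                        # cats
```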
Backreferences let you reuse a captured group later in the pattern. This can be useful for finding repeated words. For example:
\b(\w+) \1\b
This can match repeated words such as the the, very very or no no.
(\w+) captures the first word. The \1 then says “match that same thing again”. So if the first group captures the, the backreference looks for the again.
| Part | Meaning |
|---|---|
| \b | Start at a word boundary |
| (\w+) | Capture one or more word characters as a group |
| A literal space | Match the space between the words |
| \1 | Match the same text captured by the first group |
| \b | End at a word boundary |
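Here is a minimal Python sketch of the repeated-word pattern, used both to find doubled words and to remove them. The function names find_repeats and remove_repeats and the sample sentences are hypothetical. In the replacement string, \1 inserts the captured word once.

```python
import re
from typing import List

# \b(\w+) \1\b: a whole word, a space, then the same word again.
REPEATED_WORD = re.compile(r"\b(\w+) \1\b")


def find_repeats(text: str) -> List[str]:
    # With one capture group, findall returns the captured word for each match.
    return REPEATED_WORD.findall(text)


def remove_repeats(text: str) -> str:
    # In the replacement, \1 inserts the captured word a single time.
    return REPEATED_WORD.sub(r"\1", text)


sample = "It was the the best, very very simply put."
print(find_repeats(sample))                 # ['the', 'very']
print(remove_repeats(sample))               # It was the best, very simply put.
print(find_repeats("the theory, this is"))  # []
```

The last line shows why both boundaries are there: the trailing \b stops the theory from counting, and the leading \b stops the is at the end of this from pairing with the following is.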
Lookaheads and Lookbehinds
Lookarounds are used when you want to match something based on what comes before or after it, without including that surrounding text in the result.
For example, you might want to extract the number after a pound sign, but not include the pound sign itself. Or you might want to extract the number before a percentage sign, but not include the percentage sign. A lookahead looks forwards. A lookbehind looks backwards.
| Pattern | Name | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | Match only if this comes next |
| (?!...) | Negative lookahead | Match only if this does not come next |
| (?<=...) | Positive lookbehind | Match only if this came before |
| (?<!...) | Negative lookbehind | Match only if this did not come before |
Some useful examples include:
| Goal | Regex | Example Result |
|---|---|---|
| Find digits after a pound sign | (?<=£)\d+ | Matches 25 in £25 |
| Find digits before a percentage sign | \d+(?=%) | Matches 75 in 75% |
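The sketch below tries all four lookarounds in Python on invented sample sentences. One caveat worth knowing: Python's re module requires lookbehinds to be fixed width, so (?<=£) works but something like (?<=£+) raises an error. Other tools may have different limits, so check your own tool's documentation.

```python
import re

price_text = "Tickets cost £25, or $30 at the door."
rate_text = "Completion rose from 60% to 75%, with 3 teams left."

# Positive lookbehind: digits only if a pound sign comes just before them.
pounds = re.findall(r"(?<=£)\d+", price_text)

# Positive lookahead: digits only if a percentage sign comes just after them.
percents = re.findall(r"\d+(?=%)", rate_text)

# Negative lookahead: whole numbers NOT followed by %. The \b anchors stop the
# engine from backtracking to a partial match such as the 6 in 60%.
not_percent = re.findall(r"\b\d+\b(?!%)", rate_text)

# Negative lookbehind: whole numbers NOT preceded by a pound sign.
not_pounds = re.findall(r"(?<!£)\b\d+\b", price_text)

print(pounds)       # ['25']
print(percents)     # ['60', '75']
print(not_percent)  # ['3']
print(not_pounds)   # ['30']
```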
Escaping Special Characters
Some characters have special meanings in regex.
For example, a full stop means “any character”, not “a literal full stop”. So if you want to match an actual full stop, you usually need to escape it with a backslash.
. means any character.
\. means a literal full stop.
| To Match | Use |
|---|---|
| A full stop | \. |
| A question mark | \? |
| An opening bracket | \( |
| A closing bracket | \) |
| A plus sign | \+ |
| An asterisk | \* |
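A small Python sketch of why escaping matters, using invented text. It also shows re.escape, a standard-library helper that adds the backslashes for you, which is handy when the literal text comes from a variable rather than being typed into the pattern by hand.

```python
import re

sample = "3.5 and 3x5"

unescaped = re.findall(r"3.5", sample)  # '.' matches any character, so 3x5 slips in
escaped = re.findall(r"3\.5", sample)   # '\.' matches only a literal full stop

# re.escape backslash-escapes the special characters in a literal string.
label = "(approx.)"
pattern = re.escape(label)
found = re.search(pattern, "Total (approx.): 3.5")

print(unescaped)      # ['3.5', '3x5']
print(escaped)        # ['3.5']
print(pattern)        # \(approx\.\)
print(found.group())  # (approx.)
```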
Case-Insensitive Matching
Sometimes you want to match text regardless of whether it uses uppercase or lowercase letters. Depending on the tool, this might be handled by a setting or a function option, such as the re.IGNORECASE flag in Python. In some regex flavours, you can also put a case-insensitive flag in the pattern itself.
| Pattern | Meaning | Example Matches |
|---|---|---|
| (?i)cat | Case-insensitive match, if supported by the tool | cat, Cat, CAT |
| [Cc]at | Match an uppercase or lowercase c, followed by at | cat, Cat |
If your tool does not support the case-insensitive flag, you may need to use a setting or write the pattern differently.
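In Python, all three approaches can be compared on an invented word list. The inline (?i) flag must sit at the very start of the pattern. The [Cc] set only covers the first letter, so it misses CAT.

```python
import re

words = ["cat", "Cat", "CAT", "dog"]

# Inline flag: (?i) at the very start of the pattern turns on case-insensitivity.
inline_flag = [w for w in words if re.fullmatch(r"(?i)cat", w)]

# Setting instead of inline flag: pass re.IGNORECASE to the function.
flag_argument = [w for w in words if re.fullmatch(r"cat", w, flags=re.IGNORECASE)]

# No flag: the character set only covers the first letter.
char_set = [w for w in words if re.fullmatch(r"[Cc]at", w)]

print(inline_flag)    # ['cat', 'Cat', 'CAT']
print(flag_argument)  # ['cat', 'Cat', 'CAT']
print(char_set)       # ['cat', 'Cat']
```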
Greedy and Lazy Matching
Regex quantifiers are greedy by default. This means they try to match as much as possible. This can be helpful, but sometimes it means regex grabs more text than you expected. Adding a question mark after the quantifier can make it lazy, meaning it matches as little as possible.
In the following example, you can see how the greedy and lazy versions differ when matching text inside quotation marks.
| Pattern | Behaviour | Example Text | Example Match |
|---|---|---|---|
| ".*" | Greedy: matches as much as possible | "apple" and "banana" | "apple" and "banana" |
| ".*?" | Lazy: matches as little as possible | "apple" and "banana" | "apple", then "banana" |
Useful Example Patterns
Here are a few practical regex patterns using the building blocks above.
| Goal | Regex | Example Match |
|---|---|---|
| Four digit number | ^\d{4}$ | 2026 |
| One or more letters | ^[A-Za-z]+$ | Hello |
| Simple UK postcode-style pattern | ^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$ | SW1A 1AA |
| Email domain | (?<=@)[A-Za-z0-9.-]+ | gmail.com |
| Repeated word | \b(\w+) \1\b | the the |
| Digits after a pound sign | (?<=£)\d+ | 25 in £25 |
| Optional spelling | colou?r | color or colour |
| Digits before a percentage sign | \d+(?=%) | 75 in 75% |
The postcode example above is deliberately labelled as postcode-style rather than a perfect postcode validator. Real UK postcodes have more detailed rules, so this is a useful starting pattern rather than a complete validation system.
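If you want to sense-check the table, a sketch like this runs each pattern with Python's re.search against its example. The sample strings (for instance someone@gmail.com and "say the the word") are invented to contain each example match, and the helper name check_examples is hypothetical.

```python
import re

# (description, pattern, sample text, expected match), based on the table above.
EXAMPLES = [
    ("Four digit number", r"^\d{4}$", "2026", "2026"),
    ("One or more letters", r"^[A-Za-z]+$", "Hello", "Hello"),
    ("UK postcode-style", r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$", "SW1A 1AA", "SW1A 1AA"),
    ("Email domain", r"(?<=@)[A-Za-z0-9.-]+", "someone@gmail.com", "gmail.com"),
    ("Repeated word", r"\b(\w+) \1\b", "say the the word", "the the"),
    ("Digits after a pound sign", r"(?<=£)\d+", "£25", "25"),
    ("Optional spelling", r"colou?r", "colour", "colour"),
    ("Digits before a percentage sign", r"\d+(?=%)", "75%", "75"),
]


def check_examples() -> list:
    # Collect a message for every pattern whose first match differs from the expected text.
    failures = []
    for name, pattern, sample, expected in EXAMPLES:
        match = re.search(pattern, sample)
        actual = match.group() if match else None
        if actual != expected:
            failures.append(f"{name}: expected {expected!r}, got {actual!r}")
    return failures


print(check_examples() or "All examples match")
```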
Final Reference Table
Here is a quick summary of the main symbols.
| Regex | Meaning |
|---|---|
| Character Types | |
| . | Any single character |
| \d | Any digit |
| \w | Any word character |
| \s | Any whitespace character |
| Character Sets | |
| [A-Z] | Any uppercase letter |
| [^A-Z] | Anything except an uppercase letter |
| Quantifiers | |
| + | One or more |
| * | Zero or more |
| ? | Optional / zero or one |
| {3} | Exactly three |
| {2,4} | Between two and four |
| Anchors and Boundaries | |
| ^ | Start of string |
| $ | End of string |
| \b | Word boundary |
| Groups and Backreferences | |
| () | Group |
| \| | OR |
| \1 | Backreference to the first group |
| Lookarounds | |
| (?=...) | Positive lookahead |
| (?!...) | Negative lookahead |
| (?<=...) | Positive lookbehind |
| (?<!...) | Negative lookbehind |
Regex is easiest to learn by building patterns in small pieces. Rather than trying to write the whole thing at once, start with one part. Choose the type of character you want, decide how many of them you need, then work out whether the pattern can appear anywhere or must match the whole string.
If you want to practise regex and get to grips with it, I've found https://regex101.com/ especially useful for doing exercises and sense-checking my patterns.
