Introduction to the world of REGEX


What is Regex?
Regex is short for regular expressions: a powerful, widely supported pattern-matching language used to search, manipulate and validate text. The concept originates with Stephen Kleene, an American mathematician, who described regular languages in the 1950s. Regex offers a concise and flexible way to describe complex search patterns, which makes it useful for processing text, extracting data and manipulating strings.
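
To make the three uses mentioned above (search, manipulate, validate) a bit more concrete, here is a minimal sketch using Python's built-in re module. The sample text and variable names are made up for illustration, and other languages offer similar functions under different names.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2024-05-01"

# Search: find the first run of one or more digits.
first_number = re.search(r"[0-9]+", text).group()

# Manipulate: replace every run of digits with a placeholder.
masked = re.sub(r"[0-9]+", "#", text)

# Validate: check that an entire string has the shape YYYY-MM-DD.
# This only checks the shape, not whether the date really exists.
is_date = re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", "2024-05-01") is not None

print(first_number)  # 66
print(masked)        # Order # shipped on #-#-#
print(is_date)       # True
```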

Regex does this through pattern matching: you define specific patterns in text using special characters known as metacharacters, each of which has a predefined meaning. You can also specify character classes; for example, [0-9] matches any digit and [A-Za-z] matches any uppercase or lowercase ASCII letter. Quantifiers such as +, * and ? indicate how many times a character or group of characters should occur. Parentheses let you group and capture parts of the pattern, for example to select the portion of text between certain delimiters; these are called capture groups. Anchor characters let you specify whether the pattern should appear at the beginning or the end of a line. Another important concept is the escape character: certain characters have special meanings in regex, such as the period, which by default matches any character except a newline. To match such characters literally, escape them with a backslash (\).
Regex is supported in various programming languages, text editors, and command-line tools, including Python, JavaScript, Bash and even data wrangling software such as Alteryx.
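
The concepts described above (character classes, quantifiers, capture groups, anchors and escaping) can be tried out in a few lines. The sketch below uses Python's re module; the sample line and variable names are invented for illustration, and the exact syntax can differ slightly between regex flavours.

```python
import re

# Hypothetical sample line, used only for illustration.
line = "Total: 3.50 GBP (ref AB-1234)"

# Character class + quantifier: [0-9]+ matches one or more digits.
digits = re.findall(r"[0-9]+", line)

# Escape character: \. matches a literal period rather than "any character".
price = re.search(r"[0-9]+\.[0-9]+", line).group()

# Capture group: the escaped \( and \) match literal parentheses,
# while the unescaped ( ) capture the text between them.
inside = re.search(r"\((.*?)\)", line).group(1)

# Anchors: ^ ties the pattern to the start, $ to the end.
starts_with_total = re.search(r"^Total", line) is not None
ends_with_digit = re.search(r"[0-9]$", line) is not None

# Optional quantifier: u? means "zero or one u".
both_spellings = all(re.fullmatch(r"colou?r", w) for w in ["color", "colour"])

print(digits)             # ['3', '50', '1234']
print(price)              # 3.50
print(inside)             # ref AB-1234
print(starts_with_total)  # True
print(ends_with_digit)    # False
print(both_spellings)     # True
```

Note that the line ends with a closing parenthesis, which is why the $ anchor check for a trailing digit comes back False.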

Personally, I find this a topic that can be difficult to grasp in theory alone; the more time you spend implementing and experimenting in practice, the more solid your grasp of the subject will be.

Check the cheat-sheet below to get an idea of what certain characters stand for in regex. Websites like regexone.com let you learn regex interactively, walking you through the entire toolset. You can also practise your pattern matching on regex101.com, where you enter a test string and a regex and see live which parts of the string match as you use different metacharacters, capture groups and so on.

Author:
Afnan Foyez