What is RegEx?
RegEx (Regular Expression) is a sequence of characters that defines a search pattern for text. The characters can be letters, numbers, symbols, punctuation, Unicode characters, etc. It is used for matching, extracting, and replacing text.
For example, in RegEx a single . matches any character except a newline.
Notice how everything except for the new line in the middle of the two sentences is highlighted.
data:image/s3,"s3://crabby-images/44d0c/44d0c86cf21b583624389a24e30061b7d4d10855" alt=""
\w finds any word character: a letter, a digit, or an underscore
\d finds digits
\s finds whitespace (spaces, tabs, and line breaks)
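The four patterns above can be tried out in a few lines of Python. This is only a minimal sketch using Python's built-in re module; the sample text is made up for illustration, and other RegEx tools may differ in small details.

```python
import re

# Made-up sample: two sentences separated by a newline
text = "Room 42 is open.\nCall 555 0199."

# . matches every character except the newline
dots = re.findall(r".", text)

# \w matches letters, digits and the underscore
word_chars = re.findall(r"\w", text)

# \d matches digits only
digits = re.findall(r"\d", text)

# \s matches whitespace, including the newline between the sentences
spaces = re.findall(r"\s", text)

print(len(text) - len(dots))  # how many characters . skipped
print("".join(word_chars))
print("".join(digits))
print(spaces)
```

The only character that . skips is the newline, and that same newline is picked up by \s.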
Why is it useful?
RegEx is used throughout the software world, largely by programmers and analysts. It can solve a wide range of problems and will become your best friend if you are working with any of the following:
- free input text fields
- data validation
- inconsistent data
- web scraping
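To illustrate the data validation case from the list above, here is a small Python sketch. The rule itself (two capital letters followed by four digits) and the function name is_valid_id are hypothetical, chosen only for the example.

```python
import re

# Hypothetical rule: an ID is exactly two capital letters followed by four digits
ID_PATTERN = re.compile(r"[A-Z]{2}\d{4}")

def is_valid_id(value: str) -> bool:
    # fullmatch succeeds only if the entire string fits the pattern
    return ID_PATTERN.fullmatch(value) is not None

# Made-up entries, as they might arrive from a free input text field
entries = ["AB1234", "ab1234", "AB12345", " AB1234"]
valid = [entry for entry in entries if is_valid_id(entry)]
print(valid)
```

Only the first entry passes: the others fail on lowercase letters, an extra digit, and a leading space.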
It also comes in handy when you need to extract information from a text field, such as email addresses, postcodes, dates in various formats, names, and ID numbers.
Here’s an example:
data:image/s3,"s3://crabby-images/9903d/9903d6eb72166360b85a00c07bbd01ea6e175587" alt=""
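The syntax \w+@\w+.\w+ successfully matched the email address above. It reads: find one or more word characters followed by @, followed by one or more word characters, any single character, and one or more word characters. Note that an unescaped . matches any character, not just a full stop. To require a literal dot, escape it with a backslash: \w+@\w+\.\w+.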
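Here is a rough Python sketch of the email pattern, comparing the unescaped dot with the escaped one. The address and the variable names loose and strict are made up for illustration, and neither pattern is meant as a complete email validator.

```python
import re

# Made-up sentence containing a made-up address
text = "Please contact jane_doe@example.com for details."

loose = re.compile(r"\w+@\w+.\w+")    # . matches any character
strict = re.compile(r"\w+@\w+\.\w+")  # \. matches a literal dot only

loose_hit = loose.search(text).group()
strict_hit = strict.search(text).group()
print(loose_hit)
print(strict_hit)

# The difference shows up on text that is not an email address at all
not_an_email = "user@host name"
loose_false_hit = loose.search(not_an_email)    # matches, the . accepts the space
strict_false_hit = strict.search(not_an_email)  # None, there is no literal dot
print(loose_false_hit is not None, strict_false_hit is not None)
```

Both patterns find the address in the sentence, but only the unescaped version also accepts "user@host name".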
To find a postcode, I included the \d pattern, which finds digits.
\w+\d+\s\d\w+ reads: find one or more word characters followed by one or more digits, then a whitespace character, a single digit, and one or more word characters.
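The postcode pattern can be tested the same way in Python. The postcode below is invented and the sketch assumes a simple "letters, digits, space, digit, letters" layout, so real postcodes with a different shape may not match.

```python
import re

POSTCODE = re.compile(r"\w+\d+\s\d\w+")

# Made-up sentence with an invented postcode
text = "Deliver to AB12 3CD by Friday."

match = POSTCODE.search(text)
postcode = match.group() if match else None
print(postcode)
```

Because \w also matches digits, the engine has to backtrack so that \w+ takes "AB1" and \d+ takes "2" before the space is reached.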
Note: Overusing RegEx is a great way to make your co-workers very angry with you. Do not use it when a simpler alternative solves the problem without complex syntax (e.g. a text field that can be split into two columns based on a common delimiter).
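As a quick illustration of that note, here is a Python sketch that splits a field on a common delimiter with and without RegEx. The value "Smith, Jane" and the variable names are made up for the example.

```python
import re

# Made-up value with ", " as the common delimiter
record = "Smith, Jane"

# Plain string method: nothing to decipher
surname, first_name = record.split(", ")

# RegEx version: same result, but harder to read at a glance
match = re.fullmatch(r"(\w+),\s(\w+)", record)
regex_surname, regex_first_name = match.groups()

print(surname, first_name)
print(regex_surname, regex_first_name)
```

Both approaches return the same two values, so the simple split is the friendlier choice here.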
How to learn it?
As described by Craig Dewar from The Data School Australia: ‘Learning Regex is a bit like learning a second language – except you don’t speak it – you just think it.’
Luckily, there are numerous resources online, so you do not need to learn the RegEx syntax by heart. You can find a full reference ‘cheatsheet’ listing all of it on www.regexr.com.
This great website also allows you to write and test your RegEx. Another one is www.regexone.com which provides bite-size lessons and practice examples.
If you have any questions about RegEx, or would like to chat about it in general, do not hesitate to get in touch on Twitter @nataliatamiteva.