Regex, short for regular expressions, is a sequence of characters that specifies a match pattern in a string. In Alteryx, Regex can be used to replace, split, parse, and match text using two tools: the Formula tool and the Regex tool. In this blog, we'll mainly explore the Regex tool and the basics of creating a Regex syntax to parse texts.
1. What is Regex and when would you need it?
Regex is a tool that you can use to specify patterns within texts. Let's say you have a field of email addresses, and you need to extract the domain of that email. Because email addresses follow a pattern, you can tell Alteryx to look for a certain pattern within the email addresses to parse out the domain.
2. Configuring the Regex Tool
First, we'll explore the configuration of the Regex tool.
The Regex tool in Alteryx has four main configurations.
- Column to Parse: In this field, you'll identify the field you want to manipulate using the Regex tool.
- Format to Convert: This is where you'll specify the string pattern you are looking for. You can click on the '+' button for a detailed list of expressions you can use to build your pattern.
- Output Method: Here, you'll specify how Alteryx should handle the pattern. There are four options: Replace, Tokenize, Parse, and Match. In addition, you can rename and set the data type of the new column. In this blog, we'll focus on the Tokenize and Parse options. In this blog, we'll focus on parsing strings.
- Replacement Text: This is where you would specify the replacement text if you if you selected Replace in the Output Method.
3. Writing a Regex
There are two strategies you should follow when creating a Regex.
- Be vague enough to not exclude your desired match but specific enough to not capture irrelevant strings
- Build up expressions in an iterative process
Let's put these strategies to the test by writing a Regex for an example scenario where you'll have to parse strings from a field.
In this scenario, we have to extract the age of each individual from the 'Approximate Age' column.
- Connect the data to the Regex tool
- Set the 'Column to Parse' and the 'Output Method' configurations
3. Write the expression. To parse out a digit, you need to use the \d expression, which represents a single digit. However, because there are multiple digits, you want to add a '+' sign to let the tool know that there may be one or more digits. Finally, in order to specify the text to parse, you need to capture the pattern in parentheses . Every pattern enclosed in parentheses will be parsed and captured in separate columns.
The final expression should look like this: (\d+).
The final output should look like this.