Regular Expressions in Alteryx

by Viktoria Von Laer

Did you ever needed to extract a specific string or number from a bigger string?

Regular expressions provides this possibility. Regular expressions (also called RegEx) mean a sequence of characters that specify a search pattern. This post gives a short introduction to regular expressions in Alteryx.

In Alteryx, you can use regular expressions in different tools. In the olive-green Parse section, choose RegEx. Besides that, you can use regular expressions the blue Preparation section, in the formula tool and the filter tool.

Regexes with the Parse Tool

In the settings of the parse regular expressions tool, you first need to select the field of the table, you want to apply the regular expression on. Under the section “Format to Convert”, you have to type in the actual regular expression. At the bottom of this box, there is a checkbox for case insensitivity. If the results of the regex are unexpected, it can be due to this checkbox.

In the last section “Output Method”, you can choose, which action you want to use.

Output Methods

There are four possible output methods: replace, tokenize, parse and match.

If you want to replace for example an underscore with a dash, you can use the replace method; in addition you have to specify the “Replacement Text”- the text/characters, you want to see instead of the original.

The tokenize method is useful, if you want to split a column into different columns or rows. If you want to extract several hashtags out of one string ( and you don’t know, how many hashtags there are actually ), use the tokenize method. If you use the “Split to Columns”-option, you have to select the numbers of columns- this is why you should choose the “To Rows”-option, if you are not absolutely sure about the matches in your string.

The parse method extracts the wanted part of the string and outputs it automatically into a new column. For making the parse method work, you have to put the regular expression into brackets as a capture group ( see below ), otherwise Alteryx will throw an error.

The Matching method just returns a new column with True, if the regular expression matched and with False, if not.

Regexes with Formula and Filter Tool

In the formula and in the filter tool, you can choose between three different regex options. They are accessible over the formula sign (fx) under the section “String”. The options are REGEX_CountMatches, REGEX_Match, REGEX_Replace; the usage is slightly different from the usage in the Regex tool and slightly more complicated.

Regular Expressions

Regular expressions are a sequence of characters, that describes specific groups of characters. There is a collection of characters to describe different groups of characters. Some of the most important and mostly used are “[a-z]” or “\l” for small case letters, “[A-Z]” or “\u” for upper case letters, “\d” for digits, “\w” for digits, letters ( lower and upper case and underscore ); the “\D” excludes all digit and the “\W” excludes everything, the “\w” includes; the “\s” means space and the “\S” means no space. The most powerful character is the “.”; it’s the wildcard, which describes any single character.

The above mentioned characters are so called qualifiers; they qualify the searched character/string.

To extend the abilities of regular expressions, there exist quantifiers besides those qualifiers. Quantifiers- as the name speaks- give the possibility, to specify, how often a single character has to be in the searched string. The “*” means, the character of the regular expression has to occur zero times or more, the “+” means, it has to occur one time or more. Both of these quantifiers are greedy- they return as much characters in their result as possible. For a non-greedy search, one can use the “?”, what means zero or one occurrence of the searched character. A number in curly brackets “{2}” means, the searched character has to exactly occur as much times, as written in the curly brackets.

Besides that, you can specify the needed occurrence with “{x,y}”, what means a range between x and y or “{x,}”, what means the minimum occurrence.

It is possible, to combine the different qualifiers and quantifiers to exactly describe, what is needed.

With brackets (), it is possible to define capture groups. Capture groups provide the possibility, to treat multiple characters as a single unit.