Cleaning data is an important part of preparing your data for working with it at any stage. By cleaning your data you are ensuring that the data is the correct quality. When preparing for analysis we want to spend extra care within the cleaning phase because there is always the potential of having data quality issues (look out for a future post about potential data quality issues).
Before diving into the Clean Step of Tableau Prep, I'd like to share why it's one of my favorite steps. IT IS EVERYWHERE! The beauty of the Clean Step is your ability to do certain operations in a variety of other steps throughout Tableau Prep. This means if you forget to do it at the beginning of your process you can most likely do it in another step.
For this breakdown, I will be using the Sample Superstore provided by Tableau. To begin working with Tableau Prep, you must first import a data source. This can be done by dragging and dropping into Tableau Prep or by clicking the + icon next to connections and selecting the data source you want to import (which can be seen below).
Once the data is imported you can then drag in one table to begin working with your data. I have chosen to work within the Orders table as shown below.
Now that we have added the Clean Step (labeled as Clean 1), we can begin diving into the steps' functionalities. The Clean Step has 10 options to help clean your data. For this, I will be looking into Filter and Clean.
Filter
Within Filter, you can perform the following:
- Calculations...: This will open a dialog box, that then allows you to enter in calculations, confirm that it is valid, and then save.
- Selected Values: Here you can pick which values you want to keep or exclude.
- Wildcard Match: A wildcard match will help you filter field values to keep or exclude values that match a specific pattern. There are 4 matching options: contains, starts with, ends with, and exact match.
- Null Values: Here you can filter values to include null values or only show non-null values.
Clean
Within Clean, you can perform the following:
- Make Uppercase: This will make all letters uppercase no matter their placement.
- Make Lowercase: This will make all letters lowercase no matter their placement.
- Remove Letters: Here you are able to remove all letters within a column.
- Remove Number: Will remove all numbers within a column.
- Remove Punctuation: This will remove punctuations such as : , . ; -
- Trim Spaces: Removes leading and trailing spaces.
- Remove Extra Spaces: Removes extra spaces.
- Remove All Spaces: This will remove all spaces even intentional ones.