Intro to Tableau Prep

As part of Learning What the Data School Learns, I prepared a walkthrough of a Tableau Prep workflow which covers the following steps:

  1. Union
  2. String, Date, and Number manipulation
  3. Join
  4. Aggregate

The scenario is that you, a Tableau Prep user, have been given a task from the National Sales Director. The task is to collate regional sales data and calculate the region with the highest sales. Additionally, they want the data table to be anonymised using customer ID instead of customer name.

UNION

The first step is to union the data from all four regions into one table. In order to union the data correctly the field names must match and thankfully for this dataset they do. Once unioned, Tableau Prep will add an additional field called Table Names which can be removed.

DATE MANIPULATION

There are two date fields in the dataset, Ship Date and Order Date. Order Date however is in a different format to Ship Date.

By changing the data type of Order Date from String to Date, Tableau Prep recognises the parts of the string that is a date and will parse this data through. We can then duplicate Order Date and convert the date into a weekday, month, or quarter to analyse seasonal trends.

STRING MANIPULATION

As mentioned before, Customer Name is needed to obtain Customer ID from a separate lookup table. To do this we must clean the First Name and Last Name fields and then combine them together to create a new field called Customer Name.

To clean these fields the following steps are carried out:

  • Remove Punctuation
  • Remove Numbers
  • Make Titlecase
  • Trim Spaces

We then combine the fields together using a calculated field with the following formula:

[First Name] + " "+ [Last Name]

Another field that we want to clean is the Ship Mode field as we can see below there are a few spelling mistakes.

To fix this we use another powerful Tableau Prep tool, Group By, to group values together based on spelling. This will correctly group the values we want but Tableau Prep is not quite smart enough to decide which value in the group is the correct one. It assumes that the value that appears the most is the correct one which is not always the case. Regardless, we can rename the group to 'First Class'.

NUMBER MANIPULATION

Once again, one of the tasks given by the National Sales Director was to calculate regional sales. To do this we need to calculate sales at the row level using the Unit Price, Quantity and Discount fields in the calculate field shown below.

JOIN

Next we are going to use Customers Table and join it onto our current dataset to add Customer ID as a field. There are two things to consider when doing a join, the join clause and the join type. The join clause is where you identify the fields that are present in both tables that can be matched. The join type decides whether you want to keep all or some matches. The inner join type means only exact matches are joined together and returned.

As Customer ID is now added to the dataset we can remove any Customer Name fields and output this table.

AGGREGATE

To calculate regional sales we want to add an aggregate step to summarise our dataset. When aggregating we must decide what category/field we are grouping by and what calculation we want to run for each value in this field. As we are looking at the total sales for each region we would set up the aggregation as below.

And to finally answer our question, the region with the most sales is the West region. The workflow showcases some of the common use cases of cleaning data with Tableau Prep but there is so much more you can do with this powerful tool including splitting columns, adding new rows and pivoting data to name a few.

Author:
Joseph Darton
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab