Cleansing and Organizing Data in Tableau Prep

by Serena Purslow

My first Friday at the Data School meant my first presentation. I had 3.5 hours to create a presentation on Cleansing and Organizing Data in Tableau Prep, with the objective of teaching the topic to a Tableau Prep newbie (i.e. myself the day before yesterday).

The presentation covered how to cleanse and organize data using different workflow steps in Tableau Prep. Below are my slides and notes.

What is Tableau Prep?

  • A Tableau tool designed to help you prepare your data for use in things like data visualization and analysis.
  • Like Tableau Desktop, it is designed to make data tasks such as cleaning, shaping and combining data simple and intuitive.
  • It allows you to clean data much more quickly than in an application such as Excel.

What can you do with Tableau Prep?

Two of the most important and useful things Tableau Prep can do are:

  • Cleansing data
  • Organizing data

Cleansing Data:

  • When data is brought in via an Excel file, it is often ‘dirty’.
  • This means the data is not yet suitable for analysis.
  • There may be duplicated rows, misspelt variable names, blank spaces and generally poor formatting.

Tableau Prep allows you to clean data and rectify these issues so that you can start analysis on a squeaky clean dataset.
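
To make those issues a bit more concrete, here is a rough Python sketch of two of the fixes (trimming blank spaces and dropping duplicated rows) done by hand. This is only an illustration of the idea, not how Tableau Prep works internally; the `clean_rows` helper and the sample rows are made up for this example.

```python
def clean_rows(rows):
    """Trim stray whitespace and drop exact duplicate rows, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip leading/trailing spaces from every text value in the row.
        tidy = tuple(v.strip() if isinstance(v, str) else v for v in row)
        # Skip a row if an identical (already tidied) row was seen before.
        if tidy in seen:
            continue
        seen.add(tidy)
        cleaned.append(tidy)
    return cleaned


# Hypothetical 'dirty' rows: extra spaces and a duplicate.
dirty = [
    ("  London ", "Online"),
    ("London", "Online"),
    ("Leeds", " In store"),
]

print(clean_rows(dirty))
```

Note that the second row only counts as a duplicate once the spaces have been trimmed, which is why the order of cleaning steps can matter.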

Organizing Data:

  • Raw data may not be organized in the way that is most useful for getting insights out of it.
  • Tableau Prep allows you to organize your data to make it more usable.
  • This may look like renaming fields to be more meaningful, grouping values, or changing dates to suit your needs, based on what outcome you want from your data.
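
As a small illustration of what 'grouping values' means, here is a minimal Python sketch that maps several spellings of the same thing onto one group. It is an analogy only, not Tableau Prep's own mechanism; the `GROUPS` mapping and the `group_value` helper are hypothetical names I made up for the example.

```python
# Hypothetical mapping from raw values (lower-cased) to the group they belong to.
GROUPS = {
    "in store": "In store",
    "in-store": "In store",
    "instore": "In store",
    "online": "Online",
    "web": "Online",
}


def group_value(raw):
    """Return the group for a raw value, or the trimmed value if no group matches."""
    key = raw.strip().lower()
    return GROUPS.get(key, raw.strip())


for raw in [" In-Store ", "Web", "Phone"]:
    print(repr(raw), "->", repr(group_value(raw)))
```

Values without a matching group are left as they are (apart from trimming), so nothing is silently thrown away.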

Navigating Tableau Prep

A - Connections Pane

  • This is where you connect to your data sources.

B - Flow Pane

  • This pane allows you to display your data prep steps as you go along. You can build your workflow here.

C - Profile Pane

  • This pane shows you the structure of your data as you go along. You can represent this in different ways depending on what you want to check.
  • This is also where you can find many of your data cleaning options, move fields around, rename them, and use automatic processes implemented by Tableau.

D - Data Grid

  • This gives you a snapshot of your data at row level - good for visualizing the changes you have made to your data.

E - Changes Pane

  • This pane allows you to track the changes you have made to your data, and edit, remove or add steps.
  • If you make a mistake, this is the place to check.
  • A workflow is basically a bunch of steps you have taken to get your data from point A (dirty data) to point B (clean data).
  • It is made up of different workflow steps.
  • Once you have created a workflow, you can automate the steps you have taken.
  • This means that if you get a weekly report that is ALWAYS dirty but has the same structure, you can load it into Tableau Prep and run it through the workflow you created months ago, saving you time and energy.
  • Tableau Prep allows you to clean your data, and you can do this by adding a workflow step.
  • It is a good idea to label your steps as you go along, allowing you to keep track of your work, and also helping others understand what steps you have taken.
  • For this data cleaning workflow step, I wanted to change the date values from regular dates to ‘days of the month’.
  • Here you can see the data pre-cleaning - dates are recorded in the format dd/mm/yyyy.
  • Tableau has in-built functions allowing you to clean your data.
  • Following these steps allowed me to change Date to ‘Days of the Month’.
  • Here you can see the cleaned field - Date is now showing the date values as days of the month.
  • You can add as many workflow steps as you need as you build your workflow.
  • It can be useful to add a new workflow step for each cleaning process, allowing you to annotate your work as you go.
  • As you can see here, I have built my workflow using different cleaning steps, to take my data from A to B.
  • An example of organizing data is renaming fields.
  • Here I have added a workflow step in which I want to rename two fields.
  • Double-clicking on the field name in Tableau Prep allows you to rename the field.
  • Here I have renamed my chosen field 'Store'.
  • I renamed another field 'In store or Online'.
  • Using these workflow steps to clean and organize data, I took my data from this dirty dataset...
  • ...to this lovely clean one!
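
The idea of a reusable workflow can also be sketched in a few lines of Python. This is an analogy, not something Tableau Prep generates: each function stands in for one labelled workflow step, and the whole thing can be re-run on next week's report as long as the structure is the same. The original field names `str_nm` and `channel` are hypothetical (I have only shown the new names, 'Store' and 'In store or Online'), and so are the sample row and the `run_workflow` helper.

```python
from datetime import datetime

# Hypothetical original field names mapped to the new, more meaningful names.
RENAMES = {"str_nm": "Store", "channel": "In store or Online"}


def day_of_month(date_text):
    """Step 1: turn a dd/mm/yyyy string into its day of the month."""
    return datetime.strptime(date_text, "%d/%m/%Y").day


def rename_fields(row):
    """Step 2: rename fields, leaving any field not listed in RENAMES alone."""
    return {RENAMES.get(name, name): value for name, value in row.items()}


def run_workflow(rows):
    """Apply the steps in order to every row and return the cleaned rows."""
    cleaned = []
    for row in rows:
        step1 = dict(row)  # copy, so the input rows are not modified
        step1["Date"] = day_of_month(row["Date"])
        cleaned.append(rename_fields(step1))
    return cleaned


# Hypothetical weekly report with the same structure each week.
weekly_report = [{"Date": "05/03/2021", "str_nm": "Leeds", "channel": "Online"}]

print(run_workflow(weekly_report))
```

Keeping each step as its own small, named function plays the same role as labelling your steps in the Flow Pane: it is easier to see what happened and to change one step later.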

Here are a few pointers to keep in mind when using Tableau Prep.