Join Me! - Working with Multiple Data Sources

Core Concepts of Data Preparation - Working with Multiple Data Sources

What is Data Preparation?

According to the Collins dictionary, it’s ‘the process of converting data or information into a form that can be read by a computer, so that the data can then be entered into a computer’. However, we take this a step further. Beyond just structuring data for computer processing, data preparation also involves analysing its structure to ensure fields contain relevant data in the correct format, ultimately enabling effective data analysis.

Data preparation isn't just about formatting; it's also a crucial "sense check" to ensure data is clean. This means it should be: up-to-date, have missing values identified and remedied, and follow these key rules of data structure:

  • One data field per category or measure
  • One data type for each data field
  • A single date column where possible
  • Each row should be a unique record containing all relevant values

... but remember, the specifics often depend on your data's intended use.

Understanding Joins: Combining Multiple Data Sources

A core part of data preparation, is the concept of Joins, used for combining multiple data sources. Joins help create a single, unified input - which is what most analytical tools expect to read from. This process ensures that the combined data sources have matching granularities, unnecessary data fields are removed, and related data fields are effectively connected. Essentially, joins combine columns from two tables, enriching your data by linking them based on matching field values.

There are two key elements to consider when performing a join:

  • Join Condition/ Clause: The core logic that defines how the two tables and the corresponding field(s) match. i.e. the clause will specify the field(s) that is the same in both.
  • Join Type: This determines how the tables are combined. Types include: Inner, Left (Outer), Right (Outer), Left Only, Right Only, Not Inner, and Full (Outer).

When combining two tables, the first step is to understand that one table will be designated as the "left" table and the other as the "right" table. Next, you'll identify the join condition or clause - the field(s) that match between the tables and on which you'd like to combine them. Finally, you'll combine the tables using the chosen join type. These types are best understood by visualising how the tables 'overlap'.

Inner Join: This join keeps only the records that have a match in both tables. Any records without a match in either table are discarded.

Inner Join

Left (Outer) Join: This join includes all values from the left table and any corresponding matches from the right table. If a record in the left table doesn't have a match in the right table, the fields from the right table will show as a null value.

Left (Outer) Join

Right (Outer) Join: This join is the opposite of a left join. It includes all values from the right table and any corresponding matches from the left table. If a record in the right table doesn't have a match in the left table, the fields from the left table will show as a null value.

Right (Outer) Join

Full (Outer) Join: This join includes all records from both the left and right tables. Where a match exists, the data is combined. Where there is no match in either table, the corresponding fields will be filled with null values.

Full (Outer) Join

Left Only Join: This join includes only the records from the left table that have no match in the right table. The fields from the right table will appear as null. This join is useful for finding unique records in one table that don't exist in another.

Left Only

Right Only Join: This join includes only the records from the right table that have no match in the left table. The fields from the left table will appear as null.

Right Only

Not Inner Join: This join includes all rows from both the left and right table that do not have a match in the other table. Essentially the opposite of an inner join.

Not Inner

Mastering joins is a critical skill for any data preparation. They allow us to combine and enrich our datasets, transforming raw data into a powerful tool for insight and analysis.

Author:
James Gastaldello
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab