Joins, Blends and Relationships: what is the difference?

You may have heard these terms when working with multiple datasets, but do you know the difference between them?

When two tables share a common column, a join condition can be defined on that column to combine the tables. For instance, the tables below can be joined.

This results in a new table that combines the original tables. It can be seen as a physical change to the dataset, which in this teacher's case is helpful!
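As a rough sketch of what a join does, the Python snippet below combines two small tables on a shared column. The tables, the column names (student_id, name, subject, score) and the helper inner_join are all invented for illustration and are not taken from the original example.

```python
# Hypothetical tables: names and values are invented for illustration.
students = [
    {"student_id": 1, "name": "Ana"},
    {"student_id": 2, "name": "Ben"},
]
scores = [
    {"student_id": 1, "subject": "Maths", "score": 70},
    {"student_id": 1, "subject": "Art", "score": 90},
    {"student_id": 2, "subject": "Maths", "score": 60},
]


def inner_join(left, right, key):
    """Return one combined row for every pair of rows that share `key`."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined


joined = inner_join(students, scores, "student_id")
for row in joined:
    print(row)
```

The result is a single new table with one row per score, so each student's details are repeated for every subject they have a score in.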

An alternative is to create a relationship, which joins the tables only when it is required. This allows an aggregation step, with grouping, to happen before the join, so the granularity of each individual record is not lost. You keep each table's level of detail without manipulating the data in the way a join may. An easy way to think about this is that it happens on an outer layer (termed the logical layer) that sits above your physical tables.
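To see why aggregating before joining matters, here is a minimal Python sketch. It is only an illustration of the idea, not how any particular tool implements relationships, and the tables and the books_issued column are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables: one row per student, and one row per score.
students = [
    {"student_id": 1, "name": "Ana", "books_issued": 2},
    {"student_id": 2, "name": "Ben", "books_issued": 1},
]
scores = [
    {"student_id": 1, "score": 70},
    {"student_id": 1, "score": 90},
    {"student_id": 2, "score": 60},
]

# Join first: Ana's student row is repeated once per score row.
joined_first = [
    {**s, **sc}
    for s in students
    for sc in scores
    if s["student_id"] == sc["student_id"]
]
books_join_first = sum(row["books_issued"] for row in joined_first)

# Aggregate first: reduce scores to one row per student, then combine.
totals = defaultdict(int)
for sc in scores:
    totals[sc["student_id"]] += sc["score"]
aggregated_first = [
    {**s, "total_score": totals[s["student_id"]]} for s in students
]
books_aggregate_first = sum(row["books_issued"] for row in aggregated_first)

print(books_join_first)       # 2 + 2 + 1 = 5 (Ana is counted twice)
print(books_aggregate_first)  # 2 + 1 = 3
```

Joining first inflates the student-level total because rows are duplicated, while aggregating the scores first keeps one row per student and the total stays correct.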

Lastly, a blend joins the data tables at the front end, after all the steps encoded in your analysis have run. This may be computationally more expensive to process.
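A blend can be pictured roughly as follows: each source is summarised on its own, and the summaries are only matched up at the very end, at the level shown in the view. This Python sketch assumes a primary and a secondary source linked on a hypothetical class field; the data and the summarise helper are invented for illustration.

```python
from collections import defaultdict

# Primary source (hypothetical): one row per student score.
scores = [
    {"class": "7A", "score": 70},
    {"class": "7A", "score": 90},
    {"class": "7B", "score": 60},
]
# Secondary source (hypothetical): one row per recorded absence.
absences = [
    {"class": "7A", "days": 2},
    {"class": "7C", "days": 1},
]


def summarise(rows, key, measure):
    """Sum `measure` for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


# Each source is summarised separately, at the level shown in the view.
score_by_class = summarise(scores, "class", "score")
days_by_class = summarise(absences, "class", "days")

# The summaries are matched only at the end. The primary source decides
# which classes appear, so 7C (present only in the secondary) is left out.
view = [
    {"class": c, "total_score": total, "days_absent": days_by_class.get(c)}
    for c, total in score_by_class.items()
]
for row in view:
    print(row)
```

Because the matching happens after both sources have been processed, the underlying tables are never merged into one.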

Remember, there is no perfect way to prepare your data; it all depends on the view you need.

Author:
Numa Begum
Powered by The Information Lab
© 2024 The Information Lab