Data Cube - An introduction to the concept

In layman's terms, a data cube is a multi-dimensional data structure - so, each value within a data cube is identified by one specific value from each dimension.

It doesn't always live in 3 dimensions even though it is called a 'cube' - it can be 2-dimensional in the form of a rectangle, or 3D or even 4D (which is a little hard to visualize).

As a concept, it is a little hard to comprehend without visualizing it through a use case or an example, so let's dive right in.

Let's take a company that sells technology. They sell 4 items - mobiles, TVs, laptops and headphones. Below we have the 2022 sales data in a one-dimensional array:

The company, however, sells these products across four cities in the UK - London, Manchester, Leicester and Birmingham. So we need to split this table further by product and city, resulting in:

Now we can calculate the sum down each column for products and across each row for cities. Great. We can add another column and/or row labelled 'ALL' to hold these totals.
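To make the 'ALL' row and column concrete, here is a minimal Python sketch of the product-by-city table. The figures are made up for illustration (only the £4,000 for mobiles in Leicester matches the yearly total worked out later in this post), and the variable names are my own.

```python
# Hypothetical 2022 sales in GBP: rows are cities, columns are products.
products = ["Mobiles", "TVs", "Laptops", "Headphones"]
cities = ["London", "Manchester", "Leicester", "Birmingham"]
sales = [
    [9000, 7000, 8000, 3000],  # London
    [6000, 5000, 5500, 2000],  # Manchester
    [4000, 3000, 3500, 1500],  # Leicester
    [5000, 4500, 4000, 1800],  # Birmingham
]

# 'ALL' column: one total per city (sum across each row).
city_totals = [sum(row) for row in sales]

# 'ALL' row: one total per product (sum down each column).
product_totals = [sum(column) for column in zip(*sales)]

# The cell where the 'ALL' row and 'ALL' column meet.
grand_total = sum(city_totals)

print(dict(zip(cities, city_totals)))
print(dict(zip(products, product_totals)))
print(grand_total)
```

Whichever way you add it up, by rows or by columns, the grand total comes out the same, which is a handy sanity check on the table.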

But... what if we wanted to split this revenue by date? The easiest option would be one table per quarter (4 tables) or one table per month (12 tables). Simple. However, this is where data cubes come into play. We can create a third dimension called 'Date' by stacking all 4 quarterly tables together:
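As a rough sketch of what 'stacking' means, the snippet below builds a cube as a list of quarterly tables in plain Python. The numbers are invented, and for brevity I assume every quarter has identical sales (only the £1,000 per quarter for mobiles in Leicester comes from the example in this post).

```python
products = ["Mobiles", "TVs", "Laptops", "Headphones"]
cities = ["London", "Manchester", "Leicester", "Birmingham"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# Hypothetical sales for one quarter in GBP: rows are cities, columns are products.
quarter_table = [
    [2250, 1750, 2000, 750],  # London
    [1500, 1250, 1375, 500],  # Manchester
    [1000, 750, 875, 375],    # Leicester
    [1250, 1125, 1000, 450],  # Birmingham
]

# Stack one table per quarter to form the third dimension, 'Date'.
# Rows are copied so that each quarter can be edited independently.
cube = [[row[:] for row in quarter_table] for _ in quarters]

# A single cell is now addressed by one value from each dimension.
q = quarters.index("Q3")
c = cities.index("Leicester")
p = products.index("Mobiles")
print(cube[q][c][p])  # 1000
```

The cube here is indexed as date, then city, then product. The order is a design choice; what matters is that each cell needs one value from every dimension to be located.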

There is now one more dimension we can sum over. As an example, we'll calculate the sales for a full year of mobile purchases in Leicester:

£1,000 + £1,000 + £1,000 + £1,000 = £4,000

Similar to the 'ALL' we would've created for Products and Cities, we can do this for Date as well. In this way, the 'ALL' values of all 3 data cube dimensions meet in a single data cell, which stores the total sales across every product, city and date for the full year.
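The Leicester sum and the 'ALL' cell can be sketched in a few lines of Python. As before, the cube values are made up and every quarter is assumed to be identical for brevity; only the £1,000 per quarter for mobiles in Leicester is taken from the example above.

```python
products = ["Mobiles", "TVs", "Laptops", "Headphones"]
cities = ["London", "Manchester", "Leicester", "Birmingham"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# Hypothetical cube in GBP, indexed as cube[date][city][product].
quarter_table = [
    [2250, 1750, 2000, 750],  # London
    [1500, 1250, 1375, 500],  # Manchester
    [1000, 750, 875, 375],    # Leicester
    [1250, 1125, 1000, 450],  # Birmingham
]
cube = [[row[:] for row in quarter_table] for _ in quarters]

c = cities.index("Leicester")
p = products.index("Mobiles")

# Sum over the Date dimension for one city and one product.
leicester_mobiles = sum(table[c][p] for table in cube)
print(leicester_mobiles)  # 4000

# 'ALL' for Date: collapse the four quarters into one city-by-product table.
all_dates = [
    [sum(table[i][j] for table in cube) for j in range(len(products))]
    for i in range(len(cities))
]

# The cell where 'ALL' meets on all 3 dimensions: every product, city and quarter.
grand_total = sum(value for table in cube for row in table for value in row)
print(grand_total)  # 72800
```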

Author:
Sherina Mahtani
Powered by The Information Lab
© 2024 The Information Lab