SQL and Snowflake #5: LODs, ranks and CTEs

When learning a new skill, it is always important to practice, practice and practice! Thus, I have been using the Preppin' Data Challenges to hone my skills in SQL and Snowflake. These are weekly exercises to help you learn and develop data preparation skills. The challenges are designed for learning Tableau Prep but they can be completed in R, Python, SQL, DBT, EasyMorph and many other tools.

To start, I will be completing challenge 2023- Week 5. This challenge involves comparing the bank transactions to target figures. The link to the challenge is: https://preppindata.blogspot.com/2023/02/2023-week-5-dsb-ranking.html

To start, I brought the data in to inspect it, using the select * to bring in all columns from the tables.

The main table needs to be at the Bank and Month level, but before that we need to do some small transformations.

We create the bank code by splitting out off the letters from the Transaction code. We use the split_apart function to take keep the letters after the first '-', I called this field 'Bank'.
We change transaction date to the just be the month of the transaction. First, we format the transformation_date field (which is a string) to a date field, which we then turned it into a month using the monthname function.
We total up the transaction values using sum(value)
Finally, we rank each bank for their value of transactions each month against the other banks. 1st is the highest value of transactions, 3rd the lowest.

This is how we structure a rank in SQL

The next part of the challenge requires us to add on two extra columns that are at different levels of aggregations, without losing all of the other data fields. We can either use subqueries or CTEs join these columns, I decided to go with CTEs due to their readability.

I put the first part in a CTE using the label t1.

The first field we need is the average rank a bank has across all of the months, call this field 'Avg Rank per Bank'. Using a second CTE (t2), we reference the first table t1 and aggregate the data to find the avg_rank_per_bank.

Finally, we need to find the average transaction value per rank, call this field. Using a third CTE (t3), we reference the first table t1 and aggregate the data to find the avg_trans_per_rank.

Finally, we need to select all the field that we need (with the correct lable/alias). Then we need to perform two joins. First, connecting t1 and t2 on the bank field and the second connecting t1 and t3 on the bank rank.

This is the full query and table. Again, I feel the CTEs make the code easier to read and interpret.

And that's everything! This Preppin' Data Challenge has been useful to practice LODs, ranking and CTEs

Author:

Harvey Joyce

View Profile