Below shows the order of operation for SQL. I will go through each of these with an example so you can see what the results are from doing this. This is in Snowflake.
But first what is SQL?
SQL is (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. Like a lot of coding languages it has a order that you have to follow for this communication to work- this is what this blog is focusing on.
What Can SQL do?
We have to have a select and from in order for these operations to work. Below is a picture of the order:
This is basically saying you need to SELECT which columns you wish to look at FROM a certain table WHERE you have a condition for non aggregated data. If data is aggregated then we GROUP BY all the columns that are not aggregated. HAVING allows for conditions on these aggregate functions and finally if we want to ORDER BY ascending or descending.
Now lets consider each operation ( as mentioned above I will have to do select and From within one operation for it to work). I will be working in snowflake for this.
To run just one query we end it with ;
SELECT- this is to select the column you are interested in such as "Region" or " Sales", etc.
FROM- is the table we want to get it from- for this blog we will be focusing on just one table which is Superstore_Orders.
Example: Select all product names
As shown below this is what it looks like in snowflake:
when you run this, this is what is shown in the results pane:
Example: if the question said Select ALL the records from Superstore_Orders
You can use a * which will bring in all fields.
as shown below this brings back every column:
The other main select you will use is SELECT DISTINCT this will bring back the unique records in this field.
Example: Return a unique list of product names
Once again if we consider the results we can see that the row count has gone down from 9,994 to 1,849. This suggests that there are multiple product names that are the same.
WHERE- this is if you want to set a condition for example we want the region to be "Central" or "Sales">20000.
RULE this is a condition for non aggregated filters and dimensions. What this means is that we can look at sales here but we cant do sum of sales here- this can be done in HAVING.
Example: Get all columns and records in the Central region
Below shows once again how it is written in snowflake ( I like to write it in different lines because it makes it clear of the operations order).
The results we can see shows all the regions to be central this is because that is what we have set the condition to:
GROUP BY- this is often used if you are trying to aggregate something for example SUM("Sales") we would group by everything that is not being aggregated so for instance region and state. These are the different types of aggregation:
Example: Find the average profit by Sub-category where the region is Central
For this one I have used // to comment throughout to show what I meant by my original statement for group by and I have used ^^ to indicate the comment is for the line(s) above:
This is the following results. ( The column profit is avg profit but we had changed the alias to profit):
HAVING- this is a condition tool that is used on aggregate functions. So the difference being is if it has SUM or AVG or MAX, etc. in the select statement we can create a condition for sum of sales or avg profit ratio. This goes after group by because if it does not know what it is grouped by then there will be an error- this is quite like the summarize tool in Alteryx with the grouping and then the aggregation of what your interested in.
Example: What are the most profitable products in the Central Region that have an average discount of zero?
(I am trying to keep similar examples so that when I add a new operations order in the order is still clear). Below is how it looks like in snowflake whilst being commented- you don't have to comment as much as this but its trying to keep the order clear.
The results from this is:
ORDER BY- finally the last main operation is order by this allows you to sort your query whether you want it ascending or descending, whether that is alphabetical or highest to lowest.
Example: What are the most profitable products in the Central Region that have an average discount of zero? From largest profit to lowest.
This is the same as the example above just adding order by to the orders of operation. The results show the profit going from largest to smallest.
(There are other functions within SQL that can be added to this operations but these are the basic ones you will initially learn). I will do a second blog looking at some of these other functions that are really useful. This will look at
- IN
- AND
- OR
- BETWEEN
- LIKE
- TOP and LIMIT
- COUNT
- JOINS: inner, left, right and full outer
- IS NULL and IS NOT NULL