What is Snowflake & why is it used?
- Snowflake offers a cloud-based data storage and analytics service
- Snowflake is an SaaS (software as a service) - which just means the software is licensed on a subscription basis and is centrally hosted. [Also known as; "on-demand software" and Web-based/Web-hosted software]
- As is it an SaaS - there is no hardware or software to install, configure or manage. All the maintenance, management, upgraded are handled by snowflake - Fully Serverless Model & No User Maintenance
- Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.
Things to note about Snowflake:
- Log in through a browser - making it easy to use different devices - Not required to download a software to log into snowflake
- Snowflake using SQL querying language
- With Snowflake, the script that you have written in your worksheets is automatically saved even if you change to another device - because everything is stored in the cloud.
- supports 3 cloud platforms – AWS (Amazon Web Services), Azure and GCP (Google Cloud Platform).
- No limits on storage capacity (super cheap)
- Priced per second of compute
- Data Sharing between account / clients
Key competitors of Snowflake:
- Databricks - All your data, analytics and AI on one lakehouse platform
- Google Big Query - BigQuery is a serverless, cost-effective and multicloud data warehouse designed to help you turn big data into valuable business insights.
- Amazon Redshift - Analyze all of your data with the fastest and most widely used cloud data warehouse
Data Terms:
- Data Warehouse - a collection of databases, storing lots of different types of data from various sources. This becomes an almost library of data sources. Each person can 'own' multiple warehouses.
- Database - an organised collection of tables that contain rows and columns of structured data.
- Data Lake - a repository where structured and unstructured data is stored before it is brought into a data warehouse. An almost 'dumping ground' of data - It can store data in its native format and process any variety of it, ignoring size limits.
ETL & ELT:
In ETL, the data transformation step happens before data is loaded into the target (e.g. a data warehouse). In ELT, data transformation is performed after the data is loaded into the target.
- Extraction - getting your data out of a source system. E.g. - CRM systems
- Transformation - blending, shaping and trimming your data - this can be done withing snowflake
- Load - load into your database or Tableau
Snowflake can use both ETL & ELT.