Catalyzing Business Insights | Leveraging Alteryx Cloud's Machine Learning

by Calvin Gao

Recently, I attended an Alteryx Cloud Training where Alteryx Trainers demonstrated the Machine Learning Product. Here is everything I learned:

Functions of Alteryx Machine Learning

Alteryx Machine Learning helps uncover insights in data and builds optimized machine learning models through functionalities such as education mode, powerful models, and explainable AI.

Specifically, Alteryx can help with:

1. Analyzing and preparing data
2. Making models
3. Feature engineering and feature selection
4. Model tuning

Features of Alteryx Machine Learning

  • Education mode helps the user to learn about machine learning best practices, terminology and what to do next during every step of the process.
  • Explainable AI communicates methodology and outcomes throughout each step.
  • Users can build machine learning models using industry-standard algorithms in a cloud-based AutoML platform.
    • Methods available in Alteryx Machine Learning:
      • Classification for categorical prediction.
      • Regression for numerical prediction.
      • Time Series for trend/pattern prediction.

Stages of Using the Product

The 4 Stages of the Machine Learning Product (Image 1)

Step 1: Problem Setup

First, familiarize yourself with the business context, formulate a pertinent question, and then proceed to data understanding and preparation. This stage is where Alteryx Machine Learning requires some additional input from you.

Once the data is prepared, it needs to be uploaded to initiate the machine learning process. Alteryx will conduct parsing, which may take a few moments. Upon completing the initial parsing, Alteryx Machine Learning provides an opportunity to review and modify the data before advancing to the subsequent stage.

For instance, you may identify a variable as a unique identifier. In such cases, Alteryx enables you to designate it as an "ID column" (see Image 2).

Mark as ID (Image 2)

For other variables, the only option is typically to drop the column if it is not relevant to your analysis (see Image 3). Note, however, that if you drop a column and later realize you need it, you will have to re-upload the entire dataset.

Drop Column (Image 3)

Aside from the data profiling view, which presents the dataset comprehensively, Alteryx offers a feature to promptly identify outliers and eliminate rows containing them by selecting the "Outliers" option. This action generates a concise table displaying the outliers within the dataset (See Image 4).

Outliers (Image 4)
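
To give a feel for what an outlier check like this involves, here is a minimal sketch using the common interquartile range (IQR) rule. This is an assumption for illustration only: I don't know which rule Alteryx applies internally, and the `prices` values and the `iqr_outliers` helper are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (row_index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an illustrative assumption, not Alteryx's documented logic.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Hypothetical column with one obviously extreme value.
prices = [12.0, 13.5, 12.8, 14.1, 13.0, 12.5, 95.0, 13.7]

outliers = iqr_outliers(prices)
outlier_rows = {i for i, _ in outliers}

# Dropping the flagged rows mirrors the "eliminate rows" option described above.
cleaned = [v for i, v in enumerate(prices) if i not in outlier_rows]

print("Outliers:", outliers)
print("Rows kept:", len(cleaned))
```

The small table of `(row_index, value)` pairs plays the same role as the outlier table in the product: it lets you inspect flagged rows before deciding to remove them.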

There is also the option to access data insights, including a correlation matrix (See Image 5) and a chord diagram for quick visualization (See Image 6). Additionally, Alteryx enables you to adjust the data type of columns as needed (Image 7).

Correlation Matrix (Image 5)

Chord Diagram (Image 6)

Manage Columns (Image 7)
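
For readers who want to see what sits behind a correlation matrix, the sketch below computes pairwise Pearson correlations by hand. The column names and values are hypothetical, and Pearson is my assumption; the product may use a different correlation measure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Build a nested dict {column: {other_column: correlation}}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical dataset: three numeric columns of a small housing table.
data = {
    "size_sqm": [50, 65, 80, 95, 110],
    "rooms": [2, 3, 3, 4, 5],
    "age_years": [30, 22, 15, 9, 3],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values close to 1 or -1 indicate strongly related columns, which is what the darker cells in a correlation matrix view usually highlight.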

Next, you will need to choose a target column (see Image 8), which is the variable you aim to predict. Once you select it, Alteryx automatically recommends a suitable machine learning method (see Image 9).

Target Column (Image 8)
Machine Learning Method (Image 9)
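
The idea of matching a method to the target column can be sketched with a simple rule of thumb: categorical targets call for classification, numerical targets for regression. The `suggest_method` function below is purely hypothetical and is not how Alteryx makes its recommendation; it also ignores time series, which depends on having a date column.

```python
def suggest_method(target_values):
    """Hypothetical heuristic: pick a method from the target column's values.

    Not Alteryx's actual logic; only an illustration of the
    categorical-vs-numerical distinction.
    """
    distinct = set(target_values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    # Numeric targets with more than two distinct values are treated as continuous.
    if numeric and len(distinct) > 2:
        return "regression"
    return "classification"

print(suggest_method(["yes", "no", "yes"]))       # text labels
print(suggest_method([10.5, 20.1, 13.3, 8.8]))    # continuous numbers
print(suggest_method([0, 1, 1, 0]))               # binary flag
```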

Step 2: Auto Model

This stage covers selecting modeling techniques and tuning their hyperparameters. Each time you move forward or backward a step, the education mode popup appears on the right side of the page (if education mode is enabled). I found this feature quite helpful as a refresher on machine learning concepts (see Image 10).

During this stage, Alteryx iterates through the available models, which are determined by the machine learning method chosen in the previous step. It ranks these models according to the specified ranking metric (see Image 11). In this instance, Alteryx recommends the Elastic Net Regressor, as it achieves the highest score on the ranking metric (R-squared). Additionally, Alteryx provides options for feature engineering (see Image 12) and advanced model settings (see Image 13).

Education Mode Popups (Image 10)
Leaderboard (Image 11)
Feature Engineering (Image 12)
Advanced Model Settings (Image 13)
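
The leaderboard idea (fit several candidates, score each on held-out data, sort by the ranking metric) can be sketched in a few lines. Everything here is an assumption for illustration: the two toy candidates, the data, and the helper names are mine, and the real product searches far more models and settings.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def fit_mean(x, y):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(y) / len(y)
    return lambda value: mean_y

def fit_line(x, y):
    """Simple least-squares line on a single feature."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value

# Hypothetical training and held-out data.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
test_x, test_y = [7, 8, 9], [14.1, 15.8, 18.2]

candidates = {"Mean baseline": fit_mean, "Linear regression": fit_line}

leaderboard = []
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    score = r2_score(test_y, [model(v) for v in test_x])
    leaderboard.append((name, score))

# Highest R-squared first, as on a leaderboard.
leaderboard.sort(key=lambda entry: entry[1], reverse=True)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(rank, name, round(score, 3))
```

Swapping the ranking metric only means replacing `r2_score` and, for error-type metrics where lower is better, reversing the sort order.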

Step 3: Evaluate Model

After finalizing all settings and configurations for the selected model, Alteryx will assess the model's performance using the 20% of data that it reserved from the original dataset (the 80/20 method). This evaluation provides general information on the model's effectiveness (see Image 14). 

The 80/20 method is a common holdout (train/test) split used in machine learning model evaluation. It involves splitting a dataset into two portions: 80% for training the model and 20% for testing its performance. This approach helps to ensure that the model is trained on a sufficiently large dataset and tested on unseen data to evaluate its predictive abilities. Alteryx Machine Learning automatically splits the dataset at the start of the machine learning process. However, if you wish to hold out, for example, 30% of the dataset, you can adjust this through the Advanced Model Settings in the Auto Model step (see Image 13).
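
As a rough illustration of the holdout idea, the sketch below shuffles the rows and sets aside a fraction for testing. The `holdout_split` helper, its `test_fraction` and `seed` parameters, and the random shuffling are my assumptions; I don't know exactly how Alteryx selects its holdout rows.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and return (train_rows, test_rows).

    Illustrative only; the product's own splitting strategy may differ.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for a dataset of roughly 300 rows.
rows = list(range(300))

train, test = holdout_split(rows)                          # default 80/20
train30, test30 = holdout_split(rows, test_fraction=0.3)   # 70/30 holdout

print(len(train), len(test))
print(len(train30), len(test30))
```

With 300 rows, the default split leaves 240 rows for training and 60 for testing; a 30% holdout leaves 210 and 90.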

Evaluate Model (Image 14)

Evaluate Model Advanced Insights (Image 15)
Evaluate Model Simulations (Image 16)

Step 4: Export and Predict

In this final stage, Alteryx lets you download the results as a PowerPoint presentation or as a zip file containing all the images. The model can also be exported as a YXMD file for Designer or as a Python script. If new data becomes available, you can upload it for prediction.

Export and Predict (Image 17)

Some Personal Thoughts

Based on my experience in the data science field, the conventional workflow typically entails selecting a model guided by our understanding of the data or the specific problem at hand. This is often followed by manual coding in R or other statistical software to evaluate the model's performance, focusing on metrics like the R-squared value.

One significant advantage of Alteryx Machine Learning is its ability to run a variety of models with just a few clicks, provided that the data is properly prepared. This streamlined process eliminates the need to manually code and run multiple models. Alteryx also makes it easy to adjust models, metrics, and hyperparameters. Notably, Alteryx Machine Learning includes a version control feature known as Sessions, which makes it easy to track and revert to previous model iterations if needed.

It's worth mentioning that the dataset used in this example comprises approximately 300 rows. I have not yet had the opportunity to assess the computational demands of handling larger datasets.