Before jumping into how to configure the linear regression tool in Alteryx, I will explain the basic idea of the model and the assumptions we make about the data when we use it.
In simple terms, a simple linear regression uses the best-fitting straight line to describe the relationship between one independent variable (x-axis) and one dependent variable (y-axis). The model finds the 'best' line by minimizing the errors between the actual values of the dependent variable and the values predicted by the line. These errors are called residuals, and the usual criterion (ordinary least squares) minimizes the sum of their squares. Each residual is the difference between the actual value and the predicted value (the value on the line).
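To make the idea of a 'best' line concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function names and the five data points are made up for illustration; this is not how the Alteryx tool is implemented internally, only the textbook closed-form solution.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def compute_residuals(x, y, slope, intercept):
    """Residual = actual value minus the value predicted by the line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Made-up illustrative values (not taken from any real dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear_regression(x, y)
residuals = compute_residuals(x, y, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

When the line includes an intercept, the least-squares residuals always sum to zero (up to rounding), which is a quick sanity check on any fit.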
With that basic understanding, we need to be aware of the assumptions we make when we use a linear regression model:
- a linear relationship exists between the dependent variable and the independent variable(s)
- the residuals are normally distributed, or approximately so
- the independent variables are not highly correlated with one another (this only applies to multiple linear regression)
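As a rough numeric companion to the residual assumption above, the sketch below computes the sample skewness and excess kurtosis of a list of residuals; both are close to 0 for normally distributed data. The function name and the example residuals are hypothetical, and this is only a crude screen, not a formal normality test. A histogram or Q-Q plot of the residuals is usually more informative.

```python
from statistics import mean

def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # variance
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a normal distribution gives 0
    return skew, excess_kurtosis

# Made-up residuals, symmetric around zero, for illustration only
example_residuals = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.4]

skew, excess_kurtosis = skewness_and_excess_kurtosis(example_residuals)
print(f"skewness={skew:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

Values far from 0 (for example a strongly positive skewness) suggest the residuals are not approximately normal and that the model may need a transformation or a different form.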
How do we configure the linear regression tool in Alteryx?
NOTE: The linear regression tool is not included with Alteryx by default; you have to install RInstaller from the downloads page first.
In this example we are going to explore whether we can use the petal width to predict the petal length in the Iris dataset using a simple linear regression. In the tool's configuration, enter a name for the model under "model name", choose the dependent variable (petal length) under "select the target variable", and select the independent variable(s) (here, petal width) by checking the boxes under "select the predictor variables". Put three browse tools on the outputs and hit run to see the results!
NOTE: If you are doing multiple linear regression, remember to check the correlation between the independent variables using the Association Analysis tool in Alteryx!
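Outside Alteryx, the same correlation check can be sketched in a few lines of Python using the Pearson correlation coefficient. The predictor values below are made up, and the 0.8 cut-off is a common rule of thumb chosen here as an assumption, not a setting taken from Alteryx.

```python
from math import sqrt
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)

# Two hypothetical predictor columns (illustrative values only)
predictor_a = [5.0, 6.0, 7.0, 8.0]
predictor_b = [0.2, 1.3, 1.9, 2.4]

HIGH_CORRELATION_THRESHOLD = 0.8  # assumed rule of thumb, adjust as needed

r = pearson_r(predictor_a, predictor_b)
is_highly_correlated = abs(r) > HIGH_CORRELATION_THRESHOLD
print(f"r={r:.3f}, highly correlated: {is_highly_correlated}")
```

If two predictors turn out to be highly correlated, a common remedy is to keep only one of them in the model.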