# Regression

This category groups posts on regression models. Each post focuses on a specific algorithm, with an emphasis on understanding how these models actually work at a technical level. Here you will learn to build regression models in Python from scratch.

## Implement the KNN Algorithm in Python from Scratch

What is the KNN Algorithm? K Nearest Neighbours (KNN) is a supervised machine learning algorithm that makes predictions based on the ‘closest’ training data points to our point of interest in data space. We evaluate the closest data points through the use of a distance metric, of …
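To make the idea of "closest training points under a distance metric" concrete, here is a minimal sketch of KNN used for regression. It assumes a Euclidean distance and a plain average over the `k` nearest targets; the function name `knn_predict` and the toy data are hypothetical and are not taken from the post itself.

```python
import math


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a target for x_query as the mean target of its k nearest neighbours."""
    # Pair each training target with the Euclidean distance to the query point.
    distances = [
        (math.dist(x, x_query), y) for x, y in zip(X_train, y_train)
    ]
    # Sort by distance and keep the k closest training points.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Regression prediction: the average of the neighbours' targets.
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical one-dimensional toy data, for illustration only.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]
knn_prediction = knn_predict(X_train, y_train, [1.2], k=3)
```

Other distance metrics (Manhattan, for example) or a distance-weighted average could be swapped in without changing the overall structure.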

## Implement Gradient Boosting Regression in Python from Scratch

Motivation for Gradient Boosting Regression in Python. In the previous post, we covered how Gradient Boosting works, and outlined the general algorithm for this ensemble technique. Gradient Boosting was initially developed by Friedman (2001), and the general algorithm is referred to as Algorithm 1: Gradient_Boost in that paper. Furthermore, …

## Understanding the Gradient Boosting Regressor Algorithm

Motivation: Why Gradient Boosting Regressors? The Gradient Boosting Regressor is another variant of the boosting ensemble technique that was introduced in a previous article. Development of gradient boosting followed that of Adaboost. In an effort to explain how Adaboost works, it was noted that the boosting procedure can be …

## A Complete Introduction to Cross Validation in Machine Learning

What is Cross Validation? A natural question to ask when building any predictive model is: how good are the predictions? Having a clear, quantitative measure of the expected model performance is a key element of any machine learning project. Cross validation is a family of …

## Understanding the Adaboost Regression Algorithm

Motivation: What is an Adaboost Regressor? In this post, we will describe the Adaboost regression algorithm. We will start with the basic assumptions and mathematical foundations of this algorithm, and work straight through to an implementation in Python from scratch. Adaboost stands for “Adaptive Boosting”, and this was the first boosting technique …

## Introduction to Simple Boosting Regression in Python

Motivation: Why Boosting? This post is an introduction to simple boosting regression in Python. Boosting is a popular ensemble technique, and forms the basis of many of the most effective machine learning algorithms used in industry. For example, the XGBoost package routinely produces superior results in competitions …

## Build a Random Forest in Python from Scratch

Motivation to Build a Random Forest Model. For this post, we will build a Random Forest in Python from scratch. I will include examples in classification and regression. Bagging ensembles are an approach to reduce variance, and thereby increase model performance. In this algorithm, multiple weak learner models produce predictions …

## Coefficient of Determination

Introduction. The Coefficient of Determination is a metric for evaluating the goodness of fit for a linear regression model. This quantity is often defined as:

$$R^2 = 1 - \frac{SSE}{SST} \tag{1}$$

where $SSE$ is the sum of squared errors, and $SST$ is the sum of squared total variance. Let’s define these, along with the sum of squared regression …
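As a rough illustration of the definition above, the coefficient of determination can be computed from the sum of squared errors and the total sum of squares. This is a minimal sketch using the standard definitions of those two sums; the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (sum of squared errors / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared errors: squared residuals between true values and predictions.
    sse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the true values from their mean.
    sst = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - sse / sst


# Hypothetical sample values, for illustration only.
r2_true = [1.0, 2.0, 3.0, 4.0]
r2_pred = [1.1, 1.9, 3.2, 3.8]
r2_value = r_squared(r2_true, r2_pred)
```

A perfect fit gives a value of 1, while a model that always predicts the mean of the true values gives 0. The sketch does not guard against the degenerate case where all true values are identical and the total sum of squares is zero.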

## Mean Squared Error

Introduction. In this post we’ll cover the Mean Squared Error (MSE), arguably one of the most popular error metrics for regression analysis. The MSE is expressed as:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2 \tag{1}$$

where $\hat{y}_i$ are the model outputs and $y_i$ are the true values. The summation is performed over the $N$ individual data points available in our sample. The advantage …
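The MSE described above, the average of the squared differences between model outputs and true values, can be sketched in a few lines. The function name `mean_squared_error` and the sample values are hypothetical and chosen only for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and true values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sample values, for illustration only.
mse_true = [1.0, 2.0, 3.0, 4.0]
mse_pred = [1.1, 1.9, 3.2, 3.8]
mse_value = mean_squared_error(mse_true, mse_pred)
```

Because each error is squared, larger deviations contribute disproportionately to the result, and the metric is expressed in squared units of the target.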

## Mean Absolute Error

Introduction. With any machine learning project, it is essential to measure the performance of the model. What we need is a metric to quantify the prediction error in a way that is easily understandable to an audience without a strong technical background. For regression problems, the Mean Absolute Error (MAE) is just such …
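The excerpt stops before stating the formula, so the sketch below assumes the standard definition of the MAE: the average of the absolute differences between predictions and true values. The function name `mean_absolute_error` and the sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between predictions and true values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    absolute_errors = [abs(yp - yt) for yt, yp in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical sample values, for illustration only.
mae_true = [1.0, 2.0, 3.0, 4.0]
mae_pred = [1.1, 1.9, 3.2, 3.8]
mae_value = mean_absolute_error(mae_true, mae_pred)
```

The result is in the same units as the target variable, which is part of what makes it easy to explain: here the predictions are off by about 0.15 on average.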