There are various frameworks available to help anyone interested in coding a Neural Network. Popular options include TensorFlow and PyTorch. Despite the availability of such tools, it is still very useful to work through the exercise of building a Neural Network from the ground up. In this way, we can gain a deeper understanding of, and intuition for, how these powerful algorithms work. In this post, we’ll work through the implementation of a Neural Network regressor from scratch. Our sole dependency will be NumPy, a popular Python library for numerical computation.
Model Description
At a high level, the type of Neural Network we will build here is a Multilayer Perceptron Regressor. Another common name for this type of model is a Feedforward Neural Network Regressor. I described the technical details of the exact architecture we will work through here in my previous post on how these algorithms learn. Please refer to that article if you’re interested in the mathematical details of the implementation.
Pictorially, the architecture of the model we will build is as follows:
Figure 1: Schematic of the 3-layer Neural Network Regressor to be built. Input, hidden, and output layers are highlighted, and the relevant activation functions are written within each neuron.
The densely connected Neural Network illustrated in Figure 1 consists of 3 layers: an input, a hidden, and an output layer. The raw input predictor features (x1, x2, x3) are clamped to the input layer, which distributes these features to the hidden layer. The hidden layer consists of 5 neurons with sigmoid activation functions. The final output layer is made up of 2 neurons with linear activation functions. The choice of linear activations in the output layer is necessary for regression Neural Networks, since the targets can take on any real value.
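To make this concrete, the full forward computation for the 3-5-2 architecture can be written compactly as follows. The notation below is my own shorthand (it is not taken from Figure 1): W^{(1)}, \vec{b}^{(1)} are the hidden-layer weights and biases, W^{(2)}, \vec{b}^{(2)} are the output-layer weights and biases, and \sigma is the sigmoid function applied element-wise:

\vec{a}^{(1)} = \sigma\left( W^{(1)} \vec{x} + \vec{b}^{(1)} \right), \qquad W^{(1)} \in \mathbb{R}^{5 \times 3}

\vec{y}^{pred} = W^{(2)} \vec{a}^{(1)} + \vec{b}^{(2)}, \qquad W^{(2)} \in \mathbb{R}^{2 \times 5}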
At a high level, the learning procedure will follow this sequence of steps:
- For each epoch e = 1, ..., epochs:
  - Do a Forward Pass
  - Do a Backward Pass
  - Update the model weights
For the sake of simplicity, the Backward Pass will make use of backpropagation with batch gradient descent. As such, we will only need to iterate through our training procedure for the number of epochs desired (as all the training data will be applied when updating the weights at each iteration).
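For reference, the batch gradient descent update we will implement takes the following general form, where \eta is the learning rate, N is the number of training samples, and W^{(l)}, \vec{b}^{(l)} are the weights and biases of layer l (the notation here is my own shorthand):

W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{1}{N} \sum_{i=1}^{N} \frac{\partial L_i}{\partial W^{(l)}}, \qquad \vec{b}^{(l)} \leftarrow \vec{b}^{(l)} - \eta \, \frac{1}{N} \sum_{i=1}^{N} \frac{\partial L_i}{\partial \vec{b}^{(l)}}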
To generate predictions we will need to do a single Forward Pass, with a trained model, on a set of input predictor features.
Implementation of a Multilayer Perceptron Regressor
We can now proceed to implement our Neural Network Regressor. As mentioned previously, if you’re interested in the technical details for how this algorithm is structured mathematically, please see my previous article on this subject.
To start, let’s import all the packages required:
# imports
from typing import List, Tuple
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt
I will encapsulate the implementation within a single class, MLP352Regressor. Let’s now set up the initialiser and destructor functions:
class MLP352Regressor(object):
    """
    Class to encapsulate a Multilayer Perceptron / Feedforward Neural Network regressor
    """

    def __init__(self, lr: float=1e-2, epochs: int=100) -> None:
        """
        Initialiser function for a class instance
        Inputs:
            lr     -> learning rate
            epochs -> number of epochs to use during training
        """
        self.lr = lr
        self.epochs = epochs
        # network architecture: 3 input features, 5 hidden neurons, 2 output targets
        self.layers = [3, 5, 2]
        self.weights = []
        self.biases = []
        self.loss = []

    def __del__(self) -> None:
        """
        Destructor function for a class instance
        """
        del self.lr
        del self.epochs
        del self.layers
        del self.weights
        del self.biases
        del self.loss
The loss I will use is the squared-error loss. We can implement this, along with its derivative with respect to the predictions \vec{y}^{pred}:
    def _loss(self, y_true: np.array, y_pred: np.array) -> np.array:
        """
        Function to compute the squared-error loss per sample
        Inputs:
            y_true -> numpy array of true labels
            y_pred -> numpy array of prediction values
        Output:
            loss value
        """
        return 0.5*(y_true - y_pred)**2

    def _derivative_loss(self, y_true: np.array, y_pred: np.array) -> np.array:
        """
        Function to compute the derivative of the squared-error loss per sample
        Inputs:
            y_true -> numpy array of true labels
            y_pred -> numpy array of prediction values
        Output:
            derivative of the loss value
        """
        return -(y_true - y_pred)
Both sigmoid and linear activation functions will be used in this class. Let’s define these, along with their derivatives \frac{df}{dz}:
    def _sigmoid(self, z: np.array) -> np.array:
        """
        Function to compute sigmoid activation function
        Input:
            z -> input dot product w*x + b
        Output:
            determined activation
        """
        return 1/(1+np.exp(-z))

    def _derivative_sigmoid(self, z: np.array) -> np.array:
        """
        Function to compute the derivative of the sigmoid activation function
        Input:
            z -> input dot product w*x + b
        Output:
            determined derivative of activation
        """
        return self._sigmoid(z)*(1 - self._sigmoid(z))

    def _linear(self, z: np.array) -> np.array:
        """
        Function to compute linear activation function
        Input:
            z -> input dot product w*x + b
        Output:
            determined activation
        """
        return z

    def _derivative_linear(self, z: np.array) -> np.array:
        """
        Function to compute the derivative of the linear activation function
        Input:
            z -> input dot product w*x + b
        Output:
            determined derivative of activation
        """
        return np.ones(z.shape)
We can now put together all the code needed for a single Forward Pass through the network:
    def _forward_pass(self, X: np.array) -> Tuple[List[np.array], List[np.array]]:
        """
        Function to perform forward pass through the network
        Input:
            X -> numpy array of input predictive features with assumed shape [number_features, number_samples]
        Output:
            list of activations & derivatives for each layer
        """
        # record from input layer
        input_to_layer = np.copy(X)
        activations = [input_to_layer]
        derivatives = [np.zeros(X.shape)]
        # hidden layer
        z_i = np.matmul(self.weights[0], input_to_layer) + self.biases[0]
        input_to_layer = self._sigmoid(z_i)
        activations.append(input_to_layer)
        derivatives.append(self._derivative_sigmoid(z_i))
        # output layer
        z_i = np.matmul(self.weights[1], input_to_layer) + self.biases[1]
        input_to_layer = self._linear(z_i)
        activations.append(input_to_layer)
        derivatives.append(self._derivative_linear(z_i))
        # return results
        return (activations, derivatives)
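To build some intuition for the array shapes flowing through this function, here is a small standalone sketch that mirrors the computation above with made-up random weights of the same shapes the class will hold (the variable names here are illustrative only, not part of the class):

# illustrative sketch of the forward-pass shapes (not part of the class)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 4))                       # [number_features, number_samples]
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=(5, 1))
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=(2, 1))
a1 = 1/(1 + np.exp(-(np.matmul(W1, X_demo) + b1)))     # hidden activations, shape (5, 4)
y_demo = np.matmul(W2, a1) + b2                        # linear outputs, shape (2, 4)
print(a1.shape, y_demo.shape)                          # (5, 4) (2, 4)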
Similarly, the logic required for a single Backward Pass can also be placed in its own function:
    def _backward_pass(self,
                       activations: List[np.array],
                       derivatives: List[np.array],
                       y: np.array) -> Tuple[List[np.array], List[np.array]]:
        """
        Function to perform backward pass through the network
        Inputs:
            activations -> list of activations from each layer in the network
            derivatives -> list of derivatives from each layer in the network
            y           -> numpy array of target values with assumed shape [output dimension, number_samples]
        Output:
            lists of numpy arrays containing the derivatives of the loss function wrt layer weights and biases
        """
        # record loss
        self.loss.append((1/y.shape[1])*np.sum(self._loss(y, activations[-1])))
        # output layer
        dl_dy2 = self._derivative_loss(y, activations[2])
        dl_dz2 = np.multiply(dl_dy2, derivatives[2])
        dl_dw2 = (1/y.shape[1])*np.matmul(dl_dz2, activations[1].T)
        dl_db2 = (1/y.shape[1])*np.sum(dl_dz2, axis=1)
        # hidden layer
        dl_dy1 = np.matmul(self.weights[1].T, dl_dz2)
        dl_dz1 = np.multiply(dl_dy1, derivatives[1])
        dl_dw1 = (1/y.shape[1])*np.matmul(dl_dz1, activations[0].T)
        dl_db1 = (1/y.shape[1])*np.sum(dl_dz1, axis=1)
        # return derivatives
        return ([dl_dw1, dl_dw2], [dl_db1, dl_db2])
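For reference, the code above is a direct translation of the backpropagation equations for this architecture, which in my own shorthand (with N the number of samples, \odot denoting element-wise multiplication, and superscripts indexing the layers) read:

\delta^{(2)} = -\left( \vec{y} - \vec{y}^{pred} \right) \odot \vec{1}, \qquad \frac{\partial L}{\partial W^{(2)}} = \frac{1}{N} \, \delta^{(2)} \left( a^{(1)} \right)^{T}, \qquad \frac{\partial L}{\partial \vec{b}^{(2)}} = \frac{1}{N} \sum_{samples} \delta^{(2)}

\delta^{(1)} = \left( W^{(2)} \right)^{T} \delta^{(2)} \odot \sigma'\left( z^{(1)} \right), \qquad \frac{\partial L}{\partial W^{(1)}} = \frac{1}{N} \, \delta^{(1)} \left( a^{(0)} \right)^{T}, \qquad \frac{\partial L}{\partial \vec{b}^{(1)}} = \frac{1}{N} \sum_{samples} \delta^{(1)}

where a^{(0)} is simply the matrix of input features.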
Let’s write one last protected function to take care of updating the model weights and biases:
    def _update_weights(self, dl_dw: List[np.array], dl_db: List[np.array]) -> None:
        """
        Function to apply the update rule to the model weights and biases
        Inputs:
            dl_dw -> list of numpy arrays containing loss derivatives wrt weights
            dl_db -> list of numpy arrays containing loss derivatives wrt biases
        """
        self.weights[0] -= self.lr*dl_dw[0]
        self.weights[1] -= self.lr*dl_dw[1]
        self.biases[0] -= self.lr*dl_db[0].reshape(-1,1)
        self.biases[1] -= self.lr*dl_db[1].reshape(-1,1)
We now have all the protected member functions required for this class. Let’s now write the two public functions that a developer will interact with, fit and predict:
    def fit(self, X: np.array, y: np.array) -> None:
        """
        Function to train a class instance
        Inputs:
            X -> numpy array of input predictive features with assumed shape [number_samples, number_features]
            y -> numpy array of target values with assumed shape [number_samples, output dimension]
        """
        # initialise the model parameters
        self.weights.clear()
        self.biases.clear()
        self.loss.clear()
        for idx in range(len(self.layers)-1):
            self.weights.append(np.random.randn(self.layers[idx+1], self.layers[idx]) * 0.1)
            self.biases.append(np.random.randn(self.layers[idx+1], 1) * 0.1)
        # loop through each epoch
        for _ in range(self.epochs):
            # do forward pass through the network
            activations, derivatives = self._forward_pass(X.T)
            # do backward pass through the network
            dl_dw, dl_db = self._backward_pass(activations, derivatives, y.T)
            # update weights
            self._update_weights(dl_dw, dl_db)

    def predict(self, X: np.array) -> np.array:
        """
        Function to produce predictions from a trained class instance
        Input:
            X -> numpy array of input predictive features with assumed shape [number_samples, number_features]
        Output:
            numpy array of model predictions
        """
        # do forward pass through the network
        activations, _ = self._forward_pass(X.T)
        # return predictions
        return activations[2].T
Create a Toy Dataset
Let’s produce a dataset by making use of scikit-learn’s make_regression, with 10,000 samples, 3 predictive input features, and targets with a dimension of 2. Some minimal Gaussian noise is also added:
# generate data
X, y = make_regression(n_samples=10000, n_features=3, n_targets=2, noise=1, random_state=42)
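As a quick optional sanity check (not part of the original workflow, but it confirms what make_regression returns with these arguments), we can print the shapes of the generated arrays:

# confirm the shapes of the generated data
print(X.shape)  # (10000, 3)
print(y.shape)  # (10000, 2)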
We can produce some box plots to get a sense of the distribution in these data:
plt.boxplot(X)
plt.title("Distribution of Input Predictive Features")
plt.xlabel("feature")
plt.ylabel("input unit")
plt.show()
Figure 2: Distribution of input predictive features. All three features are approximately zero-centered, and range mostly between (-3,+3). Some outliers are present.
plt.boxplot(y)
plt.title("Distribution of Targets")
plt.xlabel("target")
plt.ylabel("output unit")
plt.show()
Figure 3: Distribution of output targets. The two targets are approximately zero-centered, and range mostly between (-300,+300). Some outliers are present.
Finally, we can do a train-test split. We’ll allocate 80% of the data for training, and the remainder will be left for testing:
# perform train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Test our Multilayer Perceptron Regressor
We now have everything in place to test out our Neural Network! Let’s create an instance of our implemented class, and train it on the training data. I’ll use a learning rate of 0.01, and we’ll train for 1500 epochs:
# train the regressor
model = MLP352Regressor(lr=0.01, epochs=1500)
model.fit(X_train,y_train)
As a sanity check, let’s view the recorded loss as a function of epochs:
# plot the training loss
plt.plot(model.loss)
plt.title("Training Loss")
plt.xlabel("epochs")
plt.ylabel("loss")
plt.show()
Figure 4: Training loss as a function of epochs.
Plotting the training loss shows a smooth decline in the loss value as each subsequent epoch is executed. This highlights that our training procedure is working as expected. Now let’s evaluate how well our regressor performs in making predictions:
# generate predictions
y_pred = model.predict(X_test)
# evaluate model performance
print(f"mse: {mean_squared_error(y_test, y_pred):.2f} and mae: {mean_absolute_error(y_test, y_pred):.2f}")
mse: 664.84 and mae: 16.53
y_test
array([[ 96.0652615 , 105.2753081 ], [ 70.01592472, 169.33894413], [ 136.81237065, 165.87554143], ..., [ -76.98996266, -113.78059034], [ 206.96193285, 256.91509498], [ -39.808533 , -9.71756154]])
y_pred
array([[ 97.76151633, 101.12347451], [ 70.88374713, 99.54522216], [ 136.31562322, 126.56560826], ..., [ -85.5870685 , -100.21340972], [ 169.65014163, 141.53248125], [ -46.55959464, -9.32616964]])
Looking at these numbers, it’s evident that while most of the visible predictions are fairly reasonable, there are some cases where the deviation between prediction and truth is significant. This isn’t necessarily surprising, as Figures 2 & 3 show the presence of outliers in both the input predictive features and the targets. It is plausible that the presence of these outliers in the training data has degraded the overall performance of our regressor.
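If you want to probe this further, a minimal sketch along the following lines (reusing the y_test and y_pred arrays from above) will surface the test samples with the largest deviations:

# per-sample mean absolute error across both targets
per_sample_mae = np.mean(np.abs(y_test - y_pred), axis=1)
# indices of the five worst-predicted test samples
worst_idx = np.argsort(per_sample_mae)[-5:]
print(y_test[worst_idx])
print(y_pred[worst_idx])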
Final Remarks
In this post we have covered:
- The general architecture of a Multilayer Perceptron Regressor.
- How to implement a Multilayer Perceptron Regressor from scratch just using numpy.
- How to test our implementation to confirm that it works.
I hope you enjoyed this article, and gained some value from it. If you would like to take a closer look at the code presented here, please visit my GitHub. If you have any questions or suggestions, please feel free to add a comment below. Your input is greatly appreciated.