DIY Gradient Descending Linear Regression

aarroonn
Published in Analytics Vidhya
8 min read · Feb 10, 2022
Why would you create a machine learning module that has probably been built ages ago, and built much better? Well, I am not sure about you, but I have my reasons, which I am going to share with those interested.

Image by 愚木混株 Cdd20 from Pixabay

First of all, happy new year (solar / lunar, your choice)! Second of all, I know I still owe a second part to the article about discipline in online societies, and I apologise for its immense delay — but some things are worth waiting for (not necessarily this piece though, but some things). Third of all:

Recently, I got involved in making Python modules. Not so much for the sake of efficiency — for almost any problem, there are already modules with better quality control, more efficient processing and more advanced features — but for the sake of learning new things. And by ‘things’, I believe we can distinguish three main areas.

  1. The first is learning about the concepts one would like to package. If the concept is new to you, obviously, there is plenty to learn. But even if the concept is already known, making it into a module requires a more rigorous and systematic engagement with the subject, especially when it is abstract. Which brings us to the second point.
  2. It is one thing to conceptualise something mentally / verbally, but it is quite another to conceptualise it as a piece of code. In doing so, one has to consider the relationships and applications of the real-world concept, and then consider how these relations and use cases can be translated to a programming language with its unique methods, entities and limitations, as well as its tools.
  3. And finally, learning how to create a module: what its contents, requirements, and nice-to-haves are, and testing how it works. Also, how a module could possibly be utilised (or abused) by other users — and documenting it to that end.

So, if this all sounds exciting, I am just going to elaborate on how I went about the above steps in creating my own linear regression module in Python.

The Concept

The linear regression module I created is based on gradient descent, which is apparently an unorthodox choice for this purpose, but quite an educational one. Of course, it was not my genius that came up with the idea: I read an article on Machine Learning Mastery that piqued my curiosity (so, huge-huge thank you, praises and all credit to Jason for creating such quality content all the time).

Frankly, gradient descent in detail was new to me. I was aware of the general idea, but not of how its underlying mechanism works — and was still quite content. In any case, the concept in an ordinary use case is quite simple. You need four things: a prediction, the actual value you are trying to predict, a penalty coefficient (the learning rate) and iteration.

What one does is calculate the error, that is, the difference between the prediction and the actual value based on some model parameters. Then, using this error, one updates the model weights — how much each parameter contributes to the outcome — by subtracting the error multiplied by the learning rate from the current parameters.

Or:

weight(n+1) = weight(n) - learning rate * error

Which is:

weight(n+1) = weight(n) - learning rate * (model(n)(X) - y)

Where weight(n) feeds into model(n) and then weight(n+1) would feed into model(n+1) — of course. In the case of linear regression, the parameters would be the intercept and the variable coefficients. The process of updating these weights would be repeated for each data point in the set — and a full circle of updates is called an epoch.
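To make the update rule concrete, here is a single hand-worked step as a small sketch, assuming the weights start at 0, the first data point is (x=1, y=1), and the learning rate is 0.01 (the same toy values used later in the article):

```python
# One update step of gradient descent for simple linear regression.
# Starting weights are 0, first data point is (x=1, y=1), alpha = 0.01.
w_b0, w_b1, alpha = 0.0, 0.0, 0.01
x, y = 1.0, 1.0

error = (w_b0 + w_b1 * x) - y      # prediction minus actual: -1.0
w_b0 = w_b0 - alpha * error        # intercept moves towards the data
w_b1 = w_b1 - alpha * error * x    # coefficient update is scaled by x
print(w_b0, w_b1)  # 0.01 0.01
```

Because the prediction (0) undershoots the actual value (1), the error is negative and both weights are nudged upwards.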

Now, this all sounds fine and dandy, but let’s actually implement it in Python.

Simple Linear Regression

As a warm up, let’s just consider what a simple linear regression algorithm would look like. That is, a model using only one independent predictor:

Model = Intercept + Coefficient * Variable

So, we will have two weights, one for the intercept and one for the coefficient. Because these are often represented as the Greek letter beta, I used w_b0 for the intercept and w_b1 for the coefficient, while alpha is the learning rate:

w_b0 = [0]
w_b1 = [0]
errs = []
alpha = 0.01
for n in range(len(data['X'])):
    error = (w_b0[-1] + w_b1[-1] * data['X'][n]) - data['y'][n]
    errs.append(error)
    w_b0.append(w_b0[-1] - alpha * error)
    w_b1.append(w_b1[-1] - alpha * error * data['X'][n])

Of course, one does not need to keep a list of previous weights or even errors, but it is interesting to see how those numbers change. If we let X and y be:

data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'y': [1, 3, 2, 3, 5, 4, 8, 8, 7, 9]}

And we then run our algorithm through the data four times, we get the following prediction:

Matplotlib rules the day
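The loop shown above only makes a single pass over the data; to run it "four times", it can be wrapped in an outer epoch loop. A self-contained sketch of that (same variable names as above):

```python
# Simple linear regression via gradient descent, run for 4 epochs
# over the toy data from the article.
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'y': [1, 3, 2, 3, 5, 4, 8, 8, 7, 9]}

w_b0 = [0]      # history of intercept weights
w_b1 = [0]      # history of coefficient weights
errs = []       # history of per-point errors
alpha = 0.01
epochs = 4

for _ in range(epochs):
    for n in range(len(data['X'])):
        error = (w_b0[-1] + w_b1[-1] * data['X'][n]) - data['y'][n]
        errs.append(error)
        w_b0.append(w_b0[-1] - alpha * error)
        w_b1.append(w_b1[-1] - alpha * error * data['X'][n])

# predictions from the final weights
preds = [w_b0[-1] + w_b1[-1] * x for x in data['X']]
```

After 4 epochs of 10 points each, there are 40 recorded errors and 41 weight values (including the initial 0).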

We can, of course, plot our errors too. Ideally, they would, in some rather quirky way, approach 0:

We can see that the graph is rather periodic. This is because we go through 10 different data points and calculate each of their errors independently in every cycle, so the pattern repeats every 10th step. Reassuringly, the errors appear to be settling around 0.

But I can tell that I already bore you so let’s take this thing to the next level…top floor!

Multiple Linear Regression

And other fun things. Well, what is different from simple linear regression here? Only that now we can have more than one predictor. So, our formula for the model would look like:

Model = Int + Coef(1) * Var(1) + Coef(2) * Var(2) + … + Coef(n) * Var(n)

The obvious problem here is that we might not necessarily know how many input variables we get. At this point I also thought the module should be made widely usable and so should take numpy arrays and matrices rather than lists. By this reasoning, instead of creating a ‘container’ for each variable’s weight, I went ahead and stored all values in an array. As the initial weight was 0, I utilised numpy’s zeros() function. I similarly used the empty() function to generate the container for my errors:

import numpy as np

w_b0 = 0
w_bn = np.zeros(X.shape[1])
errs = np.empty(0)

Where X is the input of predictors. Since it would be in a numpy.ndarray format, we can use its .shape attribute to get the appropriate length for our weights array. We also keep the intercept as w_b0.

All we need to do now is generalise the previous formula over an arbitrary number of variables — and in a numpy-compatible way:

for n in range(epochs):  # going through n epochs
    for m in range(y.shape[0]):  # for m data points
        ## we calculate the error based on current weights
        error = (
            w_b0 +
            sum([(w_bn[i] * X[m, i]) for i in range(X.shape[1])])
        ) - y[m]
        ## we add the current error score to the error list (errs)
        errs = np.append(errs, error)
        ## we update the intercept
        w_b0 = w_b0 - learning_rate * error
        ## for each predictor, we update the coefficient
        for x in range(X.shape[1]):
            w_bn[x] = w_bn[x] - learning_rate * error * X[m, x]

The sum() part of calculating the error is just the Coef(1) * Var(1) + … + Coef(n) * Var(n) part of the linear regression formula, written as a list comprehension. In the bottom line, we now need two indices for X, as it is a matrix of multiple different variables. So, for each y value, we iterate through all predictor variables in X.
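As a side note, that list-comprehension sum is exactly a dot product between the weight vector and one row of X, so numpy can compute it directly. A small sketch with toy values (not from the article's data) showing the two forms agree:

```python
import numpy as np

# Toy values: two predictors, one observation (row m = 0), target 3.0.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
w_bn = np.array([0.5, 0.25])
w_b0 = 1.0
m = 0
y_m = 3.0

# The loop-based error from the article's formulation...
error_loop = (w_b0 + sum(w_bn[i] * X[m, i] for i in range(X.shape[1]))) - y_m
# ...is identical to a dot product of the row with the weights.
error_dot = (w_b0 + np.dot(X[m], w_bn)) - y_m
```

The dot-product form is both shorter and considerably faster on large arrays, since the summation happens inside numpy rather than in a Python loop.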

And that really is the core of the module. The intercept and coefficients are saved as attributes of the model object and can be used for prediction as such:

np.array(
    [
        self.intercept +
        sum(
            [X[m, n] * self.coefficients[n] for n in range(X.shape[1])]
        )
        for m in range(X.shape[0])
    ]
)

Where the sum() statement, again, corresponds to Coef(1) * Var(1) + … + Coef(n) * Var(n), and the outermost list comprehension iterates through all the different observations.
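The nested comprehensions collapse into a single matrix-vector product: predictions = intercept + X @ coefficients. A sketch with stand-in values (intercept and coefficients here are just hypothetical numbers, not the fitted model's attributes):

```python
import numpy as np

# Three observations, two predictors.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
intercept = 0.5
coefficients = np.array([2.0, -1.0])

# The article's nested-comprehension prediction...
preds_loop = np.array(
    [intercept + sum(X[m, n] * coefficients[n] for n in range(X.shape[1]))
     for m in range(X.shape[0])]
)
# ...equals one vectorised matrix-vector product.
preds_vec = intercept + X @ coefficients
```

Both produce the same array; the vectorised form is the idiomatic numpy way to predict for all observations at once.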

Some nice extras

While fitting a model and predicting outcomes is pretty neat already, I wanted to include some more, arguably useful — and arguably misplaced — add-ons. As the code is fairly simple and is available on my GitHub, I will not go into detail here and will only list the features:

  • calculating the best epoch based on different aggregate methods
  • MAE, MSE, RMSE and R² scores
  • built-in error graph based on different aggregate metrics — or raw error scores
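For reference, the listed scores follow the standard formulas; a minimal sketch of computing them (the module's own method names and signatures may differ, this is just the textbook maths):

```python
import numpy as np

def scores(y_true, y_pred):
    """Return (MAE, MSE, RMSE, R2) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mae = np.mean(np.abs(resid))                     # mean absolute error
    mse = np.mean(resid ** 2)                        # mean squared error
    rmse = np.sqrt(mse)                              # root of the above
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot                         # coefficient of determination
    return mae, mse, rmse, r2
```

A perfect prediction gives MAE = 0 and R² = 1, while predicting the mean of y for every point gives R² = 0.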

I will also add that maybe the most interesting aspect about this part of the project was structuring the module. I tried to mimic the directory and module structure of scikit-learn — I will leave it to your judgement how well I did.

Let’s Blow Stuff Up!

Well, we might not be that radical, but I really wanted to give a demo of what the module does to finish this article. So, I will do a simple fit comparing scores to scikit-learn’s linear regression model and parade all the fancy features — ayy!

For this experiment, I will use seaborn’s penguins dataset and use flipper_length_mm, bill_length_mm and bill_depth_mm to predict body_mass_g (even though there is a decent chance of multicollinearity — but then again, we are here to blow things up). The data simply visualised looks like this:

I also decided to carry out a train-test split with a 20% test set and a random state of 142. I then fitted both models: lr_sk for scikit-learn’s version and lr_gd for mine.

lr_sk.fit(X_train, y_train)
lr_gd.fit(X_train, y_train, learning_rate = 0.000006, epochs = 150)

For the learning rate, I was manually messing around with values — some did actually blow up (hurray) — to find a more or less optimal one. With 150 epochs, the last epoch produced the best errors according to all of the MAE, MSE and RMSE metrics (which is not guaranteed in general).
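To illustrate what "blowing up" means (a hypothetical demo on the toy data from earlier, not on the penguins data — whose much larger feature values are why a tiny rate like 0.000006 was needed there): an overly large learning rate makes each update overshoot, so the errors grow instead of shrink.

```python
def final_abs_error(alpha, epochs=4):
    """Run the simple-regression loop and return the last absolute error."""
    X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [1, 3, 2, 3, 5, 4, 8, 8, 7, 9]
    w0, w1 = 0.0, 0.0
    err = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = (w0 + w1 * xi) - yi
            w0 -= alpha * err           # each update overshoots when
            w1 -= alpha * err * xi      # alpha * x^2 is too large
    return abs(err)
```

With alpha = 0.01 the final error stays small, while with alpha = 1.0 the weights diverge and the error explodes within a few steps.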

Error graph for the model using RMSE as aggregate metric

I then compared the two models against each other. Scikit-learn, not surprisingly, did considerably better than my DIY module, but it is the intention that counts.

On the train set I got *drumroll*:

  • scikit-learn: MAE: 302.1 — RMSE: 384.7 — R²: 0.756
  • my version: MAE: 359.1 — RMSE: 448.6 — R²: 0.668

On the test set:

  • scikit-learn: MAE: 346.1 — RMSE: 416.1 — R²: 0.784(?!)
  • my version: MAE: 454.1 — RMSE: 523.9 — R²: 0.658

But at least I used my own built-in methods to get my scores. To finish the article, here is a nice graph showing the predictions for both models (blue — the DIY version, and orange — the scikit-learn one) on the training and test data.

And there it is, your own neat and compact machine learning module out of nowhere. All in all, it was great fun, 10 out of 10 would recommend.

As always, thank you very much for making it to the end — if you are still here. I am truly indebted to you and hope I could entertain you for a while.
