Introduction to Linear Regression
Linear regression is a fundamental statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a straight line to observed data points. It is widely used in machine learning and data analysis to make predictions and to understand how variables are connected.
In this article, we’ll explore the basics of linear regression and implement a simple algorithm to predict apartment rent and to classify whether an object is a cat. We’ll also discuss how normalization and gradient descent help the algorithm learn and converge faster.
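To make the idea concrete before the full implementation, here is a minimal sketch (the numbers are made up purely for illustration) that fits a straight line y = w * x + b to a handful of points using NumPy's built-in least-squares fit:
import numpy as np
# A few made-up points that roughly follow y = 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.9])
# Fit a degree-1 polynomial, i.e. a straight line; np.polyfit returns [slope, intercept]
w, b = np.polyfit(x, y, 1)
print(f"w: {w:.2f}, b: {b:.2f}")
# Use the fitted line to predict y for a new x
print(f"prediction for x = 6: {w * 6 + b:.2f}")
The rest of the article builds the same kind of fit from scratch, using gradient descent instead of a closed-form solver.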
Linear Regression for Rent Prediction
We’ll start by implementing a naive linear regression algorithm to predict the monthly rent of an apartment based on its features. In this example, we use the number of rooms and square meters as the independent variables and the monthly rent as the dependent variable. Here’s the Python code for the implementation:
import numpy as np
# I took the data from a local online rental listings website
data = [
    # [
    # number of rooms,
    # square meters,
    # price,
    # ]
    [3, 62, 798],
    [1, 35, 454],
    [2, 38, 615],
    [3, 100, 1474],
    [1, 37, 491],
    [2, 80, 921],
    [2, 82, 983],
    [2, 80, 1044],
    [3, 107, 1290],
    [2, 80, 1413],
]
# This function represents the linear equation we use to make predictions. It takes weights w, a bias b, and an input
# item, and calculates the prediction by multiplying the weights with the input and adding the bias.
def model(w, b, item):
    # Calculate the prediction using the linear equation: w * item + b. Note that both `w` and `item` are vectors,
    # but that doesn't matter as we can use the dot product
    return np.dot(w, item) + b
# This function calculates how much the weights w and the bias b should be adjusted. It finds the difference between
# the predicted values and the actual values, called errors, and computes the gradient (or slope) for both w and b.
def compute_gradient(data, w, b):
    # Separate the input features (x) and target values (y) from the data
    x, y = data[:, :-1], data[:, -1]
    # Make predictions using the model function
    predictions = model(w, b, x.T)
    # Compute the errors between predictions and actual target values
    errors = predictions - y
    # Calculate the gradients for weights (w) and bias (b)
    gradient_w = np.dot(errors, x) / len(data)
    gradient_b = np.mean(errors)
    return gradient_w, gradient_b
# This function updates the weights w and the bias b repeatedly, using the gradients computed in the previous function.
# It does this for a certain number of iterations, scaling each adjustment by a small factor called the learning
# rate alpha.
def gradient_descent(data, w, b, alpha, iterations):
    for i in range(iterations):
        # Compute the gradients for weights (w) and bias (b)
        gradient_w, gradient_b = compute_gradient(data, w, b)
        # Update the weights (w) and bias (b) using the learning rate (alpha)
        w -= alpha * gradient_w
        b -= alpha * gradient_b
        # Print the current iteration, updated weights, and bias
        print(f"Iteration: {i}, w: {w}, b: {b}")
    return w, b
# This function scales the input data to values between 0 and 1. It helps the algorithm to work better and converge
# faster.
def normalize_data(data):
    # Calculate the minimum and maximum values for each feature in the data
    min_vals = np.min(data, axis=0)
    max_vals = np.max(data, axis=0)
    # Normalize the data using the minimum and maximum values
    return (data - min_vals) / (max_vals - min_vals), min_vals, max_vals
# Normalize the input data
normalized_data, min_vals, max_vals = normalize_data(data)
# Initialize weights and bias to zeros
initial_w = np.zeros(normalized_data.shape[1] - 1)
initial_b = 0
# Set the learning rate (alpha) and the number of iterations for gradient descent
alpha = 0.001
iterations = 1000000
# Perform gradient descent on the normalized data
w, b = gradient_descent(normalized_data, initial_w, initial_b, alpha, iterations)
print(f"Final weights: w: {w}, b: {b}")
# Create an input item to make a prediction. How much would the monthly rent be for 2 rooms and 80 square meters?
input_item = np.array([2, 80])
# Normalize the input item using the same minimum and maximum values
normalized_input_item = (
    input_item - min_vals[:-1]) / (max_vals[:-1] - min_vals[:-1])
# Make a prediction using the normalized input item and previously calculated weights
normalized_result = model(w, b, normalized_input_item)
# Convert the predicted value back to the original scale
result = normalized_result * (max_vals[-1] - min_vals[-1]) + min_vals[-1]
print("Result: {}".format(result))
Here's what the output looks like:
Iteration: 999997, w: [0.03570123 0.87858858], b: 0.03666023229940503
Iteration: 999998, w: [0.03570123 0.87858858], b: 0.03666023229940503
Iteration: 999999, w: [0.03570123 0.87858858], b: 0.03666023229940503
Final weights: w: [0.03570123 0.87858858], b: 0.03666023229940503
Result: 1106.1165418422659
The model predicts that renting an apartment with 2 rooms and 80 square meters would cost about $1106 per month :)
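As a quick sanity check (this part is not required for the article and assumes scikit-learn is installed), the same data can be fed to an off-the-shelf linear regression, which should produce a prediction in the same ballpark:
import numpy as np
from sklearn.linear_model import LinearRegression
# Same data as above: [number of rooms, square meters, price]
data = np.array([
    [3, 62, 798], [1, 35, 454], [2, 38, 615], [3, 100, 1474], [1, 37, 491],
    [2, 80, 921], [2, 82, 983], [2, 80, 1044], [3, 107, 1290], [2, 80, 1413],
])
x, y = data[:, :-1], data[:, -1]
# Fit the model; no manual normalization or gradient descent is needed here
regression = LinearRegression().fit(x, y)
# Predict the monthly rent for 2 rooms and 80 square meters
print(regression.predict([[2, 80]]))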
Classification: Is it a cat?
Linear regression can also be adapted for classification problems by passing its output through a suitable activation function; with the sigmoid function this is essentially logistic regression. In this example, we'll use the sigmoid to classify whether an object is a cat based on its features.
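To see why the sigmoid is a good fit for this job, here is a tiny illustration (just a sketch, separate from the main program): it squashes any real number into the range (0, 1), so the model's output can be read as a probability.
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
print(sigmoid(-5))  # close to 0: "almost certainly not a cat"
print(sigmoid(0))   # exactly 0.5: "no idea"
print(sigmoid(5))   # close to 1: "almost certainly a cat"
With that in mind, here is the full implementation: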
import numpy as np
data = [
    # [
    # number of legs,
    # weight in g,
    # has fur,
    # has tail,
    # is a cat (label),
    # ]
    [2, 80000, 0, 0, 0],  # a fellow human
    [2, 48000, 0, 0, 0],  # a nice lady
    [2, 62000, 0, 0, 0],  # another nice lady
    [2, 102000, 0, 0, 0],  # fair bodybuilder
    [6, 101, 1, 0, 0],  # that's a creepy big spider
    [6, 87, 1, 0, 0],  # a bit less creepy spider
    [6, 24, 1, 0, 0],  # a tiny spider, cute one
    [4, 7100, 1, 1, 1],  # a cat named "Fluffy"
    [4, 4000, 1, 1, 1],  # a cat named "Snowball"
    [4, 3000, 1, 1, 1],  # a cat named "Mr. Tinkles"
]
# Sigmoid function to map linear combination to a value between 0 and 1
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# This function represents the linear equation we use to make predictions. It takes weights w, a bias b, and an input
# item, and calculates the prediction by multiplying the weights with the input and adding the bias.
def model(w, b, item):
    linear_combination = np.dot(w, item) + b
    return sigmoid(linear_combination)
# This function calculates how much the weights w and the bias b should be adjusted. It finds the difference between
# the predicted values and the actual values, called errors, and computes the gradient (or slope) for both w and b.
def compute_gradient(data, w, b):
    # Separate the input features (x) and target values (y) from the data
    x, y = data[:, :-1], data[:, -1]
    # Make predictions using the model function
    predictions = model(w, b, x.T)
    # Compute the errors between predictions and actual target values
    errors = predictions - y
    # Calculate the gradients for weights (w) and bias (b)
    gradient_w = np.dot(errors, x) / len(data)
    gradient_b = np.mean(errors)
    return gradient_w, gradient_b
# This function updates the weights w and the bias b repeatedly, using the gradients computed in the previous function.
# It does this for a certain number of iterations, scaling each adjustment by a small factor called the learning
# rate alpha.
def gradient_descent(data, w, b, alpha, iterations):
    for i in range(iterations):
        # Compute the gradients for weights (w) and bias (b)
        gradient_w, gradient_b = compute_gradient(data, w, b)
        # Update the weights (w) and bias (b) using the learning rate (alpha)
        w -= alpha * gradient_w
        b -= alpha * gradient_b
        # Print the current iteration, updated weights, and bias
        print(f"Iteration: {i}, w: {w}, b: {b}")
    return w, b
# This function scales the input data to values between 0 and 1. It helps the algorithm to work better and converge
# faster.
def normalize_data(data):
    # Calculate the minimum and maximum values for each feature in the data
    min_vals = np.min(data, axis=0)
    max_vals = np.max(data, axis=0)
    # Normalize the data using the minimum and maximum values
    return (data - min_vals) / (max_vals - min_vals), min_vals, max_vals
# Normalize the input data
normalized_data, min_vals, max_vals = normalize_data(data)
# Initialize weights and bias to zeros
initial_w = np.zeros(normalized_data.shape[1] - 1)
initial_b = 0
# Set the learning rate (alpha) and the number of iterations for gradient descent
alpha = 0.001
iterations = 1000000
# Perform gradient descent on the normalized data
w, b = gradient_descent(normalized_data, initial_w,
                        initial_b, alpha, iterations)
print(f"Final weights: w: {w}, b: {b}")
# Create an input item to classify. Is an object with 4 legs, 80 g weight, no fur, and no tail a cat?
input_item = np.array([4, 80, 0, 0])
# Normalize the input item using the same minimum and maximum values
normalized_input_item = (
    input_item - min_vals[:-1]) / (max_vals[:-1] - min_vals[:-1])
# Make a prediction using the normalized input item and previously calculated weights
normalized_result = model(w, b, normalized_input_item)
# Convert the predicted value back to the original scale
result = normalized_result * (max_vals[-1] - min_vals[-1]) + min_vals[-1]
print("Result: {}".format(result))
# input [4, 80, 0, 0]
Final weights: w: [-2.96788065 -3.20259811 1.58331402 9.10238936], b: -3.9189177494239376
Result: 0.004475655395234176
# input [4, 10000, 1, 1]
Final weights: w: [-2.96788065 -3.20259811 1.58331402 9.10238936], b: -3.9189177494239376
Result: 0.9931016099695166
The output shows that the model assigns a probability close to 0 (about 0.004) to an object with 4 legs, 80 g weight, no fur, and no tail, so it is confident this is not a cat, whereas it assigns a probability close to 1 (about 0.993) to an object with 4 legs, 10 kg weight, fur, and a tail, which is very likely a cat.
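To turn the raw probability into a yes/no answer, you would normally apply a decision threshold; 0.5 is the usual default, though the exact value is a choice rather than something the model dictates. A minimal sketch:
def is_cat(probability, threshold=0.5):
    # Classify as a cat when the predicted probability exceeds the threshold
    return probability > threshold
print(is_cat(0.004475655395234176))  # False: not a cat
print(is_cat(0.9931016099695166))    # True: very likely a cat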
Conclusion
Linear regression is a powerful and versatile method for understanding relationships between variables and making predictions. In this article, we demonstrated how to implement a simple linear regression algorithm for both regression and classification problems. Gradient descent lets the model learn its weights from data, and normalization helps it converge faster and more reliably, leading to more accurate predictions.