Coding Ridge Regression in Python
Ridge Regression is straightforward to implement with Python libraries such as NumPy and scikit-learn. Below is a step-by-step walkthrough, starting with the mathematical (closed-form) approach in NumPy and then moving to a production-ready approach with scikit-learn.
1. Ridge Regression Using NumPy (From Scratch)
This implementation follows the closed-form solution:
\[w = (X^TX + \lambda I)^{-1}X^Ty\]
Step 1: Import Libraries and Create Data
import numpy as np
# Sample dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
Step 2: Add Bias Term
# Add bias (intercept)
X_b = np.c_[np.ones((X.shape[0], 1)), X]
Step 3: Apply Ridge Regression Formula
# Regularization parameter
lambda_ = 1.0
# Identity matrix (do not regularize bias term)
I = np.eye(X_b.shape[1])
I[0, 0] = 0
# Closed-form Ridge solution
w = np.linalg.inv(X_b.T @ X_b + lambda_ * I) @ X_b.T @ y
print("Weights:", w)
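As a quick sanity check, the weights from the closed-form solution can be used to generate predictions. The sketch below repeats the steps above on the same toy data, but solves the linear system with `np.linalg.solve`, which is numerically preferable to forming an explicit matrix inverse:

```python
import numpy as np

# Same toy dataset as above, with a bias column prepended
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
X_b = np.c_[np.ones((X.shape[0], 1)), X]

lambda_ = 1.0
I = np.eye(X_b.shape[1])
I[0, 0] = 0  # leave the intercept unregularized

# Solve (X^T X + lambda*I) w = X^T y directly instead of inverting
w = np.linalg.solve(X_b.T @ X_b + lambda_ * I, X_b.T @ y)

# Predictions are just the design matrix times the weights
y_pred = X_b @ w
print("Weights:", w)
print("Predictions:", y_pred)
```

Both approaches give the same weights here; `np.linalg.solve` simply avoids the extra numerical error of an explicit inverse.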
2. Ridge Regression Using scikit-learn
This is the recommended approach for real-world projects due to optimization and scalability.
Step 1: Import Required Modules
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
Step 2: Build a Pipeline with Scaling
model = Pipeline([("scaler", StandardScaler()),
                  ("ridge", Ridge(alpha=1.0))])
Step 3: Train the Model
model.fit(X, y)
Step 4: Make Predictions
predictions = model.predict(X)
print("Predictions:", predictions)
3. Evaluating Ridge Regression Performance
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print("MSE:", mse)
print("R2 Score:", r2)
4. Effect of Regularization Parameter (Alpha)
for alpha in [0.01, 0.1, 1, 10, 100]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    print(f"Alpha={alpha}, Coefficients={ridge.coef_}")
- Small alpha → behaves like linear regression
- Large alpha → stronger coefficient shrinkage
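Rather than scanning alpha values by hand, scikit-learn's RidgeCV selects alpha by cross-validation. A minimal sketch on the same toy data (the candidate grid `np.logspace(-2, 2, 20)` is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Candidate alphas on a log scale; by default RidgeCV uses an
# efficient form of leave-one-out cross-validation
alphas = np.logspace(-2, 2, 20)
model = RidgeCV(alphas=alphas)
model.fit(X, y)

print("Best alpha:", model.alpha_)
print("Coefficients:", model.coef_)
```

The selected value is exposed as `model.alpha_`, and the fitted model can be used for prediction like any other scikit-learn estimator.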
5. Key Best Practices
- Always scale features before applying Ridge Regression
- Tune alpha using cross-validation
- Use Ridge when multicollinearity exists
- Prefer Ridge over Linear Regression for high-dimensional data
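To illustrate the multicollinearity point, the sketch below fits ordinary least squares and Ridge on two nearly identical synthetic features (the data, seed, and noise scales are arbitrary assumptions for this demo):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # almost a copy of x1
X_col = np.c_[x1, x2]
y_col = 3 * x1 + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X_col, y_col)
ridge = Ridge(alpha=1.0).fit(X_col, y_col)

# With near-duplicate features, OLS coefficients can swing wildly in
# opposite directions; Ridge splits the weight stably between them
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Because the two columns carry essentially the same signal, Ridge assigns each roughly half of the true coefficient, whereas OLS may offset a large positive weight on one feature with a large negative weight on the other.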
Conclusion
Ridge Regression can be implemented either mathematically using NumPy or efficiently using scikit-learn. While the closed-form solution helps in understanding the theory, the scikit-learn implementation is preferred for production-grade machine learning systems due to performance, stability, and ease of tuning.