Coding Ridge Regression in Python
Ridge Regression is straightforward to implement with Python libraries such as NumPy and scikit-learn. Below is a step-by-step walkthrough, starting with the mathematical (closed-form) approach in NumPy and then moving to a production-ready approach with scikit-learn.
1. Ridge Regression Using NumPy (From Scratch)
This implementation follows the closed-form solution:
\[w = (X^TX + \lambda I)^{-1}X^Ty\]
Step 1: Import Libraries and Create Data
import numpy as np
# Sample dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
Step 2: Add Bias Term
# Add bias (intercept)
X_b = np.c_[np.ones((X.shape[0], 1)), X]
Step 3: Apply Ridge Regression Formula
# Regularization parameter
lambda_ = 1.0
# Identity matrix (do not regularize bias term)
I = np.eye(X_b.shape[1])
I[0, 0] = 0
# Closed-form Ridge solution
w = np.linalg.inv(X_b.T @ X_b + lambda_ * I) @ X_b.T @ y
print("Weights:", w)
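As a quick sanity check, the weights from the closed-form solution can be used to generate predictions. The sketch below repeats the steps above on the same toy data, but solves the linear system with `np.linalg.solve`, which is numerically preferable to forming an explicit matrix inverse:

```python
import numpy as np

# Same toy dataset as above, with a bias column prepended
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
X_b = np.c_[np.ones((X.shape[0], 1)), X]

lambda_ = 1.0
I = np.eye(X_b.shape[1])
I[0, 0] = 0  # leave the intercept unregularized

# Solve (X^T X + lambda*I) w = X^T y directly instead of inverting
w = np.linalg.solve(X_b.T @ X_b + lambda_ * I, X_b.T @ y)

# Predictions are just the design matrix times the weights
y_pred = X_b @ w
print("Weights:", w)
print("Predictions:", y_pred)
```

Both approaches give the same weights here; `np.linalg.solve` simply avoids the extra numerical error of an explicit inverse.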
2. Ridge Regression Using scikit-learn
This is the recommended approach for real-world projects due to optimization and scalability.
Step 1: Import Required Modules
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
Step 2: Build a Pipeline with Scaling
model = Pipeline([("scaler", StandardScaler()),
                  ("ridge", Ridge(alpha=1.0))])
Step 3: Train the Model
model.fit(X, y)
Step 4: Make Predictions
predictions = model.predict(X)
print("Predictions:", predictions)
3. Evaluating Ridge Regression Performance
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print("MSE:", mse)
print("R2 Score:", r2)
4. Effect of Regularization Parameter (Alpha)
for alpha in [0.01, 0.1, 1, 10, 100]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    print(f"Alpha={alpha}, Coefficients={ridge.coef_}")
- Small alpha → behaves like linear regression
- Large alpha → stronger coefficient shrinkage
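Rather than scanning alpha values by hand, scikit-learn's RidgeCV selects alpha by cross-validation. A minimal sketch on the same toy data (the candidate grid `np.logspace(-2, 2, 20)` is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Candidate alphas on a log scale; by default RidgeCV uses an
# efficient form of leave-one-out cross-validation
alphas = np.logspace(-2, 2, 20)
model = RidgeCV(alphas=alphas)
model.fit(X, y)

print("Best alpha:", model.alpha_)
print("Coefficients:", model.coef_)
```

The selected value is exposed as `model.alpha_`, and the fitted model can be used for prediction like any other scikit-learn estimator.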
5. Key Best Practices
- Always scale features before applying Ridge Regression
- Tune alpha using cross-validation
- Use Ridge when multicollinearity exists
- Prefer Ridge over Linear Regression for high-dimensional data
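To illustrate the multicollinearity point, the sketch below fits ordinary least squares and Ridge on two nearly identical synthetic features (the data, seed, and noise scales are arbitrary assumptions for this demo):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # almost a copy of x1
X_col = np.c_[x1, x2]
y_col = 3 * x1 + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X_col, y_col)
ridge = Ridge(alpha=1.0).fit(X_col, y_col)

# With near-duplicate features, OLS coefficients can swing wildly in
# opposite directions; Ridge splits the weight stably between them
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Because the two columns carry essentially the same signal, Ridge assigns each roughly half of the true coefficient, whereas OLS may offset a large positive weight on one feature with a large negative weight on the other.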
Conclusion
Ridge Regression can be implemented either mathematically using NumPy or efficiently using scikit-learn. While the closed-form solution helps in understanding the theory, the scikit-learn implementation is preferred for production-grade machine learning systems due to performance, stability, and ease of tuning.