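No math background needed, no complicated setup: read this and you can start right away.
Linear regression is one of the most practical tools in data science. Today we'll get you up and running with the most concise code possible.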
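```python
from sklearn.linear_model import LinearRegression

# X_train, y_train and X_test are assumed to exist already
# (for example, produced by train_test_split).
model = LinearRegression()             # 1. Create the model
model.fit(X_train, y_train)            # 2. Train the model
predictions = model.predict(X_test)    # 3. Predict
```

With just these three steps, you have built a predictive model!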
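The three steps above depend on training and test arrays that are not defined in the snippet. As a minimal sketch, assuming synthetic data generated from a known line (the slope 3, intercept 5, noise level, and seeds are illustrative choices, not from the article), the same steps can be run end to end and checked against the known answer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): y = 3 * x + 5 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()             # 1. Create the model
model.fit(X_train, y_train)            # 2. Train the model
predictions = model.predict(X_test)    # 3. Predict

# The fitted parameters should land close to the values used to generate the data
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the data was generated from a known line, comparing `slope` and `intercept` with 3 and 5 is a quick sanity check that the pipeline is wired up correctly.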

Install the dependencies:

```bash
pip install scikit-learn pandas numpy matplotlib
```

A complete example, advertising spend vs. sales:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Example: advertising spend vs. sales
data = {
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, 6.6, 11.0, 13.2]
}
df = pd.DataFrame(data)

# Features and target
X = df[['TV', 'Radio']]   # independent variables
y = df['Sales']           # dependent variable

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print(f"R² score: {r2_score(y_test, y_pred):.2f}")
print(f"TV coefficient: {model.coef_[0]:.2f}")
print(f"Radio coefficient: {model.coef_[1]:.2f}")

# Plot predicted vs. actual; points on the dashed line are perfect predictions
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2)
plt.xlabel('Actual sales')
plt.ylabel('Predicted sales')
plt.title('Predicted vs. actual')
plt.show()
```

A reusable template for your own data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

# ========== Configuration ==========
# Replace with the path to your data file
DATA_PATH = 'your_data.csv'
# Replace with your feature column names
FEATURE_COLS = ['feature1', 'feature2', 'feature3']
# Replace with your target column name
TARGET_COL = 'target'
# ===================================

# Load the data
df = pd.read_csv(DATA_PATH)
X = df[FEATURE_COLS]
y = df[TARGET_COL]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Model evaluation:")
print(f"- R² score: {r2_score(y_test, y_pred):.4f}")
print(f"- MAE: {mean_absolute_error(y_test, y_pred):.4f}")
print("\nModel coefficients:")
for feat, coef in zip(FEATURE_COLS, model.coef_):
    print(f"- {feat}: {coef:.4f}")
print(f"Intercept: {model.intercept_:.4f}")

# Save the predictions
results = pd.DataFrame({
    'actual': y_test,
    'predicted': y_pred,
    'error': y_test - y_pred
})
results.to_csv('prediction_results.csv', index=False)
print("\nPredictions saved to prediction_results.csv")
```

Rank the features by coefficient magnitude:

```python
importance = pd.DataFrame({
    'feature': X.columns,
    'coefficient': model.coef_
}).sort_values('coefficient', key=abs, ascending=False)
print(importance)
```

Save and reload the model:

```python
import joblib

# Save the model
joblib.dump(model, 'linear_model.pkl')

# Load the model
model = joblib.load('linear_model.pkl')
```

Predict on new data:

```python
# Column names must match the features the model was trained on
# (here, the TV and Radio columns from the advertising example).
new_data = pd.DataFrame({
    'TV': [100, 200, 300],
    'Radio': [20, 30, 40]
})
predictions = model.predict(new_data)
print(predictions)
```
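One caveat about the coefficient ranking above: raw coefficients depend on each feature's units, so sorting them by magnitude is only a fair comparison when the features are on similar scales. A possible workaround, sketched here on the advertising data from this article, is to standardize the features first. The use of `StandardScaler` inside a pipeline is a choice made for this sketch, not something the article prescribes:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Advertising data from the example in this article
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, 6.6, 11.0, 13.2],
})
X = df[['TV', 'Radio']]
y = df['Sales']

# Scale every feature to zero mean and unit variance, then fit the regression
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name
scaled_coefs = pipeline.named_steps['linearregression'].coef_

# Each coefficient is now the change in Sales per one standard deviation of the feature
importance = pd.DataFrame({
    'feature': X.columns,
    'scaled_coef': scaled_coefs,
}).sort_values('scaled_coef', key=abs, ascending=False)
print(importance)
```

With only eight rows the ranking is illustrative at best; the point is the mechanics of putting the coefficients on a common scale before comparing them.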
Handle missing values with df.fillna() or df.dropna().
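As a rough illustration of those two calls, the sketch below uses a small made-up table with gaps (the NaN placement is an assumption for demonstration). `dropna()` discards incomplete rows, while `fillna()` here substitutes each column's mean, which is only one of several reasonable fill strategies:

```python
import numpy as np
import pandas as pd

# Made-up data with missing entries (assumption for illustration)
df = pd.DataFrame({
    'TV': [230.1, np.nan, 17.2, 151.5],
    'Radio': [37.8, 39.3, np.nan, 41.3],
    'Sales': [22.1, 10.4, 9.3, 18.5],
})

# Count the missing values in each column first
print(df.isna().sum())

# Option 1: drop every row that contains at least one missing value
dropped = df.dropna()

# Option 2: keep all rows and fill each gap with that column's mean
filled = df.fillna(df.mean())

print(dropped)
print(filled)
```

Dropping rows is simplest but shrinks the dataset; filling keeps every row at the cost of introducing estimated values. When a train/test split is involved, computing the fill values from the training rows only avoids leaking information from the test set.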
When you have many features, Lasso or Ridge regression can help prevent overfitting:
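```python
from sklearn.linear_model import Lasso, Ridge

# Lasso regression (L1 regularization, automatic feature selection)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Ridge regression (L2 regularization, coefficient shrinkage)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
```

Day 1 → Get the quick-start code above running
  ↓
Day 2 → Swap the example data for your own
  ↓
Day 3 → Try predicting with multiple features
  ↓
Day 4 → Learn the model evaluation metrics
  ↓
Day 5 → Get comfortable with regularization and parameter tuning

Remember: the best way to learn is by doing.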
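For the tuning step on day 5, one possible starting point is to let cross-validation pick the regularization strength instead of hard-coding `alpha`. The sketch below uses scikit-learn's `RidgeCV` on synthetic data; the data, the candidate `alphas` grid, and the seed are all assumptions chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (assumption): 10 features, only the first 3 influence y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_coef = np.array([4.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(0, 1.0, size=200)

# Candidate regularization strengths to compare (illustrative grid)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# With the default cv=None, RidgeCV uses an efficient leave-one-out scheme
ridge_cv = RidgeCV(alphas=alphas)
ridge_cv.fit(X, y)

best_alpha = ridge_cv.alpha_
print(f"Selected alpha: {best_alpha}")
print("Coefficients:", np.round(ridge_cv.coef_, 2))
```

`LassoCV` offers a similar interface for L1 regularization. In both cases the grid of candidate values is a judgment call and usually worth widening if the selected value sits at either end.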
If you have any questions, feel free to leave them in the comments!
The code in this article was tested with Python 3.9 and scikit-learn 1.3.