---
- oeasy Python 0810
- This is the oeasy systematic Python tutorial: it builds up from the basics step by step, solid and complete, with nothing skipped. Put in the time and you will truly learn it.
This tutorial is also published at:
Personal site: `https://oeasy.org`
Lanqiao Cloud Course: `https://www.lanqiao.cn/courses/3584`
GitHub: `https://github.com/overmind1980/oeasy-python-tutorial`
Gitee: `https://gitee.com/overmind1980/oeasypython`
---
- Terminology for the core metrics
| Chinese term | English term | Abbreviation | Example usage |
|---|---|---|---|
| 残差值的平方 | Squared Residual | - | squared residual of each sample |
| 残差平方和 | Sum of Squared Residuals | SSR | calculate the SSR of the model |
| 均方误差 | Mean Squared Error | MSE | MSE reflects average squared error |
| 均方根误差 | Root Mean Squared Error | RMSE | RMSE has the same unit as the target variable |
| 实测值 | True Value / Actual Value | - | true distance in the experiment |
| 预测值 | Predicted Value | - | predicted value from linear regression |
| 样本数 | Number of Samples | n | the value of n is 8 in this case |
| 线性回归 | Linear Regression | LR | fit a linear regression model |
- Related terms
| Chinese term | English term |
|---|---|
| 决定系数 | Coefficient of Determination |
| 拟合优度 | Goodness of Fit |
| 回归方程 | Regression Equation |
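To make the chain of metrics in the glossary concrete, here is a minimal numpy sketch of how SSR, MSE, and RMSE build on each other (the numbers are made up for illustration, not taken from the experiment):

```python
import numpy as np

# Hypothetical actual and predicted values, only to illustrate the formulas
actual = np.array([33.0, 130.0, 298.0])
predicted = np.array([40.0, 120.0, 300.0])

residuals = actual - predicted      # residual = actual - predicted
ssr = np.sum(residuals ** 2)        # Sum of Squared Residuals (SSR)
n = len(actual)                     # number of samples
mse = ssr / n                       # Mean Squared Error: MSE = SSR / n
rmse = np.sqrt(mse)                 # Root Mean Squared Error: RMSE = sqrt(MSE)
print(ssr, mse, rmse)               # 153.0 51.0 ≈7.14
```

RMSE comes back in the same unit as the distance itself, which is why the fits below report it alongside R².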
- This curve is clearly not a single straight line
- Could we fit it piecewise? 🤔
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# 1. Galileo's original inclined-plane data
data = pd.DataFrame({'Time': [1,2,3,4,5,6,7,8],'Distance': [33,130,298,526,824,1192,1620,2123]})
X, y = data[['Time']], data['Distance']
t, s = data['Time'].values, data['Distance'].values
bp = 4  # optimal breakpoint at Time = 4
# 2. Global (single) linear regression
model_linear = LinearRegression().fit(X, y)
y_pred_lin = model_linear.predict(X)
r2_lin, rmse_lin = r2_score(y, y_pred_lin), np.sqrt(mean_squared_error(y, y_pred_lin))
# 3. Two-segment piecewise linear regression
# (t = 4 satisfies both masks; the second assignment, from m2, takes effect there)
m1 = LinearRegression().fit(data[data.Time<=bp][['Time']], data[data.Time<=bp].Distance)
m2 = LinearRegression().fit(data[data.Time>=bp][['Time']], data[data.Time>=bp].Distance)
y_pred_seg = np.zeros(len(y))  # prediction array of the same length as y
y_pred_seg[data.Time<=bp] = m1.predict(data[data.Time<=bp][['Time']])
y_pred_seg[data.Time>=bp] = m2.predict(data[data.Time>=bp][['Time']])
r2_seg, rmse_seg = r2_score(y, y_pred_seg), np.sqrt(mean_squared_error(y, y_pred_seg))
# 4. Print the comparison
print(f'Global linear: y={model_linear.coef_[0]:.2f}*t{model_linear.intercept_:+.2f} | R²={r2_lin:.4f} | RMSE={rmse_lin:.1f}')
print(f'Piecewise: t≤4→y={m1.coef_[0]:.2f}*t{m1.intercept_:+.2f}; t≥4→y={m2.coef_[0]:.2f}*t{m2.intercept_:+.2f} | R²={r2_seg:.4f} | RMSE={rmse_seg:.1f}')
print(pd.DataFrame({'Time': t, 'Actual': s, 'GlobalPred': np.round(y_pred_lin,1), 'PiecewisePred': np.round(y_pred_seg,1)}))
# 5. Plot and save a high-resolution figure (no plt.show(), no pop-up window)
plt.rcParams['font.sans-serif'] = ['WenQuanYi Zen Hei']  # font that can render CJK text
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(8,5), dpi=120)
plt.scatter(t, s, c='darkred', s=120, edgecolors='white', label='Galileo data')
plt.plot(np.linspace(0,9,100), model_linear.predict(pd.DataFrame({'Time':np.linspace(0,9,100)})), c='orange', lw=2, ls='--', label=f'Global linear R²={r2_lin:.4f}')
plt.plot(np.linspace(0,bp,50), m1.predict(pd.DataFrame({'Time':np.linspace(0,bp,50)})), c='darkblue', lw=3, label=f'Segment 1: t≤{bp}')
plt.plot(np.linspace(bp,9,50), m2.predict(pd.DataFrame({'Time':np.linspace(bp,9,50)})), c='darkblue', lw=3)
plt.axvline(bp, c='red', ls=':', alpha=0.8, label=f'breakpoint={bp}')
plt.xlabel('Time (equal intervals)')
plt.ylabel('Distance (scale divisions)')
plt.legend(loc='upper left')
plt.grid(alpha=0.3)
plt.savefig('/home/project/伽利略_分段线性对比全局线性.png', dpi=120, bbox_inches='tight')
- What exactly are the residuals?
| Time | Actual Distance | Global Predicted Distance | Piecewise Predicted Distance |
|---|---|---|---|
| 1 | 33 | -200.9 | -0.3 |
| 2 | 130 | 97.4 | 164.4 |
| 3 | 298 | 395.7 | 329.1 |
| 4 | 526 | 694.1 | 459.0 |
| 5 | 824 | 992.4 | 858.0 |
| 6 | 1192 | 1290.8 | 1257.0 |
| 7 | 1620 | 1589.1 | 1656.0 |
| 8 | 2123 | 1887.4 | 2055.0 |
Global linear: y = 298.33*t − 499.25 | R² = 0.9521 | RMSE = 153.4
Piecewise linear: t≤4 → y = 164.70*t − 165.00; t≥4 → y = 399.00*t − 1137.00 | R² = 0.9951 | RMSE = 48.8
- How should we read these numbers?
- Recall the terms
| Chinese term | English term | Key notes |
|---|---|---|
| 残差 | Residual | the gap between actual and predicted value: residual = actual − predicted |
| 残差平方和 | Sum of Squared Residuals | the sum of every sample's squared residual; the basis of the error metrics; a smaller $SSR$ is better |
| 均方误差 | Mean Squared Error | built on SSR: the average of the squared residuals, $MSE=\frac{SSR}{n}$; the smaller, the smaller the model's overall error |
| 均方根误差 | Root Mean Squared Error | built on MSE: its arithmetic square root, $RMSE=\sqrt{MSE}$; same unit as the target variable, so smaller means smaller prediction error and higher precision |
| 决定系数 | R-squared / Coefficient of Determination | ranges from 0 to 1; the closer to 1, the better the model explains the data |
- In this example
- the piecewise linear regression is clearly better
| Evaluation Index | Global Linear Regression | Piecewise Linear Regression |
|---|---|---|
| Fitting Equation | y = 298.33*t − 499.25 | t≤4: y = 164.70*t − 165.00; t≥4: y = 399.00*t − 1137.00 |
| R² / R-squared | 0.9521 | 0.9951 |
| RMSE | 153.4 | 48.8 |
- So what if we want an even better fit?
- Right now we use
- one breakpoint
- breakpoint
- t = 4
- With two breakpoints
- the samples split into 3 segments
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# 1. Galileo's original data (unchanged)
data = pd.DataFrame({'Time': [1,2,3,4,5,6,7,8],'Distance': [33,130,298,526,824,1192,1620,2123]})
X, y = data[['Time']], data['Distance']
t, s = data['Time'].values, data['Distance'].values
# Breakpoint settings (fixed)
bp_two = 4  # the earlier two-segment optimal breakpoint (t≤4 and t≥4)
bp1_three, bp2_three = 3, 6  # three-segment optimal breakpoints (t≤3, 3<t≤6, t≥6)
# 2. Two-segment piecewise linear regression (same as before)
m_two1 = LinearRegression().fit(data[data.Time<=bp_two][['Time']], data[data.Time<=bp_two].Distance)
m_two2 = LinearRegression().fit(data[data.Time>=bp_two][['Time']], data[data.Time>=bp_two].Distance)
y_pred_two = np.zeros(len(y))
y_pred_two[data.Time<=bp_two] = m_two1.predict(data[data.Time<=bp_two][['Time']])
y_pred_two[data.Time>=bp_two] = m_two2.predict(data[data.Time>=bp_two][['Time']])
r2_two, rmse_two = r2_score(y, y_pred_two), np.sqrt(mean_squared_error(y, y_pred_two))
# 3. Three-segment piecewise linear regression (new)
m_three1 = LinearRegression().fit(data[data.Time<=bp1_three][['Time']], data[data.Time<=bp1_three].Distance)
m_three2 = LinearRegression().fit(data[(data.Time>bp1_three)&(data.Time<=bp2_three)][['Time']], data[(data.Time>bp1_three)&(data.Time<=bp2_three)].Distance)
m_three3 = LinearRegression().fit(data[data.Time>=bp2_three][['Time']], data[data.Time>=bp2_three].Distance)
y_pred_three = np.zeros(len(y))
y_pred_three[data.Time<=bp1_three] = m_three1.predict(data[data.Time<=bp1_three][['Time']])
y_pred_three[(data.Time>bp1_three)&(data.Time<=bp2_three)] = m_three2.predict(data[(data.Time>bp1_three)&(data.Time<=bp2_three)][['Time']])
y_pred_three[data.Time>=bp2_three] = m_three3.predict(data[data.Time>=bp2_three][['Time']])
r2_three, rmse_three = r2_score(y, y_pred_three), np.sqrt(mean_squared_error(y, y_pred_three))
# 4. Print the two-segment vs three-segment comparison, plus the full residual table
print(f'Two-segment (breakpoint={bp_two}): t≤4→y={m_two1.coef_[0]:.2f}*t{m_two1.intercept_:+.2f}; t≥4→y={m_two2.coef_[0]:.2f}*t{m_two2.intercept_:+.2f} | R²={r2_two:.4f} | RMSE={rmse_two:.1f}')
print(f'Three-segment (breakpoints={bp1_three}/{bp2_three}): t≤3→y={m_three1.coef_[0]:.2f}*t{m_three1.intercept_:+.2f}; 3<t≤6→y={m_three2.coef_[0]:.2f}*t{m_three2.intercept_:+.2f}; t≥6→y={m_three3.coef_[0]:.2f}*t{m_three3.intercept_:+.2f} | R²={r2_three:.4f} | RMSE={rmse_three:.1f}')
print("="*90)
# Table: actual values, both sets of predictions, and both residual columns
df_all = pd.DataFrame({
    'Time': t,
    'Actual': s,
    'TwoSegPred': np.round(y_pred_two, 1),
    'ThreeSegPred': np.round(y_pred_three, 1),
    'TwoSegResid': np.round(s - y_pred_two, 1),    # round after subtracting, not before
    'ThreeSegResid': np.round(s - y_pred_three, 1)
})
print(df_all)
print("="*90)
# 5. Plot: no global line, only two-segment vs three-segment; save figure, no pop-up
plt.rcParams['font.sans-serif'] = ['WenQuanYi Zen Hei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(8,5), dpi=120)
# Original data points only
plt.scatter(t, s, c='darkred', s=120, edgecolors='white', label='Galileo data', zorder=5)
# Two-segment fit: dark blue solid lines (original style)
plt.plot(np.linspace(0,bp_two,50), m_two1.predict(pd.DataFrame({'Time':np.linspace(0,bp_two,50)})), c='darkblue', lw=3, ls='-', label=f'Two-segment R²={r2_two:.4f} | RMSE={rmse_two:.1f}')
plt.plot(np.linspace(bp_two,9,50), m_two2.predict(pd.DataFrame({'Time':np.linspace(bp_two,9,50)})), c='darkblue', lw=3, ls='-')
# Three-segment fit: forest green solid lines, easy to tell apart
plt.plot(np.linspace(0,bp1_three,50), m_three1.predict(pd.DataFrame({'Time':np.linspace(0,bp1_three,50)})), c='forestgreen', lw=3, ls='-', label=f'Three-segment R²={r2_three:.4f} | RMSE={rmse_three:.1f}')
plt.plot(np.linspace(bp1_three,bp2_three,50), m_three2.predict(pd.DataFrame({'Time':np.linspace(bp1_three,bp2_three,50)})), c='forestgreen', lw=3, ls='-')
plt.plot(np.linspace(bp2_three,9,50), m_three3.predict(pd.DataFrame({'Time':np.linspace(bp2_three,9,50)})), c='forestgreen', lw=3, ls='-')
# Vertical dotted lines marking every breakpoint
plt.axvline(bp_two, c='red', ls=':', alpha=0.8, lw=2, label=f'two-seg breakpoint={bp_two}')
plt.axvline(bp1_three, c='orange', ls=':', alpha=0.8, lw=2, label=f'three-seg breakpoint={bp1_three}')
plt.axvline(bp2_three, c='orange', ls=':', alpha=0.8, lw=2, label=f'three-seg breakpoint={bp2_three}')
# Basic chart settings (unchanged)
plt.xlabel('Time (equal intervals)')
plt.ylabel('Distance (scale divisions)')
plt.legend(loc='upper left')
plt.grid(alpha=0.3, zorder=0)
plt.xlim(0, 9)
plt.ylim(-100, 2300)
# Save the high-resolution figure
plt.savefig('/home/project/伽利略_两段线性VS三段线性_对比图.png', dpi=120, bbox_inches='tight')
- Which works better?
| Comparison | Two-segment piecewise | Three-segment piecewise |
|---|---|---|
| Breakpoints | t = 4 | t = 3 and t = 6 |
| Fitting formulas | t≤4: y = 164.70*t − 165.00; t≥4: y = 399.00*t − 1137.00 | t≤3: y = 132.50*t − 111.33; 3<t≤6: y = 333.00*t − 817.67; t≥6: y = 465.50*t − 1613.50 |
| R² | 0.9951 | 0.9994 |
| RMSE | 48.8 | 17.5 |
| Verdict | excellent fit | even better; error greatly reduced |
- How the splits differ
- two-segment regression cuts only once, at Time = 4
- splitting the data into two parts to fit
- three-segment regression cuts at Time = 3 and Time = 6
- fitting three parts
- a finer split
- Judging by the metrics:
- the closer $R^2$ is to 1, the better the model captures the pattern in the data
- the three-segment $R^2$ is closer to 1: it captures the pattern better than two segments
- the smaller the RMSE, the smaller the prediction error
- the three-segment RMSE is much smaller than the two-segment one
- prediction error drops sharply
- Final conclusion
- because the three-segment regression splits more finely
- it follows the trend of the data more closely
- and fits much better than two segments
- But the green polyline is not continuous
- Can we turn it into a continuous polyline?
- Continuous piecewise linear regression
- Continuous Piecewise Linear Regression
- forming one continuous polyline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# ========== 1. Galileo's original experiment data (unchanged) ==========
data = pd.DataFrame({
    'Time': [1,2,3,4,5,6,7,8],
    'Distance': [33,130,298,526,824,1192,1620,2123]
})
t, s = data['Time'].values, data['Distance'].values
# ========== 2. Three-segment piecewise linear regression (fixed optimal breakpoints) ==========
bp1, bp2 = 3, 6  # segment boundaries: t≤3, 3<t≤6, t≥6
# Fit an independent linear regression model on each segment
m1 = LinearRegression().fit(data[data.Time<=bp1][['Time']], data[data.Time<=bp1].Distance)
m2 = LinearRegression().fit(data[(data.Time>bp1)&(data.Time<=bp2)][['Time']], data[(data.Time>bp1)&(data.Time<=bp2)].Distance)
m3 = LinearRegression().fit(data[data.Time>=bp2][['Time']], data[data.Time>=bp2].Distance)
# Predict the original samples and compute the evaluation metrics
y_pred = np.zeros(len(s))
y_pred[data.Time<=bp1] = m1.predict(data[data.Time<=bp1][['Time']])
y_pred[(data.Time>bp1)&(data.Time<=bp2)] = m2.predict(data[(data.Time>bp1)&(data.Time<=bp2)][['Time']])
y_pred[data.Time>=bp2] = m3.predict(data[data.Time>=bp2][['Time']])
r2 = r2_score(s, y_pred)
rmse = np.sqrt(mean_squared_error(s, y_pred))
# ========== 3. Print the three-segment results plus the full residual table ==========
print("="*95)
print("Galileo inclined-plane experiment - three-segment piecewise linear regression results")
print("="*95)
print(f"Segment 1 (t ≤ {bp1}) → fitted formula: y = {m1.coef_[0]:.2f} × t {m1.intercept_:+.2f}")
print(f"Segment 2 ({bp1} < t ≤ {bp2}) → fitted formula: y = {m2.coef_[0]:.2f} × t {m2.intercept_:+.2f}")
print(f"Segment 3 (t ≥ {bp2}) → fitted formula: y = {m3.coef_[0]:.2f} × t {m3.intercept_:+.2f}")
print(f"Goodness of fit R² = {r2:.6f} | RMSE = {rmse:.2f}")
print("="*95)
# Full table: time, actual distance, three-segment prediction, residual (actual − predicted)
df_result = pd.DataFrame({
    'Time': t,
    'Actual': s,
    'ThreeSegPred': np.round(y_pred, 1),
    'Residual': np.round(s - y_pred, 1)
})
print(df_result)
print("="*95)
# ========== 4. Plot: draw the three segments as a single polyline ==========
# Note: the three fitted lines do NOT actually meet at the breakpoints; plotting
# them in one call makes plt.plot bridge the jumps with short steep segments,
# so the curve merely looks connected rather than being truly continuous.
plt.rcParams['font.sans-serif'] = ['WenQuanYi Zen Hei']  # render CJK text
plt.rcParams['axes.unicode_minus'] = False               # render the minus sign
plt.figure(figsize=(8,5), dpi=120)
# Original data points (dark red with white edges, to stand out)
plt.scatter(t, s, c='darkred', s=120, edgecolors='white', linewidth=1.5, label='Galileo data', zorder=5)
# Evaluate the piecewise model on a dense time grid, then plot in one call
t_continuous = np.linspace(0, 9, 200)
y_continuous = np.zeros(len(t_continuous))
y_continuous[t_continuous<=bp1] = m1.predict(pd.DataFrame({'Time': t_continuous[t_continuous<=bp1]}))
y_continuous[(t_continuous>bp1)&(t_continuous<=bp2)] = m2.predict(pd.DataFrame({'Time': t_continuous[(t_continuous>bp1)&(t_continuous<=bp2)]}))
y_continuous[t_continuous>=bp2] = m3.predict(pd.DataFrame({'Time': t_continuous[t_continuous>=bp2]}))
plt.plot(t_continuous, y_continuous, c='forestgreen', lw=3, ls='-', label=f'Three-segment fit | R²={r2:.4f}', zorder=3)
# Orange dotted vertical lines marking the breakpoints
plt.axvline(x=bp1, color='orange', linestyle=':', linewidth=2, alpha=0.9, label=f'breakpoint t={bp1}')
plt.axvline(x=bp2, color='orange', linestyle=':', linewidth=2, alpha=0.9, label=f'breakpoint t={bp2}')
# ========== Chart styling ==========
plt.xlabel('Time (equal intervals)', fontsize=12, fontweight='bold')
plt.ylabel('Distance (scale divisions)', fontsize=12, fontweight='bold')
plt.title('Galileo Inclined Plane Experiment - Piecewise Linear Regression', fontsize=14, fontweight='bold', pad=15)
plt.legend(loc='upper left', framealpha=0.9, fontsize=10)
plt.grid(True, alpha=0.3, linestyle='-', zorder=0)  # light grid that does not hide the data
plt.xlim(0, 9)
plt.ylim(-100, 2300)
# Save the figure without opening a window
plt.savefig('/home/project/伽利略_三段分段线性回归_纯净版.png', dpi=150, bbox_inches='tight')
plt.close()
- The part between t = 3 and t = 4 looks odd
- Let's extend the 0–3 segment forward
- so that it meets the 4–6 segment at a single point
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# ========== 1. Galileo's original experiment data ==========
data = pd.DataFrame({
    'Time': [1,2,3,4,5,6,7,8],
    'Distance': [33,130,298,526,824,1192,1620,2123]
})
t, s = data['Time'].values, data['Distance'].values
# ========== 2. Three-segment piecewise linear regression setup ==========
bp1, bp2 = 3, 6  # segment boundaries
# Fit an independent linear regression model on each segment
m1 = LinearRegression().fit(data[data.Time<=bp1][['Time']], data[data.Time<=bp1].Distance)
m2 = LinearRegression().fit(data[(data.Time>bp1)&(data.Time<=bp2)][['Time']], data[(data.Time>bp1)&(data.Time<=bp2)].Distance)
m3 = LinearRegression().fit(data[data.Time>=bp2][['Time']], data[data.Time>=bp2].Distance)
# Predict the original samples and compute the evaluation metrics
y_pred = np.zeros(len(s))
y_pred[data.Time<=bp1] = m1.predict(data[data.Time<=bp1][['Time']])
y_pred[(data.Time>bp1)&(data.Time<=bp2)] = m2.predict(data[(data.Time>bp1)&(data.Time<=bp2)][['Time']])
y_pred[data.Time>=bp2] = m3.predict(data[data.Time>=bp2][['Time']])
r2 = r2_score(s, y_pred)
rmse = np.sqrt(mean_squared_error(s, y_pred))
# ========== 3. Print the results plus the residual table ==========
print("="*95)
print("Galileo inclined-plane experiment - three-segment regression (extended-intersection version)")
print("="*95)
print(f"Segment 1 (t ≤ {bp1}) → fitted formula: y = {m1.coef_[0]:.2f} × t {m1.intercept_:+.2f}")
print(f"Segment 2 ({bp1} < t ≤ {bp2}) → fitted formula: y = {m2.coef_[0]:.2f} × t {m2.intercept_:+.2f}")
print(f"Segment 3 (t ≥ {bp2}) → fitted formula: y = {m3.coef_[0]:.2f} × t {m3.intercept_:+.2f}")
print(f"Goodness of fit R² = {r2:.6f} | RMSE = {rmse:.2f}")
print("="*95)
df_result = pd.DataFrame({
    'Time': t,
    'Actual': s,
    'ThreeSegPred': np.round(y_pred, 1),
    'Residual': np.round(s - y_pred, 1)
})
print(df_result)
print("="*95)
# ========== 4. Extend segment 1 to meet segment 2, removing the fold at t = 3 ==========
plt.rcParams['font.sans-serif'] = ['WenQuanYi Zen Hei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(8,5), dpi=120)
# Original data points
plt.scatter(t, s, c='darkred', s=120, edgecolors='white', linewidth=1.5, label='Galileo data', zorder=5)
# ---------- Extract slope and intercept of each segment ----------
k1, b1 = m1.coef_[0], m1.intercept_   # segment over t ≤ 3
k2, b2 = m2.coef_[0], m2.intercept_   # segment over 3 < t ≤ 6
k3, b3 = m3.coef_[0], m3.intercept_   # segment over t ≥ 6
# ---------- Exact intersection of the extended first line with the second ----------
# Setting k1*x + b1 = k2*x + b2 and solving gives x = (b2 - b1) / (k1 - k2)
x_intersect = (b2 - b1) / (k1 - k2)
y_intersect = k1 * x_intersect + b1
print(f"Extended segment-1 line meets the segment-2 line at t = {x_intersect:.2f}, distance = {y_intersect:.2f}")
print("="*95)
# ---------- Draw the polyline so the pieces meet naturally, with no fold ----------
# (1) 0 → intersection: follow the first line all the way to the intersection,
#     which removes the downward fold at t = 3
t_line1 = np.linspace(0, x_intersect, 100)
y_line1 = k1 * t_line1 + b1
plt.plot(t_line1, y_line1, c='forestgreen', lw=3, ls='-', zorder=3)
# (2) intersection → 6: continue along the second line
t_line2 = np.linspace(x_intersect, bp2, 100)
y_line2 = k2 * t_line2 + b2
plt.plot(t_line2, y_line2, c='forestgreen', lw=3, ls='-', label=f'Piecewise fit (extended to intersect) | R²={r2:.4f}', zorder=3)
# (3) 6 → 9: continue along the third line, keeping its original trend
t_line3 = np.linspace(bp2, 9, 100)
y_line3 = k3 * t_line3 + b3
plt.plot(t_line3, y_line3, c='forestgreen', lw=3, ls='-', zorder=3)
# ---------- Mark the intersection point and the original breakpoints ----------
plt.scatter(x_intersect, y_intersect, c='red', s=100, edgecolors='white', linewidth=1, label=f'intersection t={x_intersect:.2f}', zorder=6)
plt.axvline(x=bp1, color='orange', linestyle=':', linewidth=2, alpha=0.8, label=f'original breakpoint t={bp1}')
plt.axvline(x=bp2, color='orange', linestyle=':', linewidth=2, alpha=0.8, label=f'original breakpoint t={bp2}')
# ========== Chart styling (unchanged) ==========
plt.xlabel('Time (equal intervals)', fontsize=12, fontweight='bold')
plt.ylabel('Distance (scale divisions)', fontsize=12, fontweight='bold')
plt.title('Galileo Inclined Plane Experiment - Extended Intersection Piecewise Regression', fontsize=13, fontweight='bold', pad=10)
plt.legend(loc='upper left', framealpha=0.9, fontsize=9)
plt.grid(True, alpha=0.3, linestyle='-', zorder=0)
plt.xlim(0, 9)
plt.ylim(-100, 2300)
# Save the figure
plt.savefig('/home/project/伽利略_延长相交无折叠分段回归.png', dpi=150, bbox_inches='tight')
plt.close()
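Extending segments by hand works, but continuity can also be enforced by construction: regress Distance on hinge features `max(0, t − breakpoint)`, so that all pieces share one equation and the fitted polyline cannot jump at the breakpoints. The following is only a sketch of that alternative (the hinge design is my addition, not part of the lesson), reusing the same data and breakpoints:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

t = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
s = np.array([33, 130, 298, 526, 824, 1192, 1620, 2123], dtype=float)
bp1, bp2 = 3.0, 6.0

# Hinge basis: y = a + b*t + c*max(0, t-bp1) + d*max(0, t-bp2).
# Each hinge term is zero up to its breakpoint and grows linearly after it,
# so the fitted function is continuous; only its slope changes at bp1 and bp2.
X = np.column_stack([t, np.maximum(0.0, t - bp1), np.maximum(0.0, t - bp2)])
model = LinearRegression().fit(X, s)

def predict(x):
    """Evaluate the continuous piecewise line at a scalar time x."""
    feats = np.array([x, max(0.0, x - bp1), max(0.0, x - bp2)])
    return model.intercept_ + model.coef_ @ feats

print(f"R² = {r2_score(s, model.predict(X)):.4f}")
print(f"just left/right of t=3: {predict(2.999):.1f} / {predict(3.001):.1f}")
```

The two values printed around t = 3 agree to within the line width: there is no jump left to bridge, because continuity is built into the basis rather than patched on afterwards.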
- A perfect finish
- This time we studied piecewise linear regression
| Chinese term | English term | Core concept |
|---|---|---|
| 分段线性回归 | Piecewise Linear Regression | split the data into intervals at breakpoints and fit a separate straight line in each interval, following the segment-wise pattern of the data |
| 分段点 / 分界点 | Breakpoint | the critical value dividing the intervals; the key parameter of a piecewise regression |
| 分段拟合公式 | Piecewise Fitting Formula | the linear formula for each interval; the slope $k$ and intercept $b$ differ from interval to interval |
- The more segments we use
- the smaller the RMSE gets
- and the closer R² gets to 1
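Pushed to the extreme, that observation becomes a warning: with a breakpoint between every pair of neighbouring points, each two-point "segment" passes through its points exactly, so RMSE reaches 0 while the model has merely memorized the data. A small sketch (my illustration, not from the lesson):

```python
import numpy as np

t = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
s = np.array([33, 130, 298, 526, 824, 1192, 1620, 2123], dtype=float)

# One straight line per adjacent pair of points: 7 "segments" for 8 points.
# A line through two points fits them exactly, so every residual is zero.
y_hat = np.empty_like(s)
for i in range(len(t) - 1):
    k = (s[i + 1] - s[i]) / (t[i + 1] - t[i])  # slope through the pair
    b = s[i] - k * t[i]                        # intercept through the pair
    y_hat[i] = k * t[i] + b
    y_hat[i + 1] = k * t[i + 1] + b

rmse = np.sqrt(np.mean((s - y_hat) ** 2))
print(rmse)  # 0.0: a perfect score on the training data, but zero generalization
```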
- But here we only have Galileo's 8 data points
- even with a larger dataset
- we couldn't keep slicing without limit
- Is there a better way? 🤔
- We'll talk about it next time 👋
- This article is part of the oeasy systematic Python tutorial.
- To learn Python completely and solidly,
- just search for oeasy.