This code implements a self-balancing robot using an MPU6050 IMU sensor and BTS7960 motor drivers. The robot maintains upright balance through sensor fusion, PID control, and adaptive control zones. In addition, it integrates a Proximal Policy Optimization (PPO) based reinforcement learning model that enhances control under dynamic conditions.
- Address: 0x68 (I2C)
- Accelerometer: ±2g range (16384 LSB/g)
- Gyroscope: ±250°/sec range (131 LSB/°/sec)
- Registers used (initialization sketched below):
  - 0x6B: Power management
  - 0x1C: Accelerometer configuration
  - 0x1B: Gyroscope configuration
  - 0x3B: Data start register
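As a concrete reference, here is a minimal initialization sketch for these registers using the Arduino Wire library; the helper name `mpuWrite` is illustrative, and the zero values assume the ±2g / ±250°/s ranges listed above.

```cpp
#include <Wire.h>

const uint8_t MPU_ADDR = 0x68;  // MPU6050 I2C address

// Write one byte to an MPU6050 register
void mpuWrite(uint8_t reg, uint8_t value) {
  Wire.beginTransmission(MPU_ADDR);
  Wire.write(reg);
  Wire.write(value);
  Wire.endTransmission();
}

void setupMPU() {
  Wire.begin();
  mpuWrite(0x6B, 0x00);  // power management: wake from sleep
  mpuWrite(0x1C, 0x00);  // accel config: ±2g (16384 LSB/g)
  mpuWrite(0x1B, 0x00);  // gyro config: ±250°/s (131 LSB/°/s)
}
```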
Motor A (Left):
- RPWM_A (Pin 3): Right PWM for forward
- LPWM_A (Pin 5): Left PWM for backward
- REN_A (Pin 7): Right enable
- LEN_A (Pin 8): Left enable
Motor B (Right):
- RPWM_B (Pin 6): Right PWM for forward
- LPWM_B (Pin 9): Left PWM for backward
- REN_B (Pin 10): Right enable
- LEN_B (Pin 11): Left enable
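A possible setup for these pins (constant names are assumptions, chosen to match the labels above); the BTS7960 enable lines are held HIGH so the half-bridges can drive:

```cpp
// BTS7960 pin assignments from the tables above
const int RPWM_A = 3, LPWM_A = 5, REN_A = 7,  LEN_A = 8;   // left motor
const int RPWM_B = 6, LPWM_B = 9, REN_B = 10, LEN_B = 11;  // right motor

void setupMotors() {
  const int pins[] = {RPWM_A, LPWM_A, REN_A, LEN_A,
                      RPWM_B, LPWM_B, REN_B, LEN_B};
  for (int p : pins) pinMode(p, OUTPUT);
  // enable lines must stay HIGH for the driver to respond to PWM
  digitalWrite(REN_A, HIGH); digitalWrite(LEN_A, HIGH);
  digitalWrite(REN_B, HIGH); digitalWrite(LEN_B, HIGH);
}
```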
AccXoffset = Σ(AccX_raw) / 2000
GyroYoffset = Σ(GyroY_raw) / 2000
AccZoffset = Σ(AccZ_raw) / 2000 - 16384

- Averaging over 2000 samples removes sensor bias and drift.
- The 16384 subtraction removes the 1g gravity component from the Z-axis offset.
- Target angle:
targetAngle = atan2(AccX_cal, AccZ_cal) × 180/π
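A sketch of the calibration pass these formulas imply, building on the `setupMPU()` sketch above (helper names and the burst-read layout follow the MPU6050 register map; 0x3B is the data start register listed earlier):

```cpp
int16_t accXRaw, accZRaw, gyroYRaw;
float accXOffset, accZOffset, gyroYOffset, targetAngle;

int16_t read16() {                     // two bytes, high byte first
  int16_t hi = Wire.read();
  return (hi << 8) | Wire.read();
}

void readRawIMU() {                    // burst-read from 0x3B
  Wire.beginTransmission(MPU_ADDR);
  Wire.write(0x3B);
  Wire.endTransmission(false);
  Wire.requestFrom(MPU_ADDR, (uint8_t)14);
  accXRaw = read16();                  // ACCEL_XOUT
  read16();                            // skip ACCEL_YOUT
  accZRaw = read16();                  // ACCEL_ZOUT
  read16();                            // skip temperature
  read16();                            // skip GYRO_XOUT
  gyroYRaw = read16();                 // GYRO_YOUT
}

void calibrate() {
  long sumAX = 0, sumAZ = 0, sumGY = 0;
  for (int i = 0; i < 2000; i++) {     // robot held still and upright
    readRawIMU();
    sumAX += accXRaw; sumAZ += accZRaw; sumGY += gyroYRaw;
    delay(1);
  }
  accXOffset  = sumAX / 2000.0;
  accZOffset  = sumAZ / 2000.0 - 16384.0;  // keep 1g out of the offset
  gyroYOffset = sumGY / 2000.0;

  // resting pose defines the balance target
  float ax = (accXRaw - accXOffset) / 16384.0;
  float az = (accZRaw - accZOffset) / 16384.0;
  targetAngle = atan2(ax, az) * 180.0 / PI;
}
```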
AccX_cal = (AccX_raw - AccXoffset) / 16384.0 // Convert to g
AccZ_cal = (AccZ_raw - AccZoffset) / 16384.0
GyroY_cal = (GyroY_raw - GyroYoffset) / 131.0 // °/sec
accAngle = atan2(AccX_cal, AccZ_cal) × 180/π
gyroAngle = currentAngle + (GyroY_cal × deltaTime)
currentAngle = 0.97 × gyroAngle + 0.03 × accAngle
smoothedAngle = 0.75 × prev_smoothedAngle + 0.25 × currentAngle
error = currentAngle - targetAngle
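Mapped to code, one fusion step might look like this (a sketch reusing the globals from the calibration sketch; `currentAngle` persists between calls so the gyro term can integrate):

```cpp
float currentAngle = 0.0, smoothedAngle = 0.0;

void updateAngle(float deltaTime) {
  float accXCal  = (accXRaw  - accXOffset)  / 16384.0;  // g
  float accZCal  = (accZRaw  - accZOffset)  / 16384.0;  // g
  float gyroYCal = (gyroYRaw - gyroYOffset) / 131.0;    // °/sec

  float accAngle  = atan2(accXCal, accZCal) * 180.0 / PI;
  float gyroAngle = currentAngle + gyroYCal * deltaTime;

  // complementary filter: trust gyro short-term, accel long-term
  currentAngle  = 0.97 * gyroAngle + 0.03 * accAngle;
  smoothedAngle = 0.75 * smoothedAngle + 0.25 * currentAngle;
}
```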
PID_output = Kp×error + Ki×∫error×dt + Kd×(derror/dt)

Adaptive control zones (gain scheduling sketched after this list):
- Dead zone (±0.09°): error = 0
- Center zone (±0.8°): boosted gains
- Moderate (0.8°–5°): base gains
- Large (5°–15°): reduced gains
- Extreme (>15°): conservative gains
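A gain-scheduling sketch for these zones; the scale factors are placeholders, not the tuned values from the actual robot:

```cpp
// Returns a gain scale for the current zone; in the dead zone the
// error itself is zeroed, so no term of the PID reacts to it.
float zoneGainScale(float &error) {
  float e = fabs(error);
  if (e <= 0.09) { error = 0.0; return 0.0; } // dead zone
  if (e <= 0.8)  return 1.3;                  // center: boosted
  if (e <= 5.0)  return 1.0;                  // moderate: base
  if (e <= 15.0) return 0.7;                  // large: reduced
  return 0.5;                                 // extreme: conservative
}
```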
integral += error × deltaTime
integral = constrain(integral, -integralLimit, +integralLimit)
derivative = (error - prevError) / deltaTime
filteredDerivative = 0.7 × prev_filteredDerivative + 0.3 × derivative
angularVelocity = (currentAngle - prevAngle) / deltaTime
velocityDampingTerm = -dampingFactor × Kp × angularVelocity
motorSpeed = (int)PID_output

- Forward: analogWrite(RPWM, speed); analogWrite(LPWM, 0);
- Backward: analogWrite(RPWM, 0); analogWrite(LPWM, speed);
- Stop if |angle| > 25°
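The drive logic above, wrapped as one function over both motors (a sketch assuming the pin constants from the motor setup sketch; positive output drives forward, negative backward):

```cpp
void driveMotors(float pidOutput, float angle) {
  if (fabs(angle) > 25.0) {            // safety cut-off: coast to stop
    analogWrite(RPWM_A, 0); analogWrite(LPWM_A, 0);
    analogWrite(RPWM_B, 0); analogWrite(LPWM_B, 0);
    return;
  }
  int speed = constrain((int)fabs(pidOutput), 0, 255);
  if (pidOutput >= 0) {                // forward
    analogWrite(RPWM_A, speed); analogWrite(LPWM_A, 0);
    analogWrite(RPWM_B, speed); analogWrite(LPWM_B, 0);
  } else {                             // backward
    analogWrite(RPWM_A, 0); analogWrite(LPWM_A, speed);
    analogWrite(RPWM_B, 0); analogWrite(LPWM_B, speed);
  }
}
```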
Setup:
- I2C and motor setup
- Calibration (2000 samples)
- Target angle computed

Loop (sketched in full after this list):
- Read serial PPO input from Pi (if available)
- Compute deltaTime
- Read IMU
- Calculate angle
- Compute PID
- Control motors
- Debug print every 100ms
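Stitched together from the sketches above, the loop body could look like this (integral clamp, derivative filtering, and velocity damping omitted for brevity; `Kp`, `Ki`, `Kd` are placeholder gains, not the tuned values):

```cpp
float Kp = 30.0, Ki = 1.5, Kd = 0.8;   // placeholder gains
unsigned long lastMicros = 0;          // prime in setup() before looping
float integral = 0.0, prevError = 0.0;

void loop() {
  // PPO correction from the Pi, if a line is waiting (see PPO section)
  if (Serial.available()) {
    targetAngle += Serial.readStringUntil('\n').toFloat() * 2.0;
  }

  unsigned long now = micros();
  float deltaTime = (now - lastMicros) / 1e6;  // seconds
  lastMicros = now;

  readRawIMU();
  updateAngle(deltaTime);

  float error = currentAngle - targetAngle;
  float scale = zoneGainScale(error);
  integral += error * deltaTime;
  float derivative = (error - prevError) / deltaTime;
  prevError = error;
  float pid = scale * (Kp * error + Ki * integral + Kd * derivative);

  driveMotors(pid, currentAngle);
  // debug print every 100 ms (format shown in the Debug Output section)
}
```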
- PPO is a reinforcement learning algorithm that learns optimal actions from environment feedback.
- We use real-world data (angle, error, etc.) logged during balancing trials.
- A script `ppo_live_control.py` runs on the Pi.
- It reads real-time `angle` and `error` from the Arduino via Serial.
- Observations are sent to a trained PPO model:
  obs = np.array([error], dtype=np.float32).reshape(1, -1)
  action, _ = model.predict(obs, deterministic=True)
- The action (a float between -0.5 and 0.5) is sent back to the Arduino via Serial.
if (Serial.available()) {
  String input = Serial.readStringUntil('\n');
  float ppo_adjust = input.toFloat();
  targetAngle += ppo_adjust * 2.0; // adjust angle
}

- After logging, run `extract_training_data.py` to prepare the dataset:
  obs = df['error'].values.reshape(-1, 1)
  actions = df['ppo_avg'].values.reshape(-1, 1)
- Train with `train_real_ppo.py` using SB3 PPO:
  model = PPO("MlpPolicy", env, ...)
  model.learn(total_timesteps=20000)
- Save as `ppo_realbot_trained.zip`
- Range: typically between -0.5 and 0.5
- Meaning:
  - Negative: lean more backward
  - Positive: lean more forward
- Effect: adjusts the `targetAngle` in real time to pre-emptively counteract tilt
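Given this range, the Arduino side can defensively clamp the incoming value before applying it; a small hardening of the earlier snippet (the clamp itself is an addition, not in the original code):

```cpp
if (Serial.available()) {
  float ppo_adjust = Serial.readStringUntil('\n').toFloat();
  ppo_adjust = constrain(ppo_adjust, -0.5, 0.5);  // reject out-of-range values
  targetAngle += ppo_adjust * 2.0;                // same scaling as above
}
```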
Angle: XX.XX | Error: XX.XX | PID: XXX.XX | Speed: XXX | Zone: XXXX | Vel: XXX.X
Zones:
- DEAD: ±0.09°
- CENTER: ±0.8°
- Normal: 0.8°–5°
- Large: 5°–15°
- EXTREME: >15°
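A print routine matching this format and these zone labels (a sketch; `zoneName` is a hypothetical helper, and the arguments come from the loop sketch above):

```cpp
const char* zoneName(float e) {
  if (e <= 0.09) return "DEAD";
  if (e <= 0.8)  return "CENTER";
  if (e <= 5.0)  return "NORMAL";
  if (e <= 15.0) return "LARGE";
  return "EXTREME";
}

unsigned long lastDebugMs = 0;

void debugPrint(float angle, float error, float pid,
                int speed, float angularVelocity) {
  if (millis() - lastDebugMs < 100) return;  // 10 Hz
  lastDebugMs = millis();
  Serial.print("Angle: ");    Serial.print(angle, 2);
  Serial.print(" | Error: "); Serial.print(error, 2);
  Serial.print(" | PID: ");   Serial.print(pid, 2);
  Serial.print(" | Speed: "); Serial.print(speed);
  Serial.print(" | Zone: ");  Serial.print(zoneName(fabs(error)));
  Serial.print(" | Vel: ");   Serial.println(angularVelocity, 1);
}
```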
- Loop Frequency: ~125Hz
- Debug Rate: 10Hz
- Angle Range: ±30°
- Speed Range: 0–255 PWM
- Safety: Cut-off if tilt > 25°
- Reinforcement: PPO correction layered over PID
This hybrid PID + PPO control strategy combines deterministic feedback with adaptive machine learning, creating a robust and intelligent self-balancing robot 🤖✨