diff --git a/reward_analysis.md b/reward_analysis.md
new file mode 100644
index 0000000..17d450f
--- /dev/null
+++ b/reward_analysis.md
@@ -0,0 +1,228 @@
+# Custom Reward Function Analysis
+
+## Issues Found
+
+### 🚨 **Critical Issue: Duplicate Function Definitions**
+
+All four reward functions you asked me to review exist **twice** in your code:
+
+1. **Lines 459-494**: First definitions (wrong docstrings; they look like copy-paste errors)
+2. **Lines 544-622**: Second definitions (correct docstrings)
+
+`gen_reward_manager()` at line 627 uses the SECOND definitions (lines 544-622), but the duplicates are confusing and could cause issues.
+
+---
+
+## Function-by-Function Analysis
+
+### 1. `target_height_reward` (Lines 544-559)
+
+**Current Implementation:**
+```python
+return -((obj.body.position.y - target_height)**2)
+```
+
+**Issues:**
+- ✅ **This is well-implemented** - negative squared L2 distance is a standard reward formulation
+- ✅ Returns a negative reward (penalty) that shrinks as the player approaches the target height
+- ✅ Smooth and continuous, which is good for RL
+
+**Usage Note:**
+- Currently used with weight=0.0 in line 629, so it's effectively disabled
+
+---
+
+### 2. `head_to_middle_reward` (Lines 561-580)
+
+**Current Implementation:**
+```python
+multiplier = -1 if player.body.position.x > 0 else 1
+reward = multiplier * (player.body.position.x - player.prev_x)
+return reward
+```
+
+**Issues:**
+- ⚠️ The sign logic is correct, but the formulation is fragile near x=0 and only looks at horizontal movement
+
+**What the multiplier does:**
+- When `player.position.x > 0` (right side), multiplier is -1
+- When `player.position.x < 0` (left side), multiplier is 1
+- This means:
+  - Player on RIGHT side: reward = `-1 * (current - prev)` = `prev - current`, so moving LEFT (toward the middle) is rewarded
+  - Player on LEFT side: reward = `1 * (current - prev)` = `current - prev`, so moving RIGHT (toward the middle) is rewarded
+- ✅ So the direction logic is right
+
+**However:**
+- It only rewards horizontal movement toward x=0
+- If the player is already at the middle (x ≈ 0), the multiplier flips sign on tiny differences
+- This creates a discontinuity at x=0
+
+**Potential Issues:**
+- The discontinuity at x=0 could confuse the agent
+- No bounds checking - what if the arena is larger than expected?
+
+**Better Implementation Suggestion:**
+```python
+def head_to_middle_reward(env: WarehouseBrawl) -> float:
+    player: Player = env.objects["player"]
+
+    # Distance from middle
+    dist_from_middle = abs(player.body.position.x)
+
+    # Reward for reducing distance (moving toward middle)
+    prev_dist = abs(player.prev_x)
+    reward = prev_dist - dist_from_middle
+
+    return reward
+```
+
+**Why this is better:**
+- Continuous everywhere
+- Directly measures progress toward the middle
+- No sign-flipping issues
+- Works regardless of arena size
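+
+To make the discontinuity point concrete, here's a quick standalone check you can run in a plain Python shell (no env objects needed; `original_reward` / `distance_reward` are just illustrative names for the two formulations above):
+
+```python
+def original_reward(x, prev_x):
+    # multiplier-based version currently in train_agent.py
+    multiplier = -1 if x > 0 else 1
+    return multiplier * (x - prev_x)
+
+def distance_reward(x, prev_x):
+    # distance-based rewrite suggested above
+    return abs(prev_x) - abs(x)
+
+# Three small rightward steps: approaching, crossing, and leaving the middle.
+for prev_x, x in [(-0.10, -0.05), (-0.01, 0.01), (0.05, 0.10)]:
+    print(f"{prev_x:+.2f} -> {x:+.2f}: "
+          f"original={original_reward(x, prev_x):+.2f}, "
+          f"distance={distance_reward(x, prev_x):+.2f}")
+```
+
+Both agree away from x=0, but on the crossing step the multiplier version hands out a penalty (-0.02) for barely overshooting the middle, while the distance version returns zero because the player is exactly as close to the middle as before.
+
+---
+
+### 3. 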
`head_to_opponent` (Lines 582-602)
+
+**Current Implementation:**
+```python
+multiplier = -1 if player.body.position.x > opponent.body.position.x else 1
+reward = multiplier * (player.body.position.x - player.prev_x)
+return reward
+```
+
+**Issues:**
+- ⚠️ **Same discontinuity problem** as `head_to_middle_reward`
+- ⚠️ **Only considers horizontal movement**
+
+**What this does:**
+- If player is RIGHT of opponent, reward = `-1 * (current - prev)` = reward for moving left (toward opponent)
+- If player is LEFT of opponent, reward = `1 * (current - prev)` = reward for moving right (toward opponent)
+- ✅ Logic is correct for the horizontal component
+
+**Problems:**
+1. **No vertical component** - doesn't reward moving up/down toward the opponent
+2. **Discontinuity** when player.x exactly equals opponent.x
+3. **Only tracks the x-axis** - the reward depends on the sign of the horizontal step alone, not on whether the move actually closes the gap to the opponent
+
+**Better Implementation Suggestion:**
+```python
+def head_to_opponent(env: WarehouseBrawl) -> float:
+    player: Player = env.objects["player"]
+    opponent: Player = env.objects["opponent"]
+
+    # Current distance
+    current_dist = np.sqrt(
+        (player.body.position.x - opponent.body.position.x)**2 +
+        (player.body.position.y - opponent.body.position.y)**2
+    )
+
+    # Previous distance
+    prev_dist = np.sqrt(
+        (player.prev_x - opponent.body.position.x)**2 +
+        (player.prev_y - opponent.body.position.y)**2
+    )
+
+    # Reward for getting closer
+    reward = prev_dist - current_dist
+
+    return reward
+```
+
+**Why this is better:**
+- Considers both horizontal AND vertical movement
+- Continuous everywhere
+- Works for any approach direction, not just left/right
+- Natural interpretation: reward proportional to distance reduction
+
+**OR, for movement-based (similar to current):**
+```python
+def head_to_opponent(env: WarehouseBrawl) -> float:
+    player: Player = env.objects["player"]
+    opponent: Player = env.objects["opponent"]
+
+    # Direction vector toward opponent
+    dx = opponent.body.position.x - player.body.position.x
+    dy = opponent.body.position.y - player.body.position.y
+
+    # Normalize (handle division by zero)
+    dist = np.sqrt(dx**2 + dy**2)
+    if dist < 0.001:
+        return 0.0  # Already on top of opponent
+
+    dx_norm = dx / dist
+    dy_norm = dy / dist
+
+    # Movement this frame
+    vel_x = player.body.position.x - player.prev_x
+    vel_y = player.body.position.y - player.prev_y
+
+    # Reward velocity in direction of opponent
+    reward = dx_norm * vel_x + dy_norm * vel_y
+
+    return reward
+```
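+
+The missing vertical component is easy to see with a standalone check (plain Python, no env objects; `xonly_reward` / `euclid_reward` are illustrative names): a player dropping straight down onto the opponent earns nothing from the current x-only formulation, but a positive reward from the distance-based rewrite.
+
+```python
+import math
+
+def xonly_reward(px, prev_px, ox):
+    # x-only, multiplier-based version currently in train_agent.py
+    multiplier = -1 if px > ox else 1
+    return multiplier * (px - prev_px)
+
+def euclid_reward(px, py, prev_px, prev_py, ox, oy):
+    # Euclidean-distance rewrite suggested above
+    prev_dist = math.hypot(prev_px - ox, prev_py - oy)
+    curr_dist = math.hypot(px - ox, py - oy)
+    return prev_dist - curr_dist
+
+# Player falls from y=2 to y=1, directly above an opponent at the origin.
+print(xonly_reward(px=0.0, prev_px=0.0, ox=0.0))                                # 0.0
+print(euclid_reward(px=0.0, py=1.0, prev_px=0.0, prev_py=2.0, ox=0.0, oy=0.0))  # 1.0
+```
+
+---
+
+### 4. 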
`taunt_reward` (Lines 604-622) + +**Current Implementation:** +```python +reward = 1 if isinstance(player.state, TauntState) else 0.0 +return reward * env.dt +``` + +**Issues:** +- āœ… **This is well-implemented** +- āœ… Returns a fixed reward for being in taunt state +- āœ… Multiplied by dt for proper time integration + +**One potential improvement:** +If you want to discourage spamming taunt, you could add a small negative reward when entering taunt from certain states: + +```python +# Could track previous state and penalize frequent taunting +# But current implementation is fine for basic use +``` + +--- + +## Summary Recommendations + +| Function | Current Quality | Recommended Action | +|----------|----------------|-------------------| +| `target_height_reward` | āœ… Good | Keep as-is | +| `head_to_middle_reward` | āš ļø Works but has issues | **Rewrite** to distance-based | +| `head_to_opponent` | āš ļø Only 1D, has issues | **Rewrite** to 2D distance-based | +| `taunt_reward` | āœ… Good | Keep as-is | + +## General Recommendations + +1. **Delete duplicate functions** (lines 459-494) to avoid confusion +2. **Fix head_to_middle** to use distance-based formulation +3. **Fix head_to_opponent** to consider both X and Y movement +4. **Test reward scaling** - make sure magnitudes are appropriate relative to other rewards + +## Context: How prev_x Works + +From code analysis: +- `prev_x` is updated in `physics_process()` after physics step completes (line 3933) +- This means `position.x - prev_x` gives the **change in position this frame** (velocity * dt) +- This is correct for calculating movement rewards + +## Note on Weights + +Your current weights in `gen_reward_manager()`: +- `head_to_middle_reward`: 0.01 +- `head_to_opponent`: 0.05 +- `taunt_reward`: 0.2 + +These seem reasonable for encouraging engagement (head_to_opponent) without overwhelming other signals. + diff --git a/user/grid_search.py b/user/grid_search.py new file mode 100644 index 0000000..f966443 --- /dev/null +++ b/user/grid_search.py @@ -0,0 +1,91 @@ +import sys, os +sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +import pandas as pd +from train_agent import RecurrentPPOAgent, train, TrainLogging, danger_zone_reward, damage_interaction_reward, in_state_reward, on_win_reward, AttackState, RewardManager, RewTerm, SaveHandler, SaveHandlerMode, OpponentsCfg +from functools import partial +from environment.agent import BasedAgent +from environment.environment import CameraResolution +from itertools import product + +""" +Grid Search over reward function weights for training an RL agent. 
+""" + +# Create an array to store results of Grid Search training +results = [] + +# Define search space +search_space = { + 'damage_interaction_reward': [0.5, 1.0], + 'danger_zone_reward': [0.1, 0.5], + 'penalize_attack_reward': [-0.1, -0.04], +} + +for dmg, dz, atk in product( + search_space['damage_interaction_reward'], + search_space['danger_zone_reward'], + search_space['penalize_attack_reward'] +): + print(f"\nšŸš€ Training with dmg={dmg}, dz={dz}, atk={atk}") + + reward_manager = RewardManager( + { + 'danger_zone_reward': RewTerm(func=danger_zone_reward, weight=dz), + 'damage_interaction_reward': RewTerm(func=damage_interaction_reward, weight=dmg), + 'penalize_attack_reward': RewTerm(func=in_state_reward, weight=atk, params={'desired_state': AttackState}), + }, + { + 'on_win_reward': ('win_signal', RewTerm(func=on_win_reward, weight=50)), + } + ) + + my_agent = RecurrentPPOAgent() + save_path = f'checkpoints/gridsearch_d{dmg}_z{dz}_a{atk}' + save_handler = SaveHandler( + agent=my_agent, + save_freq=1000, + save_path=save_path, + run_name=f'exp_d{dmg}_z{dz}_a{atk}', + mode=SaveHandlerMode.FORCE + ) + + opponent_cfg = OpponentsCfg(opponents={'based_agent': (1.0, partial(BasedAgent))}) + + # Run training + try: + train( + my_agent, + reward_manager, + save_handler, + opponent_cfg, + CameraResolution.LOW, + train_timesteps=2000, + train_logging=TrainLogging.PLOT + ) + + # Load training log (usually saved as monitor.csv) + import os + log_path = os.path.join(save_path, "monitor.csv") + if os.path.exists(log_path): + df = pd.read_csv(log_path, skiprows=1) + mean_reward = df['r'].mean() # average episode reward + else: + mean_reward = None + + results.append({ + "damage_interaction_reward": dmg, + "danger_zone_reward": dz, + "penalize_attack_reward": atk, + "mean_reward": mean_reward, + "checkpoint": save_path + }) + except Exception as e: + print(f"āŒ Failed for dmg={dmg}, dz={dz}, atk={atk}: {e}") + +# Save results to CSV for inspection +pd.DataFrame(results).to_csv("gridsearch_results.csv", index=False) + +# Print the best one +best = max(results, key=lambda x: x["mean_reward"] or float('-inf')) +print("\nšŸ† Best configuration:") +print(best) \ No newline at end of file diff --git a/user/grid_search_optuna.py b/user/grid_search_optuna.py new file mode 100644 index 0000000..9073d10 --- /dev/null +++ b/user/grid_search_optuna.py @@ -0,0 +1,86 @@ +import os +import optuna +import pandas as pd +from functools import partial + +from train_agent import ( + RecurrentPPOAgent, train, TrainLogging, + danger_zone_reward, damage_interaction_reward, in_state_reward, + holding_more_than_3_kets, on_win_reward, AttackState, + RewardManager, RewTerm, SaveHandler, SaveHandlerMode, OpponentsCfg +) +from environment.agent import BasedAgent +from environment.environment import CameraResolution + + +def objective(trial): + """ + Defines one optimization trial. + Each trial uses a unique combination of reward weights. 
+ """ + # --- Suggest weights for each reward term --- + dmg = trial.suggest_float("damage_interaction_reward", 0.5, 2.0) + dz = trial.suggest_float("danger_zone_reward", 0.05, 0.5) + atk = trial.suggest_float("penalize_attack_reward", -0.15, -0.02) + hold = trial.suggest_float("holding_more_than_3_kets", -0.5, 0.5) + + # --- Build Reward Manager --- + reward_manager = RewardManager( + { + 'danger_zone_reward': RewTerm(func=danger_zone_reward, weight=dz), + 'damage_interaction_reward': RewTerm(func=damage_interaction_reward, weight=dmg), + 'penalize_attack_reward': RewTerm( + func=in_state_reward, + weight=atk, + params={'desired_state': AttackState} + ), + 'holding_more_than_3_kets': RewTerm( + func=holding_more_than_3_kets, + weight=hold + ), + }, + { + 'on_win_reward': ('win_signal', RewTerm(func=on_win_reward, weight=50)), + } + ) + + # --- Initialize agent, save handler, and opponent --- + my_agent = RecurrentPPOAgent() + save_path = f"checkpoints/optuna_trial_{trial.number}" + os.makedirs(save_path, exist_ok=True) + + save_handler = SaveHandler( + agent=my_agent, + save_freq=10_000, + save_path=save_path, + run_name=f"optuna_trial_{trial.number}", + mode=SaveHandlerMode.FORCE, + ) + + opponent_cfg = OpponentsCfg(opponents={'based_agent': (1.0, partial(BasedAgent))}) + + # --- Train briefly (for evaluation) --- + try: + train( + my_agent, + reward_manager, + save_handler, + opponent_cfg, + CameraResolution.LOW, + train_timesteps=100_000, + train_logging=TrainLogging.NONE + ) + except Exception as e: + print(f"āŒ Trial {trial.number} failed: {e}") + return -999 # Penalize crashed runs + + # --- Load training rewards --- + log_path = os.path.join(save_path, "monitor.csv") + if os.path.exists(log_path): + df = pd.read_csv(log_path, skiprows=1) + mean_reward = df['r'].mean() + print(f"Trial {trial.number}: mean_reward={mean_reward:.2f}") + return mean_reward + else: + print(f"āŒ Trial {trial.number} failed: log file not found.") + return -999 # Penalize missing logs \ No newline at end of file diff --git a/user/pvp_match.py b/user/pvp_match.py index 9ee7d754..b1486a1 100644 --- a/user/pvp_match.py +++ b/user/pvp_match.py @@ -1,17 +1,16 @@ # import skvideo # import skvideo.io from environment.environment import RenderMode -from environment.agent import SB3Agent, CameraResolution, RecurrentPPOAgent, BasedAgent, UserInputAgent, ConstantAgent, run_match, run_real_time_match, gen_reward_manager -from user.my_agent import SubmittedAgent, ConstantAgent +from environment.agent import SB3Agent, CameraResolution, RecurrentPPOAgent, BasedAgent, UserInputAgent, ConstantAgent, run_match, run_real_time_match +from user.my_agent import SubmittedAgent -reward_manager = gen_reward_manager() experiment_dir_1 = "experiment_6/" #input('Model experiment directory name (e.g. experiment_1): ') model_name_1 = "rl_model00_steps" #input('Name of first model (e.g. 
rl_model_100_steps): ')
 
 my_agent = UserInputAgent()
 #opponent = SubmittedAgent(None)
-opponent = ConstantAgent()
+opponent = SubmittedAgent("C:/Users/HMUQRI/Downloads/UTMIST-AI2/checkpoints/experiment_9/rl_model_700007_steps.zip")
 
 # my_agent = UserInputAgent()
 # opponent = ConstantAgent()
diff --git a/user/train_agent.py b/user/train_agent.py
index 7356155..9cc4864 100644
--- a/user/train_agent.py
+++ b/user/train_agent.py
@@ -537,6 +537,108 @@ def on_combo_reward(env: WarehouseBrawl, agent: str) -> float:
         return -1.0
     else:
         return 1.0
+
+# -------------------------------------------------------------------------
+# ------------------------- CUSTOM REWARD FUNCTIONS -----------------------
+
+def target_height_reward(
+    env: WarehouseBrawl,
+    target_height: float,
+    obj_name: str = 'player'
+) -> float:
+    """Reward asset for being close to target height using L2 squared kernel.
+
+    Note:
+        For flat terrain, target height is in the world frame. For rough terrain,
+        sensor readings can adjust the target height to account for the terrain.
+    """
+    # Extract the used quantities (to enable type-hinting)
+    obj: GameObject = env.objects[obj_name]
+
+    # Compute the L2 squared penalty
+    return -((obj.body.position.y - target_height)**2)
+
+def head_to_middle_reward(
+    env: WarehouseBrawl,
+) -> float:
+    """
+    Rewards player for moving towards the middle of the arena.
+
+    Args:
+        env (WarehouseBrawl): The game environment.
+
+    Returns:
+        float: The computed reward.
+
+    Fix: Continuous everywhere, directly measures progress toward the middle, and has no
+    sign-flipping issue.
+
+    """
+    player: Player = env.objects["player"]
+
+    # Distance from middle
+    dist_from_middle = abs(player.body.position.x)
+
+    # Reward for reducing distance (moving toward middle)
+    prev_dist = abs(player.prev_x)
+    reward = prev_dist - dist_from_middle
+
+    return reward
+
+
+def head_to_opponent(
+    env: WarehouseBrawl,
+) -> float:
+    """
+    Rewards player for moving towards the opponent.
+
+    Args:
+        env (WarehouseBrawl): The game environment.
+
+    Returns:
+        float: The computed reward.
+
+    Fix: Considers horizontal and vertical movement via the per-frame reduction in Euclidean distance.
+    """
+    player: Player = env.objects["player"]
+    opponent: Player = env.objects["opponent"]
+
+    # Current distance
+    current_dist = np.sqrt(
+        (player.body.position.x - opponent.body.position.x)**2 +
+        (player.body.position.y - opponent.body.position.y)**2
+    )
+
+    # Previous distance
+    prev_dist = np.sqrt(
+        (player.prev_x - opponent.body.position.x)**2 +
+        (player.prev_y - opponent.body.position.y)**2
+    )
+
+    # Reward for getting closer
+    reward = prev_dist - current_dist
+
+    return reward
+
+def taunt_reward(
+    env: WarehouseBrawl,
+) -> float:
+    """
+    Rewards player for being in the Taunt state.
+
+    Args:
+        env (WarehouseBrawl): The game environment.
+
+    Returns:
+        float: The computed reward.
+    """
+    # Get player object from the environment
+    player: Player = env.objects["player"]
+
+    # Reward if the player is in the Taunt state
+    reward = 1 if isinstance(player.state, TauntState) else 0.0
+
+    return reward * env.dt
 
 '''
 Add your dictionary of RewardFunctions here using RewTerms
@@ -572,7 +674,7 @@ def gen_reward_manager():
 my_agent = CustomAgent(sb3_class=PPO, extractor=MLPExtractor)
 
 # Start here if you want to train from scratch. e.g:
-#my_agent = RecurrentPPOAgent()
+my_agent = RecurrentPPOAgent()
 
 # Start here if you want to train from a specific timestep. e.g:
 #my_agent = RecurrentPPOAgent(file_path='checkpoints/experiment_3/rl_model_120006_steps.zip')
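
Note on `user/grid_search_optuna.py`: the new file above only defines `objective()`; nothing in the diff creates or runs a study. A minimal driver sketch is below, using Optuna's standard `create_study`/`optimize` API — the `n_trials` value, the import path, and the output CSV name are illustrative assumptions, not taken from the repo:

```python
# Hypothetical driver for user/grid_search_optuna.py (not part of the diff above).
import optuna

from grid_search_optuna import objective  # assumes the module layout shown above

if __name__ == "__main__":
    # Maximize the mean episode reward returned by objective()
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)  # n_trials is a placeholder budget

    print("Best trial:", study.best_trial.number)
    print("Best params:", study.best_params)

    # Persist all trial results for later inspection
    study.trials_dataframe().to_csv("optuna_results.csv", index=False)
```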