A hierarchical goal-conditioned reinforcement learning agent for Old School RuneScape, trained via evolutionary search and PPO. The agent learns to play the full game — combat, skilling, banking, shopping, cooking, questing, and navigation — through simulated gameplay at 400+ steps/sec.
OSRS Client <-> DreamBot (AI Agent v7.0)
|
+-- StateEncoder (301-dim game state)
| -> ObservationNormalizer (running mean/variance)
| -> NeuralNet (3-layer trunk + LSTM actor-critic, 384 hidden, 1.68M params)
|
+-- ActionExecutor (93 discrete actions)
| combat, skilling, banking, shopping, navigation, prayer, questing
|
+-- RewardCalculator (40+ reward signals)
| XP, kills, food timing, gear upgrades, chat errors, zone bonuses
|
+-- MetaNN (goal-conditioned context, 128-dim)
| 100 named goals across 6 categories
|
+-- GameRules (safety overrides)
| force eat, force flee, block wilderness, auto prayer flick
|
+-- FoodResupplyFSM (9-state bank loop)
+-- LookaheadEvaluator (value-function lookahead, 8 speculative passes)
+-- ExperienceReplay (4096-slot ring buffer)
+-- HttpMonitor (port 8090, real-time JSON API)
+-- PaintOverlay (in-game HUD with live training metrics)
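The ExperienceReplay component is described only as a 4096-slot ring buffer. Below is a minimal sketch of such a buffer. It assumes uniform sampling, although the real component reportedly favors high-reward transitions, which is not modeled here. `Transition` and `ReplayRingBuffer` are hypothetical names, not classes from this repo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical transition record; field names are illustrative, not taken from the repo. */
record Transition(float[] state, int action, float reward, float[] nextState, boolean done) {}

/** Fixed-capacity ring buffer: once full, each push overwrites the oldest slot. */
final class ReplayRingBuffer {
    private final Transition[] slots;
    private int next = 0;   // index the next push writes to
    private int size = 0;   // number of filled slots (never exceeds capacity)

    ReplayRingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new Transition[capacity];
    }

    void push(Transition t) {
        slots[next] = t;
        next = (next + 1) % slots.length;   // wrap around to the oldest slot
        if (size < slots.length) size++;
    }

    int size() { return size; }

    /** Uniformly samples n transitions with replacement (assumption: no prioritization). */
    List<Transition> sample(int n, Random rng) {
        if (size == 0) throw new IllegalStateException("buffer is empty");
        List<Transition> batch = new ArrayList<>(n);
        for (int i = 0; i < n; i++) batch.add(slots[rng.nextInt(size)]);
        return batch;
    }
}
```

`new ReplayRingBuffer(4096)` matches the slot count above. Once 4096 transitions have been pushed, each new push overwrites the oldest one.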
Player vitals, 23 skills (level + XP progress), 28 inventory slots, 10 nearest NPCs (position + HP + combat status), 5 nearest objects, 5 ground items, 10 active prayers, combat stats, location, equipment tiers, food supply, shop state, quest progress, death/gravestone, potions, temporal deltas, skilling resources, and tool awareness.
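These features pass through ObservationNormalizer, which tracks a running mean and variance. Below is a minimal sketch using Welford's online algorithm, which updates per-dimension statistics in a single pass without storing past observations. The class name `RunningNormalizer`, the epsilon, and the clipping range are assumptions and are not taken from the repo.

```java
/** Per-dimension running mean/variance (Welford's algorithm) used to z-score observations. */
final class RunningNormalizer {
    private final double[] mean;
    private final double[] m2;   // running sum of squared deviations from the current mean
    private long count = 0;
    private final double eps;    // assumption: guards against division by zero for constant features
    private final double clip;   // assumption: normalized values are clamped to [-clip, clip]

    RunningNormalizer(int dims, double eps, double clip) {
        mean = new double[dims];
        m2 = new double[dims];
        this.eps = eps;
        this.clip = clip;
    }

    /** Folds one observation into the running statistics. */
    void update(double[] x) {
        if (x.length != mean.length) throw new IllegalArgumentException("dimension mismatch");
        count++;
        for (int i = 0; i < x.length; i++) {
            double delta = x[i] - mean[i];
            mean[i] += delta / count;
            m2[i] += delta * (x[i] - mean[i]);   // uses the updated mean
        }
    }

    double mean(int i) { return mean[i]; }

    /** Sample variance; returns 1.0 until two observations have been seen. */
    double variance(int i) { return count > 1 ? m2[i] / (count - 1) : 1.0; }

    /** Returns (x - mean) / sqrt(var + eps), clamped to [-clip, clip]. */
    double[] normalize(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / Math.sqrt(variance(i) + eps);
            out[i] = Math.max(-clip, Math.min(clip, z));
        }
        return out;
    }
}
```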
No-op, 8 walk directions, attack (3 targets), eat, toggle run, bank operations (open/deposit/withdraw), loot pickup, bone burial, 12 prayer toggles/flicks, special attack, equip gear, flee, combat styles, potions, chop/mine/fish, drop junk, climb/doors, home teleport, NPC dialogue, shop operations, magic spells, slayer tasks, gravestone recovery, cooking.
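ActionExecutor masks actions that are invalid in the current state. One common way to apply such a mask is to drop masked logits before the softmax so that they receive exactly zero probability. The sketch below assumes the network emits one logit per action and that the mask arrives as a `boolean[]`. `MaskedPolicy` is a hypothetical name, and how ActionExecutor actually builds its mask is not shown.

```java
import java.util.Random;

/** Masked categorical sampling: invalid actions get zero probability before sampling. */
final class MaskedPolicy {
    /** Softmax restricted to actions where mask[i] is true; masked entries stay 0. */
    static double[] maskedSoftmax(double[] logits, boolean[] mask) {
        if (logits.length != mask.length) throw new IllegalArgumentException("length mismatch");
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < logits.length; i++) if (mask[i]) max = Math.max(max, logits[i]);
        if (max == Double.NEGATIVE_INFINITY) throw new IllegalArgumentException("no valid actions");
        double[] p = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            if (mask[i]) {
                p[i] = Math.exp(logits[i] - max);   // subtract max for numerical stability
                sum += p[i];
            }
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    /** Draws an action index from the masked distribution by inverse-CDF sampling. */
    static int sample(double[] logits, boolean[] mask, Random rng) {
        double[] p = maskedSoftmax(logits, mask);
        double u = rng.nextDouble();
        double cum = 0.0;
        int last = -1;
        for (int i = 0; i < p.length; i++) {
            if (!mask[i]) continue;
            last = i;
            cum += p[i];
            if (u < cum) return i;
        }
        return last;   // guards against round-off leaving cum slightly below 1
    }
}
```

With 93 actions, `mask[i]` would be false for, say, a withdraw action while the bank is closed (an illustrative example, not taken from the repo).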
Evolutionary Search (primary): Population-based parallel evaluation. 48 agents per generation, evaluated across 15 threads. Fitness rewards activity diversity — agents that fight, skill, bank, and eat score exponentially higher than specialists. Includes crossover breeding between elite parents.
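The exact fitness formula is not given here. Below is a hypothetical sketch of the two ideas described. The first is a score that grows geometrically with the number of distinct activities performed. The second is uniform crossover between two elite parents' flattened weight vectors, followed by Gaussian mutation. `EvoOps`, the multiplier `base`, and the mutation scheme are all assumptions.

```java
import java.util.Random;

/** Hypothetical diversity-weighted fitness and uniform crossover over flat weight vectors. */
final class EvoOps {
    /**
     * counts[k] is how often the agent performed activity k (e.g. fight, skill, bank, eat).
     * The base score is multiplied by base^(number of distinct activities), so generalists dominate.
     */
    static double fitness(int[] counts, double baseScore, double base) {
        int distinct = 0;
        for (int c : counts) if (c > 0) distinct++;
        return baseScore * Math.pow(base, distinct);
    }

    /** Uniform crossover: each weight comes from parent a or b with probability 0.5, then Gaussian mutation. */
    static float[] crossover(float[] a, float[] b, double mutationStd, Random rng) {
        if (a.length != b.length) throw new IllegalArgumentException("parents differ in size");
        float[] child = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            float w = rng.nextBoolean() ? a[i] : b[i];
            child[i] = (float) (w + rng.nextGaussian() * mutationStd);
        }
        return child;
    }
}
```

With `base = 2`, consider an agent active in all four categories and a single-activity specialist with the same base score. The generalist's multiplier is 2^4 = 16 against the specialist's 2^1 = 2, an 8x gap. This is the kind of exponential separation the description implies.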
./gradlew evoTrain -Pargs="--generations 100"
PPO (online fine-tuning): Proximal Policy Optimization with 128-step buffer for rapid adaptation during live gameplay. LSTM hidden states preserved across updates. Experience replay injects high-reward transitions.
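The sketch below covers the two standard PPO ingredients. The first is Generalized Advantage Estimation over the rollout buffer, which is 128 steps here. The second is the clipped surrogate objective. The README does not state the discount `gamma`, the GAE `lambda`, or the clip range. `PpoMath` is a hypothetical class, and the LSTM state handling is not shown.

```java
/** Sketch of two PPO building blocks: GAE advantages and the clipped surrogate objective. */
final class PpoMath {
    /**
     * Generalized Advantage Estimation over one rollout.
     * values has length n + 1: values[n] is the bootstrap value of the state after the last step.
     */
    static double[] gae(double[] rewards, double[] values, boolean[] dones, double gamma, double lambda) {
        int n = rewards.length;
        if (values.length != n + 1 || dones.length != n) throw new IllegalArgumentException("bad lengths");
        double[] adv = new double[n];
        double running = 0.0;
        for (int t = n - 1; t >= 0; t--) {
            double notDone = dones[t] ? 0.0 : 1.0;   // no bootstrapping across episode ends
            double delta = rewards[t] + gamma * values[t + 1] * notDone - values[t];
            running = delta + gamma * lambda * notDone * running;
            adv[t] = running;
        }
        return adv;
    }

    /** Clipped surrogate objective for one sample (maximize it; negate it for a loss). */
    static double clippedObjective(double newLogProb, double oldLogProb, double advantage, double clipEps) {
        double ratio = Math.exp(newLogProb - oldLogProb);   // pi_new(a|s) / pi_old(a|s)
        double clipped = Math.max(1.0 - clipEps, Math.min(1.0 + clipEps, ratio));
        return Math.min(ratio * advantage, clipped * advantage);   // pessimistic bound
    }
}
```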
./gradlew offlineTrain -Pargs="--gamesim --epochs 500"
Full OSRS game loop in pure Java; no DreamBot or game client needed. Models combat (hit chance, max hit, prayer protection), skilling (woodcutting, mining, fishing, cooking with level-based success rates), banking, shopping, inventory management, equipment, food chains, NPC spawns/respawns, and ground items. Action completion timing models real game delays (walking takes multiple ticks, skilling has animation time).
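The simulator's hit-chance and max-hit logic is not documented here. As a point of reference, the sketch below follows the melee accuracy and max-hit formulas published on the OSRS Wiki. It leaves out set effects, special attacks, and prayer protection. Whether GameSim uses exactly these formulas is an assumption, and `MeleeFormulas` is a hypothetical name. The caller is expected to compute effective levels: the boosted level after prayer multipliers, plus the stance bonus and 8.

```java
/** Melee accuracy and max hit per the OSRS Wiki formulas (simplified: no set effects or specials). */
final class MeleeFormulas {
    /** Attack or defence roll = effective level * (equipment bonus + 64). */
    static long roll(int effectiveLevel, int equipmentBonus) {
        return (long) effectiveLevel * (equipmentBonus + 64);
    }

    /** Probability that the attack roll beats the defence roll. */
    static double hitChance(long attackRoll, long defenceRoll) {
        if (attackRoll > defenceRoll) {
            return 1.0 - (defenceRoll + 2.0) / (2.0 * (attackRoll + 1.0));
        }
        return attackRoll / (2.0 * (defenceRoll + 1.0));
    }

    /** Max hit = floor(0.5 + effectiveStrength * (strengthBonus + 64) / 640). */
    static int maxHit(int effectiveStrength, int strengthBonus) {
        return (int) Math.floor(0.5 + effectiveStrength * (strengthBonus + 64) / 640.0);
    }
}
```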
25+ named locations with real OSRS coordinates. 10 NPC types across multiple spawn areas. Resource nodes for trees, rocks, and fishing spots. 5 shops with real item prices. 80+ items with real OSRS item IDs.
dreambot-ai/
dreambot-script/ # Main AI agent (Java/Gradle)
src/main/java/com/osrsai/
AIAgent.java # Script entry point, tick loop, component wiring
StateEncoder.java # 301-dim state vector
ActionExecutor.java # 93-action execution + masking
RewardCalculator.java # 40+ reward signals
GameRules.java # Safety overrides + teaching rules
PPOTrainer.java # PPO with experience replay
nn/NeuralNet.java # LSTM actor-critic forward/backward
nn/LSTMCell.java # LSTM cell with cell state clamping
sim/GameSim.java # Full game simulator
sim/EvoTrainer.java # Evolutionary trainer
sim/SimWorld.java # World map, NPCs, resources, shops
sim/SimPlayer.java # Player state, skills, equipment
sim/ChainTests.java # Action chain validation tests
...
mcp-server/ # MCP bridge (Node.js, STDIO)
wiki-mcp-server/ # OSRS Wiki API (search, prices)
dreambot-docs-mcp/ # DreamBot API docs
runelite-mcp-plugin/ # RuneLite plugin (port 8089)
neural-net/ # Legacy Python RL (superseded by Java)
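LSTMCell.java is described as clamping the cell state. Below is a minimal sketch of one LSTM step with such a clamp. One plausible purpose is to stop `c` from drifting over long episodes. The gate order (input, forget, candidate, output), the weight layout, and the limit are assumptions, and `ClampedLstmCell` is a hypothetical name.

```java
/** One LSTM step with the cell state clamped to [-limit, limit]. */
final class ClampedLstmCell {
    final int in;
    final int hid;
    final double[][] w;   // 4*hid rows, each of length in + hid; assumed gate order: i, f, g, o
    final double[] b;     // 4*hid biases

    ClampedLstmCell(int in, int hid, double[][] w, double[] b) {
        if (w.length != 4 * hid || b.length != 4 * hid) throw new IllegalArgumentException("expected 4*hid gate rows");
        for (double[] row : w) if (row.length != in + hid) throw new IllegalArgumentException("row must span input + hidden");
        this.in = in;
        this.hid = hid;
        this.w = w;
        this.b = b;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** Updates h and c in place; all gate pre-activations are computed before h is overwritten. */
    void step(double[] x, double[] h, double[] c, double limit) {
        double[] z = new double[4 * hid];
        for (int r = 0; r < 4 * hid; r++) {
            double s = b[r];
            for (int j = 0; j < in; j++) s += w[r][j] * x[j];
            for (int j = 0; j < hid; j++) s += w[r][in + j] * h[j];
            z[r] = s;
        }
        for (int k = 0; k < hid; k++) {
            double inputGate = sigmoid(z[k]);
            double forgetGate = sigmoid(z[hid + k]);
            double candidate = Math.tanh(z[2 * hid + k]);
            double outputGate = sigmoid(z[3 * hid + k]);
            double cNew = forgetGate * c[k] + inputGate * candidate;
            c[k] = Math.max(-limit, Math.min(limit, cNew));   // the clamp
            h[k] = outputGate * Math.tanh(c[k]);
        }
    }
}
```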
cd dreambot-script
./gradlew shadowJar
# Outputs AIAgent.jar to ~/DreamBot/Scripts/
./gradlew evoTrain -Pargs="--generations 100"
# 48 agents x 15 threads, ~30s/generation
./gradlew offlineTrain -Pargs="--gamesim --epochs 500"
# ~50 steps/sec, checkpoint saved every 100 epochs
./gradlew chainTest
# Validates food chain, gear upgrade, banking loop, combat sustain, skilling
Start DreamBot, enable "AI Agent" script. The agent loads trained weights from ~/DreamBot/Scripts/osrsai_models/model.osnn automatically.
curl http://127.0.0.1:8090/agent # Agent state
curl http://127.0.0.1:8090/rewards # Reward breakdown
curl http://127.0.0.1:8090/stats # Session stats
curl http://127.0.0.1:8090/training # Training metrics
- Java 17+
- Gradle 8.x
- DreamBot 4.x (for live deployment)
- 16GB RAM recommended for training (evolutionary trainer uses 15 parallel threads)
Private repository. All rights reserved.
Built by drlor with Claude Opus 4.6.