Skip to content

Latest commit

 

History

History
31 lines (24 loc) · 2.92 KB

File metadata and controls

31 lines (24 loc) · 2.92 KB

Smart City Meta-Orchestration: Analytical Architecture

Meta AI Hackathon Capstone - Technical Whitepaper

To secure the $7,000 Grand Prize, an application must transcend basic code and prove a deep understanding of mathematical Reinforcement Learning. This document outlines the rigorous academic framework driving this orchestration environment.

1. Reinforcement Learning Environment (MDP Architecture)

The core environment.py is structured upon a fully bounded Markov Decision Process (MDP).

  • States ($S$): Represented via a highly structured multidimensional array (F, P, T, W, R_idle, R_busy) denoting Fire Severity, Patient Triaging, Traffic Flow, Weather Chaos, and Resource Pools.
  • Actions ($A$): A dispatch vector simulating resource deployment subject strictly to idle limitations.
  • State Transition Physics (step() & reset()): The environment mimics continuous temporal mechanics by embedding a tick_deployment() array. When an AI dispatches a resource, it is dynamically shifted to $R_{busy}$. A hidden dynamic latency algorithm determines the cooldown steps required to return to $R_{idle}$, integrating friction variables such as STORM perturbations and GRIDLOCK delays.

2. Dimensional Reward Shaping & Penalty Bounding

Standard binary reward architectures (+1/-1) fail to effectively evaluate advanced Meta-Agents. Our reward.py implements a bespoke Wastage Density Evaluator:

  • Baseline Allocation: +10 Reward for perfectly mapping resource trajectories to severity requirements.
  • Hazard Multipliers: Operations under Hurricane weather profiles receive a mathematical bonus scaler for hazard resilience.
  • Wastage Penalty ($W_p$): Sub-optimal deployment mapping scales a strict negative linear decay preventing brute-force agent behaviors ($W_p = -2.0 \times R_{excess}$).

3. Multi-Task Curriculum Escalation

To fulfill the evaluation engine constraints, agents operate within a curriculum of 3 escalating, dynamic Tasks bound by the standardized [START] → [STEP] → [END] lifecycle.

  • Task 1: Baseline dispatch constraints.
  • Task 2: Adds dynamic weather friction matrices.
  • Task 3 (Meta-Crisis): Forces the LLM agent to circumvent baseline rules by deploying proactive Traffic Police resources to dissolve Gridlock physics that would otherwise mathematically lock medical rescues.

4. Bounded Evaluation Matrices (Grader)

The grader.py rejects basic success rate logging in favor of a mathematically scaled vector algorithm: $$ Final Score = \left( SuccessRate \times 0.50 \right) + \left( Efficiency \times 0.50 \right) $$

Efficiency tracks the cumulative episode total_reward against the absolute theoretical maximum positive payload capacity of the local initial state. Only an AI that avoids any cascading micro-failures and deploys perfect allocations can achieve a consistent 1.0 bounded baseline.


Built to assert total technical dominance in the Meta AI Hackathon.