diff --git a/README.md b/README.md
index 8180e20..a9448e5 100644
--- a/README.md
+++ b/README.md
@@ -1,86 +1,176 @@
-## How to Start the Backend with Docker (Development)
-
-To spin up the backend and its supporting services in development mode:
-
-1. **Install & run Docker** on your machine.
-2. **Clone** the repository and `cd` into its root.
-3. Execute:
-
-   ```bash
-   bash ./scripts/init-docker-dev.sh
-   ```
-
-   This will launch:
-
-   * A **PostgreSQL** container
-   * A **Backend** container that mounts your local `src/` folder with live-reload
-
----
-
-## Development Architecture & Philosophy
-
-We split responsibilities between Docker-managed services and local workflows:
+-----
+
+# **FastSim Project Overview**
+
+## **1. Why FastSim?**
+
+FastAPI + Uvicorn gives Python teams a lightning-fast async stack, yet sizing it for production still means guesswork, costly cloud load-tests, or late surprises. **FastSim** fills that gap by becoming a **digital twin** of your actual service:
+
+ * It **replicates** your FastAPI + Uvicorn event-loop behavior in SimPy, generating the same kinds of asynchronous steps (parsing, CPU work, I/O, LLM calls) that happen in real code.
+ * It **models** your infrastructure primitives—CPU cores (via a SimPy `Resource`), database pools, rate-limiters, and even GPU inference quotas—so you can see queue lengths, scheduling delays, resource utilization, and end-to-end latency.
+ * It **outputs** the very metrics you would scrape in production (p50/p95/p99 latency, ready-queue lag, concurrency, throughput, cost per LLM call), but entirely offline, in seconds.
+
+With FastSim you can ask, *“What happens if traffic doubles on Black Friday?”*, *“How many cores are needed to keep p95 latency below 100 ms?”*, or *“Is our LLM-driven endpoint ready for prime time?”*—and get quantitative answers **before** you deploy.
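The capacity questions above are, at heart, queueing questions. As a back-of-the-envelope illustration of the kind of answer FastSim computes (FastSim itself uses SimPy and a full topology model; this hand-rolled loop and all of its numbers are invented purely for illustration), the sketch below pushes evenly spaced requests through a single-core server and reports the p95 end-to-end latency:

```python
def simulate(arrival_interval: float, service_time: float, n_requests: int) -> float:
    """Toy single-core queue: return the p95 end-to-end latency in seconds."""
    latencies = []
    core_free_at = 0.0  # time at which the single core next becomes idle
    for i in range(n_requests):
        arrival = i * arrival_interval      # evenly spaced arrivals
        start = max(arrival, core_free_at)  # queue behind earlier requests
        finish = start + service_time
        core_free_at = finish
        latencies.append(finish - arrival)  # queueing delay + service time
    latencies.sort()
    return latencies[int(0.95 * (len(latencies) - 1))]

# Below saturation (50 rps against a core that can do 100 rps),
# p95 sits at the bare service time.
print(simulate(arrival_interval=0.02, service_time=0.01, n_requests=1000))
# "Traffic doubles" past saturation (200 rps): the backlog, and with it
# the p95, keeps growing for as long as the load lasts.
print(simulate(arrival_interval=0.005, service_time=0.01, n_requests=1000))
```

FastSim replaces these fixed intervals and service times with sampled distributions, multi-step endpoints, and shared resources, but the underlying question it answers is the same: where does latency stay flat, and where does the queue take off?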
+
+**Outcome:** Data-driven capacity planning, early performance tuning, and far fewer surprises in production.
+
+## **2. Project Goals**
+
+| \# | Goal | Practical Outcome |
+| :--- | :--- | :--- |
+| 1 | **Pre-production sizing** | Know the required core count, pool size, and replica count to meet your SLA. |
+| 2 | **Scenario analysis** | Explore various traffic models, endpoint mixes, latency distributions, and RTT. |
+| 3 | **Twin metrics** | Produce the same metrics you’ll scrape in production (latency, queue length, CPU utilization). |
+| 4 | **Rapid iteration** | A single YAML/JSON configuration or REST call generates a full performance report. |
+| 5 | **Educational value** | Visualize how GIL contention, queue length, and concurrency react to load. |
+
+## **3. Who Benefits & Why**
+
+| Audience | Pain-Point Solved | FastSim Value |
+| :--- | :--- | :--- |
+| **Backend Engineers** | Unsure if a 4-vCPU container can survive a marketing traffic spike. | Run *what-if* scenarios, tweak CPU cores or pool sizes, and get p95 latency and max-concurrency metrics before merging code. |
+| **DevOps / SRE** | Guesswork in capacity planning; high cost of over-provisioning. | Simulate 1 to N replicas, autoscaler thresholds, and database pool sizes to find the most cost-effective configuration that meets the SLA. |
+| **ML / LLM Product Teams** | LLM inference cost and latency are difficult to predict. | Model the LLM step with a price and latency distribution to estimate cost-per-request and the benefits of GPU batching without needing real GPUs. |
+| **Educators / Trainers** | Students struggle to visualize event-loop internals. | Visualize GIL ready-queue lag, CPU vs. I/O steps, and the effect of blocking code—perfect for live demos and labs. |
+| **Consultants / Architects** | Need a quick proof-of-concept for new client designs. | Define endpoints in YAML and demonstrate throughput and latency under projected load in minutes. |
+| **Open-Source Community** | Lacks a lightweight Python simulator for ASGI workloads. | An extensible codebase makes it easy to plug in new resources (e.g., rate-limiters, caches) or traffic models (e.g., spike, uniform ramp). |
+| **System-Design Interviewees** | Hard to quantify trade-offs in whiteboard interviews. | Prototype real-time metrics—queue lengths, concurrency, latency distributions—to demonstrate how your design scales and where bottlenecks lie. |
+
+## **4. About This Documentation**
+
+This project contains extensive documentation covering its vision, architecture, and technical implementation. The documents are designed to be read in sequence to build a comprehensive understanding of the project.
+
+### **How to Read This Documentation**
+
+For the best understanding of FastSim, we recommend reading the documentation in the following order:
+
+1. **README.md (This Document)**: Start here for a high-level overview of the project's purpose, goals, target audience, and development workflow. It provides the essential context for all other documents.
+2. **dev_worflow_guide**: This document details the GitHub workflow for development.
+3. **simulation_input**: This document details the technical contract for configuring a simulation. It explains the `SimulationPayload` and its components (`rqs_input`, `topology_graph`, `sim_settings`). This is essential reading for anyone who will be creating or modifying simulation configurations.
+4. **runtime_and_resources**: A deep dive into the simulation's internal engine. It explains how the validated input is transformed into live SimPy processes (Actors, Resources, State). This is intended for advanced users or contributors who want to understand *how* the simulation works under the hood.
+5. **requests_generator**: This document covers the mathematical and algorithmic details behind the traffic generation model. It is for those interested in the statistical foundations of the simulator.
+6. **Simulation Metrics**: A comprehensive guide to all output metrics. It explains what each metric measures, how it's collected, and why it's important for performance analysis.
+
+Optionally, **fastsim_vision** provides a more detailed look at the project vision.
+
+All of these documents live in the `documentation/` folder at the root of the project.
+
+## **5. Development Workflow & Architecture Guide**
+
+This section outlines the standardized development workflow, repository architecture, and branching strategy for the FastSim backend.
+
+### **Technology Stack**
+
+ * **Backend**: FastAPI
+ * **Backend Package Manager**: Poetry
+ * **Frontend**: React + JavaScript
+ * **Database**: PostgreSQL
+ * **Caching**: Redis
+ * **Containerization**: Docker
+
+### **Backend Service (`FastSim-backend`)**
+
+The repository hosts the entire FastAPI backend, which exposes the REST API, runs the discrete-event simulation, communicates with the database, and provides metrics.
+
+```
+fastsim-backend/
+├── Dockerfile
+├── docker_fs/
+│   ├── docker-compose.dev.yml
+│   └── docker-compose.prod.yml
+├── scripts/
+│   ├── init-docker-dev.sh
+│   └── quality-check.sh
+├── alembic/
+│   ├── env.py
+│   └── versions/
+├── documentation/
+│   └── backend_documentation/
+├── tests/
+│   ├── unit/
+│   └── integration/
+├── src/
+│   └── app/
+│       ├── api/
+│       ├── config/
+│       ├── db/
+│       ├── metrics/
+│       ├── resources/
+│       ├── runtime/
+│       │   ├── rqs_state.py
+│       │   └── actors/
+│       ├── samplers/
+│       ├── schemas/
+│       ├── main.py
+│       └── simulation_run.py
+├── poetry.lock
+├── pyproject.toml
+└── README.md
+```
+
+### **How to Start the Backend with Docker (Development)**
- -**Why?** - -* **Fater feedback** on code changes -* **Full IDE support** (debugging, autocomplete, refactoring) -* **Speed**—no rebuilding images on every change - ---- - -### Local Quality & Testing Workflow - -All code quality tools, migrations, and tests execute on your host machine: - -| Task | Command | Notes | -| --------------------- | ---------------------------------------- | ------------------------------------------------- | -| **Lint & format** | `poetry run ruff check src tests` | Style and best-practice validations | -| **Type checking** | `poetry run mypy src tests` | Static type enforcement | -| **Unit tests** | `poetry run pytest -m "not integration"` | Fast, isolated tests—no DB required | -| **Integration tests** | `poetry run pytest -m integration` | Real-DB tests against Docker’s PostgreSQL | -| **DB migrations** | `poetry run alembic upgrade head` | Applies migrations to your local Docker-hosted DB | +To spin up the backend and its supporting services in development mode: -> **Rationale:** -> Running tests or Alembic migrations inside Docker images would force you to mount the full source tree, install dev dependencies in each build, and copy over configs—**slowing down** your feedback loop and **limiting** IDE features. +1. **Install & run Docker** on your machine. +2. **Clone** the repository and `cd` into its root. +3. Execute: + ```bash + bash ./scripts/init-docker-dev.sh + ``` + This will launch a **PostgreSQL** container and a **Backend** container that mounts your local `src/` folder with live-reload enabled. ---- +### **Development Architecture & Philosophy** -## CI/CD with GitHub Actions +We split responsibilities between Docker-managed services and local workflows. -We maintain two jobs on the `develop` branch: + * **Docker-Compose for Development**: Containers host external services (PostgreSQL) and run the FastAPI app. Your local `src/` directory is mounted into the backend container for hot-reloading. 
No tests, migrations, or linting run inside these containers during development. + * **Local Quality & Testing Workflow**: All code quality tools, migrations, and tests are executed on your host machine for faster feedback and full IDE support. -### 🔍 Quick (on Pull Requests) +| Task | Command | Notes | +| :--- | :--- | :--- | +| **Lint & format** | `poetry run ruff check src tests` | Style and best-practice validations | +| **Type checking** | `poetry run mypy src tests` | Static type enforcement | +| **Unit tests** | `poetry run pytest -m "not integration"` | Fast, isolated tests—no DB required | +| **Integration tests** | `poetry run pytest -m integration` | Real-DB tests against Docker’s PostgreSQL | +| **DB migrations** | `poetry run alembic upgrade head` | Applies migrations to your local Docker-hosted DB | -* Ruff & MyPy -* Unit tests only -* **No database** +**Rationale**: Running tests or Alembic migrations inside Docker images would slow down your feedback loop and limit IDE features by requiring you to mount the full source tree and install dev dependencies in each build. -### 🛠️ Full (on pushes to `develop`) +## **6. CI/CD with GitHub Actions** -* All **Quick** checks -* Start a **PostgreSQL** service container -* Run **Alembic** migrations -* Execute **unit + integration** tests -* Build the **Docker** image -* **Smoke-test** the `/health` endpoint +We maintain two jobs on the `develop` branch to ensure code quality and stability. -> **Guarantee:** Every commit in `develop` is style-checked, type-safe, DB-tested, and Docker-ready. +### **Quick (on Pull Requests)** ---- + * Ruff & MyPy checks + * Unit tests only + * **No database required** -## Summary +### **Full (on pushes to `develop`)** -1. **Docker-Compose** for services & hot-reload of the app code -2. **Local** execution of migrations, tests, and QA for speed and IDE integration -3. 
**CI pipeline** split into quick PR checks and full develop-branch validation + * All checks from the "Quick" suite + * Starts a **PostgreSQL** service container + * Runs **Alembic** migrations + * Executes the **full test suite** (unit + integration) + * Builds the **Docker** image + * **Smoke-tests** the `/health` endpoint of the built container +**Guarantee**: Every commit in `develop` is style-checked, type-safe, database-tested, and Docker-ready. +## **7. Limitations – v0.1 (First Public Release)** +1. **Network Delay Model** + * Only pure transport latency is simulated. + * Bandwidth-related effects (e.g., payload size, link speed, congestion) are NOT accounted for. +2. **Concurrency Model** + * The service exposes **async-only endpoints**. + * Execution runs on a single `asyncio` event-loop thread. + * No thread-pool workers or multi-process setups are supported yet; therefore, concurrency is limited to coroutine scheduling (cooperative, single-thread). +3. **CPU Core Allocation** + * Every server instance is pinned to **one physical CPU core**. + * Horizontal scaling must be achieved via multiple containers/VMs, not via multi-core utilization inside a single process. +These constraints will be revisited in future milestones once kernel-level context-switching costs, I/O bandwidth modeling, and multi-process orchestration are integrated. \ No newline at end of file diff --git a/documentation/backend_documentation/input_structure_for_the_simulation.md b/documentation/backend_documentation/input_structure_for_the_simulation.md deleted file mode 100644 index d28b090..0000000 --- a/documentation/backend_documentation/input_structure_for_the_simulation.md +++ /dev/null @@ -1,320 +0,0 @@ -### **FastSim — Simulation Input Schema** - -The `SimulationPayload` is the single, self-contained contract that defines an entire simulation run. Its architecture is guided by a core philosophy: to achieve maximum control over input data through robust, upfront validation. 
To implement this, we extensively leverage Pydantic's powerful validation capabilities and Python's `Enum` classes. This approach creates a strictly-typed and self-consistent schema that guarantees any configuration is validated *before* the simulation engine starts. - -This contract brings together three distinct but interconnected layers of configuration into one cohesive structure: - -1. **`rqs_input` (`RqsGeneratorInput`)**: Defines the **workload profile**—how many users are active and how frequently they generate requests. -2. **`topology_graph` (`TopologyGraph`)**: Describes the **system's architecture**—its components, resources, and the network connections between them. -3. **`sim_settings` (`SimulationSettings`)**: Configures **global simulation parameters**, such as total runtime and which metrics to collect. - -This layered design decouples the *what* (the system topology) from the *how* (the traffic pattern and simulation control), allowing for modular and reusable configurations. Adherence to our validation-first philosophy means every payload is rigorously parsed against this schema. By using a controlled vocabulary of `Enums` and the power of Pydantic, we guarantee that any malformed or logically inconsistent input is rejected upfront with clear, actionable errors, ensuring the simulation engine operates only on perfectly valid data. - ---- - -### **1. Component: Traffic Profile (`RqsGeneratorInput`)** - -This component specifies the dynamic behavior of users interacting with the system. It is built upon a foundation of shared constants and a reusable, rigorously validated random variable schema. This design ensures that any traffic profile is not only structurally correct but also logically sound before the simulation begins. - -#### **Global Constants** - -These enums provide a single source of truth for validation and default values, eliminating "magic strings" and ensuring consistency. 
- -| Constant Set | Purpose | Key Values | -| :--- | :--- | :--- | -| **`TimeDefaults`** (`IntEnum`) | Defines default values and validation bounds for time-based fields. | `USER_SAMPLING_WINDOW = 60`, `MIN_USER_SAMPLING_WINDOW = 1`, `MAX_USER_SAMPLING_WINDOW = 120` | -| **`Distribution`** (`StrEnum`) | Defines the canonical names of supported probability distributions. | `"poisson"`, `"normal"`, `"log_normal"`, `"exponential"` | - ---- - -#### **Random Variable Schema (`RVConfig`)** - -At the core of the traffic generator is the `RVConfig`, a schema for defining stochastic variables. This allows critical parameters like user population and request rates to be modeled not as fixed numbers, but as draws from a probability distribution. Pydantic validators are used extensively to enforce correctness. - -```python -class RVConfig(BaseModel): - """class to configure random variables""" - mean: float - distribution: Distribution = Distribution.POISSON - variance: float | None = None - - @field_validator("mean", mode="before") - def ensure_mean_is_numeric(cls, v: object) -> float: - # ... implementation ... - - @model_validator(mode="after") - def default_variance(cls, model: "RVConfig") -> "RVConfig": - # ... implementation ... -``` - -##### **Built-in Validation Logic** - -Pydantic's validation system is leveraged to enforce several layers of correctness directly within the schema: - -| Check | Pydantic Hook | Rule & Rationale | -| :--- | :--- | :--- | -| **Numeric `mean` Enforcement** | `@field_validator("mean", mode="before")` | This validator intercepts the `mean` field *before* any type casting. It ensures the provided value is an `int` or `float`, raising an explicit `ValueError` for invalid types like strings (`"100"`) or nulls. This prevents common configuration errors and guarantees a valid numeric type for all downstream logic. 
| -| **Valid `distribution` Name** | `Distribution` (`StrEnum`) type hint | By type-hinting the `distribution` field with the `Distribution` enum, Pydantic automatically ensures that its value must be one of the predefined members (e.g., `"poisson"`, `"normal"`). Any typo or unsupported value (like `"Poisson"` with a capital 'P') results in an immediate validation error. | -| **Intelligent `variance` Defaulting** | `@model_validator(mode="after")` | This powerful validator runs *after* all individual fields have been validated. It enforces a crucial business rule: if `distribution` is `"normal"` **and** `variance` is not provided, the schema automatically sets `variance = mean`. This provides a safe, logical default and simplifies configuration for the user, while ensuring the model is always self-consistent. | - ---- - -#### **Payload Structure (`RqsGeneratorInput`)** - -This is the main payload for configuring the traffic workload. It composes the `RVConfig` schema and adds its own validation rules. - -| Field | Type | Validation & Purpose | -| :--- | :--- | :--- | -| `avg_active_users` | `RVConfig` | A random variable defining concurrent users. **Inherits all `RVConfig` validation**, ensuring its `mean`, `distribution`, and `variance` are valid. | -| `avg_request_per_minute_per_user` | `RVConfig` | A random variable for the user request rate. Also **inherits all `RVConfig` validation**. | -| `user_sampling_window` | `int` | The time duration (in seconds) for which the number of active users is held constant. Its value is **strictly bounded** by Pydantic's `Field` to be between `MIN_USER_SAMPLING_WINDOW` (1) and `MAX_USER_SAMPLING_WINDOW` (120). | - -##### **How the Generator Uses Each Field** - -The simulation evolves based on this robustly validated input: - -1. The timeline is divided into windows of `user_sampling_window` seconds. 
Because this value is range-checked upfront by Pydantic, the simulation is protected from invalid configurations like zero-length or excessively long windows.
-2. At the start of each window, a number of active users, `U(t)`, is drawn from the `avg_active_users` distribution. The embedded `RVConfig` guarantees this distribution is well-defined.
-3. Each of the `U(t)` users generates requests according to a rate drawn from `avg_request_per_minute_per_user`.
-
-Because every numeric input is type-checked and range-checked by Pydantic before the simulation begins, **the runtime engine never needs to defend itself** against invalid data. This makes the core simulation loop leaner, more predictable, and free from redundant error-handling logic.
-
-### **2. Component: System Blueprint (`TopologyGraph`)**
-
-The topology schema is the static blueprint of the digital twin you wish to simulate. It describes the system's components, their resources, their behavior, and how they are interconnected. To ensure simulation integrity, FastSim uses this schema to rigorously validate the entire system description upfront, rejecting any inconsistencies before the simulation begins.
-
-#### **Design Philosophy: A "Micro-to-Macro" Approach**
-
-The schema is built on a compositional, "micro-to-macro" principle.
We start by defining the smallest indivisible units of work (`Step`) and progressively assemble them into larger, more complex structures (`Endpoint`, `Server`, and finally the `TopologyGraph`). - -This layered approach provides several key advantages that enhance the convenience and reliability of crafting simulations: - -* **Modularity and Reusability:** Core operations are defined once as `Steps` and can be reused across multiple `Endpoints`. This modularity simplifies configuration, as complex workflows can be built from a library of simple, well-defined blocks. -* **Local Reasoning, Global Safety:** Each model is responsible for its own internal consistency (e.g., a `Step` ensures its metric is valid for its kind). Parent models then enforce the integrity of the connections *between* these components (e.g., the `TopologyGraph` ensures all `Edges` connect to valid `Nodes`). This allows you to focus on one part of the configuration at a time, confident that the overall structure will be validated globally. -* **Clarity and Maintainability:** The hierarchy is intuitive and mirrors how developers conceptualize system architecture. It is clear how atomic operations roll up into endpoints, which are hosted on servers connected by a network. This makes configuration files easy to read, write, and maintain over time. -* **Guaranteed Robustness:** By catching all structural and referential errors before the simulation begins, this approach embodies the "fail-fast" principle. It guarantees that the SimPy engine operates on a valid, self-consistent model, eliminating a whole class of potential runtime bugs. - -#### **A Controlled Vocabulary: Topology Constants** - -The schema's robustness is founded on a controlled vocabulary defined by Python `Enum` classes. Instead of error-prone "magic strings" (e.g., `"cpu_operation"`), the schema uses these enums to define the finite set of legal values for categories like operation kinds, metrics, and node types. 
This design choice is critical for several reasons:
-
-* **Absolute Type-Safety:** Pydantic can validate input with certainty. Any value not explicitly defined in the corresponding `Enum` is immediately rejected, preventing subtle typos or incorrect values from causing difficult-to-debug runtime failures.
-* **Enhanced Developer Experience:** IDEs and static analysis tools like `mypy` can provide auto-completion and catch invalid values during development, offering immediate feedback long before the simulation is run.
-* **Single Source of Truth:** All valid categories are centralized. To add a new resource type or metric, a developer only needs to update the `Enum` definition, and the change propagates consistently throughout the validation logic.
-
-| Constant Enum | Purpose |
-| :--- | :--- |
-| **`EndpointStepIO`, `EndpointStepCPU`, `EndpointStepRAM`** | Defines the exhaustive list of valid `kind` values for a `Step`. |
-| **`Metrics`** | Specifies the legal dictionary keys within a `Step`'s `step_metrics`. |
-| **`SystemNodes`** | Enumerates the allowed `type` for nodes (e.g., `"server"`, `"client"`). |
-| **`SystemEdges`** | Enumerates the allowed categories for connections between nodes. |
-
----
-
-### **Schema Hierarchy and In-Depth Validation**
-
-Here we break down each component of the topology, highlighting the specific Pydantic validators that enforce its correctness and the deep rationale behind these choices.
-
-#### **1. `Step`**: The Atomic Unit of Work
-A `Step` represents a single, indivisible operation. Its validation is the cornerstone of ensuring that all work performed in the simulation is logical and well-defined.
-
-| Validation Check | Pydantic Hook | Rule & Rationale |
-| :--- | :--- | :--- |
-| **Coherence of `kind` and `metric`** | `@model_validator` | **Rule:** The `step_metrics` dictionary must contain *exactly one* entry, and its key must be the correct metric for the `Step`'s `kind`.<br><br>**Rationale:** This is the most critical validation on a `Step`. The one-to-one mapping is a deliberate design choice for simplicity and robustness. It allows the simulation engine to be deterministic: a `cpu_bound_operation` step is routed to the CPU resource, an `io_wait` step to an I/O event, etc. This avoids the immense complexity of modeling operations that simultaneously contend for multiple resource types (e.g., CPU and RAM). This validator enforces that clear, unambiguous contract, preventing illogical pairings like a RAM allocation step being measured in `cpu_time`. |
-| **Positive Metric Values** | `PositiveFloat` / `PositiveInt` | **Rule:** All numeric values in `step_metrics` must be greater than zero.<br><br>**Rationale:** It is physically impossible to spend negative or zero time on an operation or allocate negative RAM. This validation uses Pydantic's constrained types to offload this fundamental sanity check, ensuring that only plausible, positive resource requests enter the system and keeping the core simulation logic free of defensive checks against nonsensical data. |
-
-#### **2. `Endpoint`**: Composing Workflows
-An `Endpoint` defines a complete, user-facing operation (e.g., an API call like `/predict`) as an ordered sequence of `Steps`.
-
-| Validation Check | Pydantic Hook | Rule & Rationale |
-| :--- | :--- | :--- |
-| **Consistent Naming** | `@field_validator("endpoint_name")` | **Rule:** Automatically converts the `endpoint_name` to lowercase.<br><br>**Rationale:** This enforces a canonical representation for all endpoint identifiers. It eliminates ambiguity and potential bugs that could arise from inconsistent capitalization (e.g., treating `/predict` and `/Predict` as different endpoints). This simple normalization makes the configuration more robust and simplifies endpoint lookups within the simulation engine. |
-
-#### **3. System Nodes**: `Server` & `Client`
-These models define the macro-components of your architecture where work is performed and resources are located.
-
-| Validation Check | Pydantic Hook | Rule & Rationale |
-| :--- | :--- | :--- |
-| **Standardized Node `type`** | `@field_validator("type")` | **Rule:** The `type` field must strictly match the expected `SystemNodes` enum member (e.g., a `Server` object must have `type: "server"`).<br><br>**Rationale:** This provides a "belt-and-suspenders" check. Even if a default is provided, this validation prevents a user from explicitly overriding a node's type to a conflicting value. It enforces a strict contract: a `Server` object is always and only a server. This prevents object state confusion and simplifies pattern matching in the simulation engine. |
-| **Unique Node IDs** | `@model_validator` in `TopologyNodes` | **Rule:** All `id` fields across all `Server` nodes and the `Client` node must be unique.<br><br>**Rationale:** This is fundamental to creating a valid graph. Node IDs are the primary keys used to address components. If two nodes shared the same ID, any `Edge` pointing to that ID would be ambiguous. This global validator prevents such ambiguity, guaranteeing that every node in the system is uniquely identifiable, which is a precondition for the final referential integrity check. |
-
-#### **4. `Edge`**: Connecting the Components
-An `Edge` represents a directed network link between two nodes, defining how requests flow through the system.
-
-| Validation Check | Pydantic Hook | Rule & Rationale |
-| :--- | :--- | :--- |
-| **No Self-Loops** | `@model_validator` | **Rule:** An edge's `source` ID cannot be the same as its `target` ID.<br><br>**Rationale:** In the context of a distributed system topology, a network call from a service to itself is a logical anti-pattern. Such an operation would typically be modeled as an internal process (i.e., another `Step`), not a network hop. This validator prevents this common configuration error and simplifies the routing logic by disallowing trivial cycles. |
-
-#### **5. `TopologyGraph`**: The Complete System
-This is the root model that aggregates all `nodes` and `edges` and performs the final, most critical validation: ensuring referential integrity.
-
-| Validation Check | Pydantic Hook | Rule & Rationale |
-| :--- | :--- | :--- |
-| **Referential Integrity** | `@model_validator` | **Rule:** Every `edge.source` and `edge.target` ID must correspond to an actual node ID defined in `TopologyNodes`.<br><br>**Rationale:** This is the capstone validation that guarantees the structural integrity of the entire system graph. It prevents "dangling edges"—connections that point to non-existent nodes. Without this check, the simulation could start with a broken topology and crash unexpectedly at runtime when a request attempts to traverse a broken link. By performing this check *after* all nodes and edges have been parsed, we ensure that the system described is a complete and validly connected graph, fully embodying the "fail-fast" principle. |
-
-### **3. Component: Global Simulation Control (`SimulationSettings`)**
-
-This final component configures the simulation's execution parameters and, critically, determines what data is collected. It acts as the master control panel for the simulation run, governing both its duration and the scope of its output.
-
-#### **Payload Structure (`SimulationSettings`)**
-
-```python
-class SimulationSettings(BaseModel):
-    """Global parameters that apply to the whole run."""
-    total_simulation_time: int = Field(...)
-    enabled_sample_metrics: set[SampledMetricName] = Field(
-        default_factory=lambda: {
-            SampledMetricName.READY_QUEUE_LEN,
-            SampledMetricName.CORE_BUSY,
-            SampledMetricName.RAM_IN_USE,
-        },
-        description="Which time-series KPIs to collect by default.",
-    )
-    enabled_event_metrics: set[EventMetricName] = Field(
-        default_factory=lambda: {
-            EventMetricName.RQS_LATENCY,
-        },
-        description="Which per-event KPIs to collect by default.",
-    )
-```
-
-| Field | Type | Purpose & Validation |
-| :--- | :--- | :--- |
-| `total_simulation_time` | `int` | The total simulation horizon in seconds. Must be `>= MIN_SIMULATION_TIME` (1800s). Defaults to `3600`. |
-| `enabled_sample_metrics` | `set[SampledMetricName]` | A set of metrics to be sampled at fixed intervals, creating a time-series (e.g., `"ready_queue_len"`, `"ram_in_use"`). |
-| `enabled_event_metrics` | `set[EventMetricName]` | A set of metrics recorded only when specific events occur, with no time-series (e.g., `"rqs_latency"`, `"llm_cost"`). |
-
-Standard default values are applied for any metrics that are omitted.
-
----
-
-#### **Design Rationale: Pre-validated, On-Demand Metrics for Robust and Efficient Collection**
-
-The design of the `settings` component, particularly the `enabled_*_metrics` fields, is centered on two core principles: **user-driven selectivity** and **ironclad validation**. The rationale behind this approach is to create a system that is both flexible and fundamentally reliable.
-
-##### **1. The Principle of User-Driven Selectivity**
-
-We recognize that data collection is not free; it incurs performance overhead in terms of both memory (to store the data) and CPU cycles (to record it). Not every simulation requires every possible metric. For instance:
-* A simulation focused on CPU contention may not need detailed LLM cost tracking.
-* A high-level analysis of end-to-end latency might not require fine-grained data on event loop queue lengths.
-
-By allowing the user to explicitly select only the metrics they need, we empower them to tailor the simulation to their specific analytical goals. This on-demand approach makes the simulator more efficient and versatile, avoiding the waste of collecting and processing irrelevant data.
-
-##### **2. The Power of Ironclad, Upfront Validation**
-
-This is where the design choice becomes critical for robustness. Simply allowing users to provide a list of strings is inherently risky due to potential typos or misunderstandings of metric names. Our schema mitigates this risk entirely through a strict, upfront validation contract.
-
-* **A Strict Contract via Enums:** The `enabled_sample_metrics` and `enabled_event_metrics` fields are not just sets of strings; they are sets of `SampledMetricName` and `EventMetricName` enum members. When Pydantic parses the input payload, it validates every single metric name provided by the user against these canonical `Enum` definitions.
-
-* **Immediate Rejection of Invalid Input:** If a user provides a metric name that is not a valid member of the corresponding enum (e.g., a typo like `"request_latncy"` or a misunderstanding like `"cpu_usage"` instead of `"core_busy"`), Pydantic immediately rejects the entire payload with a clear `ValidationError`. This happens *before* a single line of the simulation engine code is executed.
-
-##### **3. The Benefit: Guaranteed Runtime Integrity**
-
-This pre-validation provides a crucial and powerful guarantee to the simulation engine, leading to a safer and more efficient runtime:
-
-* **Safe, Error-Free Initialization:** At the very beginning of the simulation, the engine receives the *validated* set of metric names. It knows with absolute certainty the complete and exact set of metrics it needs to track. This allows it to safely initialize all necessary data collection structures (like dictionaries) at the start of the run. For example:
-  ```python
-  # This is safe because every key is guaranteed to be valid.
-  event_results = {metric_name: [] for metric_name in settings.enabled_event_metrics}
-  ```
-
-* **Elimination of Runtime KeyErrors:** Because all dictionary keys are guaranteed to exist from the start, the core data collection logic within the simulation's tight event loop becomes incredibly lean and robust. The engine never needs to perform defensive, conditional checks like `if metric_name in event_results: ...`. It can directly and safely access the key: `event_results[metric_name].append(value)`. This completely eliminates an entire class of potential `KeyError` exceptions, which are notoriously difficult to debug in complex, asynchronous simulations.
-
-In summary, the design of `SimulationSettings` is a perfect example of the "fail-fast" philosophy. By forcing a clear and validated contract with the user upfront, we ensure that the data collection process is not only tailored and efficient but also fundamentally reliable. The engine operates with the confidence that the output data structures will perfectly and safely match the user's validated request, leading to a predictable and robust simulation from start to finish.
-
----
-
-### **End-to-End Example (`SimulationPayload`)**
-
-The following JSON object shows how these three components combine into a single, complete `SimulationPayload` for a minimal client-server setup.
-
-```jsonc
-{
-  // Defines the traffic workload profile.
-  "rqs_input": {
-    "avg_active_users": {
-      "mean": 50,
-      "distribution": "poisson"
-    },
-    "avg_request_per_minute_per_user": {
-      "mean": 5.0,
-      "distribution": "normal",
-      "variance": 1.0
-    },
-    "user_sampling_window": 60
-  },
-  // Describes the system's architectural blueprint.
-  "topology_graph": {
-    "nodes": {
-      "client": {
-        "id": "mobile_client",
-        "type": "client"
-      },
-      "servers": [
-        {
-          "id": "api_server",
-          "type": "server",
-          "server_resources": {
-            "cpu_cores": 4,
-            "ram_mb": 4096
-          },
-          "endpoints": [
-            {
-              "endpoint_name": "/predict",
-              "steps": [
-                {
-                  "kind": "initial_parsing",
-                  "step_metrics": { "cpu_time": 0.005 }
-                },
-                {
-                  "kind": "io_db",
-                  "step_metrics": { "io_waiting_time": 0.050 }
-                }
-              ]
-            }
-          ]
-        }
-      ]
-    },
-    "edges": [
-      {
-        "source": "mobile_client",
-        "target": "api_server",
-        "latency": {
-          "distribution": "log_normal",
-          "mean": 0.04,
-          "variance": 0.01
-        }
-      }
-    ]
-  },
-  // Configures the simulation run and metric collection.
-  "settings": {
-    "total_simulation_time": 3600,
-    "enabled_sample_metrics": [
-      "ready_queue_len",
-      "ram_in_use",
-      "throughput_rps"
-    ],
-    "enabled_event_metrics": [
-      "rqs_latency"
-    ]
-  }
-}
-```
-
-### **Key Takeaways**
-
-* **Single Source of Truth**: `Enum` classes centralize all valid string literals, providing robust, type-safe validation across the entire schema.
-* **Layered Validation**: The `Constants → Component Schemas → SimulationPayload` hierarchy ensures that only well-formed and self-consistent configurations reach the simulation engine.
-* **Separation of Concerns**: The three top-level keys (`rqs_input`, `topology_graph`, `settings`) clearly separate the workload, the system architecture, and simulation control, making configurations easier to read, write, and reuse.
-
-
-
-
diff --git a/documentation/backend_documentation/runtime_and_resources.md b/documentation/backend_documentation/runtime_and_resources.md
new file mode 100644
index 0000000..6c84bd2
--- /dev/null
+++ b/documentation/backend_documentation/runtime_and_resources.md
@@ -0,0 +1,283 @@
+# **FastSim — The Runtime Layer Documentation**
+
+*(Version July 2025 – Aligned with `app/runtime` and `app/resources`)*
+
+## **1. The Runtime Philosophy: From Blueprint to Living System**
+
+If the `SimulationPayload` is the static **blueprint** of a system, the `runtime` package is the **engine** that brings that blueprint to life. It translates a validated, declarative configuration into a dynamic, interacting set of processes within a SimPy simulation environment. The entire design is guided by a few core principles to ensure robustness, testability, and a faithful reflection of real-world systems.
+
+### **The Actor Model & Process Management**
+
+Distributed systems are, by nature, composed of independent components that communicate with each other concurrently. To model this, we've adopted an **Actor Model**.
Each major component of the architecture (`Generator`, `Server`, `Client`) is implemented as an "Actor"—a self-contained object with its own internal state and behavior that communicates with other actors by sending and receiving messages (`RequestState` objects). + +SimPy's process management is a perfect fit for this model. It uses **cooperative multitasking** within a single-threaded event loop. An actor "runs" until it `yield`s control to the SimPy environment, typically to wait for a duration (`timeout`), a resource (`Container.get`), or an event (`Store.get`). This elegantly mimics modern, non-blocking I/O frameworks (like Python's `asyncio`, Node.js, or Go's goroutines) where a process performs work until it hits an I/O-bound operation, at which point it yields control, allowing the event loop to run other ready tasks. + +### **The "Validation-First" Contract** + +A crucial design decision is the strict separation between configuration and execution. The `runtime` layer operates under the assumption that the input `SimulationPayload` is **100% valid and logically consistent**. This "validation-first" contract means the runtime code is streamlined and free of defensive checks. It doesn't need to validate if a server ID exists or if a resource is defined; it can focus entirely on its core responsibility: accurately modeling the passage of time and contention for resources. + +----- + +## **2. High-Level Architecture & Data Flow** + +The simulation is a choreography of Actors passing a `RequestState` object between them. Communication and resource access are mediated exclusively by the SimPy environment, ensuring all interactions are captured on the simulation timeline. 
+ +```text + .start() .transport(state) .start() +┌───────────┐ Starts ┌───────────┐ Forwards ┌───────────┐ Processes ┌───────────┐ +│ ├─────────►│ ├─────────────►│ ├────────────►│ │ +│ Generator │ │ Edge │ │ Server │ │ Client │ +│ ◄─────────┤ ◄─────────────┤ ◄────────────┤ │ +└───────────┘ └───────────┘ └───────────┘ └───────────┘ + ▲ Creates ▲ Delays RequestState ▲ Consumes ▲ Finishes + │ RequestState │ (Latency & Drops) │ Resources │ Request +``` + + * **Actors** (`runtime/actors/`): The active, stateful processes that perform work (`RqsGeneratorRuntime`, `ServerRuntime`, `ClientRuntime`, `EdgeRuntime`). + * **State Object** (`RequestState`): The message passed between actors. It acts as a digital passport, collecting stamps (`Hop` objects) at every stage of its journey. + * **Resource Registry** (`resources/`): A central authority that creates and allocates finite system resources (CPU cores, RAM) to the actors that need them. + +----- + +## **3. The Anatomy of a Request: State & History** + +At the heart of the simulation is the `RequestState` object, which represents a single user request flowing through the system. + +### **3.1. `Hop` – The Immutable Breadcrumb** + +A `Hop` is a `NamedTuple` that records a single, atomic event in a request's lifecycle: its arrival at a specific component at a specific time. Being an immutable `NamedTuple` makes it lightweight and safe to use in analysis. + +### **3.2. `RequestState` – The Digital Passport** + +```python +@dataclass +class RequestState: + id: int + initial_time: float + finish_time: float | None = None + history: list[Hop] = field(default_factory=list) +``` + +This mutable dataclass is the sole carrier of a request's identity and history. + + * `id`: A unique identifier for the request, assigned by the generator. + * `initial_time`: The simulation timestamp (`env.now`) when the request was created. + * `finish_time`: The timestamp when the request completes its lifecycle. It remains `None` until then. 
+ * `history`: A chronologically ordered list of `Hop` objects, creating a complete, traceable path of the request's journey. + +#### **Real-World Analogy** + +Think of `RequestState` as a request context in a modern microservices architecture. The `id` is analogous to a **Trace ID** (like from OpenTelemetry or Jaeger). The `history` of `Hop` objects is the collection of **spans** associated with that trace, providing a detailed, end-to-end view of where the request spent its time, which is invaluable for performance analysis and debugging. + +----- + +## **4 The Resource Layer — Modelling Contention ⚙️** + +In real infrastructures every machine has a hard ceiling: only *N* CPU cores, only *M* MB of RAM. +FastSim mirrors that physical constraint through the **Resource layer**, which exposes pre-filled SimPy containers that actors must draw from. If a token is not available the coroutine simply blocks — giving you back-pressure “for free”. + +--- + +### **4.1 `ResourcesRuntime` — The Central Bank of Resources** + +| Responsibility | Implementation detail | +| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Discover capacity** | Walks the *validated* `TopologyGraph.nodes.servers`, reading `cpu_cores` and `ram_mb` from each `ServerResources` spec. | +| **Mint containers** | Calls `build_containers(env, spec)` which returns
`{"CPU": simpy.Container(init=cpu_cores), "RAM": simpy.Container(init=ram_mb)}` — the containers start **full** so a server can immediately consume tokens. | +| **Registry map** | Stores them in a private dict `_by_server: dict[str, ServerContainers]`. | +| **Public API** | `registry[server_id] → ServerContainers` (raises `KeyError` if the ID is unknown). | + +```python +registry = ResourcesRuntime(env, topology) +cpu_bucket = registry["srv-1"]["CPU"] # simpy.Container, level == capacity at t=0 +ram_bucket = registry["srv-1"]["RAM"] +``` + +Because the schema guarantees that every `server_id` is unique and every +server referenced in an edge actually exists, `ResourcesRuntime` never needs +defensive checks beyond the simple dictionary lookup. + +--- + +### **4.2 How Contention Emerges** + +* **CPU** – Each `yield CPU.get(1)` represents “claiming one core”. + When all tokens are gone the coroutine waits, modelling a thread-pool or worker saturation. +* **RAM** – `yield RAM.get(amount)` blocks until enough memory is free. + Large requests can starve, reproducing OOM throttling or JVM heap pressure. +* **Automatic fairness** – SimPy’s event loop resumes whichever coroutine became ready first, giving a natural first-come, first-served order. + +> **No bespoke semaphore or queueing code is required** — the SimPy +> containers *are* the semaphore. + +--- + +### **Real-World Analogy** + +| Runtime Component | Real Infrastructure Counterpart | +| -------------------- | --------------------------------------------------------------------------------------------------------- | +| `ResourcesRuntime` | A **cloud provider control plane** or **Kubernetes scheduler**: single source of truth for node capacity. | +| CPU container tokens | **Worker threads / processes** in Gunicorn, uWSGI, or an OS CPU-quota. | +| RAM container tokens | **cgroup memory limit** or a pod’s allocatable memory; once exhausted new workloads must wait. 
| + +Just like a Kubernetes scheduler won’t place a pod if a node lacks free CPU/RAM, +FastSim won’t let an actor proceed until it obtains the necessary tokens. + +## **5. The Actors: Bringing the System to Life** + +Actors are the core drivers of the simulation. Each represents a key component of the system architecture. They all expose a consistent `.start()` method, which registers their primary behavior as a process with the SimPy environment, allowing for clean and uniform orchestration. + +### **5.1. RqsGeneratorRuntime: The Source of Load** + +This actor's sole purpose is to create `RequestState` objects according to a specified stochastic model, initiating all traffic in the system. + +| Key Parameter (`__init__`) | Meaning | +| :--- | :--- | +| `env` | The SimPy simulation environment. | +| `out_edge` | The `EdgeRuntime` instance to which newly created requests are immediately sent. | +| `rqs_generator_data` | The validated Pydantic schema containing the statistical model for traffic (e.g., user count, request rate). | +| `rng` | A NumPy random number generator instance for deterministic, reproducible randomness. | + +**Core Logic (`.start()`):** +The generator's main process uses a statistical sampler (e.g., `poisson_poisson_sampling`) to yield a series of inter-arrival time gaps. It waits for each gap (`yield self.env.timeout(gap)`), then creates a new `RequestState`, records its first `Hop`, and immediately forwards it to the outbound edge via `out_edge.transport()`. + +**Real-World Analogy:** +The `RqsGeneratorRuntime` represents the collective behavior of your entire user base or the output of an upstream service. It's equivalent to a **load-testing tool** like **k6, Locust, or JMeter**, configured to simulate a specific traffic pattern (e.g., 500 users with an average of 30 requests per minute). + +----- + +### **5.2. EdgeRuntime: The Network Fabric 🚚** + +This actor models the connection *between* two nodes. 
It simulates the two most important factors of network transit: latency and unreliability. + +| Key Parameter (`__init__`) | Meaning | +| :--- | :--- | +| `env` | The SimPy simulation environment. | +| `edge_config` | The Pydantic `Edge` model containing this link's configuration (latency distribution, dropout rate). | +| `target_box` | A `simpy.Store` that acts as the "inbox" for the destination node. | +| `rng` | The random number generator for sampling latency and dropout. | + +**Core Logic (`.transport()`):** +Unlike other actors, `EdgeRuntime`'s primary method is `.transport(state)`. When called, it doesn't block the caller. Instead, it spawns a new, temporary SimPy process (`_deliver`) for that specific `RequestState`. This process: + +1. Checks for a **dropout** (packet loss) based on `dropout_rate`. If dropped, the request's `finish_time` is set, and its journey ends. +2. If not dropped, it samples a **latency** value from the configured probability distribution. +3. It `yield`s a `timeout` for the sampled latency, simulating network travel time. +4. After the wait, it records a successful `Hop` and places the `RequestState` into the `target_box` of the destination node. + +**Real-World Analogy:** +An `EdgeRuntime` is a direct analog for a **physical or virtual network link**. This could be the public **internet** between a user and your server, a **LAN connection** between two services in a data center, or a **VPC link** between two cloud resources. `latency` represents round-trip time (RTT), and `dropout_rate` models packet loss. + +----- + +### **5.3 `ServerRuntime` — The Workhorse 📦** + +`ServerRuntime` models an application server that owns finite CPU/RAM resources and executes a chain of steps for every incoming request. +With the 2025 refactor it now uses a **dispatcher / handler** pattern: the dispatcher sits in an infinite loop, and each request is handled in its own SimPy subprocess. 
This enables many concurrent in-flight requests while keeping the code easy to reason about. + +| `__init__` parameter | Meaning | +| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **`env`** | The shared `simpy.Environment`. Every timeout and resource operation is scheduled here. | +| **`server_resources`** | A `ServerContainers` mapping (`{"CPU": Container, "RAM": Container}`) created by `ResourcesRuntime`. The containers are **pre-filled** (`level == capacity`) so the server can immediately pull tokens. | +| **`server_config`** | The validated Pydantic `Server` model: server-wide ID, resource spec, and a list of `Endpoint` objects (each endpoint is an ordered list of `Step`s). | +| **`out_edge`** | The `EdgeRuntime` (or stub) that receives the `RequestState` once processing finishes. | +| **`server_box`** | A `simpy.Store` acting as the server’s inbox. Up-stream actors drop `RequestState`s here. | +| **`rng`** | Instance of `numpy.random.Generator`; defaults to `default_rng()`. Used to pick a random endpoint. | + +--- + +#### **Public API** + +```python +def start(self) -> simpy.Process +``` + +Registers the **dispatcher** coroutine in the environment and returns the created `Process`. + +--- + +#### **Internal Workflow** + +```text +┌───────────┐ server_box.get() ┌──────────────┐ +│ dispatcher │ ────────────────────► │ handle_req N │ +└───────────┘ spawn new process └──────────────┘ + │ + ▼ + RAM get → CPU/IO steps → RAM put → out_edge.transport() +``` + +1. **Dispatcher loop** + + ```python + while True: + raw_state = yield self.server_box.get() # blocks until a request arrives + state = cast(RequestState, raw_state) + self.env.process(self._handle_request(state)) # fire-and-forget + ``` + + *Spawning a new process per request mimics worker thread concurrency.* + +2. 
**Handler coroutine (`_handle_request`)** + + | Stage | Implementation detail | + | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | + | **Record arrival** | `state.record_hop(SystemNodes.SERVER, self.server_config.id, env.now)` – leaves a breadcrumb for tracing. | + | **Endpoint selection** | Uniform random index `rng.integers(0, len(endpoints))`. (Hook point for custom routing later.) | + | **Reserve RAM (back-pressure)** | Compute `total_ram` (sum of all `StepOperation.NECESSARY_RAM`). `yield RAM.get(total_ram)`. If not enough RAM is free, the coroutine blocks, creating natural memory pressure. | + | **Execute steps in order** | | + | – CPU-bound step | `yield CPU.get(1)` → `yield env.timeout(cpu_time)` → `yield CPU.put(1)` – exactly one core is busy for the duration. | + | – I/O-bound step | `yield env.timeout(io_wait)` – no core is held, modelling non-blocking I/O. | + | **Release RAM** | `yield RAM.put(total_ram)`. | + | **Forward** | `out_edge.transport(state)` – hands the request to the next hop without waiting for network latency. | + +--- + +#### **Concurrency Guarantees** + +* **CPU contention** – because CPU is a token bucket (`simpy.Container`) the maximum number of concurrent CPU-bound steps equals `cpu_cores`. +* **RAM contention** – large requests can stall entirely until enough RAM frees up, accurately modelling out-of-memory throttling. +* **Non-blocking I/O** – while a handler waits on an I/O step it releases the core, allowing other handlers to run; this mirrors an async framework where the event loop can service other sockets. 
+ +--- + +#### **Real-World Analogy** + +| Runtime concept | Real server analogue | +| ------------------------------------- | ------------------------------------------------------------------------------------------ | +| `server_box` | A web server’s accept queue. | +| `CPU.get(1)` | Obtaining one worker thread/process in Gunicorn, uWSGI, or a Node.js “JS thread”. | +| `env.timeout(io_wait)` without a core | An `await` on a database or HTTP call; the worker is idle while the OS handles the socket. | +| RAM token bucket | Process resident set or container memory limit; requests block when heap is exhausted. | + +Thus a **CPU-bound step** is a tight Python loop holding the GIL, while an **I/O-bound step** is `await cursor.execute(...)` that frees the event loop. + +--- + + +### **5.4. ClientRuntime: The Destination** + +This actor typically represents the end-user or system that initiated the request, serving as the final destination. + +| Key Parameter (`__init__`) | Meaning | +| :--- | :--- | +| `env` | The SimPy simulation environment. | +| `out_edge` | The `EdgeRuntime` to use if the client needs to forward the request (acting as a relay). | +| `client_box` | This client's "inbox". | +| `completed_box` | A global `simpy.Store` where all finished requests are placed for final collection and analysis. | + +**Core Logic (`.start()`):** +The client pulls requests from its `client_box`. It then makes a critical decision: + + * **If the request is new** (coming directly from the `RqsGeneratorRuntime`), it acts as a **relay**, immediately forwarding the request to its `out_edge`. + * **If the request is returning** (coming from a `ServerRuntime`), it acts as the **terminus**. It sets the request's `finish_time`, completing its lifecycle, and puts it into the global `completed_box`. + +**Design Note & Real-World Analogy:** +The current logic for this decision—`if state.history[-2].component_type != SystemNodes.GENERATOR`—is **fragile**. While it works, it's not robust. 
A future improvement would be to add a more explicit routing mechanism.
+In the real world, the `ClientRuntime` could be a user's **web browser**, a **mobile application**, or even a **Backend-For-Frontend (BFF)** service that both initiates requests and receives the final aggregated responses.
\ No newline at end of file
diff --git a/documentation/backend_documentation/simulation_input.md b/documentation/backend_documentation/simulation_input.md
new file mode 100644
index 0000000..aa27d58
--- /dev/null
+++ b/documentation/backend_documentation/simulation_input.md
@@ -0,0 +1,228 @@
+### **FastSim — Simulation Input Schema**
+
+The `SimulationPayload` is the single, self-contained contract that defines an entire simulation run. Its architecture is guided by a core philosophy: to achieve maximum control over input data through robust, upfront validation. To implement this, we extensively leverage Pydantic's powerful validation capabilities and Python's `Enum` classes. This approach creates a strictly-typed and self-consistent schema that guarantees any configuration is validated *before* the simulation engine starts.
+
+This contract brings together three distinct but interconnected layers of configuration into one cohesive structure:
+
+1. **`rqs_input` (`RqsGeneratorInput`)**: Defines the **workload profile**—how many users are active and how frequently they generate requests—and acts as the **source node** in our system graph.
+2. **`topology_graph` (`TopologyGraph`)**: Describes the **system's architecture**—its components, resources, and the network connections between them, represented as a directed graph.
+3. **`sim_settings` (`SimulationSettings`)**: Configures **global simulation parameters**, such as total runtime and which metrics to collect.
+ +This layered design decouples the *what* (the system topology) from the *how* (the traffic pattern and simulation control), allowing for modular and reusable configurations. Adherence to our validation-first philosophy means every payload is rigorously parsed against this schema. By using a controlled vocabulary of `Enums` and the power of Pydantic, we guarantee that any malformed or logically inconsistent input is rejected upfront with clear, actionable errors, ensuring the simulation engine operates only on perfectly valid data. + +----- + +## **1. The System Graph (`topology_graph` and `rqs_input`)** + +At the core of FastSim is the representation of the system as a **directed graph**. The **nodes** represent the architectural components (like servers, clients, and the traffic generator itself), while the **edges** represent the directed network connections between them. This graph-based approach allows for flexible and realistic modeling of request flows through distributed systems. + +### **Design Philosophy: A "Micro-to-Macro" Approach** + +The schema is built on a compositional, "micro-to-macro" principle. We start by defining the smallest indivisible units of work (`Step`) and progressively assemble them into larger, more complex structures (`Endpoint`, `Server`, and finally the `TopologyGraph`). + +This layered approach provides several key advantages: + + * **Modularity and Reusability:** Core operations are defined once as `Steps` and can be reused across multiple `Endpoints`. + * **Local Reasoning, Global Safety:** Each model is responsible for its own internal consistency (e.g., a `Step` ensures its metric is valid for its kind). Parent models then enforce the integrity of the connections *between* these components (e.g., the `TopologyGraph` ensures all `Edges` connect to valid `Nodes`). + * **Guaranteed Robustness:** By catching all structural and referential errors before the simulation begins, this approach embodies the "fail-fast" principle. 
It guarantees that the SimPy engine operates on a valid, self-consistent model.
+
+### **A Controlled Vocabulary: Topology Constants**
+
+The schema's robustness is founded on a controlled vocabulary defined by Python `Enum` classes. Instead of error-prone "magic strings," the schema uses these enums to define the finite set of legal values for categories like operation kinds, metrics, and node types.
+
+| Enum | Purpose |
+| :------------------------- | :------------------------------------------------------------------------ |
+| **`EndpointStepCPU`, `EndpointStepRAM`, `EndpointStepIO`** | Define the exhaustive list of valid `kind` values for a `Step`. |
+| **`StepOperation`** | Specifies the legal dictionary keys within a `Step`'s `step_operation`. |
+| **`SystemNodes`** | Enumerates the allowed `type` for nodes (e.g., `"server"`, `"client"`, `"generator"`). |
+| **`SystemEdges`** | Enumerates the allowed categories for connections between nodes. |
+
+-----
+
+### **Schema Hierarchy and In-Depth Validation**
+
+Here we break down each component of the topology, highlighting the specific Pydantic validators that enforce its correctness.
+
+#### **Random Variable Schema (`RVConfig`)**
+
+At the core of both the traffic generator and network latencies is `RVConfig`, a schema for defining stochastic variables. This allows critical parameters to be modeled not as fixed numbers, but as draws from a probability distribution.
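As an illustration, the variance-defaulting rule can be reproduced with a stripped-down Pydantic v2 model. Names below mirror the documentation but are simplified; the real `RVConfig` carries additional fields and checks:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, model_validator

class Distribution(str, Enum):
    # Illustrative subset of the real Distribution enum.
    POISSON = "poisson"
    NORMAL = "normal"
    LOG_NORMAL = "log_normal"

class RVConfig(BaseModel):
    mean: float
    distribution: Distribution
    variance: Optional[float] = None

    @model_validator(mode="after")
    def _default_variance(self):
        # Business rule from the docs: normal / log_normal without an
        # explicit variance falls back to variance = mean.
        if self.variance is None and self.distribution in (
            Distribution.NORMAL,
            Distribution.LOG_NORMAL,
        ):
            self.variance = self.mean
        return self

rv = RVConfig(mean=0.04, distribution="log_normal")
# rv.variance == 0.04, while a poisson config keeps variance = None
```

A typo such as `distribution="log_norml"` fails at parse time with a `ValidationError`, never reaching the engine.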
+ +| Check | Pydantic Hook | Rule & Rationale | +| :---------------------------- | :---------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Numeric `mean` Enforcement** | `@field_validator("mean", mode="before")` | Intercepts the `mean` field and ensures the provided value is an `int` or `float`, rejecting invalid types. This guarantees a valid numeric type for all downstream logic. | +| **Valid `distribution` Name** | `Distribution` (`StrEnum`) type hint | Pydantic automatically ensures that the `distribution` field's value must be one of the predefined members (e.g., `"poisson"`, `"normal"`). Any typo or unsupported value results in an immediate validation error. | +| **Intelligent `variance` Defaulting** | `@model_validator(mode="after")` | Enforces a crucial business rule: if `distribution` is `"normal"` or `"log_normal"` **and** `variance` is not provided, the schema automatically sets `variance = mean`. This provides a safe, logical default. | + +#### **1. `Step`: The Atomic Unit of Work** + +A `Step` represents a single, indivisible operation. + +| Validation Check | Pydantic Hook | Rule & Rationale | +| :------------------------------- | :--------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Coherence of `kind` and `step_operation`** | `@model_validator` | **Rule:** The `step_operation` dictionary must contain *exactly one* entry, and its key (`StepOperation`) must be the correct metric for the `Step`'s `kind`. 
\\ **Rationale:** This is the most critical validation on a `Step`. It prevents illogical pairings like a RAM allocation step being measured in `cpu_time`. It ensures every step has a clear, unambiguous impact on a single system resource. | +| **Positive Metric Values** | `PositiveFloat` / `PositiveInt` | **Rule:** All numeric values in `step_operation` must be greater than zero. \\ **Rationale:** It is physically impossible to spend negative or zero time on an operation. This ensures that only plausible resource requests enter the system. | + +#### **2. `Endpoint`: Composing Workflows** + +An `Endpoint` defines a complete operation (e.g., an API call like `/predict`) as an ordered sequence of `Steps`. + +| Validation Check | Pydantic Hook | Rule & Rationale | +| :-------------------- | :--------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Consistent Naming** | `@field_validator("endpoint_name")` | **Rule:** Automatically converts the `endpoint_name` to lowercase. \\ **Rationale:** This enforces a canonical representation, eliminating ambiguity from inconsistent capitalization (e.g., treating `/predict` and `/Predict` as the same). | + +#### **3. System Nodes: `Server`, `Client`, and `RqsGeneratorInput`** + +These models define the macro-components of your architecture where work is performed, resources are located, and requests originate. 
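A minimal sketch of the unique-ID rule, assuming Pydantic v2; the real `TopologyNodes` distinguishes typed `Server`, `Client`, and generator models, which this toy `Node` collapses into one class:

```python
from pydantic import BaseModel, model_validator

class Node(BaseModel):
    id: str

class TopologyNodes(BaseModel):
    servers: list[Node]
    client: Node

    @model_validator(mode="after")
    def _unique_ids(self):
        # Node IDs are the graph's primary keys: a duplicate would make
        # any Edge that references it ambiguous.
        ids = [srv.id for srv in self.servers] + [self.client.id]
        duplicates = {i for i in ids if ids.count(i) > 1}
        if duplicates:
            raise ValueError(f"duplicate node ids: {sorted(duplicates)}")
        return self

TopologyNodes(servers=[Node(id="api_server")], client=Node(id="mobile_client"))
```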
+ +| Validation Check | Pydantic Hook | Rule & Rationale | +| :-------------------------------- | :---------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Standardized Node `type`** | `@field_validator("type")` | **Rule:** The `type` field must strictly match the expected `SystemNodes` enum member (e.g., a `Server` object must have `type: "server"`). \\ **Rationale:** This enforces a strict contract: a `Server` object is always and only a server, preventing object state confusion. | +| **Unique Node IDs** | `@model_validator` in `TopologyNodes` | **Rule:** All `id` fields across all `Server` nodes, the `Client` node, and the `RqsGeneratorInput` node must be unique. \\ **Rationale:** This is fundamental to creating a valid graph. Node IDs are the primary keys. If two nodes shared the same ID, any `Edge` pointing to that ID would be ambiguous. | +| **Workload Distribution Constraints** | `@field_validator` in `RqsGeneratorInput` | **Rule:** The `avg_request_per_minute_per_user` field must use a Poisson distribution. The `avg_active_users` field must use a Poisson or Normal distribution. \\ **Rationale:** This is a current restriction of the simulation engine, which has a joint sampler optimized only for these combinations. This validator ensures that only supported configurations are accepted. | + +#### **4. `Edge`: Connecting the Components** + +An `Edge` represents a directed network link between two nodes. 
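The self-loop rule is equally compact. A sketch under the same assumptions (the real `Edge` also carries an `id`, a latency `RVConfig`, and a dropout rate):

```python
from pydantic import BaseModel, model_validator

class Edge(BaseModel):
    source: str
    target: str

    @model_validator(mode="after")
    def _no_self_loop(self):
        # A node calling itself is internal work: model it as a Step,
        # not as a network hop.
        if self.source == self.target:
            raise ValueError("self-loop edges are not allowed")
        return self

Edge(source="mobile_client", target="api_server")  # valid directed link
```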
+ +| Validation Check | Pydantic Hook | Rule & Rationale | +| :---------------- | :----------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **No Self-Loops** | `@model_validator` | **Rule:** An edge's `source` ID cannot be the same as its `target` ID. \\ **Rationale:** A network call from a service to itself is a logical anti-pattern in a system topology. Such an operation should be modeled as an internal process (i.e., another `Step`), not a network hop. | +| **Unique Edge IDs** | `@model_validator` in `TopologyGraph` | **Rule:** All `id` fields of the `Edge`s must be unique. \\ **Rationale:** Ensures that every network connection is uniquely identifiable, which is useful for logging and debugging. | + +#### **5. `TopologyGraph`: The Complete System** + +This is the root model that aggregates all `nodes` and `edges` and performs the final, most critical validation: ensuring referential integrity. + +| Validation Check | Pydantic Hook | Rule & Rationale | +| :---------------------- | :----------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Referential Integrity** | `@model_validator` | **Rule:** Every `edge.source` and `edge.target` ID must correspond to an actual node ID defined in the topology. \\ **Rationale:** This is the capstone validation that guarantees the structural integrity of the entire system graph. It prevents "dangling edges"—connections that point to non-existent nodes—ensuring the described system is a complete and validly connected graph. | + +----- + +## **2. 
Global Simulation Control (`SimulationSettings`)** + +This final component configures the simulation's execution parameters and, critically, determines what data is collected. + +#### **Payload Structure (`SimulationSettings`)** + +| Field | Type | Purpose & Validation | +| :----------------------- | :------------------------- | :------------------------------------------------------------------------------------------------------ | +| `total_simulation_time` | `int` | The total simulation horizon in seconds. Must be `>=` a defined minimum (e.g., 1800s). | +| `enabled_sample_metrics` | `set[SampledMetricName]` | A set of metrics to be sampled at fixed intervals, creating a time-series (e.g., `"ready_queue_len"`, `"ram_in_use"`). | +| `enabled_event_metrics` | `set[EventMetricName]` | A set of metrics recorded only when specific events occur (e.g., `"rqs_latency"`). | + +### **Design Rationale: Pre-validated, On-Demand Metrics** + +The design of the `settings`, particularly the `enabled_*_metrics` fields, is centered on **user-driven selectivity** and **ironclad validation**. + +1. **Selectivity:** Data collection has a performance cost. By allowing the user to explicitly select only the metrics they need, we make the simulator more efficient and versatile. + +2. **Ironclad Validation:** Simply allowing users to provide a list of strings is risky. Our schema mitigates this risk by validating every metric name provided by the user against the canonical `Enum` definitions (`SampledMetricName`, `EventMetricName`). If a user provides a misspelled or invalid metric name (e.g., `"request_latncy"`), Pydantic immediately rejects the entire payload *before* the simulation engine runs. + +This guarantees that the simulation engine can safely initialize all necessary data collection structures at the start of the run, completely eliminating an entire class of potential `KeyError` exceptions at runtime. 
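The enum-backed rejection described above can be sketched with the standard library alone. This is a simplified stand-in for what the Pydantic field typing does; the member names mirror the examples in this document and are an illustrative subset, not the full enum:

```python
from enum import Enum


class SampledMetricName(str, Enum):
    """Illustrative subset of the canonical sampled-metric names."""

    READY_QUEUE_LEN = "ready_queue_len"
    RAM_IN_USE = "ram_in_use"


def validate_metrics(names: set[str]) -> set[SampledMetricName]:
    """Coerce raw strings to enum members, rejecting unknown names up front."""
    validated: set[SampledMetricName] = set()
    for name in names:
        try:
            # Enum lookup by value: raises ValueError for any unknown string
            validated.add(SampledMetricName(name))
        except ValueError:
            msg = f"unknown metric name: {name!r}"
            raise ValueError(msg) from None
    return validated


validate_metrics({"ready_queue_len", "ram_in_use"})  # accepted
try:
    validate_metrics({"request_latncy"})  # misspelled -> rejected before any run
except ValueError as exc:
    print(exc)
```

Pydantic performs the same value-based coercion automatically when a field is typed `set[SampledMetricName]`, which is why a single misspelled name fails the whole payload before the simulation engine ever starts.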
+ +----- + +## **End-to-End Example (`SimulationPayload`)** + +The following JSON object shows how these components combine into a single `SimulationPayload` for a minimal client-server setup, updated according to the new schema. + +```jsonc +{ + // Defines the workload profile as a generator node. + "rqs_input": { + "id": "mobile_user_generator", + "type": "generator", + "avg_active_users": { + "mean": 50, + "distribution": "poisson" + }, + "avg_request_per_minute_per_user": { + "mean": 5.0, + "distribution": "poisson" + }, + "user_sampling_window": 60 + }, + // Describes the system's architecture as a graph. + "topology_graph": { + "nodes": { + "client": { + "id": "entry_point_client", + "type": "client" + }, + "servers": [ + { + "id": "api_server", + "type": "server", + "server_resources": { + "cpu_cores": 4, + "ram_mb": 4096, + "db_connection_pool": 10 + }, + "endpoints": [ + { + "endpoint_name": "/predict", + "steps": [ + { + "kind": "initial_parsing", + "step_operation": { "cpu_time": 0.005 } + }, + { + "kind": "io_db_query", + "step_operation": { "io_waiting_time": 0.050 } + } + ] + } + ] + } + ] + }, + "edges": [ + { + "id": "client_to_generator", + "source": "entry_point_client", + "target": "mobile_user_generator", + "latency": { + "distribution": "log_normal", + "mean": 0.001, + "variance": 0.0001 + } + }, + { + "id": "generator_to_api", + "source": "mobile_user_generator", + "target": "api_server", + "latency": { + "distribution": "log_normal", + "mean": 0.04, + "variance": 0.01 + }, + "probability": 1.0, + "dropout_rate": 0.0 + } + ] + }, + // Configures the simulation run and metric collection. 
+ "sim_settings": { + "total_simulation_time": 3600, + "enabled_sample_metrics": [ + "ready_queue_len", + "ram_in_use", + "core_busy" + ], + "enabled_event_metrics": [ + "rqs_latency" + ] + } +} +``` + +### **Key Takeaways** + + * **Single Source of Truth**: `Enum` classes centralize all valid string literals, providing robust, type-safe validation across the entire schema. + * **Layered Validation**: The `Constants → Component Schemas → SimulationPayload` hierarchy ensures that only well-formed and self-consistent configurations reach the simulation engine. + * **Separation of Concerns**: The three top-level keys (`rqs_input`, `topology_graph`, `sim_settings`) clearly separate the workload, the system architecture, and simulation control, making configurations easier to read, write, and reuse. \ No newline at end of file diff --git a/documentation/DEV_WORKFLOW_GUIDE.md b/documentation/dev_workflow_guide.md similarity index 57% rename from documentation/DEV_WORKFLOW_GUIDE.md rename to documentation/dev_workflow_guide.md index c08cea4..b06ef1c 100644 --- a/documentation/DEV_WORKFLOW_GUIDE.md +++ b/documentation/dev_workflow_guide.md @@ -13,47 +13,65 @@ The project is built upon the following core technologies: - **Caching**: Redis - **Containerization**: Docker -### 2.1. Backend Service (`FastSim-backend`) +### 2.1 Backend Service (`FastSim-backend`) -This repository contains all code related to the FastAPI backend service. Its primary responsibility is to handle business logic, interact with the database, and expose a RESTful API. +The repository hosts the entire FastAPI backend for FastSim. +Its job is to expose the REST API, run the discrete-event simulation, talk to the database, and provide metrics. 
-**Folder Structure:** ``` -project-backend/ -├── .github/ # CI workflows: tests, builds, and publishes the Docker image -│ └── workflows/ -│ └── main.yml -├── src/ -│ └── app/ # Main Python package -│ ├── api/ # API routers & endpoints -│ ├── db/ # Database session management & base models -│ ├── models/ # SQLAlchemy ORM models (database table definitions) -│ ├── schemas/ # Pydantic schemas for validation/serialization -│ ├── core/ # Business logic (services, utilities, etc.) -│ ├── config/ # settings.py & constants.py for configuration -│ └── main.py # FastAPI application entrypoint -├── Alembic/ # Database migrations managed with Alembic -│ ├── versions/ # Generated migration files -│ ├── env.py # Alembic environment setup -│ └── script.py.mako # Template for new migrations -├── tests/ # Unit and integration tests -├── alembic.ini # Alembic configuration file -├── py.ini # Python tool configurations (flake8, mypy, etc.) -├── .env.example # Template for environment variables -├── .gitignore # Files and paths ignored by Git -├── docker-compose.override.yml # Local overrides (e.g., hot-reload) -├── docker-compose.test.yml # Docker Compose setup for isolated testing -├── docker-compose.yml # Base Docker Compose configuration for development -├── Dockerfile # Instructions to build the production Docker image -├── poetry.lock # Locked dependency versions for Poetry -├── pyproject.toml # Poetry project configuration (including src layout) -└── README.md # Setup instructions and project overview - +fastsim-backend/ +├── Dockerfile +├── docker_fs/ # docker-compose for dev & prod +│ ├── docker-compose.dev.yml +│ └── docker-compose.prod.yml +├── scripts/ # helper bash scripts (lint, dev-startup, …) +│ ├── init-docker-dev.sh +│ └── quality-check.sh +├── alembic/ # DB migrations (versions/ contains revision files) +│ ├── env.py +│ └── versions/ +├── documentation/ # project vision & low-level docs +│ └── backend_documentation/ +│ └── … +├── tests/ # unit & integration tests 
+│ ├── unit/ +│ └── integration/ +├── src/ # **application code lives here** +│ └── app/ +│ ├── api/ # FastAPI routers & endpoint handlers +│ ├── config/ # Pydantic Settings + constants +│ ├── db/ # SQLAlchemy base, sessions, initial seed utilities +│ ├── metrics/ # helpers to compute/aggregate simulation KPIs +│ ├── resources/ # SimPy resource registry (CPU/RAM containers, etc.) +│ ├── runtime/ # simulation core +│ │ ├── rqs_state.py # RequestState & Hop +│ │ └── actors/ # SimPy “actors”: Edge, Server, Client, RqsGenerator +│ ├── samplers/ # stochastic samplers (Gaussian-Poisson, etc.) +│ ├── schemas/ # Pydantic input/output models +│ ├── main.py # FastAPI application factory / ASGI entry-point +│ └── simulation_run.py # CLI utility to run a sim outside of HTTP layer +├── poetry.lock +├── pyproject.toml +└── README.md ``` -**Key Responsibilities:** -* To be testable in isolation. -* To produce a versioned Docker image (`backend:`) as its main artifact. +#### What each top-level directory in `src/app` does + +| Directory | Purpose | +| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **`api/`** | Defines the public HTTP surface. Each module holds a router with path operations and dependency wiring. | +| **`config/`** | Centralised configuration: `settings.py` (Pydantic `BaseSettings`) reads env vars; `constants.py` stores enums and global literals. | +| **`db/`** | Persistence layer. Contains the SQLAlchemy base class, the session factory, and a thin wrapper that seeds or resets the database (Alembic migration scripts live at project root). | +| **`metrics/`** | Post-processing helpers that turn raw simulation traces into aggregated KPIs (latency percentiles, cost per request, utilisation curves, …). 
| +| **`resources/`** | A tiny run-time registry mapping every simulated server to its SimPy `Container`s (CPU, RAM). Keeps resource management separate from actor logic. | +| **`runtime/`** | The heart of the simulator. `rqs_state.py` holds the mutable `RequestState`; sub-package **`actors/`** contains each SimPy process class (Generator, Edge, Server, Client). | +| **`samplers/`** | Probability-distribution utilities that generate inter-arrival and service-time samples—used by the actors during simulation. | +| **`schemas/`** | All Pydantic models for validation and (de)serialisation: request DTOs, topology definitions, simulation settings, outputs. | +| **`main.py`** | Creates and returns the FastAPI app; imported by Uvicorn/Gunicorn. | +| **`simulation_run.py`** | Convenience script to launch a simulation offline (e.g. inside tests or CLI). | + +Everything under `src/` is import-safe thanks to Poetry’s `packages = [{ include = "app" }]` entry in `pyproject.toml`. + ## 3. Branching Strategy: Git Flow diff --git a/documentation/FASTSIM_VISION.md b/documentation/fastsim_vision.md similarity index 100% rename from documentation/FASTSIM_VISION.md rename to documentation/fastsim_vision.md diff --git a/src/app/config/constants.py b/src/app/config/constants.py index c611697..8eeb01e 100644 --- a/src/app/config/constants.py +++ b/src/app/config/constants.py @@ -128,15 +128,11 @@ class StepOperation(StrEnum): """ Keys used inside the ``metrics`` dictionary of a *step*. - * ``NETWORK_LATENCY`` - Mean latency (seconds) incurred on a network edge - *outside* the service (used mainly for validation when steps model - short in-service hops). * ``CPU_TIME`` - Service time (seconds) during which the coroutine occupies the CPU / GIL. * ``NECESSARY_RAM`` - Peak memory (MB) required by the step. 
""" - NETWORK_LATENCY = "network_latency" CPU_TIME = "cpu_time" IO_WAITING_TIME = "io_waiting_time" NECESSARY_RAM = "necessary_ram" @@ -165,6 +161,7 @@ class NetworkParameters: DROPOUT_RATE = 0.01 MAX_DROPOUT_RATE = 1.0 + # ====================================================================== # CONSTANTS FOR THE MACRO-TOPOLOGY GRAPH # ====================================================================== @@ -235,3 +232,13 @@ class AggregatedMetricName(StrEnum): LATENCY_STATS = "latency_stats" LLM_STATS = "llm_stats" + +# ====================================================================== +# CONSTANTS FOR SERVER RUNTIME +# ====================================================================== + +class ServerResourceName(StrEnum): + """Keys for each server resource type, used when building the container map.""" + + CPU = "CPU" + RAM = "RAM" diff --git a/src/app/config/rqs_state.py b/src/app/config/rqs_state.py deleted file mode 100644 index dec9200..0000000 --- a/src/app/config/rqs_state.py +++ /dev/null @@ -1,50 +0,0 @@ -""" -defining a state in a one to one correspondence -with the requests generated that will go through -all the node necessary to accomplish the user request -""" - -from __future__ import annotations - -from dataclasses import dataclass, field - - -@dataclass -class RequestState: - """ - State object carried by each request through the simulation. - - Attributes: - id: Unique identifier of the request. - t0: Timestamp (simulated env.now) when the request was generated. - history: List of hop records, each noting a node/edge visit. - finish_time: Timestamp when the requests is satisfied - - """ - - id: int # Unique request identifier - initial_time: float # Generation timestamp (env.now) - finish_time: float | None = None # a requests might be dropped - history: list[str] = field(default_factory=list) # Trace of hops - - def record_hop(self, node_name: str, now: float) -> None: - """ - Append a record of visiting a node or edge. 
- - Args: - node_name: Name of the node or edge being recorded. - now: register the time of the operation - - """ - # Record hop as "NodeName@Timestamp" - self.history.append(f"{node_name}@{now:.3f}") - - @property - def latency(self) -> float | None: - """ - Return the total time in the system (finish_time - initial_time), - or None if the request hasn't completed yet. - """ - if self.finish_time is None: - return None - return self.finish_time - self.initial_time diff --git a/src/app/core/helpers/requests_generator.py b/src/app/core/helpers/requests_generator.py deleted file mode 100644 index 2d04bef..0000000 --- a/src/app/core/helpers/requests_generator.py +++ /dev/null @@ -1,60 +0,0 @@ -""" -SimPy process that generates user requests at stochastic intervals. - -This node samples inter-arrival times according to the configured -distribution (Gaussian-Poisson or Poisson-Poisson), constructs a -RequestState for each new request, records its origin hop, and -immediately pushes it into the next pipeline stage via an EdgeRuntime. -""" - -from __future__ import annotations - -from typing import TYPE_CHECKING - -from app.config.constants import Distribution -from app.core.event_samplers.gaussian_poisson import gaussian_poisson_sampling -from app.core.event_samplers.poisson_poisson import poisson_poisson_sampling - -if TYPE_CHECKING: - from collections.abc import Generator - - import numpy as np - - from app.schemas.requests_generator_input import RqsGeneratorInput - from app.schemas.simulation_settings_input import SimulationSettings - - -def requests_generator( - input_data: RqsGeneratorInput, - sim_settings: SimulationSettings, - *, - rng: np.random.Generator, -) -> Generator[float, None, None]: - """ - Return an iterator of inter-arrival gaps (seconds) according to the model - chosen in *input_data*. - - Notes - ----- - * If ``avg_active_users.distribution`` is ``"gaussian"`` or ``"normal"``, - the Gaussian-Poisson sampler is used. 
- * Otherwise the default Poisson-Poisson sampler is returned. - - """ - dist = input_data.avg_active_users.distribution.lower() - - if dist == Distribution.NORMAL: - #Gaussian-Poisson model - return gaussian_poisson_sampling( - input_data=input_data, - sim_settings=sim_settings, - rng=rng, - - ) - - # Poisson + Poisson - return poisson_poisson_sampling( - input_data=input_data, - sim_settings=sim_settings, - rng=rng, - ) diff --git a/src/app/core/helpers/dictionary_metrics.py b/src/app/metrics/dictionary_metrics.py similarity index 100% rename from src/app/core/helpers/dictionary_metrics.py rename to src/app/metrics/dictionary_metrics.py diff --git a/src/app/resources/__init__.py b/src/app/resources/__init__.py new file mode 100644 index 0000000..69884c1 --- /dev/null +++ b/src/app/resources/__init__.py @@ -0,0 +1 @@ +"""python package for resource registry""" diff --git a/src/app/resources/registry.py b/src/app/resources/registry.py new file mode 100644 index 0000000..c768cef --- /dev/null +++ b/src/app/resources/registry.py @@ -0,0 +1,38 @@ +""" +Runtime resource registry for server nodes. + +This module defines the ResourcesRuntime class, which takes a validated +TopologyGraph and a SimPy environment, then builds and stores a map +from each server's unique identifier to its SimPy resource containers. +Processes can later retrieve CPU and RAM containers by indexing this registry. 
+""" + +import simpy + +from app.resources.server_containers import ServerContainers, build_containers +from app.schemas.system_topology.full_system_topology import TopologyGraph + + +class ResourcesRuntime: + """definition of the class to associate resources to various nodes""" + + def __init__( + self, + env: simpy.Environment, + data: TopologyGraph, + + ) -> None: + """Initialization of the attributes""" + self.env = env + self.data = data + self._by_server: dict[str, ServerContainers] = { + server.id: build_containers(env, server.server_resources) + for server in data.nodes.servers + } + + def __getitem__(self, server_id: str) -> ServerContainers: + """ + Useful map to pass to each server the resources based + on the server unique id + """ + return self._by_server[server_id] diff --git a/src/app/resources/server_containers.py b/src/app/resources/server_containers.py new file mode 100644 index 0000000..dcbef1f --- /dev/null +++ b/src/app/resources/server_containers.py @@ -0,0 +1,73 @@ +""" +Definition of support structures for the simulation runtime. + +After Pydantic validation, this module provides TypedDicts and helpers +to build SimPy Containers for each server in the topology, improving +readability and ensuring a single point of truth for resource setup. +""" + + +from typing import TypedDict + +import simpy + +from app.config.constants import ServerResourceName +from app.schemas.system_topology.full_system_topology import ( + ServerResources, +) + +# ============================================================== +# DICT FOR THE REGISTRY TO INITIALIZE RESOURCES FOR EACH SERVER +# ============================================================== + + +class ServerContainers(TypedDict): + """ + Mapping of resource names to their SimPy Container instances for a server. + + - CPU: simpy.Container for CPU cores. + - RAM: simpy.Container for RAM in megabytes. 
+ """ + + CPU: simpy.Container + RAM: simpy.Container + +# Central funcrion to initialize the dictionary with ram and cpu container +def build_containers( + env: simpy.Environment, + spec: ServerResources, + ) -> ServerContainers: + """ + Construct and return a mapping of SimPy Containers for a server's CPU and RAM. + + Given a SimPy environment and a validated ServerResources spec, this function + initializes one simpy.Container for CPU (with capacity equal to cpu_cores) + and one for RAM (with capacity equal to ram_mb), then returns them in a + ServerContainers TypedDict keyed by "CPU" and "RAM". + + Parameters + ---------- + env : simpy.Environment + The simulation environment in which the Containers will be created. + spec : ServerResources + A Pydantic model instance defining the server's cpu_cores and ram_mb. + + Returns + ------- + ServerContainers + A TypedDict with exactly two entries: + - "CPU": simpy.Container initialized with spec.cpu_cores + - "RAM": simpy.Container initialized with spec.ram_mb + + """ + return { + ServerResourceName.CPU.value: simpy.Container( + env, capacity=spec.cpu_cores, init=spec.cpu_cores, + ), + ServerResourceName.RAM.value: simpy.Container( + env, capacity=spec.ram_mb, init=spec.ram_mb, + ), + } + + + diff --git a/src/app/core/runtime/__init__.py b/src/app/runtime/__init__.py similarity index 100% rename from src/app/core/runtime/__init__.py rename to src/app/runtime/__init__.py diff --git a/src/app/runtime/actors/client.py b/src/app/runtime/actors/client.py new file mode 100644 index 0000000..207bd71 --- /dev/null +++ b/src/app/runtime/actors/client.py @@ -0,0 +1,59 @@ +"""defining the object client for the simulation""" + +from collections.abc import Generator +from typing import TYPE_CHECKING + +import simpy + +from app.config.constants import SystemNodes +from app.runtime.actors.edge import EdgeRuntime +from app.schemas.system_topology.full_system_topology import Client + +if TYPE_CHECKING: + from app.runtime.rqs_state 
import RequestState
+
+
+class ClientRuntime:
+    """Client runtime: completes finished requests or forwards new ones."""
+
+    def __init__(
+        self,
+        env: simpy.Environment,
+        out_edge: EdgeRuntime,
+        client_box: simpy.Store,
+        completed_box: simpy.Store,
+        client_config: Client,
+    ) -> None:
+        """Definition of the attributes for the client."""
+        self.env = env
+        self.out_edge = out_edge
+        self.client_config = client_config
+        self.client_box = client_box
+        self.completed_box = completed_box
+
+    def _forwarder(self) -> Generator[simpy.Event, None, None]:
+        """Update the request state before passing it to another node."""
+        while True:
+            state: RequestState = yield self.client_box.get()  # type: ignore[assignment]
+
+            state.record_hop(
+                SystemNodes.CLIENT,
+                self.client_config.id,
+                self.env.now,
+            )
+
+            # history[-2] is the component visited before the incoming edge.
+            # If it is the request generator, the request is entering the
+            # system and must be forwarded for processing; otherwise the
+            # client is the final target and the request is completed.
+            if state.history[-2].component_type != SystemNodes.GENERATOR:
+                state.finish_time = self.env.now
+                yield self.completed_box.put(state)
+            else:
+                self.out_edge.transport(state)
+
+    def start(self) -> simpy.Process:
+        """Initialization of the SimPy process."""
+        return self.env.process(self._forwarder())
diff --git a/src/app/core/runtime/edge.py b/src/app/runtime/actors/edge.py
similarity index 78%
rename from src/app/core/runtime/edge.py
rename to src/app/runtime/actors/edge.py
index 4f22608..afef720 100644
--- a/src/app/core/runtime/edge.py
+++ b/src/app/runtime/actors/edge.py
@@ -12,9 +12,10 @@
 import numpy as np
 import simpy
 
-from app.config.rqs_state import RequestState
-from app.core.event_samplers.common_helpers import general_sampler
-from app.schemas.system_topology_schema.full_system_topology_schema import Edge
+from app.config.constants import SystemEdges
+from app.runtime.rqs_state import RequestState
+from
app.samplers.common_helpers import general_sampler +from app.schemas.system_topology.full_system_topology import Edge if TYPE_CHECKING: from app.schemas.random_variables_config import RVConfig @@ -46,12 +47,20 @@ def _deliver(self, state: RequestState) -> Generator[simpy.Event, None, None]: uniform_variable = self.rng.uniform() if uniform_variable < self.edge_config.dropout_rate: state.finish_time = self.env.now - state.record_hop(f"{self.edge_config.id}-dropped", state.finish_time) + state.record_hop( + SystemEdges.NETWORK_CONNECTION, + f"{self.edge_config.id}-dropped", + state.finish_time, + ) return transit_time = general_sampler(random_variable, self.rng) yield self.env.timeout(transit_time) - state.record_hop(self.edge_config.id, self.env.now) + state.record_hop( + SystemEdges.NETWORK_CONNECTION, + self.edge_config.id, + self.env.now, + ) yield self.target_box.put(state) diff --git a/src/app/core/runtime/rqs_generator.py b/src/app/runtime/actors/rqs_generator.py similarity index 87% rename from src/app/core/runtime/rqs_generator.py rename to src/app/runtime/actors/rqs_generator.py index 5611c85..bc114ba 100644 --- a/src/app/core/runtime/rqs_generator.py +++ b/src/app/runtime/actors/rqs_generator.py @@ -10,9 +10,9 @@ import numpy as np from app.config.constants import Distribution, SystemNodes -from app.config.rqs_state import RequestState -from app.core.event_samplers.gaussian_poisson import gaussian_poisson_sampling -from app.core.event_samplers.poisson_poisson import poisson_poisson_sampling +from app.runtime.rqs_state import RequestState +from app.samplers.gaussian_poisson import gaussian_poisson_sampling +from app.samplers.poisson_poisson import poisson_poisson_sampling if TYPE_CHECKING: @@ -20,8 +20,8 @@ import simpy - from app.core.runtime.edge import EdgeRuntime - from app.schemas.requests_generator_input import RqsGeneratorInput + from app.runtime.actors.edge import EdgeRuntime + from app.schemas.rqs_generator_input import RqsGeneratorInput from 
app.schemas.simulation_settings_input import SimulationSettings
@@ -106,12 +106,16 @@ def _event_arrival(self) -> Generator[simpy.Event, None, None]:
                 initial_time=self.env.now,
             )
 
-            state.record_hop(SystemNodes.GENERATOR, self.env.now)
+            state.record_hop(
+                SystemNodes.GENERATOR,
+                self.rqs_generator_data.id,
+                self.env.now,
+            )
             # transport is a method of the edge runtime
             # which define the step of how the state is moving
             # from one node to another
             self.out_edge.transport(state)
 
-    def run(self) -> simpy.Process:
+    def start(self) -> simpy.Process:
         """Passing the structure as a simpy process"""
         return self.env.process(self._event_arrival())
diff --git a/src/app/runtime/actors/server.py b/src/app/runtime/actors/server.py
new file mode 100644
index 0000000..0a0f772
--- /dev/null
+++ b/src/app/runtime/actors/server.py
@@ -0,0 +1,136 @@
+"""
+Definition of the class that manages a server
+during the simulation.
+"""
+
+from collections.abc import Generator
+from typing import cast
+
+import numpy as np
+import simpy
+
+from app.config.constants import (
+    EndpointStepCPU,
+    EndpointStepIO,
+    EndpointStepRAM,
+    ServerResourceName,
+    StepOperation,
+    SystemNodes,
+)
+from app.resources.server_containers import ServerContainers
+from app.runtime.actors.edge import EdgeRuntime
+from app.runtime.rqs_state import RequestState
+from app.schemas.system_topology.full_system_topology import Server
+
+
+class ServerRuntime:
+    """Server runtime actor for the simulation."""
+
+    def __init__(  # noqa: PLR0913
+        self,
+        env: simpy.Environment,
+        server_resources: ServerContainers,
+        server_config: Server,
+        out_edge: EdgeRuntime,
+        server_box: simpy.Store,
+        rng: np.random.Generator | None = None,
+    ) -> None:
+        """Server attributes
+
+        Args:
+            env (simpy.Environment): SimPy environment driving the simulation.
+            server_resources (ServerContainers): CPU and RAM containers assigned to this server.
+            server_config (Server): Validated server configuration (resources and endpoints).
+            out_edge (EdgeRuntime): Edge used to forward requests downstream.
+            server_box (simpy.Store): Inbox from which incoming requests are pulled.
+            rng
(np.random.Generator | None, optional): Random generator used for endpoint selection. Defaults to None, in which case a fresh default generator is created.
+
+        """
+        self.env = env
+        self.server_resources = server_resources
+        self.server_config = server_config
+        self.out_edge = out_edge
+        self.server_box = server_box
+        self.rng = rng or np.random.default_rng()
+
+    def _handle_request(
+        self,
+        state: RequestState,
+    ) -> Generator[simpy.Event, None, None]:
+        """
+        Define all the steps each request goes through once it
+        reaches the server.
+        """
+        # Register the hop in the request history
+        state.record_hop(
+            SystemNodes.SERVER,
+            self.server_config.id,
+            self.env.now,
+        )
+
+        # Endpoints exposed by this server
+        endpoints_list = self.server_config.endpoints
+        endpoints_number = len(endpoints_list)
+
+        # Select the endpoint the request targets. For now a uniform
+        # distribution is used; in the future the user will be able to
+        # define a custom distribution.
+        selected_endpoint_idx = self.rng.integers(low=0, high=endpoints_number)
+        selected_endpoint = endpoints_list[selected_endpoint_idx]
+
+        # RAM management:
+        # 1. compute the total RAM the endpoint needs,
+        # 2. acquire it (blocking until it is available),
+        # 3. release everything once the operation has completed.
+        total_ram = sum(
+            step.step_operation[StepOperation.NECESSARY_RAM]
+            for step in selected_endpoint.steps
+            if isinstance(step.kind, EndpointStepRAM)
+        )
+
+        if total_ram:
+            yield self.server_resources[ServerResourceName.RAM.value].get(total_ram)
+
+        # --- Step Execution: Process CPU and IO operations ---
+        for step in selected_endpoint.steps:
+
+            if isinstance(step.kind, EndpointStepCPU):
+                cpu_time = step.step_operation[StepOperation.CPU_TIME]
+
+                # Acquire one core
+                yield self.server_resources[ServerResourceName.CPU.value].get(1)
+                # Hold the core busy
+                yield self.env.timeout(cpu_time)
+                # Release the core
+                yield self.server_resources[ServerResourceName.CPU.value].put(1)
+
+            elif isinstance(step.kind, EndpointStepIO):
+                io_time = step.step_operation[StepOperation.IO_WAITING_TIME]
+                yield
self.env.timeout(io_time) # Wait without holding a CPU core + + # release the ram + if total_ram: + yield self.server_resources[ServerResourceName.RAM.value].put(total_ram) + + self.out_edge.transport(state) + + def _dispatcher(self) -> Generator[simpy.Event, None, None]: + """ + The main dispatcher loop. It pulls requests from the inbox and + spawns a new '_handle_request' process for each one. + """ + while True: + # Wait for a request to arrive in the server's inbox + raw_state = yield self.server_box.get() + request_state = cast("RequestState", raw_state) + # Spawn a new, independent process to handle this request + self.env.process(self._handle_request(request_state)) + + def start(self) -> simpy.Process: + """Generate the process to simulate the server inside simpy env""" + return self.env.process(self._dispatcher()) + diff --git a/src/app/runtime/rqs_state.py b/src/app/runtime/rqs_state.py new file mode 100644 index 0000000..1c953fd --- /dev/null +++ b/src/app/runtime/rqs_state.py @@ -0,0 +1,51 @@ +"""Data structures representing the life-cycle of a single request.""" + +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import TYPE_CHECKING, NamedTuple + +if TYPE_CHECKING: + from app.config.constants import SystemEdges, SystemNodes + + +class Hop(NamedTuple): + """A single traversal of a node or edge.""" + + component_type: SystemNodes | SystemEdges + component_id: str + timestamp: float + + +@dataclass +class RequestState: + """Mutable state carried by each request throughout the simulation.""" + + id: int + initial_time: float + finish_time: float | None = None + history: list[Hop] = field(default_factory=list) + + # ------------------------------------------------------------------ # + # API # + # ------------------------------------------------------------------ # + + def record_hop( + self, + component_type: SystemNodes | SystemEdges, + component_id: str, + now: float, + ) -> None: + """Append a new hop in 
chronological order.""" + self.history.append(Hop(component_type, component_id, now)) + + # ------------------------------------------------------------------ # + # Derived metrics # + # ------------------------------------------------------------------ # + + @property + def latency(self) -> float | None: + """Total time inside the system or ``None`` if not yet completed.""" + if self.finish_time is None: + return None + return self.finish_time - self.initial_time diff --git a/src/app/core/event_samplers/common_helpers.py b/src/app/samplers/common_helpers.py similarity index 100% rename from src/app/core/event_samplers/common_helpers.py rename to src/app/samplers/common_helpers.py diff --git a/src/app/core/event_samplers/gaussian_poisson.py b/src/app/samplers/gaussian_poisson.py similarity index 96% rename from src/app/core/event_samplers/gaussian_poisson.py rename to src/app/samplers/gaussian_poisson.py index f9cc401..bb1f6fd 100644 --- a/src/app/core/event_samplers/gaussian_poisson.py +++ b/src/app/samplers/gaussian_poisson.py @@ -12,11 +12,11 @@ import numpy as np from app.config.constants import TimeDefaults -from app.core.event_samplers.common_helpers import ( +from app.samplers.common_helpers import ( truncated_gaussian_generator, uniform_variable_generator, ) -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput from app.schemas.simulation_settings_input import SimulationSettings diff --git a/src/app/core/event_samplers/poisson_poisson.py b/src/app/samplers/poisson_poisson.py similarity index 95% rename from src/app/core/event_samplers/poisson_poisson.py rename to src/app/samplers/poisson_poisson.py index 1d4787f..7f17364 100644 --- a/src/app/core/event_samplers/poisson_poisson.py +++ b/src/app/samplers/poisson_poisson.py @@ -9,11 +9,11 @@ import numpy as np from app.config.constants import TimeDefaults -from app.core.event_samplers.common_helpers import ( +from 
app.samplers.common_helpers import ( poisson_variable_generator, uniform_variable_generator, ) -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput from app.schemas.simulation_settings_input import SimulationSettings diff --git a/src/app/schemas/full_simulation_input.py b/src/app/schemas/full_simulation_input.py index b745fee..8be873a 100644 --- a/src/app/schemas/full_simulation_input.py +++ b/src/app/schemas/full_simulation_input.py @@ -2,9 +2,9 @@ from pydantic import BaseModel -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput from app.schemas.simulation_settings_input import SimulationSettings -from app.schemas.system_topology_schema.full_system_topology_schema import TopologyGraph +from app.schemas.system_topology.full_system_topology import TopologyGraph class SimulationPayload(BaseModel): diff --git a/src/app/schemas/requests_generator_input.py b/src/app/schemas/rqs_generator_input.py similarity index 93% rename from src/app/schemas/requests_generator_input.py rename to src/app/schemas/rqs_generator_input.py index 35ff361..4f217e7 100644 --- a/src/app/schemas/requests_generator_input.py +++ b/src/app/schemas/rqs_generator_input.py @@ -3,13 +3,15 @@ from pydantic import BaseModel, Field, field_validator -from app.config.constants import Distribution, TimeDefaults +from app.config.constants import Distribution, SystemNodes, TimeDefaults from app.schemas.random_variables_config import RVConfig class RqsGeneratorInput(BaseModel): """Define the expected variables for the simulation""" + id: str + type: SystemNodes = SystemNodes.GENERATOR avg_active_users: RVConfig avg_request_per_minute_per_user: RVConfig diff --git a/src/app/schemas/system_topology_schema/endpoint_schema.py b/src/app/schemas/system_topology/endpoint.py similarity index 100% rename from 
src/app/schemas/system_topology_schema/endpoint_schema.py rename to src/app/schemas/system_topology/endpoint.py diff --git a/src/app/schemas/system_topology_schema/full_system_topology_schema.py b/src/app/schemas/system_topology/full_system_topology.py similarity index 99% rename from src/app/schemas/system_topology_schema/full_system_topology_schema.py rename to src/app/schemas/system_topology/full_system_topology.py index ddbac36..2f29f2a 100644 --- a/src/app/schemas/system_topology_schema/full_system_topology_schema.py +++ b/src/app/schemas/system_topology/full_system_topology.py @@ -24,7 +24,7 @@ SystemNodes, ) from app.schemas.random_variables_config import RVConfig -from app.schemas.system_topology_schema.endpoint_schema import Endpoint +from app.schemas.system_topology.endpoint import Endpoint #------------------------------------------------------------- # Definition of the nodes structure for the graph representing @@ -138,7 +138,6 @@ def unique_ids( model_config = ConfigDict(extra="forbid") - #------------------------------------------------------------- # Definition of the edges structure for the graph representing # the topoogy of the system defined for the simulation diff --git a/src/app/core/simulation_run.py b/src/app/simulation_run.py similarity index 97% rename from src/app/core/simulation_run.py rename to src/app/simulation_run.py index c6159ca..1176751 100644 --- a/src/app/core/simulation_run.py +++ b/src/app/simulation_run.py @@ -19,7 +19,7 @@ - +### TO MODIFY EVERYTHING WORK IN PROGRESS def run_simulation( input_data: SimulationPayload, diff --git a/tests/conftest.py b/tests/conftest.py index e6d7b75..7d0f587 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -30,9 +30,9 @@ from app.main import app from app.schemas.full_simulation_input import SimulationPayload from app.schemas.random_variables_config import RVConfig -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import 
RqsGeneratorInput from app.schemas.simulation_settings_input import SimulationSettings -from app.schemas.system_topology_schema.full_system_topology_schema import ( +from app.schemas.system_topology.full_system_topology import ( Client, TopologyGraph, TopologyNodes, @@ -153,14 +153,14 @@ def rng() -> NpGenerator: return default_rng(0) -# --------------------------------------------------------------------------- -# Metrics sets -# --------------------------------------------------------------------------- +# --------------------------------------------------------------------------- # +# Metric sets # +# --------------------------------------------------------------------------- # @pytest.fixture(scope="session") def enabled_sample_metrics() -> set[SampledMetricName]: - """Default sample-level KPIs tracked in most tests.""" + """Default time-series KPIs collected in most tests.""" return { SampledMetricName.READY_QUEUE_LEN, SampledMetricName.RAM_IN_USE, @@ -169,13 +169,15 @@ def enabled_sample_metrics() -> set[SampledMetricName]: @pytest.fixture(scope="session") def enabled_event_metrics() -> set[EventMetricName]: - """Default event-level KPIs tracked in most tests.""" - return {EventMetricName.RQS_LATENCY} + """Default per-event KPIs collected in most tests.""" + return { + EventMetricName.RQS_LATENCY, + } -# --------------------------------------------------------------------------- -# Global simulation settings -# --------------------------------------------------------------------------- +# --------------------------------------------------------------------------- # +# Global simulation settings # +# --------------------------------------------------------------------------- # @pytest.fixture @@ -183,7 +185,12 @@ def sim_settings( enabled_sample_metrics: set[SampledMetricName], enabled_event_metrics: set[EventMetricName], ) -> SimulationSettings: - """A minimal `SimulationSettings` instance for unit tests.""" + """ + Minimal :class:`SimulationSettings` 
instance. + + The simulation horizon is fixed to the lowest allowed value so that unit + tests run quickly. + """ return SimulationSettings( total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, enabled_sample_metrics=enabled_sample_metrics, @@ -191,37 +198,46 @@ def sim_settings( ) -# --------------------------------------------------------------------------- -# Traffic profile -# --------------------------------------------------------------------------- +# --------------------------------------------------------------------------- # +# Traffic profile # +# --------------------------------------------------------------------------- # @pytest.fixture def rqs_input() -> RqsGeneratorInput: - """`RqsGeneratorInput` with 1 user and 2 req/min for quick tests.""" + """ + One active user issuing two requests per minute—sufficient to + exercise the entire request-generator pipeline with minimal overhead. + """ return RqsGeneratorInput( + id="rqs-1", avg_active_users=RVConfig(mean=1.0), avg_request_per_minute_per_user=RVConfig(mean=2.0), user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, ) -# --------------------------------------------------------------------------- -# Minimal topology (one client, no servers, no edges) -# --------------------------------------------------------------------------- +# --------------------------------------------------------------------------- # +# Minimal topology (one client, no servers, no edges) # +# --------------------------------------------------------------------------- # @pytest.fixture def topology_minimal() -> TopologyGraph: - """Valid topology with a single client and zero servers/edges.""" + """ + A valid topology containing a single client and **no** servers or edges. + + Suitable for low-level tests that do not need to traverse the server + layer or network graph. 
+    """
+    client = Client(id="client-1")
+    nodes = TopologyNodes(servers=[], client=client)
+    return TopologyGraph(nodes=nodes, edges=[])
 
 
-# ---------------------------------------------------------------------------
-# Full simulation payload
-# ---------------------------------------------------------------------------
+# --------------------------------------------------------------------------- #
+# Complete simulation payload                                                 #
+# --------------------------------------------------------------------------- #
 
 
 @pytest.fixture
@@ -230,7 +246,12 @@
 def payload_base(
     rqs_input: RqsGeneratorInput,
     sim_settings: SimulationSettings,
     topology_minimal: TopologyGraph,
 ) -> SimulationPayload:
-    """End-to-end payload used by high-level simulation tests."""
+    """
+    End-to-end payload used by integration tests and FastAPI endpoint tests.
+
+    It wires together the individual fixtures into the single object expected
+    by the simulation engine.
+    """
     return SimulationPayload(
         rqs_input=rqs_input,
         topology_graph=topology_minimal,
diff --git a/tests/unit/test_settings.py b/tests/unit/db/test_settings.py
similarity index 100%
rename from tests/unit/test_settings.py
rename to tests/unit/db/test_settings.py
diff --git a/tests/unit/resources/test_registry.py b/tests/unit/resources/test_registry.py
new file mode 100644
index 0000000..fe5a693
--- /dev/null
+++ b/tests/unit/resources/test_registry.py
@@ -0,0 +1,60 @@
+"""Unit tests for ResourcesRuntime (resource registry)."""
+
+from __future__ import annotations
+
+import pytest
+import simpy
+
+from app.config.constants import ServerResourceName
+from app.resources.registry import ResourcesRuntime
+from app.schemas.system_topology.endpoint import Endpoint
+from app.schemas.system_topology.full_system_topology import (
+    Client,
+    Server,
+    ServerResources,
+    TopologyGraph,
+    TopologyNodes,
+)
+
+
+def _minimal_server(server_id: str, cores: int, ram: int) -> Server:
+    """Create a Server with a dummy endpoint and resource spec."""
+    res = ServerResources(cpu_cores=cores, ram_mb=ram)
+    dummy_ep = Endpoint(endpoint_name="/ping", steps=[])
+    return Server(id=server_id, server_resources=res, endpoints=[dummy_ep])
+
+
+def _build_topology() -> TopologyGraph:
+    """Return a minimal but schema-valid topology with two servers."""
+    servers = [
+        _minimal_server("srv-A", 2, 1024),
+        _minimal_server("srv-B", 4, 2048),
+    ]
+    client = Client(id="clt-1")
+    nodes = TopologyNodes(servers=servers, client=client)
+    return TopologyGraph(nodes=nodes, edges=[])
+
+
+def test_registry_initialises_filled_containers() -> None:
+    """CPU and RAM containers must start full for every server."""
+    env = simpy.Environment()
+    topo = _build_topology()
+    registry = ResourcesRuntime(env, topo)
+
+    for srv in topo.nodes.servers:
+        containers = registry[srv.id]
+
+        cpu = containers[ServerResourceName.CPU.value]
+        ram = containers[ServerResourceName.RAM.value]
+
+        assert cpu.level == cpu.capacity == srv.server_resources.cpu_cores
+        assert ram.level == ram.capacity == srv.server_resources.ram_mb
+
+
+def test_getitem_unknown_server_raises_keyerror() -> None:
+    """Accessing an undefined server ID should raise KeyError."""
+    env = simpy.Environment()
+    registry = ResourcesRuntime(env, _build_topology())
+
+    with pytest.raises(KeyError):
+        _ = registry["non-existent-server"]
diff --git a/tests/unit/resources/test_server_containers.py b/tests/unit/resources/test_server_containers.py
new file mode 100644
index 0000000..c36f927
--- /dev/null
+++ b/tests/unit/resources/test_server_containers.py
@@ -0,0 +1,19 @@
+"""Unit test: build_containers must return full containers."""
+
+import simpy
+
+from app.config.constants import ServerResourceName
+from app.resources.server_containers import build_containers
+from app.schemas.system_topology.full_system_topology import ServerResources
+
+
+def test_containers_start_full() -> None:
+    env = simpy.Environment()
+    spec = ServerResources(cpu_cores=4, ram_mb=2048)
+    containers = build_containers(env, spec)
+
+    cpu = containers[ServerResourceName.CPU.value]
+    ram = containers[ServerResourceName.RAM.value]
+
+    assert cpu.level == cpu.capacity == 4
+    assert ram.level == ram.capacity == 2048
diff --git a/tests/unit/runtime/engine/test_client.py b/tests/unit/runtime/engine/test_client.py
new file mode 100644
index 0000000..bc17471
--- /dev/null
+++ b/tests/unit/runtime/engine/test_client.py
@@ -0,0 +1,96 @@
+"""Unit-tests for :class:`ClientRuntime` (outbound / inbound paths)."""
+
+from __future__ import annotations
+
+import simpy
+
+from app.config.constants import SystemEdges, SystemNodes
+from app.runtime.actors.client import ClientRuntime
+from app.runtime.rqs_state import RequestState
+from app.schemas.system_topology.full_system_topology import (
+    Client,
+)
+
+# --------------------------------------------------------------------------- #
+# Dummy edge (no real network)                                                #
+# --------------------------------------------------------------------------- #
+
+
+class DummyEdgeRuntime:
+    """Collect states passed through *transport* without SimPy side-effects."""
+
+    def __init__(self, env: simpy.Environment) -> None:
+        """Store the environment and an empty list of forwarded states."""
+        self.env = env
+        self.forwarded: list[RequestState] = []
+
+    # Signature compatible with EdgeRuntime.transport but returns *None*
+    def transport(self, state: RequestState) -> None:
+        """Record the state instead of transporting it."""
+        self.forwarded.append(state)
+
+
+# --------------------------------------------------------------------------- #
+# Helper                                                                      #
+# --------------------------------------------------------------------------- #
+
+
+def _setup(
+    env: simpy.Environment,
+) -> tuple[simpy.Store, simpy.Store, DummyEdgeRuntime]:
+    inbox: simpy.Store = simpy.Store(env)
+    completed: simpy.Store = simpy.Store(env)
+    edge_rt = DummyEdgeRuntime(env)
+    cli_cfg = Client(id="cli-1")
+
+    client = ClientRuntime(
+        env=env,
+        out_edge=edge_rt,  # type: ignore[arg-type]
+        client_box=inbox,
+        completed_box=completed,
+        client_config=cli_cfg,
+    )
+    client.start()  # start the forwarder
+    return inbox, completed, edge_rt
+
+
+# --------------------------------------------------------------------------- #
+# Tests                                                                       #
+# --------------------------------------------------------------------------- #
+
+
+def test_outbound_is_forwarded() -> None:
+    """First visit ⇒ forwarded; completed store remains empty."""
+    env = simpy.Environment()
+    inbox, completed, edge_rt = _setup(env)
+
+    req = RequestState(id=1, initial_time=0.0)
+    req.record_hop(SystemNodes.GENERATOR, "gen-1", env.now)
+
+    inbox.put(req)
+    env.run()
+
+    assert len(edge_rt.forwarded) == 1
+    assert len(completed.items) == 0
+    assert req.history[-1].component_type is SystemNodes.CLIENT
+    assert req.finish_time is None
+
+
+def test_inbound_is_completed() -> None:
+    """Second visit ⇒ request stored in *completed_box* and not re-forwarded."""
+    env = simpy.Environment()
+    inbox, completed, edge_rt = _setup(env)
+
+    req = RequestState(id=2, initial_time=0.0)
+    req.record_hop(SystemNodes.GENERATOR, "gen-1", env.now)
+    req.record_hop(SystemEdges.NETWORK_CONNECTION, "edge-X", env.now)
+
+    inbox.put(req)
+    env.run()
+
+    assert len(edge_rt.forwarded) == 0
+    assert len(completed.items) == 1
+
+    done = completed.items[0]
+    assert done.finish_time is not None
+    assert done.history[-1].component_type is SystemNodes.CLIENT
diff --git a/tests/unit/runtime/engine/test_edge.py b/tests/unit/runtime/engine/test_edge.py
new file mode 100644
index 0000000..10cb758
--- /dev/null
+++ b/tests/unit/runtime/engine/test_edge.py
@@ -0,0 +1,127 @@
+"""Unit-tests for :class:`EdgeRuntime` (delivery / drop paths)."""
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, cast
+
+import simpy
+
+from app.config.constants import SystemEdges, SystemNodes
+from app.runtime.actors.edge import EdgeRuntime
+from app.runtime.rqs_state import RequestState
+from app.schemas.random_variables_config import RVConfig
+from app.schemas.system_topology.full_system_topology import Edge
+
+if TYPE_CHECKING:
+    import numpy as np
+
+# --------------------------------------------------------------------------- #
+# Dummy RNG                                                                   #
+# --------------------------------------------------------------------------- #
+
+
+class DummyRNG:
+    """Return preset values for ``uniform`` and ``normal``."""
+
+    def __init__(self, *, uniform_value: float, normal_value: float = 0.0) -> None:
+        """Store the preset draw values and reset the call flags."""
+        self.uniform_value = uniform_value
+        self.normal_value = normal_value
+        self.uniform_called = False
+        self.normal_called = False
+
+    def uniform(self) -> float:
+        """Intercept ``rng.uniform`` calls."""
+        self.uniform_called = True
+        return self.uniform_value
+
+    def normal(self, _mean: float, _sigma: float) -> float:
+        """Intercept ``rng.normal`` calls."""
+        self.normal_called = True
+        return self.normal_value
+
+
+# --------------------------------------------------------------------------- #
+# Helper to build an EdgeRuntime                                              #
+# --------------------------------------------------------------------------- #
+
+
+def _make_edge(
+    env: simpy.Environment,
+    *,
+    uniform_value: float,
+    normal_value: float = 0.0,
+    dropout_rate: float = 0.0,
+) -> tuple[EdgeRuntime, DummyRNG, simpy.Store]:
+    """Build an EdgeRuntime wired to a DummyRNG and a target store."""
+    rng = DummyRNG(uniform_value=uniform_value, normal_value=normal_value)
+
+    store: simpy.Store = simpy.Store(env)
+    edge_cfg = Edge(
+        id="edge-1",
+        source="src",
+        target="dst",
+        latency=RVConfig(mean=0.0, variance=1.0, distribution="normal"),
+        dropout_rate=dropout_rate,
+    )
+
+    edge_rt = EdgeRuntime(
+        env=env,
+        edge_config=edge_cfg,
+        rng=cast("np.random.Generator", rng),
+        target_box=store,
+    )
+    return edge_rt, rng, store
+
+
+# --------------------------------------------------------------------------- #
+# Tests                                                                       #
+# --------------------------------------------------------------------------- #
+
+
+def test_edge_delivers_message_when_not_dropped() -> None:
+    """A message traverses the edge and calls the latency sampler once."""
+    env = simpy.Environment()
+    edge_rt, rng, store = _make_edge(
+        env,
+        uniform_value=0.9,
+        normal_value=0.5,
+        dropout_rate=0.2,
+    )
+
+    state = RequestState(id=1, initial_time=0.0)
+    state.record_hop(SystemNodes.GENERATOR, "gen-1", env.now)
+
+    edge_rt.transport(state)
+    env.run()
+
+    assert len(store.items) == 1
+    delivered: RequestState = store.items[0]
+    last = delivered.history[-1]
+    assert last.component_type is SystemEdges.NETWORK_CONNECTION
+    assert last.component_id == "edge-1"
+    assert rng.uniform_called is True
+    assert rng.normal_called is True
+
+
+def test_edge_drops_message_when_uniform_below_threshold() -> None:
+    """A message is dropped when the random draw is below *dropout_rate*."""
+    env = simpy.Environment()
+    edge_rt, rng, store = _make_edge(
+        env,
+        uniform_value=0.1,  # < dropout → drop
+        dropout_rate=0.5,
+    )
+
+    state = RequestState(id=1, initial_time=0.0)
+    state.record_hop(SystemNodes.GENERATOR, "gen-1", env.now)
+
+    edge_rt.transport(state)
+    env.run()
+
+    assert len(store.items) == 0
+    last = state.history[-1]
+    assert last.component_id.endswith("dropped")
+    assert rng.uniform_called is True
+    assert rng.normal_called is False
diff --git a/tests/unit/runtime/test_requests_generator.py b/tests/unit/runtime/engine/test_rqs_generator.py
similarity index 93%
rename from tests/unit/runtime/test_requests_generator.py
rename to tests/unit/runtime/engine/test_rqs_generator.py
index be03583..bc76f4d 100644
--- a/tests/unit/runtime/test_requests_generator.py
+++ b/tests/unit/runtime/engine/test_rqs_generator.py
@@ -8,16 +8,16 @@
 import simpy
 
 from app.config.constants import Distribution
-from app.core.runtime.rqs_generator import RqsGeneratorRuntime
+from app.runtime.actors.rqs_generator import RqsGeneratorRuntime
 
 if TYPE_CHECKING:
     import pytest
     from numpy.random import Generator
 
-    from app.config.rqs_state import RequestState
-    from app.core.runtime.edge import EdgeRuntime
-    from app.schemas.requests_generator_input import RqsGeneratorInput
+    from app.runtime.actors.edge import EdgeRuntime
+    from app.runtime.rqs_state import RequestState
+    from app.schemas.rqs_generator_input import RqsGeneratorInput
     from app.schemas.simulation_settings_input import SimulationSettings
 
 import importlib
@@ -63,7 +63,7 @@ def _make_runtime(
 # --------------------------------------------------------------------------- #
 
-RGR_MODULE = importlib.import_module("app.core.runtime.rqs_generator")
+RGR_MODULE = importlib.import_module("app.runtime.actors.rqs_generator")
 
 
 def test_dispatcher_selects_poisson_poisson(
     monkeypatch: pytest.MonkeyPatch,
diff --git a/tests/unit/runtime/engine/test_server.py b/tests/unit/runtime/engine/test_server.py
new file mode 100644
index 0000000..69292ba
--- /dev/null
+++ b/tests/unit/runtime/engine/test_server.py
@@ -0,0 +1,181 @@
+"""Unit tests for the concurrent ServerRuntime.
+
+The tests create an isolated SimPy environment with a test fixture that sets up:
+* A single ServerRuntime instance.
+* A mock "instant" edge that immediately forwards requests to a sink.
+* A server configuration with 2 CPU cores and 1024 MB of RAM.
+* A single endpoint with a sequence of RAM, CPU, and I/O steps.
+
+This setup allows for precise testing of resource acquisition/release and
+the correct execution of the processing pipeline for a single request.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import simpy
+from numpy.random import default_rng
+
+from app.config.constants import (
+    EndpointStepCPU,
+    EndpointStepIO,
+    EndpointStepRAM,
+    StepOperation,
+    SystemNodes,
+)
+from app.resources.server_containers import build_containers
+from app.runtime.actors.server import ServerRuntime
+from app.runtime.rqs_state import RequestState
+from app.schemas.system_topology.endpoint import Endpoint, Step
+from app.schemas.system_topology.full_system_topology import Server, ServerResources
+
+if TYPE_CHECKING:
+    from collections.abc import Generator
+
+
+# --------------------------------------------------------------------------- #
+# Test Helper: A mock edge that instantly delivers requests to a sink.        #
+# --------------------------------------------------------------------------- #
+class InstantEdge:
+    """A test stub for EdgeRuntime with zero latency and no drops.
+
+    This mock allows us to test the ServerRuntime in isolation, without
+    introducing the complexities of network simulation.
+    """
+
+    def __init__(self, env: simpy.Environment, sink: simpy.Store) -> None:
+        """Initializes the mock edge."""
+        self.env = env
+        self.sink = sink
+
+    def transport(self, state: RequestState) -> simpy.Process:
+        """Immediately puts the state in the sink via a SimPy process."""
+        return self.env.process(self._deliver(state))
+
+    def _deliver(self, state: RequestState) -> Generator[simpy.Event, None, None]:
+        """The generator function that performs the delivery."""
+        yield self.sink.put(state)
+
+
+# --------------------------------------------------------------------------- #
+# Test Fixture: Creates a standardized ServerRuntime for tests.               #
+# --------------------------------------------------------------------------- #
+def _make_server_runtime(
+    env: simpy.Environment,
+) -> tuple[ServerRuntime, simpy.Store]:
+    """Create a ServerRuntime with a dummy edge and return it and the sink store."""
+    # 1. Define server resources
+    res_spec = ServerResources(cpu_cores=2, ram_mb=1024)
+    containers = build_containers(env, res_spec)
+
+    # 2. Define a single endpoint with a sequence of steps
+    #    Order: RAM (instant) -> CPU (5ms) -> I/O (20ms)
+    endpoint = Endpoint(
+        endpoint_name="/predict",
+        steps=[
+            Step(
+                kind=EndpointStepRAM.RAM,
+                step_operation={StepOperation.NECESSARY_RAM: 128},
+            ),
+            Step(
+                kind=EndpointStepCPU.CPU_BOUND_OPERATION,
+                step_operation={StepOperation.CPU_TIME: 0.005},
+            ),
+            Step(
+                kind=EndpointStepIO.DB,
+                step_operation={StepOperation.IO_WAITING_TIME: 0.020},
+            ),
+        ],
+    )
+
+    # 3. Create the full server configuration
+    server_cfg = Server(
+        id="api_srv",
+        endpoints=[endpoint],
+        server_resources=res_spec,
+    )
+
+    # 4. Set up the simulation environment with mock components
+    inbox: simpy.Store = simpy.Store(env)
+    sink: simpy.Store = simpy.Store(env)
+    edge = InstantEdge(env, sink)
+
+    # 5. Instantiate the ServerRuntime
+    runtime = ServerRuntime(
+        env=env,
+        server_resources=containers,
+        server_config=server_cfg,
+        out_edge=edge,  # type: ignore[arg-type]
+        server_box=inbox,
+        rng=default_rng(0),
+    )
+    return runtime, sink
+
+
+# --------------------------------------------------------------------------- #
+# Unit Tests                                                                  #
+# --------------------------------------------------------------------------- #
+def test_server_reserves_and_releases_ram() -> None:
+    """Verify that RAM is acquired at the start and fully released at the end."""
+    env = simpy.Environment()
+    server, sink = _make_server_runtime(env)
+
+    # Prepare a request and inject it into the server's inbox.
+    req = RequestState(id=1, initial_time=0.0)
+    server.server_box.put(req)
+
+    # Start the server's dispatcher process and run until all events are processed.
+    server.start()
+    env.run()
+
+    ram_container = server.server_resources["RAM"]
+    # After the request is fully processed, the RAM level must return to its capacity.
+    assert ram_container.level == ram_container.capacity, "RAM must be fully released"
+    # The request should have successfully reached the sink.
+    assert len(sink.items) == 1, "Request should be forwarded to the sink"
+
+
+def test_cpu_core_held_only_during_cpu_step() -> None:
+    """Verify a CPU core is held exclusively during the CPU-bound step."""
+    env = simpy.Environment()
+    server, _ = _make_server_runtime(env)
+    cpu_container = server.server_resources["CPU"]
+
+    # Inject a single request and start the server.
+    req = RequestState(id=2, initial_time=0.0)
+    server.server_box.put(req)
+    server.start()
+
+    # The endpoint logic is: RAM (t=0) -> CPU (t=0 to t=0.005).
+    # Run the simulation to a point *during* the CPU step.
+    env.run(until=0.004)
+    # The server has 2 cores. One should be busy.
+    assert cpu_container.level == 1, "One core should still be busy during the CPU step"
+
+    # Now, run the simulation past the CPU step's completion.
+    env.run(until=0.006)
+    # The core should have been released immediately after the CPU step.
+    assert cpu_container.level == 2, "Core should be released after the CPU step"
+
+
+def test_server_records_hop_in_history() -> None:
+    """Verify that the request's history correctly records its arrival at the server."""
+    env = simpy.Environment()
+    server, sink = _make_server_runtime(env)
+
+    # Inject a request and run the simulation to completion.
+    req = RequestState(id=3, initial_time=0.0)
+    server.server_box.put(req)
+    server.start()
+    env.run()
+
+    # The request must be in the sink.
+ assert len(sink.items) == 1, "Request did not reach the sink" + finished_req = sink.items[0] + + # Check the request's history for a 'Hop' corresponding to this server. + assert any( + hop.component_type == SystemNodes.SERVER and hop.component_id == "api_srv" + for hop in finished_req.history + ), "Server hop missing in request history" diff --git a/tests/unit/runtime/test_edge b/tests/unit/runtime/test_edge deleted file mode 100644 index 8daaf01..0000000 --- a/tests/unit/runtime/test_edge +++ /dev/null @@ -1,136 +0,0 @@ -"""Unit-tests for :class:`EdgeRuntime`. - -The tests cover: - -* correct delivery when the random 'uniform' draw exceeds the drop rate; -* correct drop behaviour when the draw is below the drop rate; -* that the latency sampler is invoked exactly once per message. -""" -from __future__ import annotations - -from collections.abc import Iterator -from typing import TYPE_CHECKING, cast - -import numpy as np -import pytest -import simpy - -from app.config.constants import NetworkParameters, SystemNodes -from app.config.rqs_state import RequestState -from app.core.event_samplers.common_helpers import general_sampler -from app.core.runtime.edge import EdgeRuntime -from app.schemas.random_variables_config import RVConfig -from app.schemas.system_topology_schema.full_system_topology_schema import Edge - -if TYPE_CHECKING: - from collections.abc import Generator - - -pytestmark = pytest.mark.unit # module-level marker - - -# --------------------------------------------------------------------------- # -# Dummy RNG # -# --------------------------------------------------------------------------- # - - -class DummyRNG: - """RNG stub returning preset values for `uniform()` and `normal()`.""" - - def __init__( - self, - *, - uniform_value: float, - sampler_value: float = 0.0, - ) -> None: - self.uniform_value = uniform_value - self.sampler_value = sampler_value - self.uniform_called = False - self.sampler_called = False - - def uniform(self) -> float: # noqa: 
D401 - self.uniform_called = True - return self.uniform_value - - # EdgeRuntime passes `self` to `general_sampler`; wrap the call - def normal(self, mean: float, sigma: float) -> float: # noqa: D401 - self.sampler_called = True - return self.sampler_value - - -# --------------------------------------------------------------------------- # -# Helper to create a minimal EdgeRuntime # -# --------------------------------------------------------------------------- # - - -def _make_edge( - env: simpy.Environment, - *, - uniform_value: float, - sampler_value: float = 0.0, -) -> tuple[EdgeRuntime, DummyRNG, simpy.Store]: - rng = DummyRNG(uniform_value=uniform_value, sampler_value=sampler_value) - store: simpy.Store = simpy.Store(env) - - edge_cfg = Edge( - id="edge-1", - source="src", - target="dst", - latency=RVConfig(mean=0.0, distribution="uniform"), # value ignored in test - ) - - edge_rt = EdgeRuntime( - env=env, - edge_config=edge_cfg, - rng=cast("np.random.Generator", rng), - target_box=store, - ) - return edge_rt, rng, store - - -# --------------------------------------------------------------------------- # -# Tests # -# --------------------------------------------------------------------------- # - - -def test_edge_delivers_message_when_not_dropped() -> None: - """Message is delivered and latency sampler is called once.""" - env = simpy.Environment() - edge, rng, store = _make_edge(env, uniform_value=0.9, sampler_value=0.5) - - # prepare request state - state = RequestState(id=1, initial_time=0.0) - state.record_hop(SystemNodes.GENERATOR, env.now) - - edge.transport(state) - env.run() - - # exactly one message delivered - assert len(store.items) == 1 - delivered: RequestState = store.items[0] - assert delivered.hops[-1].node == "edge-1" # last hop is the edge id - assert rng.uniform_called is True - assert rng.sampler_called is True - - -def test_edge_drops_message_when_uniform_below_threshold( - monkeypatch: pytest.MonkeyPatch, -) -> None: - """Message is 
dropped and never placed in the target store.""" - # override global drop rate to deterministic 0.5 for the test - monkeypatch.setattr(NetworkParameters, "DROPOUT_RATE", 0.5, raising=False) - - env = simpy.Environment() - edge, rng, store = _make_edge(env, uniform_value=0.1) # below 0.5 ⇒ drop - - state = RequestState(id=2, initial_time=0.0) - state.record_hop(SystemNodes.GENERATOR, env.now) - - edge.transport(state) - env.run() - - assert len(store.items) == 0 # nothing delivered - assert state.hops[-1].node.endswith("dropped") - assert rng.uniform_called is True - # sampler must not be invoked when dropped - assert rng.sampler_called is False diff --git a/tests/unit/runtime/test_rqs_state.py b/tests/unit/runtime/test_rqs_state.py new file mode 100644 index 0000000..d07caa4 --- /dev/null +++ b/tests/unit/runtime/test_rqs_state.py @@ -0,0 +1,66 @@ +"""Unit-tests for :class:`RequestState` and :class:`Hop`.""" +from __future__ import annotations + +from app.config.constants import SystemEdges, SystemNodes +from app.runtime.rqs_state import Hop, RequestState + +# --------------------------------------------------------------------------- # +# Helpers # +# --------------------------------------------------------------------------- # + + +def _state() -> RequestState: +    """Return a fresh RequestState with id=42 and t0=0.0.""" +    return RequestState(id=42, initial_time=0.0) + + +def _hop( +    c_type: SystemNodes | SystemEdges, +    c_id: str, +    ts: float, +) -> Hop: +    """Shorthand to build a Hop literal in tests.""" +    return Hop(c_type, c_id, ts) + + +# --------------------------------------------------------------------------- # +# Tests # +# --------------------------------------------------------------------------- # + + +def test_record_hop_appends_tuple() -> None: +    """record_hop stores a :class:`Hop` instance with all three fields.""" +    st = _state() +    st.record_hop(SystemNodes.GENERATOR, "gen-1", now=1.23456) + +    expected = [_hop(SystemNodes.GENERATOR, "gen-1", 
1.23456)] +    assert st.history == expected +    assert isinstance(st.history[0], Hop) + + +def test_multiple_hops_preserve_global_order() -> None: +    """History keeps exact insertion order for successive hops.""" +    st = _state() +    st.record_hop(SystemNodes.GENERATOR, "gen-1", 0.1) +    st.record_hop(SystemEdges.NETWORK_CONNECTION, "edge-7", 0.2) +    st.record_hop(SystemNodes.SERVER, "api-A", 0.3) + +    expected: list[Hop] = [ +        _hop(SystemNodes.GENERATOR, "gen-1", 0.1), +        _hop(SystemEdges.NETWORK_CONNECTION, "edge-7", 0.2), +        _hop(SystemNodes.SERVER, "api-A", 0.3), +    ] +    assert st.history == expected + + +def test_latency_none_until_finish_time_set() -> None: +    """Latency is ``None`` if *finish_time* has not been assigned.""" +    st = _state() +    assert st.latency is None + + +def test_latency_returns_difference() -> None: +    """Latency equals ``finish_time - initial_time`` once closed.""" +    st = _state() +    st.finish_time = 5.5 +    assert st.latency == 5.5  # 5.5 - 0.0 diff --git a/tests/unit/sampler/test_gaussian_poisson.py b/tests/unit/samplers/test_gaussian_poisson.py similarity index 93% rename from tests/unit/sampler/test_gaussian_poisson.py rename to tests/unit/samplers/test_gaussian_poisson.py index c182376..b2b3c2e 100644 --- a/tests/unit/sampler/test_gaussian_poisson.py +++ b/tests/unit/samplers/test_gaussian_poisson.py @@ -10,11 +10,11 @@ from numpy.random import Generator, default_rng from app.config.constants import TimeDefaults -from app.core.event_samplers.gaussian_poisson import ( +from app.samplers.gaussian_poisson import ( gaussian_poisson_sampling, ) from app.schemas.random_variables_config import RVConfig -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput if TYPE_CHECKING: @@ -29,6 +29,7 @@ def rqs_cfg() -> RqsGeneratorInput: """Minimal, valid RqsGeneratorInput for Gaussian-Poisson tests.""" return RqsGeneratorInput( +        id="gen-1", avg_active_users=RVConfig( mean=10.0, variance=4.0, @@ 
-98,7 +99,7 @@ def fake_truncated_gaussian( return 0.0 # force U = 0 monkeypatch.setattr( - "app.core.event_samplers.gaussian_poisson.truncated_gaussian_generator", + "app.samplers.gaussian_poisson.truncated_gaussian_generator", fake_truncated_gaussian, ) diff --git a/tests/unit/sampler/test_poisson_poisson.py b/tests/unit/samplers/test_poisson_poisson.py similarity index 95% rename from tests/unit/sampler/test_poisson_poisson.py rename to tests/unit/samplers/test_poisson_poisson.py index 2fbbb9e..00242fa 100644 --- a/tests/unit/sampler/test_poisson_poisson.py +++ b/tests/unit/samplers/test_poisson_poisson.py @@ -11,9 +11,9 @@ from numpy.random import Generator, default_rng from app.config.constants import TimeDefaults -from app.core.event_samplers.poisson_poisson import poisson_poisson_sampling +from app.samplers.poisson_poisson import poisson_poisson_sampling from app.schemas.random_variables_config import RVConfig -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput if TYPE_CHECKING: @@ -24,6 +24,7 @@ def rqs_cfg() -> RqsGeneratorInput: """Return a minimal, valid RqsGeneratorInput for the sampler tests.""" return RqsGeneratorInput( + id="gen-1", avg_active_users={"mean": 1.0, "distribution": "poisson"}, avg_request_per_minute_per_user={"mean": 60.0, "distribution": "poisson"}, user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, @@ -96,6 +97,7 @@ def test_zero_users_produces_no_events( ) -> None: """If the mean user count is zero the generator must yield no events.""" cfg_zero = RqsGeneratorInput( + id="gen-1", avg_active_users=RVConfig(mean=0.0, distribution="poisson"), avg_request_per_minute_per_user=RVConfig(mean=60.0, distribution="poisson"), user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, diff --git a/tests/unit/sampler/test_sampler_helper.py b/tests/unit/samplers/test_sampler_helper.py similarity index 99% rename from tests/unit/sampler/test_sampler_helper.py rename to 
tests/unit/samplers/test_sampler_helper.py index 7d34990..7e615d0 100644 --- a/tests/unit/sampler/test_sampler_helper.py +++ b/tests/unit/samplers/test_sampler_helper.py @@ -9,7 +9,7 @@ import pytest from app.config.constants import Distribution -from app.core.event_samplers.common_helpers import ( +from app.samplers.common_helpers import ( exponential_variable_generator, general_sampler, lognormal_variable_generator, diff --git a/tests/unit/input_sructure/test_endpoint_input.py b/tests/unit/schemas/test_endpoint_input.py similarity index 98% rename from tests/unit/input_sructure/test_endpoint_input.py rename to tests/unit/schemas/test_endpoint_input.py index fc08b5e..76d22a2 100644 --- a/tests/unit/input_sructure/test_endpoint_input.py +++ b/tests/unit/schemas/test_endpoint_input.py @@ -11,7 +11,7 @@ EndpointStepRAM, StepOperation, ) -from app.schemas.system_topology_schema.endpoint_schema import Endpoint, Step +from app.schemas.system_topology.endpoint import Endpoint, Step # --------------------------------------------------------------------------- # diff --git a/tests/unit/input_sructure/test_full_topology_input.py b/tests/unit/schemas/test_full_topology_input.py similarity index 97% rename from tests/unit/input_sructure/test_full_topology_input.py rename to tests/unit/schemas/test_full_topology_input.py index 7f0d864..8382ee6 100644 --- a/tests/unit/input_sructure/test_full_topology_input.py +++ b/tests/unit/schemas/test_full_topology_input.py @@ -14,8 +14,8 @@ SystemNodes, ) from app.schemas.random_variables_config import RVConfig -from app.schemas.system_topology_schema.endpoint_schema import Endpoint, Step -from app.schemas.system_topology_schema.full_system_topology_schema import ( +from app.schemas.system_topology.endpoint import Endpoint, Step +from app.schemas.system_topology.full_system_topology import ( Client, Edge, Server, diff --git a/tests/unit/input_sructure/test_requests_generator_input.py b/tests/unit/schemas/test_requests_generator_input.py 
similarity index 95% rename from tests/unit/input_sructure/test_requests_generator_input.py rename to tests/unit/schemas/test_requests_generator_input.py index 1ca9562..f676f4c 100644 --- a/tests/unit/input_sructure/test_requests_generator_input.py +++ b/tests/unit/schemas/test_requests_generator_input.py @@ -6,7 +6,7 @@ from app.config.constants import Distribution, TimeDefaults from app.schemas.random_variables_config import RVConfig -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.rqs_generator_input import RqsGeneratorInput from app.schemas.simulation_settings_input import SimulationSettings # --------------------------------------------------------------------------- # @@ -52,22 +52,15 @@ def test_explicit_variance_is_preserved() -> None: def test_mean_must_be_numeric() -> None: """A non-numeric mean triggers a ValidationError.""" - with pytest.raises(ValidationError) as exc: + with pytest.raises(ValidationError): RVConfig(mean="not a number", distribution=Distribution.POISSON) - assert any(err["loc"] == ("mean",) for err in exc.value.errors()) - def test_missing_mean_field() -> None: """Omitting mean raises a 'field required' ValidationError.""" - with pytest.raises(ValidationError) as exc: + with pytest.raises(ValidationError): RVConfig.model_validate({"distribution": Distribution.NORMAL}) - assert any( - err["loc"] == ("mean",) and err["type"] == "missing" - for err in exc.value.errors() - ) - def test_default_distribution_is_poisson() -> None: """If distribution is missing, it defaults to 'poisson'.""" @@ -106,6 +99,7 @@ def _valid_normal_cfg(mean: float = 1.0) -> dict[str, float | str]: def test_default_user_sampling_window() -> None: """If user_sampling_window is missing it defaults to the constant.""" inp = RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), ) @@ -115,6 +109,7 @@ def test_default_user_sampling_window() -> None: def 
test_explicit_user_sampling_window_kept() -> None: """An explicit user_sampling_window is preserved.""" inp = RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), user_sampling_window=30, @@ -126,6 +121,7 @@ def test_user_sampling_window_not_int_raises() -> None: """A non-integer user_sampling_window raises ValidationError.""" with pytest.raises(ValidationError): RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), user_sampling_window="not-int", @@ -137,6 +133,7 @@ def test_user_sampling_window_above_max_raises() -> None: too_large = TimeDefaults.MAX_USER_SAMPLING_WINDOW + 1 with pytest.raises(ValidationError): RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), user_sampling_window=too_large, @@ -147,6 +144,7 @@ def test_avg_request_must_be_poisson() -> None: """avg_request_per_minute_per_user must be Poisson; Normal raises.""" with pytest.raises(ValidationError): RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_normal_cfg(), ) @@ -157,6 +155,7 @@ def test_avg_active_users_invalid_distribution_raises() -> None: bad_cfg = {"mean": 1.0, "distribution": Distribution.EXPONENTIAL} with pytest.raises(ValidationError): RqsGeneratorInput( + id="rqs-1", avg_active_users=bad_cfg, avg_request_per_minute_per_user=_valid_poisson_cfg(), ) @@ -165,6 +164,7 @@ def test_avg_active_users_invalid_distribution_raises() -> None: def test_valid_poisson_poisson_configuration() -> None: """Poisson-Poisson combo is accepted.""" cfg = RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_poisson_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), ) @@ -178,6 +178,7 @@ def test_valid_poisson_poisson_configuration() -> None: def test_valid_normal_poisson_configuration() -> None: """Normal-Poisson combo 
is accepted.""" cfg = RqsGeneratorInput( + id="rqs-1", avg_active_users=_valid_normal_cfg(), avg_request_per_minute_per_user=_valid_poisson_cfg(), ) diff --git a/tests/unit/test_state.py b/tests/unit/test_state.py deleted file mode 100644 index 4612f1b..0000000 --- a/tests/unit/test_state.py +++ /dev/null @@ -1,47 +0,0 @@ -"""Unit-tests for :class:`RequestState`.""" -from __future__ import annotations - -from app.config.rqs_state import RequestState - -# --------------------------------------------------------------------------- # -# Helpers # -# --------------------------------------------------------------------------- # - - -def _state() -> RequestState: - """Return a fresh RequestState with id=42 and t0=0.0.""" - return RequestState(id=42, initial_time=0.0) - - -# --------------------------------------------------------------------------- # -# Tests # -# --------------------------------------------------------------------------- # - - -def test_record_hop_appends_formatted_entry() -> None: - """Calling *record_hop* stores 'node@timestamp' with 3-dec precision.""" - st = _state() - st.record_hop("generator", now=1.23456) - assert st.history == ["generator@1.235"] # rounded to 3 decimals - - -def test_multiple_hops_preserve_order() -> None: - """History keeps insertion order for consecutive hops.""" - st = _state() - st.record_hop("A", 0.1) - st.record_hop("B", 0.2) - st.record_hop("C", 0.3) - assert st.history == ["A@0.100", "B@0.200", "C@0.300"] - - -def test_latency_none_until_finish_time_set() -> None: - """Latency is None if *finish_time* not assigned.""" - st = _state() - assert st.latency is None - - -def test_latency_returns_difference() -> None: - """Latency equals finish_time - initial_time once completed.""" - st = _state() - st.finish_time = 5.5 - assert st.latency == 5.5 # 5.5 - 0.0