diff --git a/documentation/backend_documentation/input_structure_for_the_simulation.md b/documentation/backend_documentation/input_structure_for_the_simulation.md
index e31c860..d28b090 100644
--- a/documentation/backend_documentation/input_structure_for_the_simulation.md
+++ b/documentation/backend_documentation/input_structure_for_the_simulation.md
@@ -1,404 +1,320 @@
-### **FastSim — Request-Generator Input Configuration**
+### **FastSim — Simulation Input Schema**
-A **single, self-consistent contract** links three layers of the codebase:
+The `SimulationPayload` is the single, self-contained contract that defines an entire simulation run. Its architecture follows one core philosophy: maximum control over input data through robust, upfront validation. To implement this, we lean on Pydantic's validation hooks and Python's `Enum` classes, which together yield a strictly typed, self-consistent schema guaranteeing that any configuration is validated *before* the simulation engine starts.
-1. **Global Constants** – `TimeDefaults`, `Distribution`
-2. **Random Variable Schema** – `RVConfig`
-3. **Traffic-Generator Payload** – `RqsGeneratorInput`
+This contract brings together three distinct but interconnected layers of configuration into one cohesive structure:
-Understanding how these layers interact is key to crafting valid and predictable traffic profiles, preventing common configuration errors before the simulation begins.
+1. **`rqs_input` (`RqsGeneratorInput`)**: Defines the **workload profile**—how many users are active and how frequently they generate requests.
+2. **`topology_graph` (`TopologyGraph`)**: Describes the **system's architecture**—its components, resources, and the network connections between them.
+3. **`sim_settings` (`SimulationSettings`)**: Configures **global simulation parameters**, such as total runtime and which metrics to collect.
+
+This layered design decouples the *what* (the system topology) from the *how* (the traffic pattern and simulation control), allowing for modular and reusable configurations. Every payload is rigorously parsed against this schema: the controlled vocabulary of `Enum`s and Pydantic's validators guarantee that malformed or logically inconsistent input is rejected upfront with clear, actionable errors, so the simulation engine operates only on valid data.
---
-### 1. Global Constants
+### **1. Component: Traffic Profile (`RqsGeneratorInput`)**
-| Constant Set | Purpose | Key Values |
-| :--- | :--- | :--- |
-| **`TimeDefaults`** (`IntEnum`) | Defines default values and validation bounds for time-based fields. | `SIMULATION_TIME = 3600 s`, `MIN_SIMULATION_TIME = 1800 s`, `USER_SAMPLING_WINDOW = 60 s`, `MIN_USER_SAMPLING_WINDOW = 1 s`, `MAX_USER_SAMPLING_WINDOW = 120 s` |
-| **`Distribution`** (`StrEnum`) | Defines the canonical names of probability distributions supported by the generator. | `"poisson"`, `"normal"`, `"log_normal"`, `"exponential"` |
+This component specifies the dynamic behavior of users interacting with the system. It is built upon a foundation of shared constants and a reusable, rigorously validated random variable schema. This design ensures that any traffic profile is not only structurally correct but also logically sound before the simulation begins.
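+As a concrete illustration of this fail-fast contract, a traffic profile can be checked in isolation before any engine code runs — a minimal sketch (the import path matches this repository; the literal values are only illustrative):
+
+```python
+from pydantic import ValidationError
+
+from app.schemas.requests_generator_input import RqsGeneratorInput
+
+# Nested dicts are coerced into validated RVConfig instances.
+profile = RqsGeneratorInput.model_validate({
+    "avg_active_users": {"mean": 100, "distribution": "poisson"},
+    "avg_request_per_minute_per_user": {"mean": 4.0, "distribution": "normal"},
+    "user_sampling_window": 45,
+})
+
+# A malformed profile never reaches the simulation engine.
+try:
+    RqsGeneratorInput.model_validate({
+        "avg_active_users": {"mean": "many"},            # not numeric -> rejected
+        "avg_request_per_minute_per_user": {"mean": 4.0},
+        "user_sampling_window": 400,                     # above the 120 s upper bound
+    })
+except ValidationError as exc:
+    print(exc)  # explicit, actionable errors for every offending field
+```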
-***Why use constants?*** +#### **Global Constants** -* **Consistency:** They are referenced by validators; changing a value in one place updates the entire validation tree. -* **Safety:** They guarantee that a typo, such as `"Poisson"`, raises an error instead of silently failing or switching to an unintended default. +These enums provide a single source of truth for validation and default values, eliminating "magic strings" and ensuring consistency. + +| Constant Set | Purpose | Key Values | +| :--- | :--- | :--- | +| **`TimeDefaults`** (`IntEnum`) | Defines default values and validation bounds for time-based fields. | `USER_SAMPLING_WINDOW = 60`, `MIN_USER_SAMPLING_WINDOW = 1`, `MAX_USER_SAMPLING_WINDOW = 120` | +| **`Distribution`** (`StrEnum`) | Defines the canonical names of supported probability distributions. | `"poisson"`, `"normal"`, `"log_normal"`, `"exponential"` | --- -### 2. Random Variable Schema (`RVConfig`) +#### **Random Variable Schema (`RVConfig`)** + +At the core of the traffic generator is the `RVConfig`, a schema for defining stochastic variables. This allows critical parameters like user population and request rates to be modeled not as fixed numbers, but as draws from a probability distribution. Pydantic validators are used extensively to enforce correctness. ```python class RVConfig(BaseModel): """class to configure random variables""" - mean: float distribution: Distribution = Distribution.POISSON variance: float | None = None @field_validator("mean", mode="before") - def ensure_mean_is_numeric( - cls, # noqa: N805 - v: object, - ) -> float: - """Ensure `mean` is numeric, then coerce to float.""" - err_msg = "mean must be a number (int or float)" - if not isinstance(v, (float, int)): - raise ValueError(err_msg) # noqa: TRY004 - return float(v) - - @model_validator(mode="after") # type: ignore[arg-type] - def default_variance(cls, model: "RVConfig") -> "RVConfig": # noqa: N805 - """Set variance = mean when distribution == 'normal' and variance is missing.""" - if model.variance is None and model.distribution == Distribution.NORMAL: - model.variance = model.mean - return model + def ensure_mean_is_numeric(cls, v: object) -> float: + # ... implementation ... + @model_validator(mode="after") + def default_variance(cls, model: "RVConfig") -> "RVConfig": + # ... implementation ... ``` -#### Validation Logic - -| Check | Pydantic Hook | Rule | -| :--- | :--- | :--- | -| *Mean must be numeric* | `@field_validator("mean", before)` | Rejects strings and nulls; coerces `int` to `float`. | -| *Autofill variance* | `@model_validator(after)` | If `distribution == "normal"` **and** `variance` is not provided, sets `variance = mean`. | -| *Positivity enforcement* | `PositiveFloat` / `PositiveInt` | Pydantic's constrained types are used on fields like `mean` where negative values are invalid, rejecting them before business logic runs. | - -> **Self-Consistency:** Every random draw in the simulation engine relies on a validated `RVConfig` instance. This avoids redundant checks and defensive code downstream. +##### **Built-in Validation Logic** ---- +Pydantic's validation system is leveraged to enforce several layers of correctness directly within the schema: -### 3. Traffic-Generator Payload (`RqsGeneratorInput`) - -| Field | Type | Validation Tied to Constants | +| Check | Pydantic Hook | Rule & Rationale | | :--- | :--- | :--- | -| `avg_active_users` | `RVConfig` | No extra constraints needed; the inner schema guarantees correctness. 
| -| `avg_request_per_minute_per_user` | `RVConfig` | Same as above. | -| `total_simulation_time` | `int` | `ge=TimeDefaults.MIN_SIMULATION_TIME`
default=`TimeDefaults.SIMULATION_TIME` | -| `user_sampling_window` | `int` | `ge=TimeDefaults.MIN_USER_SAMPLING_WINDOW`
`le=TimeDefaults.MAX_USER_SAMPLING_WINDOW`
default=`TimeDefaults.USER_SAMPLING_WINDOW` | - -#### How the Generator Uses Each Field - -The simulation evolves based on a simple, powerful loop: - -1. **Timeline Partitioning** (`user_sampling_window`): The simulation timeline is divided into fixed-length windows. For each window: -2. **Active User Sampling** (`avg_active_users`): A single value is drawn to determine the concurrent user population, `U(t)`, for that window. -3. **Request Rate Calculation** (`avg_request_per_minute_per_user`): Each of the `U(t)` users contributes to the total request rate, yielding an aggregate load for the window. -4. **Termination** (`total_simulation_time`): The loop stops once the cumulative simulated time reaches this value. - -Because every numeric input is range-checked upfront, **the runtime engine never needs to defend itself** against invalid data like zero-length windows or negative rates, making the event-loop lean and predictable. +| **Numeric `mean` Enforcement** | `@field_validator("mean", mode="before")` | This validator intercepts the `mean` field *before* any type casting. It ensures the provided value is an `int` or `float`, raising an explicit `ValueError` for invalid types like strings (`"100"`) or nulls. This prevents common configuration errors and guarantees a valid numeric type for all downstream logic. | +| **Valid `distribution` Name** | `Distribution` (`StrEnum`) type hint | By type-hinting the `distribution` field with the `Distribution` enum, Pydantic automatically ensures that its value must be one of the predefined members (e.g., `"poisson"`, `"normal"`). Any typo or unsupported value (like `"Poisson"` with a capital 'P') results in an immediate validation error. | +| **Intelligent `variance` Defaulting** | `@model_validator(mode="after")` | This powerful validator runs *after* all individual fields have been validated. It enforces a crucial business rule: if `distribution` is `"normal"` **and** `variance` is not provided, the schema automatically sets `variance = mean`. This provides a safe, logical default and simplifies configuration for the user, while ensuring the model is always self-consistent. | --- -### 4. End-to-End Example (Fully Explicit) - -```json -{ - "avg_active_users": { - "mean": 100, - "distribution": "poisson" - }, - "avg_request_per_minute_per_user": { - "mean": 4.0, - "distribution": "normal", - "variance": null - }, - "total_simulation_time": 5400, - "user_sampling_window": 45 -}``` - -#### What the Validators Do +#### **Payload Structure (`RqsGeneratorInput`)** -1. `mean` is numeric ✔️ -2. `distribution` string matches an enum member ✔️ -3. `total_simulation_time` ≥ 1800 ✔️ -4. `user_sampling_window` is in the range ✔️ -5. `variance` is `null` with a `normal` distribution ⇒ **auto-set to 4.0** ✔️ +This is the main payload for configuring the traffic workload. It composes the `RVConfig` schema and adds its own validation rules. -The payload is accepted. The simulator will run for $5400 / 45 = 120$ simulation windows. - ---- +| Field | Type | Validation & Purpose | +| :--- | :--- | :--- | +| `avg_active_users` | `RVConfig` | A random variable defining concurrent users. **Inherits all `RVConfig` validation**, ensuring its `mean`, `distribution`, and `variance` are valid. | +| `avg_request_per_minute_per_user` | `RVConfig` | A random variable for the user request rate. Also **inherits all `RVConfig` validation**. | +| `user_sampling_window` | `int` | The time duration (in seconds) for which the number of active users is held constant. 
Its value is **strictly bounded** by Pydantic's `Field` to be between `MIN_USER_SAMPLING_WINDOW` (1) and `MAX_USER_SAMPLING_WINDOW` (120). |
-### 5. Common Error Example
+##### **How the Generator Uses Each Field**
-```json
-{
-  "avg_active_users": { "mean": "many" },
-  "avg_request_per_minute_per_user": { "mean": -2 },
-  "total_simulation_time": 600,
-  "user_sampling_window": 400
-}
-```
+The simulation evolves based on this robustly validated input:
-| # | Fails On | Error Message (Abridged) |
-| :- | :--- | :--- |
-| 1 | Numeric check | `Input should be a valid number` |
-| 2 | Positivity check | `Input should be greater than 0` |
-| 3 | Minimum time check | `Input should be at least 1800` |
-| 4 | Maximum window check | `Input should be at most 120` |
+1. The timeline is divided into windows of `user_sampling_window` seconds. Because this value is range-checked upfront by Pydantic, the simulation is protected from invalid configurations like zero-length or excessively long windows.
+2. At the start of each window, a number of active users, `U(t)`, is drawn from the `avg_active_users` distribution. The embedded `RVConfig` guarantees this distribution is well-defined.
+3. Each of the `U(t)` users generates requests according to a rate drawn from `avg_request_per_minute_per_user`.
---
+Because every numeric input is type-checked and range-checked by Pydantic before the simulation begins, **the runtime engine never needs to defend itself** against invalid data. This makes the core simulation loop leaner, more predictable, and free from redundant error-handling logic.
-### Takeaways
+### **2. Component: System Blueprint (`TopologyGraph`)**
-* **Single Source of Truth:** Enums centralize all literal values, eliminating magic strings.
-* **Layered Validation:** The `Constants → RVConfig → Request Payload` hierarchy ensures that only well-formed traffic profiles reach the simulation engine.
-* **Safe Defaults:** Omitting optional fields never leads to undefined behavior; defaults are sourced directly from the `TimeDefaults` constants.
+The topology schema is the static blueprint of the digital twin you wish to simulate. It describes the system's components, their resources, their behavior, and how they are interconnected. To ensure simulation integrity, FastSim uses this schema to rigorously validate the entire system description upfront, rejecting any inconsistencies before the simulation begins.
-This robust, layered approach allows you to configure the generator with confidence, knowing that any malformed scenario will be rejected early with explicit, actionable error messages.
+---
-### **FastSim Topology Input Schema**
-The topology schema is the blueprint of the digital twin, defining the structure, resources, behavior, and network connections of the system you wish to simulate. It describes:
-1. **What work** each request performs (`Endpoint` → `Step`).
-2. 
**What components** exist in the system (`Server`, `Client`). -3. **Which resources** each component possesses (`ServerResources`). -4. **How** components are interconnected (`Edge`). +#### **Design Philosophy: A "Micro-to-Macro" Approach** -To ensure simulation integrity and prevent runtime errors, FastSim uses Pydantic to rigorously validate the entire topology upfront. Every inconsistency is rejected at load-time. The following sections detail the schema's layered design, from the most granular operation to the complete system graph. +The schema is built on a compositional, "micro-to-macro" principle. We start by defining the smallest indivisible units of work (`Step`) and progressively assemble them into larger, more complex structures (`Endpoint`, `Server`, and finally the `TopologyGraph`). ---- -### **A Controlled Vocabulary: The Role of Constants** +This layered approach provides several key advantages that enhance the convenience and reliability of crafting simulations: -To ensure that input configurations are unambiguous and robust, the topology schema is built upon a controlled vocabulary defined by a series of Python `Enum` classes. Instead of relying on raw strings or "magic values" (e.g., `"cpu_bound_operation"`), which are prone to typos and inconsistencies, the schema uses these enumerations to define the finite set of legal values for categories like operation kinds, metrics, and node types. +* **Modularity and Reusability:** Core operations are defined once as `Steps` and can be reused across multiple `Endpoints`. This modularity simplifies configuration, as complex workflows can be built from a library of simple, well-defined blocks. +* **Local Reasoning, Global Safety:** Each model is responsible for its own internal consistency (e.g., a `Step` ensures its metric is valid for its kind). Parent models then enforce the integrity of the connections *between* these components (e.g., the `TopologyGraph` ensures all `Edges` connect to valid `Nodes`). This allows you to focus on one part of the configuration at a time, confident that the overall structure will be validated globally. +* **Clarity and Maintainability:** The hierarchy is intuitive and mirrors how developers conceptualize system architecture. It is clear how atomic operations roll up into endpoints, which are hosted on servers connected by a network. This makes configuration files easy to read, write, and maintain over time. +* **Guaranteed Robustness:** By catching all structural and referential errors before the simulation begins, this approach embodies the "fail-fast" principle. It guarantees that the SimPy engine operates on a valid, self-consistent model, eliminating a whole class of potential runtime bugs. -This design choice provides three critical benefits: +#### **A Controlled Vocabulary: Topology Constants** -1. **Strong Type-Safety:** By using `StrEnum` and `IntEnum`, Pydantic models can validate input payloads with absolute certainty. Any value not explicitly defined in the corresponding `Enum` is immediately rejected. This prevents subtle configuration errors that would be difficult to debug at simulation time. -2. **Developer Experience and Error Prevention:** This approach provides powerful auto-completion and static analysis. IDEs, `mypy`, and linters can catch invalid values during development, providing immediate feedback long before the code is executed. -3. **Single Source of Truth:** All valid categories are centralized in the `app.config.constants` module. 
This makes the system easier to maintain and extend. To add a new resource type or metric, a developer only needs to update the `Enum` definition, and the change propagates consistently to validation logic, the simulation engine, and any other component that uses it. +The schema's robustness is founded on a controlled vocabulary defined by Python `Enum` classes. Instead of error-prone "magic strings" (e.g., `"cpu_operation"`), the schema uses these enums to define the finite set of legal values for categories like operation kinds, metrics, and node types. This design choice is critical for several reasons: -The key enumerations that govern the topology schema include: +* **Absolute Type-Safety:** Pydantic can validate input with certainty. Any value not explicitly defined in the corresponding `Enum` is immediately rejected, preventing subtle typos or incorrect values from causing difficult-to-debug runtime failures. +* **Enhanced Developer Experience:** IDEs and static analysis tools like `mypy` can provide auto-completion and catch invalid values during development, offering immediate feedback long before the simulation is run. +* **Single Source of Truth:** All valid categories are centralized. To add a new resource type or metric, a developer only needs to update the `Enum` definition, and the change propagates consistently throughout the validation logic. | Constant Enum | Purpose | | :--- | :--- | -| **`EndpointStepIO`, `EndpointStepCPU`, `EndpointStepRAM`** | Define the exhaustive list of valid `kind` values for a `Step`. | -| **`Metrics`** | Specify the legal dictionary keys within a `Step`'s `step_metrics`, enforcing the one-to-one link between a `kind` and its metric. | -| **`SystemNodes` and `SystemEdges`** | Enumerate the allowed categories for nodes and their connections in the high-level `TopologyGraph`. | - -### **Design Philosophy: A "Micro-to-Macro" Approach** - -The schema is built on a compositional, "micro-to-macro" principle. We start by defining the smallest indivisible units of work (`Step`) and progressively assemble them into larger, more complex structures (`Endpoint`, `Server`, and finally the `TopologyGraph`). - -This layered approach provides several key advantages: -* **Modularity and Reusability:** An `Endpoint` is just a sequence of `Steps`. You can reorder, add, or remove steps without redefining the core operations themselves. -* **Local Reasoning, Global Safety:** Each model is responsible for its own internal consistency (e.g., a `Step` ensures its metric is valid for its kind). Parent models then enforce the integrity of the connections *between* these components (e.g., the `TopologyGraph` ensures all `Edges` connect to valid `Nodes`). -* **Clarity and Maintainability:** The hierarchy makes the system description intuitive to read and write. It’s clear how atomic operations roll up into endpoints, which are hosted on servers connected by a network. -* **Robustness:** All structural and referential errors are caught before the simulation begins, guaranteeing that the SimPy engine operates on a valid, self-consistent model. +| **`EndpointStepIO`, `EndpointStepCPU`, `EndpointStepRAM`** | Defines the exhaustive list of valid `kind` values for a `Step`. | +| **`Metrics`** | Specifies the legal dictionary keys within a `Step`'s `step_metrics`. | +| **`SystemNodes`** | Enumerate the allowed `type` for nodes (e.g., `"server"`, `"client"`). | +| **`SystemEdges`** | Enumerate the allowed categories for connections between nodes. | --- -### **1. 
The Atomic Unit: `Step`** +### **Schema Hierarchy and In-Depth Validation** -A `Step` represents a single, indivisible operation executed by an asynchronous coroutine within an endpoint. It is the fundamental building block of all work in the simulation. +Here we break down each component of the topology, highlighting the specific Pydantic validators that enforce its correctness and the deep rationale behind these choices. -Each `Step` has a `kind` (the category of work) and `step_metrics` (the resources it consumes). +#### **1. `Step`**: The Atomic Unit of Work +A `Step` represents a single, indivisible operation. Its validation is the cornerstone of ensuring that all work performed in the simulation is logical and well-defined. -```python -class Step(BaseModel): - """ - A single, indivisible operation. - It must be quantified by exactly ONE metric. - """ - kind: EndpointStepIO | EndpointStepCPU | EndpointStepRAM - step_metrics: dict[Metrics, PositiveFloat | PositiveInt] +| Validation Check | Pydantic Hook | Rule & Rationale | +| :--- | :--- | :--- | +| **Coherence of `kind` and `metric`** | `@model_validator` | **Rule:** The `step_metrics` dictionary must contain *exactly one* entry, and its key must be the correct metric for the `Step`'s `kind`.
**Rationale:** This is the most critical validation on a `Step`. The one-to-one mapping is a deliberate design choice for simplicity and robustness. It allows the simulation engine to be deterministic: a `cpu_bound_operation` step is routed to the CPU resource, an `io_wait` step to an I/O event, etc. This avoids the immense complexity of modeling operations that simultaneously contend for multiple resource types (e.g., CPU and RAM). This validator enforces that clear, unambiguous contract, preventing illogical pairings like a RAM allocation step being measured in `cpu_time`. | +| **Positive Metric Values** | `PositiveFloat` / `PositiveInt` | **Rule:** All numeric values in `step_metrics` must be greater than zero.
**Rationale:** It is physically impossible to spend negative or zero time on an operation or allocate negative RAM. This validation uses Pydantic's constrained types to offload this fundamental sanity check, ensuring that only plausible, positive resource requests enter the system and keeping the core simulation logic free of defensive checks against nonsensical data. | - @model_validator(mode="after") - def ensure_coherence_kind_metrics(cls, model: "Step") -> "Step": - metrics_keys = set(model.step_metrics) +#### **2. `Endpoint`**: Composing Workflows +An `Endpoint` defines a complete, user-facing operation (e.g., an API call like `/predict`) as an ordered sequence of `Steps`. - # Enforce that a step performs one and only one type of work. - if len(metrics_keys) != 1: - raise ValueError("step_metrics must contain exactly one entry") +| Validation Check | Pydantic Hook | Rule & Rationale | +| :--- | :--- | :--- | +| **Consistent Naming** | `@field_validator("endpoint_name")` | **Rule:** Automatically converts the `endpoint_name` to lowercase.
**Rationale:** This enforces a canonical representation for all endpoint identifiers. It eliminates ambiguity and potential bugs that could arise from inconsistent capitalization (e.g., treating `/predict` and `/Predict` as different endpoints). This simple normalization makes the configuration more robust and simplifies endpoint lookups within the simulation engine. | - # Enforce that the metric is appropriate for the kind of work. - if isinstance(model.kind, EndpointStepCPU): - if metrics_keys != {Metrics.CPU_TIME}: - raise ValueError(f"CPU step requires metric '{Metrics.CPU_TIME}'") +#### **3. System Nodes**: `Server` & `Client` +These models define the macro-components of your architecture where work is performed and resources are located. - elif isinstance(model.kind, EndpointStepRAM): - if metrics_keys != {Metrics.NECESSARY_RAM}: - raise ValueError(f"RAM step requires metric '{Metrics.NECESSARY_RAM}'") +| Validation Check | Pydantic Hook | Rule & Rationale | +| :--- | :--- | :--- | +| **Standardized Node `type`** | `@field_validator("type")` | **Rule:** The `type` field must strictly match the expected `SystemNodes` enum member (e.g., a `Server` object must have `type: "server"`).
**Rationale:** This provides a "belt-and-suspenders" check. Even if a default is provided, this validation prevents a user from explicitly overriding a node's type to a conflicting value. It enforces a strict contract: a `Server` object is always and only a server. This prevents object state confusion and simplifies pattern matching in the simulation engine. | +| **Unique Node IDs** | `@model_validator` in `TopologyNodes` | **Rule:** All `id` fields across all `Server` nodes and the `Client` node must be unique.
**Rationale:** This is fundamental to creating a valid graph. Node IDs are the primary keys used to address components. If two nodes shared the same ID, any `Edge` pointing to that ID would be ambiguous. This global validator prevents such ambiguity, guaranteeing that every node in the system is uniquely identifiable, which is a precondition for the final referential integrity check. | - elif isinstance(model.kind, EndpointStepIO): - if metrics_keys != {Metrics.IO_WAITING_TIME}: - raise ValueError(f"I/O step requires metric '{Metrics.IO_WAITING_TIME}'") +#### **4. `Edge`**: Connecting the Components +An `Edge` represents a directed network link between two nodes, defining how requests flow through the system. - return model -``` +| Validation Check | Pydantic Hook | Rule & Rationale | +| :--- | :--- | :--- | +| **No Self-Loops** | `@model_validator` | **Rule:** An edge's `source` ID cannot be the same as its `target` ID.
**Rationale:** In the context of a distributed system topology, a network call from a service to itself is a logical anti-pattern. Such an operation would typically be modeled as an internal process (i.e., another `Step`), not a network hop. This validator prevents this common configuration error and simplifies the routing logic by disallowing trivial cycles. | -> **Design Rationale:** The strict one-to-one mapping between a `Step` and a single metric is a core design choice. It simplifies the simulation engine immensely, as each `Step` can be deterministically routed to a request on a single SimPy resource (a CPU queue, a RAM container, or an I/O event). This avoids the complexity of modeling operations that simultaneously consume multiple resource types. +#### **5. `TopologyGraph`**: The Complete System +This is the root model that aggregates all `nodes` and `edges` and performs the final, most critical validation: ensuring referential integrity. ---- +| Validation Check | Pydantic Hook | Rule & Rationale | +| :--- | :--- | :--- | +| **Referential Integrity** | `@model_validator` | **Rule:** Every `edge.source` and `edge.target` ID must correspond to an actual node ID defined in `TopologyNodes`.
**Rationale:** This is the capstone validation that guarantees the structural integrity of the entire system graph. It prevents "dangling edges"—connections that point to non-existent nodes. Without this check, the simulation could start with a broken topology and crash unexpectedly at runtime when a request attempts to traverse a broken link. By performing this check *after* all nodes and edges have been parsed, we ensure that the system described is a complete and validly connected graph, fully embodying the "fail-fast" principle. |
+### **3. Component: Global Simulation Control (`SimulationSettings`)**
+
+This final component configures the simulation's execution parameters and, critically, determines what data is collected. It acts as the master control panel for the simulation run, governing both its duration and the scope of its output.
+
+#### **Payload Structure (`SimulationSettings`)**
+
+```python
+class SimulationSettings(BaseModel):
+    """Global parameters that apply to the whole run."""
+    total_simulation_time: int = Field(...)
+    enabled_sample_metrics: set[SampledMetricName] = Field(
+        default_factory=lambda: {
+            SampledMetricName.READY_QUEUE_LEN,
+            SampledMetricName.CORE_BUSY,
+            SampledMetricName.RAM_IN_USE,
+        },
+        description="Which time-series KPIs to collect by default.",
+    )
+    enabled_event_metrics: set[EventMetricName] = Field(
+        default_factory=lambda: {
+            EventMetricName.RQS_LATENCY,
+        },
+        description="Which per-event KPIs to collect by default.",
+    )
+```
+
+| Field | Type | Purpose & Validation |
+| :--- | :--- | :--- |
+| `total_simulation_time` | `int` | The total simulation horizon in seconds. Must be `>= MIN_SIMULATION_TIME` (1800 s). Defaults to `3600`. |
+| `enabled_sample_metrics` | `set[SampledMetricName]` | A set of metrics to be sampled at fixed intervals, creating a time series (e.g., `"ready_queue_len"`, `"ram_in_use"`). |
+| `enabled_event_metrics` | `set[EventMetricName]` | A set of metrics recorded only when specific events occur, with no time series (e.g., `"rqs_latency"`, `"llm_cost"`). |
+
+Standard default values are supplied for both metric sets, so either field can safely be omitted from the payload.
---
+#### **Design Rationale: Pre-validated, On-Demand Metrics for Robust and Efficient Collection**
+
+The design of the `sim_settings` component, particularly the `enabled_*_metrics` fields, is centered on two core principles: **user-driven selectivity** and **ironclad validation**. The rationale behind this approach is to create a system that is both flexible and fundamentally reliable.
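+Both principles can be exercised directly against the schema before the subsections below unpack them — a short sketch (import path as in this diff; the typo is deliberate):
+
+```python
+from pydantic import ValidationError
+
+from app.schemas.simulation_settings_input import SimulationSettings
+
+# Omitting the metric sets is safe: the validated defaults are applied.
+settings = SimulationSettings(total_simulation_time=3600)
+print(settings.enabled_sample_metrics)  # ready_queue_len, core_busy, ram_in_use
+
+# A misspelled metric name is rejected at parse time, before the engine starts.
+try:
+    SimulationSettings.model_validate({
+        "total_simulation_time": 3600,
+        "enabled_event_metrics": ["request_latncy"],  # typo for "rqs_latency"
+    })
+except ValidationError as exc:
+    print(exc)
+```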
-#### **`ServerResources` and `Server`** -A `Server` node hosts endpoints and owns a set of physical resources. These resources are mapped directly to specific SimPy primitives, which govern how requests queue and contend for service. +##### **1. The Principle of User-Driven Selectivity** -```python -class ServerResources(BaseModel): - """Quantifiable resources available on a server node.""" - cpu_cores: PositiveInt = Field(ge=ServerResourcesDefaults.MINIMUM_CPU_CORES) - ram_mb: PositiveInt = Field(ge=ServerResourcesDefaults.MINIMUM_RAM_MB) - db_connection_pool: PositiveInt | None = None - -class Server(BaseModel): - """A node that hosts endpoints and owns resources.""" - id: str - type: SystemNodes = SystemNodes.SERVER - server_resources: ServerResources - endpoints: list[Endpoint] -``` +We recognize that data collection is not free; it incurs performance overhead in terms of both memory (to store the data) and CPU cycles (to record it). Not every simulation requires every possible metric. For instance: +* A simulation focused on CPU contention may not need detailed LLM cost tracking. +* A high-level analysis of end-to-end latency might not require fine-grained data on event loop queue lengths. -> **Design Rationale: Mapping to SimPy Primitives** -> * `cpu_cores` maps to a `simpy.Resource`. This models a classic semaphore where only `N` processes can execute concurrently, and others must wait in a queue. It perfectly represents CPU-bound tasks competing for a limited number of cores. -> * `ram_mb` maps to a `simpy.Container`. A container models a divisible resource where processes can request and return variable amounts. This is ideal for memory, as multiple requests can simultaneously hold different amounts of RAM without exclusively locking the entire memory pool. +By allowing the user to explicitly select only the metrics they need, we empower them to tailor the simulation to their specific analytical goals. This on-demand approach makes the simulator more efficient and versatile, avoiding the waste of collecting and processing irrelevant data. -#### **`Client`** -The `Client` is a special, resource-less node that serves as the origin point for all requests generated during the simulation. +##### **2. The Power of Ironclad, Upfront Validation** -#### **Node Aggregation and Validation (`TopologyNodes`)** -All `Server` and `Client` nodes are collected in the `TopologyNodes` model, which performs a critical validation check: ensuring all component IDs are unique across the entire system. +This is where the design choice becomes critical for robustness. Simply allowing users to provide a list of strings is inherently risky due to potential typos or misunderstandings of metric names. Our schema mitigates this risk entirely through a strict, upfront validation contract. ---- +* **A Strict Contract via Enums:** The `enabled_sample_metrics` and `enabled_event_metrics` fields are not just sets of strings; they are sets of `SampledMetricName` and `EventMetricName` enum members. When Pydantic parses the input payload, it validates every single metric name provided by the user against these canonical `Enum` definitions. -### **4. Connecting the Components: `Edge`** +* **Immediate Rejection of Invalid Input:** If a user provides a metric name that is not a valid member of the corresponding enum (e.g., a typo like `"request_latncy"` or a misunderstanding like `"cpu_usage"` instead of `"core_busy"`), Pydantic immediately rejects the entire payload with a clear `ValidationError`. 
This happens *before* a single line of the simulation engine code is executed. -An `Edge` represents a directed network link between two nodes, defining how requests flow through the system. +##### **3. The Benefit: Guaranteed Runtime Integrity** -```python -class Edge(BaseModel): - """A directed connection in the topology graph.""" - source: str - target: str - latency: RVConfig - probability: float = Field(1.0, ge=0.0, le=1.0) - edge_type: SystemEdges = SystemEdges.NETWORK_CONNECTION -``` +This pre-validation provides a crucial and powerful guarantee to the simulation engine, leading to a safer and more efficient runtime: -> **Design Rationale:** -> * **Stochastic Latency:** Latency is not a fixed number but an `RVConfig` object. This allows you to model realistic network conditions using various probability distributions (e.g., log-normal for internet RTTs, exponential for failure retries), making the simulation far more accurate. -> * **Probabilistic Routing:** The `probability` field enables modeling of simple load balancing or A/B testing scenarios where traffic from a single `source` can be split across multiple `target` nodes. +* **Safe, Error-Free Initialization:** At the very beginning of the simulation, the engine receives the *validated* set of metric names. It knows with absolute certainty the complete and exact set of metrics it needs to track. This allows it to safely initialize all necessary data collection structures (like dictionaries) at the start of the run. For example: + ```python + # This is safe because every key is guaranteed to be valid. + event_results = {metric_name: [] for metric_name in settings.enabled_event_metrics} + ``` ---- - -### **5. The Complete System: `TopologyGraph`** - -The `TopologyGraph` is the root of the configuration. It aggregates all `nodes` and `edges` and performs the final, most critical validation: ensuring referential integrity. - -```python -class TopologyGraph(BaseModel): - """The complete system definition, uniting all nodes and edges.""" - nodes: TopologyNodes - edges: list[Edge] - - @model_validator(mode="after") - def edge_refs_valid(cls, model: "TopologyGraph") -> "TopologyGraph": - """Ensure every edge connects two valid, existing nodes.""" - valid_ids = {s.id for s in model.nodes.servers} | {model.nodes.client.id} - for e in model.edges: - if e.source not in valid_ids or e.target not in valid_ids: - raise ValueError(f"Edge '{e.source}->{e.target}' references an unknown node.") - return model -``` -> **Design Rationale:** This final check guarantees that the topology is a valid, connected graph. By confirming that every `edge.source` and `edge.target` corresponds to a defined node `id`, it prevents the simulation from starting with a broken or nonsensical configuration, embodying the "fail-fast" principle. +* **Elimination of Runtime KeyErrors:** Because all dictionary keys are guaranteed to exist from the start, the core data collection logic within the simulation's tight event loop becomes incredibly lean and robust. The engine never needs to perform defensive, conditional checks like `if metric_name in event_results: ...`. It can directly and safely access the key: `event_results[metric_name].append(value)`. This completely eliminates an entire class of potential `KeyError` exceptions, which are notoriously difficult to debug in complex, asynchronous simulations. +In summary, the design of `SimulationSettings` is a perfect example of the "fail-fast" philosophy. 
By forcing a clear and validated contract with the user upfront, we ensure that the data collection process is not only tailored and efficient but also fundamentally reliable. The engine operates with the confidence that the output data structures will perfectly and safely match the user's validated request, leading to a predictable and robust simulation from start to finish. --- -### **End-to-End Example** +### **End-to-End Example (`SimulationPayload`)** -Here is a minimal, complete JSON configuration that defines a single client and a single API server. +The following JSON object shows how these three components combine into a single, complete `SimulationPayload` for a minimal client-server setup. ```jsonc { - "nodes": { - // The client node is the source of all generated requests. - "client": { - "id": "user_browser", - "type": "client" + // Defines the traffic workload profile. + "rqs_input": { + "avg_active_users": { + "mean": 50, + "distribution": "poisson" + }, + "avg_request_per_minute_per_user": { + "mean": 5.0, + "distribution": "normal", + "variance": 1.0 }, - // A list of all server nodes in the system. - "servers": [ + "user_sampling_window": 60 + }, + // Describes the system's architectural blueprint. + "topology_graph": { + "nodes": { + "client": { + "id": "mobile_client", + "type": "client" + }, + "servers": [ + { + "id": "api_server", + "type": "server", + "server_resources": { + "cpu_cores": 4, + "ram_mb": 4096 + }, + "endpoints": [ + { + "endpoint_name": "/predict", + "steps": [ + { + "kind": "initial_parsing", + "step_metrics": { "cpu_time": 0.005 } + }, + { + "kind": "io_db", + "step_metrics": { "io_waiting_time": 0.050 } + } + ] + } + ] + } + ] + }, + "edges": [ { - "id": "api_server_node", - "type": "server", - "server_resources": { - "cpu_cores": 2, - "ram_mb": 2048 - }, - "endpoints": [ - { - "endpoint_name": "/predict", - "steps": [ - { - "kind": "initial_parsing", - "step_metrics": { "cpu_time": 0.005 } - }, - { - "kind": "io_db", - "step_metrics": { "io_waiting_time": 0.050 } - }, - { - "kind": "cpu_bound_operation", - "step_metrics": { "cpu_time": 0.015 } - } - ] - } - ] + "source": "mobile_client", + "target": "api_server", + "latency": { + "distribution": "log_normal", + "mean": 0.04, + "variance": 0.01 + } } ] }, - "edges": [ - // A network link from the client to the API server. - { - "source": "user_browser", - "target": "api_server_node", - "latency": { - "distribution": "log_normal", - "mean": 0.05, - "std_dev": 0.01 - }, - "probability": 1.0 - } - ] -}``` - - + // Configures the simulation run and metric collection. + "settings": { + "total_simulation_time": 3600, + "enabled_sample_metrics": [ + "ready_queue_len", + "ram_in_use", + "throughput_rps" + ], + "enabled_event_metrics": [ + "rqs_latency" + ] + } +} +``` +### **Key Takeaways** -> **YAML friendly:** -> The topology schema is 100 % agnostic to the wire format. -> You can encode the same structure in **YAML** with identical field -> names and value types—Pydantic will parse either JSON or YAML as long -> as the keys and data types respect the schema. -> No additional changes or converters are required. -``` +* **Single Source of Truth**: `Enum` classes centralize all valid string literals, providing robust, type-safe validation across the entire schema. +* **Layered Validation**: The `Constants → Component Schemas → SimulationPayload` hierarchy ensures that only well-formed and self-consistent configurations reach the simulation engine. 
+* **Separation of Concerns**: The three top-level keys (`rqs_input`, `topology_graph`, `sim_settings`) clearly separate the workload, the system architecture, and simulation control, making configurations easier to read, write, and reuse.
-### **Key Takeaway**
-This rigorously validated, compositional schema is the foundation of FastSim's reliability. By defining a clear vocabulary of constants (`Metrics`, `SystemNodes`) and enforcing relationships with Pydantic validators, the schema guarantees that every simulation run starts from a **complete and self-consistent** system description. This allows you to refactor simulation logic or extend the model with new resources (e.g., GPU memory) with full confidence that existing configurations remain valid and robust.
\ No newline at end of file
diff --git a/documentation/backend_documentation/metrics_to_measure.md b/documentation/backend_documentation/metrics_to_measure.md
new file mode 100644
index 0000000..4de1a1a
--- /dev/null
+++ b/documentation/backend_documentation/metrics_to_measure.md
@@ -0,0 +1,49 @@
+### **FastSim — Simulation Metrics**
+
+Metrics are the lifeblood of any simulation, transforming a series of abstract events into concrete, actionable insights about system performance, resource utilization, and potential bottlenecks. FastSim provides a flexible and robust metrics collection system designed to give you a multi-faceted view of your system's behavior under load.
+
+To achieve this, FastSim categorizes metrics into three distinct types based on their collection methodology:
+
+1. **Sampled Metrics (`SampledMetricName`):** These metrics provide a **time-series view** of the system's state. They are captured at fixed, regular intervals throughout the simulation's duration (e.g., every second). This methodology is ideal for understanding trends, observing oscillations, and measuring the continuous utilization of finite resources like CPU and RAM. Think of them as periodic snapshots of your system's health.
+
+2. **Event-based Metrics (`EventMetricName`):** These metrics are recorded **only when a specific event occurs**. Their collection is asynchronous and irregular, triggered by discrete happenings within the simulation, such as the completion of a request. This methodology is perfect for measuring the properties of individual transactions, such as end-to-end latency, where an average value is less important than understanding the full distribution of outcomes.
+
+3. **Aggregated Metrics (`AggregatedMetricName`):** These are not collected directly during the simulation but are **calculated after the simulation ends**. They provide high-level statistical summaries (like mean, median, and percentiles) derived from the raw data collected by Event-based metrics. They distill thousands of individual data points into a handful of key performance indicators (KPIs) that are easy to interpret.
+
+The following sections provide a detailed breakdown of each metric within these categories, explaining what they measure and the rationale for their importance.
+
+---
+
+### **1. Sampled Metrics: A Time-Series Perspective**
+
+Sampled metrics are configured in the `SimulationSettings` payload. Enabling them allows you to plot the evolution of system resources over time, which is crucial for identifying saturation points and transient performance issues.
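+Internally, the engine pre-allocates one time series per enabled metric before the run starts (see `alloc_sample_metric` in `src/app/core/helpers.py` in this diff) — a usage sketch with illustrative values:
+
+```python
+from app.config.constants import SampledMetricName
+from app.core.helpers import alloc_sample_metric
+
+# One empty list per enabled metric, plus the shared "t" time axis.
+samples = alloc_sample_metric({SampledMetricName.CORE_BUSY, SampledMetricName.RAM_IN_USE})
+# Keys are StrEnum members, equal to their string values:
+# {"t": [], "core_busy": [], "ram_in_use": []}
+
+# Each sampling tick appends one aligned row across all series.
+samples["t"].append(10)
+samples[SampledMetricName.CORE_BUSY].append(3)
+samples[SampledMetricName.RAM_IN_USE].append(512.0)
+```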
+ +| Metric Name (`SampledMetricName`) | Description & Rationale | +| :--- | :--- | +| **`READY_QUEUE_LEN`** | **What it is:** The number of tasks in the `asyncio` event loop's "ready" queue waiting for their turn to run on the CPU.
**Rationale:** This is arguably the most critical indicator of **CPU saturation**. In a single-threaded Python process, only one coroutine can run at a time (held by the GIL). If this queue length is consistently greater than zero, it means tasks are ready to do work but are forced to wait because the CPU is busy. A long or growing queue is a definitive sign that your application is CPU-bound and that the CPU is a primary bottleneck. | +| **`CORE_BUSY`** | **What it is:** The number of server CPU cores that are currently executing a task.
**Rationale:** This provides a direct measure of **CPU utilization**. When plotted over time, it shows how effectively you are using your provisioned processing power. If `CORE_BUSY` is consistently at its maximum value (equal to `server_resources.cpu_cores`), the system is CPU-saturated. Conversely, if it's consistently low while latency is high, the bottleneck is likely elsewhere (e.g., I/O). It perfectly complements `READY_QUEUE_LEN` to form a complete picture of CPU health. | +| **`EVENT_LOOP_IO_SLEEP`** | **What it is:** A measure indicating if the event loop is idle, polling for I/O operations to complete.
**Rationale:** This metric helps you determine if your system is **I/O-bound**. If the event loop spends a significant amount of time in this state, it means the CPU is underutilized because it has no ready tasks to run and is instead waiting for external systems (like databases, caches, or downstream APIs) to respond. High values for this metric coupled with low CPU utilization are a clear signal to investigate and optimize the performance of your I/O operations. | +| **`RAM_IN_USE`** | **What it is:** The total amount of memory (in MB) currently allocated by all active requests within a server.
**Rationale:** Essential for **capacity planning and stability analysis**. This metric allows you to visualize your system's memory footprint under load. You can identify which endpoints cause memory spikes and ensure your provisioned RAM is sufficient. A steadily increasing `RAM_IN_USE` value that never returns to a baseline is the classic signature of a **memory leak**, a critical bug this metric helps you detect. | +| **`THROUGHPUT_RPS`** | **What it is:** The number of requests successfully completed per second, calculated over the last sampling window.
**Rationale:** This is a fundamental measure of **system performance and capacity**. It answers the question: "How much work is my system actually doing?" Plotting throughput against user load or other resource metrics is key to understanding your system's scaling characteristics. A drop in throughput often correlates with a spike in latency or resource saturation, helping you pinpoint the exact moment a bottleneck began to affect performance. | + +--- + +### **2. Event-based Metrics: A Per-Transaction Perspective** + +Event-based metrics are also enabled in the `SimulationSettings` payload. They generate a collection of raw data points, one for each relevant event, which is ideal for statistical analysis of transactional performance. + +| Metric Name (`EventMetricName`) | Description & Rationale | +| :--- | :--- | +| **`RQS_LATENCY`** | **What it is:** The total end-to-end duration, in seconds, for a single request to be fully processed.
**Rationale:** This is the **primary user-facing performance metric**. Users directly experience latency. While a simple average can be useful, it often hides critical problems. By collecting the latency for *every single request*, FastSim allows for the calculation of statistical distributions and, most importantly, **tail-latency percentiles (p95, p99)**. These percentiles represent the worst-case experience for your users and are crucial for evaluating Service Level Objectives (SLOs) and ensuring a consistent user experience. | +| **`LLM_COST`** | **What it is:** The estimated monetary cost (e.g., in USD) incurred by a single call to an external Large Language Model (LLM) API during a request.
**Rationale:** In modern AI-powered applications, API calls to third-party services like LLMs can be a major operational expense. This metric moves beyond technical performance to measure **financial performance**. By tracking cost on a per-event basis, you can attribute expenses to specific endpoints or user behaviors, identify unnecessarily costly operations, and make informed decisions to optimize your application's cost-effectiveness. | + +--- + +### **3. Aggregated Metrics: High-Level Summaries** + +**Important:** Aggregated metrics are **not configured in the input payload**. They are automatically calculated by the FastSim engine at the end of a simulation run, based on the raw data collected from the enabled Event-based metrics. + +| Metric Name (`AggregatedMetricName`) | Description & Rationale | +| :--- | :--- | +| **`LATENCY_STATS`** | **What it is:** A statistical summary of the entire collection of `RQS_LATENCY` data points. This typically includes the mean, median (p50), standard deviation, and high-end percentiles (p95, p99, p99.9).
**Rationale:** This provides a comprehensive and easily digestible summary of your system's latency profile. While the raw data is essential, these summary statistics answer high-level questions quickly. The mean tells you the average experience, the median protects against outliers, and the p95/p99 values tell you the latency that 95% or 99% of your users will beat—a critical KPI for reliability and user satisfaction. | +| **`LLM_STATS`** | **What it is:** A statistical summary of the `LLM_COST` data points. This can include total cost over the simulation, average cost per request, and cost distribution.
**Rationale:** This gives you a bird's-eye view of the financial implications of your system's design. Instead of looking at individual transaction costs, `LLM_STATS` provides the bottom line: the total operational cost during the simulation period. This is invaluable for budgeting, forecasting, and validating the financial viability of new features. | \ No newline at end of file diff --git a/src/app/api/simulation.py b/src/app/api/simulation.py index e025ad6..73984d5 100644 --- a/src/app/api/simulation.py +++ b/src/app/api/simulation.py @@ -4,13 +4,13 @@ from fastapi import APIRouter from app.core.simulation.simulation_run import run_simulation -from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.full_simulation_input import SimulationPayload from app.schemas.simulation_output import SimulationOutput router = APIRouter() @router.post("/simulation") -async def event_loop_simulation(input_data: RqsGeneratorInput) -> SimulationOutput: +async def event_loop_simulation(input_data: SimulationPayload) -> SimulationOutput: """Run the simulation and return aggregate KPIs.""" rng = np.random.default_rng() return run_simulation(input_data, rng=rng) diff --git a/src/app/config/constants.py b/src/app/config/constants.py index 23c7936..498e8b7 100644 --- a/src/app/config/constants.py +++ b/src/app/config/constants.py @@ -182,3 +182,43 @@ class SystemEdges(StrEnum): """ NETWORK_CONNECTION = "network_connection" + +# ====================================================================== +# CONSTANTS FOR SAMPLED METRICS +# ====================================================================== + +class SampledMetricName(StrEnum): + """ + define the metrics sampled every fixed amount of + time to create a time series + """ + + READY_QUEUE_LEN = "ready_queue_len" #length of the event loop ready q + CORE_BUSY = "core_busy" + EVENT_LOOP_IO_SLEEP = "event_loop_io_sleep" + RAM_IN_USE = "ram_in_use" + THROUGHPUT_RPS = "throughput_rps" + +# ====================================================================== +# CONSTANTS FOR EVENT METRICS +# ====================================================================== + +class EventMetricName(StrEnum): + """ + define the metrics triggered by event with no + time series + """ + + RQS_LATENCY = "rqs_latency" + LLM_COST = "llm_cost" + + +# ====================================================================== +# CONSTANTS FOR AGGREGATED METRICS +# ====================================================================== + +class AggregatedMetricName(StrEnum): + """aggregated metrics to calculate at the end of simulation""" + + LATENCY_STATS = "latency_stats" + LLM_STATS = "llm_stats" diff --git a/src/app/core/event_samplers/gaussian_poisson.py b/src/app/core/event_samplers/gaussian_poisson.py index 9c83b4f..0b7818c 100644 --- a/src/app/core/event_samplers/gaussian_poisson.py +++ b/src/app/core/event_samplers/gaussian_poisson.py @@ -17,10 +17,12 @@ uniform_variable_generator, ) from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.simulation_settings_input import SimulationSettings def gaussian_poisson_sampling( input_data: RqsGeneratorInput, + sim_settings: SimulationSettings, *, rng: np.random.Generator | None = None, ) -> Generator[float, None, None]: @@ -35,11 +37,11 @@ def gaussian_poisson_sampling( Λ = U * (mean_req_per_minute_per_user / 60) [req/s]. 3. While inside the current window, draw gaps Δt ~ Exponential(Λ) using inverse-CDF. - 4. Stop once the virtual clock exceeds *simulation_time*. + 4. 
Stop once the virtual clock exceeds *total_simulation_time*. """ rng = rng or np.random.default_rng() - simulation_time = input_data.total_simulation_time + simulation_time = sim_settings.total_simulation_time user_sampling_window = input_data.user_sampling_window # λ_u : mean concurrent users per window diff --git a/src/app/core/event_samplers/poisson_poisson.py b/src/app/core/event_samplers/poisson_poisson.py index ebb1970..1566f90 100644 --- a/src/app/core/event_samplers/poisson_poisson.py +++ b/src/app/core/event_samplers/poisson_poisson.py @@ -14,10 +14,12 @@ uniform_variable_generator, ) from app.schemas.requests_generator_input import RqsGeneratorInput +from app.schemas.simulation_settings_input import SimulationSettings def poisson_poisson_sampling( input_data: RqsGeneratorInput, + sim_settings: SimulationSettings, *, rng: np.random.Generator | None = None, ) -> Generator[float, None, None]: @@ -32,11 +34,11 @@ def poisson_poisson_sampling( Λ = U * (mean_req_per_minute_per_user / 60) [req/s]. 3. While inside the current window, draw gaps Δt ~ Exponential(Λ) using inverse-CDF. - 4. Stop once the virtual clock exceeds *simulation_time*. + 4. Stop once the virtual clock exceeds *total_simulation_time*. """ rng = rng or np.random.default_rng() - simulation_time = input_data.total_simulation_time + simulation_time = sim_settings.total_simulation_time user_sampling_window = input_data.user_sampling_window # λ_u : mean concurrent users per window diff --git a/src/app/core/helpers.py b/src/app/core/helpers.py new file mode 100644 index 0000000..189af47 --- /dev/null +++ b/src/app/core/helpers.py @@ -0,0 +1,38 @@ +"""helpers for the simulation""" + +from collections.abc import Iterable + +from app.config.constants import EventMetricName, SampledMetricName + + +def alloc_sample_metric( + enabled_sample_metrics: Iterable[SampledMetricName], + ) -> dict[str, list[float | int]]: + """ + After the pydantic validation of the whole input we + instantiate a dictionary to collect the sampled metrics the + user want to measure + """ + # t is the alignment parameter for example assume + # the snapshot for the sampled metrics are done every 10ms + # t = [10,20,30,40....] 
diff --git a/src/app/core/simulation/requests_generator.py b/src/app/core/simulation/requests_generator.py
index 810218f..1b177e7 100644
--- a/src/app/core/simulation/requests_generator.py
+++ b/src/app/core/simulation/requests_generator.py
@@ -17,10 +17,12 @@
 import numpy as np
 
     from app.schemas.requests_generator_input import RqsGeneratorInput
+    from app.schemas.simulation_settings_input import SimulationSettings
 
 
 def requests_generator(
     input_data: RqsGeneratorInput,
+    sim_settings: SimulationSettings,
     *,
     rng: np.random.Generator | None = None,
 ) -> Generator[float, None, None]:
@@ -41,6 +43,7 @@
     #Gaussian-Poisson model
     return gaussian_poisson_sampling(
         input_data=input_data,
+        sim_settings=sim_settings,
         rng=rng,
     )
 
@@ -48,5 +51,6 @@
     # Poisson + Poisson
     return poisson_poisson_sampling(
         input_data=input_data,
+        sim_settings=sim_settings,
         rng=rng,
     )
diff --git a/src/app/core/simulation/simulation_run.py b/src/app/core/simulation/simulation_run.py
index 3e1672a..1aea997 100644
--- a/src/app/core/simulation/simulation_run.py
+++ b/src/app/core/simulation/simulation_run.py
@@ -14,28 +14,32 @@
 import numpy as np
 
-    from app.schemas.requests_generator_input import RqsGeneratorInput
+    from app.schemas.full_simulation_input import SimulationPayload
+
+
 def run_simulation(
-    input_data: RqsGeneratorInput,
+    input_data: SimulationPayload,
     *,
     rng: np.random.Generator,
 ) -> SimulationOutput:
     """Simulation executor in Simpy"""
-    gaps: Generator[float, None, None] = requests_generator(input_data, rng=rng)
+    sim_settings = input_data.sim_settings
+
+    requests_generator_input = input_data.rqs_input
+
+    gaps: Generator[float, None, None] = requests_generator(
+        requests_generator_input,
+        sim_settings,
+        rng=rng)
 
     env = simpy.Environment()
-    simulation_time = input_data.total_simulation_time
-    # pydantic in the validation assign a value and mypy is not
-    # complaining because a None cannot be compared in the loop
-    # to a float
-    assert simulation_time is not None
 
     total_request_per_time_period = {
-        "simulation_time": simulation_time,
+        "simulation_time": sim_settings.total_simulation_time,
         "total_requests": 0,
     }
@@ -47,10 +51,10 @@ def arrival_process(
             total_request_per_time_period["total_requests"] += 1
 
     env.process(arrival_process(env))
-    env.run(until=simulation_time)
+    env.run(until=sim_settings.total_simulation_time)
 
     return SimulationOutput(
         total_requests=total_request_per_time_period,
-        metric_2=str(input_data.avg_request_per_minute_per_user.mean),
-        metric_n=str(input_data.avg_active_users.mean),
+        metric_2=str(requests_generator_input.avg_request_per_minute_per_user.mean),
+        metric_n=str(requests_generator_input.avg_active_users.mean),
     )
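
With the runner now consuming the composite payload, driving it by hand (outside the FastAPI route) looks roughly like this. This is a sketch assembled from the same minimal objects the test fixtures below construct; 1800 is `TimeDefaults.MIN_SIMULATION_TIME`:

```python
import numpy as np

from app.core.simulation.simulation_run import run_simulation
from app.schemas.full_simulation_input import SimulationPayload
from app.schemas.random_variables_config import RVConfig
from app.schemas.requests_generator_input import RqsGeneratorInput
from app.schemas.simulation_settings_input import SimulationSettings
from app.schemas.system_topology_schema.full_system_topology_schema import (
    Client,
    TopologyGraph,
    TopologyNodes,
)

payload = SimulationPayload(
    rqs_input=RqsGeneratorInput(
        avg_active_users=RVConfig(mean=1.0),
        avg_request_per_minute_per_user=RVConfig(mean=2.0),
    ),
    topology_graph=TopologyGraph(
        nodes=TopologyNodes(servers=[], client=Client(id="client-1")),
        edges=[],
    ),
    sim_settings=SimulationSettings(total_simulation_time=1800),
)

output = run_simulation(payload, rng=np.random.default_rng(0))
print(output.total_requests)  # {"simulation_time": 1800, "total_requests": ...}
```
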
diff --git a/src/app/schemas/full_simulation_input.py b/src/app/schemas/full_simulation_input.py
index 0e1d7a9..b745fee 100644
--- a/src/app/schemas/full_simulation_input.py
+++ b/src/app/schemas/full_simulation_input.py
@@ -3,6 +3,7 @@
 from pydantic import BaseModel
 
 from app.schemas.requests_generator_input import RqsGeneratorInput
+from app.schemas.simulation_settings_input import SimulationSettings
 from app.schemas.system_topology_schema.full_system_topology_schema import TopologyGraph
 
 
@@ -11,3 +12,4 @@
 class SimulationPayload(BaseModel):
 
     rqs_input: RqsGeneratorInput
     topology_graph: TopologyGraph
+    sim_settings: SimulationSettings
diff --git a/src/app/schemas/requests_generator_input.py b/src/app/schemas/requests_generator_input.py
index 88812e0..f56a4f0 100644
--- a/src/app/schemas/requests_generator_input.py
+++ b/src/app/schemas/requests_generator_input.py
@@ -12,13 +12,6 @@
 class RqsGeneratorInput(BaseModel):
 
     avg_active_users: RVConfig
     avg_request_per_minute_per_user: RVConfig
-    total_simulation_time: int = Field(
-        default=TimeDefaults.SIMULATION_TIME,
-        ge=TimeDefaults.MIN_SIMULATION_TIME,
-        description=(
-            f"Simulation time in seconds (>= {TimeDefaults.MIN_SIMULATION_TIME})."
-        ),
-    )
 
     user_sampling_window: int = Field(
         default=TimeDefaults.USER_SAMPLING_WINDOW,
diff --git a/src/app/schemas/simulation_settings_input.py b/src/app/schemas/simulation_settings_input.py
new file mode 100644
index 0000000..5d0ac6a
--- /dev/null
+++ b/src/app/schemas/simulation_settings_input.py
@@ -0,0 +1,31 @@
+"""Define the global settings for a simulation run."""
+
+from pydantic import BaseModel, Field
+
+from app.config.constants import EventMetricName, SampledMetricName, TimeDefaults
+
+
+class SimulationSettings(BaseModel):
+    """Global parameters that apply to the whole run."""
+
+    total_simulation_time: int = Field(
+        default=TimeDefaults.SIMULATION_TIME,
+        ge=TimeDefaults.MIN_SIMULATION_TIME,
+        description="Simulation horizon in seconds.",
+    )
+
+    enabled_sample_metrics: set[SampledMetricName] = Field(
+        default_factory=lambda: {
+            SampledMetricName.READY_QUEUE_LEN,
+            SampledMetricName.CORE_BUSY,
+            SampledMetricName.RAM_IN_USE,
+        },
+        description="Which time-series KPIs to collect by default.",
+    )
+    enabled_event_metrics: set[EventMetricName] = Field(
+        default_factory=lambda: {
+            EventMetricName.RQS_LATENCY,
+        },
+        description="Which per-event KPIs to collect by default.",
+    )
+
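
The new settings schema also validates raw, JSON-style input: metric names are coerced into the `StrEnum` sets, and omitted fields fall back to the declared defaults. The values below are illustrative, not part of the patch:

```python
from app.config.constants import TimeDefaults
from app.schemas.simulation_settings_input import SimulationSettings

settings = SimulationSettings.model_validate({
    "total_simulation_time": 3600,
    "enabled_sample_metrics": ["ready_queue_len", "throughput_rps"],
    "enabled_event_metrics": ["rqs_latency", "llm_cost"],
})

# Omitted fields take the defaults declared on the model:
assert SimulationSettings().total_simulation_time == TimeDefaults.SIMULATION_TIME
```
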
diff --git a/tests/conftest.py b/tests/conftest.py
index 5fcd4bc..e6d7b75 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -10,6 +10,8 @@
 from alembic.config import Config
 from dotenv import load_dotenv
 from fastapi.testclient import TestClient
+from numpy.random import Generator as NpGenerator
+from numpy.random import default_rng
 from sqlalchemy.ext.asyncio import (
     AsyncEngine,
     AsyncSession,
@@ -18,9 +20,23 @@
 )
 from sqlalchemy_utils import create_database, database_exists, drop_database
 
+from app.config.constants import (
+    EventMetricName,
+    SampledMetricName,
+    TimeDefaults,
+)
 from app.config.settings import settings
 from app.db.session import get_db
 from app.main import app
+from app.schemas.full_simulation_input import SimulationPayload
+from app.schemas.random_variables_config import RVConfig
+from app.schemas.requests_generator_input import RqsGeneratorInput
+from app.schemas.simulation_settings_input import SimulationSettings
+from app.schemas.system_topology_schema.full_system_topology_schema import (
+    Client,
+    TopologyGraph,
+    TopologyNodes,
+)
 
 # Load test environment variables from .env.test
 ENV_PATH = Path(__file__).resolve().parents[1] / "docker_fs" / ".env.test"
@@ -121,3 +137,102 @@ async def db_session(async_engine: AsyncEngine) -> AsyncGenerator[AsyncSession,
         await transaction.rollback()
     # Close the connection
     await connection.close()
+
+# ============================================================================
+# STANDARD CONFIGURATION FOR INPUT VARIABLES
+# ============================================================================
+
+# ---------------------------------------------------------------------------
+# RNG
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(scope="session")
+def rng() -> NpGenerator:
+    """Deterministic NumPy RNG shared across tests (seed=0)."""
+    return default_rng(0)
+
+
+# ---------------------------------------------------------------------------
+# Metrics sets
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(scope="session")
+def enabled_sample_metrics() -> set[SampledMetricName]:
+    """Default sample-level KPIs tracked in most tests."""
+    return {
+        SampledMetricName.READY_QUEUE_LEN,
+        SampledMetricName.RAM_IN_USE,
+    }
+
+
+@pytest.fixture(scope="session")
+def enabled_event_metrics() -> set[EventMetricName]:
+    """Default event-level KPIs tracked in most tests."""
+    return {EventMetricName.RQS_LATENCY}
+
+
+# ---------------------------------------------------------------------------
+# Global simulation settings
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def sim_settings(
+    enabled_sample_metrics: set[SampledMetricName],
+    enabled_event_metrics: set[EventMetricName],
+) -> SimulationSettings:
+    """A minimal `SimulationSettings` instance for unit tests."""
+    return SimulationSettings(
+        total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME,
+        enabled_sample_metrics=enabled_sample_metrics,
+        enabled_event_metrics=enabled_event_metrics,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Traffic profile
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def rqs_input() -> RqsGeneratorInput:
+    """`RqsGeneratorInput` with 1 user and 2 req/min for quick tests."""
+    return RqsGeneratorInput(
+        avg_active_users=RVConfig(mean=1.0),
+        avg_request_per_minute_per_user=RVConfig(mean=2.0),
+        user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Minimal topology (one client, no servers, no edges)
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def topology_minimal() -> TopologyGraph:
+    """Valid topology with a single client and zero servers/edges."""
+    client = Client(id="client-1")
+    nodes = TopologyNodes(servers=[], client=client)
+    return TopologyGraph(nodes=nodes, edges=[])
+
+
+# ---------------------------------------------------------------------------
+# Full simulation payload
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def payload_base(
+    rqs_input: RqsGeneratorInput,
+    sim_settings: SimulationSettings,
+    topology_minimal: TopologyGraph,
+) -> SimulationPayload:
+    """End-to-end payload used by high-level simulation tests."""
+    return SimulationPayload(
+        rqs_input=rqs_input,
+        topology_graph=topology_minimal,
+        sim_settings=sim_settings,
+    )
diff --git a/tests/unit/input_sructure/test_requests_generator_input.py b/tests/unit/input_sructure/test_requests_generator_input.py
index 9fb49a5..39a0b2d 100644
--- a/tests/unit/input_sructure/test_requests_generator_input.py
+++ b/tests/unit/input_sructure/test_requests_generator_input.py
@@ -1,90 +1,82 @@
+"""Validation tests for RVConfig, RqsGeneratorInput and SimulationSettings."""
+
+from __future__ import annotations
+
 import pytest
 from pydantic import ValidationError
 
 from app.config.constants import Distribution, TimeDefaults
 from app.schemas.random_variables_config import RVConfig
 from app.schemas.requests_generator_input import RqsGeneratorInput
+from app.schemas.simulation_settings_input import SimulationSettings
+
+# ---------------------------------------------------------------------------
+# RVCONFIG
+# ---------------------------------------------------------------------------
 
-# --------------------------------------------------------------------------
-# TEST RANDOM VARIABLE CONFIGURATION
-# --------------------------------------------------------------------------
 
 def test_normal_sets_variance_to_mean() -> None:
-    """When distribution='normal' and variance is omitted, variance == mean."""
+    """If variance is omitted with 'normal', it defaults to mean."""
     cfg = RVConfig(mean=10, distribution=Distribution.NORMAL)
     assert cfg.variance == 10.0
 
 
 def test_poisson_keeps_variance_none() -> None:
-    """When distribution='poisson' and variance is omitted, variance stays None."""
+    """If variance is omitted with 'poisson', it remains None."""
     cfg = RVConfig(mean=5, distribution=Distribution.POISSON)
     assert cfg.variance is None
 
 
 def test_explicit_variance_is_preserved() -> None:
-    """If the user supplies variance explicitly, it is preserved unchanged."""
+    """An explicit variance value is not modified."""
     cfg = RVConfig(mean=8, distribution=Distribution.NORMAL, variance=4)
     assert cfg.variance == 4.0
 
 
 def test_mean_must_be_numeric() -> None:
-    """A non-numeric mean raises a ValidationError with our custom message."""
-    with pytest.raises(ValidationError) as excinfo:
+    """A non-numeric mean triggers a ValidationError."""
+    with pytest.raises(ValidationError) as exc:
         RVConfig(mean="not a number", distribution=Distribution.POISSON)
 
-    # Check that at least one error refers to the 'mean' field
-    assert any(err["loc"] == ("mean",) for err in excinfo.value.errors())
-    assert "mean must be a number" in excinfo.value.errors()[0]["msg"]
+    assert any(err["loc"] == ("mean",) for err in exc.value.errors())
 
 
 def test_missing_mean_field() -> None:
-    """Omitting the mean field raises a 'field required' ValidationError."""
-    with pytest.raises(ValidationError) as excinfo:
-        # Using model_validate avoids the constructor signature check
+    """Omitting mean raises a 'field required' ValidationError."""
+    with pytest.raises(ValidationError) as exc:
         RVConfig.model_validate({"distribution": Distribution.NORMAL})
 
     assert any(
         err["loc"] == ("mean",) and err["type"] == "missing"
-        for err in excinfo.value.errors()
+        for err in exc.value.errors()
    )
 
 
-def test_gaussian_sets_variance_to_mean() -> None:
-    """When distribution='gaussian' and variance is omitted, variance == mean."""
-    cfg = RVConfig(mean=12.5, distribution=Distribution.NORMAL)
-    assert cfg.variance == pytest.approx(12.5)
-
 
 def test_default_distribution_is_poisson() -> None:
-    """
-    When distribution is omitted, it defaults to 'poisson' and
-    variance stays None.
-    """
+    """If distribution is missing, it defaults to 'poisson'."""
     cfg = RVConfig(mean=3.3)
     assert cfg.distribution == Distribution.POISSON
     assert cfg.variance is None
 
 
 def test_explicit_variance_kept_for_poisson() -> None:
-    """If the user supplies variance even for poisson, it is preserved."""
+    """Variance is kept even when distribution is poisson."""
     cfg = RVConfig(mean=4.0, distribution=Distribution.POISSON, variance=2.2)
     assert cfg.variance == pytest.approx(2.2)
 
 
 def test_invalid_distribution_raises() -> None:
-    """Supplying a non-supported distribution literal raises ValidationError."""
-    with pytest.raises(ValidationError) as excinfo:
+    """An unsupported distribution literal raises ValidationError."""
+    with pytest.raises(ValidationError):
         RVConfig(mean=5.0, distribution="not_a_dist")
 
-    errors = excinfo.value.errors()
-    # Only assert there is at least one error for the 'distribution' field:
-    assert any(e["loc"] == ("distribution",) for e in errors)
 
+# ---------------------------------------------------------------------------
+# RQSGENERATORINPUT - USER_SAMPLING_WINDOW
+# ---------------------------------------------------------------------------
 
-# --------------------------------------------------------------------------
-# TEST FIELD VALIDATOR USER SAMPLING WINDOW
-# --------------------------------------------------------------------------
 
 def test_default_user_sampling_window() -> None:
-    """When user_sampling_window is omitted, it defaults to USER_SAMPLING_WINDOW."""
+    """If user_sampling_window is missing it defaults to the constant."""
     inp = RqsGeneratorInput(
         avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
         avg_request_per_minute_per_user={
@@ -96,47 +88,35 @@
 
 
 def test_explicit_user_sampling_window_kept() -> None:
-    """An explicit user_sampling_window value is preserved unchanged."""
-    custom_window = 30
+    """An explicit user_sampling_window is preserved."""
     inp = RqsGeneratorInput(
         avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
         avg_request_per_minute_per_user={
             "mean": 1.0,
             "distribution": Distribution.POISSON,
         },
-        user_sampling_window=custom_window,
+        user_sampling_window=30,
     )
-    assert inp.user_sampling_window == custom_window
+    assert inp.user_sampling_window == 30
 
 
 def test_user_sampling_window_not_int_raises() -> None:
-    """A non-integer user_sampling_window raises a ValidationError."""
-    with pytest.raises(ValidationError) as excinfo:
-
+    """A non-integer user_sampling_window raises ValidationError."""
+    with pytest.raises(ValidationError):
         RqsGeneratorInput(
             avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
             avg_request_per_minute_per_user={
                 "mean": 1.0,
                 "distribution": Distribution.POISSON,
             },
-            user_sampling_window="not-an-int",
+            user_sampling_window="not-int",
         )
 
-    errors = excinfo.value.errors()
-    assert any(err["loc"] == ("user_sampling_window",) for err in errors)
-
-    # Pydantic v2 wording
-    assert any("valid integer" in err["msg"] for err in errors)
-
-
 def test_user_sampling_window_above_max_raises() -> None:
-    """
-    Passing user_sampling_window > MAX_USER_SAMPLING_WINDOW
-    must raise a ValidationError.
-    """
+    """user_sampling_window above the max constant raises ValidationError."""
     too_large = TimeDefaults.MAX_USER_SAMPLING_WINDOW + 1
-    with pytest.raises(ValidationError) as excinfo:
+    with pytest.raises(ValidationError):
         RqsGeneratorInput(
             avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
             avg_request_per_minute_per_user={
@@ -146,90 +126,33 @@
             user_sampling_window=too_large,
         )
 
-    errors = excinfo.value.errors()
-    assert any(err["loc"] == ("user_sampling_window",) for err in errors)
-
-    expected_snippet = (
-        f"less than or equal to {TimeDefaults.MAX_USER_SAMPLING_WINDOW}"
-    )
-    assert any(expected_snippet in err["msg"] for err in errors)
 
+# ---------------------------------------------------------------------------
+# SIMULATIONSETTINGS - TOTAL_SIMULATION_TIME
+# ---------------------------------------------------------------------------
 
-# --------------------------------------------------------------------------
-# TEST FIELD VALIDATOR TOTAL SIMULATION TIME
-# --------------------------------------------------------------------------
 
 def test_default_total_simulation_time() -> None:
-    """When total_simulation_time is omitted, it defaults to SIMULATION_TIME."""
-    inp = RqsGeneratorInput(
-        avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
-        avg_request_per_minute_per_user={
-            "mean": 1.0,
-            "distribution": Distribution.POISSON,
-        },
-    )
-    assert inp.total_simulation_time == TimeDefaults.SIMULATION_TIME
+    """If total_simulation_time is missing it defaults to the constant."""
+    settings = SimulationSettings()
+    assert settings.total_simulation_time == TimeDefaults.SIMULATION_TIME
 
 
 def test_explicit_total_simulation_time_kept() -> None:
-    """An explicit total_simulation_time value is preserved unchanged."""
-    custom_time = 3_000
-    inp = RqsGeneratorInput(
-        avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
-        avg_request_per_minute_per_user={
-            "mean": 1.0,
-            "distribution": Distribution.POISSON,
-        },
-        total_simulation_time=custom_time,
-    )
-    assert inp.total_simulation_time == custom_time
+    """An explicit total_simulation_time is preserved."""
+    settings = SimulationSettings(total_simulation_time=3_000)
+    assert settings.total_simulation_time == 3_000
 
 
 def test_total_simulation_time_not_int_raises() -> None:
-    """A non-integer total_simulation_time raises a ValidationError."""
-    with pytest.raises(ValidationError) as excinfo:
-
-        RqsGeneratorInput(
-            avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON},
-            avg_request_per_minute_per_user={
-                "mean": 1.0,
-                "distribution": Distribution.POISSON,
-            },
-            total_simulation_time="three thousand",
-        )
-
-    errors = excinfo.value.errors()
-    assert any(err["loc"] == ("total_simulation_time",) for err in errors)
-
-    # Pydantic v2 wording: “Input should be a valid integer”
-    assert any("valid integer" in err["msg"] for err in errors)
-
+    """A non-integer total_simulation_time raises ValidationError."""
+    with pytest.raises(ValidationError):
+        SimulationSettings(total_simulation_time="three thousand")
 
 def test_total_simulation_time_below_minimum_raises() -> None:
-    """
-    Passing total_simulation_time < MIN_SIMULATION_TIME
-    must raise a ValidationError.
- """ + """A total_simulation_time below the minimum constant raises ValidationError.""" too_small = TimeDefaults.MIN_SIMULATION_TIME - 1 - with pytest.raises(ValidationError) as excinfo: - RqsGeneratorInput( - avg_active_users={"mean": 1.0, "distribution": Distribution.POISSON}, - avg_request_per_minute_per_user={ - "mean": 1.0, - "distribution": Distribution.POISSON, - }, - total_simulation_time=too_small, - ) - - errors = excinfo.value.errors() - # c'è almeno un errore sul campo giusto - assert any(err["loc"] == ("total_simulation_time",) for err in errors) - - expected_snippet = ( - f"greater than or equal to {TimeDefaults.MIN_SIMULATION_TIME}" - ) - assert any(expected_snippet in err["msg"] for err in errors) - - + with pytest.raises(ValidationError): + SimulationSettings(total_simulation_time=too_small) diff --git a/tests/unit/sampler/test_gaussian_poisson.py b/tests/unit/sampler/test_gaussian_poisson.py index b464007..c182376 100644 --- a/tests/unit/sampler/test_gaussian_poisson.py +++ b/tests/unit/sampler/test_gaussian_poisson.py @@ -1,94 +1,109 @@ -"""Unit tests for gaussian_poisson_sampling.""" +"""Unit-tests for `gaussian_poisson_sampling`.""" from __future__ import annotations import itertools from types import GeneratorType +from typing import TYPE_CHECKING -import numpy as np import pytest +from numpy.random import Generator, default_rng from app.config.constants import TimeDefaults -from app.core.event_samplers.gaussian_poisson import gaussian_poisson_sampling +from app.core.event_samplers.gaussian_poisson import ( + gaussian_poisson_sampling, +) from app.schemas.random_variables_config import RVConfig from app.schemas.requests_generator_input import RqsGeneratorInput +if TYPE_CHECKING: + + from app.schemas.simulation_settings_input import SimulationSettings + # --------------------------------------------------------------------------- -# Fixture +# FIXTURES # --------------------------------------------------------------------------- @pytest.fixture -def base_input() -> RqsGeneratorInput: - """Return a minimal, valid RqsGeneratorInput for the Gaussian-Poisson sampler.""" +def rqs_cfg() -> RqsGeneratorInput: + """Minimal, valid RqsGeneratorInput for Gaussian-Poisson tests.""" return RqsGeneratorInput( avg_active_users=RVConfig( - mean=10.0, variance=4.0, distribution="normal", + mean=10.0, + variance=4.0, + distribution="normal", ), avg_request_per_minute_per_user=RVConfig(mean=30.0), - total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, ) + # --------------------------------------------------------------------------- -# Basic behaviour +# BASIC BEHAVIOUR # --------------------------------------------------------------------------- -def test_returns_generator_type(base_input: RqsGeneratorInput) -> None: +def test_returns_generator_type( + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, + rng: Generator, +) -> None: """The function must return a generator object.""" - rng = np.random.default_rng(0) - gen = gaussian_poisson_sampling(base_input, rng=rng) + gen = gaussian_poisson_sampling(rqs_cfg, sim_settings, rng=rng) assert isinstance(gen, GeneratorType) -def test_generates_positive_gaps(base_input: RqsGeneratorInput) -> None: +def test_generates_positive_gaps( + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, +) -> None: """ With nominal parameters the sampler should emit at least a few positive - gaps and no gap must be non-positive. 
+ gaps, and the cumulative time must stay below the horizon. """ - rng = np.random.default_rng(42) gaps: list[float] = list( - itertools.islice(gaussian_poisson_sampling(base_input, rng=rng), 1000), + itertools.islice( + gaussian_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(42)), + 1000, + ), ) - # At least one event is expected. - assert gaps - # No gap may be negative or zero. - assert all(gap > 0.0 for gap in gaps) - # The cumulative time of gaps must stay below the horizon. - assert sum(gaps) < base_input.total_simulation_time + assert gaps, "Expected at least one event" + assert all(g > 0.0 for g in gaps), "No gap may be ≤ 0" + assert sum(gaps) < sim_settings.total_simulation_time # --------------------------------------------------------------------------- -# Edge-case: zero users ⇒ no events +# EDGE CASE: ZERO USERS # --------------------------------------------------------------------------- def test_zero_users_produces_no_events( monkeypatch: pytest.MonkeyPatch, - base_input: RqsGeneratorInput, + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, ) -> None: """ - If every Gaussian draw returns 0 users, Λ == 0, - hence the generator must yield no events at all. + If every Gaussian draw returns 0 users, Λ == 0 and the generator must + yield no events at all. """ def fake_truncated_gaussian( mean: float, var: float, - rng: np.random.Generator, + rng: Generator, ) -> float: return 0.0 # force U = 0 - # Patch the helper so that it always returns 0 users. monkeypatch.setattr( "app.core.event_samplers.gaussian_poisson.truncated_gaussian_generator", fake_truncated_gaussian, ) - rng = np.random.default_rng(123) - gaps = list(gaussian_poisson_sampling(base_input, rng=rng)) + gaps: list[float] = list( + gaussian_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(123)), + ) assert gaps == [] # no events should be generated diff --git a/tests/unit/sampler/test_poisson_poisson.py b/tests/unit/sampler/test_poisson_poisson.py new file mode 100644 index 0000000..2fbbb9e --- /dev/null +++ b/tests/unit/sampler/test_poisson_poisson.py @@ -0,0 +1,124 @@ +"""Unit tests for `poisson_poisson_sampling`.""" + +from __future__ import annotations + +import itertools +import math +from types import GeneratorType +from typing import TYPE_CHECKING + +import pytest +from numpy.random import Generator, default_rng + +from app.config.constants import TimeDefaults +from app.core.event_samplers.poisson_poisson import poisson_poisson_sampling +from app.schemas.random_variables_config import RVConfig +from app.schemas.requests_generator_input import RqsGeneratorInput + +if TYPE_CHECKING: + + from app.schemas.simulation_settings_input import SimulationSettings + + +@pytest.fixture +def rqs_cfg() -> RqsGeneratorInput: + """Return a minimal, valid RqsGeneratorInput for the sampler tests.""" + return RqsGeneratorInput( + avg_active_users={"mean": 1.0, "distribution": "poisson"}, + avg_request_per_minute_per_user={"mean": 60.0, "distribution": "poisson"}, + user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, + ) + +# -------------------------------------------------------- +# BASIC SHAPE AND TYPE TESTS +# -------------------------------------------------------- + + +def test_sampler_returns_generator( + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, + rng: Generator, +) -> None: + """Function must return a generator object.""" + gen = poisson_poisson_sampling(rqs_cfg, sim_settings, rng=rng) + assert isinstance(gen, GeneratorType) + + +def test_all_gaps_are_positive( + 
rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, +) -> None: + """Every yielded gap must be strictly positive.""" + gaps = list( + itertools.islice( + poisson_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(1)), + 1_000, + ), + ) + assert all(g > 0.0 for g in gaps) + + +# --------------------------------------------------------------------------- +# REPRODUCIBILITY WITH FIXED SEED +# --------------------------------------------------------------------------- + + +def test_sampler_is_reproducible_with_fixed_seed( + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, +) -> None: + """Same RNG seed must produce identical first N gaps.""" + seed = 42 + n_samples = 15 + + gaps_1 = list( + itertools.islice( + poisson_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(seed)), + n_samples, + ), + ) + gaps_2 = list( + itertools.islice( + poisson_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(seed)), + n_samples, + ), + ) + assert gaps_1 == gaps_2 + + +# --------------------------------------------------------------------------- +# EDGE CASE: ZERO USERS +# --------------------------------------------------------------------------- + + +def test_zero_users_produces_no_events( + sim_settings: SimulationSettings, +) -> None: + """If the mean user count is zero the generator must yield no events.""" + cfg_zero = RqsGeneratorInput( + avg_active_users=RVConfig(mean=0.0, distribution="poisson"), + avg_request_per_minute_per_user=RVConfig(mean=60.0, distribution="poisson"), + user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, + ) + + gaps: list[float] = list( + poisson_poisson_sampling(cfg_zero, sim_settings, rng=default_rng(123)), + ) + assert gaps == [] + + +# --------------------------------------------------------------------------- +# CUMULATIVE TIME NEVER EXCEEDS THE HORIZON +# --------------------------------------------------------------------------- + + +def test_cumulative_time_never_exceeds_horizon( + rqs_cfg: RqsGeneratorInput, + sim_settings: SimulationSettings, +) -> None: + """Sum of gaps must stay below the simulation horizon.""" + gaps: list[float] = list( + poisson_poisson_sampling(rqs_cfg, sim_settings, rng=default_rng(7)), + ) + cum_time = math.fsum(gaps) + assert cum_time < sim_settings.total_simulation_time diff --git a/tests/unit/sampler/test_poisson_posson.py b/tests/unit/sampler/test_poisson_posson.py deleted file mode 100644 index 48d4f9a..0000000 --- a/tests/unit/sampler/test_poisson_posson.py +++ /dev/null @@ -1,120 +0,0 @@ -"""Unit tests for the poisson_poisson_sampling generator.""" - -from __future__ import annotations - -import itertools -import math -from types import GeneratorType - -import numpy as np -import pytest - -from app.config.constants import TimeDefaults -from app.core.event_samplers.poisson_poisson import poisson_poisson_sampling -from app.schemas.random_variables_config import RVConfig -from app.schemas.requests_generator_input import RqsGeneratorInput - - -@pytest.fixture -def base_input() -> RqsGeneratorInput: - """Return a minimal-valid RqsGeneratorInput for the sampler tests.""" - return RqsGeneratorInput( - # 1 average concurrent user … - avg_active_users={"mean": 1.0, "distribution": "poisson"}, - # … sending on average 60 req/min → 1 req/s - avg_request_per_minute_per_user={"mean": 60.0, "distribution": "poisson"}, - total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, # 30 min - user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, # 60 s - ) - - -# 
--------------------------------------------------------------------- -# BASIC SHAPE / TYPE TESTS -# --------------------------------------------------------------------- - - -def test_sampler_returns_generator(base_input: RqsGeneratorInput) -> None: - """The function must return a real generator object.""" - rng = np.random.default_rng(0) - gen = poisson_poisson_sampling(base_input, rng=rng) - - assert isinstance(gen, GeneratorType) - - -def test_all_gaps_are_positive(base_input: RqsGeneratorInput) -> None: - """Every yielded inter-arrival gap Δt must be > 0.""" - rng = np.random.default_rng(1) - gaps: list[float] = list( - itertools.islice(poisson_poisson_sampling(base_input, rng=rng), 1_000), - ) - - # None of the first 1 000 gaps (if any) can be negative or zero - assert all(gap > 0.0 for gap in gaps) - - -# --------------------------------------------------------------------- -# REPRODUCIBILITY WITH FIXED RNG SEED -# --------------------------------------------------------------------- - - -def test_sampler_is_reproducible_with_fixed_seed(base_input: RqsGeneratorInput) -> None: - """Same seed ⇒ identical first N gaps.""" - seed = 42 - n_samples = 15 - - gaps_1 = list( - itertools.islice( - poisson_poisson_sampling( - base_input, rng=np.random.default_rng(seed), - ), - n_samples, - ), - ) - gaps_2 = list( - itertools.islice( - poisson_poisson_sampling( - base_input, rng=np.random.default_rng(seed), - ), - n_samples, - ), - ) - - assert gaps_1 == gaps_2 - - -# --------------------------------------------------------------------- -# EDGE-CASE: ZERO USERS ⇒ NO EVENTS -# --------------------------------------------------------------------- - - -def test_zero_users_produces_no_events(base_input: RqsGeneratorInput) -> None: - """ - With mean concurrent users == 0 the Poisson draw is almost surely 0, - so Λ = 0 and the generator should yield no events. - """ - input_data = RqsGeneratorInput( - avg_active_users=RVConfig(mean=0.0, distribution="poisson"), - avg_request_per_minute_per_user=RVConfig(mean=60.0, distribution="poisson"), - total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, - user_sampling_window=TimeDefaults.USER_SAMPLING_WINDOW, - ) - - rng = np.random.default_rng(123) - gaps = list(poisson_poisson_sampling(input_data, rng=rng)) - - assert gaps == [] # no events expected - -# --------------------------------------------------------------------- -# CUMULATIVE TIME ALWAYS < SIMULATION HORIZON -# --------------------------------------------------------------------- - - -def test_cumulative_time_never_exceeds_horizon(base_input: RqsGeneratorInput) -> None: - """ΣΔt (virtual clock) must stay strictly below total_simulation_time.""" - rng = np.random.default_rng(7) - gaps = list(poisson_poisson_sampling(base_input, rng=rng)) - - cum_time = math.fsum(gaps) - # Even if the virtual clock can jump when λ == 0, - # the summed gaps must never exceed the horizon. - assert cum_time < base_input.total_simulation_time diff --git a/tests/unit/simulation/test_requests_generator.py b/tests/unit/simulation/test_requests_generator.py index 9b77baf..fe72f2e 100644 --- a/tests/unit/simulation/test_requests_generator.py +++ b/tests/unit/simulation/test_requests_generator.py @@ -1,57 +1,48 @@ -"""Unit test to verify the behaviour of the rqs generator""" +"""Unit-tests for the requests generator and the SimPy runner. + +All common fixtures (`rng`, `rqs_input`, `sim_settings`, `payload_base`, …) +are defined once in *tests/conftest.py*. +This module focuses purely on behavioural checks. 
+""" from __future__ import annotations from types import GeneratorType from typing import TYPE_CHECKING -import numpy as np import pytest -from app.config.constants import TimeDefaults from app.core.simulation.requests_generator import requests_generator from app.core.simulation.simulation_run import run_simulation -from app.schemas.requests_generator_input import RqsGeneratorInput - -if TYPE_CHECKING: +if TYPE_CHECKING: # static-typing only from collections.abc import Iterator + from numpy.random import Generator + + from app.schemas.full_simulation_input import SimulationPayload + from app.schemas.requests_generator_input import RqsGeneratorInput from app.schemas.simulation_output import SimulationOutput + from app.schemas.simulation_settings_input import SimulationSettings -# -------------------------------------------------------------- -# TESTS INPUT -# -------------------------------------------------------------- - -@pytest.fixture -def base_input() -> RqsGeneratorInput: - """Return a RqsGeneratorInput with a 120-second simulation horizon.""" - return RqsGeneratorInput( - avg_active_users={"mean": 1.0}, - avg_request_per_minute_per_user={"mean": 2.0}, - total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, - ) -# -------------------------------------------------------------- -# REQUESTS GENERATOR FUNCTION TESTS -# -------------------------------------------------------------- +# --------------------------------------------------------------------------- +# REQUESTS-GENERATOR - dispatcher tests +# --------------------------------------------------------------------------- + def test_default_requests_generator_uses_poisson_poisson_sampling( - base_input: RqsGeneratorInput, + rqs_input: RqsGeneratorInput, + sim_settings: SimulationSettings, + rng: Generator, ) -> None: - """ - Verify that when avg_active_users.distribution is the default 'poisson', - requests_generator returns an iterator whose code object is from - poisson_poisson_sampling. - """ - rng = np.random.default_rng(0) - gen = requests_generator(base_input, rng=rng) - # It must be a generator. - assert isinstance(gen, GeneratorType) + """Default distribution must map to *poisson_poisson_sampling*.""" + gen = requests_generator(rqs_input, sim_settings, rng=rng) - # Internally, it should call poisson_poisson_sampling. + assert isinstance(gen, GeneratorType) assert gen.gi_code.co_name == "poisson_poisson_sampling" + @pytest.mark.parametrize( ("dist", "expected_sampler"), [ @@ -62,136 +53,93 @@ def test_default_requests_generator_uses_poisson_poisson_sampling( def test_requests_generator_dispatches_to_correct_sampler( dist: str, expected_sampler: str, + rqs_input: RqsGeneratorInput, + sim_settings: SimulationSettings, + rng: Generator, ) -> None: - """ - Verify that requests_generator returns a generator whose code object - comes from the appropriate sampler function based on distribution: - - 'poisson' → poisson_poisson_sampling - - 'normal' → gaussian_poisson_sampling - """ - input_data = RqsGeneratorInput( - avg_active_users={"mean": 1.0, "distribution": dist}, - avg_request_per_minute_per_user={"mean": 1.0}, - total_simulation_time=TimeDefaults.MIN_SIMULATION_TIME, - ) - rng = np.random.default_rng(0) - gen = requests_generator(input_data, rng=rng) + """Dispatcher must select the sampler matching *dist*.""" + rqs_input.avg_active_users.distribution = dist # type: ignore[assignment] + gen = requests_generator(rqs_input, sim_settings, rng=rng) - # It must be a generator object. 
assert isinstance(gen, GeneratorType) - # Check which underlying sampler function produced it. assert gen.gi_code.co_name == expected_sampler -# -------------------------------------------------------------- -# REQUESTS GENERATOR INSIDE SIMULATION TESTS -# -------------------------------------------------------------- -def test_run_simulation_counts_events_up_to_horizon( - monkeypatch: pytest.MonkeyPatch, base_input: RqsGeneratorInput, +# --------------------------------------------------------------------------- +# SIMULATION-RUNNER - horizon handling +# --------------------------------------------------------------------------- + + +def _patch_generator( + monkeypatch: pytest.MonkeyPatch, + gaps: list[float], ) -> None: - """ - Verify that all events whose cumulative inter-arrival times - fall within the simulation horizon are counted. - For gaps [1, 2, 3, 4], cumulative times [1, 3, 6, 10] - yield 4 events by t=10. - """ - def fake_requests_generator_fixed( - data: RqsGeneratorInput, *, rng: np.random.Generator, + """Monkey-patch *requests_generator* with a deterministic gap sequence.""" + + def _fake( + data: RqsGeneratorInput, + config: SimulationSettings, # unused, keeps signature + *, + rng: Generator | None = None, ) -> Iterator[float]: - # Replace the complex Poisson-Poisson sampler with a deterministic sequence. - yield from [1.0, 2.0, 3.0, 4.0] + yield from gaps - # Monkeypatch the internal requests_generator to use our simple generator. monkeypatch.setattr( "app.core.simulation.simulation_run.requests_generator", - fake_requests_generator_fixed, + _fake, ) - # The rng argument is unused in this deterministic test. - rng = np.random.default_rng(42) - output: SimulationOutput = run_simulation(base_input, rng=rng) - assert output.total_requests["total_requests"] == 4 - # The returned metrics should reflect the input means as strings. - assert output.metric_2 == str(base_input.avg_request_per_minute_per_user.mean) - assert output.metric_n == str(base_input.avg_active_users.mean) - - -def test_run_simulation_includes_event_at_exact_horizon( - monkeypatch: pytest.MonkeyPatch, base_input: RqsGeneratorInput, +def test_run_simulation_counts_events_up_to_horizon( + monkeypatch: pytest.MonkeyPatch, + payload_base: SimulationPayload, + rng: Generator, ) -> None: - """ - Confirm that an event scheduled exactly at the simulation horizon - is not processed, since SimPy stops at t == horizon. - """ - def fake_generator_at_horizon( - data: RqsGeneratorInput, *, rng: np.random.Generator, - ) -> Iterator[float]: + """All events with cumulative time ≤ horizon must be counted.""" + _patch_generator(monkeypatch, gaps=[1.0, 2.0, 3.0, 4.0]) - # mypy assertion, pydantic guaranteed - assert base_input.total_simulation_time is not None - # Yield a single event at exactly t == simulation_time. - yield float(base_input.total_simulation_time) + output: SimulationOutput = run_simulation(payload_base, rng=rng) - monkeypatch.setattr( - "app.core.simulation.simulation_run.requests_generator", - fake_generator_at_horizon, + assert output.total_requests["total_requests"] == 4 + assert output.metric_2 == str( + payload_base.rqs_input.avg_request_per_minute_per_user.mean, ) + assert output.metric_n == str(payload_base.rqs_input.avg_active_users.mean) - rng = np.random.default_rng(123) - output: SimulationOutput = run_simulation(base_input, rng=rng) - # SimPy does not execute events scheduled exactly at the stop time. 
+def test_run_simulation_skips_event_at_exact_horizon( + monkeypatch: pytest.MonkeyPatch, + payload_base: SimulationPayload, + rng: Generator, +) -> None: + """An event scheduled exactly at *t == horizon* is ignored.""" + horizon = payload_base.sim_settings.total_simulation_time + _patch_generator(monkeypatch, gaps=[float(horizon)]) + + output: SimulationOutput = run_simulation(payload_base, rng=rng) assert output.total_requests["total_requests"] == 0 def test_run_simulation_excludes_event_beyond_horizon( - monkeypatch: pytest.MonkeyPatch, base_input: RqsGeneratorInput, + monkeypatch: pytest.MonkeyPatch, + payload_base: SimulationPayload, + rng: Generator, ) -> None: - """ - Ensure that events scheduled after the simulation horizon - are not counted. - """ - def fake_generator_beyond_horizon( - data: RqsGeneratorInput, *, rng: np.random.Generator, - ) -> Iterator[float]: - - # mypy assertion, pydantic guaranteed - assert base_input.total_simulation_time is not None - # Yield a single event just beyond the horizon. - yield float(base_input.total_simulation_time) + 0.1 - - monkeypatch.setattr( - "app.core.simulation.simulation_run.requests_generator", - fake_generator_beyond_horizon, - ) - - rng = np.random.default_rng(999) - output: SimulationOutput = run_simulation(base_input, rng=rng) + """Events strictly after the horizon must not be counted.""" + horizon = payload_base.sim_settings.total_simulation_time + _patch_generator(monkeypatch, gaps=[float(horizon) + 0.1]) + output: SimulationOutput = run_simulation(payload_base, rng=rng) assert output.total_requests["total_requests"] == 0 def test_run_simulation_zero_events_when_generator_empty( - monkeypatch: pytest.MonkeyPatch, base_input: RqsGeneratorInput, + monkeypatch: pytest.MonkeyPatch, + payload_base: SimulationPayload, + rng: Generator, ) -> None: - """ - Check that run_simulation reports zero requests when no - inter-arrival times are yielded. - """ - def fake_generator_empty( - data: RqsGeneratorInput, *, rng: np.random.Generator, - ) -> Iterator[float]: - # Empty generator yields nothing. - if False: - yield # pragma: no cover - - monkeypatch.setattr( - "app.core.simulation.simulation_run.requests_generator", - fake_generator_empty, - ) - - rng = np.random.default_rng(2025) - output: SimulationOutput = run_simulation(base_input, rng=rng) + """No gaps => no requests counted.""" + _patch_generator(monkeypatch, gaps=[]) + output: SimulationOutput = run_simulation(payload_base, rng=rng) assert output.total_requests["total_requests"] == 0