41 changes: 41 additions & 0 deletions documentation/FASTSIM_VISION.md
@@ -0,0 +1,41 @@
## 1 Why FastSim?

FastAPI + Uvicorn gives Python teams a lightning-fast async stack, yet sizing it for production still means guesswork, costly cloud load tests, or late surprises. **FastSim** fills that gap by acting as a **digital twin** of your actual service:

* It **replays** your FastAPI + Uvicorn event-loop behavior in SimPy, generating exactly the same kinds of asynchronous steps (parsing, CPU work, I/O, LLM calls) that happen in real code.
* It **models** your infrastructure primitives—CPU cores (via a SimPy `Resource`), database pools, rate-limiters, even GPU inference quotas—so you can see queue lengths, scheduling delays, resource utilization, and end-to-end latency.
* It **outputs** the very metrics you’d scrape in production (p50/p95/p99 latency, ready-queue lag, current & max concurrency, throughput, cost per LLM call), but entirely offline, in seconds.

With FastSim you can ask *“What happens if traffic doubles on Black Friday?”*, *“How many cores do we need to keep p95 < 100 ms?”*, or *“Is our LLM-driven endpoint ready for prime time?”*—and get quantitative answers **before** you deploy.

**Outcome:** data-driven capacity planning, early performance tuning, and far fewer “surprises” once you hit production.

---

## 2 Project Goals

| # | Goal | Practical Outcome |
| - | ------------------------- | ------------------------------------------------------------------------ |
| 1 | **Pre-production sizing** | Know core-count, pool-size, replica-count to hit SLA. |
| 2 | **Scenario lab** | Explore traffic models, endpoint mixes, latency distributions, RTT, etc. |
| 3 | **Twin metrics** | Produce the same metrics you’ll scrape in prod (latency, queue, CPU). |
| 4 | **Rapid iteration** | One YAML/JSON config or REST call → full report. |
| 5 | **Educational value** | Visualise how GIL lag, queue length, concurrency react to load. |

---

## 3 Who benefits & why (detailed)

| Audience | Pain-point solved | FastSim value |
| ------------------------------ | --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Backend engineers** | Unsure if 4 vCPU container survives a marketing spike | Run *what-if* load, tweak CPU cores / pool size, get p95 & max-concurrency before merging. |
| **DevOps / SRE** | Guesswork in capacity planning; cost of over-provisioning | Simulate 1 → N replicas, autoscaler thresholds, DB-pool size; pick the cheapest config meeting SLA. |
| **ML / LLM product teams** | LLM inference cost & latency hard to predict | Model the LLM step with a price + latency distribution; estimate \$/req and GPU batch gains without real GPU. |
| **Educators / Trainers** | Students struggle to “see” event-loop internals | Visualise GIL ready-queue lag, CPU vs I/O steps, effect of blocking code—perfect for live demos and labs. |
| **Consultants / Architects** | Need a quick PoC of new designs for clients | Drop endpoint definitions in YAML and demo throughput / latency under projected load in minutes. |
| **Open-source community** | Lacks a lightweight Python simulator for ASGI workloads | Extensible codebase; easy to plug in new resources (rate-limit, cache) or traffic models (spike, uniform ramp). |
| **System-design interviewees** | Hard to quantify trade-offs in whiteboard interviews | Prototype real-time metrics—queue lengths, concurrency, latency distributions—to demonstrate in interviews how your design scales and where bottlenecks lie. |

---

**Bottom line:** FastSim turns abstract architecture diagrams into concrete numbers—*before* spinning up expensive cloud environments—so you can build, validate, and discuss your designs with full confidence.
281 changes: 281 additions & 0 deletions documentation/backend_documentation/requests_generator.md
@@ -0,0 +1,281 @@
# Requests Generator

This document describes the design of the **requests generator**, which models a stream of user requests to a given endpoint over time.

---

## Model Inputs and Output

Following the FastSim philosophy, we accept a small set of input parameters to drive a “what-if” analysis in a pre-production environment. These inputs let you explore reliability and cost implications under different traffic scenarios.

**Inputs**

1. **Average concurrent users** – expected number of users (or sessions) simultaneously hitting the endpoint.
2. **Average requests per minute per user** – average number of requests each user issues per minute.
3. **Simulation time** – total duration of the simulation, in seconds.

**Output**
A continuous stream of inter-arrival gaps (in seconds) marking individual request arrivals over the simulation horizon.

---

## Model Assumptions

* *Concurrent users* and *requests per minute per user* are **random variables**.
* *Simulation time* is **deterministic**.

We model:

* **Requests per minute per user** as Poisson($\lambda_r$).
* **Concurrent users** as either Poisson($\lambda_u$) or truncated Normal.
* **The variables are independent.**

```python
from pydantic import BaseModel
from typing import Literal


class RVConfig(BaseModel):
    """Configure a random-variable parameter."""

    mean: float
    distribution: Literal["poisson", "normal", "gaussian"] = "poisson"
    variance: float | None = None  # required only for normal/gaussian


class SimulationInput(BaseModel):
    """Define simulation inputs."""

    avg_active_users: RVConfig
    avg_request_per_minute_per_user: RVConfig
    total_simulation_time: int | None = None
    user_sampling_window: int | None = None  # seconds; see §7 (defaults to 60 s)
```

---

## Aggregate Request Rate

From the two random inputs we define the **per-second aggregate rate** $\Lambda$:

$$
\Lambda
= \text{concurrent\_users}
\;\times\;
\frac{\text{requests\_per\_minute\_per\_user}}{60}
\quad[\text{requests/s}].
$$
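
For example, 100 concurrent users each issuing 12 requests per minute give $\Lambda = 100 \times 12 / 60 = 20$ requests/s.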

---

## 1. Poisson → Exponential Refresher

### 1.1 Homogeneous Poisson process

A Poisson process of rate $\lambda$ has

$$
\Pr\{N(t)=k\}
= e^{-\lambda t}\,\frac{(\lambda t)^{k}}{k!},\quad k=0,1,2,\dots
$$

### 1.2 Waiting time to first event

Define $T_1=\inf\{t>0:N(t)=1\}$.
The survival function is

$$
\Pr\{T_1>t\}
= \Pr\{N(t)=0\}
= e^{-\lambda t},
$$

so the CDF is

$$
F_{T_1}(t) = 1 - e^{-\lambda t},\quad t\ge0,
$$

and the density $f(t)=\lambda\,e^{-\lambda t}$. Thus

$$
T_1 \sim \mathrm{Exp}(\lambda),
$$

and by memorylessness every inter-arrival gap $\Delta t_i$ is i.i.d. Exp($\lambda$).

### 1.3 Inverse-CDF sampling

To draw $\Delta t\sim\mathrm{Exp}(\lambda)$:

1. Sample $U\sim\mathcal U(0,1)$.
2. Solve $U = 1 - e^{-\lambda\,\Delta t}$ for $\Delta t$, giving $\Delta t = -\ln(1-U)/\lambda$.
3. Since $1-U$ is itself $\mathcal U(0,1)$, the compact form
   $\displaystyle \Delta t = -\,\ln(U)/\lambda$ is equivalent.
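
A minimal numpy sketch of this draw (standalone; the project wraps the same logic in its own helper generators):

```python
import numpy as np

rng = np.random.default_rng(seed=42)


def exponential_gap(lam: float, rng: np.random.Generator) -> float:
    """Draw an Exp(lam) inter-arrival gap via inverse-CDF sampling."""
    u = rng.random()            # U ~ Uniform[0, 1)
    return -np.log1p(-u) / lam  # Δt = -ln(1 - U) / λ


# Gaps at λ = 5 req/s; the sample mean converges to 1/λ = 0.2 s.
gaps = [exponential_gap(5.0, rng) for _ in range(10_000)]
```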

---

## 2. Poisson × Poisson Workload

### 2.1 Notation

| Symbol | Meaning | Law |
| --------------------------------- | --------------------------------------- | -------- |
| $U\sim\mathrm{Pois}(\lambda_u)$ | active users in current 1-minute window | Poisson |
| $R_i\sim\mathrm{Pois}(\lambda_r)$ | requests per minute by user *i* | Poisson |
| $N=\sum_{i=1}^U R_i$ | total requests in that minute | compound |
| $\Lambda=N/60$ | aggregate rate (requests / second) | compound |

The procedure below relies heavily on the independence of these random variables.

### 2.2 Conditional sum ⇒ Poisson

Given $U=u$:

$$
N\mid U=u
=\sum_{i=1}^{u}R_i
\;\sim\;\mathrm{Pois}(u\,\lambda_r).
$$

### 2.3 Unconditional law of $N$

By the law of total probability:

$$
\Pr\{N=n\}
=\sum_{u=0}^{\infty}
\Pr\{U=u\}\;
\Pr\{N=n\mid U=u\}
\;=\;
e^{-\lambda_u}\,\frac1{n!}
\sum_{u=0}^{\infty}
\frac{\lambda_u^u}{u!}\,
e^{-u\lambda_r}\,(u\lambda_r)^n.
$$

This is the **Poisson–Poisson compound** distribution (also known as the Neyman Type A distribution).
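
As a quick sanity check (a sketch, not project code), one can draw $N$ hierarchically and compare sample moments against the compound formulas $\mathbb E[N]=\lambda_u\lambda_r$ and $\mathrm{Var}[N]=\lambda_u\lambda_r(1+\lambda_r)$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_u, lam_r = 30.0, 4.0  # mean users; mean requests/min per user

# Hierarchical draw: U ~ Pois(λ_u), then N | U = u ~ Pois(u·λ_r)
users = rng.poisson(lam_u, size=100_000)
n = rng.poisson(users * lam_r)

print(n.mean(), lam_u * lam_r)                 # both ≈ 120 req/min
print(n.var(), lam_u * lam_r * (1.0 + lam_r))  # both ≈ 600 (over-dispersed)
```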

---

## 3. Exact Hierarchical Sampler

Rather than invert the discrete CDF above, we exploit the conditional structure:

```python
# Hierarchical sampler sketch: a self-contained version of the project's
# poisson_poisson_sampling generator (which reads the same parameters from
# SimulationInput and shared helper functions).
import math
from collections.abc import Generator

import numpy as np


def compound_poisson_gaps(
    simulation_time: float,             # horizon (s)
    mean_concurrent_user: float,        # λ_u: mean users per window
    mean_req_per_sec_per_user: float,   # λ_r / 60
    sampling_window_s: float = 60.0,    # how long U stays constant
    rng: np.random.Generator | None = None,
) -> Generator[float, None, None]:
    """Yield exponential inter-arrival gaps (s) of the compound process."""
    rng = rng or np.random.default_rng()
    now = 0.0         # virtual clock (s)
    window_end = 0.0  # end of the current user window
    lam = 0.0         # aggregate rate Λ (req/s)

    while now < simulation_time:
        # (Re)sample U at the start of each window
        if now >= window_end:
            window_end = now + float(sampling_window_s)
            users = rng.poisson(mean_concurrent_user)
            lam = users * mean_req_per_sec_per_user

        # No users → fast-forward to next window
        if lam <= 0.0:
            now = window_end
            continue

        # Exponential gap from a protected uniform value
        u_raw = max(rng.random(), 1e-15)
        delta_t = -math.log(1.0 - u_raw) / lam

        # End simulation if the next event exceeds the horizon
        if now + delta_t > simulation_time:
            break

        # If the gap crosses the window boundary, jump to it and redraw;
        # memorylessness of the exponential makes the discard exact
        if now + delta_t >= window_end:
            now = window_end
            continue

        now += delta_t
        yield delta_t
```

Because each conditional step matches the exact Poisson→Exponential law, this two-stage algorithm reproduces the same joint distribution as analytically inverting the compound CDF, but with minimal computation.
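
For instance (a sketch using the function above), a ten-minute run at roughly 100 users issuing 12 requests/min each should emit about 12,000 events:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
gaps = list(compound_poisson_gaps(
    simulation_time=600.0,                   # 10 minutes
    mean_concurrent_user=100.0,              # λ_u
    mean_req_per_sec_per_user=12.0 / 60.0,   # λ_r expressed per second
    rng=rng,
))
print(len(gaps))  # ≈ 600 s × 20 req/s = 12,000 requests on average
```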

---

## 4. Validity of the hierarchical sampler

The validity of the hierarchical sampler relies on a structural property of the model:

$$
N \;=\; \sum_{i=1}^{U} R_i,
$$

where each $R_i \sim \mathrm{Pois}(\lambda_r)$ is independent of the others and of $U$. Because the Poisson family is closed under convolution,

$$
N \,\big|\, U=u \;\sim\; \mathrm{Pois}\!\bigl(u\,\lambda_r\bigr).
$$

This result has two important consequences:

1. **Deterministic conditional rate** – Given $U=u$, the aggregate request arrivals constitute a homogeneous Poisson process with the *deterministic* rate

$$
\Lambda = \frac{u\,\lambda_r}{60}.
$$

All inter-arrival gaps are therefore i.i.d. exponential with parameter $\Lambda$, allowing us to use the standard inverse–CDF formula for each gap.

2. **Layered uncertainty handling** – The randomness associated with $U$ is handled in an outer step (sampling $U$ once per window), while the inner step leverages the well-known Poisson→Exponential correspondence. This two-level construction reproduces exactly the joint distribution obtained by first drawing $\Lambda = N/60$ from the compound Poisson law and then drawing gaps conditional on $\Lambda$.

If the total count could **not** be written as a sum of independent Poisson variables, the conditional distribution of $N$ would no longer be Poisson and the exponential-gap shortcut would not apply. In that situation one would need to work directly with the (generally more complex) mixed distribution of $\Lambda$ or adopt another specialized sampling scheme.



---

## 5. Equivalence to CDF Inversion

By the law of total probability, for any event set $A$:

$$
\Pr\{(\Lambda,\Delta t_1,\dots)\in A\}
=\sum_{u=0}^\infty
\Pr\{U=u\}\;
\Pr\{(\Lambda,\Delta t_1,\dots)\in A\mid U=u\}.
$$

Step 1 samples $U$ from $\Pr\{U=u\}$; steps 2–3 sample the conditional exponential gaps. Because these two factors exactly match the mixture definition of the compound CDF, the hierarchical sampler **is** an exact implementation of two-stage CDF inversion, avoiding any explicit inversion of an infinite series.

---

## 6. Gaussian × Poisson Variant

If concurrent users follow a truncated Normal,

$$
U\sim \max\{0,\;\mathcal N(\mu_u,\sigma_u^2)\},
$$

steps 2–3 remain unchanged; only step 1 draws $U$ from a continuous law. The resulting mixture is continuous, yet the hierarchical sampler remains exact.
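
A sketch of that draw (the project ships a similar `truncated_gaussian_generator` helper; the exact signature here is an assumption):

```python
import numpy as np


def truncated_gaussian_users(mean: float, variance: float,
                             rng: np.random.Generator) -> int:
    """Draw U ~ N(mean, variance), truncated at zero and rounded to whole users."""
    draw = rng.normal(mean, variance ** 0.5)
    return max(0, round(draw))
```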

---

## 7. Time Window

The sampling window length governs how often we re-sample $U$. It should reflect the timescale over which user-count fluctuations become significant. Our default is **60 s** (the current configuration clamps it between 1 s and 120 s), and you can adjust it per simulation via the `user_sampling_window` field.
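
For instance, a run that re-samples $U$ every 30 s might be configured as follows (field names follow the `SimulationInput` model above):

```python
sim_input = SimulationInput(
    avg_active_users=RVConfig(mean=100.0),                # λ_u
    avg_request_per_minute_per_user=RVConfig(mean=12.0),  # λ_r
    total_simulation_time=3_600,                          # 1 h horizon
    user_sampling_window=30,                              # re-sample U every 30 s
)
```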

---

## Limitations of the Requests Model

1. **Independence assumption**
Assumes per-user streams and $U$ are independent. Real traffic often exhibits user-behavior correlations (e.g., flash crowds).

2. **Exponential inter-arrival times**
Implies memorylessness; cannot capture self-throttling or long-range dependence found in real workloads.

3. **No diurnal/trend component**
User count $U$ is i.i.d. per window. To model seasonality or trends, you must vary $\lambda_u(t)$ externally (see the sketch after this list).

4. **No burst-control or rate-limiting**
Does not simulate client-side throttling or server back-pressure. Any rate-limit logic must be added externally.

5. **Gaussian truncation artifacts**
In the Gaussian–Poisson variant, truncating negatives to zero and rounding can underestimate extreme user counts.
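
For limitation 3, one external workaround is to re-compute the window's user mean from a hypothetical diurnal curve each time $U$ is re-sampled; a minimal sketch:

```python
import math


def diurnal_lambda_u(t_s: float, base: float = 50.0, amplitude: float = 0.5) -> float:
    """Hypothetical diurnal user mean: peaks mid-cycle, dips at the start/end."""
    phase = 2.0 * math.pi * (t_s / 86_400.0)  # one cycle per 24 h
    return base * (1.0 - amplitude * math.cos(phase))

# Pass diurnal_lambda_u(now) as the window's user mean instead of a constant λ_u.
```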


**Key takeaway:** By structuring the generator as
$\Lambda = U\,\lambda_r/60$ with a two-stage Poisson→Exponential sampler, FastSim efficiently reproduces compound Poisson traffic dynamics without any complex CDF inversion.
9 changes: 6 additions & 3 deletions src/app/config/constants.py
```diff
@@ -6,6 +6,9 @@
 class TimeDefaults(IntEnum):
     """Default time-related constants (all in seconds)."""

-    MIN_TO_SEC = 60  # 1 minute → 60 s
-    SAMPLING_WINDOW = 60  # keep U(t) constant for 60 s
-    SIMULATION_HORIZON = 3_600  # run 1 h if user gives no other value
+    MIN_TO_SEC = 60  # 1 minute → 60 s
+    USER_SAMPLING_WINDOW = 60  # keep U(t) constant for 60 s, default
+    SIMULATION_TIME = 3_600  # run 1 h if user gives no other value
+    MIN_SIMULATION_TIME = 1800  # min simulation time
+    MIN_USER_SAMPLING_WINDOW = 1  # 1 second
+    MAX_USER_SAMPLING_WINDOW = 120  # 2 minutes
```
5 changes: 3 additions & 2 deletions src/app/core/event_samplers/gaussian_poisson.py
```diff
@@ -22,7 +22,6 @@
 def gaussian_poisson_sampling(
     input_data: SimulationInput,
     *,
-    sampling_window_s: int = TimeDefaults.SAMPLING_WINDOW.value,
     rng: np.random.Generator | None = None,
 ) -> Generator[float, None, None]:
     """
@@ -41,10 +40,12 @@ def gaussian_poisson_sampling(
     rng = rng or np.random.default_rng()

     simulation_time = input_data.total_simulation_time
+    user_sampling_window = input_data.user_sampling_window
     # pydantic in the validation assign a value and mypy is not
     # complaining because a None cannot be compared in the loop
     # to a float
     assert simulation_time is not None
+    assert user_sampling_window is not None

     # λ_u : mean concurrent users per window
     mean_concurrent_user = float(input_data.avg_active_users.mean)
@@ -68,7 +69,7 @@ def gaussian_poisson_sampling(
     while now < simulation_time:
         # (Re)sample U at the start of each window
         if now >= window_end:
-            window_end = now + float(sampling_window_s)
+            window_end = now + float(user_sampling_window)
             users = truncated_gaussian_generator(
                 mean_concurrent_user,
                 variance_concurrent_user,
```
5 changes: 3 additions & 2 deletions src/app/core/event_samplers/poisson_poisson.py
```diff
@@ -19,7 +19,6 @@
 def poisson_poisson_sampling(
     input_data: SimulationInput,
     *,
-    sampling_window_s: int = TimeDefaults.SAMPLING_WINDOW.value,
     rng: np.random.Generator | None = None,
 ) -> Generator[float, None, None]:
     """
@@ -38,10 +37,12 @@ def poisson_poisson_sampling(
     rng = rng or np.random.default_rng()

     simulation_time = input_data.total_simulation_time
+    user_sampling_window = input_data.user_sampling_window
     # pydantic in the validation assign a value and mypy is not
     # complaining because a None cannot be compared in the loop
     # to a float
     assert simulation_time is not None
+    assert user_sampling_window is not None

     # λ_u : mean concurrent users per window
     mean_concurrent_user = float(input_data.avg_active_users.mean)
@@ -60,7 +61,7 @@ def poisson_poisson_sampling(
     while now < simulation_time:
         # (Re)sample U at the start of each window
         if now >= window_end:
-            window_end = now + float(sampling_window_s)
+            window_end = now + float(user_sampling_window)
             users = poisson_variable_generator(mean_concurrent_user, rng)
             lam = users * mean_req_per_sec_per_user
```