Skip to content

Add SimulatedLLM and ResilientExecutor#448

Open
LorenzaVolponi wants to merge 4 commits intoSynkraAI:mainfrom
LorenzaVolponi:main
Open

Add SimulatedLLM and ResilientExecutor#448
LorenzaVolponi wants to merge 4 commits intoSynkraAI:mainfrom
LorenzaVolponi:main

Conversation

@LorenzaVolponi
Copy link

@LorenzaVolponi LorenzaVolponi commented Feb 20, 2026

🎯 Motivation
Looking at the open issues (specifically regarding unit tests), I noticed that testing agent logic currently depends on external API calls (latency, cost, flakiness). Additionally, agents can crash unexpectedly due to malformed JSON responses or parsing errors.

🛠 Solution
This PR introduces two critical components to stabilize the core:

SimulatedLLM: A mock class that mimics the LLM provider interface.
Allows unit tests to run deterministically and offline.
Eliminates API costs in CI/CD pipelines.
Enables testing of complex agent flows without dotenv or keys.
ResilientExecutor: A wrapper for agent execution loops.
Catches parsing errors (broken JSON) automatically.
Implements a retry mechanism to prevent the main process from crashing on temporary glitches.

📈 Impact
Speeds up development: Contributors can write tests without API keys.
Increases reliability: Agents become "crash-resistant" to malformed outputs.
Closes Gaps: Directly supports the goal of increasing test coverage mentioned in the Issues.

✅ How to Test
Check the SimulatedLLM implementation.
Run the example usage in a test environment (no keys required).
Verify that ResilientExecutor catches errors and retries logic.

Summary by CodeRabbit

  • Chores
    • Added an automated CI workflow to run on pushes and pull requests to main.
    • Updated CI to test on newer Node.js versions and improved npm caching and install steps.
  • Tests
    • Added deterministic, resilient testing utilities that enable repeatable, cost-free test runs with retryable parsing and explicit error handling.

Implementa uma infraestrutura de testes com LLM simulado e executor resiliente.
@vercel
Copy link

vercel bot commented Feb 20, 2026

@LorenzaVolponi is attempting to deploy a commit to the Pedro Valério Lopez's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

Walkthrough

Adds a new GitHub Actions CI workflow and a TypeScript testing module that provides a SimulatedLLM for deterministic mock responses and a ResilientExecutor that retries parsing LLM outputs with configurable behavior.

Changes

Cohort / File(s) Summary
CI/CD Workflow
.github/workflows/tests.yml
Adds "AIOS Core - CI Pipeline" workflow triggered on push/PR to main; runs job build-and-test on ubuntu-latest with Node.js matrix [20.x, 22.x] (18.x removed), checks out code, sets up Node with npm caching, runs npm ci, optional linter, and npm test with CI=true.
Testing Infrastructure
testing-infrastructure.ts
New TypeScript module introducing SimulatedLLM (enqueue scripted responses, record call history, deterministic fallback) and ResilientExecutor (configurable maxRetries, parser defaulting to JSON.parse, retries parse failures and surfaces network errors).

Sequence Diagram(s)

sequenceDiagram
    participant Test as TestHarness
    participant Exec as ResilientExecutor
    participant LLM as SimulatedLLM
    participant Parser as Parser

    Test->>Exec: execute(llmCall)
    Exec->>LLM: call llmCall()
    LLM-->>Exec: return rawOutput (string)
    Exec->>Parser: parse(rawOutput)
    alt parse succeeds
        Parser-->>Exec: parsedResult
        Exec-->>Test: return parsedResult
    else parse fails
        Parser-->>Exec: throw ParseError
        Exec->>Exec: retry up to maxRetries
        alt retries exhausted
            Exec-->>Test: throw FinalError (last parse error)
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • SynkraAI/aios-core#70 — Modifies repository CI workflow and test job configuration; closely related to the new/updated GitHub Actions workflow in this PR.

Suggested labels

core, agents, tests, ci/cd, tools, workflows

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and clearly describes the main additions to the codebase: SimulatedLLM and ResilientExecutor classes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
testing-infrastructure.ts (1)

13-13: Remove unused EventEmitter import.

EventEmitter is imported but never referenced anywhere in this file.

🗑️ Proposed fix
-import { EventEmitter } from 'events';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testing-infrastructure.ts` at line 13, The file imports EventEmitter from
'events' but never uses it; remove the unused import statement (the
"EventEmitter" named import) so the top-level import line no longer includes
EventEmitter and avoid introducing any unused symbols in functions like any
testing helpers or setup functions in this module.
.github/workflows/tests.yml (1)

3-7: Add a concurrency group to cancel stale runs.

Without concurrency, rapid pushes to main will queue up parallel runs rather than cancelling superseded ones. This wastes CI minutes.

⚙️ Suggested addition
 on:
   push:
     branches: [ "main" ]
   pull_request:
     branches: [ "main" ]
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/tests.yml around lines 3 - 7, Add a top-level concurrency
block to the workflow to cancel stale runs: under the existing on: block (the
workflow triggered for push/pull_request to main) add a concurrency stanza that
sets a stable group (e.g., using github.workflow and github.ref or
github.ref_name) and cancel-in-progress: true so new pushes to main cancel
previous CI runs; update the YAML where the triggers (push/pull_request
branches: ["main"]) are defined to include this concurrency configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/tests.yml:
- Around line 13-15: Update the CI matrix under the strategy -> matrix ->
node-version entry to remove the deprecated "18.x" and add the current LTS
"22.x" so the array becomes ["20.x", "22.x"]; ensure the node-version list in
.github/workflows/tests.yml reflects this change (modify the node-version key in
the existing strategy/matrix block).
- Around line 9-11: The workflow job "build-and-test" currently relies on the
default GITHUB_TOKEN permissions; add an explicit minimal permissions block to
the workflow to restrict the token (e.g., set contents: read and any other
narrowly required read scopes like packages: read) under the job or top-level
workflow so the job can read the repo to run tests but cannot write; update the
".github/workflows/tests.yml" workflow by adding a permissions: section scoped
to the build-and-test job (or top-level) with only the necessary read
permissions.

In `@testing-infrastructure.ts`:
- Line 27: Replace the lax any types in SimulatedLLM by declaring a Message
interface (with properties role: string and content: string), changing the
callHistory field from any[] to Message[][], annotating the messages parameter
on createChatCompletion as Message[] and replacing the method's return type with
a typed shape matching the OpenAI completion response (e.g., an object with
choices array where each choice contains a message with role and content and an
optional finish_reason), then update any internal uses in SimulatedLLM
(callHistory push/read and the created completion object) to conform to these
new types.
- Around line 74-77: Change the loose any types to safe, narrowed types: in the
ExecutorConfig interface update parser to return unknown (parser: (output:
string) => unknown); make ResilientExecutor generic (class ResilientExecutor<T>)
and change execute to return Promise<T> by having execute call parser and
cast/return the narrowed T; and change the catch clause from catch (error: any)
to catch (error: unknown) then narrow it with instanceof Error (or extract
message safely) before logging/using error. Ensure all references to parser and
execute use the new generic T so callers must narrow the unknown parser result.
- Around line 56-64: The current fallback assumes messages has at least one item
and does messages[messages.length - 1].content, which will throw when messages
is empty; update the fallback in the createChatCompletion/testing-infrastructure
logic to guard against an empty messages array by computing lastUserMsg =
messages.length ? messages[messages.length - 1].content : '<no messages>' (or
''), then return the simulated response using that safe lastUserMsg so accessing
.content never throws.
- Around line 100-118: The execute() loop currently retries for any error
because a single try/catch wraps both the llmCall() and parser(); change it so
llmCall() is awaited inside its own try/catch (or before the parser try) and
immediately rethrow or propagate any errors from llmCall() (network/auth/etc.)
so they are not counted against config.maxRetries, while keeping the retry logic
only around parser(output) parse failures (increment attempts, set lastError,
warn and continue). Update references in the method: split handling around
llmCall() and this.config.parser(), ensure attempts/lastError/config.maxRetries
logic remains only in the parser-failure branch, and return parsed on success.

---

Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 3-7: Add a top-level concurrency block to the workflow to cancel
stale runs: under the existing on: block (the workflow triggered for
push/pull_request to main) add a concurrency stanza that sets a stable group
(e.g., using github.workflow and github.ref or github.ref_name) and
cancel-in-progress: true so new pushes to main cancel previous CI runs; update
the YAML where the triggers (push/pull_request branches: ["main"]) are defined
to include this concurrency configuration.

In `@testing-infrastructure.ts`:
- Line 13: The file imports EventEmitter from 'events' but never uses it; remove
the unused import statement (the "EventEmitter" named import) so the top-level
import line no longer includes EventEmitter and avoid introducing any unused
symbols in functions like any testing helpers or setup functions in this module.

Updated Node.js versions in the workflow and added permissions.
Refactor test infrastructure to improve resilience and type safety.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
testing-infrastructure.ts (1)

50-57: !messages null-guard is redundant.

TypeScript types messages as ChatCompletionMessage[] (non-nullable), so !messages can never be true. The messages.length === 0 check is the only meaningful condition.

♻️ Proposed refactor
-    if (!messages || messages.length === 0) {
+    if (messages.length === 0) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testing-infrastructure.ts` around lines 50 - 57, The null-guard on the
messages variable is redundant because messages is typed as
ChatCompletionMessage[] (non-nullable); remove the `!messages ||` portion and
only check `messages.length === 0` in the early-return block (the relevant
symbol is the messages parameter/variable and the existing early-return that
returns choices with the simulated error message).
.github/workflows/tests.yml (1)

12-14: Consider adding timeout-minutes to prevent runaway jobs.

Without a timeout, a hung test suite will occupy a runner for the 6-hour GitHub Actions default. Also, since fail-fast defaults to true, a failure on 20.x silently cancels the 22.x run, hiding compatibility differences.

⚙️ Proposed additions
   build-and-test:
     runs-on: ubuntu-latest
+    timeout-minutes: 15
 
     strategy:
+      fail-fast: false
       matrix:
         node-version: [20.x, 22.x]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/tests.yml around lines 12 - 14, Add a job-level timeout
and disable matrix fail-fast for the build-and-test job: in the build-and-test
job (job name "build-and-test") add a timeout-minutes value (e.g., 30) to
prevent hung runners, and if you use a matrix include a strategy block with
fail-fast: false to ensure all matrix variants run (so failures on one node like
20.x don’t cancel others like 22.x).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@testing-infrastructure.ts`:
- Around line 114-127: Replace the Portuguese log/error strings with English
equivalents in the ResilientExecutor parsing/retry flow: change the console.warn
that currently logs "Erro de parsing na tentativa ${attempts}. Retentando..." to
an English message (e.g., "Parsing error on attempt ${attempts}. Retrying..."),
change the console.error that logs "Erro crítico na chamada LLM. Propagando." to
an English message (e.g., "Critical LLM call error. Propagating."), and change
the thrown Error message that uses "Execução falhou após
${this.config.maxRetries} tentativas de parsing. Último erro:
${lastError?.message}" to an English message (e.g., "Execution failed after
${this.config.maxRetries} parsing attempts. Last error: ${lastError?.message}"),
ensuring you keep the same interpolated variables (attempts,
this.config.maxRetries, lastError?.message) and the same control flow in the
method where these logs/throw occur.
- Around line 24-28: The return type currently drops function_call because
ChatCompletionResponse.choices[n].message is typed as ChatCompletionMessage
(which lacks function_call) and the code casts mockResp to
ChatCompletionMessage, losing that field; also mockResp comes from an
array.shift() so it's MockResponse | undefined and the cast masks null-safety.
Fix by extending the declared message type to include function_call (or change
ChatCompletionResponse.choices.message to the union/MockResponse shape that
includes function_call), remove the unsafe cast (mockResp as
ChatCompletionMessage) and instead use a non-null assertion on the shift result
(e.g., mockResp! assigned to the correctly-typed message), referencing the types
ChatCompletionResponse, ChatCompletionMessage, MockResponse and the variable
mockResp to locate the changes.

---

Duplicate comments:
In @.github/workflows/tests.yml:
- Around line 9-10: The workflow permissions block correctly sets permissions:
contents: read to limit GITHUB_TOKEN scope; no code changes are needed—approve
and merge the PR as-is (confirm the permissions: contents: read entry is present
in .github/workflows/tests.yml before approving).
- Around line 16-18: No change required: the GitHub Actions matrix in the
workflow has been correctly updated in the node-version key to [20.x, 22.x]
(removing Node 18 and adding Node 22); leave the strategy.matrix.node-version
(node-version) entry as-is in .github/workflows/tests.yml.

---

Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 12-14: Add a job-level timeout and disable matrix fail-fast for
the build-and-test job: in the build-and-test job (job name "build-and-test")
add a timeout-minutes value (e.g., 30) to prevent hung runners, and if you use a
matrix include a strategy block with fail-fast: false to ensure all matrix
variants run (so failures on one node like 20.x don’t cancel others like 22.x).

In `@testing-infrastructure.ts`:
- Around line 50-57: The null-guard on the messages variable is redundant
because messages is typed as ChatCompletionMessage[] (non-nullable); remove the
`!messages ||` portion and only check `messages.length === 0` in the
early-return block (the relevant symbol is the messages parameter/variable and
the existing early-return that returns choices with the simulated error
message).

Copy link
Contributor

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting concept, @LorenzaVolponi! The SimulatedLLM and ResilientExecutor ideas are solid. A few technical concerns:

Positives:

  • Clean separation: SimulatedLLM for deterministic testing, ResilientExecutor for retry-on-parse-only
  • Proper error discrimination (network errors propagate immediately, parse errors retry)
  • Good TypeScript typing with explicit interfaces

Issues to address:

  1. Unused importEventEmitter is imported (line 8) but never used. Should be removed.

  2. Duplicate CI workflow — The repo already has .github/workflows/ci.yml and .github/workflows/test.yml. Adding tests.yml creates a third overlapping workflow that runs on the same triggers (push/pull_request to main). This would cause duplicate CI runs on every PR. I'd suggest removing the workflow file and letting the existing CI handle it.

  3. File placementtesting-infrastructure.ts is at the repo root. AIOS follows a structured layout:

    • Test utilities → tests/ directory
    • Shared packages → packages/
    • Core modules → .aios-core/core/

    Consider moving to tests/utils/testing-infrastructure.ts or packages/testing-infrastructure/.

  4. Unsafe cast (line 48) — mockResp as ChatCompletionMessage is an unsafe downcast. MockResponse has role: 'assistant' (literal) and optional function_call, while ChatCompletionMessage has role: 'system' | 'user' | 'assistant' | 'function'. The cast works at runtime but masks the type mismatch. Consider using a union type or a proper adapter.

  5. output variable scope (line 98) — let output = "" is declared but never read after assignment (the return parsed exits before output is used outside the try block). Either remove it or use it in the final error message for debugging context.

  6. Missing concurrency in workflow — Without concurrency settings, multiple runs of the same PR won't cancel previous runs, wasting CI minutes.

  7. Missing timeout-minutes — No timeout on the job means it could hang indefinitely.

The core classes are well designed — if the CI workflow is removed and the file is moved to the proper location, this would be a clean contribution.

@nikolasdehor
Copy link
Contributor

Tradução para pt-BR do review acima:

Conceito interessante, @LorenzaVolponi! O SimulatedLLM e ResilientExecutor são boas ideias. Algumas questões técnicas:

Pontos positivos:

  • Separação limpa: SimulatedLLM para testes determinísticos, ResilientExecutor com retry apenas em parse
  • Discriminação correta de erros (rede propaga imediatamente, parse faz retry)

Problemas a resolver:

  1. Import não utilizadoEventEmitter importado mas nunca usado (linha 8).

  2. Workflow CI duplicado — O repo já tem ci.yml e test.yml. Adicionar tests.yml cria um terceiro workflow que roda nos mesmos triggers. Isso causa runs duplicados em todo PR. Sugiro remover o arquivo de workflow.

  3. Localização do arquivotesting-infrastructure.ts está na raiz do repo. O AIOS segue um layout estruturado:

    • Utilitários de teste → diretório tests/
    • Pacotes compartilhados → packages/
      Considerar mover para tests/utils/testing-infrastructure.ts
  4. Cast inseguro (linha 48) — mockResp as ChatCompletionMessage é um downcast inseguro.

  5. Variável output sem uso (linha 98) — let output = "" é declarada mas nunca lida fora do bloco try.

  6. Falta concurrency no workflow — Sem configuração de concurrency, múltiplas execuções do mesmo PR não cancelam as anteriores.

As classes em si são bem projetadas — se o workflow CI for removido e o arquivo for movido para o local correto, seria uma contribuição limpa.

@nikolasdehor
Copy link
Contributor

Re-review: follow-up (28/02)

Oi @LorenzaVolponi, passando pra dar um follow-up na review de 4 dias atras.

Nao vi commits novos nem resposta aos 7 pontos que levantei. Entendo que todo mundo tem seu tempo, mas queria reforcar que as classes SimulatedLLM e ResilientExecutor em si sao bem projetadas — o problema esta na organizacao e nos detalhes de CI.

Status dos 7 pontos (todos em aberto)

# Item Status
1 Import EventEmitter nao utilizado Pendente
2 Workflow CI duplicado (tests.yml vs ci.yml/test.yml existentes) Pendente
3 testing-infrastructure.ts na raiz ao inves de tests/utils/ ou packages/ Pendente
4 Cast inseguro mockResp as ChatCompletionMessage (linha 48) Pendente
5 Variavel output declarada mas nunca usada (linha 98) Pendente
6 Falta concurrency no workflow Pendente
7 Falta timeout-minutes no job Pendente

O que eu sugiro como caminho mais rapido

O fix mais impactante e remover o arquivo .github/workflows/tests.yml inteiro (itens 2, 6, 7 resolvidos de uma vez) — o repo ja tem CI configurado. Depois, mover o arquivo pra tests/utils/testing-infrastructure.ts (item 3), e limpar o codigo (itens 1, 4, 5). Sao mudancas pequenas que desbloqueiam o merge.

Se precisar de ajuda ou tiver duvida sobre algum ponto, e so falar! As classes em si estao solidas, so precisa desses ajustes estruturais.

@LorenzaVolponi
Copy link
Author

Opa @nikolasdehor, obrigado pelo follow-up, e peço desculpas pela demora em retorno, imprevistos técnicos !

Criei uma branch limpa (patch-4) para resolver todos os 7 pontos da sua review de uma vez:
fix: move infra to tests/utils and resolve all review points

Mudanças:

CI Workflow (Points 2, 6, 7): Removed .github/workflows/tests.yml to avoid CI duplication.
File Location (Point 3): Arquivo movido testing-infrastructure.ts.
Code Cleanups (Points 1, 4, 5):
Removed unused EventEmitter import.
Fixed unsafe cast using Non-null assertion (!) and strict typing (MockResponse extends ChatCompletionMessage).
Removed unused output variable and adjusted scope.
Improved error discrimination in ResilientExecutor (network errors propagate immediately).

Obrigado pela paciência e pelo review detalhado! Pronto para the merge. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants