Add SimulatedLLM and ResilientExecutor#448
Add SimulatedLLM and ResilientExecutor#448LorenzaVolponi wants to merge 4 commits intoSynkraAI:mainfrom
Conversation
Implementa uma infraestrutura de testes com LLM simulado e executor resiliente.
|
@LorenzaVolponi is attempting to deploy a commit to the Pedro Valério Lopez's projects Team on Vercel. A member of the Team first needs to authorize it. |
WalkthroughAdds a new GitHub Actions CI workflow and a TypeScript testing module that provides a SimulatedLLM for deterministic mock responses and a ResilientExecutor that retries parsing LLM outputs with configurable behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Test as TestHarness
participant Exec as ResilientExecutor
participant LLM as SimulatedLLM
participant Parser as Parser
Test->>Exec: execute(llmCall)
Exec->>LLM: call llmCall()
LLM-->>Exec: return rawOutput (string)
Exec->>Parser: parse(rawOutput)
alt parse succeeds
Parser-->>Exec: parsedResult
Exec-->>Test: return parsedResult
else parse fails
Parser-->>Exec: throw ParseError
Exec->>Exec: retry up to maxRetries
alt retries exhausted
Exec-->>Test: throw FinalError (last parse error)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
testing-infrastructure.ts (1)
13-13: Remove unusedEventEmitterimport.
EventEmitteris imported but never referenced anywhere in this file.🗑️ Proposed fix
-import { EventEmitter } from 'events';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing-infrastructure.ts` at line 13, The file imports EventEmitter from 'events' but never uses it; remove the unused import statement (the "EventEmitter" named import) so the top-level import line no longer includes EventEmitter and avoid introducing any unused symbols in functions like any testing helpers or setup functions in this module..github/workflows/tests.yml (1)
3-7: Add aconcurrencygroup to cancel stale runs.Without
concurrency, rapid pushes tomainwill queue up parallel runs rather than cancelling superseded ones. This wastes CI minutes.⚙️ Suggested addition
on: push: branches: [ "main" ] pull_request: branches: [ "main" ] +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + jobs:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/tests.yml around lines 3 - 7, Add a top-level concurrency block to the workflow to cancel stale runs: under the existing on: block (the workflow triggered for push/pull_request to main) add a concurrency stanza that sets a stable group (e.g., using github.workflow and github.ref or github.ref_name) and cancel-in-progress: true so new pushes to main cancel previous CI runs; update the YAML where the triggers (push/pull_request branches: ["main"]) are defined to include this concurrency configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/tests.yml:
- Around line 13-15: Update the CI matrix under the strategy -> matrix ->
node-version entry to remove the deprecated "18.x" and add the current LTS
"22.x" so the array becomes ["20.x", "22.x"]; ensure the node-version list in
.github/workflows/tests.yml reflects this change (modify the node-version key in
the existing strategy/matrix block).
- Around line 9-11: The workflow job "build-and-test" currently relies on the
default GITHUB_TOKEN permissions; add an explicit minimal permissions block to
the workflow to restrict the token (e.g., set contents: read and any other
narrowly required read scopes like packages: read) under the job or top-level
workflow so the job can read the repo to run tests but cannot write; update the
".github/workflows/tests.yml" workflow by adding a permissions: section scoped
to the build-and-test job (or top-level) with only the necessary read
permissions.
In `@testing-infrastructure.ts`:
- Line 27: Replace the lax any types in SimulatedLLM by declaring a Message
interface (with properties role: string and content: string), changing the
callHistory field from any[] to Message[][], annotating the messages parameter
on createChatCompletion as Message[] and replacing the method's return type with
a typed shape matching the OpenAI completion response (e.g., an object with
choices array where each choice contains a message with role and content and an
optional finish_reason), then update any internal uses in SimulatedLLM
(callHistory push/read and the created completion object) to conform to these
new types.
- Around line 74-77: Change the loose any types to safe, narrowed types: in the
ExecutorConfig interface update parser to return unknown (parser: (output:
string) => unknown); make ResilientExecutor generic (class ResilientExecutor<T>)
and change execute to return Promise<T> by having execute call parser and
cast/return the narrowed T; and change the catch clause from catch (error: any)
to catch (error: unknown) then narrow it with instanceof Error (or extract
message safely) before logging/using error. Ensure all references to parser and
execute use the new generic T so callers must narrow the unknown parser result.
- Around line 56-64: The current fallback assumes messages has at least one item
and does messages[messages.length - 1].content, which will throw when messages
is empty; update the fallback in the createChatCompletion/testing-infrastructure
logic to guard against an empty messages array by computing lastUserMsg =
messages.length ? messages[messages.length - 1].content : '<no messages>' (or
''), then return the simulated response using that safe lastUserMsg so accessing
.content never throws.
- Around line 100-118: The execute() loop currently retries for any error
because a single try/catch wraps both the llmCall() and parser(); change it so
llmCall() is awaited inside its own try/catch (or before the parser try) and
immediately rethrow or propagate any errors from llmCall() (network/auth/etc.)
so they are not counted against config.maxRetries, while keeping the retry logic
only around parser(output) parse failures (increment attempts, set lastError,
warn and continue). Update references in the method: split handling around
llmCall() and this.config.parser(), ensure attempts/lastError/config.maxRetries
logic remains only in the parser-failure branch, and return parsed on success.
---
Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 3-7: Add a top-level concurrency block to the workflow to cancel
stale runs: under the existing on: block (the workflow triggered for
push/pull_request to main) add a concurrency stanza that sets a stable group
(e.g., using github.workflow and github.ref or github.ref_name) and
cancel-in-progress: true so new pushes to main cancel previous CI runs; update
the YAML where the triggers (push/pull_request branches: ["main"]) are defined
to include this concurrency configuration.
In `@testing-infrastructure.ts`:
- Line 13: The file imports EventEmitter from 'events' but never uses it; remove
the unused import statement (the "EventEmitter" named import) so the top-level
import line no longer includes EventEmitter and avoid introducing any unused
symbols in functions like any testing helpers or setup functions in this module.
Updated Node.js versions in the workflow and added permissions.
Refactor test infrastructure to improve resilience and type safety.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
testing-infrastructure.ts (1)
50-57:!messagesnull-guard is redundant.TypeScript types
messagesasChatCompletionMessage[](non-nullable), so!messagescan never betrue. Themessages.length === 0check is the only meaningful condition.♻️ Proposed refactor
- if (!messages || messages.length === 0) { + if (messages.length === 0) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing-infrastructure.ts` around lines 50 - 57, The null-guard on the messages variable is redundant because messages is typed as ChatCompletionMessage[] (non-nullable); remove the `!messages ||` portion and only check `messages.length === 0` in the early-return block (the relevant symbol is the messages parameter/variable and the existing early-return that returns choices with the simulated error message)..github/workflows/tests.yml (1)
12-14: Consider addingtimeout-minutesto prevent runaway jobs.Without a timeout, a hung test suite will occupy a runner for the 6-hour GitHub Actions default. Also, since
fail-fastdefaults totrue, a failure on20.xsilently cancels the22.xrun, hiding compatibility differences.⚙️ Proposed additions
build-and-test: runs-on: ubuntu-latest + timeout-minutes: 15 strategy: + fail-fast: false matrix: node-version: [20.x, 22.x]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/tests.yml around lines 12 - 14, Add a job-level timeout and disable matrix fail-fast for the build-and-test job: in the build-and-test job (job name "build-and-test") add a timeout-minutes value (e.g., 30) to prevent hung runners, and if you use a matrix include a strategy block with fail-fast: false to ensure all matrix variants run (so failures on one node like 20.x don’t cancel others like 22.x).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@testing-infrastructure.ts`:
- Around line 114-127: Replace the Portuguese log/error strings with English
equivalents in the ResilientExecutor parsing/retry flow: change the console.warn
that currently logs "Erro de parsing na tentativa ${attempts}. Retentando..." to
an English message (e.g., "Parsing error on attempt ${attempts}. Retrying..."),
change the console.error that logs "Erro crítico na chamada LLM. Propagando." to
an English message (e.g., "Critical LLM call error. Propagating."), and change
the thrown Error message that uses "Execução falhou após
${this.config.maxRetries} tentativas de parsing. Último erro:
${lastError?.message}" to an English message (e.g., "Execution failed after
${this.config.maxRetries} parsing attempts. Last error: ${lastError?.message}"),
ensuring you keep the same interpolated variables (attempts,
this.config.maxRetries, lastError?.message) and the same control flow in the
method where these logs/throw occur.
- Around line 24-28: The return type currently drops function_call because
ChatCompletionResponse.choices[n].message is typed as ChatCompletionMessage
(which lacks function_call) and the code casts mockResp to
ChatCompletionMessage, losing that field; also mockResp comes from an
array.shift() so it's MockResponse | undefined and the cast masks null-safety.
Fix by extending the declared message type to include function_call (or change
ChatCompletionResponse.choices.message to the union/MockResponse shape that
includes function_call), remove the unsafe cast (mockResp as
ChatCompletionMessage) and instead use a non-null assertion on the shift result
(e.g., mockResp! assigned to the correctly-typed message), referencing the types
ChatCompletionResponse, ChatCompletionMessage, MockResponse and the variable
mockResp to locate the changes.
---
Duplicate comments:
In @.github/workflows/tests.yml:
- Around line 9-10: The workflow permissions block correctly sets permissions:
contents: read to limit GITHUB_TOKEN scope; no code changes are needed—approve
and merge the PR as-is (confirm the permissions: contents: read entry is present
in .github/workflows/tests.yml before approving).
- Around line 16-18: No change required: the GitHub Actions matrix in the
workflow has been correctly updated in the node-version key to [20.x, 22.x]
(removing Node 18 and adding Node 22); leave the strategy.matrix.node-version
(node-version) entry as-is in .github/workflows/tests.yml.
---
Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 12-14: Add a job-level timeout and disable matrix fail-fast for
the build-and-test job: in the build-and-test job (job name "build-and-test")
add a timeout-minutes value (e.g., 30) to prevent hung runners, and if you use a
matrix include a strategy block with fail-fast: false to ensure all matrix
variants run (so failures on one node like 20.x don’t cancel others like 22.x).
In `@testing-infrastructure.ts`:
- Around line 50-57: The null-guard on the messages variable is redundant
because messages is typed as ChatCompletionMessage[] (non-nullable); remove the
`!messages ||` portion and only check `messages.length === 0` in the
early-return block (the relevant symbol is the messages parameter/variable and
the existing early-return that returns choices with the simulated error
message).
nikolasdehor
left a comment
There was a problem hiding this comment.
Interesting concept, @LorenzaVolponi! The SimulatedLLM and ResilientExecutor ideas are solid. A few technical concerns:
Positives:
- Clean separation: SimulatedLLM for deterministic testing, ResilientExecutor for retry-on-parse-only
- Proper error discrimination (network errors propagate immediately, parse errors retry)
- Good TypeScript typing with explicit interfaces
Issues to address:
-
Unused import —
EventEmitteris imported (line 8) but never used. Should be removed. -
Duplicate CI workflow — The repo already has
.github/workflows/ci.ymland.github/workflows/test.yml. Addingtests.ymlcreates a third overlapping workflow that runs on the same triggers (push/pull_requesttomain). This would cause duplicate CI runs on every PR. I'd suggest removing the workflow file and letting the existing CI handle it. -
File placement —
testing-infrastructure.tsis at the repo root. AIOS follows a structured layout:- Test utilities →
tests/directory - Shared packages →
packages/ - Core modules →
.aios-core/core/
Consider moving to
tests/utils/testing-infrastructure.tsorpackages/testing-infrastructure/. - Test utilities →
-
Unsafe cast (line 48) —
mockResp as ChatCompletionMessageis an unsafe downcast.MockResponsehasrole: 'assistant'(literal) and optionalfunction_call, whileChatCompletionMessagehasrole: 'system' | 'user' | 'assistant' | 'function'. The cast works at runtime but masks the type mismatch. Consider using a union type or a proper adapter. -
outputvariable scope (line 98) —let output = ""is declared but never read after assignment (thereturn parsedexits beforeoutputis used outside the try block). Either remove it or use it in the final error message for debugging context. -
Missing
concurrencyin workflow — Withoutconcurrencysettings, multiple runs of the same PR won't cancel previous runs, wasting CI minutes. -
Missing
timeout-minutes— No timeout on the job means it could hang indefinitely.
The core classes are well designed — if the CI workflow is removed and the file is moved to the proper location, this would be a clean contribution.
|
Tradução para pt-BR do review acima: Conceito interessante, @LorenzaVolponi! O SimulatedLLM e ResilientExecutor são boas ideias. Algumas questões técnicas: Pontos positivos:
Problemas a resolver:
As classes em si são bem projetadas — se o workflow CI for removido e o arquivo for movido para o local correto, seria uma contribuição limpa. |
Re-review: follow-up (28/02)Oi @LorenzaVolponi, passando pra dar um follow-up na review de 4 dias atras. Nao vi commits novos nem resposta aos 7 pontos que levantei. Entendo que todo mundo tem seu tempo, mas queria reforcar que as classes Status dos 7 pontos (todos em aberto)
O que eu sugiro como caminho mais rapidoO fix mais impactante e remover o arquivo Se precisar de ajuda ou tiver duvida sobre algum ponto, e so falar! As classes em si estao solidas, so precisa desses ajustes estruturais. |
|
Opa @nikolasdehor, obrigado pelo follow-up, e peço desculpas pela demora em retorno, imprevistos técnicos ! Criei uma branch limpa (patch-4) para resolver todos os 7 pontos da sua review de uma vez: Mudanças: CI Workflow (Points 2, 6, 7): Removed .github/workflows/tests.yml to avoid CI duplication. Obrigado pela paciência e pelo review detalhado! Pronto para the merge. 🚀 |
🎯 Motivation
Looking at the open issues (specifically regarding unit tests), I noticed that testing agent logic currently depends on external API calls (latency, cost, flakiness). Additionally, agents can crash unexpectedly due to malformed JSON responses or parsing errors.
🛠 Solution
This PR introduces two critical components to stabilize the core:
SimulatedLLM: A mock class that mimics the LLM provider interface.
Allows unit tests to run deterministically and offline.
Eliminates API costs in CI/CD pipelines.
Enables testing of complex agent flows without dotenv or keys.
ResilientExecutor: A wrapper for agent execution loops.
Catches parsing errors (broken JSON) automatically.
Implements a retry mechanism to prevent the main process from crashing on temporary glitches.
📈 Impact
Speeds up development: Contributors can write tests without API keys.
Increases reliability: Agents become "crash-resistant" to malformed outputs.
Closes Gaps: Directly supports the goal of increasing test coverage mentioned in the Issues.
✅ How to Test
Check the SimulatedLLM implementation.
Run the example usage in a test environment (no keys required).
Verify that ResilientExecutor catches errors and retries logic.
Summary by CodeRabbit