Skip to content

Latest commit

 

History

History
83 lines (62 loc) · 2.98 KB

File metadata and controls

83 lines (62 loc) · 2.98 KB

Agentic Sandbox

You are building an AI coding agent that calls third-party APIs. Running the agent against the real service during development is expensive, rate-limited, and produces unpredictable results.

Problem

Third-party API calls from an agent cost money, consume quota, and can fail for reasons unrelated to the agent's logic. Reproducibility is essential for iterating on agent behavior, but real services are not reproducible.

Solution

Point the agent at a Counterfact mock instead of the real service. Control exactly what the mock returns so you can test every response scenario — including failures — cheaply and repeatably. Use the REPL to change mock behavior while the agent is running, without restarting anything.

Example

Generate the mock from the target API's OpenAPI spec:

npx counterfact@latest https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.yaml stripe-mock

Configure the agent to point at the mock instead of the real Stripe API:

const stripe = new Stripe("sk_test_fake", { host: "localhost", port: 3100, protocol: "http" });

Customize the handler to return exactly what your agent needs to see:

// stripe-mock/routes/v1/charges.ts
export const POST: HTTP_POST = ($) => {
  return $.response[200].json({
    id: "ch_mock_001",
    status: "succeeded",
    amount: $.body.amount,
    currency: $.body.currency,
  });
};

To test how the agent handles a rate limit, toggle a context flag and steer the agent from the REPL while it runs:

// stripe-mock/routes/_.context.ts
export class Context {
  simulateRateLimit = false;
}
// stripe-mock/routes/v1/charges.ts
export const POST: HTTP_POST = ($) => {
  if ($.context.simulateRateLimit) {
    return $.response[429].json({ error: { message: "Too many requests" } });
  }
  return $.response[200].json({
    id: "ch_mock_001",
    status: "succeeded",
    amount: $.body.amount,
    currency: $.body.currency,
  });
};
⬣> context.simulateRateLimit = true

The agent's next request hits the 429. Its retry logic runs for real.

Consequences

  • Every request is local, instantaneous, and free — iteration speed is limited only by the agent's logic.
  • Response content is fully controlled, so agent behavior is reproducible across runs.
  • The mock does not replicate the real API's stateful semantics unless you implement them explicitly.
  • The mock is only as accurate as the OpenAPI spec it was generated from.

Related Patterns