Skip to content

Conversation

@mldangelo
Copy link
Member

Summary

  • Add four OpenClaw provider types: Chat (openclaw:main), Responses (openclaw:responses:main), WebSocket Agent (openclaw:agent:main), and Tool Invoke (openclaw:tools:bash)
  • Auto-detect gateway URL and auth token from ~/.openclaw/openclaw.json with env var and explicit config overrides
  • Prevent OpenAI env var fallback (OPENAI_API_KEY, OPENAI_BASE_URL) from leaking into OpenClaw providers

Test plan

  • 72 unit tests covering all four providers, factory routing, config resolution, JSON5 parsing, and edge cases
  • Lint (Biome), TypeScript, and build all pass
  • E2E tested against live OpenClaw gateway: 6/6 pass (chat, responses, ws-agent)

🤖 Generated with Claude Code

mldangelo and others added 3 commits February 8, 2026 00:49
Add four OpenClaw provider types for evaluating agents via the OpenClaw
gateway: Chat Completions (OpenAI-compatible), Responses API, WebSocket
Agent (native RPC protocol), and Tool Invoke (HTTP).

Includes auto-detection of gateway URL and auth token from
~/.openclaw/openclaw.json, env vars, or explicit config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ases

Prevent OpenAI env var fallback (OPENAI_API_KEY, OPENAI_BASE_URL) in
chat/responses providers, fix wildcard bind detection (0.0.0.0/::),
fix concurrent WS socket leak, scope thinking_level to WS Agent only,
and harden JSON5 trailing comma parser against string contents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate env vars example section (already covered in
Configuration), add policy-gated 404 note to Tool Invoke, and
dedupe Claude Agent SDK row in providers index table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@use-tusk
Copy link
Contributor

use-tusk bot commented Feb 8, 2026

⏩ No test execution environment matched (3cf9df9) View output ↗


View check history

Commit Status Output Created (UTC)
b741709 ⏩ No test execution environment matched Output Feb 8, 2026 6:32AM
a947001 ⏩ No test execution environment matched Output Feb 8, 2026 6:37AM
848a6e0 ⏩ No test execution environment matched Output Feb 8, 2026 6:51AM
3cf9df9 ⏩ No test execution environment matched Output Feb 9, 2026 8:56AM

@mldangelo mldangelo marked this pull request as ready for review February 8, 2026 06:34
Copy link
Contributor

@promptfoo-scanner promptfoo-scanner bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 All Clear

I reviewed this PR which adds OpenClaw provider integration to promptfoo. The code implements client adapters that forward requests to an external OpenClaw gateway service. I examined authentication handling, configuration parsing, tool invocation proxying, and data flows. No LLM security vulnerabilities were found.

Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
Learn more


Was this helpful?  👍 Yes  |  👎 No 

Add tests for IPv6 wildcard bind, env override params, JSON5 edge
cases (single quotes, block comments in strings, commas in strings),
apiKeyRequired flag, custom header merging, toJSON serialization,
HTTPS→WSS conversion, malformed frame handling, non-assistant stream
filtering, tool name with colons, and resetConfigCache behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2026

🔄 Security Review ✅

No issues in new changes. The incremental commit (3cf9df9) adds TypeScript type annotations to unused parameters (_url: string, ..._args: any[]) in the WebSocketMock constructor in the test file. This is a type-only change with no behavioral impact.


Last updated: 2026-02-09T09:00:00Z | Reviewing: 3cf9df9

Copy link
Contributor

@promptfoo-scanner promptfoo-scanner bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 All Clear

I reviewed this PR for LLM-specific security vulnerabilities. The PR adds OpenClaw provider integration following standard SDK patterns for credential handling and API communication. No exploitable LLM security vulnerabilities were found.

Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
Learn more


Was this helpful?  👍 Yes  |  👎 No 

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

Walkthrough

This PR adds comprehensive support for the OpenClaw agent framework as a new provider type in promptfoo. The implementation includes four provider variants (Chat, Responses, WebSocket Agent, Tool Invoke), shared configuration utilities for gateway URL and authentication token resolution, environment variable definitions, type definitions, a factory pattern for provider instantiation, provider registry integration, example configuration and documentation, and an extensive test suite covering all provider types and utility functions. All changes are additive with no modifications to existing code.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • typpo
  • MrFlounder
  • danenania
  • swarnap
  • JustinBeckwith
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(providers): add OpenClaw multi-provider support' clearly summarizes the main change: adding a new multi-provider integration for OpenClaw with four provider types (Chat, Responses, Agent, Tool Invoke).
Description check ✅ Passed The description is directly related to the changeset, covering the four OpenClaw provider types, auto-detection of configuration, prevention of env var fallback, and comprehensive test coverage matching the actual changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/openclaw-provider

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/providers/openclaw/shared.ts`:
- Around line 52-58: The block-comment skipping logic can advance past the end
when the comment is unclosed; update the handler around the variables ch, raw
and index i so that after the inner while you only do the i += 2 advance if you
actually found the closing "*/" (i.e., raw[i] === '*' && raw[i+1] === '/');
otherwise set i = raw.length (or break/return from the containing function) to
stop parsing and avoid an out-of-bounds advance. Locate this logic in the block
that tests ch === '/' && raw[i + 1] === '*' and apply the conditional advance to
ensure i is not pushed beyond raw.length.
🧹 Nitpick comments (6)
site/docs/providers/openclaw.md (1)

1-6: Front matter description is shorter than recommended.

The description is approximately 104 characters, which is below the recommended 150-160 character range per coding guidelines. Consider expanding it slightly for better SEO and clarity.

 ---
 title: OpenClaw
 sidebar_label: OpenClaw
 sidebar_position: 42
-description: 'Use OpenClaw, a personal AI assistant framework, as an eval target with auto-detected gateway and auth'
+description: 'Use OpenClaw, a personal AI assistant framework, as an eval target with auto-detected gateway URL and authentication token from config or environment'
 ---

As per coding guidelines: "Front matter is required with title (under 60 chars), description (150-160 chars), and sidebar_position"

src/providers/openclaw/index.ts (1)

33-63: Minor: Consider merging env more carefully.

Line 40 overwrites any existing env in providerOptions with the env parameter. If a caller provides both providerOptions.env and the env parameter, the latter completely replaces the former rather than merging them.

This might be intentional (env parameter takes precedence), but consider if merging would be more appropriate:

- const opts = { ...providerOptions, env };
+ const opts = { ...providerOptions, env: { ...providerOptions.env, ...env } };

If the current behavior is intentional (full override), consider adding a brief comment for clarity.

test/providers/openclaw.test.ts (3)

407-559: Consider adding rate limit and token usage tracking tests for OpenClawToolInvokeProvider.

Per coding guidelines, provider tests should cover rate limits (429 responses) and token usage tracking. The current tests cover success, tool errors, HTTP errors (404), auth tokens, and network errors, but are missing:

  • Rate limit handling (HTTP 429)
  • Token usage tracking in responses
💡 Suggested additional test cases
it('should handle rate limit errors (429)', async () => {
  const provider = new OpenClawToolInvokeProvider('bash', {
    config: { gateway_url: 'http://test:18789' },
  });

  mockFetchWithProxy.mockResolvedValue({
    ok: false,
    status: 429,
    text: async () => 'Too Many Requests',
  } as Response);

  const result = await provider.callApi('{}');
  expect(result.error).toContain('429');
});

Based on learnings: "Every provider must have tests covering: success cases, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking"


561-577: Consider using a more specific type for messageHandlers.

The Function type is overly broad. Using a more specific callback signature improves type safety.

🔧 Suggested type improvement
   describe('OpenClawAgentProvider', () => {
     let mockWs: any;
-    let messageHandlers: Map<string, Function>;
+    let messageHandlers: Map<string, (data: Buffer | Error) => void>;

974-983: Good cleanup test, but consider verifying activeConnections is cleared.

The test verifies ws.close() is called but doesn't verify the internal activeConnections set is cleared. Since cleanup() explicitly clears this set, consider adding an assertion or a second call to ensure idempotency.

💡 Suggested enhancement
     it('should cleanup WS connection', async () => {
       const provider = new OpenClawAgentProvider('main', {
         config: { gateway_url: 'http://test:18789' },
       });

       // Start a call to create a WS
       provider.callApi('Hello');
       await provider.cleanup();
       expect(mockWs.close).toHaveBeenCalled();
+
+      // Verify cleanup is idempotent
+      mockWs.close.mockClear();
+      await provider.cleanup();
+      expect(mockWs.close).not.toHaveBeenCalled();
     });
src/providers/openclaw/shared.ts (1)

120-122: Use structured logging with object context.

Per coding guidelines, logger calls should use object context for auto-sanitization rather than string interpolation.

🔧 Suggested fix
   } catch (err) {
-    logger.debug(`Failed to read OpenClaw config at ${configPath}: ${err}`);
+    logger.debug('[OpenClaw] Failed to read config', { configPath, error: err });
     return undefined;
   }

As per coding guidelines: "Use the logger with object context (auto-sanitized) for logging in format: logger.debug('[Component] Message', { headers, body, config })"

Comment on lines 52 to 58
} else if (ch === '/' && raw[i + 1] === '*') {
// Block comment — skip to */
i += 2;
while (i < raw.length - 1 && !(raw[i] === '*' && raw[i + 1] === '/')) {
i++;
}
i += 2; // skip */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential out-of-bounds read in block comment handling.

If the input ends with an unclosed block comment (e.g., /* no end), the inner while loop condition i < raw.length - 1 prevents reading past the end, but the subsequent i += 2 at line 58 could push i beyond raw.length. This doesn't cause a runtime error (just iterates with undefined characters), but the logic could be tightened.

🛡️ Suggested defensive fix
     } else if (ch === '/' && raw[i + 1] === '*') {
       // Block comment — skip to */
       i += 2;
       while (i < raw.length - 1 && !(raw[i] === '*' && raw[i + 1] === '/')) {
         i++;
       }
-      i += 2; // skip */
+      // Skip closing */ if found
+      if (i < raw.length - 1 && raw[i] === '*' && raw[i + 1] === '/') {
+        i += 2;
+      }
🤖 Prompt for AI Agents
In `@src/providers/openclaw/shared.ts` around lines 52 - 58, The block-comment
skipping logic can advance past the end when the comment is unclosed; update the
handler around the variables ch, raw and index i so that after the inner while
you only do the i += 2 advance if you actually found the closing "*/" (i.e.,
raw[i] === '*' && raw[i+1] === '/'); otherwise set i = raw.length (or
break/return from the containing function) to stop parsing and avoid an
out-of-bounds advance. Locate this logic in the block that tests ch === '/' &&
raw[i + 1] === '*' and apply the conditional advance to ensure i is not pushed
beyond raw.length.

mldangelo and others added 2 commits February 8, 2026 01:50
Add bounds check before advancing past block comment close token to
prevent out-of-bounds read when input ends with an unclosed comment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rror

The WebSocketMock vi.fn() had no parameters, causing TypeScript to infer
mock.calls[0] as an empty tuple. This broke `tsc --noEmit` in CI, which
SIGTERMed the rest of the build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant