-
Notifications
You must be signed in to change notification settings - Fork 1.1k
add unit test for text loss in claude code #376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,156 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Copyright (c) Microsoft. All rights reserved. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import asyncio | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from ast import literal_eval | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from typing import Any, Dict, List, Sequence, Type, Union, cast | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+7
to
+8
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from typing import Any, Dict, List, Sequence, Type, Union, cast |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'anthropic' is not used.
| import anthropic |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'openai' is not used.
| import openai |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'CustomLogger' is not used.
| from litellm.integrations.custom_logger import CustomLogger |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'SWEbenchInstance' is not used.
| from swebench.harness.constants import SWEbenchInstance |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'LitAgentRunner' is not used.
Import of 'OtelTracer' is not used.
| from agentlightning import LitAgentRunner, OtelTracer | |
| # from agentlightning import LitAgentRunner, OtelTracer |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'LightningStore' is not used.
| from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded | |
| from agentlightning.store import LightningStoreServer, LightningStoreThreaded |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'LLM' is not used.
Import of 'Span' is not used.
| from agentlightning.types import LLM, Span |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test function name test_claude_code is vague and doesn't clearly describe what is being tested. Based on the PR title "add unit test for text loss in claude code", consider a more descriptive name like test_claude_code_text_preservation_across_turns or test_claude_code_response_text_in_next_prompt.
| async def test_claude_code(otlp_enabled: bool): | |
| async def test_claude_code_text_preservation_across_turns(otlp_enabled: bool): |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test function lacks a docstring explaining its purpose, what it tests, and any setup requirements (like the remote endpoint). Add a docstring that describes: 1) What aspect of text preservation is being tested, 2) The test methodology, 3) The expected behavior being validated.
| async def test_claude_code(otlp_enabled: bool): | |
| async def test_claude_code(otlp_enabled: bool): | |
| """ | |
| Test the Claude Code agent's ability to preserve text content when interacting with a remote LLM endpoint. | |
| This test loads a sample from the SWE-bench dataset, initializes the Claude Code agent, and runs it using a specified | |
| remote model endpoint. It checks that the agent processes the input and preserves the relevant text content throughout | |
| its operation. The test requires the remote endpoint to be available and the specified model to be deployed. | |
| Args: | |
| otlp_enabled (bool): Whether to enable OTLP tracing for the test. | |
| Expected behavior: | |
| - The agent should process the input sample and preserve the text content as expected. | |
| - The test will interact with a remote endpoint, so network connectivity and endpoint availability are required. | |
| """ |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is parameterized with only a single value True for otlp_enabled. This is unusual for parameterization - typically you'd either remove the parameterization and set it to a constant, or test both True and False cases. If only True is needed, remove the parameterization decorator.
| @pytest.mark.parametrize( | |
| "otlp_enabled", | |
| [ | |
| True, | |
| ], | |
| ) | |
| async def test_claude_code(otlp_enabled: bool): | |
| async def test_claude_code(): | |
| otlp_enabled = True |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded model name and endpoint suggest this test requires external infrastructure to run. Consider using a test fixture or mock, or document this as a manual test that requires specific setup. This will make the test suite more portable and easier to run in CI/CD environments.
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic number 5 for max_turns is not explained. Consider adding a comment explaining why this specific value is chosen, or define it as a named constant at the module level (e.g., TEST_MAX_TURNS = 5).
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model version "claude-sonnet-4-5-20250929" has a date of 2025-09-29, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
| # NOTE: The model names below ("claude-sonnet-4-5-20250929", "claude-haiku-4-5-20251001") are placeholders for testing purposes. | |
| # They do not refer to real, documented model versions. |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model version "claude-haiku-4-5-20251001" has a date of 2025-10-01, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, | |
| }, | |
| # NOTE: The following model name is intentionally fictional and used as a test placeholder. | |
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing the private attribute _access_host of server_launcher is not recommended as it couples the test to internal implementation details. If this property needs to be overridden for testing, consider adding a public API or test hook in the LLMProxy class.
| proxy.server_launcher._access_host = "localhost" | |
| # Avoid direct access to private attribute _access_host. | |
| # If LLMProxy or server_launcher exposes a public setter, use it here. | |
| # For example: proxy.server_launcher.set_access_host("localhost") | |
| # If not, consider adding a public API to LLMProxy/server_launcher for testing purposes. | |
| # (Direct access to _access_host is discouraged and flagged by CodeQL.) |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The debug print statements should be removed or replaced with proper logging. These statements can clutter test output and are typically used during development but should not remain in production test code.
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "# Dry run to generate spans" could be more descriptive. Consider clarifying that this is executing the actual test scenario: "# Execute ClaudeCodeAgent rollout to generate spans for validation".
| # Dry run to generate spans | |
| # Execute ClaudeCodeAgent rollout to generate spans for validation |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests should not write to hardcoded directories like "logs/". This creates several issues: 1) The directory may not exist, causing the test to fail. 2) It leaves artifacts on the filesystem after test completion. 3) It can cause issues in CI/CD environments. Use pytest's tmp_path fixture instead to write to a temporary directory that is automatically cleaned up.
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing this print statement or converting it to use a proper logger. Print statements in tests can clutter CI/CD output and should generally be avoided unless debugging.
| print(f"Generated {len(spans)} spans with {len(valid_spans)} LLM requests.") |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "# Test case 2:" is incomplete. It should describe what Test case 2 is verifying. Consider adding a descriptive comment like "# Test case 2: Verify that previous response text appears in the next prompt".
| # Test case 2: | |
| # Test case 2: Verify that previous response text appears in the next prompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'asyncio' is not used.