Commit ae0f11b

teocns and dot-agi authored
Test suite v0.4 (#637)
* test: add `__init__` to make `tests/` a package
* test: add `llm_event_spy` fixture for tests
* test: add VCR.py fixture for HTTP interaction recording
* deps: add integration-testing dependency group
* test: add fixture to mock package availability in tests
* test: add integration tests for OpenAI provider and features
* test: add tests for concurrent API request handling
* improve vcr.py configuration
* ruff
* chore(pyproject): update pytest options and loop scope
* chore(tests): update vcr.py ignore_hosts and options
* pyproject.toml
* centralize teardown in conftest.py (clear singletons, end all sessions)
* change vcr_config scope to session
* integration: auto-start agentops session
* move unit tests to dedicated folder (tests/unit)
* isolate vcr_config import into tests/integration
* configure pytest to run only unit tests by default, and include integration tests only when explicitly specified
* ci(python-tests): split unit and integration tests into separate jobs
* set python-tests timeout to 5 minutes
* ruff
* implement jwt fixture, centralize reusable mock_req into conftest.py
* reauthorize
* ci(python-tests): simplify env management, remove coverage from integration tests
* ruff
* fix: cassette for test_concurrent_api_requests
* clean up vcr.py comments
* add a `TODO` for removing the `vcrpy` git version after its release
* refactor OpenAI Assistants response handling for easier testing
* add more keys for different LLM providers
* add integration tests for other providers
* remove openai version limitation
* add providers as deps
* chore: add mistralai to test dependencies
* remove `mistral` from dependencies since it is incorrect
* ruff
* re-record cassettes
* tests/fixtures/providers: fall back to `test-api-key` if no provider key is found; every provider fixture uses the actual API key when it is set in the environment and "test-api-key" otherwise
* set keys for `litellm`
* improve OpenAI Assistants coverage in tests/integration/test_llm_providers.py
* make integration tests skip appropriately, regenerate one cassette
* import tests/integration/conftest fixtures explicitly
* deps: improve dev package version constraints
* make integration tests run with Python 3.12
* add uv.lock
* test_concurrent_api_requests: remove matcher on method, which possibly caused an intermittent error
* run static-analysis with Python 3.12.2

Signed-off-by: Teo <teocns@gmail.com>
Co-authored-by: Pratyush Shukla <ps4534@nyu.edu>
1 parent 81c60c6 commit ae0f11b


41 files changed: +7335 −330 lines

.gitattributes

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+uv.lock binary
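Marking uv.lock as binary keeps Git from producing text diffs for the lockfile, so large dependency-resolution churn stays collapsed in review views.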

.github/workflows/python-tests.yaml

Lines changed: 48 additions & 9 deletions

@@ -1,6 +1,19 @@
 # :: Use nektos/act to run this locally
 # :: Example:
-# :: `act push -j python-tests --matrix python-version:3.10 --container-architecture linux/amd64`
+# :: `act push -j unit-tests --matrix python-version:3.10 --container-architecture linux/amd64`
+#
+# This workflow runs two separate test suites:
+# 1. Unit Tests (python-tests job):
+#    - Runs across Python 3.9 to 3.13
+#    - Located in tests/unit directory
+#    - Coverage report uploaded to Codecov for Python 3.11 only
+#
+# 2. Integration Tests (integration-tests job):
+#    - Runs only on Python 3.13
+#    - Located in tests/integration directory
+#    - Longer timeout (15 min vs 10 min for unit tests)
+#    - Separate cache for dependencies
+
 name: Python Tests
 on:
   workflow_dispatch: {}

@@ -23,10 +36,12 @@ on:
       - 'tests/**/*.ipynb'

 jobs:
-  python-tests:
+  unit-tests:
     runs-on: ubuntu-latest
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
+      PYTHONUNBUFFERED: "1"

     strategy:
       matrix:

@@ -49,14 +64,10 @@ jobs:
         run: |
           uv sync --group test --group dev

-      - name: Run tests with coverage
-        timeout-minutes: 10
+      - name: Run unit tests with coverage
+        timeout-minutes: 5
         run: |
-          uv run -m pytest tests/ -v --cov=agentops --cov-report=xml
-        env:
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
-          PYTHONUNBUFFERED: "1"
+          uv run -m pytest tests/unit -v --cov=agentops --cov-report=xml

       # Only upload coverage report for python3.11
       - name: Upload coverage to Codecov

@@ -68,3 +79,31 @@ jobs:
           flags: unittests
           name: codecov-umbrella
           fail_ci_if_error: true # Should we?
+
+  integration-tests:
+    runs-on: ubuntu-latest
+    env:
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
+      PYTHONUNBUFFERED: "1"
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup UV
+        uses: astral-sh/setup-uv@v5
+        continue-on-error: true
+        with:
+          python-version: "3.12"
+          enable-cache: true
+          cache-suffix: uv-3.12-integration
+          cache-dependency-glob: "**/pyproject.toml"
+
+      - name: Install dependencies
+        run: |
+          uv sync --group test --group dev
+
+      - name: Run integration tests
+        timeout-minutes: 5
+        run: |
+          uv run pytest tests/integration

.github/workflows/static-analysis.yaml

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ jobs:
        with:
          enable-cache: true
          cache-dependency-glob: "**/pyproject.toml"
-         python-version: "3.11.10"
+         python-version: "3.12.2"

      - name: Install packages
        run: |

agentops/llms/providers/openai.py

Lines changed: 64 additions & 63 deletions

@@ -136,6 +136,69 @@ async def async_generator():

         return response

+    def handle_assistant_response(self, response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
+        """Handle response based on return type"""
+        from openai.pagination import BasePage
+
+        action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
+        if session is not None:
+            action_event.session_id = session.session_id
+
+        try:
+            # Set action type and returns
+            action_event.action_type = (
+                response.__class__.__name__.split("[")[1][:-1]
+                if isinstance(response, BasePage)
+                else response.__class__.__name__
+            )
+            action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
+            action_event.end_timestamp = get_ISO_time()
+            self._safe_record(session, action_event)
+
+            # Create LLMEvent if usage data exists
+            response_dict = response.model_dump() if hasattr(response, "model_dump") else {}
+
+            if "id" in response_dict and response_dict.get("id").startswith("run"):
+                if response_dict["id"] not in self.assistants_run_steps:
+                    self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}
+
+            if "usage" in response_dict and response_dict["usage"] is not None:
+                llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
+                if session is not None:
+                    llm_event.session_id = session.session_id
+
+                llm_event.model = response_dict.get("model")
+                llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
+                llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
+                llm_event.end_timestamp = get_ISO_time()
+                self._safe_record(session, llm_event)
+
+            elif "data" in response_dict:
+                for item in response_dict["data"]:
+                    if "usage" in item and item["usage"] is not None:
+                        llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
+                        if session is not None:
+                            llm_event.session_id = session.session_id
+
+                        llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
+                        llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
+                        llm_event.completion_tokens = item["usage"]["completion_tokens"]
+                        llm_event.end_timestamp = get_ISO_time()
+                        self._safe_record(session, llm_event)
+
+        except Exception as e:
+            self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))
+
+            kwargs_str = pprint.pformat(kwargs)
+            response = pprint.pformat(response)
+            logger.warning(
+                f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
+                f"response:\n {response}\n"
+                f"kwargs:\n {kwargs_str}\n"
+            )
+
+        return response
+
     def override(self):
         self._override_openai_v1_completion()
         self._override_openai_v1_async_completion()

@@ -234,68 +297,6 @@ def _override_openai_assistants_beta(self):
         """Override OpenAI Assistants API methods"""
         from openai._legacy_response import LegacyAPIResponse
         from openai.resources import beta
-        from openai.pagination import BasePage
-
-        def handle_response(response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
-            """Handle response based on return type"""
-            action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
-            if session is not None:
-                action_event.session_id = session.session_id
-
-            try:
-                # Set action type and returns
-                action_event.action_type = (
-                    response.__class__.__name__.split("[")[1][:-1]
-                    if isinstance(response, BasePage)
-                    else response.__class__.__name__
-                )
-                action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
-                action_event.end_timestamp = get_ISO_time()
-                self._safe_record(session, action_event)
-
-                # Create LLMEvent if usage data exists
-                response_dict = response.model_dump() if hasattr(response, "model_dump") else {}
-
-                if "id" in response_dict and response_dict.get("id").startswith("run"):
-                    if response_dict["id"] not in self.assistants_run_steps:
-                        self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}
-
-                if "usage" in response_dict and response_dict["usage"] is not None:
-                    llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
-                    if session is not None:
-                        llm_event.session_id = session.session_id
-
-                    llm_event.model = response_dict.get("model")
-                    llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
-                    llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
-                    llm_event.end_timestamp = get_ISO_time()
-                    self._safe_record(session, llm_event)
-
-                elif "data" in response_dict:
-                    for item in response_dict["data"]:
-                        if "usage" in item and item["usage"] is not None:
-                            llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
-                            if session is not None:
-                                llm_event.session_id = session.session_id
-
-                            llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
-                            llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
-                            llm_event.completion_tokens = item["usage"]["completion_tokens"]
-                            llm_event.end_timestamp = get_ISO_time()
-                            self._safe_record(session, llm_event)
-
-            except Exception as e:
-                self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))
-
-                kwargs_str = pprint.pformat(kwargs)
-                response = pprint.pformat(response)
-                logger.warning(
-                    f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
-                    f"response:\n {response}\n"
-                    f"kwargs:\n {kwargs_str}\n"
-                )
-
-            return response

         def create_patched_function(original_func):
             def patched_function(*args, **kwargs):

@@ -309,7 +310,7 @@ def patched_function(*args, **kwargs):
                 if isinstance(response, LegacyAPIResponse):
                     return response

-                return handle_response(response, kwargs, init_timestamp, session=session)
+                return self.handle_assistant_response(response, kwargs, init_timestamp, session=session)

             return patched_function
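Hoisting the handler out of the closure in `_override_openai_assistants_beta` and onto the provider as `handle_assistant_response` is what the "easier testing" commit refers to: a bound method can be spied on directly. A minimal sketch of that pattern, assuming pytest-mock and an `agentops_client` fixture like the one used in tests/fixtures/event.py (the test body and the elided API call are illustrative, not part of this commit):

```python
# Sketch only: spy on the now-public handler instead of a closure.
from agentops.llms.providers.openai import OpenAiProvider


def test_assistant_response_is_recorded(agentops_client, mocker):
    spy = mocker.spy(OpenAiProvider, "handle_assistant_response")
    # ... exercise the patched Assistants API here, e.g. against a VCR cassette ...
    spy.assert_called()  # non-legacy Assistants responses flow through the handler
```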

pyproject.toml

Lines changed: 36 additions & 21 deletions

@@ -41,30 +41,47 @@ dependencies = [

 [dependency-groups]
 test = [
-    "openai>=1.0.0,<2.0.0",
-    "langchain",
+    "openai>=1.0.0",
+    "anthropic",
+    "cohere",
+    "litellm",
+    "ai21>=3.0.0",
+    "groq",
+    "ollama",
+    "mistralai",
+    # ;;
+    # The below is a really hard dependency, that can be installed only between python >=3.10,<3.13.
+    # CI will fail because all tests will automatically pull this dependency group;
+    # we need a separate group specifically for integration tests which will run on pinned 3.1x
+    # ------------------------------------------------------------------------------------------------------------------------------------
+    # "crewai-tools @ git+https://github.com/crewAIInc/crewAI-tools.git@a14091abb24527c97ccfcc8539d529c8b4559a0f; python_version>='3.10'",
+    # ------------------------------------------------------------------------------------------------------------------------------------
+    # ;;
+    "autogen<0.4.0",
     "pytest-cov",
+    "fastapi[standard]",
 ]

 dev = [
     # Testing essentials
-    "pytest>=7.4.0,<8.0.0",  # Testing framework with good async support
-    "pytest-depends",        # For testing complex agent workflows
-    "pytest-asyncio",        # Async test support for testing concurrent agent operations
-    "pytest-mock",           # Mocking capabilities for isolating agent components
-    "pyfakefs",              # File system testing
-    "pytest-recording",      # Alternative to pytest-vcr with better Python 3.x support
-    "vcrpy @ git+https://github.com/kevin1024/vcrpy.git@81978659f1b18bbb7040ceb324a19114e4a4f328",
+    "pytest>=8.0.0",     # Testing framework with good async support
+    "pytest-depends",    # For testing complex agent workflows
+    "pytest-asyncio",    # Async test support for testing concurrent agent operations
+    "pytest-mock",       # Mocking capabilities for isolating agent components
+    "pyfakefs",          # File system testing
+    "pytest-recording",  # Alternative to pytest-vcr with better Python 3.x support
+    # TODO: Use release version after vcrpy is released with this fix.
+    "vcrpy @ git+https://github.com/kevin1024/vcrpy.git@5f1b20c4ca4a18c1fc8cfe049d7df12ca0659c9b",
     # Code quality and type checking
-    "ruff",            # Fast Python linter for maintaining code quality
-    "mypy",            # Static type checking for better reliability
-    "types-requests",  # Type stubs for requests library
-
+    "ruff",            # Fast Python linter for maintaining code quality
+    "mypy",            # Static type checking for better reliability
+    "types-requests",  # Type stubs for requests library
     # HTTP mocking and environment
     "requests_mock>=1.11.0",  # Mock HTTP requests for testing agent external communications
-    "python-dotenv",  # Environment management for secure testing
-
+    "python-dotenv",          # Environment management for secure testing
     # Agent integration testing
+    "pytest-sugar>=1.0.0",
+    "pdbpp>=0.10.3",
 ]

 # CI dependencies

@@ -89,19 +106,17 @@ constraint-dependencies = [
     # For Python ≥3.10 (where autogen-core might be present), use newer versions
     "opentelemetry-api>=1.27.0; python_version>='3.10'",
     "opentelemetry-sdk>=1.27.0; python_version>='3.10'",
-    "opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'"
+    "opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'",
 ]

 [tool.autopep8]
 max_line_length = 120

 [tool.pytest.ini_options]
 asyncio_mode = "auto"
-asyncio_default_fixture_loop_scope = "function"  # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
-test_paths = [
-    "tests",
-]
-addopts = "--tb=short -p no:warnings"
+asyncio_default_fixture_loop_scope = "module"  # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
+testpaths = ["tests/unit"]  # Default to unit tests
+addopts = "--tb=short -p no:warnings --import-mode=importlib --ignore=tests/integration"  # Ignore integration by default
 pythonpath = ["."]
 faulthandler_timeout = 30  # Reduced from 60
 timeout = 60  # Reduced from 300
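Per the commit message, the net effect of the new `[tool.pytest.ini_options]` values is that a bare `pytest` run collects only `tests/unit`, while the integration suite runs only when its path is named explicitly, as the integration-tests CI job does. A small sketch driving the same two invocations through `pytest.main` (the flags mirror the CI commands; `-q` is just for brevity):

```python
# Sketch: the two suite selections the new config is meant to support.
import pytest

# Default run: testpaths = ["tests/unit"] keeps collection to the unit suite.
pytest.main(["-q"])

# Explicit opt-in, equivalent to the CI job's `uv run pytest tests/integration`.
pytest.main(["-q", "tests/integration"])
```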

tests/__init__.py

Whitespace-only changes.

tests/fixtures/event.py

Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+from collections import defaultdict
+from typing import TYPE_CHECKING
+
+import pytest
+
+if TYPE_CHECKING:
+    from pytest_mock import MockerFixture
+
+
+@pytest.fixture(scope="function")
+def llm_event_spy(agentops_client, mocker: "MockerFixture") -> dict[str, "MockerFixture"]:
+    """
+    Fixture that provides spies on both providers' response handling
+
+    These fixtures are reset on each test run (function scope). To use it,
+    simply pass it as an argument to the test function. Example:
+
+    ```
+    def test_my_test(llm_event_spy):
+        # test code here
+        llm_event_spy["litellm"].assert_called_once()
+    ```
+    """
+    from agentops.llms.providers.anthropic import AnthropicProvider
+    from agentops.llms.providers.litellm import LiteLLMProvider
+    from agentops.llms.providers.openai import OpenAiProvider
+
+    return {
+        "litellm": mocker.spy(LiteLLMProvider(agentops_client), "handle_response"),
+        "openai": mocker.spy(OpenAiProvider(agentops_client), "handle_response"),
+        "anthropic": mocker.spy(AnthropicProvider(agentops_client), "handle_response"),
+    }
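The docstring above already shows the core pattern; a slightly fuller sketch of a consuming test (the triggering call is elided, and the test body is an assumption, not part of this commit):

```python
# Assumes the llm_event_spy fixture above is registered via conftest.py.
def test_litellm_response_is_handled(llm_event_spy):
    # ... make a completion call through the instrumented litellm client ...
    llm_event_spy["litellm"].assert_called_once()
```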

tests/fixtures/packaging.py

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+import builtins
+import pytest
+
+
+@pytest.fixture
+def hide_available_pkg(monkeypatch):
+    """
+    Hide the availability of a package by mocking the __import__ function.
+
+    Usage:
+        @pytest.mark.usefixtures('hide_available_pkg')
+        def test_message():
+            with pytest.raises(ImportError, match='Install "pkg" to use test_function'):
+                foo('test_function')
+
+    Source:
+        https://stackoverflow.com/questions/60227582/making-a-python-test-think-an-installed-package-is-not-available
+    """
+    import_orig = builtins.__import__
+
+    def mocked_import(name, *args, **kwargs):
+        if name == "pkg":
+            raise ImportError()
+        return import_orig(name, *args, **kwargs)
+
+    monkeypatch.setattr(builtins, "__import__", mocked_import)
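For reference, a hedged sketch of the fixture in action: with it applied, any `import pkg` hits the mocked `__import__` and raises ImportError, which is what a guarded-import test would assert:

```python
import pytest


@pytest.mark.usefixtures("hide_available_pkg")
def test_pkg_import_is_blocked():
    # "pkg" is the name hard-coded in mocked_import above
    with pytest.raises(ImportError):
        import pkg  # noqa: F401
```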
