Please read this first
I found related compaction and session work, including #2317/#2322, #2333/#2334, #2343/#2344, #2727/#2728, #2944, and #3051. I did not find an issue or open PR for the specific data-loss case where OpenAIResponsesCompactionSession.run_compaction() clears the underlying session and then fails while writing compacted replacement items. #2944 and #3051 thread RunContextWrapper through Session methods, but they do not address atomic replacement or clear-then-add failure recovery.
Describe the bug
OpenAIResponsesCompactionSession.run_compaction() clears the underlying session before writing the compacted replacement items:
```python
output_items = _normalize_compaction_output_items(compacted.output or [])
await self.underlying_session.clear_session()
output_items = _strip_orphaned_assistant_ids(output_items)
if output_items:
    await self.underlying_session.add_items(output_items)
```
If clear_session() succeeds and add_items() fails, the original session history has already been deleted. This can happen with remote or persistent session backends if the write path fails after the clear path succeeds.
This is a data durability issue for SQLite, Redis, MongoDB, Dapr, SQLAlchemy, or custom Session implementations that can fail between clearing and rewriting compacted history.
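For backends with real transactions, the delete and rewrite can be made atomic so a failed write rolls the clear back too. A minimal sketch using the standard-library sqlite3 module (the `session_items` table name and schema here are illustrative assumptions, not the SDK's actual SQLite schema):

```python
import json
import sqlite3


def replace_items_atomically(
    conn: sqlite3.Connection, session_id: str, items: list[dict]
) -> None:
    """Delete and rewrite a session's history in a single transaction.

    If serializing or inserting any item fails, the DELETE is rolled back
    as well, so the original history survives. The table name and schema
    are illustrative only.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "DELETE FROM session_items WHERE session_id = ?", (session_id,)
        )
        conn.executemany(
            "INSERT INTO session_items (session_id, item) VALUES (?, ?)",
            [(session_id, json.dumps(item)) for item in items],
        )
```

A failed replacement leaves the table exactly as it was, which is the behavior the compaction path currently lacks.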
Debug information
- Agents SDK version: 0.15.1
- Repository commit: 9b57f057b43d
- Python version: 3.12.1
Repro steps
```python
from __future__ import annotations

import asyncio
from typing import cast
from unittest.mock import AsyncMock, MagicMock

from agents.items import TResponseInputItem
from agents.memory import OpenAIResponsesCompactionSession


class AddFailsAfterClearSession:
    session_id = "test-session"
    session_settings = None

    def __init__(self, history: list[TResponseInputItem]) -> None:
        self.items = list(history)

    async def get_items(self, limit: int | None = None) -> list[TResponseInputItem]:
        if limit is None:
            return list(self.items)
        return self.items[-limit:]

    async def add_items(self, items: list[TResponseInputItem]) -> None:
        raise RuntimeError("backend write failed")

    async def pop_item(self) -> TResponseInputItem | None:
        return self.items.pop() if self.items else None

    async def clear_session(self) -> None:
        self.items.clear()


async def main() -> None:
    history = [
        cast(TResponseInputItem, {"type": "message", "role": "user", "content": "original"}),
        cast(TResponseInputItem, {"type": "message", "role": "assistant", "content": "history"}),
    ]
    underlying = AddFailsAfterClearSession(history)

    compact_response = MagicMock()
    compact_response.output = [
        {"type": "message", "role": "assistant", "content": "compacted"}
    ]
    client = MagicMock()
    client.responses.compact = AsyncMock(return_value=compact_response)

    session = OpenAIResponsesCompactionSession(
        session_id="test",
        underlying_session=underlying,
        client=client,
        compaction_mode="input",
    )

    print(f"before={await underlying.get_items()}")
    try:
        await session.run_compaction({"force": True})
    except Exception as exc:
        print(f"raised={type(exc).__name__}: {exc}")
    print(f"after={await underlying.get_items()}")


asyncio.run(main())
```
Actual output on current main:
```
before=[{'type': 'message', 'role': 'user', 'content': 'original'}, {'type': 'message', 'role': 'assistant', 'content': 'history'}]
raised=RuntimeError: backend write failed
after=[]
```
The original history is gone after add_items() fails.
Expected behavior
Compaction should not irreversibly delete the existing session history unless the compacted replacement has been durably written.
Possible acceptable behaviors:
- Use a transactional replacement path for built-in persistent sessions that can support it.
- Add an optional replace_items() capability to Session/SessionABC and use it from OpenAIResponsesCompactionSession.
- For generic sessions, avoid clear-then-add when no atomic replacement capability exists, or use a two-phase/shadow-write strategy where supported.
- If atomic replacement cannot be provided for a backend, fail without clearing the existing history.
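The replace_items() capability and the fallback for generic sessions could be combined behind one helper. A hedged sketch, where the replace_items() name is a proposed (not existing) capability and the snapshot/restore fallback is only a best-effort two-phase approximation, since the restore write itself can still fail:

```python
from __future__ import annotations

import asyncio
from typing import Any


async def safe_replace(session: Any, items: list[dict[str, Any]]) -> None:
    """Replace a session's history without losing it on a failed write.

    ``replace_items()`` is a hypothetical optional capability, not part of
    the current Session protocol. When it is absent, fall back to
    snapshotting the history before the clear so it can be restored if the
    rewrite fails.
    """
    replace = getattr(session, "replace_items", None)
    if callable(replace):
        # Backend promises atomic replacement (e.g. one SQL transaction).
        await replace(items)
        return
    # Generic fallback: snapshot first, then clear-and-add.
    snapshot = await session.get_items()
    await session.clear_session()
    try:
        if items:
            await session.add_items(items)
    except Exception:
        # Best-effort restore of the original history before re-raising.
        await session.clear_session()
        if snapshot:
            await session.add_items(snapshot)
        raise
```

With this shape, the repro above would end with the original two items still in the underlying session instead of an empty history.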