Skip to content

Commit db6bee0

Browse files
authored
Merge pull request #553 from enarjord/feature/fill-events-manager-production
v7.7.0: FillEventsManager production + comprehensive logging improvements
2 parents edb58f0 + 6267e85 commit db6bee0

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

47 files changed

+4248
-2418
lines changed

CLAUDE.md

Lines changed: 43 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,11 @@
22

33
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
44

5-
**IMPORTANT:** Also read and follow `docs/ai/passivbot_agent_principles.yaml` for detailed conventions on terminology, error handling, testing, and design principles.
5+
**IMPORTANT:** Also read and follow the documentation in `docs/ai/`:
6+
- `passivbot_agent_principles.yaml` - Conventions on terminology, error handling, testing, design principles
7+
- `exchange_api_quirks.md` - Known exchange API limitations and workarounds (check before implementing exchange code)
8+
- `debugging_case_studies.md` - Detailed debugging sessions as reference for complex investigations
9+
- `log_analysis_prompt.md` - Logging level definitions and examples
610

711
## Overview
812

@@ -227,9 +231,22 @@ python3 -m jupyter lab
227231

228232
### Logging
229233

230-
- Use structured, leveled logging (`error`, `warn`, `info`, `debug`, `trace`)
231-
- Every remote API call must have a debug-level log entry (endpoint, params, timing)
232-
- Logging should be non-intrusive but detailed enough for full replay/audit
234+
Use structured, leveled logging with clear separation between levels:
235+
236+
| Level | Audience | Content | Golden Rule |
237+
|-------|----------|---------|-------------|
238+
| **INFO** | Operators | Essential events: orders, fills, positions, balance, mode changes, health summaries | Must be sustainable to tail indefinitely in production |
239+
| **DEBUG** | Developers | Internal state: API timing, cache updates, decision points, fetch summaries | Tolerable for short debugging sessions |
240+
| **TRACE** | Deep debugging | Full firehose: API payloads, per-item iterations, raw data | Expect GB of logs; enable briefly for specific issues |
241+
242+
**Guidelines:**
243+
- INFO should answer "what is the bot doing?" without overwhelming
244+
- DEBUG should answer "why did it make that decision?"
245+
- TRACE should answer "what exact data did it see?"
246+
- Use `[tag]` format consistently: `[order]`, `[pos]`, `[fill]`, `[health]`, `[boot]`
247+
- Prefer `logging.info("msg %s", var)` over f-strings for log aggregation compatibility
248+
- Every remote API call should have a DEBUG-level log entry (endpoint, timing)
249+
- See `docs/ai/log_analysis_prompt.md` for detailed level definitions and examples
233250

234251
## Testing
235252

@@ -274,6 +291,28 @@ Configuration sections form an inheritance hierarchy. When adding new parameters
274291
- When adding dependencies, explain necessity and impact
275292
- Before committing, simulate/dry-run changes
276293

294+
## Documentation Structure
295+
296+
This file (`CLAUDE.md`) serves as the entry point for AI agents. Detailed topic-specific documentation lives in `docs/ai/`:
297+
298+
| File | Purpose |
299+
|------|---------|
300+
| `passivbot_agent_principles.yaml` | Core conventions: terminology, error handling, testing |
301+
| `exchange_api_quirks.md` | Exchange-specific API limitations and workarounds |
302+
| `debugging_case_studies.md` | Detailed debugging sessions as learning references |
303+
| `log_analysis_prompt.md` | Logging level definitions and analysis guidance |
304+
305+
**When to add new docs:**
306+
- **Exchange quirks** → Add to `exchange_api_quirks.md` (or create `{exchange}_quirks.md` if extensive)
307+
- **Complex debugging** → Add case study to `debugging_case_studies.md`
308+
- **New subsystem** → Consider a dedicated `{subsystem}.md` if >50 lines of guidance
309+
310+
**Modularization guidelines:**
311+
- Keep CLAUDE.md as a high-level overview (<300 lines ideal)
312+
- Move detailed reference material to `docs/ai/` subdirectory
313+
- Use consistent naming: `{topic}.md` or `{topic}_{subtopic}.md`
314+
- Always reference new docs from CLAUDE.md's header list
315+
277316
## Changelog
278317

279318
Maintain `CHANGELOG.md` as single source of truth for user-facing changes. Add entries under "Unreleased" as changes land; move to dated version heading when tagging releases.

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44

55
:warning: **Used at one's own risk** :warning:
66

7-
v7.6.2
7+
v7.7.0
88

99

1010
## Overview

changelog.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,46 @@
22

33
All notable user-facing changes will be documented in this file.
44

5+
## v7.7.0 - Unreleased
6+
7+
### Fixed
8+
- **Bybit: Missing PnL on some close fills** - Fixed pagination bug in `BybitFetcher._fetch_positions_history()` that caused closed-pnl records to be skipped when >100 records existed in a time window. Now uses hybrid pagination: cursor-based for recent records (no gaps), time-based sliding window for older records.
9+
10+
### Added
11+
- **Fill events now include psize/pprice** - Each fill event is annotated with position size (`psize`) and VWAP entry price (`pprice`) after the fill. Values are computed using a two-phase algorithm and persisted to cache for all exchanges.
12+
- **Logging best practices documentation** - New `docs/ai/log_analysis_prompt.md` with comprehensive logging guidelines, level definitions, and improvement tracking.
13+
- **Exchange API quirks documentation** - New `docs/ai/exchange_api_quirks.md` documenting known exchange-specific limitations and workarounds.
14+
- **Debugging case studies** - New `docs/ai/debugging_case_studies.md` with detailed debugging sessions as reference.
15+
16+
### Changed
17+
- **Logging improvements (7 rounds of refinement)**:
18+
- Standardized log tags: `[memory]`, `[warmup]`, `[hourly]`, `[fills]`, `[mapping]`, `[candle]`, `[ranking]`, `[mode]`
19+
- Moved routine API/cache messages from INFO to DEBUG level (CCXT fetch details, cache updates)
20+
- Moved CCXT API payloads from DEBUG to TRACE level
21+
- EMA ranking logs now throttled to every 5 minutes (was every cycle)
22+
- Mode changes throttled to 2 minutes per symbol (reduces forager oscillation noise)
23+
- KucoinFetcher PnL discrepancy warnings throttled to 1 hour with delta-based deduplication
24+
- WebSocket reconnection now logs explicit `[ws] reconnecting...` messages
25+
- Strict mode gaps changed from WARNING to DEBUG (expected for illiquid markets)
26+
- Persistent gaps changed from WARNING to INFO with throttling
27+
- Zero-candle synthesis warnings aggregated and rate-limited
28+
- **PnL tracking now uses FillEventsManager exclusively** - Legacy `update_pnls` path removed. FillEventsManager provides more accurate fill tracking with proper event deduplication, canonical schemas, and exchange-specific fetchers for all supported exchanges.
29+
- Fill events are now stored in `caches/fill_events/{exchange}/{user}/` instead of the old `caches/{exchange}/{user}_pnls.json` format. Existing legacy cache files are ignored; FillEventsManager will rebuild from exchange API on first run.
30+
- Unstuck allowances now computed from FillEventsManager data instead of legacy pnls list.
31+
- Trailing position change timestamps now derived from FillEventsManager events.
32+
33+
### Removed
34+
- `--shadow-mode` CLI flag (no longer needed; FillEventsManager is production-ready)
35+
- `live.pnls_manager_shadow_mode` config option
36+
- Legacy `init_pnls`, `update_pnls`, `fetch_pnls` methods in passivbot.py
37+
- Legacy `init_fill_events`, `update_fill_events`, `fetch_fill_events` methods (dead code)
38+
- Shadow mode comparison logging (`_compare_pnls_shadow`, etc.)
39+
40+
### Migration Notes
41+
- **No action required** - FillEventsManager automatically fetches and caches fill data
42+
- Old `{user}_pnls.json` cache files can be safely deleted after upgrading
43+
- If using custom exchange configurations, ensure the exchange's fill fetcher is supported (Binance, Bybit, Bitget, GateIO, Hyperliquid, KuCoin, OKX)
44+
545
## v7.6.2 - 2026-01-20
646

747
### Fixed

docs/ai/debugging_case_studies.md

Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
# Debugging Case Studies
2+
3+
This document captures debugging sessions for complex issues, serving as a reference for future investigations.
4+
5+
## Case Study: Missing Bybit PnL Data (2026-01-25)
6+
7+
### Initial Symptom
8+
User reported that some Bybit close fills had `pnl: 0.0` while others had correct PnL values.
9+
10+
### Investigation Process
11+
12+
#### Step 1: Identify the Pattern
13+
```python
14+
# Load cached fill events and filter to closes
15+
closes = [x for x in events if x['side'] == 'sell' and x['position_side'] == 'long']
16+
zero_pnl = [c for c in closes if abs(c['pnl']) < 0.0001]
17+
```
18+
19+
Found: 1 XMR fill from Dec 30 had zero PnL, while Dec 28 and Jan 12+ fills had PnL.
20+
21+
#### Step 2: Check Raw Data
22+
Examined the `raw` field in the cached fill event:
23+
```python
24+
# Fill with zero PnL only had fetch_my_trades data, no positions_history
25+
event['raw']
26+
# [{'source': 'fetch_my_trades', 'data': {...}}]
27+
# Missing: {'source': 'positions_history', 'data': {...}}
28+
```
29+
30+
#### Step 3: Verify Data Exists on Exchange
31+
Created a test script to query Bybit directly:
32+
```python
33+
# scripts/check_missing_pnl.py
34+
params = {'category': 'linear', 'symbol': 'XMRUSDT', 'limit': 100, 'endTime': end_ms}
35+
result = await api.private_get_v5_position_closed_pnl(params)
36+
# Found the record! It exists on Bybit.
37+
```
38+
39+
**Key finding:** The closed-pnl record existed on Bybit but wasn't being fetched.
40+
41+
#### Step 4: Trace the Fetch Logic
42+
Added debug output to understand pagination:
43+
```python
44+
# Fetch #10: Dec 29 10:45 → Jan 03 14:23 (100 records)
45+
# Missing Dec 30 05:28 record should be in this window!
46+
```
47+
48+
**Key finding:** 128 records existed in the time window, but only 100 were fetched.
49+
50+
#### Step 5: Identify Root Cause
51+
The time-based pagination was skipping records:
52+
1. Fetch returns 100 records, oldest at timestamp T
53+
2. Next fetch uses `endTime = T`
54+
3. Records between T and the previous batch's oldest are skipped
55+
56+
#### Step 6: Test Cursor Pagination
57+
```python
58+
# Use cursor instead of time-based
59+
cursor = response.get('result', {}).get('nextPageCursor')
60+
params['cursor'] = cursor # Continue with cursor
61+
```
62+
63+
**Key finding:** Cursor pagination only covers ~7 days, then cursor becomes empty.
64+
65+
#### Step 7: Implement Hybrid Solution
66+
Combined both approaches:
67+
1. Use cursor pagination for recent data (no gaps)
68+
2. Fall back to time-based sliding window for older data
69+
3. Deduplicate by orderId
70+
71+
### Verification
72+
```
73+
Before fix: 387 records fetched (cursor-only), missing Dec 30 record
74+
After fix: 1434 records fetched (hybrid), all records including Dec 30
75+
```
76+
77+
### Key Lessons
78+
79+
1. **Don't trust CCXT wrappers blindly** - they may not expose all pagination mechanisms
80+
2. **Check raw API responses** - create test scripts to query directly
81+
3. **Understand pagination limits** - each exchange has different behaviors
82+
4. **Compare counts** - if you expect N records but get fewer, investigate
83+
5. **Check multiple endpoints** - the data might exist via different API calls
84+
85+
### Debug Scripts Created
86+
- `scripts/check_missing_pnl.py` - Query specific orderId directly
87+
- `scripts/debug_positions_history.py` - Trace pagination behavior
88+
- `scripts/verify_pagination_fix.py` - Verify fix works
89+
90+
These can be adapted for similar issues on other exchanges.
91+
92+
---
93+
94+
## Template for New Case Studies
95+
96+
### Initial Symptom
97+
(What the user reported or what was observed)
98+
99+
### Investigation Process
100+
1. Identify the pattern
101+
2. Check raw data
102+
3. Verify data exists at source
103+
4. Trace the code path
104+
5. Identify root cause
105+
6. Implement and verify fix
106+
107+
### Key Lessons
108+
(What was learned that applies to future debugging)

docs/ai/exchange_api_quirks.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
# Exchange API Quirks
2+
3+
This document catalogs known exchange API quirks, limitations, and workarounds discovered during development. When implementing exchange-specific code, check here first.
4+
5+
## Bybit
6+
7+
### Closed-PnL Pagination (Critical)
8+
9+
**Discovered:** 2026-01-25
10+
11+
**Problem:** Bybit's `/v5/position/closed-pnl` endpoint has two pagination mechanisms that behave differently:
12+
13+
1. **Cursor pagination** (`nextPageCursor`): Only covers ~7 days of recent data, then cursor becomes empty
14+
2. **Time-based pagination** (`endTime`): Can reach older data but may skip records when there are >100 records in a time window
15+
16+
**Symptoms:**
17+
- Close fills missing PnL (showing `pnl: 0.0`)
18+
- Inconsistent historical data - some old records present, others missing
19+
- CCXT's `fetch_positions_history` wrapper doesn't expose cursor, making it unreliable
20+
21+
**Root Cause:**
22+
When using time-based pagination alone:
23+
- If a time window has >100 records, only 100 are returned
24+
- Setting `endTime = oldest_timestamp_in_batch` for next request skips records between batches
25+
- Example: 128 records exist in window, only 100 fetched, 28 missed
26+
27+
**Solution:** Hybrid pagination in `BybitFetcher._fetch_positions_history`:
28+
1. Phase 1: Use cursor pagination for recent records (efficient, no gaps)
29+
2. Phase 2: When cursor exhausts (~7 days back), switch to time-based sliding window
30+
3. Deduplicate by orderId to handle overlap
31+
32+
**Code Reference:** `src/fill_events_manager.py` - `BybitFetcher._fetch_positions_history()`
33+
34+
**Testing:** Verified fetching 1434 records vs 387 (cursor-only) or 1200 (time-only with gaps)
35+
36+
### Closed-PnL Record Timing
37+
38+
Each close fill on Bybit immediately generates a closed-pnl record with `avgEntryPrice`. This means:
39+
- PnL can be computed per-fill, not just when position fully closes
40+
- The formula: `(exit_price - avgEntryPrice) * closedSize * direction - fees`
41+
- Old fills (>30 days) may have expired closed-pnl records on Bybit's servers
42+
43+
## Binance
44+
45+
(Add Binance-specific quirks here as discovered)
46+
47+
## General Patterns
48+
49+
### CCXT Wrapper Limitations
50+
51+
CCXT normalizes exchange APIs but sometimes loses important data:
52+
- Pagination cursors may not be exposed
53+
- Exchange-specific fields may be buried in `info` dict
54+
- Always check raw response (`trade.get("info", {})`) when CCXT fields are insufficient
55+
56+
### Debugging Missing Data
57+
58+
When data appears incomplete:
59+
1. **Query raw API directly** - bypass CCXT to see actual response
60+
2. **Check pagination** - are there more pages? Is cursor working?
61+
3. **Check time windows** - does the time range include the missing data?
62+
4. **Check data retention** - how long does the exchange keep this data?
63+
5. **Compare endpoints** - does another endpoint have the data?
64+
65+
See `docs/ai/debugging_case_studies.md` for detailed examples.

0 commit comments

Comments
 (0)