@@ -198,9 +198,69 @@ The bot service calls the Byte API for database operations. See `src/byte_bot/ap
198198- `create_guild(guild_data: dict)` - Create new guild
199199- `update_guild(guild_id: str, guild_data: dict)` - Update guild
200200- `delete_guild(guild_id: str)` - Delete guild
201+ - `health_check()` - Check API service health
201202
202203All database operations go through the API - the bot does not directly access the database.
203204
205+ ### API Client Retry Logic
206+
207+ The bot's API client implements automatic retry logic with exponential backoff to improve reliability when communicating with the API service.
208+
209+ #### Configuration
210+
211+ - **Max Retries**: 3 attempts per request
212+ - **Backoff**: Exponential (1s, 2s, 4s, max 10s between retries)
213+ - **Retryable Errors**: HTTP 5xx (server errors), connection errors
214+ - **Non-Retryable**: HTTP 4xx (client errors like 400, 401, 404)
215+ - **Timeout**: 10s request timeout
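
As a rough illustration of the backoff settings listed here, the delay schedule can be sketched in plain Python. The function name `backoff_delays` and its parameters are hypothetical and are not taken from the Byte codebase. The sketch assumes the delay starts at 1s, doubles on each retry, and is capped at 10s.

```python
def backoff_delays(retries: int = 3, base: float = 1.0, cap: float = 10.0) -> list[float]:
    """Return the wait in seconds before each retry: base * 2**n, capped at `cap`."""
    return [min(base * 2**n, cap) for n in range(retries)]


print(backoff_delays())   # [1.0, 2.0, 4.0]
print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

With the documented limit of 3 retries the 10s cap is never reached; it would only matter if the retry count were raised.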
216+
217+ #### Retry Behavior
218+
219+ **Retryable Errors (Automatic Retry):**
220+
221+ The client automatically retries on:
222+
223+ - HTTP 5xx errors: 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout
224+ - Connection errors: `ConnectError`, `ConnectTimeout`, `ReadTimeout`
225+
226+ **Non-Retryable Errors (Immediate Failure):**
227+
228+ The client does **not** retry on:
229+
230+ - HTTP 4xx errors: 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found
231+ - These errors indicate client-side issues that won't be resolved by retrying
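
The retry rules above could be expressed roughly as follows. This is a minimal standard-library sketch, not the client's actual implementation: `StatusError`, `is_retryable`, and `call_with_retry` are hypothetical names, and the built-in `ConnectionError` and `TimeoutError` stand in for the connection errors named above.

```python
import asyncio


class StatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc: Exception) -> bool:
    """Retry 5xx responses and connection-level failures; never retry 4xx."""
    if isinstance(exc, StatusError):
        return 500 <= exc.status_code < 600
    return isinstance(exc, (ConnectionError, TimeoutError))


async def call_with_retry(func, attempts=3, base=1.0, cap=10.0, sleep=asyncio.sleep):
    """Await func(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception as exc:
            # Give up on the last attempt or on any non-retryable error.
            if attempt == attempts or not is_retryable(exc):
                raise
            await sleep(min(base * 2 ** (attempt - 1), cap))
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets tests substitute a fake sleep and check the delays without waiting.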
232+
233+ #### Retry Statistics
234+
235+ Access retry statistics via `api_client.retry_stats`:
236+
237+ ```python
238+ client = ByteAPIClient("http://api:8000")
239+
240+ # Make some requests...
241+ await client.create_guild(...)
242+ await client.get_guild(...)
243+
244+ # Check stats
245+ print(client.retry_stats)
246+ # {
247+ # "total_retries": 5, # Total retry attempts across all methods
248+ # "failed_requests": 2, # Failed requests (4xx errors)
249+ # "retried_methods": { # Retries per method
250+ # "create_guild": 3,
251+ # "get_guild": 2,
252+ # }
253+ # }
254+ ```
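
One way such counters might be maintained is sketched below. The `RetryStats` class and its method names are hypothetical; only the three dictionary keys come from the example output above, and the real client may track them differently.

```python
from collections import Counter


class RetryStats:
    """Hypothetical counter object mirroring the keys of `retry_stats`."""

    def __init__(self) -> None:
        self.total_retries = 0
        self.failed_requests = 0
        self.retried_methods = Counter()

    def record_retry(self, method: str) -> None:
        """Count one retry attempt, both overall and for the given method."""
        self.total_retries += 1
        self.retried_methods[method] += 1

    def record_failure(self) -> None:
        """Count one request that failed without being retried."""
        self.failed_requests += 1

    def as_dict(self) -> dict:
        """Return a plain-dict snapshot in the same shape as `retry_stats`."""
        return {
            "total_retries": self.total_retries,
            "failed_requests": self.failed_requests,
            "retried_methods": dict(self.retried_methods),
        }


stats = RetryStats()
stats.record_retry("create_guild")
stats.record_failure()
print(stats.as_dict())
```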
255+
256+ #### Logging
257+
258+ The client logs retry attempts at **WARNING** level before each retry:
259+
260+ ```
261+ WARNING Retrying byte_bot.api_client.ByteAPIClient.create_guild in 1 seconds as it raised HTTPStatusError: Server error '500 Internal Server Error'
262+ ```
263+
204264## Testing
205265
206266```bash
@@ -210,23 +270,25 @@ uv run pytest services/bot/tests
210270
211271## Status
212272
213- **Phase 1.2**: ✅ Complete
273+ **Phase 3.2**: ✅ Complete
214274
215275- [x] Bot service extracted from monolith
216276- [x] All plugins migrated to `services/bot/src/byte_bot/plugins/`
217277- [x] All views migrated to `services/bot/src/byte_bot/views/`
218- - [x] API client implemented
278+ - [x] API client implemented with retry logic
219279- [x] Configuration system set up
220280- [x] Entry points configured
221281- [x] Import errors fixed
222282- [x] Dockerfile created
223- - [ ] Tests implemented (TODO: Phase 1.3)
283+ - [x] Comprehensive test suite (25+ retry tests)
284+ - [x] Retry logic with exponential backoff
285+ - [x] Retry statistics tracking
224286
225287## Next Steps
226288
227- - **Phase 1.3**: Implement comprehensive test suite
228- - **Phase 2**: Remove old monolith code (`byte_bot/byte/`)
229- - **Phase 3**: Deploy bot and API services independently
289+ - **Phase 3.3**: Implement circuit breaker pattern (optional)
290+ - **Phase 4**: Add caching layer to API client
291+ - **Phase 5**: Implement health checks and monitoring
230292
231293## License
232294