forked from Mirrowel/LLM-API-Key-Proxy
feat(quota): add initial implementation of the quota dashboard #1
Status: Open
Masood-Salik wants to merge 251 commits into FammasMaz:main from Masood-Salik:server.
Conversation
…ng_content separation

Introduce Gemini 3 special mechanics in AntigravityProvider:
- Append a constant thoughtSignature into functionCall payloads to preserve Gemini reasoning continuity.
- Filter out thoughtSignature parts from returned content to avoid exposing encrypted reasoning data.
- Separate parts flagged with thought=true into a new reasoning_content field while keeping regular content in content.
- Include thoughtsTokenCount in token accounting: prompt_tokens now includes reasoning tokens, and reasoning tokens are reported under completion_tokens_details.reasoning_tokens when present.
- Update comments, docstrings, and conversion logic to reflect Gemini 3 behavior.
- Rotate the Antigravity OAuth client secret in AntigravityAuthBase.
…token counting
Add a per-request file logger and reasoning configuration mapping to the Antigravity provider and expose a token counting helper.
- Introduce _AntigravityFileLogger to persist request payloads, streaming chunks, errors, and final responses under logs/antigravity_logs with timestamped directories.
- Add optional enable_request_logging kwarg to completion flow to enable per-call file logging; wire logger through streaming and non-streaming handlers.
- Log request payloads, raw response chunks, parse errors, and final unwrapped responses when enabled.
- Add _map_reasoning_effort_to_thinking_config to map reasoning_effort ('low'|'medium'|'high'|'disable'|None) to Gemini thinkingConfig for gemini-2.5 and gemini-3 families (budgets/levels and include_thoughts).
- Add count_tokens method that calls Antigravity :countTokens endpoint using transformed Gemini payloads and returns prompt/total token counts.
- Add cautionary comment about Claude parametersJsonSchema handling requiring investigation.
No behavioral breaking changes; new logging is opt-in via enable_request_logging and token counting is additive.
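The per-request file logger described above could be sketched roughly as follows. The class and method names, directory layout, and file names here are illustrative assumptions, not the provider's exact `_AntigravityFileLogger` implementation:

```python
# Hypothetical sketch of an opt-in per-request file logger. One timestamped
# directory per request keeps the payload, streamed chunks, and final
# response together for later inspection.
import json
import time
from pathlib import Path

class RequestFileLogger:
    def __init__(self, base_dir="logs/antigravity_logs"):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        self.dir = Path(base_dir) / stamp
        self.dir.mkdir(parents=True, exist_ok=True)
        self._chunk_count = 0

    def log_request(self, payload: dict) -> None:
        (self.dir / "request.json").write_text(json.dumps(payload, indent=2))

    def log_chunk(self, raw_chunk: str) -> None:
        # Number chunks so the stream order is preserved on disk.
        self._chunk_count += 1
        (self.dir / f"chunk_{self._chunk_count:04d}.txt").write_text(raw_chunk)

    def log_response(self, response: dict) -> None:
        (self.dir / "response.json").write_text(json.dumps(response, indent=2))
```

Per the commit, such a logger would only be constructed when the `enable_request_logging` kwarg is passed, so normal requests pay no I/O cost.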
…budget toggle
Introduce a consolidated mapping for reasoning effort targeted at Gemini 2.5 and Gemini 3 models:
- Replace older duplicated logic with a single _map_reasoning_effort_to_thinking_config that detects gemini-2.5 vs gemini-3.
- Gemini 2.5: map reasoning_effort to model-specific thinkingBudget values (pro/flash/fallback). Default auto = -1. Apply division by 4 unless kwargs['custom_reasoning_budget'] is True.
- Gemini 3: use string thinkingLevel ("low" or "high"), default to "high" when unspecified and do not allow disabling thinking.
- Return None for non-Gemini models to avoid changing other providers (e.g., Claude).
- Propagate a new custom_reasoning_budget toggle from kwargs to the mapping call.
- Add threading and os imports and remove the old obsolete mapping implementation.
BREAKING CHANGE: Gemini 3 thinkingConfig format and defaults changed:
- thinkingLevel is now a string ("low"/"high") instead of numeric levels. Update any code that inspects thinkingConfig thinkingLevel.
- Default thinking behavior for Gemini 3 is now "high" when reasoning_effort is omitted.
- The mapping function signature/behavior changed (added custom_reasoning_budget handling). If this method was called externally, update callers to pass the new parameter or rely on kwargs propagation.
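The consolidated mapping might look roughly like the sketch below. The model-family detection and the `custom_reasoning_budget` toggle come from the commit message; the concrete budget numbers and the divide-by-4 scaling are stated there too, but the exact values per model variant (pro/flash/fallback) are simplified here to a single illustrative table:

```python
# Sketch of the consolidated reasoning-effort -> thinkingConfig mapping.
# Budget values are illustrative, not the provider's exact numbers.

def map_reasoning_effort_to_thinking_config(model, reasoning_effort,
                                            custom_reasoning_budget=False):
    """Return a Gemini thinkingConfig dict, or None for non-Gemini models."""
    if "gemini-3" in model:
        # Gemini 3 uses a string thinkingLevel; thinking cannot be disabled,
        # and the default is "high" when no effort is specified.
        level = reasoning_effort if reasoning_effort in ("low", "high") else "high"
        return {"thinkingLevel": level, "includeThoughts": True}
    if "gemini-2.5" in model:
        budgets = {"low": 4096, "medium": 8192, "high": 32768}  # illustrative
        if reasoning_effort == "disable":
            return {"thinkingBudget": 0}
        budget = budgets.get(reasoning_effort, -1)  # -1 = auto
        if budget > 0 and not custom_reasoning_budget:
            budget //= 4  # scale down unless the caller opts into full budget
        return {"thinkingBudget": budget, "includeThoughts": True}
    return None  # leave other providers (e.g. Claude) untouched
```

Returning `None` for unrecognized models is what keeps the mapping from leaking Gemini-specific config into other providers' requests.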
…e thoughtSignature handling for Gemini 3
- Introduce ThoughtSignatureCache: TTL-based, thread-safe, auto-cleanup cache for mapping tool_call_id → thoughtSignature.
- Integrate cache into AntigravityProvider and add env toggles:
- ANTIGRAVITY_SIGNATURE_CACHE_TTL (default 3600s)
- ANTIGRAVITY_PRESERVE_THOUGHT_SIGNATURES (client passthrough)
- ANTIGRAVITY_ENABLE_SIGNATURE_CACHE (server-side caching)
- Update message transformation to accept model and implement a 3-tier thoughtSignature fallback:
1. client-provided signature
2. server-side cache
3. bypass constant ("skip_thought_signature_validator") with warning for Gemini 3
- Fix Gemini → OpenAI chunk conversion:
- Stop dropping function calls that include signatures (skip only standalone signature parts).
- Store signatures into server cache and optionally include them in responses when passthrough is enabled.
- Robustly parse tool responses, map finish reasons, and include reasoning token counts in usage.
- Improve tool response grouping and id generation; add informative logging for signature-preservation behavior
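A minimal version of the TTL-based, thread-safe cache described above could look like this. The class name mirrors the commit's `ThoughtSignatureCache`, but the internals are a sketch; the real implementation also runs periodic auto-cleanup rather than only evicting lazily on read:

```python
# Sketch of a TTL-based, thread-safe cache mapping tool_call_id to
# thoughtSignature. Expired entries are evicted lazily on lookup.
import threading
import time

class ThoughtSignatureCache:
    def __init__(self, ttl_seconds=3600):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._store = {}  # tool_call_id -> (signature, expires_at)

    def put(self, tool_call_id, signature):
        with self._lock:
            self._store[tool_call_id] = (signature, time.monotonic() + self._ttl)

    def get(self, tool_call_id):
        with self._lock:
            entry = self._store.get(tool_call_id)
            if entry is None:
                return None
            signature, expires_at = entry
            if time.monotonic() >= expires_at:
                del self._store[tool_call_id]  # lazily evict expired entries
                return None
            return signature
```

This slots in as tier 2 of the fallback chain: checked only when the client did not supply a signature, before falling back to the bypass constant.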
…tSignature and decouple cache/passthrough

Enforce the Gemini 3 behavior where only the first tool call in a parallel batch receives a thoughtSignature. Previously, caching and client passthrough were coupled and could result in multiple signatures being stored or passed. This change:
- Adds a first_signature_seen flag to ensure only the first tool call gets the signature.
- Stores the signature in the server-side cache only when _enable_signature_cache is true.
- Passes the signature to the client only when _preserve_signatures_in_client is true.
- Preserves logging when a signature is stored in the cache.
…y aliasing Add "claude-sonnet-4-5" and "claude-sonnet-4-5-thinking" to HARDCODED_MODELS and simplify the alias mappings by removing explicit alias entries for these Claude models since their public names match internal names. This ensures the provider recognizes the new Claude Sonnet variants and avoids incorrect alias translations.
- Add providers/google_oauth_base.py to centralize Google OAuth logic (auth flow, token refresh, env loading, atomic saves, backoff/retry, queueing, headless support, and validation).
- Migrate GeminiAuthBase and AntigravityAuthBase to inherit from GoogleOAuthBase and expose provider-specific constants (CLIENT_ID, CLIENT_SECRET, OAUTH_SCOPES, ENV_PREFIX, CALLBACK_PORT, CALLBACK_PATH).
- Register "antigravity" in DEFAULT_OAUTH_DIRS and mark it as OAuth-only in credential_tool; include a user-friendly display name for interactive flows.
- Remove large duplicated OAuth implementations from provider-specific files and consolidate behavior to reduce the maintenance surface and ensure consistent token handling.
…_token helper

Add opt-in dynamic model discovery controlled by ANTIGRAVITY_ENABLE_DYNAMIC_MODELS (default: false) to avoid relying on an unstable endpoint. When disabled, the provider returns the hardcoded model list; when enabled, it attempts to fetch models from the API and applies alias mappings. Add clear logging for enabled/disabled states and dynamic discovery results. Also introduce an async get_valid_token helper that loads credentials, refreshes expired tokens, and returns a valid access token for OAuth-style credential paths.
- New env var: ANTIGRAVITY_ENABLE_DYNAMIC_MODELS (false by default).
- Dynamic discovery returns discovered models prefixed with "antigravity/".
- The hardcoded fallback now returns names prefixed with "antigravity/".
- Added logs to indicate discovery mode and failures.
- Added async get_valid_token(credential_identifier) to centralize token refresh/load.

BREAKING CHANGE: Model names returned by the provider are now namespaced with the "antigravity/" prefix (e.g., "antigravity/xyz"). Update consumers to handle the new prefixed names or strip the prefix as needed. Dynamic discovery is disabled by default; enable it with ANTIGRAVITY_ENABLE_DYNAMIC_MODELS=true if desired.
…edential save

- Handle system prompt content as either a string or a list, and strip Claude-specific cache_control fields to avoid 400 errors.
- Safely parse tool content (JSON or raw) and wrap function responses consistently.
- Treat the merged function-response role as "user" to match Antigravity expectations.
- Add a tool_call index for the OpenAI streaming format and track indices for parallel tool calls.
- Strip the provider prefix from model names and add a streaming query param (?alt=sse) when streaming.
- Include Host and User-Agent headers, set Accept based on streaming, and log error response bodies for easier debugging.
- Convert OpenAI-style chunks into litellm.ModelResponse objects before yielding in the stream handler.
- Make credential persistence in the Gemini CLI provider async (await _save_credentials).
…nd strip unsupported fields

Remove the dependency on _build_vertex_schema and align tool handling with the Go reference implementation. For function-type tools, build a function declaration with name, description, and a parametersJsonSchema field:
- copy parameters when present and remove OpenAI-specific keys (`$schema`, `strict`);
- default to an empty object schema when parameters are missing;
- avoid mutating the original parameters and embed the declaration in `functionDeclarations`.

This ensures Antigravity-compatible tool payloads and fixes schema/compatibility issues when passing tool definitions.
…mas, and fix Gemini tool conversion

- Rename _normalize_json_schema → _normalize_type_arrays and convert JSON Schema "type" arrays (e.g. ["string","null"]) to a single non-null type to avoid protobuf "non-repeating" errors.
- Add a recursive Claude-specific schema cleaner and rename parametersJsonSchema → parameters for claude-sonnet-* models, stripping incompatible fields that break Claude validation.
- Ensure the thoughtSignature preservation logic keeps proper first-seen handling.
- Inline generation of project/request IDs when fetching models.
- Replace Vertex helper usage when building Gemini tool declarations: copy/clean parameters, set a safe default parametersJsonSchema, and call _normalize_type_arrays for compatibility.
…ignature handling to gemini-3 Add "id" to functionCall and response objects required by Antigravity/Claude integrations. Restrict preservation/insertion of thoughtSignature to Gemini 3 models only: prefer client-provided signature, fall back to the server-side cache when enabled, and finally use the bypass constant "skip_thought_signature_validator". Emit a warning when a Gemini 3 tool call lacks a signature. Avoid adding thoughtSignature for Claude and other models to prevent sending unsupported fields.
Add an environment-controlled override that modifies requests with `temperature: 0` for chat completions when `OVERRIDE_TEMPERATURE_ZERO` is enabled (default: "false").
- Supported modes: "remove" — delete the `temperature` key; "set"/"true"/"1"/"yes" — set temperature to 1.0.
- Rationale: temperature=0 makes models overly deterministic and can cause tool hallucination; the override helps mitigate that when toggled.
- Emits debug logs when an override is applied.
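The override described above is a small request-payload transform; a sketch under the commit's stated modes (the env var name is from the commit, the helper name is illustrative):

```python
# Sketch of the OVERRIDE_TEMPERATURE_ZERO handling: rewrite or drop
# temperature=0 depending on the configured mode.
import os

def apply_temperature_zero_override(payload: dict) -> dict:
    mode = os.getenv("OVERRIDE_TEMPERATURE_ZERO", "false").lower()
    if payload.get("temperature") != 0 or mode == "false":
        return payload  # nothing to do
    if mode == "remove":
        payload.pop("temperature", None)  # delete the key entirely
    elif mode in ("set", "true", "1", "yes"):
        payload["temperature"] = 1.0  # nudge the model away from determinism
    return payload
```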
…tem-instruction) to reduce tool hallucination

Introduce a configurable "Gemini 3" catch-all fix that enforces schema-driven tool usage and reduces tool hallucination by:
- adding the env-configurable flag ANTIGRAVITY_GEMINI3_TOOL_FIX (default ON) and related vars for the prefix, description prompt, and system instruction;
- implementing namespace prefixing for tool names to break model training associations;
- injecting strict parameter signatures into tool descriptions to force schema adherence;
- prepending configurable system instructions for Gemini 3 models to override training-data assumptions;
- normalizing request/response names (prefix/strip) and preserving function call ids for API consistency;
- applying transformations only to gemini-3-* models and logging configuration details.

This change improves robustness when calling external tools by making tool schemas explicit to the model.
Implement a dual-TTL caching system with async disk persistence to improve thoughtSignature handling across server restarts and long-running sessions.
- Add disk persistence using atomic file writes with the tempfile pattern for data integrity.
- Implement a dual-TTL system: 1-hour memory cache, 24-hour disk cache.
- Create background async tasks for periodic disk writes and memory cleanup.
- Add a disk fallback for cache misses (loads from disk into memory).
- Introduce cache statistics tracking (memory hits, disk hits, misses, writes).
- Add graceful shutdown with a pending-write flush.
- Convert cache operations from threading.Lock to asyncio.Lock for async support.
- Add environment variables for configurable write/cleanup intervals.
- Implement secure file permissions (0o600) for cache files.
- Add comprehensive logging for cache lifecycle events.

The cache now survives server restarts and better supports multi-turn conversations by persisting thoughtSignatures to disk. The memory cache expires after 1 hour to prevent unbounded growth, while the disk cache persists for 24 hours to support longer conversation sessions.
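The atomic tempfile write pattern mentioned above (write to a temp file, then rename over the target) can be sketched as follows; the helper name and path handling are illustrative:

```python
# Sketch of an atomic JSON write: readers never observe a half-written
# cache file, because os.replace swaps the file in atomically.
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory as the target so the
    # final rename stays on one filesystem (rename is only atomic then).
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.chmod(tmp_name, 0o600)  # cache may hold signatures; restrict access
        os.replace(tmp_name, path)  # atomic swap on POSIX and Windows
    except BaseException:
        try:
            os.unlink(tmp_name)  # clean up the orphaned temp file
        except OSError:
            pass
        raise
```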
… in tool args
- Extend reasoning/thinking mapping to include Claude alongside Gemini 2.5 and Gemini 3:
- Claude now uses `thinkingBudget` (same handling as Gemini 2.5, including pro budgets).
- Gemini 3 continues to use `thinkingLevel`.
- Add a static helper `_recursively_parse_json_strings` to detect and parse JSON-stringified values returned by Antigravity (e.g., `{"files": "[{...}]"}`) and recursively restore proper structures.
- Use parsed arguments before `json.dumps()` when building tool call payloads to prevent double-encoding and JSON parsing errors from Antigravity responses.
- Update .gitignore to add `launcher_config.json` and `cache/antigravity/thought_signatures.json` and remove the previous `*.log` ignore entry.
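The recursive un-stringifying helper described above could be sketched like this; the name mirrors the commit's `_recursively_parse_json_strings`, but the details are assumptions:

```python
# Sketch: walk a structure and re-parse any string values that are
# themselves JSON (e.g. {"files": "[{...}]"}), recursively.
import json

def recursively_parse_json_strings(value):
    if isinstance(value, dict):
        return {k: recursively_parse_json_strings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [recursively_parse_json_strings(v) for v in value]
    if isinstance(value, str) and value[:1] in ("{", "["):
        try:
            # A successful parse may itself contain stringified JSON,
            # so recurse on the result too.
            return recursively_parse_json_strings(json.loads(value))
        except json.JSONDecodeError:
            return value  # looked like JSON but was not; keep as-is
    return value
```

Using the parsed structure before the final `json.dumps()` is what prevents the double-encoding the commit describes.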
…ravity cache handling

- Split the single signature cache into separate files: `GEMINI3_SIGNATURE_CACHE_FILE` and `CLAUDE_THINKING_CACHE_FILE`.
- Replace `ThoughtSignatureCache` with `AntigravityCache`; the disk persistence file is now passed via a `cache_file` constructor argument, and in-memory entries are keyed by generic cache keys.
- Introduce a stable key generator (`_generate_thinking_cache_key`) that combines tool call IDs and text hashes for Claude thinking caching.
- Add separate caches for Gemini 3 signatures (`_signature_cache`) and Claude thinking content (`_thinking_cache`), and wire caching into both streaming and non-streaming flows.
- Accumulate reasoning content, tool calls, and the final `thoughtSignature` during streaming (via `stream_accumulator`) and persist complete Claude thinking after the stream (`_cache_claude_thinking_after_stream`).
- Inject cached Claude "thinking" parts into assistant messages when available (with signature fallback handling).
- Use tool-provided IDs when present (falling back to generated `call_<uuid>` IDs), fix the skipping logic for signature-only parts, and accumulate tool calls/text for reliable cache keys.
- Adjust the reasoning budget division from `// 4` to `// 6` to reduce the default thinking budget.
- Update the `_gemini_to_openai_chunk` signature to accept an optional `stream_accumulator` and propagate the accumulator through the streaming logic.

BREAKING CHANGE: `ThoughtSignatureCache` has been removed/renamed to `AntigravityCache` and its constructor now requires a `cache_file: Path` argument. Update any external imports/usages:
- Replace `ThoughtSignatureCache(...)` with `AntigravityCache(cache_file=GEMINI3_SIGNATURE_CACHE_FILE|CLAUDE_THINKING_CACHE_FILE, memory_ttl_seconds=..., disk_ttl_seconds=...)`.
- New cache constants `GEMINI3_SIGNATURE_CACHE_FILE` and `CLAUDE_THINKING_CACHE_FILE` were added; use the new names if relying on disk cache paths.
… tier-based onboarding

This commit refactors the project discovery logic to strictly follow the official Gemini CLI behavior, fixing critical issues with paid-tier support and free-tier onboarding.

Key changes:
- Implement the proper discovery flow: cache → configured override → persisted credentials → loadCodeAssist check → tier-based onboarding → fallback.
- Fix paid-tier support: paid tiers now correctly use the configured project_id instead of server-managed projects.
- Fix free-tier onboarding: the free tier correctly passes cloudaicompanionProject=None for server-managed projects.
- Add comprehensive tier detection: check currentTier from the server response and respect the userDefinedCloudaicompanionProject flag.
- Improve error handling: add specific error messages for 412 (precondition failed) and better guidance for a missing project_id on paid tiers.
- Add detailed debug logging: log all tier information, server responses, and the decision flow for troubleshooting.
- Add paid-tier visibility: log paid-tier usage on each request for transparency.
- Remove noisy debug logging: disable verbose chunk-conversion logs.

The previous implementation incorrectly assumed all users should use server-managed projects and failed to distinguish between free-tier (server-managed) and paid-tier (user-provided) project handling. This caused 403/412 errors for paid users and an incorrect onboarding flow for free users.
… organization and documentation

This is a major refactoring of the Antigravity provider that significantly improves code structure, readability, and maintainability without changing functionality.

Key improvements:
- Reorganized code into logical sections with clear separators (configuration, utilities, caching, transformations, API interface).
- Consolidated helper functions under consistent naming patterns (underscore prefix for internal methods).
- Simplified complex methods by extracting reusable components (e.g., _parse_content_parts, _extract_tool_call, _format_type_hint).
- Enhanced documentation with a comprehensive module docstring explaining features and capabilities.
- Streamlined environment variable handling with dedicated helpers (_env_bool, _env_int).
- Improved type hints and method signatures for better IDE support.
- Reduced code duplication in the message transformation logic.
- Consolidated tool schema transformations into focused methods.
- Better separation of concerns between streaming and non-streaming response handling.
- Standardized error handling and logging patterns.
- Improved the cache implementation with a clearer separation of responsibilities.

The refactoring maintains full backward compatibility while making the codebase significantly easier to understand, test, and extend. All existing features, including Gemini 3 thoughtSignature preservation, Claude thinking caching, tool hallucination prevention, and base URL fallback, remain fully functional.
…module

Extracted the AntigravityCache class into a new shared ProviderCache module to eliminate code duplication and improve maintainability across providers.
- Created src/rotator_library/providers/provider_cache.py with a generic, reusable cache implementation.
- Removed 266 lines of cache-specific code from antigravity_provider.py.
- Updated AntigravityProvider to use ProviderCache for both the signature and thinking caches.
- Added a configurable env_prefix parameter for flexible environment variable namespacing.
- Improved cache naming with _cache_name for better logging context.
- Added a convenience factory function, create_provider_cache(), for streamlined cache creation.
- Removed unused imports (shutil, tempfile) from antigravity_provider.py.
- Updated .gitignore to include the cache/ directory.

The new ProviderCache maintains full backward compatibility with the previous AntigravityCache implementation while providing a more modular, reusable foundation for other providers.
…automatic -thinking mapping

This commit streamlines the handling of Claude Sonnet 4.5 model variants by automatically mapping the base model to its -thinking variant when reasoning_effort is provided.
- Remove the explicit "claude-sonnet-4-5-thinking" entry from AVAILABLE_MODELS.
- Add inline documentation explaining the internal mapping behavior.
- Implement automatic model-variant selection in _transform_to_antigravity_format based on the reasoning_effort parameter.
- Thread the reasoning_effort parameter through the generate_content call chain.
- Check for the base claude-sonnet-4-5 model and append the "-thinking" suffix when reasoning_effort is present.

This improves the API surface by reducing redundant model options while maintaining full functionality through runtime model selection.
…ure caching

This commit integrates comprehensive support for `gemini-3-pro-preview`, addressing specific requirements for reasoning models and tool reliability.
- Update the `AntigravityProvider` and `GeminiCliProvider` model lists to prioritize Gemini 3.
- Implement a "Tool Fix" mechanism to prevent parameter hallucinations:
  - Inject strict parameter signatures and type hints into tool descriptions.
  - Add specific system instructions to enforce schema adherence.
  - Apply `gemini3_` namespace prefixing to isolate tool contexts.
- Integrate `ProviderCache` to persist `thoughtSignature` values, ensuring reasoning continuity during tool execution.
- Refactor `_handle_reasoning_parameters` to support Gemini 3's `thinkingLevel` (string) alongside Gemini 2.5's `thinkingBudget` (integer).
- Add environment variable configuration for cache TTLs and feature flags.
…quest payload The `model` and `project` parameters were being incorrectly included at the top level of the request payload. These fields are not part of the Gemini API request body structure and should only be used for endpoint construction or authentication context.
…g for Antigravity

- Change the reasoning-parameters log from info to debug level in main.py.
- Move reasoning-parameters logging outside the logger conditional block for consistent monitoring.
- Enhance the _clean_claude_schema documentation to clarify that it targets Antigravity/Google's Proto-based API.
- Add support for converting 'const' to a single-value 'enum' in schema cleaning.
- Improve code organization with better comments explaining unsupported fields.

These changes improve logging granularity and JSON Schema compatibility with Antigravity's Proto-based API requirements.
…model switches

This commit introduces intelligent handling of Claude's thinking mode when switching models mid-conversation during incomplete tool-use loops.

**New Features:**
- Auto-detection of incomplete tool turns (when messages end with tool results and no assistant completion).
- Configurable turn-completion injection via `ANTIGRAVITY_AUTO_INJECT_TURN_COMPLETION` (default: true).
- Configurable thinking-mode suppression via `ANTIGRAVITY_AUTO_SUPPRESS_THINKING` (default: false).
- Customizable turn-completion placeholder text via `ANTIGRAVITY_TURN_COMPLETION_TEXT` (default: "...").

**Implementation Details:**
- `_detect_incomplete_tool_turn()`: analyzes message history to identify incomplete tool-use patterns.
- `_inject_turn_completion()`: appends a synthetic assistant message to close incomplete turns.
- `_handle_thinking_mode_toggle()`: orchestrates the toggling strategy based on configuration.

**Behavior:** when switching to Claude with thinking mode enabled during an incomplete tool loop:
1. If auto-injection is enabled: inject a completion message so thinking mode can proceed.
2. If auto-suppression is enabled: disable thinking mode to prevent API errors.
3. If both are disabled: allow the request to proceed (likely resulting in an API error).

This resolves API compatibility issues when transitioning between models with different conversation-state requirements.
The generic key handling logic was incorrectly concatenating the 'role' field when processing streaming message chunks. The role field should always be replaced with the latest value, not concatenated like content fields. This fix adds an explicit check to ensure the 'role' key is always overwritten rather than appended to, preventing malformed role values in the final message object.
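The fix above amounts to special-casing `role` in the delta-merge loop. A sketch with an illustrative helper name (the real merge logic lives in the streaming chunk handler):

```python
# Sketch of streaming delta merging: string fields accumulate across
# chunks, but 'role' must always be replaced with the latest value.

def merge_delta(message: dict, delta: dict) -> dict:
    for key, value in delta.items():
        if key == "role":
            message[key] = value  # always replace, never concatenate
        elif isinstance(value, str) and isinstance(message.get(key), str):
            message[key] += value  # content-like fields accumulate
        else:
            message[key] = value
    return message
```

Without the `role` branch, two chunks each carrying `"role": "assistant"` would produce the malformed value `"assistantassistant"`.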
Antigravity sometimes returns malformed JSON strings with extra trailing characters (e.g., '[{...}]}' instead of '[{...}]'). This enhancement extends the JSON parsing logic to automatically detect and correct such malformations by:
- Detecting JSON-like strings that don't have proper closing delimiters
- Finding the last valid closing bracket/brace and truncating extra characters
- Logging warnings when auto-correction is applied for debugging purposes
- Recursively parsing the corrected JSON structures
This prevents parsing failures when Antigravity returns double-encoded or malformed JSON in tool arguments.
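The auto-correction described above can be sketched as a backwards scan for the longest prefix that parses cleanly. This is a simplification of the approach (a real implementation would also handle strings ending mid-token); the helper name is illustrative:

```python
# Sketch: if the full string fails to parse, truncate trailing junk after
# the last closing bracket/brace that yields valid JSON.
import json

def parse_with_truncation_repair(text: str):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Walk backwards, trying each closing delimiter as the end of the value.
    for end in range(len(text), 0, -1):
        if text[end - 1] in "]}":
            try:
                return json.loads(text[:end])
            except json.JSONDecodeError:
                continue
    raise ValueError(f"unrecoverable JSON: {text!r}")
```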
…dentials

The `_get_provider_instance` method now checks whether credentials exist for a provider before attempting initialization. This prevents errors from initializing providers that lack proper configuration.
- Added a credential-existence check at the start of the method.
- Returns `None` early if provider credentials are not configured.
- Added debug logging to indicate when provider initialization is skipped.
- Enhanced the docstring with detailed Args and Returns documentation.

This change improves system robustness by failing gracefully when providers are referenced but not properly configured.
This commit removes the thinking-mode toggling functionality that was previously used to handle model switches mid-conversation when tool-use loops were incomplete.
- Removed the `_detect_incomplete_tool_turn`, `_inject_turn_completion`, and `_handle_thinking_mode_toggle` helper methods.
- Removed the environment variables for turn-completion behavior (`ANTIGRAVITY_AUTO_INJECT_TURN_COMPLETION`, `ANTIGRAVITY_AUTO_SUPPRESS_THINKING`, `ANTIGRAVITY_TURN_COMPLETION_TEXT`).
- Removed the thinking-mode toggle logic from the `acompletion` method.
- Added a provider prefix to the JSON auto-correction warning log for better debugging.

The removed feature was designed to automatically close incomplete tool-use loops when switching to Claude models with thinking mode enabled, but it proved too buggy in practice.
…odel access

Implements a comprehensive credential prioritization system that lets providers enforce tier-based access controls and optimize credential selection based on account type.

Key changes:
- Added `get_credential_priority()` and `get_model_tier_requirement()` methods to ProviderInterface, allowing providers to define credential tiers and model restrictions.
- Enhanced UsageManager.acquire_key() to respect credential priorities, always attempting the highest-priority credentials before falling back to lower tiers.
- Implemented Gemini-specific tier detection in GeminiCliProvider, mapping paid-tier credentials to priority 1, free tier to priority 2, and unknown to priority 10.
- Added model-based filtering in RotatingClient to exclude incompatible credentials before acquisition (e.g., Gemini 3 models require paid-tier credentials).
- Improved logging to show priority-aware credential selection and tier-compatibility warnings.

The system gracefully handles unknown credential tiers by treating them as potentially compatible until their actual tier is discovered on first use. Within each priority level, load balancing by usage count is preserved.
Configure JSON file logging driver with 10MB max size and 3 file rotation for both nginx-proxy-manager and llm-proxy containers to prevent unbounded log growth.
Tool results with images (e.g., from the Read tool) were being dropped during Anthropic→OpenAI translation and were not properly converted to Gemini format.
- translator.py: extract image blocks from tool_result content and convert them to the OpenAI image_url format.
- antigravity_provider.py: handle multimodal tool responses by converting image_url to the Gemini inlineData format.
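The image_url → inlineData conversion mentioned above could be sketched as follows, assuming the OpenAI-side value is a base64 data URL (`data:image/png;base64,...`); the helper name is illustrative:

```python
# Sketch: split a data URL into its MIME type and base64 payload and wrap
# them in Gemini's inlineData part shape.

def image_url_to_inline_data(image_url: str) -> dict:
    header, _, b64 = image_url.partition(",")
    # header looks like "data:image/png;base64"
    mime = header.removeprefix("data:").split(";")[0]
    return {"inlineData": {"mimeType": mime, "data": b64}}
```

Non-data URLs (plain `https://` links) would need to be fetched or skipped separately, which is why the provider logs and skips them elsewhere.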
…irrowel#52) Adds 'propertyNames' to the list of JSON Schema validation keywords that are stripped from tool schemas when converting for Claude via the Antigravity provider. This keyword is not supported by Google's Proto-based API and was causing 400 Bad Request errors with nested object schemas. Closes Mirrowel#48 Co-authored-by: mirrobot-agent[bot] <[email protected]>
Set @Mirrowel as the code owner for all files.
Adjusts the default quota consumption rates to reflect updated usage limits effective 2025-12-30.
- **Standard Tier**:
  - Claude/GPT-OSS group: cost increased from 0.40 to 0.67 (reducing capacity to ~150 requests).
  - Gemini 3 Pro group: cost increased from 0.25 to 0.42 (reducing capacity to ~240 requests).
- **Free Tier**:
  - Claude/GPT-OSS group: cost increased from 1.333 to 2.0 (reducing capacity to 50 requests).
  - Gemini 3 Pro group: cost increased from 0.40 to 0.67 (reducing capacity to ~150 requests).
- Force the default Claude thinking budget to 31999 when thinking is enabled.
- Inject an interleaved-thinking hint for Claude tool calls.
- Log request headers and raw/unwrapped Claude responses for debugging.
- Preserve thinking signatures across the Anthropic compat translation.
- Improve thinking-signature validation/filtering in the Antigravity provider.

Signed-off-by: Moeeze Hassan <[email protected]>
Pass through the exact budget_tokens value from the Anthropic request instead of using a hardcoded constant. This allows Claude Code and other clients to control the thinking budget directly.

Changes:
- translator.py: pass thinking_budget from request.thinking.budget_tokens.
- antigravity_provider.py: accept and use the thinking_budget parameter in _get_thinking_config(), falling back to the default if not provided.

Signed-off-by: Moeeze Hassan <[email protected]>
When thinking is enabled but the last assistant message has no thinking block AND no tool calls (simple text response), Claude API rejects with "Expected thinking but found text". Add synthetic user message to start a fresh turn, allowing thinking to be generated naturally. Signed-off-by: Moeeze Hassan <[email protected]>
Require a thinking block before each tool call and after tool results for Claude interleaved thinking. Signed-off-by: Moeeze Hassan <[email protected]>
Signed-off-by: Moeeze Hassan <[email protected]>
…config Claude models always return early before reaching the model-specific budgets section, making the `or is_claude` condition dead code.
Logs a debug message when skipping non-data URL images, helping developers troubleshoot why images may not appear in requests.
Google's promptTokenCount INCLUDES cached tokens, but Anthropic's input_tokens EXCLUDES cached tokens. This fix:
- Extracts cachedContentTokenCount from Google's usageMetadata.
- Subtracts cached tokens from input_tokens in responses.
- Includes cache_read_input_tokens and cache_creation_input_tokens.
- Applies to both streaming and non-streaming responses.
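The conversion above can be sketched as a small usage-mapping helper. The field names on each side follow the Google and Anthropic usage schemas; the helper itself and the zero default for cache-creation tokens are illustrative assumptions:

```python
# Sketch: convert Google usageMetadata to Anthropic-style usage, excluding
# cached tokens from input_tokens and reporting them separately.

def google_usage_to_anthropic(usage_metadata: dict) -> dict:
    prompt = usage_metadata.get("promptTokenCount", 0)
    cached = usage_metadata.get("cachedContentTokenCount", 0)
    return {
        "input_tokens": prompt - cached,  # Anthropic excludes cached tokens
        "output_tokens": usage_metadata.get("candidatesTokenCount", 0),
        "cache_read_input_tokens": cached,
        "cache_creation_input_tokens": 0,  # assumption: not reported here
    }
```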
1. Session ID for Prompt Caching (High Priority)
   - Derive a stable session ID from the first user message's hash.
   - Enables prompt-caching continuity across conversation turns.
   - Falls back to a random ID if no user message is found.
2. Content Reordering (Medium Priority)
   - Reorder assistant content blocks: thinking → text → tool_use.
   - Matches Anthropic's expected ordering.
   - Sanitizes thinking blocks by removing cache_control.
3. Document/PDF Handling (Low Priority)
   - Support 'document' type content blocks.
   - Convert base64/URL documents to the OpenAI image_url format.
   - Default media type: application/pdf.
4. Gemini Output Token Cap (Low Priority)
   - Add a GEMINI_MAX_OUTPUT_TOKENS constant (16384).
   - Cap maxOutputTokens for non-Claude models.
   - Prevents errors from exceeding Gemini limits.
5. Schema Sanitization Improvements (Low Priority)
   - Add _score_schema_option() for smarter anyOf/oneOf selection.
   - Add _merge_all_of() to properly merge allOf schemas.
   - Add description hints when flattening union types.
   - Select the best option (objects > arrays > primitives > null).
Use structured format with CRITICAL prefix and bullet points to reduce skipped thinking blocks between tool calls. Signed-off-by: Moeeze Hassan <[email protected]>
No description provided.