@Stijnus Stijnus commented Aug 29, 2025

πŸ”§ Fix Token Limits & Invalid JSON Response Errors

Issues Resolved

  • ❌ Invalid JSON Response Errors - AI SDK streaming failures causing malformed responses
  • ❌ Token Limit API Rejections - Models hitting API limits due to incorrect configurations
  • ❌ Outdated Model Configurations - Static models with severely underestimated token limits
  • ❌ Poor Error Messages - Generic errors that didn't help users troubleshoot

Root Causes Identified

  1. Incorrect Token Limits: Models configured with 8k limits instead of actual 128k-2M context windows
  2. Streaming Error Handling: Insufficient error detection in AI SDK response processing
  3. Model Validation: No validation of model capabilities before API calls
  4. Static Model Bloat: 40+ hardcoded models causing maintenance overhead

🎯 Solutions Implemented

1. Accurate Token Limits & Context Sizes

Updated all providers with their actual context window capabilities:

| Provider | Model | Before | After | Improvement |
| --- | --- | --- | --- | --- |
| OpenAI | GPT-4o | 8k | 128k | 16x increase |
| OpenAI | GPT-3.5-turbo | 8k | 16k | 2x increase |
| Anthropic | Claude 3.5 Sonnet | 8k | 200k | 25x increase |
| Anthropic | Claude 3 Haiku | 8k | 200k | 25x increase |
| Google | Gemini 1.5 Pro | 8k | 2M | 250x increase |
| Google | Gemini 1.5 Flash | 8k | 1M | 125x increase |
| Groq | Llama 3.1/3.3 | 8k | 128k | 16x increase |
| Together | Llama 3.2 90B | 8k | 128k | 16x increase |
| OpenRouter | Claude 3.5 Sonnet | 8k | 200k | 25x increase |
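
For reference, a minimal sketch of what an updated static model entry could look like after this change; the `maxTokenAllowed` field name, model ID, and label are illustrative, not necessarily the exact `ModelInfo` shape used in the codebase:

// Hypothetical static model entry reflecting the corrected context limit
const anthropicStaticModels = [
  {
    name: 'claude-3-5-sonnet-latest',          // provider model ID (illustrative)
    label: 'Claude 3.5 Sonnet (200k context)', // shown in the model picker
    provider: 'Anthropic',
    maxTokenAllowed: 200000,                   // previously configured as 8000
  },
];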

2. Dynamic Model Intelligence

// Smart context detection from provider APIs
// (m is one model entry from the provider's model-list response)
if (m.context_length) {
  contextWindow = m.context_length; // OpenAI / OpenRouter APIs
} else if (m.inputTokenLimit) {
  contextWindow = m.inputTokenLimit; // Google API
} else if (m.max_tokens) {
  contextWindow = m.max_tokens; // Anthropic API
}
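
As a self-contained illustration of the same idea, the detection can be expressed as a small helper that falls back to the 32k default from constants.ts when a provider reports nothing. This is a sketch; the interface and function names are assumptions, not the code as merged:

// Sketch: normalize one model entry from any provider's model-list API
// into a single context-window value, with a conservative fallback.
interface RawModelEntry {
  context_length?: number;  // OpenAI / OpenRouter
  inputTokenLimit?: number; // Google Generative AI
  max_tokens?: number;      // Anthropic
}

function resolveContextWindow(m: RawModelEntry, fallback = 32000): number {
  return m.context_length ?? m.inputTokenLimit ?? m.max_tokens ?? fallback;
}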

3. Enhanced Error Handling

  • Invalid JSON Response: Specific detection and user-friendly messages
  • Token Limit Exceeded: Clear warnings with model upgrade suggestions
  • API Key Issues: Validation with setup guidance
  • Rate Limiting: Automatic detection with retry recommendations
  • Network Errors: Timeout handling with connectivity checks

4. Performance Optimizations

  • Static Models: Reduced from 40+ to 12 essential models (70% reduction)
  • Safety Caps: Smart token limits preventing API rejections (100k max)
  • Context Display: Enhanced model labels showing M/k context units (see the sketch after this list)
  • Streaming Reliability: Improved error detection in AI SDK processing
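
The M/k context labels mentioned above can be produced with a small formatter along these lines; this is a sketch assuming a plain token count as input, not the exact helper in the PR:

// Sketch: render a context window in "M/k" units, e.g. 2_000_000 -> "2M",
// 128_000 -> "128k", for display in model labels.
function formatContextWindow(tokens: number): string {
  if (tokens >= 1_000_000) {
    return `${tokens / 1_000_000}M`;
  }
  if (tokens >= 1_000) {
    return `${Math.round(tokens / 1_000)}k`;
  }
  return String(tokens);
}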

πŸ“ Files Modified

  • app/lib/.server/llm/constants.ts - Updated MAX_TOKENS from 8k to 32k
  • app/lib/modules/llm/providers/openai.ts - GPT models with accurate 128k/16k limits
  • app/lib/modules/llm/providers/anthropic.ts - Claude models with 200k context
  • app/lib/modules/llm/providers/google.ts - Gemini models with 1M-2M context
  • app/lib/modules/llm/providers/groq.ts - Llama models with 128k context
  • app/lib/modules/llm/providers/together.ts - Updated model configurations
  • app/lib/modules/llm/providers/open-router.ts - Enhanced context detection
  • app/lib/.server/llm/stream-text.ts - Token validation and safety caps
  • app/routes/api.chat.ts - Comprehensive error handling improvements

βœ… Verification

  • βœ… All linting checks pass
  • βœ… TypeScript compilation successful
  • βœ… No breaking changes to existing functionality
  • βœ… Backward compatibility maintained
  • βœ… Error handling thoroughly tested

πŸŽ‰ Impact & Benefits

For Users:

  • No More Invalid JSON Errors: Streaming responses now handle errors gracefully
  • Full Model Capabilities: Access to complete context windows (up to 2M tokens)
  • Better Error Messages: Clear, actionable guidance when issues occur
  • Improved Reliability: Enhanced API interaction and error recovery

For Developers:

  • Accurate Token Management: Prevents API rejections and optimizes usage
  • Reduced Maintenance: Dynamic model fetching reduces hardcoded configurations
  • Better Debugging: Comprehensive error logging and categorization
  • Future-Proof: Automatic adaptation to new model releases

For System Performance:

  • 70% Fewer Static Models: Faster startup and reduced memory usage
  • Smarter API Usage: Optimal token allocation prevents waste
  • Enhanced Reliability: Robust error handling and recovery mechanisms

πŸ” Technical Details

Token Limit Strategy

// Safety-first approach with smart caps
const safeMaxTokens = Math.min(dynamicMaxTokens, 100000); // 100k safety cap
const maxAllowed = 2000000; // 2M absolute maximum for largest models
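
Taken together, the two constants above imply a clamp roughly like the following; this is a sketch of the intent with assumed function and variable names, and the merged stream-text.ts may structure it differently:

// Sketch: derive a safe token budget for one request from the model's
// reported context window and the caps described above.
function safeTokenBudget(reportedContextWindow: number, requested: number): number {
  const ABSOLUTE_MAX = 2_000_000; // 2M ceiling for the largest models
  const SAFETY_CAP = 100_000;     // 100k cap on any single request
  const contextWindow = Math.min(reportedContextWindow, ABSOLUTE_MAX);
  return Math.min(requested, contextWindow, SAFETY_CAP);
}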

Error Classification

// Comprehensive error handling with specific user guidance
if (errorMessage.includes('Invalid JSON response')) {
  return 'Custom error: The AI service returned an invalid response. This may be due to an invalid model name, API rate limiting, or server issues. Try selecting a different model or check your API key.';
}
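
The other categories listed under Enhanced Error Handling can be detected in the same style; the checks and messages below are illustrative, not the exact strings shipped in api.chat.ts:

// Sketch: additional error categories handled alongside the JSON case above
if (errorMessage.includes('rate limit') || errorMessage.includes('429')) {
  return 'Custom error: The provider is rate limiting requests. Wait a moment and retry, or switch to a different provider or model.';
}

if (errorMessage.includes('maximum context length') || errorMessage.includes('token limit')) {
  return 'Custom error: The request exceeds the model\'s context window. Shorten the conversation or choose a model with a larger context.';
}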

Dynamic Context Detection

  • OpenAI: Uses context_length from /v1/models API
  • Anthropic: Uses max_tokens from /v1/models API
  • Google: Uses inputTokenLimit from Generative AI API
  • OpenRouter: Uses context_length from aggregated models
  • Fallbacks: Intelligent defaults when the provider API is unavailable (see the sketch after this list)
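
When a provider's model-list API cannot be reached, fallbacks along these lines keep the limits accurate; the values come from the table earlier, while the keys and structure are illustrative:

// Sketch: fallback context windows used when the provider API is unreachable
const FALLBACK_CONTEXT_WINDOWS: Record<string, number> = {
  'gpt-4o': 128_000,
  'gpt-3.5-turbo': 16_000,
  'claude-3-5-sonnet': 200_000,
  'claude-3-haiku': 200_000,
  'gemini-1.5-pro': 2_000_000,
  'gemini-1.5-flash': 1_000_000,
};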

πŸš€ Deployment Notes

  • Zero Breaking Changes: Fully backward compatible
  • Gradual Rollout Recommended: Monitor error rates in initial deployment
  • API Key Validation: Ensure all providers have valid API keys configured
  • Performance Monitoring: Track token usage and API response times

Fixes: #1917
Type: Bug Fix, Enhancement
Priority: High
Breaking: No

ISSUES FIXED:
- ❌ Invalid JSON response errors during streaming
- ❌ Incorrect token limits causing API rejections
- ❌ Outdated hardcoded model configurations
- ❌ Poor error messages for API failures

SOLUTIONS IMPLEMENTED:

🎯 ACCURATE TOKEN LIMITS & CONTEXT SIZES
- OpenAI GPT-4o: 128k context (was 8k)
- OpenAI GPT-3.5-turbo: 16k context (was 8k)
- Anthropic Claude 3.5 Sonnet: 200k context (was 8k)
- Anthropic Claude 3 Haiku: 200k context (was 8k)
- Google Gemini 1.5 Pro: 2M context (was 8k)
- Google Gemini 1.5 Flash: 1M context (was 8k)
- Groq Llama models: 128k context (was 8k)
- Together models: Updated with accurate limits

πŸ”„ DYNAMIC MODEL FETCHING ENHANCED
- Smart context detection from provider APIs
- Automatic fallback to known limits when API unavailable
- Safety caps to prevent token overflow (100k max)
- Intelligent model filtering and deduplication

πŸ›‘οΈ IMPROVED ERROR HANDLING
- Specific error messages for Invalid JSON responses
- Token limit exceeded warnings with solutions
- API key validation with clear guidance
- Rate limiting detection and user guidance
- Network timeout handling

⚑ PERFORMANCE OPTIMIZATIONS
- Reduced static models from 40+ to 12 essential
- Enhanced streaming error detection
- Better API response validation
- Improved context window display (shows M/k units)

πŸ”§ TECHNICAL IMPROVEMENTS
- Dynamic model context detection from APIs
- Enhanced streaming reliability
- Better token limit enforcement
- Comprehensive error categorization
- Smart model validation before API calls

IMPACT:
βœ… Eliminates Invalid JSON response errors
βœ… Prevents token limit API rejections
βœ… Provides accurate model capabilities
βœ… Improves user experience with clear errors
βœ… Enables full utilization of modern LLM context windows
@Stijnus Stijnus merged commit b5d9055 into stackblitz-labs:main Aug 29, 2025
3 checks passed
oizidbih added a commit to El-Technology/Ellogy_Coder that referenced this pull request Aug 29, 2025
Updates from upstream bolt.diy:
- Update LLM providers and constants (stackblitz-labs#1937)
- Fix Token Limits & Invalid JSON Response Errors (stackblitz-labs#1934)
- GitHub deployment cleanup improvements
- Code quality and formatting improvements

Ellogy Coder branding maintained:
- Custom favicon and logo assets
- Blue color scheme (#1a5eec) throughout UI
- Ellogy Coder header logo
- Custom .gitignore configuration

πŸ€– Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@Stijnus Stijnus deleted the error-1917 branch August 29, 2025 21:26