fix: stop reading big files that crash context #6667
Changes from 15 commits
5c48253
9f612d5
e2b13a6
9923b7c
32098cb
7d7df19
0e0bd6c
fe60b11
8fc176d
2505ac6
6347e57
b556d64
2285015
e4bac4f
8269f3a
75fb09a
`@@ -0,0 +1,100 @@`

```typescript
import { Task } from "../task/Task"
import * as fs from "fs/promises"

/**
 * Conservative buffer percentage for file reading.
 * We use a very conservative estimate to ensure files fit in context.
 */
const FILE_READ_BUFFER_PERCENTAGE = 0.4 // 40% buffer for safety

/**
 * Very conservative character to token ratio
 * Using 2.5 chars per token instead of 3-4 to be extra safe
 */
const CHARS_PER_TOKEN_CONSERVATIVE = 2.5

/**
 * File size thresholds
 */
const TINY_FILE_SIZE = 10 * 1024 // 10KB - always safe
const SMALL_FILE_SIZE = 50 * 1024 // 50KB - safe if context is mostly empty
const MEDIUM_FILE_SIZE = 500 * 1024 // 500KB - needs validation
const LARGE_FILE_SIZE = 1024 * 1024 // 1MB - always limit

export interface ContextValidationResult {
	shouldLimit: boolean
	safeContentLimit: number // Character count limit
	reason?: string
}

/**
 * Simple validation based on file size and available context.
 * Uses very conservative estimates to avoid context overflow.
 */
export async function validateFileSizeForContext(
	filePath: string,
	totalLines: number,
	currentMaxReadFileLine: number,
	cline: Task,
): Promise<ContextValidationResult> {
	try {
		// Get file size
		const stats = await fs.stat(filePath)
		const fileSizeBytes = stats.size

		// Tiny files are always safe
		if (fileSizeBytes < TINY_FILE_SIZE) {
			return { shouldLimit: false, safeContentLimit: -1 }
		}

		// Get context information
		const modelInfo = cline.api.getModel().info
		const { contextTokens: currentContextTokens } = cline.getTokenUsage()
		const contextWindow = modelInfo.contextWindow
		const currentlyUsed = currentContextTokens || 0

		// Calculate available space with conservative buffer
		const remainingTokens = contextWindow - currentlyUsed
		const usableTokens = Math.floor(remainingTokens * (1 - FILE_READ_BUFFER_PERCENTAGE))

		// Reserve space for response (use 25% of remaining or 4096, whichever is smaller)
```
Review comment (Collaborator): We should use the common logic for this.
```typescript
		const responseReserve = Math.min(Math.floor(usableTokens * 0.25), 4096)
		const availableForFile = usableTokens - responseReserve

		// Convert to conservative character estimate
		const safeCharLimit = Math.floor(availableForFile * CHARS_PER_TOKEN_CONSERVATIVE)

		// For small files with mostly empty context, allow full read
		const contextUsagePercent = currentlyUsed / contextWindow
		if (fileSizeBytes < SMALL_FILE_SIZE && contextUsagePercent < 0.3) {
			return { shouldLimit: false, safeContentLimit: -1 }
		}

		// For medium files, check if they fit within safe limit
		if (fileSizeBytes < MEDIUM_FILE_SIZE && fileSizeBytes <= safeCharLimit) {
			return { shouldLimit: false, safeContentLimit: -1 }
		}

		// For large files or when approaching limits, always limit
		if (fileSizeBytes > safeCharLimit || fileSizeBytes > LARGE_FILE_SIZE) {
			// Use a very conservative limit
			const finalLimit = Math.min(safeCharLimit, 100000) // Cap at 100K chars
```
Review comment (Collaborator): I think this might annoy people who are trying to use a model with a large context window to read large files.

Reply (Contributor, author): I think the plan for this PR was to do something stupid to fix the temporary error, basically never reading big files. This PR has the full implementation and doesn't have that limit.
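The collaborator's concern can be checked by tracing the budget arithmetic from the diff with concrete numbers. The sketch below reproduces the same steps outside `validateFileSizeForContext`; the constants and formulas are copied from the diff, while `computeReadBudget`, `BudgetBreakdown`, `RESPONSE_RESERVE_CAP` and `HARD_CHAR_CAP` are hypothetical names introduced only for illustration. Note that, as written, the response reserve is 25% of the post-buffer `usableTokens`, not of the raw remaining tokens that the inline comment mentions. For a 1M-token window with an empty context the budget comes to 1,489,760 characters, which the "always limit" branch then cuts to 100,000 (about 40K tokens at 2.5 chars/token, or 4% of the window).

```typescript
// Standalone trace of the token-budget arithmetic from the diff.
// Constants and formulas are copied from the PR; computeReadBudget,
// BudgetBreakdown, RESPONSE_RESERVE_CAP and HARD_CHAR_CAP are hypothetical names.
const FILE_READ_BUFFER_PERCENTAGE = 0.4
const CHARS_PER_TOKEN_CONSERVATIVE = 2.5
const RESPONSE_RESERVE_CAP = 4096 // the literal 4096 in the diff
const HARD_CHAR_CAP = 100000 // the literal 100000 in the diff

interface BudgetBreakdown {
	usableTokens: number
	responseReserve: number
	availableForFile: number
	safeCharLimit: number
	cappedLimit: number
}

function computeReadBudget(contextWindow: number, currentlyUsed: number): BudgetBreakdown {
	// Hold back 40% of whatever is left of the window
	const remainingTokens = contextWindow - currentlyUsed
	const usableTokens = Math.floor(remainingTokens * (1 - FILE_READ_BUFFER_PERCENTAGE))
	// Reserve min(25% of usableTokens, 4096) tokens for the response
	const responseReserve = Math.min(Math.floor(usableTokens * 0.25), RESPONSE_RESERVE_CAP)
	const availableForFile = usableTokens - responseReserve
	// Convert tokens to characters at 2.5 chars per token
	const safeCharLimit = Math.floor(availableForFile * CHARS_PER_TOKEN_CONSERVATIVE)
	// Value returned when the "always limit" branch fires
	const cappedLimit = Math.min(safeCharLimit, HARD_CHAR_CAP)
	return { usableTokens, responseReserve, availableForFile, safeCharLimit, cappedLimit }
}

// A 1M-token window with nothing used yet
const oneMillion = computeReadBudget(1_000_000, 0)
console.log(oneMillion)
```

For a 200K-token window, also empty, the same steps give a `safeCharLimit` of 289,760, so the 100,000-character cap still binds whenever the limiting branch is taken.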
```typescript
			return {
				shouldLimit: true,
				safeContentLimit: finalLimit,
				reason: "This is a partial read - the remaining content cannot be accessed due to context limitations.",
			}
		}

		return { shouldLimit: false, safeContentLimit: -1 }
	} catch (error) {
		// On any error, use ultra-conservative defaults
		console.warn(`[validateFileSizeForContext] Error during validation: ${error}`)
		return {
			shouldLimit: true,
			safeContentLimit: 50000, // 50K chars as safe fallback
			reason: "This is a partial read - the remaining content cannot be accessed due to context limitations.",
		}
	}
}
```
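The size tiers in `validateFileSizeForContext` are easier to audit as a pure function. The sketch below is a hypothetical restatement, not code from the PR: it replaces `fs.stat()` and the `Task` lookups with numeric arguments, takes `safeCharLimit` as a precomputed input, and adds a `tier` label (also hypothetical) naming the branch taken. Two properties of the diff's logic become visible. First, files between `MEDIUM_FILE_SIZE` and `LARGE_FILE_SIZE` that fit within `safeCharLimit` skip every early return and reach the final `return`, so they are read in full. Second, `fileSizeBytes` (bytes) is compared against `safeCharLimit` (characters); the two match for single-byte text such as ASCII, while for multi-byte UTF-8 content the byte count overstates the character count.

```typescript
// Hypothetical restatement of the size tiers in validateFileSizeForContext.
// fs.stat() and the Task lookups are replaced by plain numeric arguments;
// the `tier` field is an illustrative addition, not part of the PR.
const TINY_FILE_SIZE = 10 * 1024
const SMALL_FILE_SIZE = 50 * 1024
const MEDIUM_FILE_SIZE = 500 * 1024
const LARGE_FILE_SIZE = 1024 * 1024

type Tier = "tiny" | "small" | "medium" | "limited" | "fallthrough"

interface TierDecision {
	shouldLimit: boolean
	safeContentLimit: number // -1 means "read the whole file", as in the diff
	tier: Tier
}

function decideTier(
	fileSizeBytes: number,
	contextWindow: number,
	currentlyUsed: number,
	safeCharLimit: number, // precomputed from the token budget
): TierDecision {
	const unlimited = (tier: Tier): TierDecision => ({ shouldLimit: false, safeContentLimit: -1, tier })

	if (fileSizeBytes < TINY_FILE_SIZE) return unlimited("tiny")

	const contextUsagePercent = currentlyUsed / contextWindow
	if (fileSizeBytes < SMALL_FILE_SIZE && contextUsagePercent < 0.3) return unlimited("small")

	if (fileSizeBytes < MEDIUM_FILE_SIZE && fileSizeBytes <= safeCharLimit) return unlimited("medium")

	if (fileSizeBytes > safeCharLimit || fileSizeBytes > LARGE_FILE_SIZE) {
		return { shouldLimit: true, safeContentLimit: Math.min(safeCharLimit, 100000), tier: "limited" }
	}

	// Reached only by files in [MEDIUM_FILE_SIZE, LARGE_FILE_SIZE] that fit safeCharLimit
	return unlimited("fallthrough")
}

// 600KB file on a 1M-token window with an empty context (safeCharLimit = 1,489,760)
console.log(decideTier(600 * 1024, 1_000_000, 0, 1_489_760))
```

With `safeCharLimit` at 1,489,760, a 600KB file is therefore read in full through the fall-through path, while a 2MB file is cut to 100,000 characters.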
Review comment: Is the 40% buffer intentionally this conservative? It might be worth making it configurable or adjusting it based on model capabilities. Some models might handle closer-to-limit content better than others.

Reply: For now, yes.

Reply: It seems like we shouldn't need to be so conservative here if the rest of the logic is working right.

Reply: Yeah, sorry, I think I just picked a big number for the simple version.
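One way to act on the suggestion to make the buffer configurable is to pass it in rather than hard-coding it. The sketch below is hypothetical throughout: `ReadBudgetOptions`, `DEFAULT_OPTIONS` and `safeCharLimitFor` are not in the PR, and the defaults simply reproduce the diff's 0.4 buffer, 2.5 chars/token and 4096-token reserve cap. It also clamps the remaining tokens at zero, which the diff does not do; as written, the diff's arithmetic goes negative if reported usage ever exceeds the context window (130K tokens used on a 128K window yields a `safeCharLimit` of -2250).

```typescript
// Hypothetical sketch: the buffer fraction becomes a parameter instead of the
// module-level FILE_READ_BUFFER_PERCENTAGE constant. None of these names are from the PR.
interface ReadBudgetOptions {
	bufferPercentage: number // fraction of remaining tokens held back, in [0, 1)
	charsPerToken: number
	responseReserveCap: number
}

// Defaults mirror the values hard-coded in the diff
const DEFAULT_OPTIONS: ReadBudgetOptions = {
	bufferPercentage: 0.4,
	charsPerToken: 2.5,
	responseReserveCap: 4096,
}

function safeCharLimitFor(
	contextWindow: number,
	currentlyUsed: number,
	options: Partial<ReadBudgetOptions> = {},
): number {
	const { bufferPercentage, charsPerToken, responseReserveCap } = { ...DEFAULT_OPTIONS, ...options }
	if (bufferPercentage < 0 || bufferPercentage >= 1) {
		throw new RangeError("bufferPercentage must be in [0, 1)")
	}
	// Clamp at zero so an over-full context never yields a negative limit (not in the PR)
	const remainingTokens = Math.max(0, contextWindow - currentlyUsed)
	const usableTokens = Math.floor(remainingTokens * (1 - bufferPercentage))
	// Same response reserve rule as the diff: min(25% of usable tokens, cap)
	const responseReserve = Math.min(Math.floor(usableTokens * 0.25), responseReserveCap)
	return Math.floor((usableTokens - responseReserve) * charsPerToken)
}

console.log(safeCharLimitFor(1_000_000, 0))
console.log(safeCharLimitFor(1_000_000, 0, { bufferPercentage: 0.1 }))
```

On the same empty 1M-token window, lowering `bufferPercentage` from 0.4 to 0.1 raises the character budget from 1,489,760 to 2,239,760. How a per-model value would be chosen is left open here.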