
Conversation


@roomote roomote bot commented Aug 8, 2025

Summary

This PR fixes an issue where codebase indexing would freeze at 69% when using Ollama with the mxbai-embed-large model.

Problem

The indexing process was hanging indefinitely due to:

  1. Missing timeout handling for batch embedding operations in the Ollama embedder
  2. Silent failures in batch processing that weren't properly propagated
  3. Inadequate error state updates when batch processing failed

Solution

  • Added dynamic timeout handling: Batch embedding operations now have a timeout that scales with batch size (the greater of 60 seconds or 2 seconds per text; see the sketch after this list)
  • Improved error propagation: Batch processing errors are now properly thrown to ensure the orchestrator catches them
  • Enhanced error reporting: Added immediate error state updates when batch processing fails
  • Better debugging: Added detailed logging for timeout and retry scenarios
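A minimal sketch of the timeout pattern the PR describes, assuming OLLAMA_EMBEDDING_TIMEOUT_MS is 60 000 ms as stated above; withTimeout, embedBatch, and callOllamaEmbeddingApi are illustrative names, not the PR's actual identifiers:

const OLLAMA_EMBEDDING_TIMEOUT_MS = 60_000

// Stand-in for the real embedder call in ollama.ts
declare function callOllamaEmbeddingApi(texts: string[]): Promise<number[][]>

async function withTimeout<T>(work: Promise<T>, timeoutMs: number, message: string): Promise<T> {
	let timer: ReturnType<typeof setTimeout> | undefined
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => reject(new Error(message)), timeoutMs)
	})
	try {
		// Whichever promise settles first wins the race
		return await Promise.race([work, timeout])
	} finally {
		clearTimeout(timer) // don't leak the timer when work finishes first
	}
}

async function embedBatch(texts: string[]): Promise<number[][]> {
	// The greater of the 60 s base timeout or 2 s per text in the batch
	const batchTimeout = Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * 2000)
	return withTimeout(
		callOllamaEmbeddingApi(texts),
		batchTimeout,
		`Ollama embedding timed out after ${batchTimeout}ms for ${texts.length} texts`,
	)
}

Note that Promise.race only stops waiting; the underlying HTTP request keeps running after the timeout fires, so wiring in an AbortController would be a natural refinement.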

Changes

  1. Ollama Embedder (src/services/code-index/embedders/ollama.ts):

    • Added dynamic timeout calculation based on batch size
    • Improved timeout error messages with batch context
  2. Scanner (src/services/code-index/processors/scanner.ts):

    • Added specific error handling for embedding timeouts
    • Ensured batch errors are properly thrown (not just logged)
    • Added better logging for retry attempts
  3. Orchestrator (src/services/code-index/orchestrator.ts):

    • Added immediate error state updates when batch errors occur
    • Improved progress reporting comments
  4. Localization (src/i18n/locales/en/embeddings.json):

    • Added new translation key for batch timeout errors

Testing

  • All existing tests pass
  • Tested with Ollama embedder tests
  • Tested with scanner tests
  • Tested with manager tests

Fixes #6849


Important

Fixes indexing freeze at 69% by adding dynamic timeout handling and improving error propagation in Ollama embedder.

  • Behavior:
    • Fixes freeze at 69% during indexing with Ollama embedder by adding dynamic timeout handling in ollama.ts.
    • Errors in batch processing are now properly propagated and logged in scanner.ts and orchestrator.ts.
    • Immediate error state updates added in orchestrator.ts.
  • Error Handling:
    • Dynamic timeout calculation based on batch size in ollama.ts.
    • Improved error messages for timeouts and batch processing in ollama.ts and scanner.ts.
    • New translation key for batch timeout errors in embeddings.json.
  • Logging:
    • Added detailed logging for timeout and retry scenarios in scanner.ts and orchestrator.ts.

This description was created by Ellipsis for 2eb26fc.

- Add dynamic timeout for batch embedding operations based on batch size
- Improve error handling and recovery in batch processing
- Ensure errors are properly thrown to prevent silent failures
- Add immediate error state updates when batch processing fails
- Add better logging for debugging timeout and retry scenarios

Fixes #6849
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 8, 2025 14:21
@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files) and bug (Something isn't working) labels Aug 8, 2025
@roomote roomote bot mentioned this pull request Aug 8, 2025
texts.length > 1
	? t("embeddings:ollama.batchTimeoutError", {
			count: texts.length,
			timeout: Math.round(Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * 2000) / 1000),

Suggestion: Instead of recalculating the batch timeout for the error message (using Math.max again), reuse the 'batchTimeout' variable calculated earlier to ensure consistency.
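
A sketch of the reviewer's proposal, with the surrounding code paraphrased from the snippets above (t is the i18n helper already used in the file):

// Compute the budget once near the top of the batch call...
const batchTimeout = Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * 2000)

// ...then reuse it when building the error message instead of repeating Math.max
const message = t("embeddings:ollama.batchTimeoutError", {
	count: texts.length,
	timeout: Math.round(batchTimeout / 1000),
})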

@roomote roomote bot left a comment

Reviewed my own code and found it acceptable, but only because the alternative was infinite recursion.


// Add timeout to prevent indefinite hanging
// Use a longer timeout for batch operations as they can take more time
const batchTimeout = Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * 2000) // At least 2 seconds per text

Is the 2-second-per-text multiplier intentional? This seems like a reasonable heuristic, but could we make this configurable or at least document why 2 seconds was chosen? Some models or systems might need different timeout scaling.
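
One way to make the scaling configurable, as the comment asks (a sketch; the MS_PER_TEXT constant and the environment variable are hypothetical, not part of the PR):

// Hypothetical override, defaulting to the PR's 2-seconds-per-text heuristic
const MS_PER_TEXT = Number(process.env.OLLAMA_EMBEDDING_MS_PER_TEXT ?? 2000)
const batchTimeout = Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * MS_PER_TEXT)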

"modelNotEmbeddingCapable": "Ollama model is not embedding capable: {{modelId}}",
"hostNotFound": "Ollama host not found: {{baseUrl}}"
"hostNotFound": "Ollama host not found: {{baseUrl}}",
"batchTimeoutError": "Ollama embedding timed out after {{timeout}} seconds while processing {{count}} texts. Consider reducing batch size or increasing timeout."

The new 'batchTimeoutError' translation key is only added to the English locale. Should we add placeholder translations to all other locale files (fr, de, es, ca, hi, id, it, ja, ko, nl, pl, pt-BR, ru, tr, vi) to prevent missing translation errors?
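
A placeholder entry of the kind the comment describes might simply copy the English string into each locale file until translated (shown here for a hypothetical src/i18n/locales/fr/embeddings.json; the value is an assumed placeholder, not a real translation):

"batchTimeoutError": "Ollama embedding timed out after {{timeout}} seconds while processing {{count}} texts. Consider reducing batch size or increasing timeout."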

if (embeddingError.message?.includes("timed out") || embeddingError.message?.includes("timeout")) {
	throw new Error(
		`Embedding timeout for batch of ${batchTexts.length} texts. This may indicate the Ollama service is overloaded or the model is too slow. ${embeddingError.message}`,
		{ cause: embeddingError },
	)
}

When wrapping errors with the 'cause' option, could we lose stack traces in some environments? Consider preserving the original error more explicitly, perhaps by including the original stack in the message or using a custom error class that better preserves debugging information.
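
One shape such a custom error class could take (a sketch under the comment's assumptions, not code from this PR):

// Hypothetical error class that copies the original stack into the message,
// so it survives even if the environment drops the `cause` property
class EmbeddingTimeoutError extends Error {
	public readonly original: Error

	constructor(message: string, original: Error) {
		super(`${message}\nCaused by: ${original.stack ?? original.message}`)
		this.name = "EmbeddingTimeoutError"
		this.original = original
	}
}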

texts.length > 1
	? t("embeddings:ollama.batchTimeoutError", {
			count: texts.length,
			timeout: Math.round(Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, texts.length * 2000) / 1000),

The timeout calculation logic appears both here (line 135) and earlier (line 73). Could we extract this to a helper function like 'calculateBatchTimeout(textCount: number)' for better maintainability?
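
The proposed helper might look like this (a sketch; the PR itself keeps the calculation inline at both call sites):

// Single source of truth for the batch timeout
function calculateBatchTimeout(textCount: number): number {
	// The greater of the 60 s base timeout or 2 s per text
	return Math.max(OLLAMA_EMBEDDING_TIMEOUT_MS, textCount * 2000)
}

// Call site 1 (line 73): setting the timeout
const batchTimeout = calculateBatchTimeout(texts.length)

// Call site 2 (line 135): reporting it in seconds in the error message
const timeoutSeconds = Math.round(calculateBatchTimeout(texts.length) / 1000)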


if (attempts < MAX_BATCH_RETRIES) {
	// Exponential backoff: the delay doubles after each failed attempt
	const delay = INITIAL_RETRY_DELAY_MS * Math.pow(2, attempts - 1)
	console.log(`[DirectoryScanner] Retrying batch processing in ${delay}ms...`)

For consistency with other error messages in this file, should this retry log message include the workspace context? Something like: '[DirectoryScanner] Retrying batch processing in ms for workspace ...'
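
The suggested message might read as follows (workspacePath is a hypothetical name for whatever workspace identifier the file already has in scope):

console.log(`[DirectoryScanner] Retrying batch processing in ${delay}ms for workspace ${workspacePath}...`)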

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels) label Aug 8, 2025
@daniel-lxs (Member)

No repro steps or scope

@daniel-lxs daniel-lxs closed this Aug 11, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 11, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 11, 2025