fix: improve Ollama embeddings error handling and add retry logic #6528

roomote · 2025-07-31T23:37:39Z

Summary

This PR improves the error handling for Ollama embeddings to address the issue where embedding generation fails and stops the indexing process.

Changes

- Enhanced error logging: Added detailed logging to capture the actual response from the Ollama API, making it easier to diagnose issues.
- Retry logic with exponential backoff: Implemented automatic retry (up to 3 attempts) for transient failures such as network issues or temporary service unavailability.
- Better error detection: Added specific handling for model-not-found errors and invalid response formats.
- Improved error messages: Made error messages more descriptive to help users understand what went wrong.
- Response validation: Added validation to ensure the embeddings response is in the expected format (array of arrays).
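
The response validation item can be illustrated with a small type guard. This is a minimal sketch rather than the PR's code: `isEmbeddingMatrix` and `parseEmbeddingsResponse` are hypothetical names, and the `{ embeddings: number[][] }` response shape is an assumption about what the Ollama embed endpoint returns.

```typescript
// Hypothetical helper: true only for an array of arrays of finite numbers.
function isEmbeddingMatrix(value: unknown): value is number[][] {
	return (
		Array.isArray(value) &&
		value.every(
			(row) => Array.isArray(row) && row.every((x) => typeof x === "number" && Number.isFinite(x)),
		)
	)
}

// Hypothetical parser: assumes the payload looks like { embeddings: number[][] }.
function parseEmbeddingsResponse(data: unknown): number[][] {
	const embeddings = (data as { embeddings?: unknown } | null)?.embeddings
	if (!isEmbeddingMatrix(embeddings)) {
		// Include a truncated copy of the actual payload so the failure can be diagnosed.
		const preview = String(JSON.stringify(data)).slice(0, 200)
		throw new Error(`Unexpected embeddings response format: ${preview}`)
	}
	return embeddings
}
```

Putting a truncated copy of the payload in the error message is one way to surface "the actual response" without flooding the log with full vectors.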

Testing

- All existing tests pass ✅
- The retry logic will help with transient failures
- Better error messages will help users diagnose configuration issues

Related Issue

Fixes #6526

Notes for Reviewers

The main improvements are:

- When Ollama returns an unexpected response, we now log the actual response structure to help diagnose the issue.
- Transient network failures are automatically retried with exponential backoff.
- Model-specific errors (like model not found) are now properly detected and reported.

This should help users who are experiencing embedding generation failures with local Ollama setups.
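
For readers unfamiliar with the pattern, retry with exponential backoff could look roughly like the sketch below. It is not the PR's implementation: `backoffDelay`, `withRetry`, and the injectable `sleep` parameter are hypothetical, the doubling formula is an assumption, and the constants reuse the values quoted in the review comments.

```typescript
// Values as quoted in the review thread; treat them as illustrative defaults.
const MAX_RETRIES = 3
const INITIAL_RETRY_DELAY_MS = 1000
const MAX_RETRY_DELAY_MS = 10000

// Delay to wait after failed attempt number `attempt` (1-based): 1s, 2s, 4s, ... capped at the maximum.
function backoffDelay(attempt: number): number {
	return Math.min(INITIAL_RETRY_DELAY_MS * 2 ** (attempt - 1), MAX_RETRY_DELAY_MS)
}

// Hypothetical wrapper: runs `operation` up to MAX_RETRIES times, retrying only when `shouldRetry` agrees.
async function withRetry<T>(
	operation: () => Promise<T>,
	shouldRetry: (error: unknown) => boolean,
	sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
	let lastError: unknown
	for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
		try {
			return await operation()
		} catch (error) {
			lastError = error
			// Stop on the final attempt or when the error is not worth retrying.
			if (attempt === MAX_RETRIES || !shouldRetry(error)) break
			await sleep(backoffDelay(attempt))
		}
	}
	throw lastError
}
```

Passing `sleep` as a parameter is a design choice that lets tests substitute an instant no-op instead of waiting for real timers.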

Important

Improves error handling and adds retry logic with exponential backoff for Ollama embeddings in ollama.ts.

Error Handling:
- Enhanced error logging in createEmbeddings() to capture actual response from Ollama API.
- Added specific handling for model not found errors and invalid response formats.
- Improved error messages for better user understanding.
Retry Logic:
- Implemented retry logic with exponential backoff in createEmbeddings() for transient failures (up to 3 attempts).
- Handles network issues, service unavailability, and other transient errors.
Response Validation:
- Added validation in createEmbeddings() to ensure response is an array of arrays.
Telemetry:
- Captures telemetry events for errors in createEmbeddings() and validateConfiguration().
Misc:
- Adjusted timeout settings for embedding and validation requests.

This description was created by ellipsis-dev for b9c9d21. It will automatically update as commits are pushed.

- Add detailed error logging to capture actual Ollama API responses
- Implement retry logic with exponential backoff for transient failures
- Add better error detection for model not found errors
- Improve error messages to be more descriptive
- Add validation for embeddings response format

Fixes #6526

ellipsis-dev · 2025-07-31T23:39:31Z

src/services/code-index/embedders/ollama.ts

+
+				// Don't retry for certain errors
+				if (
+					error.message?.includes(t("embeddings:ollama.modelNotFound", { modelId: "" }).split(":")[0]) ||


Using the translated error message (via t()) and splitting it with split(":") to check for a model-not-found condition is brittle. Consider using custom error types or dedicated error codes for more robust error detection.
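
One possible shape for the custom error type suggested here is sketched below. Every name in it (`EmbedderError`, `EmbedderErrorCode`, `isNonRetryable`) is hypothetical and does not appear in the PR; the point is only that the retry decision reads a stable code instead of a translated string.

```typescript
// Hypothetical, locale-independent error codes.
type EmbedderErrorCode = "MODEL_NOT_FOUND" | "INVALID_RESPONSE" | "CONNECTION_FAILED"

class EmbedderError extends Error {
	readonly code: EmbedderErrorCode

	constructor(code: EmbedderErrorCode, message: string) {
		super(message) // the message may be localized; the code never is
		this.name = "EmbedderError"
		this.code = code
		// Keeps `instanceof` working if the class is compiled down to ES5.
		Object.setPrototypeOf(this, EmbedderError.prototype)
	}
}

// Codes that retrying cannot fix in this sketch.
const NON_RETRYABLE_CODES = new Set<EmbedderErrorCode>(["MODEL_NOT_FOUND", "INVALID_RESPONSE"])

function isNonRetryable(error: unknown): boolean {
	return error instanceof EmbedderError && NON_RETRYABLE_CODES.has(error.code)
}
```

With this approach the check keeps working when the translation of `embeddings:ollama.modelNotFound` changes, because the message text is never inspected.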

daniel-lxs · 2025-07-31T23:41:23Z

Closing, issue is not scoped

roomote

@roomote reviewed @roomote's code and found it needs more tests. Classic @roomote move. I've reviewed the changes and left some suggestions inline to improve the implementation.

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

-			// Add timeout to prevent indefinite hanging
-			const controller = new AbortController()
-			const timeoutId = setTimeout(() => controller.abort(), OLLAMA_EMBEDDING_TIMEOUT_MS)
+		for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {


This PR adds significant retry logic and error handling improvements but doesn't include any unit tests. Could we add tests to verify:

- Retry behavior with exponential backoff
- Proper handling of different error types
- Maximum retry limit enforcement
- Error message validation

This is especially important for ensuring the retry logic works as expected.

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

+				// Check if we should retry
+				if (attempt < MAX_RETRIES) {
+					// Check if it's a transient error that we should retry
+					if (


The retry logic currently retries for many error types. Is it intentional to retry for all of these?

- ENOTFOUND might indicate a configuration issue (wrong host) rather than a transient failure
- AbortError happens on timeout - retrying might just hit the same timeout again

Consider limiting retries to truly transient errors like ECONNRESET and ETIMEDOUT?
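
A narrower classifier along these lines might look like the following sketch. `TRANSIENT_CODES`, `getErrorCode`, and `isTransientError` are hypothetical names, and reading the code from `error.cause` is an assumption about how a fetch implementation may wrap the underlying system error.

```typescript
// Only these codes are treated as transient in this sketch.
const TRANSIENT_CODES = new Set<string>(["ECONNRESET", "ETIMEDOUT"])

// Reads a string `code` from the error itself or, failing that, from its `cause`.
function getErrorCode(error: unknown): string | undefined {
	if (typeof error !== "object" || error === null) return undefined
	const e = error as { code?: unknown; cause?: { code?: unknown } | null }
	const code = e.code ?? e.cause?.code
	return typeof code === "string" ? code : undefined
}

// ENOTFOUND and AbortError fall through to `false`, so they are reported immediately instead of retried.
function isTransientError(error: unknown): boolean {
	const code = getErrorCode(error)
	return code !== undefined && TRANSIENT_CODES.has(code)
}
```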

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

 const OLLAMA_VALIDATION_TIMEOUT_MS = 30000 // 30 seconds for validation requests

+// Retry configuration
+const MAX_RETRIES = 3


Consider making these retry parameters configurable through the ApiHandlerOptions:

Suggested change

-const MAX_RETRIES = 3
+// Retry configuration
+const MAX_RETRIES = options.ollamaMaxRetries ?? 3
+const INITIAL_RETRY_DELAY_MS = options.ollamaInitialRetryDelay ?? 1000 // 1 second
+const MAX_RETRY_DELAY_MS = options.ollamaMaxRetryDelay ?? 10000 // 10 seconds

This would allow users to adjust retry behavior based on their specific setup.
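
If the constants are declared at module scope, no `options` object would be available there, so the values would probably need to be resolved where the options are passed in (for example, in the embedder's constructor). A minimal sketch under that assumption follows; the three option fields are taken from the suggestion and are not confirmed to exist on ApiHandlerOptions, and `resolveRetryConfig` is a hypothetical helper.

```typescript
// Hypothetical option fields, named as in the suggestion above.
interface OllamaRetryOptions {
	ollamaMaxRetries?: number
	ollamaInitialRetryDelay?: number
	ollamaMaxRetryDelay?: number
}

interface ResolvedRetryConfig {
	maxRetries: number
	initialRetryDelayMs: number
	maxRetryDelayMs: number
}

// Applies the defaults from the suggestion and guards against nonsensical values.
function resolveRetryConfig(options: OllamaRetryOptions = {}): ResolvedRetryConfig {
	// At least one attempt is always made.
	const maxRetries = Math.max(1, Math.floor(options.ollamaMaxRetries ?? 3))
	const initialRetryDelayMs = Math.max(0, options.ollamaInitialRetryDelay ?? 1000)
	// The cap can never be lower than the first delay.
	const maxRetryDelayMs = Math.max(initialRetryDelayMs, options.ollamaMaxRetryDelay ?? 10000)
	return { maxRetries, initialRetryDelayMs, maxRetryDelayMs }
}
```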

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

+
+				// Don't retry for certain errors
+				if (
+					error.message?.includes(t("embeddings:ollama.modelNotFound", { modelId: "" }).split(":")[0]) ||


Using string matching on localized error messages could be fragile. If the translation changes, this check might break. Consider using a more robust approach, perhaps by checking the error type or adding a specific error code property?

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

+		// Handle specific error types with better messages
+		if (lastError.name === "AbortError") {
+			throw new Error(t("embeddings:validation.connectionFailed"))
+		} else if (lastError.message?.includes("fetch failed") || (lastError as any).code === "ECONNREFUSED") {


There's inconsistency in error checking - sometimes using error.code (line 174) and sometimes error.message?.includes() (line 208). Consider standardizing to one approach for better maintainability?

roomote · 2025-07-31T23:44:46Z

src/services/code-index/embedders/ollama.ts

+						error.code === "ETIMEDOUT" ||
+						error.code === "ECONNRESET"
+					) {
+						console.log(`Ollama embedding attempt ${attempt} failed, retrying in ${retryDelay}ms...`)


Consider adding telemetry for retry attempts to help monitor retry patterns:

Suggested change

-						console.log(`Ollama embedding attempt ${attempt} failed, retrying in ${retryDelay}ms...`)
+console.log(`Ollama embedding attempt ${attempt} failed, retrying in ${retryDelay}ms...`)
+TelemetryService.instance.captureEvent(TelemetryEventName.CODE_INDEX_RETRY, {
+  attempt,
+  retryDelay,
+  errorCode: error.code,
+  location: "OllamaEmbedder:createEmbeddings"
+})

roomote bot requested review from cte, jr and mrubens as code owners July 31, 2025 23:37

github-project-automation bot moved this to Triage in Roo Code Roadmap Jul 31, 2025

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Jul 31, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Jul 31, 2025

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Jul 31, 2025

roomote bot mentioned this pull request Jul 31, 2025

Failing to create code index using local setup #6526

Closed

ellipsis-dev bot reviewed Jul 31, 2025


daniel-lxs closed this Jul 31, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 31, 2025

github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 31, 2025
