feat: parallelize sync generate method for improved LLM throughput #34043
+39
−8
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR optimizes the sync
generate()method inBaseChatModelto improve throughput when processing multiple prompts by parallelizing LLM calls using a thread-pool executor.Changes
chat_models.py:904-947get_executor_for_configcontext manager for proper thread pool lifecycleTechnical Details
The refactor maintains the same API while significantly improving performance for batch processing scenarios:
on_llm_errorcallbacksPerformance Impact
This change will improve throughput when processing multiple prompts simultaneously, especially beneficial for:
Testing
The changes preserve all existing behavior:
Related
Part of the broader LLM optimization initiative documented in
/Users/bytedance/langchain/langchain/.trae/documents/Optimize LLM Calls Across Codebase.mdChecklist
get_executor_for_config)