- [x] I searched existing ideas and did not find a similar one
- [x] I added a very descriptive title
- [x] I've clearly described the feature request and motivation for it
Feature request
Hi all,
I was looking at writing my own package for this, but extending LangChain seems easier and more useful, since most of the plumbing is already in place.
My idea is to batch inferences out to separate endpoints: LangChain runnables would have a pool of inference endpoints for a given type of inference. Chat is the most obvious, but functions, images, text-to-speech, and speech-to-text would be others. You could then submit a batch of requests to the pool and let LangChain route the requests and collect the results.
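For context, LangChain already has two of the pieces: every Runnable exposes `.batch()`, and `.with_fallbacks()` chains endpoints in priority order. A minimal sketch of those existing pieces (the local URL and model name are placeholders, with the local model served through an OpenAI-compatible server):

```python
from langchain_openai import ChatOpenAI

# Local endpoint behind an OpenAI-compatible server (e.g. vLLM); the URL and
# model name here are placeholders.
local = ChatOpenAI(base_url="http://localhost:8000/v1", api_key="unused", model="local-model")
# Externally hosted endpoint.
remote = ChatOpenAI(model="gpt-4o-mini")

# Prefer the local endpoint; fail over to the hosted one on error.
chain = local.with_fallbacks([remote])
results = chain.batch(["prompt 1", "prompt 2", "prompt 3"])
```

What's missing is routing a batch across multiple live endpoints, rather than trying them strictly in order.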
Motivation
There are a number of reasons this would be useful, mostly involving running local inference endpoints and combining them with externally hosted ones (e.g. OpenAI, Hugging Face, SageMaker, Google):
- max token length: local endpoints tend to be more limited in the size of requests and responses
- cost: OpenAI, SageMaker, etc. aren't free, while local endpoints are, so routing work locally can reduce the total cost of inference
- failover: if one endpoint is down or rate-limited, requests can be retried against another
- it would be kind of cool to have
Proposal (If applicable)
I'm not very familiar with LangChain or its internals, but extending the Runnable class to hold a pool seems like the easiest way to do this.
EDIT: looking closer, fallbacks are just a special case of an endpoint pool (the members are always tried in the same fixed order). Maybe that can be the basis for this?
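To make that concrete, here's a minimal sketch of a pool as a custom Runnable, assuming a simple round-robin-with-failover policy. The class name and the policy are made up for illustration; only `Runnable`, `RunnableConfig`, and the default `batch()` behavior are real LangChain pieces. Because `Runnable.batch()` runs `invoke()` concurrently by default, a batch submitted to the pool gets spread across the endpoints without extra work:

```python
from itertools import cycle
from typing import Any, Optional

from langchain_core.runnables import Runnable, RunnableConfig


class EndpointPool(Runnable):
    """Hypothetical sketch: round-robin over member runnables with failover.

    Fallbacks drop out as the case where the rotation never advances, so
    members are always tried in the same priority order.
    """

    def __init__(self, endpoints: list[Runnable]):
        if not endpoints:
            raise ValueError("EndpointPool needs at least one endpoint")
        self.endpoints = endpoints
        # Rotate which endpoint gets tried first on each request.
        self._start = cycle(range(len(endpoints)))

    def invoke(self, input: Any, config: Optional[RunnableConfig] = None, **kwargs: Any) -> Any:
        start = next(self._start)
        last_exc: Optional[Exception] = None
        for offset in range(len(self.endpoints)):
            endpoint = self.endpoints[(start + offset) % len(self.endpoints)]
            try:
                return endpoint.invoke(input, config, **kwargs)
            except Exception as exc:  # failover: try the next endpoint in the ring
                last_exc = exc
        raise last_exc


# Usage sketch: the default Runnable.batch() fans invoke() out over a thread
# pool, so these requests end up spread across local and remote endpoints.
# pool = EndpointPool([local, remote])
# results = pool.batch(["prompt 1", "prompt 2", "prompt 3"])
```

A smarter routing policy (by token length, cost, or current load) could slot into the same `invoke()` hook without changing the interface.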