- [x] I searched existing ideas and did not find a similar one
- [x] I added a very descriptive title
- [x] I've clearly described the feature request and motivation for it
Feature request
Hi all,
I was looking at writing my own package for this, but extending LangChain seems easier and more useful, since most of the plumbing is already in place.
My idea is to batch inferences out to separate endpoints: LangChain runnables would have a pool of inference endpoints for a given type of inference. Chat is the most obvious, but functions, images, text-to-speech, and speech-to-text would be others. You could then submit a batch of requests to the pool and let LangChain route the requests and collect the results.
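For context, LangChain already has two of the pieces: every Runnable exposes `.batch()`, and `.with_fallbacks()` chains endpoints in priority order. A minimal sketch of those existing pieces (the local URL and model name are placeholders, with the local model served through an OpenAI-compatible server):

```python
from langchain_openai import ChatOpenAI

# Local endpoint behind an OpenAI-compatible server (e.g. vLLM); the URL and
# model name here are placeholders.
local = ChatOpenAI(base_url="http://localhost:8000/v1", api_key="unused", model="local-model")
# Externally hosted endpoint.
remote = ChatOpenAI(model="gpt-4o-mini")

# Prefer the local endpoint; fail over to the hosted one on error.
chain = local.with_fallbacks([remote])
results = chain.batch(["prompt 1", "prompt 2", "prompt 3"])
```

What's missing is routing a batch across multiple live endpoints, rather than trying them strictly in order.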
Motivation
There are a number of reasons this would be useful, mostly involving running local inference endpoints and combining them with externally hosted ones (e.g. OpenAI, Hugging Face, SageMaker, Google):
- max token length: local endpoints tend to be more limited in the size of requests and responses
- cost: OpenAI, SageMaker, etc. aren't free, while local endpoints are, so routing work locally can reduce the total cost of inference
- failover: if one endpoint is down or rate-limited, requests can be retried against another
- it would be kind of cool to have
Proposal (If applicable)
I'm not very familiar with LangChain or its internals, but extending the Runnable class to hold a pool seems like the easiest way to do this.
EDIT: looking closer, fallbacks are just a special case of an endpoint pool (the members are always tried in the same fixed order). Maybe that can be the basis for this?
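To make that concrete, here's a minimal sketch of a pool as a custom Runnable, assuming a simple round-robin-with-failover policy. The class name and the policy are made up for illustration; only `Runnable`, `RunnableConfig`, and the default `batch()` behavior are real LangChain pieces. Because `Runnable.batch()` runs `invoke()` concurrently by default, a batch submitted to the pool gets spread across the endpoints without extra work:

```python
from itertools import cycle
from typing import Any, Optional

from langchain_core.runnables import Runnable, RunnableConfig


class EndpointPool(Runnable):
    """Hypothetical sketch: round-robin over member runnables with failover.

    Fallbacks drop out as the case where the rotation never advances, so
    members are always tried in the same priority order.
    """

    def __init__(self, endpoints: list[Runnable]):
        if not endpoints:
            raise ValueError("EndpointPool needs at least one endpoint")
        self.endpoints = endpoints
        # Rotate which endpoint gets tried first on each request.
        self._start = cycle(range(len(endpoints)))

    def invoke(self, input: Any, config: Optional[RunnableConfig] = None, **kwargs: Any) -> Any:
        start = next(self._start)
        last_exc: Optional[Exception] = None
        for offset in range(len(self.endpoints)):
            endpoint = self.endpoints[(start + offset) % len(self.endpoints)]
            try:
                return endpoint.invoke(input, config, **kwargs)
            except Exception as exc:  # failover: try the next endpoint in the ring
                last_exc = exc
        raise last_exc


# Usage sketch: the default Runnable.batch() fans invoke() out over a thread
# pool, so these requests end up spread across local and remote endpoints.
# pool = EndpointPool([local, remote])
# results = pool.batch(["prompt 1", "prompt 2", "prompt 3"])
```

A smarter routing policy (by token length, cost, or current load) could slot into the same `invoke()` hook without changing the interface.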