
Replicate Vicuna API returning empty response #101

@rlancemartin

Description


Here is an example prompt that we will generate for QA chains:

Use the following pieces of context to answer the question at the end. Use three sentences maximum. \n transformer simultaneously that made it very successful.\n\nAnd I think the authors were kind of, uh, deliberately trying to, uh, make\n\nthis really, uh, powerful architecture.\n\nAnd, um, so basically it's very powerful in the forward pass because it's able\n\nto express, um, very general computation as sort of something that looks like\n\nmessage passing, uh, you have nodes and they all store vectors and, uh, these\n\nnodes get to basically look at each other and it's, uh, each other's vectors Our hardware is a massive throughput machine, like GPUs.\n\nUh, they prefer lots of parallelism.\n\nSo you don't want to do lots of sequential operations.\n\nSo you want to do a lot of operations serially and the transformer is designed\n\nwith that in mind as well.\n\nAnd so it's designed for our hardware and it's designed to both be very\n\nexpressive in a forward pass, but also very optimizable in the backward pass.\n\nAnd you said that, uh, the residual connections support a kind of ability Yes.\n\nUh, if it was too grand, it would over promise and then under deliver\n\npotentially.\n\nSo you want to just, uh, meme your way to greatness.\n\nThat should be a t-shirt.\n\nSo you, you tweeted the transformer is a magnificent neural network architecture\n\nbecause it is a general purpose, differentiable computer.\n\nIt is simultaneously expressive in the forward pass, optimizable via back\n\npropagation, gradient descent, and efficient high parallelism compute graph. 
Um, but I do think that there should be even better architectures potentially.\n\nBut it's, uh, you're, you admire the resilience here.\n\nYeah.\n\nThere's something profound about this architecture that, that least\n\nresilient, so maybe we can, everything can be turned into a, uh, into a problem\n\nthat transformers can solve.\n\nCurrently definitely looks like the transformer is taking over AI and you\n\ncan feed basically arbitrary problems into it.\nQuestion: Why is the transformer architecture expressive in the forward pass?\nAnswer: Think step by step

It has 2150 characters.

If we submit it with default parameters on the Web UI, we see no output and an error message:

Running predict()...
Input length of input_ids is 581, but `max_length` is set to 496. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.

If I bump max_length to 2000 with the UI toggle, then it works.
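This matches how `max_length` works in Hugging Face `generate`: it caps prompt tokens plus generated tokens combined, so a 581-token prompt against `max_length=496` leaves no budget for output at all. A minimal sketch of that budget arithmetic, using the numbers from the error message above:

```python
def generation_budget(input_tokens: int, max_length: int) -> int:
    """Tokens left for generation when max_length caps prompt + output together."""
    return max(0, max_length - input_tokens)

# Values from the error message above:
print(generation_budget(581, 496))   # 0 -> no room to generate anything
print(generation_budget(581, 2000))  # 1419 tokens of headroom
```

This is why the warning suggests `max_new_tokens` instead: it budgets only the generated tokens, independent of prompt length.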

Now I try to reproduce the same in Python:

from langchain.llms import Replicate  # LangChain's Replicate LLM wrapper

llm = Replicate(
    model="replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    temperature=0.75,
    max_length=2000,
)

I then see a single output: '1'
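One thing worth checking is whether the wrapper actually forwards top-level kwargs like `max_length` to the Replicate API, or silently drops them; if the parameter never reaches the model, it would fall back to the 496 default and truncate output, which could explain the single-character response. A minimal sketch of the kind of input payload the wrapper would need to send (the helper name and payload shape are hypothetical, for illustration only, not LangChain's actual API):

```python
# Hypothetical payload builder: if max_length is missing from the dict sent
# to Replicate, the model's own default (496 here) applies regardless of
# what was passed to the Python constructor.
def build_replicate_input(prompt: str, **params) -> dict:
    payload = {"prompt": prompt}
    payload.update(params)  # e.g. temperature, max_length
    return payload

payload = build_replicate_input("Question: ...", temperature=0.75, max_length=2000)
print(payload["max_length"])  # 2000 only if the parameter survives into the payload
```

The Replicate logs should show which parameters actually arrived with the request.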

@dankolesnikov see if you can get the Replicate logs so we can see the error.
