Conversation

@zeke (Member) commented Sep 30, 2025

This PR adds support for streaming predictions via the replicate.stream() method, as specified in DP-671.

This change is intended to support feature parity with the legacy pre-Stainless 1.x client.

Changes

  • Add stream() method to both Replicate and AsyncReplicate clients
  • Add module-level stream() function for convenience
  • Create new lib/_predictions_stream.py module with streaming logic
  • Add tests for sync and async streaming
  • Update README with documentation and examples using anthropic/claude-4-sonnet

The stream() method creates a prediction and returns an iterator that yields output chunks, as strings, as they become available from the streaming API. This is useful for language models, where you want to display output as it's generated rather than waiting for the entire response.

Example Usage

import replicate

for event in replicate.stream(
    "anthropic/claude-4-sonnet",
    input={
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast.",
        "max_tokens": 8192,
        "system_prompt": "You are a helpful assistant",
    },
):
    print(str(event), end="")
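Under the hood, the control flow is: create a prediction, then follow its stream URL. A minimal sketch using a fake client, where the method names (`create_prediction`, `iter_stream_text`) are illustrative stand-ins rather than the actual SDK surface:

```python
from typing import Iterator

class FakeClient:
    """Stand-in for the Replicate client, used only to show the control flow."""

    def create_prediction(self, model: str, input: dict) -> dict:
        # The real client would POST to the API; the response includes
        # a urls.stream entry when the model supports streaming.
        return {"urls": {"stream": "https://example.test/fake-stream"}}

    def iter_stream_text(self, url: str) -> Iterator[str]:
        # The real client would consume the SSE endpoint at `url`;
        # here we just yield canned chunks.
        yield from ["Hello", ", ", "world"]

def stream(client, model: str, input: dict) -> Iterator[str]:
    # 1. Create the prediction.
    prediction = client.create_prediction(model, input)
    # 2. Follow its stream URL, yielding text chunks as they arrive.
    yield from client.iter_stream_text(prediction["urls"]["stream"])

print("".join(stream(FakeClient(), "anthropic/claude-4-sonnet", {"prompt": "hi"})))
# prints: Hello, world
```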

Testing locally

  1. Clone the repo and checkout the branch:

    gh repo clone replicate/replicate-python-stainless
    cd replicate-python-stainless
    gh pr checkout 75
  2. Set up the development environment:

    scripts/bootstrap
  3. Run the tests:

    scripts/test
  4. Try the example:

    import replicate
    
    for event in replicate.stream(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": "Write a haiku about Python"},
    ):
        print(str(event), end="")

Prompts

Please implement this: https://linear.app/replicate/issue/DP-671/add-support-for-replicatestream

Remember to add docs and tests. Run scripts/test to make sure it works. Then lint.

the new docs say it's emitting SSEs, but that's at the API level, not in the Python client, right?

read the comments in the linear ticket to make sure we're also supporting streaming file outputs

let's forget about streaming file outputs for now, and just make this initial implementation support streaming text responses

zeke requested a review from a team as a code owner on September 30, 2025
linear bot commented Sep 30, 2025

DP-671 Add support for `replicate.stream()`

The legacy 1.x client supports a method called replicate.stream():

for event in replicate.stream(
    "anthropic/claude-4-sonnet",
    input={
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.",
        "max_tokens": 8192,
        "system_prompt": "You are a helpful assistant",
        "extended_thinking": False,
        "max_image_resolution": 0.5,
        "thinking_budget_tokens": 1024
    },
):
    print(str(event), end="")

When creating a prediction via the API, the returned prediction object will always have a stream entry in its urls property if the model supports streaming:

prediction=$(
    curl --silent --show-error https://api.replicate.com/v1/models/anthropic/claude-4-sonnet/predictions \
        --request POST \
        --header "Authorization: Bearer $REPLICATE_API_TOKEN" \
        --header "Content-Type: application/json" \
        --data @- <<'EOM'
{
    "input": {
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.",
        "system_prompt": "You are a helpful assistant"
    }
}
EOM
)

stream_url=$(printf "%s" "$prediction" | jq -r .urls.stream)

curl --silent --show-error --no-buffer "$stream_url" \
    --header "Accept: text/event-stream" \
    --header "Cache-Control: no-store"

Docs about streaming are here: https://replicate.com/docs/topics/predictions/streaming

Tasks:

  • Implement replicate.stream() in the client.
  • Add tests
  • Update the README with documentation and a working example that uses anthropic/claude-4-sonnet

The API uses Server-Sent Events internally, but the Python client
yields plain string chunks to the user, not SSE event objects.
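That SSE-to-plain-strings translation could look something like the sketch below, assuming the event names described in the streaming docs linked above ('output' events carrying text, a terminal 'done' event):

```python
from typing import Iterable, Iterator

def iter_output_text(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield the text payload of 'output' events from a text/event-stream body.

    Assumes the event naming convention from Replicate's streaming docs;
    field parsing follows the SSE wire format (one optional leading space
    after 'data:' is stripped, a blank line terminates each event).
    """
    event = None
    for line in sse_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            if event == "done":
                return  # end of stream
            if event == "output":
                payload = line[len("data:"):]
                yield payload[1:] if payload.startswith(" ") else payload
        elif line == "":
            event = None  # a blank line ends the current event
```

Feeding it the kind of body the curl command above produces would yield the raw text chunks in order, which is exactly what the client hands to the user.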
@zeke zeke requested review from aron, dgellow and erbridge September 30, 2025 20:56
@dgellow (Collaborator) commented Oct 1, 2025

Some thoughts:

  • stream() seems to overlap with the new replicate.use("...", streaming=True). That will create some confusion, and means more code to document and maintain
  • I feel the SDK version bump is a good time to push people toward replicate.use() wherever possible, given that it is more flexible and relates to your concept of pipelines
  • if added, it may be simpler to implement it as a wrapper around replicate.use("...", streaming=True)
  • if added, I would recommend marking it as @deprecated("Use replicate.use() instead")
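For reference, PEP 702's @deprecated decorator (available via typing_extensions) emits a DeprecationWarning at runtime when the decorated function is called. A plain-warnings sketch of that same effect, with a placeholder body (the delegation shown in the comment is illustrative, not the actual implementation):

```python
import warnings
from typing import Iterator

def stream(model: str, input: dict) -> Iterator[str]:
    """Hypothetical shim: keep stream() working while steering callers
    toward replicate.use(..., streaming=True)."""
    warnings.warn(
        "replicate.stream() is deprecated; use replicate.use(model, streaming=True) instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Here the real shim would delegate to replicate.use(model, streaming=True);
    # an empty iterator stands in as a placeholder.
    return iter(())
```

The function keeps working; callers just see a DeprecationWarning (visible under `python -W default`, in test runners, and to static type checkers that understand PEP 702).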

@zeke (Member Author) commented Oct 2, 2025

Great feedback @dgellow

cc @bfirsh would love your thoughts.

@zeke (Member Author) commented Oct 2, 2025

I would recommend to mark it as @deprecated("Use replicate.use() instead")

What effect does that have? The function still works, but a user also sees that error message if they try to run it?

Is there a proper way to not implement it at all, but display a helpful message when users call replicate.stream()?

@zeke (Member Author) commented Oct 6, 2025

Closing! Gonna start a new PR for this based on the feedback from @dgellow and @aron 👍🏼

zeke closed this on Oct 6, 2025
@zeke (Member Author) commented Oct 6, 2025

Replaced by #79
