Conversation

@zeke (Member) commented Sep 30, 2025

This PR adds support for streaming predictions via the replicate.stream() method, as specified in DP-671.

This change is intended to support feature parity with the legacy pre-Stainless 1.x client.

Changes

  • Add stream() method to both Replicate and AsyncReplicate clients
  • Add module-level stream() function for convenience
  • Create new lib/_predictions_stream.py module with streaming logic
  • Add tests for sync and async streaming
  • Update README with documentation and examples using anthropic/claude-4-sonnet

The stream() method creates a prediction and returns an iterator that yields output chunks, as strings, as they become available from the streaming API. This is useful for language models, where you want to display output as it's generated rather than waiting for the entire response.

Example Usage

import replicate

for event in replicate.stream(
    "anthropic/claude-4-sonnet",
    input={
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast.",
        "max_tokens": 8192,
        "system_prompt": "You are a helpful assistant",
    },
):
    print(str(event), end="")
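Under the hood, the control flow is: create a prediction, then follow its stream URL. A minimal sketch using a fake client, where the method names (`create_prediction`, `iter_stream_text`) are illustrative stand-ins rather than the actual SDK surface:

```python
from typing import Iterator

class FakeClient:
    """Stand-in for the Replicate client, used only to show the control flow."""

    def create_prediction(self, model: str, input: dict) -> dict:
        # The real client would POST to the API; the response includes
        # a urls.stream entry when the model supports streaming.
        return {"urls": {"stream": "https://example.test/fake-stream"}}

    def iter_stream_text(self, url: str) -> Iterator[str]:
        # The real client would consume the SSE endpoint at `url`;
        # here we just yield canned chunks.
        yield from ["Hello", ", ", "world"]

def stream(client, model: str, input: dict) -> Iterator[str]:
    # 1. Create the prediction.
    prediction = client.create_prediction(model, input)
    # 2. Follow its stream URL, yielding text chunks as they arrive.
    yield from client.iter_stream_text(prediction["urls"]["stream"])

print("".join(stream(FakeClient(), "anthropic/claude-4-sonnet", {"prompt": "hi"})))
# prints: Hello, world
```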

Testing locally

  1. Clone the repo and checkout the branch:

    gh repo clone replicate/replicate-python-stainless
    cd replicate-python-stainless
    gh pr checkout 75
  2. Set up the development environment:

    scripts/bootstrap
  3. Run the tests:

    scripts/test
  4. Try the example:

    import replicate
    
    for event in replicate.stream(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": "Write a haiku about Python"},
    ):
        print(str(event), end="")

Prompts

Please implement this: https://linear.app/replicate/issue/DP-671/add-support-for-replicatestream

Remember to add docs and tests. Run scripts/test to make sure it works. Then lint.

the new docs say it's emitting SSEs, but that's at the API level, not in the Python client, right?

read the comments in the linear ticket to make sure we're also supporting streaming file outputs

let's forget about streaming file outputs for now, and just make this initial implementation support streaming text responses

zeke requested a review from a team as a code owner on September 30, 2025
linear bot commented Sep 30, 2025

DP-671 Add support for `replicate.stream()`

The legacy 1.x client supports a method called replicate.stream():

for event in replicate.stream(
    "anthropic/claude-4-sonnet",
    input={
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.",
        "max_tokens": 8192,
        "system_prompt": "You are a helpful assistant",
        "extended_thinking": False,
        "max_image_resolution": 0.5,
        "thinking_budget_tokens": 1024
    },
):
    print(str(event), end="")

When creating a prediction via the API, the returned prediction object will always have a stream entry in its urls property if the model supports streaming:

prediction=$(
    curl --silent --show-error https://api.replicate.com/v1/models/anthropic/claude-4-sonnet/predictions \
        --request POST \
        --header "Authorization: Bearer $REPLICATE_API_TOKEN" \
        --header "Content-Type: application/json" \
        --data @- <<'EOM'
{
    "input": {
        "prompt": "Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.",
        "system_prompt": "You are a helpful assistant"
    }
}
EOM
)

stream_url=$(printf "%s" "$prediction" | jq -r .urls.stream)

curl --silent --show-error --no-buffer "$stream_url" \
    --header "Accept: text/event-stream" \
    --header "Cache-Control: no-store"

Docs about streaming are here: https://replicate.com/docs/topics/predictions/streaming

Tasks:

  • Implement replicate.stream() in the client.
  • Add tests
  • Update the README with documentation and a working example that uses anthropic/claude-4-sonnet

The API uses Server-Sent Events internally, but the Python client
yields plain string chunks to the user, not SSE event objects.
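That SSE-to-plain-strings translation could look something like the sketch below, assuming the event names described in the streaming docs linked above ('output' events carrying text, a terminal 'done' event):

```python
from typing import Iterable, Iterator

def iter_output_text(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield the text payload of 'output' events from a text/event-stream body.

    Assumes the event naming convention from Replicate's streaming docs;
    field parsing follows the SSE wire format (one optional leading space
    after 'data:' is stripped, a blank line terminates each event).
    """
    event = None
    for line in sse_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            if event == "done":
                return  # end of stream
            if event == "output":
                payload = line[len("data:"):]
                yield payload[1:] if payload.startswith(" ") else payload
        elif line == "":
            event = None  # a blank line ends the current event
```

Feeding it the kind of body the curl command above produces would yield the raw text chunks in order, which is exactly what the client hands to the user.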
@zeke zeke requested review from aron, dgellow and erbridge September 30, 2025 20:56
@dgellow (Collaborator) commented Oct 1, 2025

Some thoughts:

  • stream() seems to overlap with the new replicate.use("...", streaming=True). That will create some confusion, and means more code to document and maintain
  • I feel the SDK version bump is a good time to push people toward replicate.use() wherever possible, given that it is more flexible and relates to your concept of pipelines
  • if added, it may be simpler to implement it as a wrapper around replicate.use("...", streaming=True)
  • if added, I would recommend marking it as @deprecated("Use replicate.use() instead")
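For reference, PEP 702's @deprecated decorator (available via typing_extensions) emits a DeprecationWarning at runtime when the decorated function is called. A plain-warnings sketch of that same effect, with a placeholder body (the delegation shown in the comment is illustrative, not the actual implementation):

```python
import warnings
from typing import Iterator

def stream(model: str, input: dict) -> Iterator[str]:
    """Hypothetical shim: keep stream() working while steering callers
    toward replicate.use(..., streaming=True)."""
    warnings.warn(
        "replicate.stream() is deprecated; use replicate.use(model, streaming=True) instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Here the real shim would delegate to replicate.use(model, streaming=True);
    # an empty iterator stands in as a placeholder.
    return iter(())
```

The function keeps working; callers just see a DeprecationWarning (visible under `python -W default`, in test runners, and to static type checkers that understand PEP 702).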

@zeke (Member Author) commented Oct 2, 2025

Great feedback @dgellow

cc @bfirsh would love your thoughts.

@zeke (Member Author) commented Oct 2, 2025

I would recommend to mark it as @deprecated("Use replicate.use() instead")

What effect does that have? The function still works, but a user also sees that error message if they try to run it?

Is there a proper way to not implement it at all, but display a helpful message when users call replicate.stream()?

@zeke (Member Author) commented Oct 6, 2025

Closing! Gonna start a new PR for this based on the feedback from @dgellow and @aron 👍🏼

zeke closed this on Oct 6, 2025
@zeke (Member Author) commented Oct 6, 2025

Replaced by #79
