
works#1531

Open
BenCowen wants to merge 1 commit into main from customer-retries


@BenCowen
Contributor

@BenCowen BenCowen commented Mar 27, 2026

Type of Change

This is a bit of a workaround for "custom retries" (i.e. retrying via modal.Retries on some exceptions, and circumventing retries on other, more fatal errors). It's based on this demo I made for Achira last year, and I'm bringing it up again because another customer asked for it here.

The demo follows a scripted sequence of errors, which requires each function execution to "know" which iteration it's on. This is accomplished with a modal.Dict, but it could potentially be simplified by choosing a random error instead? (But then, it's random.)
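
As a rough illustration of the scripted sequence (not the code in this PR), the sketch below uses a plain Python dict as a stand-in for the modal.Dict so that it runs locally. The names `SCRIPT`, `call_counter`, and `scripted_call` are hypothetical, the first two steps follow the TimeoutError then ConnectionError order the example describes, and the final successful step is an assumption.

```python
# Hypothetical sketch: a plain dict stands in for the shared modal.Dict.
call_counter = {"n": 0}

# Scripted outcomes, indexed by call number. The trailing None (success)
# is an assumption made for illustration.
SCRIPT = [TimeoutError, ConnectionError, None]


def scripted_call(counter=call_counter):
    """Raise or succeed according to SCRIPT, based on a shared call count."""
    step = counter["n"]
    counter["n"] = step + 1  # record this attempt before possibly raising
    # Calls beyond the end of the script repeat the last scripted outcome.
    outcome = SCRIPT[min(step, len(SCRIPT) - 1)]
    if outcome is not None:
        raise outcome(f"scripted failure on call {step + 1}")
    return f"succeeded on call {step + 1}"
```

The counter has to live outside the function body because each retry is a fresh execution. With a real shared store, this read-then-write is presumably fine for sequential retries of a single input, but it is not atomic across concurrent callers.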

  • New example for the GitHub repo

Monitoring Checklist

  • Example is configured for testing in the synthetic monitoring system, or lambda-test: false is provided in the example frontmatter and I have gotten approval from a maintainer
    • [X] Example is tested by executing with modal run, or an alternative cmd is provided in the example frontmatter (e.g. cmd: ["modal", "serve"])
    • Example is tested by running the cmd with no arguments, or the args are provided in the example frontmatter (e.g. args: ["--prompt", "Formula for room temperature superconductor:"])
    • Example does not require third-party dependencies besides fastapi to be installed locally (e.g. does not import requests or torch in the global scope or other code executed locally)

Documentation Site Checklist

Content

  • Example is documented with comments throughout, in a Literate Programming style
  • All media assets for the example that are rendered in the documentation site page are retrieved from modal-cdn.com

Build Stability

  • Example pins all dependencies in container images
    • Example pins container images to a stable tag like v1, not a dynamic tag like latest
    • [] Example specifies a python_version for the base image, if it is used
    • --> Currently just using the default image; should I add .debian_slim(add_python=...)?
    • Example pins all dependencies to at least SemVer minor version, ~=x.y.z or ==x.y, or we expect this example to work across major versions of the dependency and are committed to maintenance across those versions
      • Example dependencies with version < 1 are pinned to patch version, ==0.y.z

Outside Contributors

You're great! Thanks for your contribution.



@BenCowen
Contributor Author

Might also want SDK's blessing on this (hence @thomasjpfan tag)?

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


# This function follows a scripted sequence to demonstrate the behavior:
#
# 1. **Call 1** — raises `TimeoutError` (retryable → Modal retries)
# 2. **Call 2** — raises `ConnectionError` (retryable → Modal retries)
Contributor

Do we have a list of exceptions that we'll retry? Do users want to configure their function so they can control when it retries?

I'm thinking of:

@app.function(retries=modal.Retries(max_retries=5, initial_delay=1.0))
def my_func():
    try:
        my_custom_function()
    except MyCustomException as exc:
        raise ModalRetriableError(...) from exc  # signal to Modal that we want to retry.

Contributor Author

@BenCowen BenCowen Mar 27, 2026

Do users want to configure their function so they can control when it retries?

I have only gotten this request twice in the last year, so possibly not often enough to justify dedicated engineering effort...

Do we have a list of exceptions that we'll retry

Right now, retries are configured based on task status, and there is no exception filtering.
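
Because retries key off task status rather than exception type, one possible workaround (an assumption here, not necessarily what this PR's example does) is to raise only the errors that should be retried and to return fatal errors as values, so the call finishes without failing. The sketch below is plain Python, with a small loop simulating a status-based retry policy; `FatalResult`, `no_retry_on`, and `run_with_retries` are hypothetical names, not part of the Modal SDK.

```python
import functools
from dataclasses import dataclass


@dataclass
class FatalResult:
    """Hypothetical marker returned instead of raising a fatal error."""
    error: BaseException


def no_retry_on(*fatal_types):
    """Decorator: return fatal exceptions as values; let others propagate."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except fatal_types as exc:
                # Returning (not raising) means the call ends normally,
                # so a status-based retry policy has nothing to retry.
                return FatalResult(exc)
        return wrapper
    return decorator


def run_with_retries(fn, max_retries):
    """Local stand-in for a status-based policy: retry on any raise."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise


attempts = []


@no_retry_on(ValueError)
def flaky():
    attempts.append(1)
    if len(attempts) == 1:
        raise TimeoutError("transient")  # propagates, so it is retried
    raise ValueError("bad input")  # fatal: returned as a value, not retried


result = run_with_retries(flaky, max_retries=5)
```

In a Modal app, a decorator like this would presumably sit beneath `@app.function(retries=modal.Retries(...))`, and the caller would check whether the returned value is a `FatalResult`. The trade-off is that a fatal error no longer shows up as a failed call, so the caller has to surface it explicitly.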
