@hirokuni-kitahara
Member

@hirokuni-kitahara hirokuni-kitahara commented Mar 25, 2025

Signed-off-by: hirokuni-kitahara [email protected]

This PR is based on issue #823 and adds two new fields, `retry` and `trace_error_on_retry`, to the block class.
`retry` is an integer that sets how many times a block is retried when an error occurs inside it. Retry is enabled only when the value is specified and positive.
`trace_error_on_retry` is a boolean that controls whether error information is added to the trace. It defaults to None (treated as False); when set to True, errors raised during retries are added to the trace. This is useful when retrying a model block, because the LLM can see the earlier errors and use its self-reflection behavior.
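
The semantics described above can be sketched in plain Python. This is only an illustration, not the code in `pdl_interpreter.py`: `run_with_retry`, `run_block`, and the list used as a stand-in for the trace are hypothetical names, and the sketch assumes that `retry` counts the extra attempts made after the first failure.

```python
from typing import Callable, Optional


def run_with_retry(
    run_block: Callable[[list[str]], str],
    retry: Optional[int] = None,
    trace_error_on_retry: Optional[bool] = None,
) -> str:
    """Run a block, retrying on error (hypothetical helper, not PDL's API)."""
    # Retry is enabled only when a value is specified and positive.
    max_retry = retry if retry is not None and retry > 0 else 0
    trace: list[str] = []  # stand-in for the context passed to the block
    for trial_idx in range(max_retry + 1):
        try:
            return run_block(trace)
        except Exception as exc:
            if trial_idx == max_retry:
                raise  # no attempts left: propagate the last error
            if trace_error_on_retry:
                # Let the next attempt see what went wrong in this one.
                trace.append(
                    f"An error occurred in a PDL block. Error details: {exc}"
                )
    raise AssertionError("unreachable")
```

With `trace_error_on_retry` unset, every attempt receives the same empty context, which matches a plain retry such as reconnecting to a service. With it set to True, each new attempt also receives the error messages from the earlier ones.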

The following is the original PR description before discussion in the comments.
---
This PR is based on the issue https://github.com/IBM/prompt-declaration-language/issues/823 and it adds two new fields `retry_on_error` and `retry_max` to the existing `repeat` block.
`retry_on_error` is a simple boolean value which indicates if the retry feature is enabled or not, and if true, errors while running the `repeat` block are added to the background context of the LLM.
`retry_max` is an integer value of the number of maximum retry.
This PR contains the following changes
- Update to the RepeatBlock model in `pdl_ast.py`
- Update while loop for repeat block in `pdl_interpreter.py`
- Update to the `schema.json`

@vazirim
Member

vazirim commented Mar 25, 2025

Hi @hirokuni-kitahara, the PR looks great. How about instead of adding 2 new fields we only have retry_max, set to 0 by default. When it's set to 0 it means that retry_on_error == False. When it's set to anything strictly greater than 0, then it's equivalent to retry_on_error == True with that number of max tries. wdyt?

When you make changes to the AST, you can run:

pdl --schema > src/pdl/pdl-schema.json

to automatically regenerate the JSON schema. This is probably what you did, but just double-checking.

There are also some other files that need to change (with an AST change):
pdl_dumper.py
pdl_ast_utils.py

Finally, you can run the following locally to make sure everything is in good shape:

pytest

and:

pre-commit run --all-files

See the contribution docs
Thank you so much!

@hirokuni-kitahara hirokuni-kitahara force-pushed the error-handling-in-repeat-1 branch from fe52fdc to 4ea3dde Compare March 26, 2025 07:29
@hirokuni-kitahara
Member Author

Thank you for your feedback @vazirim !
I have updated the code so that it uses retry_max as the feature flag, and I removed retry_on_error as you suggested.
I also followed the steps in the contribution docs, and all checks now pass.
I would really appreciate it if you could review the updated code.

@starpit
Member

starpit commented Mar 26, 2025

@hirokuni-kitahara can you rebase (other PRs have also changed the schema) and then run npm run types in pdl-live-react? this will re-generate the typescript types in accordance with the changes to the schema. thanks!

@mandel
Collaborator

mandel commented Mar 26, 2025

Thank you that is a great feature!

Instead of doing it only on repeat blocks, what about adding this field in the Block class inherited by all the blocks?

I would also rename retry_max simply into retry.
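
A minimal sketch of this suggestion, assuming simplified stand-in classes: the real models are defined in `pdl_ast.py`, and the `ModelBlock` and `CodeBlock` fields and the `retry_enabled` helper below are hypothetical and exist only to show the inheritance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical base class standing in for the real Block model."""

    retry: Optional[int] = None  # declared once, inherited by every block


@dataclass
class ModelBlock(Block):
    model: str = ""  # illustrative field only


@dataclass
class CodeBlock(Block):
    code: str = ""  # illustrative field only


def retry_enabled(block: Block) -> bool:
    # The same check works for any block kind, since the field is on the base.
    return block.retry is not None and block.retry > 0
```

Declaring the field on the base class means the interpreter can apply one retry check to any block, rather than special-casing the repeat block.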

@hirokuni-kitahara
Member Author

@starpit @mandel
Thank you very much for your feedback!
Yes, I will rebase and regenerate the TypeScript types with that command, and I will also add a single retry field to the Block class.

@hirokuni-kitahara hirokuni-kitahara changed the title Add a new retry feature to repeat block Add a new retry feature to block Apr 10, 2025
@hirokuni-kitahara hirokuni-kitahara force-pushed the error-handling-in-repeat-1 branch 2 times, most recently from a4bd87d to 506e109 Compare April 10, 2025 08:44
@hirokuni-kitahara
Member Author

Hi @vazirim @mandel @starpit,
the retry feature is ready now and I think I already updated all the related files (including the schema.json and ts file).
Could you please review the changes? Thank you!

@vazirim
Member

vazirim commented Apr 14, 2025

Thank you very much for the changes @hirokuni-kitahara!

LGTM

@mandel @starpit any more feedback?

@starpit
Member

starpit commented Apr 22, 2025

does this handle the case where an async/future'd model block invocation fails?

Collaborator

@mandel mandel left a comment

It is great. I just added a couple of comments.

raise exc from exc
if do_retry:
    error = f"An error occurred in a PDL block. Error details: {err_msg}"
    print(f"\n\033[0;31m[Retry {trial_idx+1}/{max_retry}] {error}\033[0m\n")

Collaborator

This could be printed on stderr.
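
One way to apply this, sketched under the assumption that the message format stays as in the diff above; the wrapper function `report_retry` is hypothetical and is used here only to make the snippet self-contained.

```python
import sys


def report_retry(trial_idx: int, max_retry: int, err_msg: str) -> str:
    """Write the retry notice to stderr and return the error text."""
    error = f"An error occurred in a PDL block. Error details: {err_msg}"
    # file=sys.stderr keeps diagnostics out of the program's regular output.
    print(
        f"\n\033[0;31m[Retry {trial_idx+1}/{max_retry}] {error}\033[0m\n",
        file=sys.stderr,
    )
    return error
```

Keeping the notice on stderr means that piping or capturing the program's stdout does not mix retry diagnostics into the result.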

Member Author

Thank you @mandel , I will fix this part.

if do_retry:
    error = f"An error occurred in a PDL block. Error details: {err_msg}"
    print(f"\n\033[0;31m[Retry {trial_idx+1}/{max_retry}] {error}\033[0m\n")
    scope = set_error_to_scope_for_retry(scope, error, block.pdl__id)

Collaborator

Why do you add the error to the trace instead of re-executing the block in the original scope?

Member Author

Thank you @mandel ,
I'm adding the error to the trace so that the agent's LLM can understand what went wrong in the previous trial. (I'm assuming the retry feature is being used within a ReAct-style loop.)

This allows the LLM to generate a different output from the model block in the next iteration.

However, I now realize there's also a need to support more traditional retry scenarios, such as retrying an HTTP connection, where adding the error to the trace might be excessive.

What do you think about adding a new block attribute like trace_error_on_retry as a boolean field? It could default to false and be set to true for agent use cases.

@hirokuni-kitahara hirokuni-kitahara force-pushed the error-handling-in-repeat-1 branch from 2845347 to baa673e Compare May 22, 2025 13:29
@hirokuni-kitahara
Member Author

@vazirim @mandel I was able to fix the schema issue, and all the checks now pass!
Could you please review this? Thank you!

@mandel mandel merged commit 47f4adc into IBM:main May 23, 2025
7 checks passed
@mandel
Collaborator

mandel commented May 23, 2025

That's great. Thank you!
