
Conversation

@lukasdotcom
Member

The implementation for TopicsProvider isn't that great, but is passable.
Also, I added progress reporting to all the chunked providers because it made debugging easier.
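
For illustration, a rough Python sketch of what per-chunk progress reporting could look like; the chunk_text helper, the complete() callable, and the report_progress callback are assumptions for this sketch, not the providers' actual API:

```python
# Illustrative sketch only: per-chunk progress reporting
# (the helper names and the report_progress callback are assumptions).
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Naive chunking by character count; a real provider may split on sentence boundaries."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def process_chunks(text: str,
                   complete: Callable[[str], str],
                   report_progress: Callable[[float], None]) -> List[str]:
    chunks = chunk_text(text)
    results = []
    for i, chunk in enumerate(chunks):
        results.append(complete(chunk))            # one completion request per chunk
        report_progress((i + 1) / len(chunks))     # per-chunk progress makes debugging easier
    return results
```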

@lukasdotcom lukasdotcom force-pushed the feat/chunking branch 3 times, most recently from ce995c1 to 72c3dff Compare July 21, 2025 12:41
…formulateProvider, TopicsProvider, and TranslateProvider

Signed-off-by: Lukas Schaefer <[email protected]>
Member

@julien-nc julien-nc left a comment


For some task types like proofread, the output can be weird because one chunk could be correct and another one incorrect. So one completion request might produce "there are no spelling or grammar mistakes" while another one lists the mistakes, ending up in a confusing/contradictory answer.

Maybe the proofread prompt could be adjusted to something like "do not output anything if there is no mistake or correction suggestion". Then, in the end, we assemble the results and, if all the chunks were clean, we artificially set the response to "there are no spelling or grammar mistakes". Wdyt?

@lukasdotcom lukasdotcom force-pushed the feat/chunking branch 2 times, most recently from c8b9842 to c9ecd9c Compare July 29, 2025 12:35
@lukasdotcom
Member Author

For some task types like proofread, the output can be weird because one chunk could be correct and another one incorrect. So one completion request might produce "there are no spelling or grammar mistakes" while another one lists the mistakes, ending up in a confusing/contradictory answer.

Maybe the proofread prompt could be adjusted to something like "do not output anything if there is no mistake or correction suggestion". Then, in the end, we assemble the results and, if all the chunks were clean, we artificially set the response to "there are no spelling or grammar mistakes". Wdyt?

I did try changing the prompt so that it outputs nothing, but it really doesn't want to do that: it will either ignore that instruction, literally say 'empty string' or 'nothing', or output an empty JSON list. This happens with many different models. Also, I don't think we can just set the response to "there are no spelling and grammar mistakes" when none of the chunks produce output, since the user might not expect or speak English.

@marcelklehr
Member

We could use another LLM request to combine the outputs

@julien-nc
Member

We could use another LLM request to combine the outputs

@lukasdotcom That sounds safer indeed. This will hopefully get rid of the contradiction 😁

@kyteinsky
Contributor

For some task types like proofread, the output can be weird as one chunk could be correct and another one incorrect
We could use another LLM request to combine the outputs

One other method could be to use structured outputs and instruct the LLM to output changes in different JSON keys. Even smaller models are capable of that now.
For example, for the summary task type, appending this text works nicely:

output the summary in the following JSON format:
{"summary": "string"}

The actual text can be wrapped in something like """ """.
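
For illustration, a rough Python sketch of the structured-output idea applied to proofreading; the prompt wording, the complete() callable, and the "corrections" key are assumptions for this sketch, not code from this PR:

```python
# Illustrative sketch of the structured-output idea for proofreading (assumed prompt and names).
import json

PROOFREAD_SUFFIX = (
    'Output the corrections in the following JSON format:\n'
    '{"corrections": ["string"]}\n'
    'Use an empty list if there are no mistakes.\n'
    'The text to proofread is wrapped in """ """.'
)


def proofread_chunk(chunk: str, complete) -> list[str]:
    prompt = f'{PROOFREAD_SUFFIX}\n\n"""\n{chunk}\n"""'
    raw = complete(prompt)                   # one completion request per chunk
    try:
        data = json.loads(raw)
        return data.get("corrections", [])
    except (json.JSONDecodeError, AttributeError):
        return [raw]                         # fall back to the raw answer if parsing fails
```

An empty "corrections" list would let the assembly step skip clean chunks instead of collecting contradictory "no mistakes" answers.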

@lukasdotcom
Member Author

For some task types like proofread, the output can be weird as one chunk could be correct and another one incorrect
We could use another LLM request to combine the outputs

One other method could be to use structured outputs and instruct the LLM to output changes in different JSON keys. Even smaller models are capable of that now. For example, for the summary task type, appending this text works nicely:

output the summary in the following JSON format:
{"summary": "string"}

The actual text can be wrapped in something like """ """.

The advantage I see in @marcelklehr's method is that it would also allow duplicate recommendations to be removed, e.g. if the person misspelled the same word multiple times in different chunks. The JSON approach, on the other hand, could save an LLM request.

@marcelklehr
Member

What would be the benefit of having the LLM wrap the output in JSON? We still need to combine potentially conflicting outputs somehow.

@lukasdotcom
Member Author

What would be the benefit of having the LLM wrap the output in JSON? We still need to combine potentially conflicting outputs somehow.

That LLMs will actually output an empty string.

@lukasdotcom
Member Author

What would be the benefit of having the LLM wrap the output in JSON? We still need to combine potentially conflicting outputs somehow.

Implemented another LLM request, used when multiple chunks exist, to merge all the feedback into one list. Also made the formatting nicer, because a lot of the time the LLM spits out a numbered list and it looks weird when the assembled result counts 1, 2, 3, 1, 2...
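
As a rough Python sketch of that merge step (the merge prompt wording and function names are assumptions, not the code actually merged here):

```python
# Illustrative sketch of merging per-chunk feedback with one extra LLM request (assumed names).
def merge_feedback(chunk_outputs: list[str], complete) -> str:
    if len(chunk_outputs) == 1:
        return chunk_outputs[0]            # single chunk: no merge request needed
    merge_prompt = (
        "The following are proofreading results for consecutive parts of one document.\n"
        "Merge them into a single list, remove duplicate suggestions, and renumber the items:\n\n"
        + "\n\n".join(chunk_outputs)
    )
    return complete(merge_prompt)          # extra LLM request combines the chunk results
```

Asking the merge request to renumber the items also avoids the 1, 2, 3, 1, 2... pattern from naively concatenating numbered lists.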

Member

@julien-nc julien-nc left a comment


Works nicely with gpt-3.5.

We should improve the system prompts to increase the chances the answer is given in the same language as the input text. Can be done in another issue/PR.

@lukasdotcom lukasdotcom merged commit bbc46db into main Jul 30, 2025
29 checks passed
@lukasdotcom lukasdotcom deleted the feat/chunking branch July 30, 2025 12:06
@kyteinsky kyteinsky mentioned this pull request Oct 6, 2025