@vblagoje
Member

Updates the pydocs of the audio and builder components to ensure that:

  • snippets in the pydocs are valid, executable Python scripts
  • they are up to date with current models
  • running them doesn't raise any exceptions or errors

@vblagoje vblagoje requested a review from a team as a code owner November 27, 2025 15:38
@vblagoje vblagoje requested review from anakin87 and removed request for a team November 27, 2025 15:38
@vblagoje vblagoje requested review from dfokina and removed request for anakin87 November 27, 2025 15:38
@github-actions github-actions bot added the type:documentation Improvements on the docs label Nov 27, 2025
@vblagoje vblagoje requested a review from anakin87 November 27, 2025 15:38
@vblagoje
Member Author

@anakin87 @dfokina this is an ongoing effort: a few updates every day until all snippets run without errors. Small batches are also easier to review.

@coveralls
Collaborator

coveralls commented Nov 27, 2025

Pull Request Test Coverage Report for Build 19763653080

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 8 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 91.571%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/builders/chat_prompt_builder.py | 2 | 98.21% |
| components/audio/whisper_remote.py | 6 | 70.45% |

Totals

  • Change from base Build 19737955616: 0.02%
  • Covered Lines: 13939
  • Relevant Lines: 15222

💛 - Coveralls

Member

@anakin87 anakin87 left a comment

I have the impression that we should clarify the goal of this task.

In general, I would like to avoid errors in snippets, but readability for users also seems relevant to me.

(@dfokina can probably judge this better than me)

whisper = RemoteWhisperTranscriber(api_key=Secret.from_token("<your-api-key>"), model="tiny")
transcription = whisper.run(sources=["path/to/audio/file"])
whisper = RemoteWhisperTranscriber(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="whisper-1")
Member

@anakin87 anakin87 Nov 27, 2025

I am not sure... Secret.from_token("<your-api-key>") was a meaningful placeholder.
Secret.from_env_var("OPENAI_API_KEY") is just the default value. (If we want to go this route, we can just remove the api_key parameter from the example)

Member

We can just use whisper = RemoteWhisperTranscriber(model="whisper-1")
If you think it's helpful, add a comment about setting the OPENAI_API_KEY env var.
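
The suggestion above works because, as noted in this thread, reading OPENAI_API_KEY is already the default for the api_key parameter. A rough standard-library sketch of that pattern follows. It is only a stand-in: resolve_api_key is a hypothetical helper written for illustration, not part of Haystack, and the error message is an assumption.

```python
import os


def resolve_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: read the key from the environment, fail loudly if absent."""
    value = os.environ.get(env_var)
    if not value:
        # A clear error is more useful in a docs snippet than a late authentication failure.
        raise ValueError(f"Set the {env_var} environment variable before running this snippet.")
    return value
```

With a default like this, the example no longer needs a hard-coded placeholder, and a one-line comment about setting the environment variable is enough for readers.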

# Output example (truncated):
# {'llm': {'replies': [ChatMessage(...)]}}
>> {'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
Member

the long output can be helpful for users

Member Author

ok will put it back in

@vblagoje
Member Author

> I have the impression that we should clarify the goal of this task.
>
> In general, I would like to avoid errors in snippets, but readability for users also seems relevant to me.
>
> (@dfokina can probably judge this better than me)

@anakin87 the goal is to make sure that the pydocs are not just hand-waving examples that don't actually work, such as the use of the tiny Whisper model, which doesn't exist on the OpenAI endpoint. We want to run these snippets nightly and confirm they are valid, executable scripts, to minimize drift between the pydocs and reality.
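
As an illustration of the nightly check described above, here is a minimal sketch of how fenced Python snippets could be pulled out of a docstring and executed. It is not the actual Haystack tooling: the helper names extract_snippets and run_snippet, the sample docstring DOC, and the assumption that snippets sit in python-tagged fences are all hypothetical.

```python
from __future__ import annotations

import re
import textwrap

FENCE = "`" * 3  # built at runtime so this example can itself live in a fenced block

# Assumption: runnable examples in a docstring sit inside python-tagged fences.
SNIPPET_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_snippets(docstring: str) -> list[str]:
    """Return the dedented source of every python-tagged fenced block."""
    return [textwrap.dedent(body) for body in SNIPPET_RE.findall(docstring)]


def run_snippet(source: str) -> Exception | None:
    """Execute one snippet in a fresh namespace and return the exception, if any."""
    try:
        exec(compile(source, "<pydoc-snippet>", "exec"), {"__name__": "__main__"})
    except Exception as exc:  # report the failure instead of aborting the whole run
        return exc
    return None


# Hypothetical docstring: the first snippet runs cleanly, the second has drifted.
DOC = f"""
    Usage example:

    {FENCE}python
    total = sum([1, 2, 3])
    assert total == 6
    {FENCE}

    {FENCE}python
    undefined_name + 1
    {FENCE}
"""

for source in extract_snippets(DOC):
    error = run_snippet(source)
    print("ok" if error is None else f"failed: {type(error).__name__}")
```

Run as-is, this prints ok for the first snippet and failed: NameError for the second, which is the kind of drift a nightly run is meant to surface.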

@anakin87
Member

> @anakin87 the goal is to make sure that the pydocs are not just hand-waving examples that don't actually work, such as the use of the tiny Whisper model, which doesn't exist on the OpenAI endpoint. We want to run these snippets nightly and confirm they are valid, executable scripts, to minimize drift between the pydocs and reality.

Yes, I understand. I just have the impression that testing everything without compromising the user learning experience is challenging. But let's keep this work going and see how it evolves.

@vblagoje
Member Author

> > @anakin87 the goal is to make sure that the pydocs are not just hand-waving examples that don't actually work, such as the use of the tiny Whisper model, which doesn't exist on the OpenAI endpoint. We want to run these snippets nightly and confirm they are valid, executable scripts, to minimize drift between the pydocs and reality.
>
> Yes, I understand. I just have the impression that testing everything without compromising the user learning experience is challenging. But let's keep this work going and see how it evolves.

Totally. I'm not taking this to the extreme where every single pydoc code snippet has to run, only where it makes sense and doesn't compromise the learning experience. There is a way to tag a snippet so that it is not run, so where running a snippet is highly impractical, we mark it that way.
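
To make the tagging idea concrete, here is a small sketch of a runner that honors an opt-out tag. The thread does not name the real marker, so SKIP_MARKER, should_run, and check_snippets are all hypothetical names chosen for illustration.

```python
SKIP_MARKER = "# snippet: skip"  # hypothetical tag; the real marker is not named in this thread


def should_run(snippet: str) -> bool:
    """A snippet runs unless its first non-blank line is the skip marker."""
    for line in snippet.splitlines():
        if line.strip():
            return line.strip() != SKIP_MARKER
    return False  # an empty snippet has nothing to run


def check_snippets(snippets: list) -> dict:
    """Map each snippet index to 'ok', 'skipped', or 'failed: <ErrorName>'."""
    report = {}
    for index, source in enumerate(snippets):
        if not should_run(source):
            report[index] = "skipped"
            continue
        try:
            exec(compile(source, f"<snippet-{index}>", "exec"), {})
            report[index] = "ok"
        except Exception as exc:  # record the failure and keep checking the rest
            report[index] = f"failed: {type(exc).__name__}"
    return report
```

A snippet that needs a local audio file or a paid API call could carry the marker on its first line, while everything else still gets executed.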

@vblagoje
Member Author

@dfokina I'll let you decide on these changes. I'm OK with whatever you think is the best direction for audio and builders. In the meantime, I'll remove the reno note, which definitely should not be there.

This reverts commit 18733fa.
@dfokina
Contributor

dfokina commented Nov 28, 2025

@vblagoje @anakin87 I don't mind these changes, honestly. I still think the examples are meaningful for the audience, and we add more explanation in the documentation guides. Our goal was always to make the code immediately runnable. That is not yet possible for the snippets inside the guides, which is why they use placeholders, so why not use actual files and values in the docstrings :)

@anakin87 anakin87 added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Nov 28, 2025
# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
llm = OpenAIChatGenerator(api_key=Secret.from_token("<your-api-key>"), model="gpt-4o-mini")
llm = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-5-mini")
Member

Same as above.
We can just use llm = OpenAIChatGenerator(model="gpt-5-mini")
If you think it's helpful, add a comment about setting the OPENAI_API_KEY env var.

Member Author

Yes, deal!
