Skip to content

Conversation

@kingychiu
Copy link

@kingychiu kingychiu commented Dec 24, 2024

Purpose

  1. The original code force people to use Azure CLI to login, this PR add another option (using the API Key generated from Azure Portal)

  2. The client.send_item expects the message to have an id attribute because it expects an Item type, so the current demo will raise an exception when user try to send a text message. Updating to UserMessageItem type has default id=None attribute.

  3. Change the default voice to alloy, because coral voice also trigger a runtime exception.

  4. The import order changes are from ruff linter & format, if you wish I can revert those.

Does this introduce a breaking change?

[ ] Yes
[X] No

Pull Request Type

What kind of change does this Pull Request introduce?

[X] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Setup and start the backend under: samples/middle-tier/python-fastapi
  • Setup and start the frontend under: samples/middle-tier/generic-frontend
  • Send a text message for a response.

Other Information

I found another issue (not fixed by this PR): the RTClient.generate_response conflicts with RTClient.events, because both will take the response.created event. As a result

  • RTClient.generate_response will be blocked forever, because response.created is removed by RTClient.events
  • In the demo here, we cannot send text messages more than once, because the first text message will block the process.

I am creating another issue for this: #111

@kingychiu
Copy link
Author

@kingychiu please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@ElliotRoe
Copy link

Please merge 🙏 I am also have these troubles with the Python middle tier

@ElliotRoe
Copy link

@kingychiu are you maintaining an updated version of the rtclient package? I'd be happy to contribute if so.

@kingychiu
Copy link
Author

@ElliotRoe, If you clone the code from this PR, it should work except the text input issue I mentioned here #111.

Because I need both audio and text input, I switched to using OpenAI-python beta realtime api.
https://github.com/openai/openai-python/tree/main?tab=readme-ov-file#realtime-api-beta

Init client:

client = AsyncAzureOpenAI(
        azure_endpoint="AZURE_OPENAI_ENDPOINT",
        api_key="AZURE_OPENAI_KEY",
        api_version="2024-10-01-preview",
    )

Connection:

async with self.client.beta.realtime.connect(
    model="gpt-4o-realtime-preview"
) as conn:

The rest of the websocket logic is mostly similar to this aoai-realtime-audio-sdk repo, except I didn't implement the locking I mentioned in the text input issue linked above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants