Skip to content

Conversation

john0isaac
Copy link
Contributor

Purpose

fix #1892

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[x] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[x] No

Type of change

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[x] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@john0isaac
Copy link
Contributor Author

john0isaac commented Aug 7, 2024

TODO

  • Cache the Generated Audio after the first time it's generated to not incur more charges.
  • Handel cases where the Azure Speech API request function returns null. [This was already being handled by console.error()]

@john0isaac
Copy link
Contributor Author

john0isaac commented Aug 8, 2024

@TaylorN15, can you look at this and let me know if it's working as you expect?

@pamelafox done, it's not the prettiest implementation but it works.
I couldn't figure out a way to update the URLs and Caching them on click other than passing all of the URLs, index, and setter to the AzureSpeechUrls function to check then make a request and use setter to update the value.

This is the initial implementation for on-demand speech request b4c1b21 but every time you click on it, it will make a new request to Azure Speech Service.

@taylorn-ai
Copy link
Contributor

taylorn-ai commented Aug 8, 2024

This looks good, very quick, nice job :)

Why is speechUrls an array? And why pass it down from Chat.tsx? Is that because you were just trying not change too much code?

@taylorn-ai
Copy link
Contributor

Also it seems you can play multiple concurrently, this doesn't happen with the web speech API as it seems to handle it automatically. I think it would be best to have some sort of state to manage if audio is playing on any answer, and pause/stop on others if another one is played.

@john0isaac
Copy link
Contributor Author

john0isaac commented Aug 9, 2024

Why is speechUrls an array? And why pass it down from Chat.tsx? Is that because you were just trying not change too much code?

nope like I said on the issue the AzureSpeechOutput is a component that can't keep track of its state anything done inside it is not saved.
To save its state you need to keep track of it in the Parent Page like we keep an array of answers to get the Thought Process etc.

Also it seems you can play multiple concurrently, this doesn't happen with the web speech API as it seems to handle it automatically. I think it would be best to have some sort of state to manage if audio is playing on any answer, and pause/stop on others if another one is played.

This is out of the scope of this PR, and it was never being handled correctly for Azure Speech Service.
The web speech Api is not handling it automatically I implemented the logic to handle this to not play multiple audios concurrently.
But happy to fix it too!

Tried working on it but I couldn't get it to work, people just have to not click on multiple audio or when they click on an audio they need to click on it again to pause it before clicking on another audio.

@john0isaac john0isaac force-pushed the on-demand-azure-speech branch 2 times, most recently from c6dcd30 to ff043d8 Compare August 9, 2024 10:26
@taylorn-ai
Copy link
Contributor

I have gotten this working in my implementation, but quite a few changes from what you have so it's hard to compare.

// Chat.tsx
    const [activeAudioAnswer, setActiveAudioAnswer] = useState<number | null>(null);
    const [isAudioPlaying, setIsAudioPlaying] = useState<boolean>(false);
    const [cachedAudioUrls, setCachedAudioUrls] = useState<{ [key: number]: string }>({});
    const ttsAudio = useRef(new Audio()).current;

    const onToggleAudioPlay = (index: number, speechUrl: string) => {
        if (isAudioPlaying && activeAudioAnswer === index) {
            stopAudio();  // Stop if it's currently playing the same answer
        } else {
            if (isAudioPlaying) {
                stopAudio();  // Stop any currently playing audio
            }
            playAudio(index, speechUrl);
        }
    };

    const playAudio = (index: number, speechUrl: string) => {
        ttsAudio.src = speechUrl;
        ttsAudio.play().then(() => {
            setActiveAudioAnswer(index);
            setIsAudioPlaying(true);
        }).catch((error) => {
            handleError(error, "Error playing audio:");
        });

        ttsAudio.onended = () => {
            setIsAudioPlaying(false);
            setActiveAudioAnswer(null);
        };
    };

    const stopAudio = () => {
        ttsAudio.pause();
        ttsAudio.currentTime = 0;
        setActiveAudioAnswer(null);
        setIsAudioPlaying(false);
    };

    const cacheAudioUrl = (index: number, url: string) => {
        setCachedAudioUrls(prevUrls => ({
            ...prevUrls,
            [index]: url
        }));
    };
...
<Answer
    index={index}
    isAudioPlaying={index === activeAudioAnswer && isAudioPlaying}
    onToggleAudioPlay={onToggleAudioPlay}
    cachedAudioUrl={cachedAudioUrls[index] || null}
    onCacheAudioUrl={cacheAudioUrl}
...
/>
// Answer.tsx
const [isAudioLoading, setIsAudioLoading] = useState<boolean>(false);

    const toggleAudioPlay = async () => {
        if (cachedAudioUrl) {
            onToggleAudioPlay(index, cachedAudioUrl);
        } else {
            setIsAudioLoading(true);
            const plainText = removeHtmlTags(sanitizedAnswerHtml);
            const token = client ? await getToken(client) : undefined;
            const speechUrl = await getSpeechApi(token, plainText);

            if (!speechUrl) {
                handleError("Error", "An error occurred while generating speech");
                setIsAudioLoading(false);
                return;
            }

            onCacheAudioUrl(index, speechUrl);
            onToggleAudioPlay(index, speechUrl);
            setIsAudioLoading(false);
        }
    };

Then I pass the state of the audio loading/playing into the buttons so that it shows a play/stop depending on state.

@john0isaac
Copy link
Contributor Author

@TaylorN15 welp, I didn't want to do this I found a workaround.
I implemented it differently to maintain the modularity of the implementation.
Thanks for your code though it helped, can you give it a go now and let me know if there is anything else.

@taylorn-ai
Copy link
Contributor

@TaylorN15 welp, I didn't want to do this I found a workaround.

I implemented it differently to maintain the modularity of the implementation.

Thanks for your code though it helped, can you give it a go now and let me know if there is anything else.

Looks good. I don't have time to deploy and test, like I said my code is quite different.

@john0isaac john0isaac force-pushed the on-demand-azure-speech branch 2 times, most recently from 4019b99 to 84147b3 Compare August 19, 2024 21:36
John Aziz added 3 commits August 22, 2024 08:16
@john0isaac john0isaac force-pushed the on-demand-azure-speech branch from 84147b3 to 9c551f3 Compare August 22, 2024 05:16
@pamelafox
Copy link
Collaborator

Hm I just had a bug where I got an answer, heard the speech, then cleared chat and asked anopther question, and it played the original speech. I'm trying to replicate it to figure out what leads to it.

@john0isaac
Copy link
Contributor Author

john0isaac commented Aug 23, 2024

When you clear the chat all of the generated voice urls are set to null so, i'm not sure about that.
Maybe it was cached in the audio var?
Do I need to clear the audio too, when you click on clear?

@john0isaac
Copy link
Contributor Author

Hm I just had a bug where I got an answer, heard the speech, then cleared chat and asked anopther question, and it played the original speech. I'm trying to replicate it to figure out what leads to it.

I could not replicate this issue.

@john0isaac john0isaac requested a review from pamelafox August 23, 2024 17:03
@john0isaac john0isaac requested a review from pamelafox August 23, 2024 17:12
@pamelafox
Copy link
Collaborator

I haven't replicated the issue either. I think the new code architecture looks good.

I just pushed a change which preloads the "sync" icon. That's because I was briefly seeing the no-image-indicator briefly before the sync icon came in, and I found that confusing.
I preload it by adding an invisible disabled button with the same icon, and verified it passes accessibility checks.

@pamelafox pamelafox merged commit f7969c0 into Azure-Samples:main Aug 23, 2024
10 checks passed
@pamelafox
Copy link
Collaborator

Thanks, merged!

@john0isaac john0isaac deleted the on-demand-azure-speech branch August 23, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: Generate text to speech on demand
3 participants