[Inference Providers] Async calls for text-to-video with fal.ai (#1292)
## What does this PR do?
This PR adds asynchronous polling to fal.ai text-to-video
generation, which makes it possible to run inference with models that
take more than 2 minutes to generate results. The other motivation
behind this PR is to align the Python and JS clients; the Python
equivalent has already been merged into main: huggingface/huggingface_hub#2927
## Main Changes
- Replaced the static `baseUrl` property with a `makeBaseUrl()` function
across all providers. This is needed to customize the base URL per
task: we want to use `FAL_AI_API_BASE_URL_QUEUE` for `text-to-video`
only. I'm not convinced this is the simplest or best way to do it.
- Added a `pollFalResponse()` helper for `text-to-video` (similar to
what is done with BFL for `text-to-image`).
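For context, the per-task base URL switch can be sketched as below. The two fal.ai endpoints match the public fal.ai API (`fal.run` for synchronous calls, `queue.fal.run` for queued ones), but the function body is illustrative and not the PR's exact code:

```typescript
// Illustrative sketch: pick the queue endpoint for text-to-video only,
// and the synchronous endpoint for every other task.
const FAL_AI_API_BASE_URL = "https://fal.run";
const FAL_AI_API_BASE_URL_QUEUE = "https://queue.fal.run";

function makeBaseUrl(task: string): string {
  return task === "text-to-video"
    ? FAL_AI_API_BASE_URL_QUEUE
    : FAL_AI_API_BASE_URL;
}

console.log(makeBaseUrl("text-to-video")); // queue endpoint
console.log(makeBaseUrl("text-to-image")); // synchronous endpoint
```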
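The polling logic itself boils down to checking a status endpoint until the queued job completes. A minimal sketch, with hypothetical names and intervals (the stubbed status check stands in for a real HTTP call to the fal.ai queue status URL):

```typescript
// Illustrative polling loop: call checkStatus() until it reports
// COMPLETED, sleeping between attempts, and give up after maxAttempts.
type FalStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED";

async function pollUntilComplete(
  checkStatus: () => Promise<FalStatus>,
  intervalMs = 1000,
  maxAttempts = 600
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === "COMPLETED") return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Generation did not complete after ${maxAttempts} attempts`);
}

// Usage with a stubbed status check that completes on the third call:
let calls = 0;
const fakeCheck = async (): Promise<FalStatus> =>
  ++calls < 3 ? "IN_PROGRESS" : "COMPLETED";

await pollUntilComplete(fakeCheck, 10);
console.log(`completed after ${calls} status checks`);
```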
Any refactoring suggestions are welcome! I'm happy to spend some
additional time making provider-specific updates easier to implement
and better aligning our two clients 🙂
btw, I did not update the VCR tests as we've discussed that it'd be best
to remove the VCR for `text-to-video`. Maybe we should remove them here?
**EDIT**: removed the text-to-video tests in
[f8a6386](f8a6386).
I've tested it locally with
[tencent/HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo),
for which generation takes more than 2 minutes, and it works fine:
https://github.com/user-attachments/assets/3cd38900-c4ed-4b28-ae79-8a4e724f58d1
---------
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Simon Brandeis <[email protected]>