Use makeRequestOptions to generate inference snippets (#1273)
The broader goal of this PR is to use `makeRequestOptions` from the JS
InferenceClient to get all the implementation details right (correct
URL, correct authorization header, correct payload, etc.). The JS
InferenceClient is the ground truth here.
**In practice:**
- fixed `makeUrl` for chatCompletion with image-text-to-text models (review
[here](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-a6509c908fd0fb05fdbd3803492d6e9e2570d6dff2a21db575a76b26bff4d565),
plus other providers)
- fixed an incorrect URL in the `openai` Python snippet (e.g.
[here](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-338b930b960057f85d0d5dd27032b73cd4834a6a7c7ce4db60af19395b8e56f9),
[here](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-a253bcdfdf33df1ac53a4051a8ce7bb047a99f32618c048754d617cb55815c14))
- fixed the document-question-answering (DQA) `requests` snippet
([here](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-3a47136351b4572144f2fd42a2518da9be108b66fd5dca392d9d899a125b02d9))
**Technically, this PR:**
- splits `makeRequestOptions` into two parts: the async part that does the
model ID resolution (depending on task + provider) and the sync part that
generates the URL, headers, body, etc. For snippets we only need the
second part, which is a sync call => new (internal) method
`makeRequestOptionsFromResolvedModel`
- moves most of the logic inside `snippetGenerator`
- logic is: _get inputs_ => _make request options_ => _prepare template
data_ => _iterate over clients_ => _generate snippets_
- **Next:** now that the logic is unified, adapting cURL and JS to use
the same logic should be fairly easy (e.g. "just" need to create the
jinja templates)
- => final goal is to handle all languages/clients/providers with the
same code and swap the templates
- updates most providers to use the `/chat/completions` endpoint when
`chatCompletion` is enabled
- previously we also required the task to be `text-generation` => now
`/chat/completions` is also used for `image-text-to-text` models
- that was mostly a bug in the existing codebase => detected thanks to
the snippets
- updated `./packages/inference/package.json` to allow dev mode. Running
`pnpm run dev` in `@inference` now makes it much easier to work with
`@tasks-gen` (no need to rebuild after each change)
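The async/sync split described above can be sketched as follows. This is an illustrative sketch, not the actual huggingface.js implementation: the interfaces, the `resolveModelId` helper, and the example URL scheme are all hypothetical stand-ins.

```typescript
// Hedged sketch: split an async request-options builder into an async
// model-resolution step and a sync options-assembly step, so sync callers
// (e.g. snippet generators) can reuse the assembly once the model is resolved.

interface RequestArgs {
  model: string;
  provider: string;
  task: string;
  accessToken?: string;
}

interface RequestOptions {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical async step: map a model ID to the provider's internal ID
// (in the real client this depends on task + provider).
async function resolveModelId(args: RequestArgs): Promise<string> {
  return args.model; // identity mapping in this sketch
}

// Sync part: build URL and headers from an already-resolved model ID.
// Stands in for the PR's internal `makeRequestOptionsFromResolvedModel`.
function makeRequestOptionsFromResolvedModel(
  resolvedModel: string,
  args: RequestArgs
): RequestOptions {
  // Hypothetical base URL, for illustration only.
  const base = `https://router.example/${args.provider}/models/${resolvedModel}`;
  // Chat-capable tasks route to the OpenAI-compatible endpoint.
  const url =
    args.task === "conversational" ? `${base}/v1/chat/completions` : base;
  const headers: Record<string, string> = {};
  if (args.accessToken) headers["Authorization"] = `Bearer ${args.accessToken}`;
  return { url, headers };
}

// Async wrapper preserving a single-call API for runtime inference calls.
async function makeRequestOptions(args: RequestArgs): Promise<RequestOptions> {
  const resolved = await resolveModelId(args);
  return makeRequestOptionsFromResolvedModel(resolved, args);
}
```

The point of the split is that snippet generation never needs to await anything: once the resolved model ID is known, the URL/headers/body logic is pure and synchronous.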
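The unified pipeline (_get inputs_ => _make request options_ => _prepare template data_ => _iterate over clients_ => _generate snippets_) can be sketched like this. Names here are illustrative, not the actual `snippetGenerator` internals; real templates are Jinja files per language/client.

```typescript
// Hedged sketch of the snippet-generation pipeline: prepare one shared
// template-data object, then render one snippet per client from it.

interface TemplateData {
  url: string;
  authHeader: string;
  modelId: string;
}

// Each client (curl, python requests, openai, js fetch, ...) is a template
// over the same data. Plain functions stand in for Jinja templates here.
type ClientTemplate = (data: TemplateData) => string;

function generateSnippets(
  modelId: string,
  url: string,
  token: string,
  clients: Record<string, ClientTemplate>
): Record<string, string> {
  // Prepare the data every template consumes (derived from request options).
  const data: TemplateData = { url, authHeader: `Bearer ${token}`, modelId };
  // Iterate over clients and render each template with the same data.
  const out: Record<string, string> = {};
  for (const [name, render] of Object.entries(clients)) {
    out[name] = render(data);
  }
  return out;
}

// Example client templates (hypothetical, for illustration).
const clients: Record<string, ClientTemplate> = {
  curl: (d) => `curl ${d.url} -H "Authorization: ${d.authHeader}"`,
  python: (d) =>
    `requests.post("${d.url}", headers={"Authorization": "${d.authHeader}"})`,
};
```

Because every client consumes the same prepared data, adding a new language/client reduces to writing a new template; the URL/auth/payload logic stays in one place.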
---
**EDIT:** ~there is definitely a breaking change in how I handle the
`makeRequestOptions` split (hence the broken CI). Will fix this.~ =>
fixed.
---------
Co-authored-by: Simon Brandeis <[email protected]>