Commit 34e5352

Accept string[] as feature-extraction input (huggingface#1166)
Related to huggingface/huggingface_hub#2824. This PR makes it possible to send a `string[]` instead of a `string` as `feature-extraction` inputs. This is already possible in practice in the Inference API but is not documented. In the past, I pushed back on this change (see huggingface/huggingface_hub#1745 and huggingface/huggingface_hub#1746 (comment)), but I think it's fine to revisit it now. The main reason I gave was that `feature-extraction`'s server-side implementation was mostly a for-loop over the text inputs, so accepting a `string[]` would not really improve performance. That being said, there have been quite a few improvements since then, especially the `text-embedding-inference` framework.
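To make the change concrete, here is a minimal sketch of the difference between sending one payload per text and sending a single batched payload. The types are a local copy of the ones touched by this commit; `buildRequestBody` is a hypothetical helper introduced for illustration only, and no network call or client library is assumed.

```typescript
// Local copy of the types touched by this commit, so the sketch is self-contained.
type FeatureExtractionInputs = string[] | string;

interface FeatureExtractionInput {
	inputs: FeatureExtractionInputs;
	normalize?: boolean;
	[property: string]: unknown;
}

// Hypothetical helper (not part of the PR): serialize a request body.
function buildRequestBody(input: FeatureExtractionInput): string {
	return JSON.stringify(input);
}

const texts = ["first sentence", "second sentence"];

// With `inputs: string` only, each text needs its own payload.
const perTextBodies = texts.map((text) => buildRequestBody({ inputs: text }));

// With `inputs: string[]`, the whole batch fits in a single payload.
const batchBody = buildRequestBody({ inputs: texts, normalize: true });
```

Whether the batched form is actually faster depends on the server-side implementation, as the commit message notes.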
1 parent 18e56ea commit 34e5352

2 files changed: +12 -4 lines changed


packages/tasks/src/tasks/feature-extraction/inference.ts

Lines changed: 6 additions & 2 deletions
@@ -13,9 +13,9 @@ export type FeatureExtractionOutput = Array<number[]>;
  */
 export interface FeatureExtractionInput {
 	/**
-	 * The text to embed.
+	 * The text or list of texts to embed.
 	 */
-	inputs: string;
+	inputs: FeatureExtractionInputs;
 	normalize?: boolean;
 	/**
 	 * The name of the prompt that should be used by for encoding. If not set, no prompt
@@ -34,4 +34,8 @@ export interface FeatureExtractionInput {
 	truncation_direction?: FeatureExtractionInputTruncationDirection;
 	[property: string]: unknown;
 }
+/**
+ * The text or list of texts to embed.
+ */
+export type FeatureExtractionInputs = string[] | string;
 export type FeatureExtractionInputTruncationDirection = "Left" | "Right";
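
Code that consumes the new union type has to handle both shapes. A minimal sketch of one way to do that, assuming the consumer prefers to always work with a list: `toInputList` is a hypothetical helper, not something this commit adds.

```typescript
// Same union as introduced in inference.ts by this commit.
type FeatureExtractionInputs = string[] | string;

// Hypothetical helper: normalize either shape to a list of texts.
function toInputList(inputs: FeatureExtractionInputs): string[] {
	// `typeof` narrows the union, so the else branch is known to be string[].
	return typeof inputs === "string" ? [inputs] : inputs;
}

const single = toInputList("hello");
const batch = toInputList(["hello", "world"]);
```

Normalizing at the boundary keeps the rest of the code free of repeated `typeof` checks.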

packages/tasks/src/tasks/feature-extraction/spec/input.json

Lines changed: 6 additions & 2 deletions
@@ -7,8 +7,12 @@
 	"required": ["inputs"],
 	"properties": {
 		"inputs": {
-			"type": "string",
-			"description": "The text to embed."
+			"title": "FeatureExtractionInputs",
+			"oneOf": [
+				{ "type": "string" },
+				{ "type": "array", "items": { "type": "string" } }
+			],
+			"description": "The text or list of texts to embed."
 		},
 		"normalize": {
 			"type": "boolean",

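The `oneOf` in `input.json` accepts either a string or an array whose items are all strings. As a rough illustration of what that constraint means at runtime, the sketch below hand-writes an equivalent check. `isFeatureExtractionInputs` is a hypothetical name; a real project would more likely run the schema through a JSON Schema validator.

```typescript
// Hypothetical type guard mirroring the schema's `oneOf`:
//   { "type": "string" }  or  { "type": "array", "items": { "type": "string" } }
function isFeatureExtractionInputs(value: unknown): value is string | string[] {
	if (typeof value === "string") {
		return true;
	}
	// The schema sets no minimum length, so an empty array is also accepted.
	return Array.isArray(value) && value.every((item) => typeof item === "string");
}

const checks = {
	text: isFeatureExtractionInputs("hello"),
	list: isFeatureExtractionInputs(["hello", "world"]),
	emptyList: isFeatureExtractionInputs([]),
	number: isFeatureExtractionInputs(42),
	mixedList: isFeatureExtractionInputs(["hello", 1]),
};
```

Because a value cannot be both a string and an array, the two branches never overlap, which is what `oneOf` requires.
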