Commit 59e0df2

vvmnnnkv and coyotte508 authored
Image to text (#131)
Co-authored-by: Eliott C <[email protected]>
1 parent fef09f2 commit 59e0df2


10 files changed: +207 -145 lines changed


README.md

Lines changed: 5 additions & 0 deletions
@@ -97,6 +97,11 @@ await inference.textToImage({
     negative_prompt: 'blurry',
   }
 })
+
+await inference.imageToText({
+  data: await (await fetch('https://picsum.photos/300/300')).blob(),
+  model: 'nlpconnect/vit-gpt2-image-captioning',
+})
 ```
 
 There are more features of course, check each library's README!

packages/inference/README.md

Lines changed: 6 additions & 0 deletions
@@ -157,6 +157,11 @@ await hf.textToImage({
     negative_prompt: 'blurry',
   }
 })
+
+await hf.imageToText({
+  data: readFileSync('test/cats.png'),
+  model: 'nlpconnect/vit-gpt2-image-captioning'
+})
 ```
 
 ## Supported Tasks
@@ -188,6 +193,7 @@ await hf.textToImage({
 - [x] Object detection
 - [x] Image segmentation
 - [x] Text to image
+- [x] Image to text
 
 ## Running tests
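
The README example in this diff passes the result of `readFileSync` as `data`, while `ImageToTextArgs` in this commit types `data` as `Blob | ArrayBuffer`. As a hedged sketch only, a caller who wants a standalone `ArrayBuffer` could copy the bytes first. The helper name `toArrayBuffer` is hypothetical and is not part of the inference package.

```typescript
// Hypothetical helper (not part of the inference package): copies the bytes of
// any Uint8Array view, such as a Node Buffer, into a standalone ArrayBuffer.
function toArrayBuffer(bytes: Uint8Array): ArrayBuffer {
  const out = new ArrayBuffer(bytes.byteLength);
  // set() copies only the bytes visible through the view, honoring byteOffset.
  new Uint8Array(out).set(bytes);
  return out;
}

// Usage: a 3-byte window into a larger buffer becomes its own 3-byte ArrayBuffer.
const windowed = new Uint8Array([1, 2, 3, 4, 5]).subarray(1, 4);
const copied = toArrayBuffer(windowed);
console.log(copied.byteLength); // 3
```

Copying avoids handing over a view that shares memory with a larger pooled buffer, at the cost of one extra allocation.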

packages/inference/package.json

Lines changed: 3 additions & 3 deletions
@@ -45,15 +45,15 @@
     "format": "prettier --write .",
     "format:check": "prettier --check .",
     "prepublishOnly": "pnpm run build",
-    "test": "vitest run",
-    "test:browser": "vitest run --browser.name=chrome --browser.headless",
+    "test": "vitest run --config vitest.config.ts",
+    "test:browser": "vitest run --browser.name=chrome --browser.headless --config vitest.config.ts",
     "type-check": "tsc"
   },
   "devDependencies": {
     "@types/node": "18.13.0",
     "typescript": "4.9.5",
     "vite": "^4.1.4",
-    "vitest": "^0.29.2"
+    "vitest": "^0.29.8"
   },
   "resolutions": {}
 }

packages/inference/pnpm-lock.yaml

Lines changed: 38 additions & 30 deletions
Some generated files are not rendered by default.

packages/inference/src/HfInference.ts

Lines changed: 26 additions & 0 deletions
@@ -608,6 +608,20 @@ export type TextToImageArgs = Args & {
 
 export type TextToImageReturn = Blob;
 
+export type ImageToTextArgs = Args & {
+  /**
+   * Binary image data
+   */
+  data: Blob | ArrayBuffer;
+};
+
+export interface ImageToTextReturn {
+  /**
+   * The generated caption
+   */
+  generated_text: string;
+}
+
 export class HfInference {
   private readonly apiKey: string;
   private readonly defaultOptions: Options;
@@ -946,6 +960,18 @@ export class HfInference {
     return res;
   }
 
+  /**
+   * This task reads some image input and outputs the text caption.
+   */
+  public async imageToText(args: ImageToTextArgs, options?: Options): Promise<ImageToTextReturn> {
+    return (
+      await this.request<[ImageToTextReturn]>(args, {
+        ...options,
+        binary: true,
+      })
+    )?.[0];
+  }
+
   /**
    * Helper that prepares request arguments
    */
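
The new `imageToText` method awaits `request` typed as a one-element tuple and returns element 0. The following is a minimal sketch of that unwrapping, not the library's code: `fakeRequest` is a hypothetical stand-in for the private `request` method, the caption string is made up, and the input is narrowed to `ArrayBuffer` to keep the sketch free of DOM types.

```typescript
// Shape of one caption, mirroring the ImageToTextReturn interface in the diff.
interface ImageToTextReturn {
  generated_text: string;
}

// Hypothetical stand-in for HfInference.request: resolves to a canned
// one-element array instead of calling the Inference API.
async function fakeRequest(_data: ArrayBuffer): Promise<[ImageToTextReturn]> {
  return [{ generated_text: "a cat sitting on a couch" }];
}

// Same unwrapping as the new method: await the array, then take element 0.
async function imageToTextSketch(data: ArrayBuffer): Promise<ImageToTextReturn> {
  return (await fakeRequest(data))?.[0];
}

// Usage: callers receive a single object rather than an array.
imageToTextSketch(new ArrayBuffer(0)).then((out) => console.log(out.generated_text));
```

Returning the first element keeps the public return type a plain object, so callers read `generated_text` directly.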
