and view the text examples at the end of this guide.
This guide shows how to use the File API to upload a media file and include it in a `GenerateContent` call to the Gemini API.

*/

/* Markdown (render)
## Setup

### Install SDK and set up the client

### API Key Configuration

To keep your API key secure, avoid hardcoding it in frontend code. Instead, set it as an environment variable on the server or local machine.

When using the Gemini API client libraries, the key is automatically detected if it is set as either `GEMINI_API_KEY` or `GOOGLE_API_KEY`. If both are set, `GOOGLE_API_KEY` takes precedence.

For instructions on setting environment variables on different operating systems, refer to the official documentation: [Set API Key as Environment Variable](https://ai.google.dev/gemini-api/docs/api-key#set-api-env-var)

In code, the key can then be accessed as:

```js
ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
```
Now select the model you want to use in this guide, either by picking one from the list or typing its name. Keep in mind that some models, such as the 2.5 series, are thinking models and thus take slightly longer to respond (see the [thinking notebook](https://github.com/google-gemini/cookbook/blob/main/quickstarts-js/Get_started_thinking.ipynb) for more details, including how to switch thinking off).

For extended information about each Gemini model, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).
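The later examples in this guide reference a `MODEL_ID` variable. As a sketch, you could set it like this (the exact model id below is an assumption; substitute any model available to your key):

```js
// Assumption: gemini-2.5-flash; swap in any model id from the list above.
MODEL_ID = "gemini-2.5-flash";
```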
The File API lets you upload a variety of multimodal MIME types, including images and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/generate-content#method:-models.generatecontent) or [`model.streamGenerateContent`](https://ai.google.dev/api/generate-content#method:-models.streamgeneratecontent).
The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.
First, you will prepare a sample image to upload to the API.
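As a minimal sketch of the upload step (the file name and display name below are placeholders; `ai.files.upload` is the File API entry point in the `@google/genai` SDK):

```js
// Upload a local image; the returned object carries a unique `name` and `uri`.
uploadedFile = await ai.files.upload({
  file: "jetpack.png", // hypothetical local file
  config: { displayName: "Jetpack drawing" },
});
console.log(uploadedFile.name, uploadedFile.uri);
```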
The `response` object confirms that the File API stored the specified `displayName` for the uploaded file along with a `uri` that can be used to reference the file in future Gemini API calls. You can use the `response` to track how uploaded files are associated with their URIs.
Depending on your use case, you might want to store the URIs in structures like a plain JavaScript object or a database.
*/
/* Markdown (render)
## Get file
After uploading the file, you can verify that the API has successfully received it by calling `files.get`.

This returns the metadata of files uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) is unique. Only use the `displayName` to identify files if you manage uniqueness yourself.
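A minimal check, assuming `uploadedFile` is the object returned by the upload step:

```js
// Retrieve metadata by the file's unique `name`.
fileInfo = await ai.files.get({ name: uploadedFile.name });
console.log(fileInfo.displayName, fileInfo.state, fileInfo.sizeBytes);
```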
After uploading the file, you can make `GenerateContent` requests that reference the file by providing the URI. In the JS SDK you can pass the returned file object directly.
Here you create a prompt that starts with text and includes the uploaded image.
*/
// [CODE STARTS]
response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    {
      fileData: {
        fileUri: uploadedFile.uri,
        mimeType: "image/jpeg"
      }
    },
    "Describe the image with a creative description."
  ]
});

console.log(response.text);
// [CODE ENDS]
/* Output Sample

Behold, a visionary concept scribbled with bold blue ink on classic ruled notebook paper: the "JETPACK BACKPACK."

This ingenious invention, proudly titled at the top, presents itself as an unassuming rounded backpack, sleek in its sketched simplicity. But the arrows pointing to its various features tell a tale of discreet power and futuristic practicality.

On the left, we learn of its ergonomic design, promising "PADDED STRAP SUPPORT" for comfort, even when soaring above the daily commute. Below, the future of charging is here, with "USB-C CHARGING," though adventurers will need to plan their flights carefully, as it offers a "15-MIN BATTERY LIFE."

To the right, the backpack's true magic is revealed. It's designed to be "LIGHTWEIGHT" and, crucially, "LOOKS LIKE A NORMAL BACKPACK," allowing its wearer to blend seamlessly into any crowd before making a grand exit. It's capacious too, designed to fit an "18" LAPTOP." The core of its power lies beneath, with "RETRACTABLE BOOSTERS" ready to emerge when needed, unleashing whimsical, swirling plumes of "STEAM-POWERED" propulsion – a promise of "GREEN/CLEAN" flight that leaves nothing but a charming vapor trail in its wake.

This handwritten blueprint captures the dream of everyday flight, balancing the fantastical with thoughtful, if ambitious, specifications.

*/
/* Markdown (render)
## Delete files

Files are automatically deleted after 2 days, or you can manually delete them using `files.delete()`.
*/

// [CODE STARTS]
await ai.files.delete({ name: uploadedFile.name });
console.log(`Deleted ${uploadedFile.name}.`);
// [CODE ENDS]
/* Output Sample

Deleted files/wrcset8sfug1.

*/
/* Markdown (render)
## Supported text types

As well as supporting media uploads, the File API can be used to embed text files, such as Python code or Markdown files, into your prompts.
This example shows you how to load a markdown file into a prompt using the File API.
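As a sketch (the file path and question below are placeholders; the `contents` shape mirrors the earlier image example):

```js
// Upload a Markdown file and reference it in a prompt.
mdFile = await ai.files.upload({ file: "CONTRIBUTING.md" }); // hypothetical local path
response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    { fileData: { fileUri: mdFile.uri, mimeType: "text/markdown" } },
    "What should I do before I start writing?"
  ]
});
console.log(response.text);
```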
Before you start writing, when contributing to the Gemini API Cookbook, you should do the following:

1. **Sign the Contributor License Agreement (CLA):** All contributions require you (or your employer) to sign a CLA. Visit [https://cla.developers.google.com/](https://cla.developers.google.com/) to check your current agreements or sign a new one.
2. **Review Style Guides:**
   * Take a look at the [technical writing style guide](https://developers.google.com/style), specifically reading the [highlights](https://developers.google.com/style/highlights) to understand common feedback points.
   * Check out the relevant [style guide](https://google.github.io/styleguide/) for the language you will be using (e.g., Python).
3. **File an Issue:** For any new content submission (beyond small fixes), you **must** file an [issue](https://github.com/google-gemini/cookbook/issues) on GitHub. This allows for discussion of your idea, guidance on structure, and ensures your concept has support before you invest time in writing.

Deleted files/qvq94l4c5jhz

*/
/* Markdown (render)
Some common text formats are automatically detected, such as `text/x-python`, `text/html` and `text/markdown`. If you are using a file that you know is text, but is not automatically detected by the API as such, you can specify the MIME type as `text/plain` explicitly.
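One way to apply this, as a small sketch (the extension list below is an assumption for illustration, not the API's actual auto-detection set):

```js
// Extensions we assume the File API auto-detects; anything else is sent as text/plain.
const AUTO_DETECTED = new Set([".py", ".html", ".md", ".txt", ".js"]);

function textUploadConfig(filename) {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot);
  // An explicit mimeType tells the API to treat the file as plain text.
  return AUTO_DETECTED.has(ext) ? {} : { mimeType: "text/plain" };
}

// Usage sketch (network call, not run here):
// uploaded = await ai.files.upload({ file: "gemma.cc", config: textUploadConfig("gemma.cc") });
```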
This C++ program is a minimal example demonstrating how to load and use Google's **Gemma large language model** for text generation, with a specific focus on **constrained decoding**.
Here's a breakdown of what it does:

1. **Argument Parsing:**
   * It processes command-line arguments using `gcpp::LoaderArgs`, `gcpp::InferenceArgs`, and `gcpp::AppArgs`. These likely configure aspects like the model path, batch size, number of threads, etc.
   * It specifically looks for a `"--reject"` flag. Any integer arguments following `--reject` are interpreted as token IDs that should *never* be generated by the model. This is the core of its constrained decoding demonstration.

2. **Model Initialization:**
   * It sets up the necessary environment for the Gemma model, including:
     * `gcpp::BoundedTopology` and `gcpp::NestedPools`: For managing threading and parallel computations (likely using Highway, a SIMD library).
     * `gcpp::MatMulEnv`: An environment for matrix multiplication, essential for neural networks.
     * `gcpp::Gemma model`: The actual Gemma model instance, loaded based on `loader` arguments.
     * `gcpp::KVCache kv_cache`: A Key-Value Cache, crucial for efficient LLM inference by storing past activations, avoiding recomputation.

3. **Prompt Definition and Tokenization:**
   * It defines a fixed input prompt: `"Write a greeting to the world."`
   * It then uses the model's tokenizer (`model.Tokenizer()`) to convert this string prompt into a sequence of integer token IDs.

4. **Text Generation with Callbacks:**
   * **Random Number Generator:** Initializes a `std::mt19937` (Mersenne Twister) random number generator, seeded by `std::random_device`, which is used for probabilistic token sampling during generation (e.g., when `temperature` > 0).
   * **`stream_token` Callback:** A lambda function is defined to be called every time the model generates a new token.
     * If the token is part of the initial prompt (before generation truly starts), it currently does nothing (the placeholder comment `// print feedback` suggests it *could* be used for that).
     * Once actual generation begins, it decodes the integer token ID back into human-readable text using `model.Tokenizer().Decode()` and prints it immediately to `std::cout` using `std::flush`. This provides a streaming output experience.
   * **`accept_token` Callback (Constrained Decoding):** Another lambda function is defined. This is the core of the `--reject` functionality.
     * Before the model chooses a token, this callback is invoked for potential candidate tokens.
     * It checks if the candidate token is present in the `reject_tokens` set (populated from the `--reject` command-line argument).
     * If the token *is* in `reject_tokens`, it returns `false`, effectively preventing the model from ever outputting that specific token. This "constrains" the output.
   * **`model.Generate()`:** The program then calls `model.Generate()`, passing in:
     * `runtime_config`: Containing generation parameters like `max_generated_tokens` (1024), `temperature` (1.0), the random generator, the `stream_token` callback, and crucially, the `accept_token` callback.
     * `tokens`: The initial prompt tokens.
     * `kv_cache`: For efficient inference.

**In Summary:**

This program loads the Gemma large language model, initializes it with various parameters (some configurable via command-line), tokenizes a hardcoded prompt ("Write a greeting to the world."), and then generates a response. Its primary demonstration feature is **constrained decoding**, allowing the user to specify (via the `--reject` flag) certain token IDs that the model must *never* output during the generation process. The generated text is streamed to standard output token by token.

Deleted files/ih9sjrj7h21x

*/
/* Markdown (render)
## Next Steps

### Useful API references
For more information about the File API, check its [API reference](https://ai.google.dev/api/files). You will also find more code samples [in this folder](https://github.com/google-gemini/cookbook/blob/main/quickstarts-js/).
### Related examples

Check out these examples using the File API to get more ideas on how to use this very useful feature:

* Share [Voice memos](https://github.com/google-gemini/cookbook/blob/main/examples/Voice_memos.ipynb) with the Gemini API and brainstorm ideas
* Analyze videos to [classify](https://github.com/google-gemini/cookbook/blob/main/examples/Analyze_a_Video_Classification.ipynb) or [summarize](https://github.com/google-gemini/cookbook/blob/main/examples/Analyze_a_Video_Summarization.ipynb) them

### Continue your discovery of the Gemini API
If you're not already familiar with it, learn how [tokens are counted](https://github.com/google-gemini/cookbook/blob/main/quickstarts-js/Counting_Tokens.js). Then check how to use the File API with [Audio](https://github.com/google-gemini/cookbook/blob/main/quickstarts-js/Audio.js) or [Video understanding](https://github.com/google-gemini/cookbook/blob/main/quickstarts-js/Video_understanding.js) files in the Gemini API.
*/