
Commit 1c9ebfa

Merge branch 'google:main' into main
2 parents 65ceb6a + 582e6f8 commit 1c9ebfa

26 files changed: 6319 additions, 1875 deletions

examples/gemini/javascript/langchain_quickstart_node/main.js

Lines changed: 3 additions & 3 deletions
@@ -39,10 +39,10 @@ async function invokeGeminiPro() {
 }
 
 /**
- * Creates a Gemini Pro Vision multimodal chat model, invokes the model with an
+ * Creates a Gemini Flash multimodal chat model, invokes the model with an
  * input containing text and image data, and logs the result.
  */
-async function invokeGeminiProVision() {
+async function invokeGeminiFlash() {
   const model = new ChatGoogleGenerativeAI({
     modelName: 'gemini-1.5-flash',
     maxOutputTokens: 1024,
@@ -87,7 +87,7 @@ async function embedText() {
  */
 async function run() {
   invokeGeminiPro();
-  invokeGeminiProVision();
+  invokeGeminiFlash();
   embedText();
 }

site/en/gemini-api/docs/get-started/python.ipynb

Lines changed: 106 additions & 82 deletions
Large diffs are not rendered by default.

site/en/gemini-api/docs/get-started/rest.ipynb

Lines changed: 15 additions & 14 deletions
@@ -86,6 +86,7 @@
 "id": "ywtfO3mO26KO"
 },
 "source": [
+"## Prerequisites\n",
 "### Set up your API key\n",
 "\n",
 "To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.\n",
@@ -99,9 +100,9 @@
 "id": "4EsvRU-s3FJx"
 },
 "source": [
-"In Colab, add the key to the secrets manager under the \"🔑\" in the left panel. Give it the name `GOOGLE_API_KEY`. You can then add it as an environment variable to pass the key in your curl call.\n",
+"In Colab, add the key to the secrets manager under the \"🔑\" in the left panel. Give it the name `GEMINI_API_KEY`. You can then add it as an environment variable to pass the key in your curl call.\n",
 "\n",
-"In a terminal, you can just run `GOOGLE_API_KEY=\"Your API Key\"`."
+"In a terminal, you can just run `GEMINI_API_KEY=\"Your API Key\"`."
 ]
 },
 {
@@ -115,7 +116,7 @@
 "import os\n",
 "from google.colab import userdata\n",
 "\n",
-"os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')"
+"os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')"
 ]
 },
 {
@@ -136,7 +137,7 @@
 "### Text-only input\n",
 "\n",
 "Use the `generateContent` method\n",
-"to generate a response from the model given an input message. If the input contains only text, use the `gemini-pro` model."
+"to generate a response from the model given an input message. Always start with the `gemini-1.5-flash` model."
 ]
 },
 {
@@ -209,7 +210,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -319,7 +320,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${GOOGLE_API_KEY} \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${GEMINI_API_KEY} \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -d @request.json 2> /dev/null | grep \"text\""
 ]
@@ -352,7 +353,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -403,7 +404,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -470,7 +471,7 @@
 }
 ],
 "source": [
-"!curl \"https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:streamGenerateContent?alt=sse&key=${GOOGLE_API_KEY}\" \\\n",
+"!curl \"https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:streamGenerateContent?alt=sse&key=${GEMINI_API_KEY}\" \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " --no-buffer \\\n",
 " -d '{ \"contents\":[{\"parts\":[{\"text\": \"Write long a story about a magic backpack.\"}]}]}' \\\n",
@@ -518,7 +519,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:countTokens?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:countTokens?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -587,7 +588,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -623,7 +624,7 @@
 ],
 "source": [
 "%%bash\n",
-"curl https://generativelanguage.googleapis.com/v1beta/models/embedding-001:batchEmbedContents?key=$GOOGLE_API_KEY \\\n",
+"curl https://generativelanguage.googleapis.com/v1beta/models/embedding-001:batchEmbedContents?key=$GEMINI_API_KEY \\\n",
 " -H 'Content-Type: application/json' \\\n",
 " -X POST \\\n",
 " -d '{\n",
@@ -684,7 +685,7 @@
 }
 ],
 "source": [
-"!curl https://generativelanguage.googleapis.com/v1beta/models/gemini-pro?key=$GOOGLE_API_KEY"
+"!curl https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash?key=$GEMINI_API_KEY"
 ]
 },
 {
@@ -813,7 +814,7 @@
 }
 ],
 "source": [
-"!curl https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"
+"!curl https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY"
 ]
 }
 ],
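
For reference, the renamed key variable and model in the curl cells above map onto a plain HTTP request in any language. A minimal Python sketch of the same `generateContent` call (assuming `GEMINI_API_KEY` is exported in the environment and the `requests` package is installed; the prompt is only an example):

# Minimal sketch of the generateContent call shown in the curl cells above.
# Assumes GEMINI_API_KEY is set in the environment and `requests` is installed.
import os

import requests

api_key = os.environ["GEMINI_API_KEY"]
url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-1.5-flash:generateContent?key={api_key}"
)

payload = {"contents": [{"parts": [{"text": "Write a story about a magic backpack."}]}]}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# The generated text sits under candidates -> content -> parts in the JSON response.
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])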

site/en/gemini-api/docs/model-tuning/python.ipynb

Lines changed: 0 additions & 9 deletions
@@ -68,15 +68,6 @@
 "In this notebook, you'll learn how to get started with the tuning service using the Python client library for the Gemini API. Here, you'll learn how to tune the text model behind the Gemini API's text generation service."
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"id": "4JXd-HdCsKdZ"
-},
-"source": [
-"**Note**: At this time, tuning is only available for the `gemini-1.0-pro-001` model."
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {

site/en/gemini-api/docs/vision.ipynb

Lines changed: 39 additions & 24 deletions
@@ -179,15 +179,15 @@
 "\n",
 "Images must be in one of the following image data [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
 "\n",
-"- PNG - image/png\n",
-"- JPEG - image/jpeg\n",
-"- WEBP - image/webp\n",
-"- HEIC - image/heic\n",
-"- HEIF - image/heif\n",
+"- PNG - `image/png`\n",
+"- JPEG - `image/jpeg`\n",
+"- WEBP - `image/webp`\n",
+"- HEIC - `image/heic`\n",
+"- HEIF - `image/heif`\n",
 "\n",
 "Each image is equivalent to 258 tokens.\n",
 "\n",
-"While there are no specific limits to the number of pixels in an image besides the model’s context window, larger images are scaled down to a maximum resolution of 3072 x 3072 while preserving their original aspect ratio, while smaller images are scaled up to 768 x 768 pixels. There is no cost reduction for images at lower sizes, other than bandwidth, or performance improvement for images at higher resolution.\n",
+"While there are no specific limits to the number of pixels in an image besides the model’s context window, larger images are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller images are scaled up to 768x768 pixels. There is no cost reduction for images at lower sizes, other than bandwidth, or performance improvement for images at higher resolution.\n",
 "\n",
 "For best results:\n",
 "\n",
@@ -204,11 +204,11 @@
 "source": [
 "### Upload an image file using the File API\n",
 "\n",
-"Use the File API to upload an image of any size. (Always use the File API when the combination of files and system instructions that you intend to send is larger than 20MB.)\n",
+"Use the File API to upload an image of any size. (Always use the File API when the combination of files and system instructions that you intend to send is larger than 20 MB.)\n",
 "\n",
-"**NOTE**: The File API lets you store up to 20GB of files per project, with a per-file maximum size of 2GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available.\n",
+"**NOTE**: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available.\n",
 "\n",
-"Start by calling this [sketch of a jetpack](https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg)."
+"Start by downloading this [sketch of a jetpack](https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg)."
 ]
 },
 {
@@ -265,7 +265,7 @@
 "source": [
 "### Verify image file upload and get metadata\n",
 "\n",
-"You can verify the API successfully stored the uploaded file and get its metadata by calling [files.get](https://ai.google.dev/api/rest/v1beta/files/get) through the SDK. Only the `name` (and by extension, the `uri`) are unique. Use `display_name` to identify files only if you manage uniqueness yourself."
+"You can verify the API successfully stored the uploaded file and get its metadata by calling [`files.get`](https://ai.google.dev/api/rest/v1beta/files/get) through the SDK. Only the `name` (and by extension, the `uri`) are unique. Use `display_name` to identify files only if you manage uniqueness yourself."
 ]
 },
 {
@@ -331,7 +331,7 @@
 "\n",
 "<img width=400 src=\"https://ai.google.dev/tutorials/images/colab_upload.png\">\n",
 "\n",
-"When the combination of files and system instructions that you intend to send is larger than 20MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API:\n"
+"When the combination of files and system instructions that you intend to send is larger than 20 MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API:\n"
 ]
 },
 {
@@ -394,7 +394,9 @@
 "source": [
 "### Get bounding boxes\n",
 "\n",
-"You can ask the model for the coordinates of bounding boxes for objects in images."
+"You can ask the model for the coordinates of bounding boxes for objects in images. For object detection, the Gemini model has been trained to provide\n",
+"these coordinates as relative widths or heights in range `[0,1]`, scaled by 1000 and converted to an integer. Effectively, the coordinates given are for a\n",
+"1000x1000 version of the original image, and need to be converted back to the dimensions of the original image."
 ]
 },
 {
@@ -414,6 +416,19 @@
 "print(response.text)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "b8e422c55df2"
+},
+"source": [
+"To convert these coordinates to the dimensions of the original image:\n",
+"\n",
+"1. Divide each output coordinate by 1000.\n",
+"1. Multiply the x-coordinates by the original image width.\n",
+"1. Multiply the y-coordinates by the original image height."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
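
The conversion described in the cell added above is mechanical. A minimal sketch, assuming the model was prompted to return boxes as `[ymin, xmin, ymax, xmax]` integers on the 1000x1000 grid (the ordering depends on the prompt, so treat it as an assumption):

# Hypothetical helper: rescale a 0-1000 bounding box to pixel coordinates.
# Assumes boxes are [ymin, xmin, ymax, xmax] on a 1000x1000 grid, per the prompt.
def to_pixel_box(box, image_width, image_height):
    ymin, xmin, ymax, xmax = box
    return [
        int(ymin / 1000 * image_height),  # divide by 1000, scale by height
        int(xmin / 1000 * image_width),   # divide by 1000, scale by width
        int(ymax / 1000 * image_height),
        int(xmax / 1000 * image_width),
    ]

# Example: a box returned by the model, applied to a 1280x960 image.
print(to_pixel_box([200, 150, 700, 850], image_width=1280, image_height=960))
# -> [192, 192, 672, 1088]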
@@ -436,19 +451,19 @@
 "Gemini 1.5 Pro and Flash support up to approximately an hour of video data.\n",
 "\n",
 "Video must be in one of the following video format [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
-" - video/mp4\n",
-" - video/mpeg\n",
-" - video/mov\n",
-" - video/avi\n",
-" - video/x-flv\n",
-" - video/mpg\n",
-" - video/webm\n",
-" - video/wmv\n",
-" - video/3gpp\n",
+" - `video/mp4`\n",
+" - `video/mpeg`\n",
+" - `video/mov`\n",
+" - `video/avi`\n",
+" - `video/x-flv`\n",
+" - `video/mpg`\n",
+" - `video/webm`\n",
+" - `video/wmv`\n",
+" - `video/3gpp`\n",
 "\n",
 "The File API service currently extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference.\n",
 "\n",
-"**NOTE:** The finer details of fast action sequences may be lost at the 1FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality.\n",
+"**NOTE:** The finer details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality.\n",
 "\n",
 "Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video.\n",
 "\n",
@@ -468,7 +483,7 @@
 "source": [
 "### Upload a video file to the File API\n",
 "\n",
-"**NOTE**: The File API lets you store up to 20GB of files per project, with a per-file maximum size of 2GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but they cannot be downloaded using any API. It is available at no cost in all regions where the Gemini API is available.\n",
+"**NOTE**: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but they cannot be downloaded using any API. It is available at no cost in all regions where the Gemini API is available.\n",
 "\n",
 "The File API accepts video file formats directly. This example uses the short NASA film [\"Jupiter's Great Red Spot Shrinks and Grows\"](https://www.youtube.com/watch?v=JDi4IdtvDVE0). Credit: Goddard Space Flight Center (GSFC)/David Ladd (2018).\n",
 "\n",
@@ -520,7 +535,7 @@
 "source": [
 "### Verify file upload and check state\n",
 "\n",
-"Verify the API has successfully received the files by calling the `files.get` method.\n",
+"Verify the API has successfully received the files by calling the [`files.get`](https://ai.google.dev/api/rest/v1beta/files/get) method.\n",
 "\n",
 "**NOTE**: Video files have a `State` field in the File API. When a video is uploaded, it will be in the `PROCESSING` state until it is ready for inference. Only `ACTIVE` files can be used for model inference."
 ]
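
The `PROCESSING`/`ACTIVE` check described in the last hunk is usually written as a short polling loop. A sketch using the `google-generativeai` SDK (the file name, poll interval, and prompt are placeholders, not part of this commit):

# Sketch: upload a video and wait for the File API to finish processing it.
# Assumes `pip install google-generativeai` and GEMINI_API_KEY in the environment.
import os
import time

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

video_file = genai.upload_file(path="my_video.mp4")  # placeholder path

# Video files start in the PROCESSING state; only ACTIVE files can be used for inference.
while video_file.state.name == "PROCESSING":
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

if video_file.state.name != "ACTIVE":
    raise RuntimeError(f"File {video_file.name} ended in state {video_file.state.name}")

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([video_file, "Summarize this video."])
print(response.text)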
