feat(source/cloud-storage): add Cloud Storage source with list_objects and read_object tools #3081
Open: huangjiahua wants to merge 6 commits into googleapis:main from huangjiahua:feat/cloud-storage-source
1ee10c9 feat(source/cloud-storage): add Cloud Storage source with list_object… (huangjiahua)
41fca7c feat(source/cloud-storage): cap ListObjects page size at 1000 and Rea… (huangjiahua)
39be640 feat(source/cloud-storage): raise ReadObject size cap from 1 MiB to 8… (huangjiahua)
397baca refactor(source/cloud-storage): move max-read constant into source pa… (huangjiahua)
3b18a6f test(source/cloud-storage): log teardown iterator errors and cover pa… (huangjiahua)
4919821 feat(tool/cloud-storage-read-object): return UTF-8 text, reject binary (huangjiahua)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| title: "Cloud Storage" | ||
| weight: 1 | ||
| --- |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| --- | ||
| title: "Cloud Storage Source" | ||
| linkTitle: "Source" | ||
| type: docs | ||
| weight: 1 | ||
| description: > | ||
| Cloud Storage is Google Cloud's managed service for storing unstructured objects (files) in buckets. Toolbox connects at the project level, allowing tools to list, read, and manage objects across any bucket the credentials can access. | ||
| no_list: true | ||
| --- | ||
|
|
||
| ## About | ||
|
|
||
| [Cloud Storage][gcs-docs] is Google Cloud's managed service for storing | ||
| unstructured data (blobs) in containers called *buckets*. Buckets live in a GCP | ||
| project; objects are addressed by `gs://<bucket>/<object>`. | ||
|
|
||
| If you are new to Cloud Storage, you can try the | ||
| [quickstart][gcs-quickstart] to create a bucket and upload your first objects. | ||
|
|
||
| The Cloud Storage source is configured at the **project** level. Individual | ||
| tools take a `bucket` parameter, so a single configured source can operate | ||
| against any bucket the underlying credentials are authorized for. | ||
|
|
||
| [gcs-docs]: https://cloud.google.com/storage/docs | ||
| [gcs-quickstart]: https://cloud.google.com/storage/docs/discover-object-storage-console | ||
|
|
||
| ## Available Tools | ||
|
|
||
| {{< list-tools >}} | ||
|
|
||
| ## Requirements | ||
|
|
||
| ### IAM Permissions | ||
|
|
||
| Cloud Storage uses [Identity and Access Management (IAM)][iam-overview] to | ||
| control access to buckets and objects. Toolbox uses your | ||
| [Application Default Credentials (ADC)][adc] to authorize and authenticate when | ||
| interacting with Cloud Storage. | ||
|
|
||
| In addition to [setting the ADC for your server][set-adc], ensure the IAM | ||
| identity has the appropriate role for the tools being exposed. Common roles: | ||
|
|
||
| - `roles/storage.objectViewer` — read-only access to objects (sufficient for | ||
| `cloud-storage-list-objects` and `cloud-storage-read-object`) | ||
| - `roles/storage.objectUser` — read and write access to objects | ||
| - `roles/storage.admin` — full control, including bucket management | ||
|
|
||
| See [Cloud Storage IAM roles][gcs-iam] for the full list. | ||
|
|
||
| [iam-overview]: https://cloud.google.com/storage/docs/access-control/iam | ||
| [adc]: https://cloud.google.com/docs/authentication#adc | ||
| [set-adc]: https://cloud.google.com/docs/authentication/provide-credentials-adc | ||
| [gcs-iam]: https://cloud.google.com/storage/docs/access-control/iam-roles | ||
|
|
||
| ## Example | ||
|
|
||
| ```yaml | ||
| kind: source | ||
| name: my-gcs-source | ||
| type: "cloud-storage" | ||
| project: "my-project-id" | ||
| ``` | ||
|
|
||
| ## Reference | ||
|
|
||
| | **field** | **type** | **required** | **description** | | ||
| |-----------|:--------:|:------------:|---------------------------------------------------------------------------------| | ||
| | type | string | true | Must be "cloud-storage". | | ||
| | project | string | true | ID of the GCP project the configured source is associated with (e.g. "my-project-id"). | |
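The About section above notes that objects are addressed as `gs://<bucket>/<object>`, while the tools take `bucket` and `object` as separate parameters. As a minimal sketch of that mapping, the Python helper below splits such a URI into the two values. The function name `split_gs_uri` is hypothetical and is not part of Toolbox or of this PR.

```python
from typing import Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://<bucket>/<object> into (bucket, object).

    Hypothetical helper for illustration only; not a Toolbox API.
    The object part may be empty when the URI names only a bucket.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, _, obj = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, obj
```

Object names may themselves contain `/`, so only the first separator after the bucket is treated as a boundary.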
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| title: "Tools" | ||
| weight: 2 | ||
| --- |
56 changes: 56 additions & 0 deletions
docs/en/integrations/cloud-storage/tools/cloud-storage-list-objects.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| --- | ||
| title: "cloud-storage-list-objects" | ||
| type: docs | ||
| weight: 1 | ||
| description: > | ||
| A "cloud-storage-list-objects" tool lists objects in a Cloud Storage bucket, with optional prefix filtering and delimiter-based grouping. | ||
| --- | ||
|
|
||
| ## About | ||
|
|
||
| A `cloud-storage-list-objects` tool returns the objects in a | ||
| [Cloud Storage bucket][gcs-buckets]. It supports the usual GCS listing options: | ||
|
|
||
| - `prefix` — filter results to objects whose names begin with the given string. | ||
| - `delimiter` — group results by this character (typically `/`) so subdirectory-like | ||
| "common prefixes" are returned separately from the leaf objects. | ||
| - `max_results` / `page_token` — paginate through large listings. | ||
|
|
||
| The response is a JSON object with `objects` (the full object metadata as | ||
| returned by the Cloud Storage API — fields such as `Name`, `Size`, `ContentType`, | ||
| `Updated`, `StorageClass`, `MD5`, etc.), `prefixes` (the common prefixes when | ||
| `delimiter` is set), and `nextPageToken` (empty when there are no more pages). | ||
|
|
||
| [gcs-buckets]: https://cloud.google.com/storage/docs/buckets | ||
|
|
||
| ## Compatible Sources | ||
|
|
||
| {{< compatible-sources >}} | ||
|
|
||
| ## Parameters | ||
|
|
||
| | **parameter** | **type** | **required** | **description** | | ||
| |---------------|:--------:|:------------:|-------------------------------------------------------------------------------------------------------------------| | ||
| | bucket | string | true | Name of the Cloud Storage bucket to list objects from. | | ||
| | prefix | string | false | Filter results to objects whose names begin with this prefix. | | ||
| | delimiter | string | false | Delimiter used to group object names (typically '/'). When set, common prefixes are returned as `prefixes`. | | ||
| | max_results | integer | false | Maximum number of objects to return per page. A value of 0 uses the API default (1000); the maximum allowed is 1000. | | ||
| | page_token | string | false | A previously-returned page token for retrieving the next page of results. | | ||
|
|
||
| ## Example | ||
|
|
||
| ```yaml | ||
| kind: tool | ||
| name: list_objects | ||
| type: cloud-storage-list-objects | ||
| source: my-gcs-source | ||
| description: Use this tool to list objects in a Cloud Storage bucket. | ||
| ``` | ||
|
|
||
| ## Reference | ||
|
|
||
| | **field** | **type** | **required** | **description** | | ||
| |-------------|:--------:|:------------:|---------------------------------------------------------| | ||
| | type | string | true | Must be "cloud-storage-list-objects". | | ||
| | source | string | true | Name of the Cloud Storage source to list objects from. | | ||
| | description | string | true | Description of the tool that is passed to the LLM. | | ||
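The `page_token` and `nextPageToken` fields described above imply a simple client-side loop: call the tool, consume `objects`, and repeat with the returned token until it comes back empty. The Python sketch below illustrates that loop under stated assumptions. `iter_objects` and `fake_list_objects` are hypothetical names, and `call_tool` stands in for whatever mechanism invokes the tool and returns its JSON response as a dict.

```python
from typing import Callable, Iterator


def iter_objects(call_tool: Callable[[dict], dict], bucket: str,
                 prefix: str = "", max_results: int = 0) -> Iterator[dict]:
    """Yield every object by following nextPageToken until it is empty."""
    page_token = ""
    while True:
        page = call_tool({
            "bucket": bucket,
            "prefix": prefix,
            "max_results": max_results,
            "page_token": page_token,
        })
        yield from page.get("objects") or []
        page_token = page.get("nextPageToken") or ""
        if not page_token:
            return


def fake_list_objects(params: dict) -> dict:
    """In-memory stand-in for the tool (an assumption for this sketch).

    Serves two objects per page and ignores max_results.
    """
    names = ["logs/a.txt", "logs/b.txt", "logs/c.txt"]
    matching = [n for n in names if n.startswith(params["prefix"])]
    start = int(params["page_token"] or 0)
    end = start + 2
    return {
        "objects": [{"Name": n} for n in matching[start:end]],
        "prefixes": [],
        "nextPageToken": str(end) if end < len(matching) else "",
    }
```

Calling `list(iter_objects(fake_list_objects, "my-bucket", prefix="logs/"))` walks both pages of the stand-in and stops when the token is empty.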
58 changes: 58 additions & 0 deletions
docs/en/integrations/cloud-storage/tools/cloud-storage-read-object.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| --- | ||
| title: "cloud-storage-read-object" | ||
| type: docs | ||
| weight: 2 | ||
| description: > | ||
| A "cloud-storage-read-object" tool reads the UTF-8 text content of a Cloud Storage object, optionally constrained to a byte range. | ||
| --- | ||
|
|
||
| ## About | ||
|
|
||
| A `cloud-storage-read-object` tool fetches the bytes of a single | ||
| [Cloud Storage object][gcs-objects] and returns them as plain UTF-8 text. | ||
|
|
||
| Only text objects are supported today: if the object bytes (or the requested | ||
| range) are not valid UTF-8, the tool returns an agent-fixable error. This is | ||
| because the MCP tool-result channel currently carries only text; binary | ||
| payloads will be supported once MCP can carry embedded resources. | ||
|
|
||
| Reads are capped at **8 MiB** per call to protect the server's memory and keep | ||
| LLM contexts manageable; objects or ranges larger than that are rejected with | ||
| an agent-fixable error. Use the optional `range` parameter to read a slice of | ||
| a larger object. | ||
|
|
||
| This tool is intended for small-to-medium textual content an LLM can process | ||
| directly. For bulk downloads of large files to the local filesystem, use | ||
| `cloud-storage-download-object` (coming in a follow-up release). | ||
|
|
||
| [gcs-objects]: https://cloud.google.com/storage/docs/objects | ||
|
|
||
| ## Compatible Sources | ||
|
|
||
| {{< compatible-sources >}} | ||
|
|
||
| ## Parameters | ||
|
|
||
| | **parameter** | **type** | **required** | **description** | | ||
| |---------------|:--------:|:------------:|---------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | bucket | string | true | Name of the Cloud Storage bucket containing the object. | | ||
| | object | string | true | Full object name (path) within the bucket, e.g. `path/to/file.txt`. | | ||
| | range | string | false | Optional HTTP byte range, e.g. `bytes=0-999` (first 1000 bytes), `bytes=-500` (last 500 bytes), or `bytes=500-` (from byte 500 to end). Empty reads the full object. | | ||
|
|
||
| ## Example | ||
|
|
||
| ```yaml | ||
| kind: tool | ||
| name: read_object | ||
| type: cloud-storage-read-object | ||
| source: my-gcs-source | ||
| description: Use this tool to read the content of a Cloud Storage object. | ||
| ``` | ||
|
|
||
| ## Reference | ||
|
|
||
| | **field** | **type** | **required** | **description** | | ||
| |-------------|:--------:|:------------:|---------------------------------------------------------| | ||
| | type | string | true | Must be "cloud-storage-read-object". | | ||
| | source | string | true | Name of the Cloud Storage source to read the object from. | | ||
| | description | string | true | Description of the tool that is passed to the LLM. | | ||
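The documented read behavior combines three checks: resolving the optional `range` string, enforcing the 8 MiB cap, and rejecting bytes that are not valid UTF-8. The Python sketch below models those checks on an in-memory byte string. It is an illustration of the behavior described above, not the PR's implementation. The names `resolve_range`, `read_text`, and `MAX_READ_BYTES` are hypothetical, and the range handling is simplified to a single range.

```python
import re
from typing import Tuple

MAX_READ_BYTES = 8 * 1024 * 1024  # 8 MiB cap stated in the docs above


def resolve_range(spec: str, size: int) -> Tuple[int, int]:
    """Return the half-open [start, stop) slice selected by an HTTP byte range."""
    if spec == "":
        return 0, size  # empty range reads the full object
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", spec)
    if m is None or (m.group(1) == "" and m.group(2) == ""):
        raise ValueError(f"unsupported range: {spec!r}")
    first, last = m.group(1), m.group(2)
    if first == "":  # bytes=-N: the last N bytes
        return max(size - int(last), 0), size
    start = int(first)
    if start >= size:
        raise ValueError("range starts past the end of the object")
    if last == "":  # bytes=N-: from byte N to the end
        return start, size
    stop = int(last) + 1  # HTTP ranges are inclusive on both ends
    if stop <= start:
        raise ValueError(f"invalid range: {spec!r}")
    return start, min(stop, size)


def read_text(data: bytes, spec: str = "") -> str:
    """Apply the range, the size cap, and the UTF-8 check, in that order."""
    start, stop = resolve_range(spec, len(data))
    if stop - start > MAX_READ_BYTES:
        raise ValueError("read exceeds 8 MiB; request a narrower range")
    try:
        return data[start:stop].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("content is not valid UTF-8 text") from exc
```

Note that a range can fail the UTF-8 check even when the whole object is valid text, because a byte boundary may fall inside a multi-byte character. For example, `read_text("é".encode(), "bytes=0-0")` raises an error in this sketch.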