
Commit 3f3e281

Authored by claude[bot], rubenfiszel, hugocasa and hcourdent
feat: Google Cloud Storage support documentation (#1027)
* feat: Add Google Cloud Storage support documentation

  Add Google Cloud Storage (GCS) integration documentation alongside existing Azure Blob storage sections:

  - Created new GCS integration documentation file
  - Updated core object storage documentation to include GCS
  - Updated persistent storage documentation to mention GCS
  - Updated files/binary data documentation to include GCS
  - Updated integrations list to include Google Cloud Storage
  - Updated data pipelines documentation to mention GCS

  GCS resource type: `gcloud_storage` with fields `bucket` and `serviceAccountKey`.
  For workspace: serviceAccountKey references a `gcloud` resource.
  For instance: serviceAccountKey is a JSON object.

  Resolves #1026

  Co-authored-by: Ruben Fiszel <[email protected]>

* better

* fix: backlink

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Ruben Fiszel <[email protected]>
Co-authored-by: HugoCasa <[email protected]>
Co-authored-by: Henri Courdent <[email protected]>
1 parent 2e9e688 commit 3f3e281
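
For reference, a `gcloud_storage` resource value as introduced by this commit carries the two fields named in the message above. A minimal sketch in Python; only the field names `bucket` and `serviceAccountKey` come from the commit, while the values and the `$res:` reference path are hypothetical placeholders:

```python
# Hypothetical `gcloud_storage` resource value, for illustration only.
gcloud_storage_resource = {
    "bucket": "my-windmill-bucket",
    # Workspace-level: references a `gcloud` resource holding the key.
    # Instance-level: the raw service account key JSON object instead.
    "serviceAccountKey": "$res:u/admin/my_gcloud_service_account",
}
```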

File tree: 18 files changed, +178 −99 lines changed


changelog/2024-05-31-secondary-storage/index.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -7,8 +7,8 @@ image: ./secondary_storage.png
 description: Read and write from a storage that is not your main storage by specifying it in the S3 object as "secondary_storage" with the name of it.
 features:
   [
-    'Add additional storages from S3, Azure Blob, AWS OIDC or Azure Workload Identity.',
+    'Add additional storages from S3, Azure Blob, AWS OIDC or Google Cloud Storage.',
     'From script, specify the secondary storage with an object with properties `s3` (path to the file) and `storage` (name of the secondary storage).'
   ]
 docs: /docs/core_concepts/object_storage_in_windmill#secondary-storage
----
+---
```

docs/advanced/18_instance_settings/index.mdx

Lines changed: 9 additions & 1 deletion
```diff
@@ -109,7 +109,7 @@ This setting is only available on [Enterprise Edition](/pricing).

 This feature has no overlap with the [Workspace object storage](../../core_concepts/38_object_storage_in_windmill/index.mdx#workspace-object-storage).

-You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find a button to test settings from a server or from a worker.
+You can choose to use S3, Azure Blob Storage, AWS OIDC or Google Cloud Storage. For each you will find a button to test settings from a server or from a worker.

 ![S3/Azure for Python/Go cache & large logs](../../core_concepts/20_jobs/s3_azure_cache.png "S3/Azure for Python/Go cache & large logs")

@@ -145,6 +145,14 @@ You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find

 This setting is only available on [Enterprise Edition](/pricing).

+#### Google Cloud Storage
+
+| Field | Description |
+|-------|-------------|
+| Bucket | The name of your Google Cloud Storage bucket |
+| Service Account Key | The service account key for your Google Cloud Storage bucket in JSON format |
+
+
 ### Private Hub base url

 Base url of your [private Hub](../../core_concepts/32_private_hub/index.mdx) instance, without trailing slash.
```
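
As context for the Service Account Key field above: Google's service account key files are JSON objects with a standard set of fields. The sketch below shows that generic GCP format with placeholder values; nothing in it is specified by this commit beyond "JSON format":

```python
import json

# Placeholder key material. A real key is downloaded from the GCP console
# (IAM & Admin > Service Accounts > Keys) and pasted into the setting as-is.
service_account_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "0123456789abcdef",
    "private_key": "-----BEGIN PRIVATE KEY-----\nMII...\n-----END PRIVATE KEY-----\n",
    "client_email": "windmill-storage@my-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
    "token_uri": "https://oauth2.googleapis.com/token",
}

# Serialize to the JSON string expected by the settings form.
print(json.dumps(service_account_key, indent=2))
```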

docs/core_concepts/11_persistent_storage/index.mdx

Lines changed: 6 additions & 6 deletions
```diff
@@ -24,7 +24,7 @@ There are 4 kinds of persistent storage in Windmill:

 1. [Small data](#within-windmill-not-recommended) that is relevant in between script/flow execution and can be persisted on Windmill itself.

-2. [Object storage for large data](#large-data-s3-r2-minio-azure-blob) such as S3.
+2. [Object storage for large data](#large-data-s3-r2-minio-azure-blob-google-cloud-storage) such as S3.

 3. [Big structured SQL data](#structured-sql-data-postgres-supabase-neontech) that is critical to your services and that is stored externally on an SQL Database or Data Warehouse.

@@ -56,20 +56,20 @@ All details at:
 	/>
 </div>

-## Large data: S3, R2, MinIO, Azure Blob
+## Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage

-On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) storage are highly scalable and durable object storage service that provides secure, reliable, and cost-effective storage for a wide range of data types and use cases.
+On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) and [Google Cloud Storage](https://cloud.google.com/storage) are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.

-Windmill comes with a [native integration with S3 and Azure Blob](./large_data_files.mdx), making it the recommended storage for large objects like files and binary data.
+Windmill comes with a [native integration with S3, Azure Blob, and Google Cloud Storage](./large_data_files.mdx), making them the recommended storage for large objects like files and binary data.

 ![Workspace object storage Infographic](./s3_infographics.png "Workspace object storage Infographic")

 All details at:

 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
-		title="Large data: S3, R2, MinIO, Azure Blob"
-		description="Windmill comes with a native integration with S3 and Azure Blob, making it the recommended storage for large objects like files and binary data."
+		title="Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage"
+		description="Windmill comes with a native integration with S3, Azure Blob, and Google Cloud Storage, making them the recommended storage for large objects like files and binary data."
 		href="/docs/core_concepts/persistent_storage/large_data_files"
 	/>
 </div>
```

docs/core_concepts/11_persistent_storage/large_data_files.mdx

Lines changed: 8 additions & 8 deletions
```diff
@@ -2,17 +2,17 @@ import DocCard from '@site/src/components/DocCard';
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-# Large data: S3, R2, MinIO, Azure Blob
+# Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage

 This page is part of our section on [Persistent storage & databases](./index.mdx) which covers where to effectively store and manage the data manipulated by Windmill. Check that page for more options on data storage.

-On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) are highly scalable and durable object storage service that provides secure, reliable, and cost-effective storage for a wide range of data types and use cases.
+On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) and [Google Cloud Storage](https://cloud.google.com/storage) are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.

-Windmill comes with a [native integration with S3 and Azure Blob](../38_object_storage_in_windmill/index.mdx), making it the recommended storage for large objects like files and binary data.
+Windmill comes with a [native integration with S3, Azure Blob, and Google Cloud Storage](../38_object_storage_in_windmill/index.mdx), making them the recommended storage for large objects like files and binary data.

 ## Workspace object storage

-Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials.
+Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or Google Cloud Storage to enable users to read and write from S3 without having to have access to the credentials.

 Windmill S3 bucket browser will not work for buckets containing more than 20 files and uploads are limited to files < 50MB. Consider upgrading to Windmill [Enterprise Edition](/pricing) to use this feature with large buckets.

@@ -21,7 +21,7 @@ Windmill S3 bucket browser will not work for buckets containing more than 20 fil
 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
 		title="Workspace object storage"
-		description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
+		description="Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or Google Cloud Storage to enable users to read and write from S3 without having to have access to the credentials."
 		href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
 	/>
 </div>

@@ -173,14 +173,14 @@ For more info on how Data pipelines in Windmill, see [Data pipelines](../27_data
 	/>
 </div>

-## Use Amazon S3, R2, MinIO and Azure Blob directly
+## Use Amazon S3, R2, MinIO, Azure Blob, and Google Cloud Storage directly

-Amazon S3, Cloudflare R2 and MinIO all follow the same API schema and therefore have a [common Windmill resource type](https://hub.windmill.dev/resource_types/42/). Azure Blob has a slightly different API than S3 but works with Windmill as well using its dedicated [resource type](https://hub.windmill.dev/resource_types/137/)
+Amazon S3, Cloudflare R2 and MinIO all follow the same API schema and therefore have a [common Windmill resource type](https://hub.windmill.dev/resource_types/42/). Azure Blob and Google Cloud Storage have slightly different APIs than S3 but work with Windmill as well using their dedicated resource types ([Azure Blob](https://hub.windmill.dev/resource_types/137/), [Google Cloud Storage](https://hub.windmill.dev/resource_types/268))

 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
 		title="S3 APIs integrations"
-		description="Use Amazon S3, Cloudflare R2, MinIO and Azure Blob directly within scripts and flows."
+		description="Use Amazon S3, Cloudflare R2, MinIO, Azure Blob, and Google Cloud Storage directly within scripts and flows."
 		href="/docs/integrations/s3"
 	/>
 </div>
```
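
To make the "use directly" path concrete, here is a hedged sketch of a Windmill Python script that talks to GCS through Google's official client library. It assumes a resource shaped like the hub's `gcloud_storage` type (fields `bucket` and `serviceAccountKey`, with the key resolving to the JSON object); the resource path and object name are hypothetical:

```python
import wmill
from google.cloud import storage  # pip package: google-cloud-storage


def main():
    # Hypothetical resource path; assumed to expose `bucket` and
    # `serviceAccountKey` (the service account key as a JSON object).
    res = wmill.get_resource("u/user/my_gcloud_storage")

    client = storage.Client.from_service_account_info(res["serviceAccountKey"])
    bucket = client.bucket(res["bucket"])

    # Round-trip a small object to demonstrate writes and reads.
    blob = bucket.blob("demo/hello.txt")
    blob.upload_from_string("hello from Windmill")
    return blob.download_as_text()
```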

docs/core_concepts/18_files_binary_data/index.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -7,18 +7,18 @@ import TabItem from '@theme/TabItem';
 In Windmill, JSON is the primary data format used for representing information.
 Binary data, such as files, are not easy to handle. Windmill provides two options.

-1. Have a dedicated storage for binary data: S3 or Azure Blob. Windmill has a first class integration with S3 buckets or Azure Blob containers.
+1. Have a dedicated storage for binary data: S3, Azure Blob, or Google Cloud Storage. Windmill has a first class integration with S3 buckets, Azure Blob containers, or Google Cloud Storage buckets.
 2. If the above is not an option, there's always the possibility to store the binary as base64 encoded string.

 ## Workspace object storage

-The recommended way to store binary data is to upload it to S3 or Azure Blob Storage leveraging [Windmill's workspace object storage](../38_object_storage_in_windmill/index.mdx).
+The recommended way to store binary data is to upload it to S3, Azure Blob Storage, or Google Cloud Storage leveraging [Windmill's workspace object storage](../38_object_storage_in_windmill/index.mdx).

 Instance and workspace object storage are different from using [S3 resources](../../integrations/s3.mdx) within scripts, flows, and apps, which is free and unlimited. What is exclusive to the [Enterprise](/pricing) version is using the integration of Windmill with S3 that is a major convenience layer to enable users to read and write from S3 without having to have access to the credentials.

 :::info

-Windmill's integration with S3 and Azure Blob Storage works exactly the same and the features described below works in both cases. The only difference is that you need to select an `azure_blob` resource when setting up the S3 storage in the Workspace settings.
+Windmill's integration with S3, Azure Blob Storage, and Google Cloud Storage works exactly the same and the features described below work in all cases. The only difference is that you need to select an `azure_blob` resource for Azure Blob or a `gcloud_storage` resource for Google Cloud Storage when setting up the storage in the Workspace settings.

 :::

@@ -49,7 +49,7 @@ All details on Workspace object storage, and how to [read](../38_object_storage_
 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
 		title="Workspace object storage"
-		description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
+		description="Connect your Windmill workspace to your S3 bucket, your Azure Blob storage or your GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
 		href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
 	/>
 </div>
```
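
For the workspace object storage route described above, a minimal sketch using the `wmill` SDK file helpers; `write_s3_file`/`load_s3_file` and their signatures are assumptions based on recent versions of the Python client, and the path is a placeholder:

```python
import wmill
from wmill import S3Object


def main():
    # Write bytes through the workspace storage (S3, Azure Blob or GCS
    # alike), then read them back.
    obj = wmill.write_s3_file(S3Object(s3="demo/data.bin"), b"some binary payload")
    content = wmill.load_s3_file(obj)
    return content.decode()
```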

docs/core_concepts/19_rich_display_rendering/index.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -212,7 +212,7 @@ Learn more at:
 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
 		title="Workspace object storage"
-		description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
+		description="Connect your Windmill workspace to your S3 bucket, your Azure Blob storage or your GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
 		href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
 	/>
 </div>
```

docs/core_concepts/20_jobs/index.mdx

Lines changed: 9 additions & 2 deletions
```diff
@@ -144,7 +144,7 @@ For large logs storage (and display) and cache for distributed Python jobs, you

 This feature has no overlap with the [Workspace object storage](../38_object_storage_in_windmill/index.mdx#workspace-object-storage).

-You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find a button to test settings from a server or from a worker.
+You can choose to use S3, Azure Blob Storage, AWS OIDC or Google Cloud Storage. For each you will find a button to test settings from a server or from a worker.

 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard

@@ -174,4 +174,11 @@ You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find
 | Access key | string | The primary or secondary access key for the storage account. This key is used to authenticate and provide access to Azure Blob Storage. |
 | Tenant ID | string | (optional) The unique identifier (GUID) for your Azure Active Directory (AAD) tenant. Required if using Azure Active Directory for authentication. |
 | Client ID | string | (optional) The unique identifier (GUID) for your application registered in Azure AD. Required if using service principal authentication via Azure AD. |
-| Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed. |
+| Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed. |
+
+#### Google Cloud Storage
+
+| Field | Description |
+|-------|-------------|
+| Bucket | The name of your Google Cloud Storage bucket |
+| Service Account Key | The service account key for your Google Cloud Storage bucket in JSON format |
```

docs/core_concepts/27_data_pipelines/index.mdx

Lines changed: 39 additions & 1 deletion
````diff
@@ -82,7 +82,7 @@ Find all details at:
 <div className="grid grid-cols-2 gap-6 mb-4">
 	<DocCard
 		title="Workspace object storage"
-		description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
+		description="Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
 		href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
 	/>
 </div>

@@ -167,6 +167,44 @@ def main(input_file: S3Object):
 	return S3Object(s3=output_file)
 ```

+</TabItem>
+<TabItem value="polars (Google Cloud Storage)" label="Polars (Google Cloud Storage)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+
+```python
+import wmill
+from wmill import S3Object
+import polars as pl
+
+
+def main(input_file: S3Object):
+    # this will default to the workspace Google Cloud Storage resource
+    endpoint_url = wmill.polars_connection_settings().s3fs_args["endpoint_url"]
+    storage_options = wmill.polars_connection_settings().storage_options
+
+    # this will use the designated resource
+    # storage_options = wmill.polars_connection_settings("<PATH_TO_S3_RESOURCE>").storage_options
+
+    # input is a parquet file, we use read_parquet in lazy mode.
+    # Polars can read various file types, see
+    # https://pola-rs.github.io/polars/py-polars/html/reference/io.html
+    input_uri = "{}/{}".format(endpoint_url, input_file["s3"])
+
+    input_df = pl.read_parquet(input_uri, storage_options=storage_options).lazy()
+
+    # process the Polars dataframe. See Polars docs:
+    # for dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html
+    # for lazy dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/lazyframe/index.html
+    output_df = input_df.collect()
+    print(output_df)
+
+    # To write back the result to Google Cloud Storage, Polars needs an s3fs connection
+    output_file = "output/result.parquet"
+    output_uri = "{}/{}".format(endpoint_url, output_file)
+    output_df.write_parquet(output_uri, storage_options=storage_options)
+
+    return S3Object(s3=output_file)
+```
+
 </TabItem>
 <TabItem value="duckdb (Python / AWS S3)" label="DuckDB (Python / AWS S3)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
````

docs/core_concepts/38_object_storage_in_windmill/index.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -14,13 +14,13 @@ Additionally, for [instance integration](#instance-object-storage), the Enterpri

 ## Workspace object storage

-Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials.
+Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or GCS bucket to enable users to read and write from S3 without having to have access to the credentials.

 ![Workspace object storage infographic](../11_persistent_storage/s3_infographics.png 'Workspace object storage infographic')

 Windmill S3 bucket browser will not work for buckets containing more than 20 files and uploads are limited to files < 50MB. Consider upgrading to Windmill [Enterprise Edition](/pricing) to use this feature with large buckets.

-Once you've created an [S3 or Azure Blob resource](../../integrations/s3.mdx) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.
+Once you've created an [S3, Azure Blob, or Google Cloud Storage resource](../../integrations/s3.mdx) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.

 ![S3 storage workspace settings](../11_persistent_storage/workspace_settings.png)

@@ -314,7 +314,7 @@ For more info on how to use files and S3 files in Windmill, see [Handling files

 Read and write from a storage that is not your main storage by specifying it in the S3 object as "secondary_storage" with the name of it.

-From the workspace settings, in tab "S3 Storage", just click on "Add secondary storage", give it a name, and pick a resource from type "S3", "Azure Blob", "AWS OIDC" or "Azure Workload Identity". You can save as many additional storages as you want as long as you give them a different name.
+From the workspace settings, in tab "S3 Storage", just click on "Add secondary storage", give it a name, and pick a resource from type "S3", "Azure Blob", "Google Cloud Storage", "AWS OIDC" or "Azure Workload Identity". You can save as many additional storages as you want as long as you give them a different name.

 Then from script, you can specify the secondary storage with an object with properties `s3` (path to the file) and `storage` (name of the secondary storage).

@@ -380,7 +380,7 @@ Under [Enterprise Edition](/pricing), instance object storage offers advanced fe

 ![Instance object storage infographic](./instance_object_storage_infographic.png 'Instance object storage infographic')

-This can be configured from the [instance settings](../../advanced/18_instance_settings/index.mdx#instance-object-storage), with configuration options for S3, Azure Blob or AWS OIDC.
+This can be configured from the [instance settings](../../advanced/18_instance_settings/index.mdx#instance-object-storage), with configuration options for S3, Azure Blob, Google Cloud Storage, or AWS OIDC.

 ![S3/Azure for Python/Go cache & large logs](../../core_concepts/20_jobs/s3_azure_cache.png "S3/Azure for Python/Go cache & large logs")
```

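To illustrate the secondary storage addressing described in the hunks above: per the docs text, a script returns or accepts an object with properties `s3` (path to the file) and `storage` (name of the secondary storage). A minimal Python sketch, where the storage name `gcs_backup` is hypothetical and must match a configured secondary storage:

```python
def main():
    # `storage` selects the named secondary storage instead of the
    # workspace default; `s3` is the path within that storage.
    return {"s3": "data/result.parquet", "storage": "gcs_backup"}
```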