Commit 72f40e0

new doc: Configure Multiple Storage Accounts; update: Storage Configuration
1 parent dedbb80 commit 72f40e0

12 files changed: +300 −208 lines

docs/.pages

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ nav:
     - Architecture: architecture.md
     - HA Deployment: ha_deployment.md
     - Environment Variables: environment-variables.md
-    - Storage: storage.md
+    - Data Management: data-management
     - User Guide: user-guide
     - Operator Guide: operator-guide
     - Performance Optimization: performance.md

docs/data-management/.pages

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
nav:

    - Data Management Overview: index.md
    - Storage Configuration: storage.md
    - Configure Multiple Object Storage Accounts: configure-multiple-storage-accounts.md

docs/data-management/configure-multiple-storage-accounts.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
This guide explains how to configure OpenObserve to store data in multiple object storage accounts. It also covers how to set environment variables to specify which stream should be stored in each account.

## Why Use Multiple Object Storage Accounts

Using multiple object storage accounts or buckets in OpenObserve helps you:

- Store data across different regions or providers.
- Separate critical and non-critical data into different accounts or buckets.
- Optimize storage cost and meet compliance requirements.
- Improve write throughput by avoiding single-bucket performance limits. For example, some providers such as AWS S3 impose a throughput limit per bucket. Using multiple buckets helps distribute the load and increase overall ingestion throughput.
- Use flexible combinations of accounts and buckets:

    - A single account with multiple buckets.
    - Multiple accounts with one or more buckets.
    - A combination of specific accounts and buckets for each stream.

This setup is useful for organizations that manage large-scale log ingestion across diverse environments.
## Steps to Configure Multiple Object Storage Accounts

Suppose you want to store logs based on their importance:

- Critical logs should go to an AWS S3 bucket in us-east-1.
- Internal logs should go to a MinIO bucket in us-west-1.

Follow these steps to configure multiple object storage accounts in OpenObserve:

### Prerequisites

- At least two S3-compatible object storage accounts.
- Valid credentials, regions, and bucket names for each.

### Step 1: Set Storage Account Environment Variables
Define the environment variables required to connect to each object storage account:
```
ZO_S3_ACCOUNTS="acc1,acc2"
ZO_S3_PROVIDER="aws,minio"
ZO_S3_SERVER_URL="https://s3.amazonaws.com,https://minio.example.com"
ZO_S3_REGION_NAME="us-east-1,us-west-1"
ZO_S3_ACCESS_KEY="key1,key2"
ZO_S3_SECRET_KEY="secret1,secret2"
ZO_S3_BUCKET_NAME="critical-logs,internal-logs"
ZO_S3_BUCKET_PREFIX="logs/"
```

!!! info "Important"
    - The first account is treated as the default account. Configure your existing (old) account as the first one.
    - All variables must contain the same number of comma-separated values.

### Step 2: Configure Stream Strategy
Set the `ZO_S3_STREAM_STRATEGY` environment variable to control how streams are assigned to accounts.

You can choose one of the following strategies:

**Use Default Account** <br>
```
ZO_S3_STREAM_STRATEGY=""
```
All streams are stored in the first configured account.

**File Name Hashing** <br>
```
ZO_S3_STREAM_STRATEGY="file_hash"
```
The storage account is selected based on a hash of the file name.

**Stream Name Hashing** <br>
```
ZO_S3_STREAM_STRATEGY="stream_hash"
```
The storage account is selected based on a hash of the stream name.

**Static Stream-to-Account Mapping** <br>
```
ZO_S3_STREAM_STRATEGY="payments:acc1,operations:acc1,internal:acc2"
```
Each stream is mapped explicitly to a storage account.
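
For example, to implement the scenario described at the beginning of this guide (critical logs in the AWS S3 account, internal logs in the MinIO account), you could combine the account variables from Step 1 with a static mapping. The stream names `critical` and `internal` below are placeholders; replace them with your actual stream names.

```
# acc1 = AWS S3 (us-east-1, critical-logs), acc2 = MinIO (us-west-1, internal-logs), as defined in Step 1
ZO_S3_STREAM_STRATEGY="critical:acc1,internal:acc2"
```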
### Step 3: Verify Configuration

1. Restart OpenObserve.
2. Ingest test logs.
3. Confirm logs are written to the correct storage buckets.
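
One way to confirm the distribution, assuming you have the AWS CLI and the MinIO client (`mc`) configured against the same accounts, is to list the objects written under each bucket and the prefix from Step 1 (bucket names and the `myminio` alias are placeholders):

```
# List recent objects in the AWS bucket for critical logs
aws s3 ls s3://critical-logs/logs/ --recursive | head
# List recent objects in the MinIO bucket for internal logs
mc ls --recursive myminio/internal-logs/logs/ | head
```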

docs/data-management/index.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
The Data Management section helps you configure how OpenObserve stores ingested stream data and metadata. It also includes guidance on optimizing storage across different object storage accounts or providers.

**Learn more:**

- [Storage Configuration](storage.md)
- [Configure Multiple Object Storage Accounts](configure-multiple-storage-accounts.md)

docs/data-management/storage.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
This guide explains how to configure data and metadata storage in OpenObserve. The information applies to both the open-source and enterprise versions.

## Overview
There are two primary items that need to be stored in OpenObserve:

- Ingested stream data
- Metadata for ingested stream data

By default:

- In **Local mode**, metadata is stored on disk using **SQLite**.
- In **Cluster mode**, metadata is stored in **PostgreSQL**.
- Stream data can be stored on disk or in S3-compatible object storage such as Amazon S3, MinIO, Google GCS, Alibaba OSS, or Tencent COS.

## Storage Modes

- OpenObserve runs in **Local mode** by default.
- To enable **Cluster mode**, set the environment variable `LOCAL_MODE=false`.
- In **Local mode**, stream data can be stored in S3 by setting `ZO_LOCAL_MODE_STORAGE=s3`.
- GCS and OSS support the S3 SDK and can be treated as S3-compatible storage. Azure Blob Storage is also supported.
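
As a minimal sketch, the mode and storage backend are selected purely through these environment variables before the server starts; the start command below is a placeholder for however you run OpenObserve:

```
# Cluster mode: metadata in PostgreSQL, stream data in object storage
export LOCAL_MODE=false

# Or: stay in Local mode but keep stream data in S3-compatible storage
# export LOCAL_MODE=true
# export ZO_LOCAL_MODE_STORAGE=s3

./openobserve   # placeholder start command
```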

## Data Storage Format

Stream data is stored in Parquet format. Parquet is a columnar storage format optimized for storage efficiency and query performance.

## Stream Data Storage Options

### Disk

Disk is the default storage location for stream data. **Ensure that sufficient disk space is available for storing stream data.**

### Amazon S3

To use Amazon S3 for storing stream data:

1. Create the bucket in S3 first.
2. Provide AWS credentials through one of the supported AWS SDK mechanisms:

    - Set environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. This is not recommended due to security concerns.
    - Use AWS CLI credentials in `~/.aws/credentials`.
    - Use EC2 instance metadata for instances with IAM roles, or assign IAM roles directly to ECS or Fargate tasks. These roles are accessed through the Instance Metadata Service (IMDS or IMDSv2). ECS is not recommended for stateful workloads.
    - Use IAM Roles for Service Accounts (IRSA) in Amazon EKS.

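As a rough sketch, assuming credentials are supplied through one of the mechanisms above (for example, an IAM role), the bucket itself is identified with the same `ZO_S3_*` variables used for the other providers in this guide; the bucket name below is a placeholder:

```
ZO_S3_BUCKET_NAME="my-openobserve-data"   # placeholder bucket name
ZO_S3_REGION_NAME="us-east-1"
# Static credentials are possible but not recommended:
# ZO_S3_ACCESS_KEY="..."
# ZO_S3_SECRET_KEY="..."
```
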
### MinIO
To use MinIO for storing stream data, first create the bucket in MinIO.
Then set the following environment variables:

| Environment Variable | Value | Description                                      |
| -------------------- | ----- | ------------------------------------------------ |
| ZO_S3_SERVER_URL     | -     | MinIO server address                             |
| ZO_S3_REGION_NAME    | -     | Region name, such as `us-west-1`                 |
| ZO_S3_ACCESS_KEY     | -     | Access key                                       |
| ZO_S3_SECRET_KEY     | -     | Secret key                                       |
| ZO_S3_BUCKET_NAME    | -     | Bucket name                                      |
| ZO_S3_PROVIDER       | minio | Used to specify settings like `force_style=true` |

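A minimal sketch of these variables for a self-hosted MinIO endpoint; the URL, credentials, and bucket name are placeholders:

```
ZO_S3_PROVIDER="minio"
ZO_S3_SERVER_URL="https://minio.example.com:9000"
ZO_S3_REGION_NAME="us-west-1"
ZO_S3_ACCESS_KEY="minio-access-key"
ZO_S3_SECRET_KEY="minio-secret-key"
ZO_S3_BUCKET_NAME="openobserve-data"
```
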
### OpenStack Swift
To use OpenStack Swift for storing stream data, first create the bucket in Swift.
Then set the following environment variables:

| Environment Variable      | Value | Description                                                    |
| ------------------------- | ----- | -------------------------------------------------------------- |
| ZO_S3_SERVER_URL          | -     | Swift server address, such as `https://us-west-1.example.com`  |
| ZO_S3_REGION_NAME         | -     | Region name, such as `us-west-1`                               |
| ZO_S3_ACCESS_KEY          | -     | Access key                                                     |
| ZO_S3_SECRET_KEY          | -     | Secret key                                                     |
| ZO_S3_BUCKET_NAME         | -     | Bucket name                                                    |
| ZO_S3_FEATURE_HTTP1_ONLY  | true  | Enables compatibility with Swift                               |
| ZO_S3_PROVIDER            | s3    | Enables the S3-compatible API                                  |
| AWS_EC2_METADATA_DISABLED | true  | Disables EC2 metadata access, which is not supported by Swift  |

### Google GCS
To use GCS for storing stream data, first create the bucket in GCS.

**Using the S3-compatible API:**

| Environment Variable     | Value | Description                                                            |
| ------------------------ | ----- | ---------------------------------------------------------------------- |
| ZO_S3_SERVER_URL         | -     | GCS server address. Should be set to `https://storage.googleapis.com`  |
| ZO_S3_REGION_NAME        | -     | GCS region name, or set to `auto`                                      |
| ZO_S3_ACCESS_KEY         | -     | Access key                                                             |
| ZO_S3_SECRET_KEY         | -     | Secret key                                                             |
| ZO_S3_BUCKET_NAME        | -     | Bucket name                                                            |
| ZO_S3_FEATURE_HTTP1_ONLY | true  | Required for compatibility                                             |
| ZO_S3_PROVIDER           | s3    | Enables the S3-compatible API                                          |

Refer to the [GCS AWS migration documentation](https://cloud.google.com/storage/docs/aws-simple-migration) for more information.

**Using GCS directly:**

| Environment Variable | Value | Description                                                                                |
| -------------------- | ----- | ------------------------------------------------------------------------------------------ |
| ZO_S3_SERVER_URL     | -     | GCS server address. Should be set to `https://storage.googleapis.com`                      |
| ZO_S3_REGION_NAME    | -     | GCS region name, or set to `auto`                                                          |
| ZO_S3_ACCESS_KEY     | -     | Path to the GCP JSON service account key if it is not available through instance metadata  |
| ZO_S3_BUCKET_NAME    | -     | Bucket name                                                                                |
| ZO_S3_PROVIDER       | gcs   | Use the GCS API                                                                            |

OpenObserve uses the [object_store crate](https://docs.rs/object_store/0.10.1/object_store/gcp/struct.GoogleCloudStorageBuilder.html) to initialize the storage configuration. It calls the `with_env()` function by default. If the `ZO_S3_ACCESS_KEY` variable is set, OpenObserve additionally uses the `with_service_account_path()` function to load the GCP service account key.
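
A minimal sketch of the direct GCS configuration; the bucket name and key path are placeholders, and the `ZO_S3_ACCESS_KEY` line is only needed when the service account key cannot be obtained from instance metadata:

```
ZO_S3_PROVIDER="gcs"
ZO_S3_SERVER_URL="https://storage.googleapis.com"
ZO_S3_REGION_NAME="auto"
ZO_S3_BUCKET_NAME="openobserve-data"
# Optional: path to a GCP service account JSON key
ZO_S3_ACCESS_KEY="/etc/openobserve/gcp-sa.json"
```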

### Alibaba OSS (Aliyun)
To use Alibaba OSS for storing stream data, first create the bucket in Alibaba Cloud.
Then set the following environment variables:

| Environment Variable             | Value | Description                                                                 |
| -------------------------------- | ----- | --------------------------------------------------------------------------- |
| ZO_S3_SERVER_URL                 | -     | OSS endpoint, such as `https://bucketname.oss-ap-southeast-1.aliyuncs.com`  |
| ZO_S3_REGION_NAME                | -     | OSS region name, such as `oss-cn-beijing`                                   |
| ZO_S3_BUCKET_NAME                | -     | Bucket name                                                                 |
| ZO_S3_ACCESS_KEY                 | -     | Access key                                                                  |
| ZO_S3_SECRET_KEY                 | -     | Secret key                                                                  |
| ZO_S3_FEATURE_FORCE_HOSTED_STYLE | true  | Enables hosted-style addressing                                             |

Refer to the [Alibaba OSS region and endpoint documentation](https://help.aliyun.com/zh/oss/user-guide/regions-and-endpoints).

### Tencent COS
To use Tencent COS for storing stream data, first create the bucket in Tencent Cloud.
Then set the following environment variables:

| Environment Variable | Value | Description          |
| -------------------- | ----- | -------------------- |
| ZO_S3_SERVER_URL     | -     | COS endpoint address |
| ZO_S3_REGION_NAME    | -     | COS region name      |
| ZO_S3_ACCESS_KEY     | -     | Access key           |
| ZO_S3_SECRET_KEY     | -     | Secret key           |
| ZO_S3_BUCKET_NAME    | -     | Bucket name          |

Refer to the [Tencent COS documentation](https://cloud.tencent.com/document/product/436/37421).

### UCloud US3
To use UCloud US3 for storing stream data, first create the bucket in UCloud.
Then set the following environment variables:

| Environment Variable     | Value | Description                                                |
| ------------------------ | ----- | ---------------------------------------------------------- |
| ZO_S3_SERVER_URL         | -     | US3 endpoint, such as `http://internal.s3-sg.ufileos.com`  |
| ZO_S3_ACCESS_KEY         | -     | Access key                                                 |
| ZO_S3_SECRET_KEY         | -     | Secret key                                                 |
| ZO_S3_BUCKET_NAME        | -     | Bucket name                                                |
| ZO_S3_FEATURE_HTTP1_ONLY | true  | Required for HTTP1 compatibility                           |

Refer to the [UCloud S3 documentation](https://docs.ucloud.cn/ufile/s3/s3_introduction).

### Baidu BOS
To use Baidu BOS for storing stream data, first create the bucket in Baidu Cloud.
Then set the following environment variables:

| Environment Variable | Value | Description                                      |
| -------------------- | ----- | ------------------------------------------------ |
| ZO_S3_SERVER_URL     | -     | BOS endpoint, such as `https://s3.bj.bcebos.com` |
| ZO_S3_REGION_NAME    | -     | BOS region name, such as `bj`                    |
| ZO_S3_ACCESS_KEY     | -     | Access key                                       |
| ZO_S3_SECRET_KEY     | -     | Secret key                                       |
| ZO_S3_BUCKET_NAME    | -     | Bucket name                                      |

Refer to the [Baidu BOS documentation](https://cloud.baidu.com/doc/BOS/s/xjwvyq9l4).

### Azure Blob

OpenObserve can use Azure Blob for storing stream data. Set the following environment variables:

| Environment Variable       | Value                | Description                                   |
| -------------------------- | -------------------- | --------------------------------------------- |
| ZO_S3_PROVIDER             | azure                | Enables Azure Blob storage support            |
| ZO_LOCAL_MODE_STORAGE      | s3                   | Required only if running in single-node mode  |
| AZURE_STORAGE_ACCOUNT_NAME | Storage account name | Mandatory                                     |
| AZURE_STORAGE_ACCOUNT_KEY  | Access key           | Mandatory                                     |
| ZO_S3_BUCKET_NAME          | Blob container name  | Mandatory                                     |

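A minimal sketch for Azure Blob; the account name, key, and container name are placeholders:

```
ZO_S3_PROVIDER="azure"
ZO_LOCAL_MODE_STORAGE="s3"               # only when running in single-node (Local) mode
AZURE_STORAGE_ACCOUNT_NAME="mystorageaccount"
AZURE_STORAGE_ACCOUNT_KEY="<storage-account-key>"
ZO_S3_BUCKET_NAME="openobserve-logs"     # blob container name
```
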
## Metadata Storage

OpenObserve supports multiple metadata store backends, configurable using the `ZO_META_STORE` environment variable.

### SQLite
- Set `ZO_META_STORE=sqlite`.
- No additional configuration is required.
- Suitable for single-node installations.
- This is generally not recommended, as losing the SQLite data will make OpenObserve inoperable.

### PostgreSQL
- Set `ZO_META_STORE=postgres`.
- Recommended for production deployments due to reliability and scalability.
- The default Helm chart (after February 23, 2024) uses [cloudnative-pg](https://cloudnative-pg.io/) to create a PostgreSQL cluster (primary + replica) that is used as the meta store. These instances provide high availability and backup support.

### etcd (Deprecated)
- Set `ZO_META_STORE=etcd`.
- While etcd is used as the cluster coordinator, it was also the default metadata store in Helm charts released before February 23, 2024. This configuration is now deprecated. Helm charts released after February 23, 2024 use PostgreSQL as the default metadata store.

### MySQL (Deprecated)
- Set `ZO_META_STORE=mysql`.
- Deprecated. Use PostgreSQL instead.

docs/integration/cloudflare.md

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@ To stream Cloudflare logs, log in to OpenObserve and follow these steps:
 1. From the left menu, select **Data Sources > Custom > Logs > Curl**.
 2. Extract the following details from the sample curl command:

-    ![Extract endpoint and credentials](../../docs/images/extract-creds-from-data-sources.png)
+    ![Extract endpoint and credentials](images/extract-creds-from-data-sources.png)

     - **Endpoint**: `https://api.openobserve.ai/api/<organization_name>/<stream_name>/_json`.
       Replace `organization_name` with the organization name shown at the top right corner of the screen. Replace the `stream_name` with **cloudflare_logs**.
@@ -71,9 +71,9 @@ return new Response("Hello from Cloudflare!", {
 4. Select **Deploy** and note the generated URL, such as `log-generator.example-subdomain.workers.dev`.
 5. You may optionally add a route under **Workers Routes**, such as `example.com/log/*`.
 6. Generate traffic by visiting the Worker URL, linking it on a webpage, or enabling access through a known route.
-    ![Generate traffic visiting the Worker URL](../../docs/images/cloudflare-worker-setup.gif)
+    ![Generate traffic visiting the Worker URL](images/cloudflare-worker-setup.gif)

-    ![Cloudflare worker.js](../../docs/images/cloudflare-worker-js.png)
+    ![Cloudflare worker.js](images/cloudflare-worker-js.png)

 **Option 3: Existing traffic on Business or Enterprise plans**

@@ -281,7 +281,7 @@ You should see entries similar to the following:
     "requests": 1
 }
 ```
-![Verify the Cloudflare and OpenObserve integration](../../docs/images/cloudflare-verify-ingestion.gif)
+![Verify the Cloudflare and OpenObserve integration](images/cloudflare-verify-ingestion.gif)

 Logs appear immediately when using Logpush. If you are testing with GraphQL, the simulated data appears first. Real logs are usually visible within 5 to 10 minutes.