4 changes: 2 additions & 2 deletions charts/openobserve/README.md
@@ -1,4 +1,4 @@
-# OpenObserve helm chart
+# OpenObserve Helm Chart

## Amazon EKS

@@ -11,7 +11,7 @@ You must set a minimum of 2 values:
1. IAM role for the serviceAccount to gain AWS IAM credentials to access s3
- serviceAccount.annotations."eks.amazonaws.com/role-arn"

-## Install
+## Installation

Install the Cloud Native PostgreSQL Operator. This is a prerequisite for the openobserve helm chart. This helm chart sets up a Postgres database cluster (1 primary + 1 replica) and uses it as the metadata store for OpenObserve.
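For the Amazon EKS requirement above, the IAM role is wired in through the chart's service account annotation. A minimal values sketch, assuming a placeholder role ARN (the annotation key is the one quoted in the README; the other required value is collapsed in this diff and is not guessed here):

```yaml
# Sketch only: replace the placeholder ARN with a role that can access the chart's S3 bucket.
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::111122223333:role/openobserve-s3-access"  # placeholder
```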
40 changes: 40 additions & 0 deletions charts/openobserve/templates/ingester-statefulset.yaml
@@ -121,6 +121,46 @@ spec:
successThreshold: {{ .Values.probes.ingester.config.readinessProbe.successThreshold | default 1 }}
failureThreshold: {{ .Values.probes.ingester.config.readinessProbe.failureThreshold | default 3 }}
{{- end }}
{{- if .Values.autoscaling.ingester.enabled }}
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- |
# Get credentials from environment variables
USER_EMAIL="$ZO_ROOT_USER_EMAIL"
USER_PASSWORD="$ZO_ROOT_USER_PASSWORD"

# Create base64 encoded credentials for Authorization header
AUTH_HEADER=$(echo -n "${USER_EMAIL}:${USER_PASSWORD}" | base64)

# Disable the node first
echo "Disabling ingester node..."
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
-H "Authorization: Basic ${AUTH_HEADER}"

# returns 200 if successful and "true" if the node is disabled

# Flush all data from memory to WAL. This does not flush data from ingester to s3.
echo "Flushing data from ingester..."
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
-H "Authorization: Basic ${AUTH_HEADER}"

# returns 200 if successful and "true" if the node is flushed
Copilot AI commented on Jul 20, 2025:
The curl commands lack error handling. If the API calls fail, the script continues without knowing if the operations succeeded, which could lead to data loss.

Suggested change:

-curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
-  -H "Authorization: Basic ${AUTH_HEADER}"
-# returns 200 if successful and "true" if the node is disabled
-# Flush all data from memory to WAL. This does not flush data from ingester to s3.
-echo "Flushing data from ingester..."
-curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
-  -H "Authorization: Basic ${AUTH_HEADER}"
-# returns 200 if successful and "true" if the node is flushed
+RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
+  -H "Authorization: Basic ${AUTH_HEADER}")
+if [ "$RESPONSE" -ne 200 ]; then
+  echo "Error: Failed to disable ingester node. HTTP response code: $RESPONSE"
+  exit 1
+fi
+# Flush all data from memory to WAL. This does not flush data from ingester to s3.
+echo "Flushing data from ingester..."
+RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
+  -H "Authorization: Basic ${AUTH_HEADER}")
+if [ "$RESPONSE" -ne 200 ]; then
+  echo "Error: Failed to flush data from ingester. HTTP response code: $RESPONSE"
+  exit 1
+fi

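A more compact way to get the same error handling, sketched here with the endpoints, port template, and environment variables taken from this diff (an editorial sketch, not part of the PR or the suggestion above): `set -e` aborts the hook at the first failing command, and `curl --fail` exits non-zero on any 4xx/5xx response instead of printing the body and succeeding.

```yaml
# Sketch of an alternative preStop body covering only the two API calls;
# the 900-second wait and final echo from the PR would follow unchanged.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          set -e
          AUTH_HEADER=$(echo -n "${ZO_ROOT_USER_EMAIL}:${ZO_ROOT_USER_PASSWORD}" | base64)
          echo "Disabling ingester node..."
          curl --fail -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
            -H "Authorization: Basic ${AUTH_HEADER}"
          echo "Flushing data from ingester..."
          curl --fail -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
            -H "Authorization: Basic ${AUTH_HEADER}"
```

Either way, Kubernetes does not abort pod deletion when a preStop hook fails; the kubelet records a FailedPreStopHook event and termination proceeds, so the non-zero exit mainly makes the failure visible in events and logs.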


# We need another API to check if all the data has been moved to s3 or /flush should become async and move files to s3 as well
# e.g /node/wal_status
# Need to build this API. Until then, we will wait for 900 seconds.

# Wait for 900 seconds after flush to ensure data is moved to s3
# 15 minutes for now, since file movement to s3 may take up to 10 minutes
echo "Waiting 900 seconds to flush data..."
sleep 900
Copilot AI commented on Jul 20, 2025:

The sleep duration of 900 seconds is a magic number that should be made configurable through values.yaml to allow for environment-specific tuning.

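One way to do that, sketched with a hypothetical key name (`autoscaling.ingester.preStopFlushWaitSeconds` does not exist in this chart today), is to template the duration with a default of 900 so current behaviour is unchanged when the value is unset:

```yaml
# values.yaml: hypothetical knob for the pre-stop flush wait (sketch only)
autoscaling:
  ingester:
    preStopFlushWaitSeconds: 900   # tune per environment; keep it below terminationGracePeriodSeconds

# ingester-statefulset.yaml: inside the preStop shell script, the sleep would become
#   sleep {{ .Values.autoscaling.ingester.preStopFlushWaitSeconds | default 900 }}
```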


echo "Pre-stop hook completed"
{{- end }}
resources:
{{- toYaml .Values.resources.ingester | nindent 12 }}
envFrom:
4 changes: 2 additions & 2 deletions charts/openobserve/values.yaml
@@ -1015,14 +1015,14 @@ probes:
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
-terminationGracePeriodSeconds: 30
+terminationGracePeriodSeconds: 1200 # 20 minutes for now, since we are using pre-stop hook to flush data andit takes up to 10 minutes to flush data to s3
livenessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
-terminationGracePeriodSeconds: 30
+terminationGracePeriodSeconds: 1200 # 20 minutes for now, since we are using pre-stop hook to flush data andit takes up to 10 minutes to flush data to s3
Copilot AI commented on Jul 20, 2025:

There is a spelling error in the comment: 'andit' should be 'and it'.


querier:
enabled: false
config:
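The two grace-period values above are coupled to the pre-stop hook added in this PR: the grace period has to outlast everything the hook does, otherwise the kubelet sends SIGKILL before the flush finishes. A rough budget implied by the numbers in this diff (a reading of the PR, not an additional change):

```yaml
# Timing budget behind terminationGracePeriodSeconds: 1200 (20 min), per the diff comments:
#   disable-node and flush API calls:   a few seconds
#   fixed wait for files to reach S3:   900 s (15 min)
#   remaining safety margin:            ~300 s (5 min)
# If the wait ever becomes configurable (see the sketch further up), keep this margin intact.
```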