Skip to content

Conversation

prabhatsharma
Copy link
Contributor

add preStop hook for autoscaling down

@prabhatsharma prabhatsharma requested a review from Copilot July 20, 2025 21:57
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a preStop hook for ingester pods to support graceful autoscaling down by allowing data to be properly flushed before pod termination. The key changes include adding a preStop lifecycle hook that disables the node, flushes data from memory to WAL, and waits for S3 synchronization, along with extending termination grace periods to accommodate the longer shutdown process.

  • Adds preStop lifecycle hook to ingester pods for graceful shutdown during autoscaling
  • Extends terminationGracePeriodSeconds from 30 seconds to 20 minutes for data flushing
  • Minor documentation improvements to chart README

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
charts/openobserve/values.yaml Increases termination grace periods to allow time for data flushing
charts/openobserve/templates/ingester-statefulset.yaml Implements preStop hook with node disable, data flush, and wait logic
charts/openobserve/README.md Minor formatting improvements to headings

Comment on lines 1018 to 1025
terminationGracePeriodSeconds: 1200 # 20 minutes for now, since we are using pre-stop hook to flush data andit takes up to 10 minutes to flush data to s3
livenessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationGracePeriodSeconds: 30
terminationGracePeriodSeconds: 1200 # 20 minutes for now, since we are using pre-stop hook to flush data andit takes up to 10 minutes to flush data to s3
Copy link
Preview

Copilot AI Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a spelling error in the comment: 'andit' should be 'and it'.

Copilot uses AI. Check for mistakes.

# Wait for 900 seconds after flush to ensure data is moved to s3
# 15 minutes for now, since file movement to s3 may take up to 10 minutes
echo "Waiting 900 seconds to flush data..."
sleep 900
Copy link
Preview

Copilot AI Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleep duration of 900 seconds is a magic number that should be made configurable through values.yaml to allow for environment-specific tuning.

Copilot uses AI. Check for mistakes.

Comment on lines 141 to 151
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
-H "Authorization: Basic ${AUTH_HEADER}"

# returns 200 if successful and "true" if the node is disabled

# Flush all data from memory to WAL. This does not flush data from ingester to s3.
echo "Flushing data from ingester..."
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
-H "Authorization: Basic ${AUTH_HEADER}"

# returns 200 if successful and "true" if the node is flushed
Copy link
Preview

Copilot AI Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The curl commands lack error handling. If the API calls fail, the script continues without knowing if the operations succeeded, which could lead to data loss.

Suggested change
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
-H "Authorization: Basic ${AUTH_HEADER}"
# returns 200 if successful and "true" if the node is disabled
# Flush all data from memory to WAL. This does not flush data from ingester to s3.
echo "Flushing data from ingester..."
curl -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
-H "Authorization: Basic ${AUTH_HEADER}"
# returns 200 if successful and "true" if the node is flushed
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/enable?value=false" \
-H "Authorization: Basic ${AUTH_HEADER}")
if [ "$RESPONSE" -ne 200 ]; then
echo "Error: Failed to disable ingester node. HTTP response code: $RESPONSE"
exit 1
fi
# Flush all data from memory to WAL. This does not flush data from ingester to s3.
echo "Flushing data from ingester..."
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "http://localhost:{{ .Values.config.ZO_HTTP_PORT }}/node/flush" \
-H "Authorization: Basic ${AUTH_HEADER}")
if [ "$RESPONSE" -ne 200 ]; then
echo "Error: Failed to flush data from ingester. HTTP response code: $RESPONSE"
exit 1
fi

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants