
Fix: Implement backup_to_bucket_ and delete_after_backup for aws-s3 polling mode #49734

Open
MichaelKatsoulis wants to merge 1 commit into elastic:main from MichaelKatsoulis:fix/s3-poller-backup-delete

Conversation

@MichaelKatsoulis
Contributor

When the Filebeat aws-s3 input runs in polling mode (e.g. with non_aws_bucket_name for MinIO or another S3-compatible store), it can fully process an object (events published and registry state stored) yet never perform the backup_to_bucket_* and delete_after_backup actions.

User-visible symptom

  • The object is parsed and events are published.
  • The registry shows the object state as stored: true.
  • S3 traces show no CopyObject / DeleteObject calls for the processed object.

Root cause (bug)

Object finalization (backup/delete) is implemented in (*s3ObjectProcessor).FinalizeS3Object() in s3_objects.go.

  • In the SQS execution path (sqs_s3_event.go), FinalizeS3Object() is collected and executed after successful processing / ACK.
  • In the poller execution path (s3_input.go), the ACK callback used to only checkpoint state (in.registry.AddState(state)) and increment metrics. It did not call FinalizeS3Object().

As a result, polling mode could mark objects as completed in the registry without ever copying/deleting them.

Fix

In s3_input.go (poller worker loop), call FinalizeS3Object() inside the ACK callback after all events for the object are ACKed, and only when processing succeeded (state.Stored == true).
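
The ordering described above can be sketched in a few lines of Go. This is a simplified illustration, not the code from the PR: `objectState`, `registry`, `newACKCallback`, and the `finalize` function value are hypothetical stand-ins for the input's real state type, registry, ACK closure, and `FinalizeS3Object()`. It only shows that finalization runs inside the ACK callback, is gated on `Stored`, and is followed by the state checkpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// objectState is a hypothetical stand-in for the input's per-object state.
type objectState struct {
	Key    string
	Stored bool // true when the object was processed successfully
}

// registry is a hypothetical stand-in for the input's state registry.
type registry struct{ states []objectState }

func (r *registry) AddState(s objectState) { r.states = append(r.states, s) }

// newACKCallback returns the function to run once every event of an object
// has been ACKed. finalize stands in for FinalizeS3Object (backup, then
// optional delete). Finalization errors are logged and the state is still
// checkpointed afterwards.
func newACKCallback(reg *registry, state objectState, finalize func() error, logf func(string, ...any)) func() {
	return func() {
		if state.Stored {
			if err := finalize(); err != nil {
				logf("finalizing %q failed: %v", state.Key, err)
			}
		}
		reg.AddState(state)
	}
}

func main() {
	reg := &registry{}
	finalized := 0
	ok := func() error { finalized++; return nil }
	failing := func() error { return errors.New("copy failed") }
	logf := func(format string, args ...any) { fmt.Printf(format+"\n", args...) }

	newACKCallback(reg, objectState{Key: "a.log", Stored: true}, ok, logf)()      // finalized
	newACKCallback(reg, objectState{Key: "b.log", Stored: false}, ok, logf)()     // skipped
	newACKCallback(reg, objectState{Key: "c.log", Stored: true}, failing, logf)() // error logged

	fmt.Println("finalized:", finalized, "checkpointed:", len(reg.states))
}
```

In this sketch an object whose processing failed (`Stored == false`) is checkpointed but never copied or deleted, which mirrors the "only when processing succeeded" condition in the description.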

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

None. It fixes the problem.

How to test this PR locally

  • Configure the Filebeat input:
filebeat.inputs:
  - type: aws-s3
    # Polling mode (AWS S3)
    bucket_arn: "arn:aws:s3:::my-source-bucket"
    region: "us-east-1"
    number_of_workers: 5
    bucket_list_interval: 60s
    bucket_list_prefix: ""
    access_key_id: "*******"
    secret_access_key: "*******"
    ignore_older: 168h
    # Backup + delete after ACK
    backup_to_bucket_arn: "arn:aws:s3:::my-backup-bucket"
    backup_to_bucket_prefix: "processed/"
    delete_after_backup: true
  • Run Filebeat, let it process an object from the source bucket, then confirm that the source object is removed and a copy exists under the configured backup prefix.

Related issues

Use cases

Screenshots

Logs

@MichaelKatsoulis requested a review from a team as a code owner March 27, 2026 13:29
@MichaelKatsoulis added the backport-active-9 label (Automated backport with mergify to all the active 9.[0-9]+ branches) Mar 27, 2026
@botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Mar 27, 2026
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@MichaelKatsoulis added the Team:obs-ds-hosted-services label (Label for the Observability Hosted Services team) Mar 27, 2026
@botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Mar 27, 2026
@elasticmachine
Contributor

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@coderabbitai

coderabbitai bot commented Mar 27, 2026

📝 Walkthrough


The AWS S3 input's worker loop now integrates S3 object finalization into the existing ACK callback mechanism. Previously, the ACK callback only checkpointed state and updated metrics. With these changes, when events from an S3 object are successfully processed and acknowledged, the finalization step (backup to destination bucket and conditional deletion) executes after all events are ACKed. Finalization errors are logged and reported as degraded status, but state checkpointing proceeds regardless of finalization outcome. Tests validate the finalization flow with mocked S3 operations.
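
To illustrate what "backup to destination bucket and conditional deletion" with mocked S3 operations can look like, here is a minimal sketch. All names in it (`objectStore`, `recordingStore`, `backupConfig`, `finalizeObject`) are hypothetical and chosen for the example; they are not the interfaces or mocks used in s3_test.go. The fake client records each call so a test can check that the copy happens first and the delete only runs when it is enabled.

```go
package main

import "fmt"

// objectStore is a hypothetical, minimal interface covering the two S3 calls
// that finalization needs. It is not the input's real client interface.
type objectStore interface {
	CopyObject(srcBucket, dstBucket, srcKey, dstKey string) error
	DeleteObject(bucket, key string) error
}

// recordingStore is a fake objectStore that records every call it receives.
type recordingStore struct{ calls []string }

func (s *recordingStore) CopyObject(srcBucket, dstBucket, srcKey, dstKey string) error {
	s.calls = append(s.calls, fmt.Sprintf("copy %s/%s -> %s/%s", srcBucket, srcKey, dstBucket, dstKey))
	return nil
}

func (s *recordingStore) DeleteObject(bucket, key string) error {
	s.calls = append(s.calls, fmt.Sprintf("delete %s/%s", bucket, key))
	return nil
}

// backupConfig is a hypothetical mirror of the backup_to_bucket_* and
// delete_after_backup settings.
type backupConfig struct {
	Bucket            string
	Prefix            string
	DeleteAfterBackup bool
}

// finalizeObject copies the object to the backup bucket and, only if that
// succeeded and deletion is enabled, removes it from the source bucket.
func finalizeObject(store objectStore, cfg backupConfig, srcBucket, key string) error {
	if cfg.Bucket == "" {
		return nil // no backup configured, nothing to finalize
	}
	if err := store.CopyObject(srcBucket, cfg.Bucket, key, cfg.Prefix+key); err != nil {
		return fmt.Errorf("backing up %q: %w", key, err)
	}
	if !cfg.DeleteAfterBackup {
		return nil
	}
	if err := store.DeleteObject(srcBucket, key); err != nil {
		return fmt.Errorf("deleting %q: %w", key, err)
	}
	return nil
}

func main() {
	store := &recordingStore{}
	cfg := backupConfig{Bucket: "my-backup-bucket", Prefix: "processed/", DeleteAfterBackup: true}
	if err := finalizeObject(store, cfg, "my-source-bucket", "app.log"); err != nil {
		fmt.Println("finalize failed:", err)
	}
	for _, c := range store.calls {
		fmt.Println(c)
	}
}
```

Recording calls as strings keeps the assertion simple: a test compares the recorded slice against the expected copy-then-delete sequence.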

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | The PR addresses the core requirement from #46672: finalization (CopyObject and DeleteObject) now executes in the polling path's ACK callback after successful processing, matching SQS behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the finalization bug: production code calls FinalizeS3Object in the ACK callback; tests verify backup/delete execution and prevent mock panics. |



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
x-pack/filebeat/input/awss3/s3_input.go (1)

238-256: ⚠️ Potential issue | 🔴 Critical

Unconditional state checkpoint after finalize failure causes permanent backup/delete skip on shutdown.

The ACK callback at lines 241–259 runs asynchronously (acks.go:97) after the context is canceled during shutdown. When finalize() fails with "context canceled" at line 245, the error is logged but AddState() at line 253 still executes unconditionally, marking the object as processed. This prevents re-processing on restart, so the backup/delete is permanently skipped. Only call AddState() if finalization succeeds or is skipped (when state.Stored is false).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@x-pack/filebeat/input/awss3/s3_input.go` around lines 238 - 256, The ACK
callback registered with acks.Add currently calls in.registry.AddState
unconditionally even when finalize() failed (e.g., due to context canceled),
causing objects to be marked done and skipped on restart; update the callback in
the acks.Add closure to only call in.registry.AddState(state) when finalization
either was not required (state.Stored == false) or when finalize() returned nil
(success). Locate the closure referencing finalize, state.Stored and
in.registry.AddState and wrap the AddState call in a conditional that skips
checkpointing when finalize() returned an error (but keep the existing error
logs and status.UpdateStatus calls).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4073ab27-93cb-4fe2-8798-4a2e2633f4ea

📥 Commits

Reviewing files that changed from the base of the PR and between 7d7d034 and 0aa5693.

📒 Files selected for processing (2)
  • x-pack/filebeat/input/awss3/s3_input.go
  • x-pack/filebeat/input/awss3/s3_test.go


Labels

  • backport-active-9: Automated backport with mergify to all the active 9.[0-9]+ branches
  • bugfix
  • Team:obs-ds-hosted-services: Label for the Observability Hosted Services team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filebeat/input aws-s3: processed objects not backed up and not deleted

2 participants