Skip to content

[filebeat][ABS] - Fix CSV decoder JSON escaping in azure-blob-storage input#50097

Merged
ShourieG merged 4 commits intoelastic:mainfrom
ShourieG:bugfic/abs_ValueDecoder
Apr 14, 2026
Merged

[filebeat][ABS] - Fix CSV decoder JSON escaping in azure-blob-storage input#50097
ShourieG merged 4 commits intoelastic:mainfrom
ShourieG:bugfic/abs_ValueDecoder

Conversation

@ShourieG
Copy link
Copy Markdown
Contributor

@ShourieG ShourieG commented Apr 13, 2026

Type of change

  • Bug

Proposed commit message

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

NOTE

We changed an existing test case because after adding decodeValue it builds a map[string]any, then calls json.Marshal. Go's json.Marshal sorts map keys alphabetically so the expected test results needed to align accordingly.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

None

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

…e input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Add a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly
@ShourieG ShourieG self-assigned this Apr 13, 2026
@ShourieG ShourieG requested a review from a team as a code owner April 13, 2026 16:08
@ShourieG ShourieG added Filebeat Filebeat bugfix Team:Security-Service Integrations Security Service Integrations Team input:azure-blob-storage Team:Security-Cloud Services Label for the Security Data Experience - Cloud Services team. labels Apr 13, 2026
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 13, 2026
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@ShourieG ShourieG added the backport-active-all Automated backport with mergify to all the active branches label Apr 13, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 698ec0e5-78cb-4161-bd77-af97a10ffbc5

📥 Commits

Reviewing files that changed from the base of the PR and between 15d1529 and 44a6169.

📒 Files selected for processing (1)
  • x-pack/filebeat/input/azureblobstorage/job.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • x-pack/filebeat/input/azureblobstorage/job.go

📝 Walkthrough

Walkthrough

Fixes a CSV-decoding bug in the Azure Blob Storage input that produced malformed JSON when CSV fields contained double quotes. The decoder now escapes values via json.Marshal. job.decode gains a decoder.ValueDecoder branch that iterates with Next()/DecodeValue(), creates events with createEvent(string(msg), offset) and sets the last flag from !dec.More(). Adds a csv_quoted_values test and txn_quoted.json fixture; updates mock CSV JSON record field ordering.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • 🛠️ Update Documentation: Commit on current branch
  • 🛠️ Update Documentation: Create PR

Comment @coderabbitai help to get the list of available commands and usage tips.

@ShourieG
Copy link
Copy Markdown
Contributor Author

/test

@ShourieG ShourieG merged commit 34634a3 into elastic:main Apr 14, 2026
31 of 34 checks passed
@ShourieG ShourieG deleted the bugfic/abs_ValueDecoder branch April 14, 2026 07:19
@github-actions
Copy link
Copy Markdown
Contributor

@Mergifyio backport 8.19 9.3 9.4

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Apr 14, 2026

backport 8.19 9.3 9.4

✅ Backports have been created

Details

Cherry-pick of 34634a3 has failed:

On branch mergify/bp/8.19/pr-50097
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit 34634a3e8.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   changelog/fragments/1776095974-fix-csv-decoder-json-escaping-abs-input.yaml
	modified:   x-pack/filebeat/input/azureblobstorage/decoding_test.go
	modified:   x-pack/filebeat/input/azureblobstorage/mock/data_files.go
	new file:   x-pack/filebeat/input/azureblobstorage/testdata/txn_quoted.csv
	new file:   x-pack/filebeat/input/azureblobstorage/testdata/txn_quoted.json

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   x-pack/filebeat/input/azureblobstorage/job.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify bot pushed a commit that referenced this pull request Apr 14, 2026
… input (#50097)

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

(cherry picked from commit 34634a3)

# Conflicts:
#	x-pack/filebeat/input/azureblobstorage/job.go
mergify bot pushed a commit that referenced this pull request Apr 14, 2026
… input (#50097)

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

(cherry picked from commit 34634a3)
mergify bot pushed a commit that referenced this pull request Apr 14, 2026
… input (#50097)

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

(cherry picked from commit 34634a3)
ShourieG added a commit that referenced this pull request Apr 14, 2026
… input (#50097) (#50110)

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

(cherry picked from commit 34634a3)

Co-authored-by: Shourie Ganguly <shourie.ganguly@elastic.co>
ShourieG added a commit that referenced this pull request Apr 14, 2026
… input (#50097) (#50109)

x-pack/filebeat/input/azureblobstorage: Fix CSV decoder JSON escaping in azure-blob-storage input

The azure-blob-storage input's decode path only matched the
decoder.Decoder interface, calling Decode() which builds JSON via
string concatenation without escaping field values. CSV values
containing double quotes (e.g. RFC 2045 MIME type parameters)
produce malformed JSON, causing downstream ingest pipeline failures.

Added a decoder.ValueDecoder switch case ahead of decoder.Decoder,
matching the pattern already used by the GCS input. ValueDecoder's
DecodeValue() uses json.Marshal which handles escaping correctly.

(cherry picked from commit 34634a3)

Co-authored-by: Shourie Ganguly <shourie.ganguly@elastic.co>
ShourieG added a commit that referenced this pull request Apr 14, 2026
…ng in azure-blob-storage input (#50108)

* [filebeat][ABS] - Fix CSV decoder JSON escaping in azure-blob-storage input (#50097)

Fix CSV decoder type references and broken decodeValue in azure-blob-storage input

The azure-blob-storage input's CSV decode path builds JSON via string
concatenation without escaping field values. CSV values containing double
quotes (e.g. RFC 2045 MIME type parameters) produce malformed JSON, causing
downstream ingest pipeline failures. Add a valueDecoder switch case ahead of
the plain decoder case so the CSV decoder's decodeValue() method is used,
which serializes via json.Marshal for correct escaping. Also fix decodeValue()
itself, which previously cleared its state then called decode(), causing a
'decode called before next' error.
---------

Co-authored-by: Shourie Ganguly <shourie.ganguly@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches bugfix Filebeat Filebeat input:azure-blob-storage Team:Security-Cloud Services Label for the Security Data Experience - Cloud Services team. Team:Security-Service Integrations Security Service Integrations Team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants