-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Fix allow_duplicates edge case bug in append processor #134319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix allow_duplicates edge case bug in append processor #134319
Conversation
allow_duplicates: false should work the same whether there is an existing list (with some values), an empty list, or no list at all.
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @joegallo, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I would agree that this is a bug.
I also agree this is a bug, but can you describe the issue in the PR body so that it's searchable? |
💔 Backport failed
You can use sqren/backport to manually backport by running |
All the backport PRs are up, so I'm dropping |
#134377 is an automatically generated backport PR that was merged without CI actually having run. It turns out that there are changes in my newly added tests there that aren't compatible with the 9.1 branch. That would have shaken out if CI had actually run, but it didn't, so I'm fixing it in post. This is tangentially related to #134319.
This PR fixes a discrepancy in the way the
append
processor handles the"allow_duplicates": false
option (note: reminder that the default for theappend
processor is that"allow_duplicates"
istrue
).Specifically, the handling of the 'data to be appended' varies according to value of the 'data to be appended to'. To wit, if there are duplicates in the data to be appended (on its own, regardless of the data to be appended to) then:
null
-ish), duplicates in the data to be appended will NOT be removed (<-- this is the edge case / bug).The code is pretty understandably written as if the case we're handling is for example when the data to be appended to is
["foo", "bar"]
and the data to be appended is["bar", "baz"]
(or["bar", "baz", "bar"]
), but it mistakenly shortcuts in the case of a non-existent/null
data to be appended to and just copies the entirety of the data to be appended which means that it correctly handles["bar", "baz"]
but not["bar", "baz", "bar"]
(and so duplicate items sneak through even thoughallow_duplicates
is set tofalse
).Anyway, this PR unifies those cases so that duplicates are processed the same way regardless of the presence/absence of the data to be appended to list.
I discovered this odd behavior in the
append
processor'sallow_duplicates
handling while reviewing #105718, and it seemed important to handle it separately from that PR: first, to call it out as a bug, and second, (since it's a bug) to put up a PR that fixes it and can be backported to all the currently-maintained branches.Personally, I think this PR is best reviewed one commit at a time (the better to see the expected behavior, and then the fix) but you can do it how you'd like.
Oh, and one more thing... I'm of the opinion that this is 'just' a bug (and one that warrants fixing), but in full https://www.hyrumslaw.com fashion, you could argue with me that actually this is a breaking change.