-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Improve docs and tests for convert processor
#133160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve docs and tests for convert processor
#133160
Conversation
🔍 Preview links for changed docs |
061e6a4 to
dd5ab72
Compare
This improves the documentation and test coverage for the conversions between numeric types (including numberic format strings, and the `auto` target type) in the `convert` ingest processor. Relates elastic#133153
dd5ab72 to
e7c3bac
Compare
|
Pinging @elastic/es-docs (Team:Docs) |
|
Pinging @elastic/es-data-management (Team:Data Management) |
| Specifying `float` supports inputs which are `Integer` values, `Long` values (conversions from either `Integer` or | ||
| `Long` may lose precision for absolute values greater than 2^24), `Float` values, `Double` values (may lose precision), | ||
| `String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, | ||
| `"1.23e2"`, or `"0x1.ecp6"`) or an integer (conversions from `String` may lose precision, and will give positive or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N.B. I have been vague about which integer formats are accepted. The current situation is that it accepts decimal but not hex, but I was reluctant to document that since it feels like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any indication on how the Beats or Logstash processors handle this? Might indicate if it's a bug or intended behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we're going to follow up with them, and hopefully we'll be able to clarify the docs one way or another then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the documentation and test coverage for the convert ingest processor, specifically focusing on numeric type conversions including string representations of numbers and the auto target type.
- Adds comprehensive test matrix covering all numeric conversion scenarios
- Enhances documentation with detailed conversion behavior explanations
- Removes unused variables from existing test methods
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ConvertProcessorTests.java | Adds extensive test matrix class for numeric conversions and cleans up existing test code |
| convert-processor.md | Expands documentation with detailed conversion behavior for each numeric type |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
lukewhiting
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid stuff and a big help to give users certainty over the behaviour. LGTM 👍🏻
| Specifying `float` supports inputs which are `Integer` values, `Long` values (conversions from either `Integer` or | ||
| `Long` may lose precision for absolute values greater than 2^24), `Float` values, `Double` values (may lose precision), | ||
| `String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, | ||
| `"1.23e2"`, or `"0x1.ecp6"`) or an integer (conversions from `String` may lose precision, and will give positive or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any indication on how the Beats or Logstash processors handle this? Might indicate if it's a bug or intended behaviour.
Typo fix from copilot Co-authored-by: Copilot <[email protected]>
shainaraskas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of drive-by comments
|
|
||
| The supported types include: `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. | ||
|
|
||
| Specifying a target `type` of `integer` supports inputs which are `Integer` values, `Long` values in 32-bit signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider making this a table that indicates the supported inputs with notes. e.g.
| Type | Supported input | Notes |
|---|---|---|
| Integer | Long | 32-bit signed integer range |
| String | Must represent an integer in a 32-bit signed ... |
or alternatively lists with sub-items
reason: reading this in the preview is very very very dense. see screenshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks... As it happens, I did initially try putting the new content in a table, but then I realized there was a disconnect between the new content (for the numeric types) and the existing content (boolean, ip, and auto types). Would you suggest moving the content for all the types into a table?
Incidentally, I found the markdown kind of horrible for tables with long paragraphs of text, although luckily my IDE took care of the worst of it. I assume that's something we live with, on the basis that the readability of the output is far more important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest that we refactor the content for all of the types for consistency.
totally agree about the pain of formatting in markdown tables. this is a pain that we're currently living with, and I would prioritize a big increase in user readability over raw content readability because hopefully this will not need a TON of regular maintenance. if this was something we needed to tweak all the time, we might think twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've pushed a commit which moves most of this content into a table. Please let me know what you think.
I've kept the paragraph about auto out of the table. This is because that content is a little different: for the other types, we are describing the accepted inputs (and the provisos attached), whereas for the auto type all inputs are accepted and we're describing the logic around figuring out the output type. It also has the benefit of avoiding the widest of the columns as that's the longest paragraph.
Use a `:::{tip}` as suggested by @shainaraskas.
Co-authored-by: shainaraskas <[email protected]>
shainaraskas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple small additional recommendations for you :) table looks great.
| Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type. For example, | ||
| a field whose value is `"true"` will be converted to its respective boolean type: `true`. A string representing an | ||
| integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a | ||
| 32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may | ||
| lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value). A string | ||
| representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or | ||
| `"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out | ||
| of range for a 32-bit floating point value). If a provided field is either not a `String` or a `String` which cannot be | ||
| converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field` | ||
| will be updated with the unconverted field value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
breaking this out into its own section outside of the table is perfect. I was actually going to suggest it but then didn't for some reason, so you read my mind
this could be helped with a couple of bullets. could refine even more more but this is a lightweight way to make it easier to scan.
| Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type. For example, | |
| a field whose value is `"true"` will be converted to its respective boolean type: `true`. A string representing an | |
| integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a | |
| 32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may | |
| lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value). A string | |
| representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or | |
| `"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out | |
| of range for a 32-bit floating point value). If a provided field is either not a `String` or a `String` which cannot be | |
| converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field` | |
| will be updated with the unconverted field value. | |
| Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type. | |
| For example: | |
| * A field whose value is `"true"` will be converted to its respective boolean type: `true` | |
| * A string representing an | |
| integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a | |
| 32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may | |
| lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value). | |
| * A string | |
| representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or | |
| `"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out | |
| of range for a 32-bit floating point value). | |
| If a provided field is either not a `String` or a `String` which cannot be | |
| converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field` | |
| will be updated with the unconverted field value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I've pushed a commit with a version of this. I've reworded slightly to avoid the 'For example' since it's actually an exhaustive list.
|
|
||
| | Target `type` | Supported input values | | ||
| |---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | `integer` | `Integer` values, `Long` values in 32-bit signed integer range, or `String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could consider splicing these up with <br><br> instead of , for a touch more readability, e.g.
| | `integer` | `Integer` values, `Long` values in 32-bit signed integer range, or `String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) | | |
| | `integer` | `Integer` values<br><br>`Long` values in 32-bit signed integer range<br><br>`String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, that works. I would have used bullet lists here but I don't think you can do that in a table. Good old-fashioned line breaks are the next best thing.
| |---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | `integer` | `Integer` values<br><br>`Long` values in 32-bit signed integer range<br><br>`String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) | | ||
| | `long` | `Integer` values<br><br>`Long` values<br><br>`String` values representing an integer in 64-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) | | ||
| | `float` | `Integer` values (may lose precision for absolute values greater than 2^24)<br><br>`Long` values (may lose precision for absolute values greater than 2^24)<br><br>`Float` values<br><br>`Double` values (may lose precision)<br><br>`String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or `"0x1.ecp6"`) or an integer (may lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shainaraskas Do you know how superscripts work in the flavour of markdown we're using? This seems to be rendered oddly, with everything between the first and second ^ signs being rendered as superscript in the docs preview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I think I found it in the syntax guide. (Took longer than it might have as search wasn't working...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, just checked the preview 2^24^ worked.
shainaraskas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last small suggestion for the road. otherwise looks great from a style pov
Co-authored-by: shainaraskas <[email protected]>
Co-authored-by: shainaraskas <[email protected]>
Thanks again! |
This improves the documentation and test coverage for the conversions between numeric types (including numberic format strings, and the `auto` target type) in the `convert` ingest processor. Relates elastic#133153
This improves the documentation and test coverage for the conversions between numeric types (including numberic format strings, and the
autotarget type) in theconvertingest processor.Relates #133153