Skip to content

Conversation

@PeteGillinElastic
Copy link
Member

@PeteGillinElastic PeteGillinElastic commented Aug 19, 2025

This improves the documentation and test coverage for the conversions between numeric types (including numberic format strings, and the auto target type) in the convert ingest processor.

Relates #133153

@PeteGillinElastic PeteGillinElastic marked this pull request as draft August 19, 2025 20:03
@PeteGillinElastic PeteGillinElastic added >docs General docs changes >test Issues or PRs that are addressing/adding tests :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v9.2.0 and removed v9.2.0 labels Aug 19, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Aug 19, 2025

🔍 Preview links for changed docs

@PeteGillinElastic PeteGillinElastic force-pushed the convert-processor-tests-and-docs branch 2 times, most recently from 061e6a4 to dd5ab72 Compare August 20, 2025 09:10
This improves the documentation and test coverage for the conversions
between numeric types (including numberic format strings, and the
`auto` target type) in the `convert` ingest processor.

Relates elastic#133153
@PeteGillinElastic PeteGillinElastic force-pushed the convert-processor-tests-and-docs branch from dd5ab72 to e7c3bac Compare August 20, 2025 09:15
@PeteGillinElastic PeteGillinElastic marked this pull request as ready for review August 20, 2025 10:31
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Docs Meta label for docs team labels Aug 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Specifying `float` supports inputs which are `Integer` values, `Long` values (conversions from either `Integer` or
`Long` may lose precision for absolute values greater than 2^24), `Float` values, `Double` values (may lose precision),
`String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`,
`"1.23e2"`, or `"0x1.ecp6"`) or an integer (conversions from `String` may lose precision, and will give positive or
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N.B. I have been vague about which integer formats are accepted. The current situation is that it accepts decimal but not hex, but I was reluctant to document that since it feels like a bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any indication on how the Beats or Logstash processors handle this? Might indicate if it's a bug or intended behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we're going to follow up with them, and hopefully we'll be able to clarify the docs one way or another then.

@lukewhiting lukewhiting requested a review from Copilot August 20, 2025 12:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the documentation and test coverage for the convert ingest processor, specifically focusing on numeric type conversions including string representations of numbers and the auto target type.

  • Adds comprehensive test matrix covering all numeric conversion scenarios
  • Enhances documentation with detailed conversion behavior explanations
  • Removes unused variables from existing test methods

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
ConvertProcessorTests.java Adds extensive test matrix class for numeric conversions and cleans up existing test code
convert-processor.md Expands documentation with detailed conversion behavior for each numeric type

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@lukewhiting lukewhiting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid stuff and a big help to give users certainty over the behaviour. LGTM 👍🏻

Specifying `float` supports inputs which are `Integer` values, `Long` values (conversions from either `Integer` or
`Long` may lose precision for absolute values greater than 2^24), `Float` values, `Double` values (may lose precision),
`String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`,
`"1.23e2"`, or `"0x1.ecp6"`) or an integer (conversions from `String` may lose precision, and will give positive or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any indication on how the Beats or Logstash processors handle this? Might indicate if it's a bug or intended behaviour.

Copy link
Contributor

@shainaraskas shainaraskas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of drive-by comments


The supported types include: `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`.

Specifying a target `type` of `integer` supports inputs which are `Integer` values, `Long` values in 32-bit signed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider making this a table that indicates the supported inputs with notes. e.g.

Type Supported input Notes
Integer Long 32-bit signed integer range
String Must represent an integer in a 32-bit signed ...

or alternatively lists with sub-items

reason: reading this in the preview is very very very dense. see screenshot.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks... As it happens, I did initially try putting the new content in a table, but then I realized there was a disconnect between the new content (for the numeric types) and the existing content (boolean, ip, and auto types). Would you suggest moving the content for all the types into a table?

Incidentally, I found the markdown kind of horrible for tables with long paragraphs of text, although luckily my IDE took care of the worst of it. I assume that's something we live with, on the basis that the readability of the output is far more important?

Copy link
Contributor

@shainaraskas shainaraskas Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that we refactor the content for all of the types for consistency.

totally agree about the pain of formatting in markdown tables. this is a pain that we're currently living with, and I would prioritize a big increase in user readability over raw content readability because hopefully this will not need a TON of regular maintenance. if this was something we needed to tweak all the time, we might think twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've pushed a commit which moves most of this content into a table. Please let me know what you think.

I've kept the paragraph about auto out of the table. This is because that content is a little different: for the other types, we are describing the accepted inputs (and the provisos attached), whereas for the auto type all inputs are accepted and we're describing the logic around figuring out the output type. It also has the benefit of avoiding the widest of the columns as that's the longest paragraph.

Copy link
Contributor

@shainaraskas shainaraskas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple small additional recommendations for you :) table looks great.

Comment on lines 24 to 33
Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type. For example,
a field whose value is `"true"` will be converted to its respective boolean type: `true`. A string representing an
integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a
32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may
lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value). A string
representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or
`"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out
of range for a 32-bit floating point value). If a provided field is either not a `String` or a `String` which cannot be
converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field`
will be updated with the unconverted field value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breaking this out into its own section outside of the table is perfect. I was actually going to suggest it but then didn't for some reason, so you read my mind

this could be helped with a couple of bullets. could refine even more more but this is a lightweight way to make it easier to scan.

Suggested change
Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type. For example,
a field whose value is `"true"` will be converted to its respective boolean type: `true`. A string representing an
integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a
32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may
lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value). A string
representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or
`"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out
of range for a 32-bit floating point value). If a provided field is either not a `String` or a `String` which cannot be
converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field`
will be updated with the unconverted field value.
Specifying `auto` will attempt to convert a string-valued `field` into the closest non-string, non-IP type.
For example:
* A field whose value is `"true"` will be converted to its respective boolean type: `true`
* A string representing an
integer in decimal or hex format (e.g. `"123"` or `"0x7b"`) will be converted to an `Integer` if the number fits in a
32-bit signed integer, else to a `Long` if it fits in a 64-bit signed integer, else to a `Float` (in which case it may
lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value).
* A string
representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or
`"0x1.ecp6"`) will be converted to a `Float` (and may lose precision, and will give positive or negative infinity if out
of range for a 32-bit floating point value).
If a provided field is either not a `String` or a `String` which cannot be
converted, the processor will still process successfully and leave the field value as-is. In such a case, `target_field`
will be updated with the unconverted field value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I've pushed a commit with a version of this. I've reworded slightly to avoid the 'For example' since it's actually an exhaustive list.


| Target `type` | Supported input values |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `integer` | `Integer` values, `Long` values in 32-bit signed integer range, or `String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could consider splicing these up with <br><br> instead of , for a touch more readability, e.g.

Suggested change
| `integer` | `Integer` values, `Long` values in 32-bit signed integer range, or `String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) |
| `integer` | `Integer` values<br><br>`Long` values in 32-bit signed integer range<br><br>`String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) |

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that works. I would have used bullet lists here but I don't think you can do that in a table. Good old-fashioned line breaks are the next best thing.

|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `integer` | `Integer` values<br><br>`Long` values in 32-bit signed integer range<br><br>`String` values representing an integer in 32-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) |
| `long` | `Integer` values<br><br>`Long` values<br><br>`String` values representing an integer in 64-bit signed integer range in either decimal format (without a decimal point) or hex format (e.g. `"123"` or `"0x7b"`) |
| `float` | `Integer` values (may lose precision for absolute values greater than 2^24)<br><br>`Long` values (may lose precision for absolute values greater than 2^24)<br><br>`Float` values<br><br>`Double` values (may lose precision)<br><br>`String` values representing a floating point number in decimal, scientific, or hex format (e.g. `"123.0"`, `"123.45"`, `"1.23e2"`, or `"0x1.ecp6"`) or an integer (may lose precision, and will give positive or negative infinity if out of range for a 32-bit floating point value) |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shainaraskas Do you know how superscripts work in the flavour of markdown we're using? This seems to be rendered oddly, with everything between the first and second ^ signs being rendered as superscript in the docs preview.

Image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I think I found it in the syntax guide. (Took longer than it might have as search wasn't working...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, just checked the preview 2^24^ worked.

Copy link
Contributor

@shainaraskas shainaraskas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last small suggestion for the road. otherwise looks great from a style pov

@PeteGillinElastic
Copy link
Member Author

one last small suggestion for the road. otherwise looks great from a style pov

Thanks again!

@PeteGillinElastic PeteGillinElastic merged commit f973aa2 into elastic:main Aug 22, 2025
34 checks passed
@PeteGillinElastic PeteGillinElastic deleted the convert-processor-tests-and-docs branch August 22, 2025 07:55
pabloem pushed a commit to pabloem/elasticsearch that referenced this pull request Aug 22, 2025
This improves the documentation and test coverage for the conversions
between numeric types (including numberic format strings, and the
`auto` target type) in the `convert` ingest processor.

Relates elastic#133153
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >docs General docs changes Team:Data Management Meta label for data/management team Team:Docs Meta label for docs team >test Issues or PRs that are addressing/adding tests v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants