Implement document length tracking for DataInput-backed JSON parsers via subclass #1575
…fast from JsonFactory Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>
…c-length tracking Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>
…via subclass Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>
|
I would want to see performance testing (esp. for non-validating case) since this seems it could wreck performance (overrides and indirection) -- f.ex with https://github.com/FasterXML/jackson-benchmarks/ . |
|
@cowtowncoder I'll put together a benchmark but would expect it to have quite an impact. The parser itself is already not very efficient because it reads byte by byte, and I would recommend that users try to get an InputStream instead of a DataInput instance to parse if at all possible.
|
I would not assume that without measurements, due to inlining effects. As to failing, configurability -- while possible, I don't think it's necessary at this point. |
|
@cowtowncoder I created a benchmark suite and compared the v2.21.1 parser with one that has the overridable shared readUnsignedByte method and a 3rd parser that applies the size limit. The results for all 3 parsers are almost the same. So close that I nearly suspect that I have a mistake in the test suite. I tested with different JSON sizes and Java versions and in no run was there any significant diff in perf between the 3 parsers. I'll double check my test suite logic over the next few days. |
|
@cowtowncoder this change seems to work ok. I see a small overhead when the max doc len is enabled but it is only a few percent. One example run (Limited is the benchmark with the check enabled) |
|
@pjfanning Sounds good so far. Could you also run test against unchanged build from 3.x (or official 3.1.0 which should be about the same)? |
|
@cowtowncoder You want to create a new benchmark that uses Jackson 3 instead of Jackson 2? |
I thought this PR is for 3.x? I wasn't thinking we would implement this functionality for 2.x, only 3.x. But if we are talking about changes to 2.x, then a comparison of the unmodified version (2.21.1 would be fine) vs the new versions. |
|
The jackson3 based benchmarks behave similarly to the jackson2 ones. |
|
@pjfanning Does "benchDataInput" refer to the pre-changes version from 3.x, and "benchDataInputLimited" / "benchDataInputNew" to the modified version with and without limitations? If so, differences do indeed look insignificant.
@cowtowncoder your interpretation of the names is correct |
|
@pjfanning I confirmed your findings wrt baseline using https://github.com/FasterXML/jackson-benchmarks/ and so I'm confident there is no performance degradation on JDK 17. The only odd part that I now recall (unrelated to this change) is that JDK 8 runs the DataInput test almost 3 (!) times faster (throughput of 320k vs 110k). That does not really matter here, I just thought it odd. I'll now review this PR and hopefully get it merged. |
|
@pjfanning Thank you again! It's good that performance is unchanged when no constraints are used, and likely even use with constraints is not drastically worse.
One possible way to support max document len on DataInput use case.
The count and validation overhead only kicks in if you set a non-negative max document len - which is not the default case.
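A minimal sketch of the subclass idea, using only JDK classes. The names `ByteReader`, `LimitedByteReader`, `create` and `maxDocLen` are hypothetical and are not the actual jackson-core types; the sketch only assumes, as discussed above, that all byte reads go through one overridable `readUnsignedByte` method and that a negative limit means "no limit".

```java
import java.io.DataInput;
import java.io.IOException;

class DataInputLimitSketch {

    /** Hypothetical base reader: every byte read funnels through one overridable method. */
    static class ByteReader {
        protected final DataInput in;

        ByteReader(DataInput in) {
            this.in = in;
        }

        /** Unlimited case: plain delegation, no counting. */
        protected int readUnsignedByte() throws IOException {
            return in.readUnsignedByte();
        }
    }

    /** Hypothetical subclass that counts bytes and fails once the limit is exceeded. */
    static class LimitedByteReader extends ByteReader {
        private final long maxDocLen;
        private long bytesRead;

        LimitedByteReader(DataInput in, long maxDocLen) {
            super(in);
            this.maxDocLen = maxDocLen;
        }

        @Override
        protected int readUnsignedByte() throws IOException {
            int b = in.readUnsignedByte();
            // Count the byte just read and validate against the configured maximum.
            if (++bytesRead > maxDocLen) {
                throw new IOException("Document length exceeds maximum allowed (" + maxDocLen + ")");
            }
            return b;
        }

        long bytesRead() {
            return bytesRead;
        }
    }

    /** Only pay for counting when a non-negative limit is configured. */
    static ByteReader create(DataInput in, long maxDocLen) {
        return (maxDocLen < 0) ? new ByteReader(in) : new LimitedByteReader(in, maxDocLen);
    }
}
```

With this shape the default (negative limit) path never touches the counter, which is consistent with the benchmark observation that the unlimited case shows no measurable difference.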