Improve performance of JSONDecoder and JSONEncoder for large apps #1481
Conversation
Benchmarking results, collected using this command, comparing results between 'old_coders' and 'Current_run':

JSONBenchmarks: for Canada-decodeFromJSON, Canada-encodeToJSON, Twitter-decodeFromJSON, and Twitter-encodeToJSON, both Time (total CPU) and Throughput (# / s) results were within specified thresholds.
Updated benchmarks after the last commit: JSONBenchmarks again reports both Time (total CPU) and Throughput (# / s) within specified thresholds for Canada-decodeFromJSON, Canada-encodeToJSON, Twitter-decodeFromJSON, and Twitter-encodeToJSON.
@jmschonfeld can you please run CI tests? I don't have permission for that. Thanks in advance!
@swift-ci please test
@jmschonfeld all CI checks were successful. Do you find the solution satisfactory? If so, could you please advise on the next steps for getting the PR merged?
@jmschonfeld I've removed the unnecessary diff and kept the rest identical. Can we run CI checks again? Does this approach meet your expectations? If so, I'd appreciate your guidance on how we can proceed with merging the PR.
@swift-ci please test
Change seems ok to me on a read through, but I'll let @kperryua do a more in-depth review since he's had more experience investigating performance in this area
In theory this looks fine, but I'm concerned about the not-spectacular benchmark results. You mentioned your benchmark performs around 35% better. Do you have any explanation for the discrepancy?
```diff
         return .number(decimal.description)
-    } else if let encodable = value as? _JSONStringDictionaryEncodableMarker {
+    } else if !options.keyEncodingStrategy.isDefault, let encodable = value as? _JSONStringDictionaryEncodableMarker {
         return try self.wrap(encodable as! [String:Encodable], for: additionalKey)
```
Are we able to use `_specializingCast` here as well?
Unfortunately, no, we can't use it here. `_specializingCast` internally performs a type-equality check, whereas here we use `as?` to check protocol conformance, so `_specializingCast` is inapplicable:
```swift
@inline(__always)
internal func _specializingCast<Input, Output>(_ value: Input, to type: Output.Type) -> Output? {
    guard Input.self == Output.self else { return nil } // this check will always fail when we check for protocol conformance
    return _identityCast(value, to: type)
}
```
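To illustrate, here is a self-contained sketch, with the internals reimplemented as a hypothetical `sketchCast` (since `_specializingCast` and `_identityCast` are internal to Foundation): the type-equality guard returns `nil` for any protocol target, even when the value conforms.

```swift
@inline(__always)
func sketchCast<Input, Output>(_ value: Input, to type: Output.Type) -> Output? {
    // Same guard as _specializingCast: succeeds only on an exact static type match.
    guard Input.self == Output.self else { return nil }
    return (value as! Output) // stand-in for _identityCast; trivially safe when the types match
}

protocol Marker {}
struct Payload: Marker {}

let payload = Payload()
print(sketchCast(payload, to: Payload.self) != nil) // true: Payload.self == Payload.self
print(sketchCast(payload, to: Marker.self) != nil)  // false: Payload.self != Marker.self,
                                                    // even though Payload conforms to Marker
```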
@kperryua The reason we can't observe the boost I mentioned in my thread on forums.swift.org when running the benchmarks is quite simple: the benchmarks run the same decoding/encoding 1,000,000 times without relaunching. The catch is that the Swift runtime caches the result of a protocol conformance check after the first lookup, so only the first iteration pays the cost. That's why I created my own benchmark to illustrate how massive this overhead can be. Here you can see the benchmark results; I'll paste them here as well. For both JSONDecoder and JSONEncoder I measured performance in 4 variations.

So you can see how devastating the overhead from these first-time conformance checks is. About the 35% boost: I was comparing the latest version of this PR with an earlier revision that used a different implementation. Also, I've merged the same optimisation into ZippyJSON and ReerJSON.
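For reference, a minimal sketch of the caching effect described above (the marker protocol and timing harness are illustrative, not from the PR):

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for _JSONStringDictionaryEncodableMarker.
protocol DictionaryMarker {}
extension Dictionary: DictionaryMarker where Key == String, Value: Encodable {}

func timeCast(_ label: String, _ value: Any) {
    let start = DispatchTime.now().uptimeNanoseconds
    _ = value as? DictionaryMarker // protocol conformance check
    print("\(label): \(DispatchTime.now().uptimeNanoseconds - start) ns")
}

let dict: [String: Int] = ["a": 1]
timeCast("first cast (cold)", dict)    // scans conformance records across loaded images; slow in large binaries
timeCast("second cast (cached)", dict) // served from the runtime's conformance cache
```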
@ChrisBenua I think this is another interesting tradeoff point, similar to the direct-array optimization (though to a lesser degree). The question becomes whether the benefit to the very first encode/decode outweighs the extra strategy check on every subsequent one. It seems clear that in the average long-running case, the benchmarks suggest that the difference is negligible.

In your opinion, does this change achieve the optimal balance between these two use cases?
@kperryua Thanks for sharing your thoughts on this subject! You're absolutely right: there is a minimal regression while running benchmarks, and, for sure, it can affect some users of JSONDecoder and JSONEncoder. But if we optimise the first decoding/encoding, we can achieve better performance in mobile app startup scenarios! For example, my team and I measured the performance of the optimized JSONDecoder in our app.

Measurements: we have 80k measurements from different devices, ~40k with the optimized JSONDecoder and ~40k with the stock one, plus similar comparisons for JSONEncoder and for custom strategies.

Note that we have a very large app, with over 150k protocol conformance descriptors, so for small apps there will be a slightly smaller boost. Most application loading scenarios follow a similar pattern: the app launches, requests data from the network, and decodes the JSON responses before rendering the first screen; a sketch of timing that first decode follows below.
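To make this concrete, here is a minimal sketch of timing the first decode after launch (the `Feed` model, JSON payload, and timing harness are illustrative assumptions, not from the PR):

```swift
import Foundation
import Dispatch

// Illustrative model; in a real app this would be the first network response's payload.
struct Feed: Decodable { let items: [String] }

let data = #"{"items":["a","b"]}"#.data(using: .utf8)!

// The first decode after launch pays the one-time protocol-conformance lookups,
// whose cost grows with the number of conformance descriptors in the binary.
let start = DispatchTime.now().uptimeNanoseconds
_ = try? JSONDecoder().decode(Feed.self, from: data)
print("first decode: \(DispatchTime.now().uptimeNanoseconds - start) ns")

// Subsequent decodes reuse the runtime's cached conformance results.
let start2 = DispatchTime.now().uptimeNanoseconds
_ = try? JSONDecoder().decode(Feed.self, from: data)
print("second decode: \(DispatchTime.now().uptimeNanoseconds - start2) ns")
```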
By implementing the proposed changes, we can significantly reduce the time required to parse results during the initial load. In my view, this optimization has the potential to enhance loading performance across a wide range of applications, offering substantial benefits with minimal drawbacks. I presented this solution at a recent local in-person conference, and I am pleased to report that at least two companies have already adopted the optimized version of JSONDecoder. One relatively small application achieved a 25% improvement using the modified decoder. As a performance engineer, I am genuinely excited by the extent to which these relatively minor changes can yield significant optimizations across numerous applications. I would greatly appreciate your support and alignment with this perspective.
Ok. Thank you for the useful discussion. Your comments about application loading time are especially compelling.
In short:
`T.self is _JSONStringDictionaryDecodableMarker.Type`, `value as? _JSONStringDictionaryEncodableMarker`, and `value as? _JSONDirectArrayEncodable` are really slow operations, and the bigger the binary gets, the slower these checks run the first time for each pair of class/struct/enum and protocol. But checking whether the current type conforms to `_JSONStringDictionaryDecodableMarker` or `_JSONStringDictionaryEncodableMarker` is needed only when a custom keyDecodingStrategy/keyEncodingStrategy is used; comments in the code confirm this. So we can easily skip these checks when the default keyDecoding/EncodingStrategy is used.
For details, see issue #1480.
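A minimal sketch of the gating idea from the diff above, assuming a hypothetical `isDefault` helper (the actual PR may implement it differently) and a stand-in marker protocol:

```swift
import Foundation

// Assumed convenience: true only for the default key-encoding strategy.
extension JSONEncoder.KeyEncodingStrategy {
    var isDefault: Bool {
        if case .useDefaultKeys = self { return true }
        return false
    }
}

// Hypothetical stand-in for the private _JSONStringDictionaryEncodableMarker.
protocol _StringDictionaryMarkerSketch {}
extension Dictionary: _StringDictionaryMarkerSketch where Key == String, Value: Encodable {}

// The cheap enum check short-circuits before the slow first-time conformance cast,
// so apps using the default strategy never pay the lookup at all.
func usesDictionaryMarkerPath(_ value: Any, strategy: JSONEncoder.KeyEncodingStrategy) -> Bool {
    !strategy.isDefault && value is _StringDictionaryMarkerSketch
}
```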