Conversation

@ChrisBenua (Contributor)

In short: `T.self is _JSONStringDictionaryDecodableMarker.Type`, `value as? _JSONStringDictionaryEncodableMarker`, and `value as? _JSONDirectArrayEncodable` are really slow operations, and the bigger the binary gets, the slower these checks run the first time for each pair of class/struct/enum and protocol.

But checking whether the current type conforms to `_JSONStringDictionaryDecodableMarker` or `_JSONStringDictionaryEncodableMarker` is only needed when a custom keyDecodingStrategy/keyEncodingStrategy is used. The comments in the code confirm this:

```swift
/// A marker protocol used to determine whether a value is a `String`-keyed `Dictionary`
/// containing `Decodable` values (in which case it should be exempt from key conversion strategies).
///
/// The marker protocol also provides access to the type of the `Decodable` values,
/// which is needed for the implementation of the key conversion strategy exemption.
private protocol _JSONStringDictionaryDecodableMarker {
    static var elementType: Decodable.Type { get }
}

/// A marker protocol used to determine whether a value is a `String`-keyed `Dictionary`
/// containing `Encodable` values (in which case it should be exempt from key conversion strategies).
private protocol _JSONStringDictionaryEncodableMarker { }
```

So we can easily skip these checks when the default keyDecodingStrategy/keyEncodingStrategy is used.
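
To make the shape of the change concrete, here is a minimal sketch (not the exact PR diff; the marker protocol is private to the coder, so the gate inside the decoder is shown as a comment):

```swift
import Foundation

extension JSONDecoder.KeyDecodingStrategy {
    // Sketch of the `isDefault` computed property this PR adds inside the
    // coder implementation: true only for the default strategy.
    var isDefault: Bool {
        if case .useDefaultKeys = self { return true }
        return false
    }
}

// Inside the decoder, the expensive conformance cast is then gated on it:
//
//   if !options.keyDecodingStrategy.isDefault,
//      let marker = type as? _JSONStringDictionaryDecodableMarker.Type {
//       // apply the key-conversion exemption for [String: Decodable]
//   }
//
// With the default strategy no keys are converted, so the exemption check
// (and its swift_conformsToProtocol call) can be skipped entirely.
```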

For details, see issue #1480.

@ChrisBenua (Contributor, Author)

Benchmarking results

Collected using this command: `swift package --allow-writing-to-package-directory benchmark baseline compare new_coders --target JSONBenchmarks --format markdown`

Comparing results between 'old_coders' and 'Current_run'

Host 'Christians-MacBook-Pro.local' with 8 'arm64' processors with 16 GB memory, running:
Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:34 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8103

JSONBenchmarks

Canada-decodeFromJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 27 | 27 | 27 | 27 | 27 | 27 | 28 | 112 |
| Current_run | 27 | 27 | 28 | 28 | 28 | 28 | 28 | 109 |
| Δ | 0 | 0 | 1 | 1 | 1 | 1 | 0 | -3 |
| Improvement % | 0 | 0 | -4 | -4 | -4 | -4 | 0 | -3 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 38 | 38 | 37 | 37 | 37 | 37 | 36 | 112 |
| Current_run | 37 | 36 | 36 | 36 | 36 | 36 | 36 | 109 |
| Δ | -1 | -2 | -1 | -1 | -1 | -1 | 0 | -3 |
| Improvement % | -3 | -5 | -3 | -3 | -3 | -3 | 0 | -3 |

Canada-encodeToJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 54 | 54 | 54 | 54 | 55 | 55 | 55 | 56 |
| Current_run | 54 | 54 | 54 | 54 | 55 | 55 | 55 | 56 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 56 |
| Current_run | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 56 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Twitter-decodeFromJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 2150 | 2163 | 2169 | 2198 | 2224 | 2292 | 2448 | 1374 |
| Current_run | 2152 | 2165 | 2167 | 2175 | 2204 | 2247 | 2477 | 1379 |
| Δ | 2 | 2 | -2 | -23 | -20 | -45 | 29 | 5 |
| Improvement % | 0 | 0 | 0 | 1 | 1 | 2 | -1 | 5 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 465 | 463 | 461 | 456 | 450 | 435 | 389 | 1374 |
| Current_run | 465 | 463 | 462 | 460 | 454 | 446 | 404 | 1379 |
| Δ | 0 | 0 | 1 | 4 | 4 | 11 | 15 | 5 |
| Improvement % | 0 | 0 | 0 | 1 | 1 | 3 | 4 | 5 |

Twitter-encodeToJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 1078 | 1083 | 1085 | 1089 | 1094 | 1155 | 1314 | 2752 |
| Current_run | 1085 | 1091 | 1093 | 1095 | 1100 | 1135 | 1280 | 2738 |
| Δ | 7 | 8 | 8 | 6 | 6 | -20 | -34 | -14 |
| Improvement % | -1 | -1 | -1 | -1 | -1 | 2 | 3 | -14 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 928 | 924 | 922 | 920 | 915 | 864 | 629 | 2752 |
| Current_run | 922 | 918 | 917 | 914 | 910 | 882 | 782 | 2738 |
| Δ | -6 | -6 | -5 | -6 | -5 | 18 | 153 | -14 |
| Improvement % | -1 | -1 | -1 | -1 | -1 | 2 | 24 | -14 |

@ChrisBenua (Contributor, Author)

Updated benchmarks after the last commit:

Host 'Christians-MacBook-Pro.local' with 8 'arm64' processors with 16 GB memory, running:
Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:34 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8103

JSONBenchmarks

Canada-decodeFromJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 27 | 27 | 27 | 27 | 27 | 27 | 28 | 112 |
| Current_run | 26 | 26 | 26 | 26 | 27 | 27 | 27 | 114 |
| Δ | -1 | -1 | -1 | -1 | 0 | 0 | -1 | 2 |
| Improvement % | 4 | 4 | 4 | 4 | 0 | 0 | 4 | 2 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 38 | 38 | 37 | 37 | 37 | 37 | 36 | 112 |
| Current_run | 38 | 38 | 38 | 38 | 37 | 37 | 37 | 114 |
| Δ | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 2 |
| Improvement % | 0 | 0 | 3 | 3 | 0 | 0 | 3 | 2 |

Canada-encodeToJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 54 | 54 | 54 | 54 | 55 | 55 | 55 | 56 |
| Current_run | 54 | 54 | 54 | 54 | 54 | 56 | 56 | 56 |
| Δ | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 2 | -2 | -2 | 0 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 56 |
| Current_run | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 56 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Twitter-decodeFromJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 2150 | 2163 | 2169 | 2198 | 2224 | 2292 | 2448 | 1374 |
| Current_run | 2168 | 2175 | 2179 | 2181 | 2191 | 2255 | 2451 | 1374 |
| Δ | 18 | 12 | 10 | -17 | -33 | -37 | 3 | 0 |
| Improvement % | -1 | -1 | 0 | 1 | 1 | 2 | 0 | 0 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 465 | 463 | 461 | 456 | 450 | 435 | 389 | 1374 |
| Current_run | 462 | 460 | 459 | 459 | 457 | 444 | 408 | 1374 |
| Δ | -3 | -3 | -2 | 3 | 7 | 9 | 19 | 0 |
| Improvement % | -1 | -1 | 0 | 1 | 2 | 2 | 5 | 0 |

Twitter-encodeToJSON metrics

Time (total CPU): results within specified thresholds, fold down for details.

| Time (total CPU) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 1078 | 1083 | 1085 | 1089 | 1094 | 1155 | 1314 | 2752 |
| Current_run | 1090 | 1096 | 1099 | 1102 | 1106 | 1134 | 1265 | 2723 |
| Δ | 12 | 13 | 14 | 13 | 12 | -21 | -49 | -29 |
| Improvement % | -1 | -1 | -1 | -1 | -1 | 2 | 4 | -29 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| old_coders | 928 | 924 | 922 | 920 | 915 | 864 | 629 | 2752 |
| Current_run | 918 | 913 | 911 | 908 | 906 | 883 | 791 | 2723 |
| Δ | -10 | -11 | -11 | -12 | -9 | 19 | 162 | -29 |
| Improvement % | -1 | -1 | -1 | -1 | -1 | 2 | 26 | -29 |

@ChrisBenua (Contributor, Author)

@jmschonfeld can you please run CI tests? I don't have permission for that.

Thanks in advance!

@jmschonfeld (Contributor)

@swift-ci please test

@ChrisBenua (Contributor, Author)

@jmschonfeld all CI checks were successful.

Do you find the solution satisfactory? If so, could you please advise on the next steps for getting the PR merged?

@ChrisBenua (Contributor, Author)

@jmschonfeld I've removed the unnecessary diff and added an identical `isDefault` computed property in both files.

Can we run CI checks again?

Does this approach meet your expectations? If so, I'd appreciate your guidance on how we can proceed with merging the PR.

@jmschonfeld (Contributor)

@swift-ci please test

@jmschonfeld (Contributor) left a comment:

Change seems ok to me on a read through, but I'll let @kperryua do a more in-depth review since he's had more experience investigating performance in this area

@kperryua (Contributor) left a comment:

In theory this looks fine, but I'm concerned about the not-spectacular benchmark results. You mentioned your benchmark performs around 35% better. Do you have any explanation for the discrepancy?

```diff
     return .number(decimal.description)
-} else if let encodable = value as? _JSONStringDictionaryEncodableMarker {
+} else if !options.keyEncodingStrategy.isDefault, let encodable = value as? _JSONStringDictionaryEncodableMarker {
     return try self.wrap(encodable as! [String:Encodable], for: additionalKey)
```

@kperryua (Contributor):

Are we able to use _specializingCast here as well?

@ChrisBenua (Contributor, Author):

Unfortunately, no, we can't use it here. `_specializingCast` internally performs a type equality check, whereas here we use `as?` to check protocol conformance, so `_specializingCast` is inapplicable.

@ChrisBenua (Contributor, Author), Sep 30, 2025:

```swift
@inline(__always)
internal func _specializingCast<Input, Output>(_ value: Input, to type: Output.Type) -> Output? {
  guard Input.self == Output.self else { return nil } // this check will always fail when we check for protocol conformance
  return _identityCast(value, to: type)
}
```
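
For illustration, here is a self-contained toy version (with hypothetical stand-ins for the private runtime helpers) showing why the metatype equality check can never succeed when `Output` is a protocol existential:

```swift
@inline(__always)
func specializingCast<Input, Output>(_ value: Input, to type: Output.Type) -> Output? {
    // Exact metatype equality: a concrete type is never equal to the
    // existential type of a protocol it conforms to.
    guard Input.self == Output.self else { return nil }
    return value as! Output
}

protocol Marker {}
struct Payload: Marker {}

let payload = Payload()
print(specializingCast(payload, to: Payload.self) != nil) // true: Payload.self == Payload.self
print(specializingCast(payload, to: Marker.self) != nil)  // false: Payload.self != (any Marker).self
```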

@ChrisBenua (Contributor, Author)

> In theory this looks fine, but I'm concerned about the not-spectacular benchmark results. You mentioned your benchmark performs around 35% better. Do you have any explanation for the discrepancy?

@kperryua The reason we can't observe the boost I mentioned in my thread on forums.swift.org is quite simple: when running benchmarks, we run the same decoding/encoding 1,000,000 times without relaunching. The catch is that the swift_conformsToProtocol method is really slow only the first time for each pair of arguments (class/enum/struct and protocol), so its first-iteration overhead is barely noticeable when averaged over 1,000,000 iterations.

That's why I created my own benchmark to illustrate how massive the overhead caused by swift_conformsToProtocol can be; I've described it in my thread on forums.swift.org. In short, I generated 10k codable classes, united into 2500 groups of 4 classes each. The benchmark is simple: I decode the same 320-byte JSON 2500 times, but into a different class each time (A1, A5, ..., A9997), so each type is decoded exactly once and swift_conformsToProtocol never hits its internal in-memory cache.
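
Schematically, the benchmark does the following (a toy two-type version; the real one iterates over the 10k generated classes):

```swift
import Foundation

// Stand-ins for the generated codable classes A1, A5, ..., A9997.
struct A1: Codable { let id: Int; let name: String }
struct A5: Codable { let id: Int; let name: String }

let data = Data(#"{"id": 1, "name": "x"}"#.utf8)
let decoder = JSONDecoder()

let clock = ContinuousClock()
let elapsed = try clock.measure {
    // Each type is decoded exactly once, so every decode pays the
    // first-time swift_conformsToProtocol cost for its type/protocol
    // pairs instead of hitting the runtime's conformance cache.
    _ = try decoder.decode(A1.self, from: data)
    _ = try decoder.decode(A5.self, from: data)
}
print("cold-cache decode time:", elapsed)
```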

The benchmark results are available there; I'll also paste them here:

JSONDecoder

In this benchmark I've measured performance in 4 variations:

  • standard JSONDecoder
  • standard JSONDecoder + String as CodingKey
  • optimized JSONDecoder
  • optimized JSONDecoder + String as CodingKey

| quantile | 0.25 | 0.5 | 0.75 |
|---|---|---|---|
| standard JSONDecoder | 5.81 s | 5.826 s | 5.86 s |
| standard JSONDecoder + String as CodingKey | 3.24 s (↑44%) | 3.26 s (↑44%) | 3.29 s (↑43.9%) |
| optimized JSONDecoder | 2.64 s (↑55%) | 2.65 s (↑55%) | 2.66 s (↑54.6%) |
| optimized JSONDecoder + String as CodingKey | 0.113 s (↑98%) | 0.114 s (↑98%) | 0.116 s (↑98%) |

JSONEncoder

In this benchmark I've measured performance in 4 variations:

  • standard JSONEncoder
  • standard JSONEncoder + String as CodingKey
  • optimized JSONEncoder
  • optimized JSONEncoder + String as CodingKey

| quantile | 0.25 | 0.5 | 0.75 |
|---|---|---|---|
| standard JSONEncoder | 8.06 s | 8.08 s | 8.12 s |
| standard JSONEncoder + String as CodingKey | 5.49 s (↑32%) | 5.52 s (↑32%) | 5.55 s (↑32%) |
| optimized JSONEncoder | 2.67 s (↑67%) | 2.68 s (↑67%) | 2.69 s (↑67%) |
| optimized JSONEncoder + String as CodingKey | 0.148 s (↑98.1%) | 0.149 s (↑98.2%) | 0.151 s (↑98.1%) |

So you can see how devastating the overhead from swift_conformsToProtocol is when decoding or encoding for the first time.

As for the 35% boost: I was comparing the latest version of this PR against the previous one, which used `as?` instead of `_specializingCast` in the `_asDirectArrayEncodable` function, and I was using String as CodingKey to remove any other Swift runtime overhead. The `_specializingCast` version performed 35% better than the `as?` version.

Also, I've merged the same optimisation into ZippyJSON and ReerJSON.

@kperryua (Contributor)

@ChrisBenua
Thank you for the summary.

I think this is another interesting tradeoff point, similar to the direct-array optimization (though to a lesser degree). The question becomes whether the swift_conformsToProtocol cache is worthwhile or not. Of course, if a client is using many different types, or only performing a small number of encode or decode operations per process lifetime, the cache is pure unwanted overhead. However, there will definitely be clients that run for long periods of time for which the cost of the cache is amortized away into irrelevance.

It seems clear that in the average long-running case, the benchmarks suggest that the swift_conformsToProtocol solution is preferred. However, it does seem like the regression for this case is minimal, and the benefit for clients that run for short lifetimes could be more significant.

In your opinion, does this change achieve the optimal balance between these two use cases?

@ChrisBenua (Contributor, Author)

> @ChrisBenua Thank you for the summary.
>
> I think this is another interesting tradeoff point, similar to the direct-array optimization (though to a lesser degree). The question becomes whether the swift_conformsToProtocol cache is worthwhile or not. Of course, if a client is using many different types, or only performing a small number of encode or decode operations per process lifetime, the cache is pure unwanted overhead. However, there will definitely be clients that run for long periods of time for which the cost of the cache is amortized away into irrelevance.
>
> It seems clear that in the average long-running case, the benchmarks suggest that the swift_conformsToProtocol solution is preferred. However, it does seem like the regression for this case is minimal, and the benefit for clients that run for short lifetimes could be more significant.
>
> In your opinion, does this change achieve the optimal balance between these two use cases?

@kperryua Thanks for sharing your thoughts on this subject!

You're absolutely right! There is minimal regression in the benchmarks, and, for sure, it can affect users of JSONDecoder/JSONEncoder who decode/encode the same types many times over, but I'm sure the difference will be barely noticeable in that case.

But if we optimise the first decoding/encoding, we can achieve better performance in mobile app startup scenarios! For example, my team and I measured the performance of Foundation.JSONDecoder against our version, which is very similar to this PR, and we saw massive improvements: the total time spent in JSONDecoder.decode was reduced by 50%, and the same for JSONEncoder.encode. Detailed measurements below:

Measurements

We have 80k measurements from different devices: ~40k with the optimized JSONDecoder and JSONEncoder, and ~40k with the standard JSONDecoder and JSONEncoder, both with duration logging.

| quantile | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|
| standard JSONDecoder | 198 ms | 282 ms | 422 ms | 667 ms | 1017 ms |
| optimized JSONDecoder | 100 ms | 133 ms | 200 ms | 322 ms | 528 ms |
| Difference | ↑49.5% | ↑52.8% | ↑52.6% | ↑51.7% | ↑48.1% |

And for JSONEncoder:

| quantile | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|
| standard JSONEncoder | 59 ms | 94 ms | 159 ms | 289 ms | 547 ms |
| optimized JSONEncoder | 14 ms | 30 ms | 73 ms | 135 ms | 220 ms |
| Difference | ↑76% | ↑68% | ↑54% | ↑53.2% | ↑59.8% |

The custom JSONDecoder/Encoder with the changes from this PR is now used by 95% of our app's users.

But ours is a very large app with over 150k protocol conformance descriptors, so small apps will see a somewhat smaller boost.

Most application loading scenarios follow a similar pattern:

  • Executing URLSession requests
  • Parsing the results (typically involving distinct model sets for each screen)
  • Displaying the majority of the data on-screen, while caching the remainder for future use

By implementing the proposed changes, we can significantly reduce the time required to parse results during the initial load. In my view, this optimization has the potential to enhance loading performance across a wide range of applications, offering substantial benefits with minimal drawbacks.

I presented this solution at a recent local conference, and I am pleased to report that at least two companies have already adopted the optimized version of JSONDecoder/Encoder and deployed it to production. They subsequently contacted me to share comparable performance improvements.

One relatively small application achieved a 25% improvement using the modified JSONDecoder/Encoder from this pull request. Another application, similar in size to ours, reported a 40% boost.

As a performance engineer, I am genuinely excited by the extent to which these relatively minor changes can yield significant optimizations across numerous applications. I would greatly appreciate your support and alignment with this perspective.

@ChrisBenua ChrisBenua requested a review from kperryua September 30, 2025 11:18
@kperryua (Contributor) left a comment:

Ok. Thank you for the useful discussion. Your comments about application loading time are especially compelling.

@kperryua kperryua merged commit 97b4581 into swiftlang:main Sep 30, 2025
19 checks passed