Description
Table of Contents
- JSONDecoder/Encoder Performance Problem
- JSONDecoder Performance Flaws
- Proposed Optimizations
- Optimizations Results
- Apple Benchmark Overview
- Apple Benchmark Flaws
- My Benchmark
 
JSONDecoder/Encoder Performance Problem
Introduction
The swift_conformsToProtocolMaybeInstantiateSuperclasses method is slow because, the first time it is called for a (class/enum/struct, protocol) pair, it traverses all protocol-conformance descriptors in the whole app.
EmergeTools has a great article about the poor performance of swift_conformsToProtocolMaybeInstantiateSuperclasses.
Briefly, the more protocol conformances your app has, the slower swift_conformsToProtocolMaybeInstantiateSuperclasses is. Our app has more than 150k protocol conformances. This can be easily measured using this bash one-liner:
```shell
otool -l path/to/your/binary | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }'
```
We take the size of the __swift5_proto section and divide it by 4 (4-byte integer offsets are stored there).
When swift_conformsToProtocol is called
In short, there are three ways to trigger this method:
- `T.self is SomeProtocol.Type`
- `as?`/`as!`/`as` (in a switch statement) with `SomeProtocol`
- Generic classes with type-generic constraints

In the generic case, swift_conformsToProtocol is triggered because the class metadata contains a GenericParameterVector, and the GenericParameterVector has to contain a protocol-witness table for each protocol the generic parameter conforms to.
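The three triggers can be sketched as follows; `SomeProtocol`, `Impl`, and `Box` are illustrative names, not types from the codebase:

```swift
protocol SomeProtocol {}
struct Impl: SomeProtocol {}

// 1. Metatype check: asks the runtime whether T conforms to SomeProtocol.
func isConforming<T>(_ type: T.Type) -> Bool {
    type is SomeProtocol.Type
}

// 2. Dynamic cast: as?/as!/`case ... as SomeProtocol` all go through
// swift_conformsToProtocol for unseen (type, protocol) pairs.
func castToProtocol(_ value: Any) -> SomeProtocol? {
    value as? SomeProtocol
}

// 3. Generic type with a protocol constraint: instantiating Box<Impl>
// requires a witness table for Impl: SomeProtocol in the metadata's
// GenericParameterVector.
struct Box<T: SomeProtocol> {
    let value: T
}

let box = Box(value: Impl())
```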
 
JSONDecoder Performance Flaws
unwrap function
The first place in JSONDecoder where swift_conformsToProtocolMaybeInstantiateSuperclasses is used is the unwrap function:
```swift
func unwrap<T: Decodable>(_ mapValue: JSONMap.Value, as type: T.Type, for codingPathNode: _CodingPathNode, _ additionalKey: (some CodingKey)? = nil) throws -> T {
    ...
    if T.self is _JSONStringDictionaryDecodableMarker.Type {
        return try self.unwrapDictionary(from: mapValue, as: type, for: codingPathNode, additionalKey)
    }
    ...
}
```
KeyedDecodingContainer
KeyedDecodingContainer has a type-generic constraint: K: CodingKey. It is the second place where swift_conformsToProtocol gets called.
JSONDecoder swift_conformsToProtocol Performance Impact
swift_conformsToProtocol consumes at least 84% of all JSONDecoder.decode time in our app startup scenario.
JSONEncoder Performance Flaws
wrapGeneric function
The first place in JSONEncoder where swift_conformsToProtocolMaybeInstantiateSuperclasses is used is the wrapGeneric function:
```swift
func wrapGeneric<T: Encodable>(_ value: T, for additionalKey: (some CodingKey)? = _CodingKey?.none) throws -> JSONEncoderValue? {
    ...
    else if let encodable = value as? _JSONStringDictionaryEncodableMarker {
        return try self.wrap(encodable as! [String: Encodable], for: additionalKey)
    } else if let array = value as? _JSONDirectArrayEncodable {
        ...
    }
    ...
}
```
KeyedEncodingContainer
KeyedEncodingContainer has a type-generic constraint: K: CodingKey. It is the second place where swift_conformsToProtocol gets called.
JSONEncoder swift_conformsToProtocol Performance Impact
swift_conformsToProtocol consumes at least 84% of all JSONEncoder.encode time in our app startup scenario.
Proposed Optimizations
First, ABI/API break-free optimizations will be covered:
№1 JSONDecoder unwrap optimization
_JSONStringDictionaryDecodableMarker is used to make String-keyed dictionaries exempt from key conversion. So if there is no key conversion, we can skip this slow check:
```swift
switch options.keyDecodingStrategy {
case .useDefaultKeys:
    break
case .convertFromSnakeCase, .custom:
    if T.self is _JSONStringDictionaryDecodableMarker.Type {
        return try unwrapDictionary(...)
    }
}
return try self.with(value: mapValue, path: codingPathNode.appending(additionalKey)) {
    try type.init(from: self)
}
```
instead of
```swift
if T.self is _JSONStringDictionaryDecodableMarker.Type {
    return try self.unwrapDictionary(from: mapValue, as: type, for: codingPathNode, additionalKey)
}
return try self.with(value: mapValue, path: codingPathNode.appending(additionalKey)) {
    try type.init(from: self)
}
```
So this optimization is suitable only for the .useDefaultKeys strategy.
№2 JSONEncoder wrapGeneric optimization
There are two ways to attempt optimization of this function:
- If we believe that the `as? _JSONDirectArrayEncodable` check does more good than harm to performance (at least in our app and in this benchmark it does more harm), then we optimize only the `_JSONStringDictionaryEncodableMarker` check, the same way we did it for `JSONDecoder` and `_JSONStringDictionaryDecodableMarker`.
- If not, it's better to remove the `as? _JSONDirectArrayEncodable` check entirely.

Here is the `_JSONStringDictionaryEncodableMarker` check optimization:
```swift
switch options.keyEncodingStrategy {
case .useDefaultKeys:
    break
case .convertToSnakeCase, .custom:
    if let encodable = value as? _JSONStringDictionaryEncodableMarker {
        return try wrap(encodable as! [String: Encodable], for: additionalKey)
    }
}
```
So this optimization is suitable only for the .useDefaultKeys strategy.
Optimizations №1 and №2 are implemented in the FastCoders library.
№3 Possibly ABI/API breaking optimizations
So here we will try to solve the performance issue caused by the KeyedDecodingContainer and KeyedEncodingContainer type-generic constraints.
The problem is not about calling the KeyedDecodingContainer or KeyedEncodingContainer init; it is about referencing the type with a specified generic type.
For example, take this code:
```swift
import Foundation

struct A: Codable {
    let a: Int
}
```
The SIL for its `init(from: Decoder) throws` method has a line like:
```
%5 = alloc_stack [lexical] [var_decl] $KeyedDecodingContainer<A.CodingKeys>, scope 22
```
And its IR is:
```
%4 = call ptr @__swift_instantiateConcreteTypeFromMangledName(ptr @"demangling cache variable for type metadata for Swift.KeyedDecodingContainer<output.A.(CodingKeys in _60494E8B9C642A7C4A26F3A3B6CECEB9)>") #2, !dbg !194
```
Internally, __swift_instantiateConcreteTypeFromMangledName triggers swift_conformsToProtocol in this scenario.
So we mention the type KeyedDecodingContainer with the specific type A.CodingKeys.
`func encode(to: Encoder) throws` has the same flaw.
There are two possible ways to tackle this:
- Change the `KeyedDecodingContainer` and `KeyedEncodingContainer` type signatures to avoid type-generic constraints (wasn't implemented in this repository)
- Use the same `CodingKey` type in the auto-generated `Codable`/`Decodable`/`Encodable` conformance code. For example, `String`.
№3.1 Changing type signature
So the trick is to get rid of the K: CodingKey type-generic constraint in the type declaration and move it to an extension. Then there is no need for the GenericParameterVector to contain a protocol-witness table, and there is no swift_conformsToProtocol call when the generic type is mentioned or instantiated.
Before:
```swift
public struct KeyedDecodingContainer<K: CodingKey> :
  KeyedDecodingContainerProtocol
{
  public typealias Key = K
  /// The container for the concrete decoder.
  internal var _box: _KeyedDecodingContainerBase
  /// Creates a new instance with the given container.
  ///
  /// - parameter container: The container to hold.
  public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
  ) where Container.Key == Key {
    _box = _KeyedDecodingContainerBox(container)
  }
  /// The path of coding keys taken to get to this point in decoding.
  public var codingPath: [any CodingKey] {
    return _box.codingPath
  }
  // continue to conform to KeyedDecodingContainerProtocol protocol
  ...
}
```
After:
```swift
public struct KeyedDecodingContainer<K> {
  /// The container for the concrete decoder.
  internal var _box: _KeyedDecodingContainerBase
  /// Creates a new instance with the given container.
  ///
  /// - parameter container: The container to hold.
  public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
  ) where Container.Key == K {
    _box = _KeyedDecodingContainerBox(container)
  }
}

extension KeyedDecodingContainer: KeyedDecodingContainerProtocol where K: CodingKey {
  public typealias Key = K
  /// The path of coding keys taken to get to this point in decoding.
  public var codingPath: [any CodingKey] {
    return _box.codingPath
  }
  // continue to conform to KeyedDecodingContainerProtocol protocol
  ...
}
```
The same trick can be applied to KeyedEncodingContainer.
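The effect of the trick can be seen in miniature with illustrative types (`KeyLike`, `Container`, and `MyKey` are hypothetical, not part of the proposal): with the constraint on an extension rather than the type declaration, `Container<T>`'s generic metadata carries no witness-table slot, so instantiating it does not require a conformance lookup.

```swift
protocol KeyLike {}

// No constraint on the type declaration, so Container<T>'s metadata has no
// protocol-witness-table entry in its GenericParameterVector.
struct Container<T> {
    let value: T
}

// The conformance requirement moves to the extension; only members declared
// here need the witness table.
extension Container where T: KeyLike {
    var key: T { value }
}

struct MyKey: KeyLike {}

let c = Container(value: MyKey()) // no conformance lookup at instantiation
```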
Note: even though _KeyedDecodingContainerBox has a type-generic constraint, it seems we can avoid rewriting it, because of the way it gets called:
```swift
public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
) where Container.Key == Key {
    _box = _KeyedDecodingContainerBox(container)
}
```
In this scenario, the IR contains a reference to the protocol-witness table of Container implementing KeyedDecodingContainerProtocol:
```
define protected swiftcc ptr @"output.KeyedDecodingContainerV2.init<A where A == A1.Key, A1: Swift.KeyedDecodingContainerProtocol>(A1) -> output.KeyedDecodingContainerV2<A>"(ptr noalias %0, ptr %K, ptr %Container, ptr %Container.KeyedDecodingContainerProtocol) #0 !dbg !84
```
and there is no __swift_instantiateConcreteTypeFromMangledName call.
№3.2 Use String as CodingKey
Why this would be faster:
- `swift_conformsToProtocol` works slowly only when it gets called for the first time for each (class/enum/struct, protocol) pair.
- So if we use `String` as the `CodingKey`, `swift_conformsToProtocol` will always be called with the same pair of types: `String` and `CodingKey`.
- Only the first call will be slow. All subsequent calls are going to be much faster, because `ConcurrentReadableHashMap` is used for caching in `swift_conformsToProtocol`.
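A toy illustration of that caching behaviour (`CustomStringConvertible` stands in for any protocol here; absolute timings depend on the binary's `__swift5_proto` size, so the numbers are only indicative):

```swift
import Foundation

// Times a single dynamic cast. The first cast for a new (type, protocol)
// pair goes through the slow conformance-descriptor scan; repeats hit the
// ConcurrentReadableHashMap cache inside swift_conformsToProtocol.
func castDuration(_ value: Any) -> UInt64 {
    let start = DispatchTime.now().uptimeNanoseconds
    _ = value is CustomStringConvertible
    return DispatchTime.now().uptimeNanoseconds - start
}

let cold = castDuration(42)  // first (Int, CustomStringConvertible) lookup
let warm = castDuration(42)  // served from the runtime's conformance cache
```

In a binary with many conformances, the first call dominates; every later call for the same pair is a cheap cache hit.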
How String can conform to CodingKey:
```swift
extension String: CodingKey {
  public init?(stringValue: String) {
    self = stringValue
  }
  public init?(intValue: Int) {
    return nil
  }
  public var intValue: Int? { nil }
  public var stringValue: String {
    self
  }
}
```
How this can be implemented
We can introduce an experimental flag. When the flag is enabled, we don't auto-generate enum CodingKeys for our structs/enums and instead use raw String as the CodingKey in init(from: Decoder) throws and encode(to: Encoder) throws.
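A hypothetical sketch of what the generated code could look like with such a flag enabled, reusing the `String: CodingKey` extension shown above (`User` is an illustrative type, not from the proposal):

```swift
import Foundation

// String conforming to CodingKey, as shown above.
extension String: CodingKey {
    public init?(stringValue: String) { self = stringValue }
    public init?(intValue: Int) { return nil }
    public var intValue: Int? { nil }
    public var stringValue: String { self }
}

// Hypothetical shape of compiler-generated code under the flag: string
// literals serve as keys, and no per-type CodingKeys enum exists.
struct User: Codable {
    let id: Int
    let name: String

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: String.self)
        id = try container.decode(Int.self, forKey: "id")
        name = try container.decode(String.self, forKey: "name")
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: String.self)
        try container.encode(id, forKey: "id")
        try container.encode(name, forKey: "name")
    }
}
```

Every keyed container here is `KeyedDecodingContainer<String>`/`KeyedEncodingContainer<String>`, so the (String, CodingKey) pair is the only one the runtime ever has to look up.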
Additional advantages
Each auto-generated enum CodingKeys adds 5 protocol-conformance descriptors (godbolt):
- CodingKey
- Hashable
- Equatable
- CustomDebugStringConvertible
- CustomStringConvertible
Also, each CodingKeys enum adds around 1.8 KB to app size (measured on the same 10k Codable structures):
- codable-benchmark-package-no-coding-keys, where `String` is used as the `CodingKey` but `CodingKeys` enums are still generated to match the `__swift5_proto` section size: 49 MB
- codable-benchmark-package-no-coding-keys-measure-size, where `String` is used as the `CodingKey` and there are no `CodingKeys` enums: 31.1 MB
- So each `CodingKeys` enum adds around 1.8 KB to the application binary size.
So if a shared CodingKey is implemented, we could:
- Optimize application size
- Optimize overall application performance by speeding up the `swift_conformsToProtocol` method through `__swift5_proto` section size reduction:
  - codable-benchmark-package-no-coding-keys has 70321 protocol-conformance descriptors
  - codable-benchmark-package-no-coding-keys-measure-size has only 20321 protocol-conformance descriptors
Optimizations Results
Measurements in our app
In our app we applied only the JSONDecoder.unwrap and JSONEncoder.wrapGeneric optimizations, without using String as the CodingKey.
We measured all JSONDecoder.decode and JSONEncoder.encode durations and added them together.
We have 80k measurements from different devices: ~40k with the optimized JSONDecoder and JSONEncoder, and ~40k with the standard JSONDecoder and JSONEncoder with duration logging.
| quantile | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 | 
|---|---|---|---|---|---|
| standard JSONDecoder | 198 ms | 282 ms | 422 ms | 667 ms | 1017 ms | 
| optimized JSONDecoder | 100 ms | 133 ms | 200 ms | 322 ms | 528 ms | 
| Difference | ↑49.5% | ↑52.8% | ↑52.6% | ↑51.7% | ↑48.1% | 
And for JSONEncoder:
| quantile | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 | 
|---|---|---|---|---|---|
| standard JSONEncoder | 59 ms | 94 ms | 159 ms | 289 ms | 547 ms | 
| optimized JSONEncoder | 14 ms | 30 ms | 73 ms | 135 ms | 220 ms | 
| Difference | ↑76% | ↑68% | ↑54% | ↑53.2% | ↑59.8% | 
Briefly, the optimized JSONDecoder is twice as fast as the standard JSONDecoder, and the optimized JSONEncoder is at least twice as fast as the standard JSONEncoder.
My benchmark measurements
I've implemented my own benchmark for JSONDecoder/Encoder: https://github.com/ChrisBenua/JSONDecoderEncoderBenchmarks?tab=readme-ov-file#proposed-optimizations
JSONDecoder
In this benchmark I've measured performance in 4 variations:
- standard `JSONDecoder`
- standard `JSONDecoder` + `String` as `CodingKey`
- optimized `JSONDecoder`
- optimized `JSONDecoder` + `String` as `CodingKey`
| quantile | 0.25 | 0.5 | 0.75 | 
|---|---|---|---|
| standard JSONDecoder | 5.81 s | 5.826 s | 5.86 s | 
| standard JSONDecoder + String as CodingKey | 3.24 s (↑44%) | 3.26 s (↑44%) | 3.29 s (↑43.9%) | 
| optimized JSONDecoder | 2.64 s (↑55%) | 2.65 s (↑55%) | 2.66 s (↑54.6%) | 
| optimized JSONDecoder + String as CodingKey | 0.113 s (↑98%) | 0.114 s (↑98%) | 0.116 s (↑98%) | 
JSONEncoder
In this benchmark I've measured performance in 4 variations:
- standard `JSONEncoder`
- standard `JSONEncoder` + `String` as `CodingKey`
- optimized `JSONEncoder`
- optimized `JSONEncoder` + `String` as `CodingKey`
| quantile | 0.25 | 0.5 | 0.75 | 
|---|---|---|---|
| standard JSONEncoder | 8.06 s | 8.08 s | 8.12 s | 
| standard JSONEncoder + String as CodingKey | 5.49 s (↑32%) | 5.52 s (↑32%) | 5.55 s (↑32%) | 
| optimized JSONEncoder | 2.67 s (↑67%) | 2.68 s (↑67%) | 2.69 s (↑67%) | 
| optimized JSONEncoder + String as CodingKey | 0.148 s (↑98.1%) | 0.149 s (↑98.2%) | 0.151 s (↑98.1%) | 
My benchmark illustrates how much the Swift runtime slows down JSONDecoder and JSONEncoder.
Apple Benchmark
The swift-foundation repository has some JSONDecoder/Encoder benchmarking logic: JSONBenchmark.swift.
Apple Benchmark Flaws
- It decodes/encodes the same models 1 billion times without relaunching the app. This way all swift_conformsToProtocol overhead is disguised, because swift_conformsToProtocol is slow only on the first iteration.
- Small binary size and a small `__swift5_proto` section
My benchmark
Structure
- The `FastCoders` library contains optimized implementations of `JSONDecoder`/`JSONEncoder`
- `RegularModels` contains 10k Codable models with the standard Codable implementation. These 10k Codable models can be semantically split into 2.5k groups of 4.
- `StringCodingKeyModels` contains the same 10k Codable models with manually implemented `Codable` using `String` as the `CodingKey`
- `codable-benchmark-package` is the target where the duration of 2.5k decodings and encodings of `RegularModels` is measured
- `codable-benchmark-package-no-coding-keys` is the target where the duration of 2.5k decodings and encodings of `StringCodingKeyModels` is measured
- Both `codable-benchmark-package` and `codable-benchmark-package-no-coding-keys` use the `A1_Hierarchy.json` file for decoding. Its size is only 319 bytes.
Notes:
- To make the size of `__swift5_proto` in codable-benchmark-package-no-coding-keys match the size of `__swift5_proto` in codable-benchmark-package, I've generated a CodingKeys enum in each class, but it is not used in `encode(to: Encoder)` or `init(from: Decoder)`.
Building
Use ./build.sh to build and strip codable-benchmark-package and codable-benchmark-package-no-coding-keys.
Checking __swift5_proto size
To get the number of protocol-conformance descriptors in a binary, use this script:
```shell
otool -l .build/arm64-apple-macosx/release/codable-benchmark-package | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }'
```
outputs 70320, and
```shell
otool -l .build/arm64-apple-macosx/release/codable-benchmark-package-no-coding-keys | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }'
```
outputs 70321.
So in terms of swift_conformsToProtocol performance, both binaries are pretty similar.
Running
`codable-benchmark-package` and `codable-benchmark-package-no-coding-keys` have 4 modes:
- `decode`: measures decoding using the standard `JSONDecoder`
- `decode_new`: measures decoding using the optimized `JSONDecoder`
- `encode`: measures encoding using the standard `JSONEncoder`
- `encode_new`: measures encoding using the optimized `JSONEncoder`
I've used the run_bench.py script to run the binary in each mode. It measures each binary and each mode 100 times, which takes a while. You can easily adjust the number of repetitions in run_bench.py.

