
@XuehaiPan
Contributor

@XuehaiPan XuehaiPan commented Dec 3, 2023

This PR interns the field names of user-defined dataclasses. Interning happens only once, when the dataclass type is created, so the overhead is small. namedtuple already does the same:

typename = _sys.intern(str(typename))

field_names = tuple(map(_sys.intern, field_names))

Resolves #112653
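
As a minimal sketch of what interning buys (not the code in this PR): `sys.intern` returns one canonical object for each distinct string value, so equal field names built at runtime end up as the same object, and comparisons can succeed on identity. The helper name `intern_field_names` is hypothetical and only used for illustration.

```python
import sys


def intern_field_names(field_names):
    """Return the names as a tuple of interned strings (hypothetical helper)."""
    return tuple(map(sys.intern, field_names))


# Build equal names at runtime, in two separate passes.
first = intern_field_names('field_' + str(i) for i in range(3))
second = intern_field_names('field_' + str(i) for i in range(3))

# After interning, equal names are the very same object.
same_objects = all(a is b for a, b in zip(first, second))
print(same_objects)
```

Whether this identity shortcut translates into a measurable speedup depends on the workload, which is what the benchmarks in this thread try to measure.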

Member

@sobolevn sobolevn left a comment


If the intent is to improve performance, can you please share some numbers about it?

@XuehaiPan
Contributor Author

XuehaiPan commented Dec 3, 2023

can you please share some numbers about it?

Here is an example use case:

import dataclasses


@dataclasses.dataclass
class Foo:
    a: int = dataclasses.field(default=0)
    b: int = dataclasses.field(default=0)
    c: int = dataclasses.field(default=0)
    d: int = dataclasses.field(default=0)
    e: int = dataclasses.field(default=0)
    f: int = dataclasses.field(default=0)
    g: int = dataclasses.field(default=0)
    h: int = dataclasses.field(default=0)
    i: int = dataclasses.field(default=0)
    j: int = dataclasses.field(default=0)
    k: int = dataclasses.field(default=0)
    l: int = dataclasses.field(default=0)
    m: int = dataclasses.field(default=0)
    n: int = dataclasses.field(default=0)
    o: int = dataclasses.field(default=0)
    p: int = dataclasses.field(default=0)
    q: int = dataclasses.field(default=0)
    r: int = dataclasses.field(default=0)
    s: int = dataclasses.field(default=0)
    t: int = dataclasses.field(default=0)
    u: int = dataclasses.field(default=0)
    v: int = dataclasses.field(default=0)
    w: int = dataclasses.field(default=0)
    x: int = dataclasses.field(default=0)
    y: int = dataclasses.field(default=0)
    z: int = dataclasses.field(default=0)

Benchmark results:

>>> %timeit Foo()
Before:
1.5 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
After:
1.36 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> foo = Foo()
>>> %timeit dataclasses.asdict(foo)
Before:
9.67 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
9.08 µs ± 60.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The patched version consistently runs faster and has a smaller variance in running time.

@XuehaiPan XuehaiPan requested a review from sobolevn December 3, 2023 14:38
@sobolevn
Member

sobolevn commented Dec 3, 2023

What about dataclasses with a reasonable number of fields? Like 0, 1, 5? Can you also test these use-cases?

@XuehaiPan
Contributor Author

XuehaiPan commented Dec 3, 2023

I ran some benchmarks. Interning the field names always improves the performance of dataclasses.asdict(). The gain varies from 3% (1 field) to 5% (64 fields).

However, interning the default factory names may have a negative impact on instantiation. I removed it in the latest commit.

Script
import textwrap
import timeit

NUMBER = 1_000_000  # calls per timing run
REPEAT = 5  # timing runs per measurement; the fastest one is reported

for num_fields in [1, 2, 3, 4, 5, 8, 16, 32, 64]:
    SETUP = textwrap.dedent(
        f"""
        import dataclasses

        Foo = dataclasses.make_dataclass(
            'Foo',
            fields=[
                (f'field_xxx_{{i}}', int, dataclasses.field(default=0))
                for i in range({num_fields})
            ],
        )

        foo = Foo()
        """
    ).strip()

    ctor_time = (
        min(
            timeit.repeat(
                'Foo()',
                setup=SETUP,
                number=NUMBER,
                repeat=REPEAT,
            )
        )
        / NUMBER
    )
    asdict_time = (
        min(
            timeit.repeat(
                'dataclasses.asdict(foo)',
                setup=SETUP,
                number=NUMBER,
                repeat=REPEAT,
            )
        )
        / NUMBER
    )

    print(
        f'num_fields: {num_fields:<2d}    '
        f'ctor: {ctor_time * 1e6:5.3f}us    '
        f'asdict: {asdict_time * 1e6:5.3f}us'
    )

Results (I ran these on another device, macOS with M2 Pro):

# Baseline
num_fields: 1     ctor: 0.120us    asdict: 0.646us
num_fields: 2     ctor: 0.144us    asdict: 0.795us
num_fields: 3     ctor: 0.175us    asdict: 0.924us
num_fields: 4     ctor: 0.188us    asdict: 1.071us
num_fields: 5     ctor: 0.207us    asdict: 1.210us
num_fields: 8     ctor: 0.252us    asdict: 1.603us
num_fields: 16    ctor: 0.444us    asdict: 2.768us
num_fields: 32    ctor: 1.067us    asdict: 5.092us
num_fields: 64    ctor: 2.139us    asdict: 9.756us

# Intern field names
num_fields: 1     ctor: 0.121us    asdict: 0.627us
num_fields: 2     ctor: 0.144us    asdict: 0.769us
num_fields: 3     ctor: 0.173us    asdict: 0.896us
num_fields: 4     ctor: 0.189us    asdict: 1.020us
num_fields: 5     ctor: 0.206us    asdict: 1.141us
num_fields: 8     ctor: 0.254us    asdict: 1.540us
num_fields: 16    ctor: 0.417us    asdict: 2.620us
num_fields: 32    ctor: 0.936us    asdict: 4.850us
num_fields: 64    ctor: 2.015us    asdict: 9.249us

# Intern field names and default factory names
num_fields: 1     ctor: 0.120us    asdict: 0.630us
num_fields: 2     ctor: 0.144us    asdict: 0.771us
num_fields: 3     ctor: 0.171us    asdict: 0.891us
num_fields: 4     ctor: 0.188us    asdict: 1.023us
num_fields: 5     ctor: 0.242us    asdict: 1.211us
num_fields: 8     ctor: 0.312us    asdict: 1.587us
num_fields: 16    ctor: 0.450us    asdict: 2.703us
num_fields: 32    ctor: 1.024us    asdict: 5.060us
num_fields: 64    ctor: 2.020us    asdict: 9.463us
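
For reference, a rough sketch of how the relative asdict gains can be derived from the numbers above. It uses a few rows copied from the "Baseline" and "Intern field names" tables; the `relative_gain` helper is a hypothetical name introduced here, not part of the benchmark script.

```python
# asdict timings in microseconds, copied from the tables above.
baseline = {1: 0.646, 5: 1.210, 16: 2.768, 64: 9.756}
interned = {1: 0.627, 5: 1.141, 16: 2.620, 64: 9.249}


def relative_gain(before, after):
    """Fraction of the baseline time saved by the patched version."""
    return (before - after) / before


gains = {n: relative_gain(baseline[n], interned[n]) for n in baseline}
for n, gain in gains.items():
    print(f'num_fields: {n:<2d}    asdict gain: {gain:.1%}')
```

These are best-of-5 timings from a single machine, so the percentages should be read as indicative rather than exact.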

@ericvsmith
Member

I don't think this performance gain is worth it, but I'm open to other opinions.

@XuehaiPan XuehaiPan closed this Mar 27, 2025
@XuehaiPan XuehaiPan deleted the intern-dataclass-field-names branch March 29, 2025 13:20
Successfully merging this pull request may close these issues.

Intern dataclass field names to improve performance
