
NewDataModule performance when loading large amounts of data values #953

@emh-jump


Hello!

We use YTT for templating, and given the size of our templates and the number of times we render them with different inputs, YTT rendering performance is something we pay attention to. I did some quick profiling, and I believe I've identified some big performance wins.

I noticed that CPU usage in NewDataModule was unusually high (close to 90% of all samples in an internal benchmark), most of it spent converting the yamlmeta.Document argument into a starlark.Value. Because NewDataModule is called for every file evaluation, this cost adds up quickly when a template evaluates many files, even though the document being converted is the same each time.

I believe it is safe to do this conversion once in the TemplateLoader and reuse the result for each new data module. A quick ~10-line proof of concept of this change passes the repository's tests and our internal tests.
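A minimal Go sketch of the idea, assuming the change amounts to "convert once, then pass the cached value into every data module". All names here (`Document`, `Value`, `DataModule`, `TemplateLoader`, `convert`, `dataValue`, `EvalFiles`) are hypothetical stand-ins, not ytt's actual API; the real code works with `yamlmeta.Document` and `starlark.Value`. The counter shows the expensive conversion runs once no matter how many files are evaluated.

```go
package main

import (
	"fmt"
	"sync"
)

// Document is a hypothetical stand-in for ytt's yamlmeta.Document.
type Document struct {
	Values map[string]string
}

// Value is a hypothetical stand-in for the converted starlark.Value.
// It is treated as read-only once built.
type Value struct {
	Fields map[string]string
}

// conversions counts how many times the expensive conversion runs.
var conversions int

// convert is a hypothetical placeholder for the costly
// yamlmeta.Document -> starlark.Value conversion.
func convert(doc *Document) *Value {
	conversions++
	fields := make(map[string]string, len(doc.Values))
	for k, v := range doc.Values {
		fields[k] = v
	}
	return &Value{Fields: fields}
}

// TemplateLoader converts the data values document at most once and
// hands the cached result to every data module it creates.
type TemplateLoader struct {
	doc       *Document
	once      sync.Once // makes the lazy conversion safe even across goroutines
	converted *Value
}

// dataValue returns the converted document, converting it on first use only.
func (l *TemplateLoader) dataValue() *Value {
	l.once.Do(func() { l.converted = convert(l.doc) })
	return l.converted
}

// DataModule is a hypothetical per-file module that only reads the value.
type DataModule struct {
	values *Value
}

// NewDataModule takes the already-converted value instead of the raw document.
func NewDataModule(v *Value) DataModule {
	return DataModule{values: v}
}

// EvalFiles simulates evaluating n template files, each needing a data module.
func (l *TemplateLoader) EvalFiles(n int) []DataModule {
	mods := make([]DataModule, 0, n)
	for i := 0; i < n; i++ {
		mods = append(mods, NewDataModule(l.dataValue()))
	}
	return mods
}

func main() {
	loader := &TemplateLoader{doc: &Document{Values: map[string]string{"env": "prod"}}}
	mods := loader.EvalFiles(100)
	fmt.Println(len(mods), conversions) // prints "100 1"
}
```

Sharing one converted value across modules is only safe if no evaluation mutates it. In go.starlark.net that would typically mean freezing the value (every `starlark.Value` has a `Freeze` method) before handing it out; whether ytt's conversion already guarantees this is the assumption behind "I believe it is safe" above.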

With GOGC=off (garbage collection disabled), I see the following stats on my dev machine when running an internal benchmark:

  • Mainline develop:
    5.26user 0.11system 0:05.49elapsed 97%CPU (0avgtext+0avgdata 353880maxresident)k
  • Convert in TemplateLoader:
    0.55user 0.03system 0:00.60elapsed 97%CPU (0avgtext+0avgdata 247556maxresident)k
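Working the two runs through (GNU `time` output, where `maxresident` is in kilobytes; with GC disabled, peak RSS roughly tracks total allocation), the change comes out to about:

```latex
\frac{5.49\,\text{s}}{0.60\,\text{s}} \approx 9.15\times \ \text{(elapsed)}, \qquad
\frac{5.26\,\text{s}}{0.55\,\text{s}} \approx 9.56\times \ \text{(user CPU)}, \qquad
1 - \frac{247556\,\text{KB}}{353880\,\text{KB}} \approx 30\% \ \text{(peak RSS reduction)}
```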

I plan to open a PR with the proposed change shortly, but wanted to check for any concerns first.

Status: Closed
