Serialization performance tuning how-to? #322

@brent-statsig

I'm writing a service that downloads n files concurrently, then writes them to a single Avro file.

  • Is there a recommended way to serialize data in parallel? Writer calls maybe_write_header in every public append API, as well as in into_inner, which makes it hard to get the raw bytes of serialized rows without the header attached. The raw Serializer impl in ser.rs doesn't appear to be public either. What is the recommended way to split serialization work across cores?
  • Per-value schema validation on append is expensive. It would be really nice to have a compile-time flag so it can be stripped out for production, or a sampling rate so that some runtime safety is retained.
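
To make the two asks concrete, here is a rough sketch of the shape being described, using only the Rust standard library. It is not the crate's API: `Row`, `validate`, `encode_chunk`, `encode_parallel`, and the `sample_every` parameter are all hypothetical names invented for illustration, and the hand-rolled encoder only covers Avro's binary encoding of `long` (zigzag varint) and `string` (length prefix plus UTF-8 bytes). Each worker encodes its slice of rows into a header-free byte buffer, and validation runs only on every `sample_every`-th row.

```rust
use std::thread;

/// Hypothetical row type, for illustration only.
#[derive(Debug)]
struct Row {
    id: i64,
    name: String,
}

/// Avro binary encoding of a `long`: zigzag, then variable-length bytes.
fn encode_long(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push(((z & 0x7f) as u8) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Encode one row as a record: fields in order, no header, no framing.
fn encode_row(row: &Row, out: &mut Vec<u8>) {
    encode_long(row.id, out);
    // An Avro `string` is its byte length as a `long`, then the UTF-8 bytes.
    encode_long(row.name.len() as i64, out);
    out.extend_from_slice(row.name.as_bytes());
}

/// Hypothetical stand-in for the expensive per-value schema check.
fn validate(row: &Row) -> bool {
    row.id >= 0
}

/// Encode a slice of rows, validating every `sample_every`-th row.
/// `sample_every == 0` disables validation entirely.
fn encode_chunk(rows: &[Row], sample_every: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        if sample_every != 0 && i % sample_every == 0 && !validate(row) {
            return Err(format!("row {i} failed validation: {row:?}"));
        }
        encode_row(row, &mut buf);
    }
    Ok(buf)
}

/// Split `rows` across up to `workers` threads and return one raw buffer
/// per chunk, in the original row order.
fn encode_parallel(
    rows: &[Row],
    workers: usize,
    sample_every: usize,
) -> Result<Vec<Vec<u8>>, String> {
    let workers = workers.max(1);
    let chunk_len = ((rows.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = rows
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || encode_chunk(chunk, sample_every)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Calling `encode_parallel(&rows, 4, 100)` would encode on up to four threads and validate one row in every hundred per chunk. A single writer thread would still have to frame each returned buffer as a container-file block (object count, byte size, sync marker) after writing the header once, which is the part that currently seems to require going through Writer.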
