Memory leak when parquet metadata cache is enabled with duckdb-rs 1.2.1 #476

@szarnyasg

Description

Discussed in duckdb/duckdb#16835

Originally posted by ndchandar March 26, 2025
Hello,
I am observing an issue where duckdb-rs is leaking memory when parquet metadata caching is enabled. Our parquet files use hive paritioning scheme and we produce thousands of small parquet files on any given day. Each file is about ~8 MB to ~10 MB with 100K row group size and there are about ~1M records in each parquet file. We typically query over a three day period (about ~75GB compressed data). They parquet files themselves are generated through DuckDB COPY command. I was wondering if this is a known issue. We don't seem to see this issue when parquet metadata cache is disabled.

I have attached a heaptrack summary, which reports about 6.6 GB of leaked memory. We are using DuckDB 1.2.1, and the service runs in Kubernetes. I am currently limiting DuckDB's max memory to 8 GB and have enabled spill-to-disk (by setting a temp directory). Appreciate any help/pointers on this.
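For context, the configuration described above roughly corresponds to DuckDB settings along these lines (a sketch; the temp directory path is a placeholder, and setting names are from DuckDB's configuration options):

```sql
-- Enable caching of parquet metadata (the setting implicated in the reported leak)
SET parquet_metadata_cache = true;

-- Cap DuckDB's memory use at 8 GB, as described above
SET memory_limit = '8GB';

-- Enable spill-to-disk by pointing at a temp directory (placeholder path)
SET temp_directory = '/tmp/duckdb_spill';
```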

heaptrack.20250325_043042_modified.txt

FlameGraph Memory Consumption Summary
