Description
Discussed in duckdb/duckdb#16835
Originally posted by ndchandar March 26, 2025
Hello,
I am observing an issue where duckdb-rs leaks memory when Parquet metadata caching is enabled. Our Parquet files use a Hive partitioning scheme, and we produce thousands of small Parquet files on any given day. Each file is about 8 MB to 10 MB with a row group size of 100K, and there are about 1M records in each file. We typically query over a three-day period (about 75 GB of compressed data). The Parquet files themselves are generated through the DuckDB COPY command. I was wondering if this is a known issue. We don't seem to see this issue when the Parquet metadata cache is disabled.
I have attached a heaptrack summary, which reports about 6.6 GB of leaked memory. We are using DuckDB 1.2.1, and the service is running in Kubernetes. I am currently limiting DuckDB's max memory to 8 GB and have enabled spill to disk (by setting a temp directory). I would appreciate any help or pointers on this.
heaptrack.20250325_043042_modified.txt
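For anyone trying to reproduce the comparison described above (cache enabled versus disabled), the session settings can be sketched as a small Rust helper that builds the SQL batch. This is a rough sketch, not the reporter's actual code: the function name `session_settings` and the spill directory are hypothetical, and it assumes the cache is controlled by the `parquet_metadata_cache` setting and the limit by `memory_limit`, as in recent DuckDB releases.

```rust
/// Builds the DuckDB session settings described in the report as one SQL batch.
///
/// Assumptions (not stated in the report):
/// - `spill_dir` is a hypothetical path; the actual temp directory is unknown.
/// - The metadata cache is toggled via the `parquet_metadata_cache` setting.
fn session_settings(metadata_cache: bool, spill_dir: &str) -> String {
    format!(
        "SET memory_limit = '8GB';\n\
         SET temp_directory = '{}';\n\
         SET parquet_metadata_cache = {};",
        // Escape single quotes so the path stays a valid SQL string literal.
        spill_dir.replace('\'', "''"),
        metadata_cache
    )
}
```

With duckdb-rs, the returned string could presumably be passed to `Connection::execute_batch` right after opening the connection. Running the same workload once with `true` and once with `false` under heaptrack would isolate the cache as the variable.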