
Large performance gap on Windows between Python installs (Store vs. official installer) when exporting Parquet via DuckDB #107

@hmilbi

Description


What happens?

Summary
Exporting a ~90,000,000-row table to Parquet (~300 MB) is about seven times faster under the Microsoft Store build of Python than under the official Python.org installer (~18 s vs. ~130 s). Other workloads show the same pattern, and the other clients tested (CLI, Rust, C++, C#) are as slow as the Python.org build.

Observed behavior

  • Python (official installer, python-3.13.7-amd64.exe): ~130 s
  • Python (Microsoft Store): ~18 s
  • Other workloads are also faster under the Microsoft Store Python.
  • The CLI and the Rust, C++, and C# implementations run about as slowly as the Python.org build.
  • Disabling Windows Defender real-time protection makes no difference.

Data size

Table: ~90,000,000 rows

Resulting Parquet file: ~300 MB
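To put the gap in throughput terms, here is a small calculation using only the approximate figures reported above (row count, file size, and the two timings):

```python
# Approximate figures taken from the report above.
ROWS = 90_000_000
PARQUET_MB = 300
TIMINGS_S = {
    "python.org installer": 130.0,
    "Microsoft Store": 18.0,
}


def throughput(seconds: float) -> tuple[float, float]:
    """Return (million rows per second, MB of Parquet written per second)."""
    return ROWS / seconds / 1e6, PARQUET_MB / seconds


for build, seconds in TIMINGS_S.items():
    mrows, mbps = throughput(seconds)
    print(f"{build:>22}: {mrows:5.2f} M rows/s, {mbps:5.1f} MB/s")

# Ratio of the slow build's export time to the fast build's.
slowdown = TIMINGS_S["python.org installer"] / TIMINGS_S["Microsoft Store"]
print(f"slowdown factor: {slowdown:.1f}x")
```

That is roughly 0.7 M rows/s versus 5 M rows/s for the same export.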

Question
What could explain the consistent performance advantage of the Microsoft Store Python over the official Python.org build on Windows for this workload? Any guidance on likely causes or on what to check next would be appreciated.

To Reproduce

Repro snippet

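import duckdb
import pytz  # noqa: F401 -- not used directly; fetching now() (a TIMESTAMPTZ) needs pytz

dbfile = r"C:\temp\test\db.duckdb"
output_path = r"C:\temp\test\test.parquet"

with duckdb.connect(database=dbfile) as conn:
    row_count = conn.execute("select count(*) from large_table;").fetchone()[0]
    print(f"  exporting {row_count} rows")

    # now() returns the start time of the current transaction. Each execute()
    # auto-commits, so this value marks the moment just before the COPY runs.
    copy_start_time = conn.execute("select now();").fetchone()[0]

    # Sort the whole table and write it to a single Parquet file.
    conn.sql(f"""
        COPY (
            select *
            from large_table
            order by col1, col2, col3, col4
        )
        TO '{output_path}'
        WITH (FORMAT PARQUET);
    """)

    copy_end_time = conn.execute("select now();").fetchone()[0]
    copy_duration = copy_end_time - copy_start_time  # datetime.timedelta

    print(f"  exported in {copy_duration}")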

OS:

Windows 11 Pro 24H2

DuckDB Package Version:

1.4.0

Python Version:

3.13.7

Full Name:

H M

Affiliation:

None

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
