Description
What happens?
Summary
Exporting a ~90,000,000-row table to Parquet (~300 MB) takes very different amounts of time depending on which Windows Python build is used. With the Microsoft Store Python the export is roughly 7x faster than with the official Python.org installer (~18 s vs. ~130 s). Other workloads show a similar gap, and other languages match the slower result.
Observed behavior
Python (official installer, python-3.13.7-amd64.exe): ~130 s
Python (Microsoft Store): ~18 s
Other workloads are also faster with the Microsoft Store Python.
The same export run through the CLI, Rust, C++, and C# takes about as long as the slower (Python.org) run.
Disabling Windows Defender real-time protection makes no difference.
Data size
Table: ~90,000,000 rows
Resulting Parquet file: ~300 MB
Question
What could explain the consistent performance advantage of the Microsoft Store Python over the official Python.org build on Windows for this workload? Any guidance on likely causes or what to check next is appreciated.
To Reproduce
Repro snippet
import duckdb
import pytz  # imported because fetching now() needs it (see below)

dbfile = "C:\\temp\\test\\db.duckdb"
output_path = "C:\\temp\\test\\test.parquet"

with duckdb.connect(database=dbfile) as conn:
    c = conn.execute("select count(*) from large_table;").fetchall()
    print(f" exporting {c[0][0]} rows")

    # Timestamps are taken from DuckDB itself, before and after the COPY.
    copy_start_time = conn.execute("select now();").fetchone()[0]  # needs pytz
    conn.sql(f"""
        COPY (
            select
                *
            from large_table
            order by col1,col2,col3,col4
        )
        TO '{output_path}'
        WITH (FORMAT PARQUET);
    """)
    copy_end_time = conn.execute("select now();").fetchone()[0]
    copy_duration = copy_end_time - copy_start_time
    print(f" exported in {copy_duration}")
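To make the two runs easier to compare, it may help to record which interpreter actually executed the script and to time the export with a monotonic clock instead of `now()`. The following is a minimal sketch using only the standard library; `describe_interpreter` and `time_call` are hypothetical helper names, not part of DuckDB, and the `sum(...)` call is only a stand-in for the real export.

```python
import platform
import sys
import time


def describe_interpreter():
    """Return details that distinguish one Python install from another."""
    return {
        "executable": sys.executable,  # path of the running interpreter
        "version": sys.version,        # full version and build string
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
    }


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds) from a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
    # Stand-in workload (assumption): in the repro this would wrap the COPY call.
    _, elapsed = time_call(sum, range(1_000_000))
    print(f"elapsed: {elapsed:.3f} s")
```

Running this under both Python installs and attaching the output would show whether the two runs differ in anything besides the installer, such as the interpreter path or build string.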
OS:
Windows 11 Pro 24H2
DuckDB Package Version:
1.4.0
Python Version:
3.13.7
Full Name:
H M
Affiliation:
None
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration to reproduce the issue?
Yes, I have