docs/stable/dev/building/python.md
Lines changed: 61 additions & 52 deletions
@@ -5,26 +5,26 @@ redirect_from:
5
5
title: Python
6
6
---
7
7
8
-
The DuckDB Python package lives in the main [DuckDB source on Github](https://github.com/duckdb/duckdb/) under the `/tools/pythonpkg/` folder. It uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings with DuckDB.
8
+
The DuckDB Python package lives in the main [DuckDB source on GitHub](https://github.com/duckdb/duckdb/) under the `/tools/pythonpkg/` folder. It uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings with DuckDB.
9
9
10
10
## Prerequisites
11
11
12
12
For everything described on this page we make the following assumptions:
13
13
14
-
1. You have a working copy of the duckdb source (including the git tags) and you run commands from the root of the source
15
-
2. You have a suitable Python installation available in a dedicated virtual env
14
+
1. You have a working copy of the DuckDB source (including the git tags) and you run commands from the root of the source.
15
+
2. You have a suitable Python installation available in a dedicated virtual environment.
16
16
17
-
### 1. DuckDB code
17
+
### 1. DuckDB Repository
18
18
19
19
Make sure you have checked out the [DuckDB source](https://github.com/duckdb/duckdb/) and that you are in its root. E.g.:
20
20
21
21
```batch
22
-
$ git clone https://github.com/duckdb/duckdb.git
22
+
git clone https://github.com/duckdb/duckdb
23
23
...
24
-
$ cd duckdb
24
+
cd duckdb
25
25
```
26
26
27
-
If you've _forked_ DuckDB you may run into trouble when building the Python package when you haven't pulled in the tags.
27
+
If you've _forked_ DuckDB, you may run into trouble building the Python package if you haven't pulled in the tags.
28
28
29
29
```batch
30
30
# Check your remotes
@@ -38,51 +38,51 @@ git fetch --tags upstream
38
38
git push --tags
39
39
```
40
40
41
-
### 2. Python Virtual Env
41
+
### 2. Python Virtual Environment
42
42
43
43
For everything described here you will need a suitable Python installation. While you technically might be able to use your system Python, we **strongly** recommend you use a Python virtual environment. A virtual environment isolates dependencies and, depending on the tooling you use, gives you control over which Python interpreter you use. This way you don't pollute your system-wide Python with the different packages you need for your projects.
44
44
45
-
While we use Python's built-in `venv` module in our examples below, and technically this might (or migth not!) work for you, we also **strongly** recommend use a tool like [astral uv](https://docs.astral.sh/uv/) (or Poetry, conda, etc) that allows you to manage _both_ Python interpreter versions and virtual environments.
45
+
While we use Python's built-in `venv` module in our examples below, and technically this might (or might not!) work for you, we also **strongly** recommend using a tool like [astral uv](https://docs.astral.sh/uv/) (or Poetry, conda, etc.) that allows you to manage _both_ Python interpreter versions and virtual environments.
46
46
47
47
Create and activate a virtual env as follows:
48
48
49
49
```batch
50
50
# Create a virtual environment in the .venv folder (in the duckdb source root)
51
-
$ python3 -m venv --prompt duckdb .venv
51
+
python3 -m venv --prompt duckdb .venv
52
52
53
53
# Activate the virtual env
54
-
$ source .venv/bin/activate
54
+
source .venv/bin/activate
55
55
```
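To confirm that the activated interpreter really is the one inside the virtual environment, a quick check like the following can help. This is a minimal sketch that assumes the environment was created with `venv` (as above); the helper name `in_virtual_env` is made up for illustration.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual env active:", in_virtual_env())
```

If this reports `False` after activation, the shell is probably still picking up a different `python3` from your `PATH`.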
56
56
57
-
Make sure you have a modern enough version of pip available in your virtual env:
57
+
Make sure you have a modern enough version of `pip` available in your virtual env:
58
58
59
59
```batch
60
60
# Upgrade pip to the latest version
61
-
$ python3 -m pip install --upgrade pip
61
+
python3 -m pip install --upgrade pip
62
62
```
63
63
64
64
If that fails with `No module named pip` and you use `uv`, then run:
65
65
66
66
```batch
67
67
# Install pip
68
-
$ uv pip install pip
68
+
uv pip install pip
69
69
```
70
70
71
-
## Building From Source
71
+
## Building from Source
72
72
73
-
Below are a number of options to build the python library from source, with or without debug symbols, and with a default or custom set of [extensions]({% link docs/stable/extensions/overview.md %}). Make sure to check out the [DuckDB build documentation]({% link docs/stable/dev/building/overview.md %}) if you run into trouble building the DuckDB main library.
73
+
Below are a number of options to build the Python library from source, with or without debug symbols, and with a default or custom set of [extensions]({% link docs/stable/extensions/overview.md %}). Make sure to check out the [DuckDB build documentation]({% link docs/stable/dev/building/overview.md %}) if you run into trouble building the DuckDB main library.
74
74
75
-
### Default release, debug build or cloud storage
75
+
### Default Release, Debug Build or Cloud Storage
76
76
77
-
The following will build the package with the default set of extensions (json, parquet, icu and core_function).
77
+
The following will build the package with the default set of extensions (json, parquet, icu and core_functions).
78
78
79
-
#### Release build
79
+
#### Release Build
80
80
81
81
```batch
82
82
GEN=ninja BUILD_PYTHON=1 make release
83
83
```
84
84
85
-
#### Debug build
85
+
#### Debug Build
86
86
87
87
```batch
88
88
GEN=ninja BUILD_PYTHON=1 make debug
@@ -94,17 +94,18 @@ GEN=ninja BUILD_PYTHON=1 make debug
Before thinking about statically linking extensions you should know that the Python package currently doesn't handle linked in extensions very well. If you don't really need to have an extension baked in than the advice is to just stick to [installing them at runtime]({% link docs/stable/extensions/installing_extensions.md %}). See `tools/pythonpkg/duckdb_extension_config.cmake` for the default list of extensions that are built with the python package. Any other extension should be considered problematic.
99
+
Before thinking about statically linking extensions, you should know that the Python package currently doesn't handle linked-in extensions very well. If you don't really need to have an extension baked in, then the advice is to just stick to [installing them at runtime]({% link docs/stable/extensions/installing_extensions.md %}). See `tools/pythonpkg/duckdb_extension_config.cmake` for the default list of extensions that are built with the Python package. Any other extension should be considered problematic.
100
100
101
101
Having said that, if you do want to give it a try, here's how.
102
102
103
-
For more details on building DuckDB extensions look at the [documentation]({% link docs/stable/dev/building/building_extensions.md %}).
103
+
> For more details on building DuckDB extensions look at the [documentation]({% link docs/stable/dev/building/building_extensions.md %}).
104
104
105
105
The DuckDB build process follows the following logic for building extensions:
106
-
1. First compose the complete set of extensions that might be included in the build
107
-
1. Then compose the complete set of extensions that should be excluded from the build
106
+
107
+
1. First compose the complete set of extensions that might be included in the build.
108
+
1. Then compose the complete set of extensions that should be excluded from the build.
108
109
1. Assemble the final set of extensions to be compiled by subtracting the set of excluded extensions from the set of included extensions.
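The three steps above amount to a set difference. The sketch below only illustrates that logic; it is not the actual CMake implementation, and the function name and the example extension lists are assumptions chosen for illustration.

```python
def final_extension_set(included: set[str], excluded: set[str]) -> set[str]:
    """Return the extensions to compile: everything included minus everything excluded."""
    return included - excluded


# Illustrative inputs only: the defaults plus one extra extension that is
# then explicitly excluded again.
included = {"json", "parquet", "icu", "core_functions", "httpfs"}
excluded = {"httpfs"}

print(sorted(final_extension_set(included, excluded)))
```

Because exclusion is applied last, an extension that appears in both sets is never built, regardless of which mechanism included it.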
109
110
110
111
The following mechanisms add to the set of **_included_ extensions**:
@@ -131,7 +132,7 @@ The following mechanisms add to the set of **_excluded_ extensions**:
131
132
132
133
---
133
134
134
-
### Show all installed extensions
135
+
### Show All Installed Extensions
135
136
136
137
```batch
137
138
python3 -c "import duckdb; print(duckdb.sql('SELECT extension_name, installed, description FROM duckdb_extensions();'))"
@@ -155,18 +156,20 @@ GEN=ninja BUILD_PYTHON=1 PYTHON_DEV=1 make debug
155
156
```
156
157
157
158
This will take care of the following:
158
-
* Builds both the main duckdb library and the python library with debug symbols
159
-
* Generates a `compile-commands.json` file that includes CPython and pybind11 headers so that intellisense and clang-tidy checks work in your IDE
160
-
* Installs the required Python dependencies in your virtual env
159
+
160
+
* Builds both the main DuckDB library and the Python library with debug symbols.
161
+
* Generates a `compile_commands.json` file that includes CPython and pybind11 headers so that IntelliSense and clang-tidy checks work in your IDE.
162
+
* Installs the required Python dependencies in your virtual env.
161
163
162
164
Once the build completes, do a sanity check to make sure everything works:
To debug, the basic recipe is to start `lldb` with your virtual env's Python interpreter and your script, then set a breakpoint and run your script.
170
+
### Debugging
169
171
172
+
The basic recipe is to start `lldb` with your virtual env's Python interpreter and your script, then set a breakpoint and run your script.
170
173
For example, given a script `my_script.py` with the following contents:
171
174
172
175
```python
@@ -178,7 +181,9 @@ The following should work:
178
181
179
182
```batch
180
183
lldb -- .venv/bin/python3 my_script.py
181
-
...
184
+
```
185
+
186
+
```batch
182
187
# Set a breakpoint
183
188
(lldb) br s -n duckdb::DuckDBPyRelation::FetchDF
184
189
Breakpoint 1: no locations (pending).
@@ -207,63 +212,67 @@ You should be able to get debugging going in an IDE that support `lldb`. Below a
207
212
208
213
The following CMake profile enables IntelliSense and clang-tidy by generating a `compile_commands.json` file so your IDE knows how to inspect the source code, and makes sure that the Python package will be built and installed in your Python virtual env.
209
214
210
-
Under `Settings | Build, Execution, Deployment | CMake`, add a profile and set the fields as follows:
215
+
Under **Settings** | **Build, Execution, Deployment** | **CMake**, add a profile and set the fields as follows:
Under **Run** | **Edit Configurations...** create a new **CMake Application**. Use the following values:
224
232
225
-
Under Run -> Edit Configurations... create a new CMake Application. Use the following values:
226
233
* **Name**: Python Debug
227
234
* **Target**: `All targets`
228
235
* **Executable**: `[ABS_PATH_TO_YOUR_VENV]/bin/python3` (careful: this is a symlink and sometimes an IDE might try and follow it and fill in the path to the actual executable, but that will not work)
229
236
* **Program arguments**: `$FilePath$`
230
237
* **Working directory**: `$ProjectFileDir$`
231
238
* **Before Launch**: `Build` (this should already be set)
232
239
233
-
That should be enough: Save and close.
240
+
That should be enough: save and close.
234
241
235
242
Now you can set a breakpoint in a C++ file. Then open your Python script in the editor and run the `Python Debug` configuration in debug mode.
236
243
237
244
### Development and Stubs
238
245
239
246
`*.pyi` stubs in `duckdb-stubs` are manually maintained. The connection-related stubs are generated using dedicated scripts in `tools/pythonpkg/scripts/`:
247
+
240
248
* `generate_connection_stubs.py`
241
249
* `generate_connection_wrapper_stubs.py`
242
250
243
251
These stubs are important for autocomplete in many IDEs, as static-analysis based language servers can't introspect `duckdb`'s binary module.
244
252
245
253
To verify the stubs match the actual implementation:
254
+
246
255
```batch
247
256
python3 -m pytest tests/stubs
248
257
```
249
258
250
259
If you add new methods to the DuckDB Python API, you'll need to manually add corresponding type hints to the stub files.
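As a rough idea of what such a hand-written stub entry looks like, here is a minimal sketch. The method `my_new_method` and its parameters are hypothetical and exist only to show the shape of an entry; check the existing files in `duckdb-stubs` for the conventions actually in use.

```python
from typing import Optional


class DuckDBPyConnection:
    # Hypothetical method, used only to illustrate the shape of a stub entry.
    # In a .pyi file the body is just `...`; only the signature matters.
    def my_new_method(
        self, name: str, limit: Optional[int] = None
    ) -> "DuckDBPyConnection": ...
```

A stub declares names, parameter types and return types without any implementation, which is what static-analysis tools read instead of the binary module.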
251
260
252
-
### What are py::objects and a py::handles??
261
+
### What Are `py::object` and `py::handle`?
253
262
254
-
These are classes provided by pybind11, the library we use to manage our interaction with the python environment.
255
-
py::handle is a direct wrapper around a raw PyObject* and does not manage any references.
256
-
py::object is similar to py::handle but it can handle refcounts.
263
+
These are classes provided by pybind11, the library we use to manage our interaction with the Python environment.
264
+
`py::handle` is a direct wrapper around a raw `PyObject*` and does not manage any references.
265
+
`py::object` is similar to `py::handle` but it can handle refcounts.
257
266
258
-
I say *can* because it doesn't have to, using `py::reinterpret_borrow<py::object>(...)` we can create a non-owning py::object, this is essentially just a py::handle but py::handle can't be used if the prototype requires a py::object.
267
+
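How a `py::object` acquires its reference depends on how it is created. `py::reinterpret_borrow<py::object>(...)` wraps a borrowed reference: it increases the refcount on construction and decreases it again when the `py::object` goes out of scope, so the original reference is left untouched. This is useful when you only hold a `py::handle` but the prototype requires a `py::object`.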
259
268
260
-
`py::reinterpret_steal<py::object>(...)` creates an owning py::object, this will increase the refcount of the python object and will decrease the refcount when the py::object goes out of scope.
269
+
`py::reinterpret_steal<py::object>(...)` creates an owning `py::object` that takes over an existing reference: it does not increase the refcount of the Python object, but it does decrease the refcount when the `py::object` goes out of scope.
261
270
262
271
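When directly interacting with Python C API functions that return a `PyObject*`, such as `PyDateTime_DATE_GET_TZINFO`, check the C API documentation for whether the result is a new or a borrowed reference. Wrap a new reference in `py::reinterpret_steal` to take ownership of the returned object, and wrap a borrowed reference in `py::reinterpret_borrow`.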
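The reference count that these wrappers manage can be observed from the Python side with `sys.getrefcount`. The sketch below does not use pybind11 at all; it only shows, on CPython, that taking an additional reference raises the count by one and dropping it brings the count back, which is the bookkeeping an owning `py::object` performs in C++.

```python
import sys

obj = object()

# Absolute values from sys.getrefcount vary between interpreter versions,
# so only the differences between calls are meaningful here.
before = sys.getrefcount(obj)

holders = [obj]  # the list now holds one additional reference to obj
after_add = sys.getrefcount(obj)

holders.clear()  # dropping the reference restores the original count
after_drop = sys.getrefcount(obj)

print("added:", after_add - before, "after drop:", after_drop - before)
```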
263
272
264
273
## Troubleshooting
265
274
266
-
### Pip fails with `No names found, cannot describe anything`
275
+
### Pip Fails with `No names found, cannot describe anything`
267
276
268
277
If you've forked DuckDB you may run into trouble when building the Python package when you haven't pulled in the tags.
269
278
@@ -279,7 +288,7 @@ git fetch --tags upstream
279
288
git push --tags
280
289
```
281
290
282
-
### Building with the httpfs extension Fails
291
+
### Building with the httpfs Extension Fails
283
292
284
293
The build fails on macOS when both the [`httpfs` extension]({% link docs/stable/extensions/httpfs/overview.md %}) and the Python package are included:
285
294
@@ -293,17 +302,17 @@ make: *** [release] Error 1
293
302
294
303
Linking in the httpfs extension is problematic. Please install it at runtime, if you can.
295
304
296
-
### Importing duckdb fails with `symbol not found in flat namespace`
305
+
### Importing DuckDB Fails with `symbol not found in flat namespace`
297
306
298
307
If you see an error that looks like this:
299
308
300
309
```console
301
310
ImportError: dlopen(/usr/bin/python3/site-packages/duckdb/duckdb.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_MD5_Final'
302
311
```
303
312
304
-
... then you've probably tried to link in a problematic extension. As mentioned above: `tools/pythonpkg/duckdb_extension_config.cmake` contains the default list of extensions that are built with the python package. Any other extension might cause problems.
313
+
... then you've probably tried to link in a problematic extension. As mentioned above: `tools/pythonpkg/duckdb_extension_config.cmake` contains the default list of extensions that are built with the Python package. Any other extension might cause problems.
305
314
306
-
### Python fails with `No module named 'duckdb.duckdb'`
315
+
### Python Fails with `No module named 'duckdb.duckdb'`
307
316
308
317
If you're in `tools/pythonpkg` and try to `import duckdb` you might see: