Commit f084653

Python build instruction adjustments
1 parent c7f27f9 commit f084653

1 file changed

+61
-52
lines changed

docs/stable/dev/building/python.md

Lines changed: 61 additions & 52 deletions
@@ -5,26 +5,26 @@ redirect_from:
55
title: Python
66
---
77

8-
The DuckDB Python package lives in the main [DuckDB source on Github](https://github.com/duckdb/duckdb/) under the `/tools/pythonpkg/` folder. It uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings with DuckDB.
8+
The DuckDB Python package lives in the main [DuckDB source on GitHub](https://github.com/duckdb/duckdb/) under the `/tools/pythonpkg/` folder. It uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings with DuckDB.
99

1010
## Prerequisites
1111

1212
For everything described on this page we make the following assumptions:
1313

14-
1. You have a working copy of the duckdb source (including the git tags) and you run commands from the root of the source
15-
2. You have a suitable Python installation available in a dedicated virtual env
14+
1. You have a working copy of the DuckDB source (including the git tags) and you run commands from the root of the source.
15+
2. You have a suitable Python installation available in a dedicated virtual environment.
1616

17-
### 1. DuckDB code
17+
### 1. DuckDB Repository
1818

1919
Make sure you have checked out the [DuckDB source](https://github.com/duckdb/duckdb/) and that you are in its root. E.g.:
2020

2121
```batch
22-
$ git clone https://github.com/duckdb/duckdb.git
22+
git clone https://github.com/duckdb/duckdb
2323
...
24-
$ cd duckdb
24+
cd duckdb
2525
```
2626

27-
If you've _forked_ DuckDB you may run into trouble when building the Python package when you haven't pulled in the tags.
27+
If you've _forked_ DuckDB, you may run into trouble when building the Python package when you haven't pulled in the tags.
2828

2929
```batch
3030
# Check your remotes
@@ -38,51 +38,51 @@ git fetch --tags upstream
3838
git push --tags
3939
```
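One way to check for missing tags before building is to list them from a script. The following is a minimal sketch, not part of the DuckDB tooling: `parse_tag_listing` and `list_git_tags` are hypothetical helper names, and it assumes `git` is available on your `PATH`.

```python
import subprocess


def parse_tag_listing(listing: str) -> list[str]:
    """Split the output of `git tag --list` into individual tag names."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def list_git_tags(repo: str = ".") -> list[str]:
    """Return the tags known to the local clone at `repo` (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo, "tag", "--list"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tag_listing(result.stdout)
```

If `list_git_tags()` returns an empty list in your working copy, the tags most likely still need to be fetched from upstream.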
4040

41-
### 2. Python Virtual Env
41+
### 2. Python Virtual Environment
4242

4343
For everything described here you will need a suitable Python installation. While you technically might be able to use your system Python, we **strongly** recommend you use a Python virtual environment. A virtual environment isolates dependencies and, depending on the tooling you use, gives you control over which Python interpreter you use. This way you don't pollute your system-wide Python with the different packages you need for your projects.
4444

45-
While we use Python's built-in `venv` module in our examples below, and technically this might (or migth not!) work for you, we also **strongly** recommend use a tool like [astral uv](https://docs.astral.sh/uv/) (or Poetry, conda, etc) that allows you to manage _both_ Python interpreter versions and virtual environments.
45+
While we use Python's built-in `venv` module in our examples below, and technically this might (or might not!) work for you, we also **strongly** recommend using a tool like [astral uv](https://docs.astral.sh/uv/) (or Poetry, conda, etc.) that allows you to manage _both_ Python interpreter versions and virtual environments.
4646

4747
Create and activate a virtual env as follows:
4848

4949
```batch
5050
# Create a virtual environment in the .venv folder (in the duckdb source root)
51-
$ python3 -m venv --prompt duckdb .venv
51+
python3 -m venv --prompt duckdb .venv
5252
5353
# Activate the virtual env
54-
$ source .venv/bin/activate
54+
source .venv/bin/activate
5555
```
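To confirm that the activated interpreter really belongs to a virtual environment, you can compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch that covers environments created with `venv` (conda environments are not detected this way); `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(f"interpreter: {sys.executable}")
print(f"inside a virtual environment: {in_virtualenv()}")
```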
5656

57-
Make sure you have a modern enough version of pip available in your virtual env:
57+
Make sure you have a modern enough version of `pip` available in your virtual env:
5858

5959
```batch
6060
# Upgrade pip to the latest version
61-
$ python3 -m pip install --upgrade pip
61+
python3 -m pip install --upgrade pip
6262
```
6363

6464
If that fails with `No module named pip` and you use `uv`, then run:
6565

6666
```batch
6767
# Install pip
68-
$ uv pip install pip
68+
uv pip install pip
6969
```
7070

71-
## Building From Source
71+
## Building from Source
7272

73-
Below are a number of options to build the python library from source, with or without debug symbols, and with a default or custom set of [extensions]({% link docs/stable/extensions/overview.md %}). Make sure to check out the [DuckDB build documentation]({% link docs/stable/dev/building/overview.md %}) if you run into trouble building the DuckDB main library.
73+
Below are a number of options to build the Python library from source, with or without debug symbols, and with a default or custom set of [extensions]({% link docs/stable/extensions/overview.md %}). Make sure to check out the [DuckDB build documentation]({% link docs/stable/dev/building/overview.md %}) if you run into trouble building the DuckDB main library.
7474

75-
### Default release, debug build or cloud storage
75+
### Default Release, Debug Build or Cloud Storage
7676

77-
The following will build the package with the default set of extensions (json, parquet, icu and core_function).
77+
The following will build the package with the default set of extensions (json, parquet, icu and core_functions).
7878

79-
#### Release build
79+
#### Release Build
8080

8181
```batch
8282
GEN=ninja BUILD_PYTHON=1 make release
8383
```
8484

85-
#### Debug build
85+
#### Debug Build
8686

8787
```batch
8888
GEN=ninja BUILD_PYTHON=1 make debug
@@ -94,17 +94,18 @@ GEN=ninja BUILD_PYTHON=1 make debug
9494
python3 -c "import duckdb; print(duckdb.sql('SELECT 42').fetchall())"
9595
```
9696

97-
### Adding extensions
97+
### Adding Extensions
9898

99-
Before thinking about statically linking extensions you should know that the Python package currently doesn't handle linked in extensions very well. If you don't really need to have an extension baked in than the advice is to just stick to [installing them at runtime]({% link docs/stable/extensions/installing_extensions.md %}). See `tools/pythonpkg/duckdb_extension_config.cmake` for the default list of extensions that are built with the python package. Any other extension should be considered problematic.
99+
Before thinking about statically linking extensions, you should know that the Python package currently doesn't handle linked-in extensions very well. If you don't really need to have an extension baked in, then the advice is to just stick to [installing them at runtime]({% link docs/stable/extensions/installing_extensions.md %}). See `tools/pythonpkg/duckdb_extension_config.cmake` for the default list of extensions that are built with the Python package. Any other extension should be considered problematic.
100100

101101
Having said that, if you do want to give it a try, here's how.
102102

103-
For more details on building DuckDB extensions look at the [documentation]({% link docs/stable/dev/building/building_extensions.md %}).
103+
> For more details on building DuckDB extensions look at the [documentation]({% link docs/stable/dev/building/building_extensions.md %}).
104104
105105
The DuckDB build process follows the following logic for building extensions:
106-
1. First compose the complete set of extensions that might be included in the build
107-
1. Then compose the complete set of extensions that should be excluded from the build
106+
107+
1. First compose the complete set of extensions that might be included in the build.
108+
1. Then compose the complete set of extensions that should be excluded from the build.
108109
1. Assemble the final set of extensions to be compiled by subtracting the set of excluded extensions from the set of included extensions.
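The three steps above amount to a set difference. The following sketch only illustrates that logic; it is not how the DuckDB build system is implemented, and the function name and example inputs are hypothetical.

```python
def final_extension_set(included: set[str], excluded: set[str]) -> set[str]:
    """Extensions to compile: everything included minus everything excluded."""
    return included - excluded


# Hypothetical inputs, for illustration only.
included = {"json", "parquet", "icu", "httpfs"}
excluded = {"httpfs"}
print(sorted(final_extension_set(included, excluded)))
# → ['icu', 'json', 'parquet']
```

Note that an exclusion always wins: an extension that appears in both sets is not compiled.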
109110

110111
The following mechanisms add to the set of **_included_ extensions**:
@@ -131,7 +132,7 @@ The following mechanisms add to the set of **_excluded_ extensions**:
131132

132133
---
133134

134-
### Show all installed extensions
135+
### Show All Installed Extensions
135136

136137
```batch
137138
python3 -c "import duckdb; print(duckdb.sql('SELECT extension_name, installed, description FROM duckdb_extensions();'))"
@@ -155,18 +156,20 @@ GEN=ninja BUILD_PYTHON=1 PYTHON_DEV=1 make debug
155156
```
156157

157158
This will take care of the following:
158-
* Builds both the main duckdb library and the python library with debug symbols
159-
* Generates a `compile-commands.json` file that includes CPython and pybind11 headers so that intellisense and clang-tidy checks work in your IDE
160-
* Installs the required Python dependencies in your virtual env
159+
160+
* Builds both the main DuckDB library and the Python library with debug symbols.
161+
* Generates a `compile-commands.json` file that includes CPython and pybind11 headers so that intellisense and clang-tidy checks work in your IDE.
162+
* Installs the required Python dependencies in your virtual env.
161163

162164
Once the build completes, do a sanity check to make sure everything works:
163165

164166
```batch
165167
python3 -c "import duckdb; print(duckdb.sql('SELECT 42').fetchall())"
166168
```
167169

168-
To debug, the basic recipe is to start `lldb` with your virtual env's Python interpreter and your script, then set a breakpoint and run your script.
170+
### Debugging
169171

172+
The basic recipe is to start `lldb` with your virtual env's Python interpreter and your script, then set a breakpoint and run your script.
170173
For example, given a script `dataframe.df` with the following contents:
171174

172175
```python
@@ -178,7 +181,9 @@ The following should work:
178181

179182
```batch
180183
lldb -- .venv/bin/python3 my_script.py
181-
...
184+
```
185+
186+
```batch
182187
# Set a breakpoint
183188
(lldb) br s -n duckdb::DuckDBPyRelation::FetchDF
184189
Breakpoint 1: no locations (pending).
@@ -207,63 +212,67 @@ You should be able to get debugging going in an IDE that support `lldb`. Below a
207212

208213
The following CMake profile enables Intellisense and clang-tidy by generating a `compile-commands.json` file so your IDE knows how to inspect the source code, and makes sure that the Python package will be built and installed in your Python virtual env.
209214

210-
Under `Settings | Build, Execution, Deployment | CMake`, add a profile and set the fields as follows:
215+
Under **Settings** | **Build, Execution, Deployment** | **CMake**, add a profile and set the fields as follows:
211216

212217
* **Name**: Debug
213218
* **Build type**: Debug
214219
* **Generator**: Ninja
215220
* **CMake Options** (on a single line):
216-
```console
217-
-DCMAKE_PREFIX_PATH=$CMakeProjectDir$/.venv;$CMAKE_PREFIX_PATH
218-
-DPython3_EXECUTABLE=$CMakeProjectDir$/.venv/bin/python3
219-
-DBUILD_PYTHON=1
220-
-DPYTHON_DEV=1
221-
```
222221

223-
#### Create a run config for debugging
222+
```console
223+
-DCMAKE_PREFIX_PATH=$CMakeProjectDir$/.venv;$CMAKE_PREFIX_PATH
224+
-DPython3_EXECUTABLE=$CMakeProjectDir$/.venv/bin/python3
225+
-DBUILD_PYTHON=1
226+
-DPYTHON_DEV=1
227+
```
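Because the options have to be entered on a single line, it can help to keep them as a list and join them before pasting. This is a small sketch assuming the IDE field accepts space-separated arguments; the variable names are illustrative only.

```python
# The CMake options from the profile above, one per entry.
cmake_options = [
    "-DCMAKE_PREFIX_PATH=$CMakeProjectDir$/.venv;$CMAKE_PREFIX_PATH",
    "-DPython3_EXECUTABLE=$CMakeProjectDir$/.venv/bin/python3",
    "-DBUILD_PYTHON=1",
    "-DPYTHON_DEV=1",
]

# Join with single spaces to get the one-line form for the IDE field.
single_line = " ".join(cmake_options)
print(single_line)
```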
228+
229+
#### Create a Run Config for Debugging
230+
231+
Under **Run** | **Edit Configurations...** create a new **CMake Application**. Use the following values:
224232

225-
Under Run -> Edit Configurations... create a new CMake Application. Use the following values:
226233
* **Name**: Python Debug
227234
* **Target**: `All targets`
228235
* **Executable**: `[ABS_PATH_TO_YOUR_VENV]/bin/python3` (careful: this is a symlink and sometimes an IDE might try and follow it and fill in the path to the actual executable, but that will not work)
229236
* **Program arguments**: `$FilePath$`
230237
* **Working directory**: `$ProjectFileDir$`
231238
* **Before Launch**: `Build` (this should already be set)
232239

233-
That should be enough: Save and close.
240+
That should be enough: save and close.
234241

235242
Now you can set a breakpoint in a C++ file. Then open your Python script in your editor and run the `Python Debug` configuration in debug mode.
236243

237244
### Development and Stubs
238245

239246
`*.pyi` stubs in `duckdb-stubs` are manually maintained. The connection-related stubs are generated using dedicated scripts in `tools/pythonpkg/scripts/`:
247+
240248
* `generate_connection_stubs.py`
241249
* `generate_connection_wrapper_stubs.py`
242250

243251
These stubs are important for autocomplete in many IDEs, as static-analysis based language servers can't introspect `duckdb`'s binary module.
244252

245253
To verify the stubs match the actual implementation:
254+
246255
```batch
247256
python3 -m pytest tests/stubs
248257
```
249258

250259
If you add new methods to the DuckDB Python API, you'll need to manually add corresponding type hints to the stub files.
251260

252-
### What are py::objects and a py::handles??
261+
### What are py::objects and py::handles?
253262

254-
These are classes provided by pybind11, the library we use to manage our interaction with the python environment.
255-
py::handle is a direct wrapper around a raw PyObject* and does not manage any references.
256-
py::object is similar to py::handle but it can handle refcounts.
263+
These are classes provided by pybind11, the library we use to manage our interaction with the Python environment.
264+
`py::handle` is a direct wrapper around a raw PyObject* and does not manage any references.
265+
`py::object` is similar to py::handle but it can handle refcounts.
257266

258-
I say *can* because it doesn't have to, using `py::reinterpret_borrow<py::object>(...)` we can create a non-owning py::object, this is essentially just a py::handle but py::handle can't be used if the prototype requires a py::object.
267+
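I say *can* because a `py::object` doesn't have to take ownership of the reference it is given. Using `py::reinterpret_borrow<py::object>(...)` we can create a `py::object` from a borrowed reference: it increases the refcount on construction and decreases it again when it goes out of scope, leaving the caller's reference untouched. This is handy because a `py::handle` can't be used if the prototype requires a `py::object`.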
259268

260-
`py::reinterpret_steal<py::object>(...)` creates an owning py::object, this will increase the refcount of the python object and will decrease the refcount when the py::object goes out of scope.
269+
`py::reinterpret_steal<py::object>(...)` creates an owning `py::object` by taking over an existing reference: it does not increase the refcount of the Python object, but it does decrease the refcount when the `py::object` goes out of scope.
261270

262271
When directly interacting with python functions that return a `PyObject*`, such as `PyDateTime_DATE_GET_TZINFO`, you should generally wrap the call in `py::reinterpret_steal` to take ownership of the returned object.
263272

264273
## Troubleshooting
265274

266-
### Pip fails with `No names found, cannot describe anything`
275+
### Pip Fails with `No names found, cannot describe anything`
267276

268277
If you've forked DuckDB you may run into trouble when building the Python package when you haven't pulled in the tags.
269278

@@ -279,7 +288,7 @@ git fetch --tags upstream
279288
git push --tags
280289
```
281290

282-
### Building with the httpfs extension Fails
291+
### Building with the httpfs Extension Fails
283292

284293
The build fails on OSX when both the [`httpfs` extension]({% link docs/stable/extensions/httpfs/overview.md %}) and the Python package are included:
285294

@@ -293,17 +302,17 @@ make: *** [release] Error 1
293302

294303
Linking in the httpfs extension is problematic. Please install it at runtime, if you can.
295304

296-
### Importing duckdb fails with `symbol not found in flat namespace`
305+
### Importing DuckDB Fails with `symbol not found in flat namespace`
297306

298307
If you see an error that looks like this:
299308

300309
```console
301310
ImportError: dlopen(/usr/bin/python3/site-packages/duckdb/duckdb.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_MD5_Final'
302311
```
303312

304-
... then you've probably tried to link in a problematic extension. As mentioned above: `tools/pythonpkg/duckdb_extension_config.cmake` contains the default list of extensions that are built with the python package. Any other extension might cause problems.
313+
... then you've probably tried to link in a problematic extension. As mentioned above: `tools/pythonpkg/duckdb_extension_config.cmake` contains the default list of extensions that are built with the Python package. Any other extension might cause problems.
305314

306-
### Python fails with `No module named 'duckdb.duckdb'`
315+
### Python Fails with `No module named 'duckdb.duckdb'`
307316

308317
If you're in `tools/pythonpkg` and try to `import duckdb` you might see:
309318
