Closed
Changes from 79 commits
88 commits
fb4598f
chore: Add 3.14 and 3.14t builds: update GHA matrix, bump uv and cibu…
paultiq Sep 14, 2025
34d20e1
chore: remove pandas 3.0 warnings -> instead, disable pandas for 3.14…
paultiq Sep 14, 2025
68f30ec
test: Disable Pandas for 3.14
paultiq Sep 14, 2025
275b6f2
test: disable failing test "Windows fatal exception: access violation"
paultiq Sep 15, 2025
23a2f9b
tests: skip, don't xfail
paultiq Sep 15, 2025
699d9ab
exclude Windows
paultiq Sep 15, 2025
43007a8
tests: revert the skip since we're excluding Windows 3.14t builds ent…
paultiq Sep 15, 2025
4db9da1
revert: import that was added, no longer needed
paultiq Sep 15, 2025
2374987
revert: exactly to original
paultiq Sep 15, 2025
25fd097
test: Mark test xfail
paultiq Sep 15, 2025
556e32d
test: mark test xfail
paultiq Sep 15, 2025
f5ad9d5
Merge branch 'main' into ci314t
paultiq Sep 15, 2025
b617188
chore: Add comments and todo's for workflow changes
paultiq Sep 15, 2025
03d5eca
chore: Remove unused section for Windows 3.14t builds.
paultiq Sep 15, 2025
70e70ae
chore: Add version check to only allow no-Pandas for 3.14, plus a TODO
paultiq Sep 15, 2025
0c20b89
chore: enable sccache for builds and disable unneeded actions
paultiq Sep 15, 2025
d276412
fix: disable coverage_test, not packaging_test
paultiq Sep 15, 2025
b81bcba
direct install
paultiq Sep 15, 2025
635d859
fix: set ARCH for sccache download
paultiq Sep 15, 2025
15d1e76
fix: set CMAKE_C_COMPILER_LAUNCHER to avoid double sccache
paultiq Sep 15, 2025
54f1e5e
chore: Add pytest modules
paultiq Sep 15, 2025
e28cb93
disable unused uv cache
paultiq Sep 15, 2025
4f40596
windows settings
paultiq Sep 15, 2025
456cb39
enable cp313
paultiq Sep 15, 2025
5c33c44
feat: Move global state into a module state object, initialized via m…
paultiq Sep 15, 2025
f0fab76
all branches
paultiq Sep 15, 2025
69e08ea
add workflow dispatch
paultiq Sep 15, 2025
591f812
continue-on-error: true
paultiq Sep 15, 2025
f479b85
fix: missing PyErr_Clear()
paultiq Sep 16, 2025
ecae808
feat: py::mod_gil_not_used() - indicating that the extension is FT safe.
paultiq Sep 15, 2025
91fe9c2
feat: Initial free threading support, some more work needed on defaul…
paultiq Sep 16, 2025
3939ff1
add a comment
paultiq Sep 16, 2025
91bbb33
feat: add module state to connection to reduce lookups
paultiq Sep 16, 2025
fb67fe0
feat: Move global state into a module state object, initialized via m…
paultiq Sep 15, 2025
2784947
fix: missing PyErr_Clear()
paultiq Sep 16, 2025
554d950
feat: add module state to connection to reduce lookups
paultiq Sep 16, 2025
c7bc632
Merge branch 'ft2' of https://github.com/paultiq/duckdb-pythonf into …
paultiq Sep 16, 2025
3d253e4
chore: Add 3.14 and 3.14t builds: update GHA matrix, bump uv and cibu…
paultiq Sep 14, 2025
e315d53
chore: remove pandas 3.0 warnings -> instead, disable pandas for 3.14…
paultiq Sep 14, 2025
617b7bf
test: Disable Pandas for 3.14
paultiq Sep 14, 2025
88d3c15
test: disable failing test "Windows fatal exception: access violation"
paultiq Sep 15, 2025
650a10e
tests: skip, don't xfail
paultiq Sep 15, 2025
c4d4a27
exclude Windows
paultiq Sep 15, 2025
6889cc1
tests: revert the skip since we're excluding Windows 3.14t builds ent…
paultiq Sep 15, 2025
b5812b2
revert: import that was added, no longer needed
paultiq Sep 15, 2025
f909841
revert: exactly to original
paultiq Sep 15, 2025
7617f08
test: Mark test xfail
paultiq Sep 15, 2025
a18a1d3
test: mark test xfail
paultiq Sep 15, 2025
f636b37
chore: Add comments and todo's for workflow changes
paultiq Sep 15, 2025
3c22d7a
chore: Remove unused section for Windows 3.14t builds.
paultiq Sep 15, 2025
a0904f2
chore: Add version check to only allow no-Pandas for 3.14, plus a TODO
paultiq Sep 15, 2025
448cab4
chore: enable sccache for builds and disable unneeded actions
paultiq Sep 15, 2025
3f2c19c
fix: disable coverage_test, not packaging_test
paultiq Sep 15, 2025
7ea230b
direct install
paultiq Sep 15, 2025
9532ef4
fix: set ARCH for sccache download
paultiq Sep 15, 2025
d951f76
fix: set CMAKE_C_COMPILER_LAUNCHER to avoid double sccache
paultiq Sep 15, 2025
3546435
chore: Add pytest modules
paultiq Sep 15, 2025
08fba8c
disable unused uv cache
paultiq Sep 15, 2025
0b5b0ec
windows settings
paultiq Sep 15, 2025
bac920f
enable cp313
paultiq Sep 15, 2025
418f780
all branches
paultiq Sep 15, 2025
d29ba60
add workflow dispatch
paultiq Sep 15, 2025
1cf9ede
continue-on-error: true
paultiq Sep 15, 2025
f79d189
add a comment
paultiq Sep 16, 2025
1608a72
Merge branch 'ci314t_sccache' of https://github.com/paultiq/duckdb-py…
paultiq Sep 16, 2025
a3ff0ae
Make uv export quiet
paultiq Sep 16, 2025
149d58e
Remove randomly.
paultiq Sep 16, 2025
88354a1
xdist: Use tmp_path to avoid races.
paultiq Sep 16, 2025
bd5b24f
xdist: use tmp_path_factory
paultiq Sep 16, 2025
3185c0d
feat: use a static for now - too many references
paultiq Sep 16, 2025
13ce4ed
feat: py::mod_gil_not_used() - indicating that the extension is FT safe.
paultiq Sep 15, 2025
fa6af78
rebase
paultiq Sep 17, 2025
a77c15f
merge
paultiq Sep 17, 2025
a19570d
builds: set python_gil=0 for ft builds
paultiq Sep 17, 2025
54c7d00
fix: fix imports, clear,
paultiq Sep 17, 2025
2595cff
Merge branch 'ci314t_sccache' into free_threading_cl
paultiq Sep 17, 2025
a7e3830
update
paultiq Sep 17, 2025
50823cf
Merge branch 'sccache_ci' into free_threading_cl
paultiq Sep 17, 2025
4f57719
feat: move import cache to a direct object due to frequency of hits
paultiq Sep 18, 2025
a79bee4
ci: improve
paultiq Sep 18, 2025
7fb001d
revert errclear
paultiq Sep 19, 2025
36d6b11
add threading / concurrency tests
paultiq Sep 19, 2025
a63d296
add a set of threading test cases
paultiq Sep 19, 2025
445008a
tests: thread safety
paultiq Sep 19, 2025
6ea9e5e
dont force debug
paultiq Sep 19, 2025
2be3020
test: clean up tests
paultiq Sep 19, 2025
e0e398c
restore back to direct object
paultiq Sep 19, 2025
dc9d0f2
make default_connection_ptr private
paultiq Sep 19, 2025
7 changes: 4 additions & 3 deletions pyproject.toml
@@ -72,6 +72,7 @@ minimum-version = "0.10"
cmake.version = ">=3.29.0"
ninja.version = ">=1.10"
ninja.make-fallback = false
cmake.build-type = "Debug"
metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"

[tool.scikit-build.wheel]
@@ -106,11 +107,11 @@ cmake.define.CMAKE_C_FLAGS = "--coverage -O0"
cmake.define.CMAKE_SHARED_LINKER_FLAGS = "--coverage"

# Override: if we're in editable mode then make sure a build dir is set. Note that COVERAGE runs have their own
# build-dir, and we don't want to interfere with that. We're also disabling unity builds to help with debugging.
# build-dir, and we don't want to interfere with that.
[[tool.scikit-build.overrides]]
if.state = "editable"
if.env.COVERAGE = false
build-dir = "build/debug/"
build-dir = "build/$UV_PYTHON/"
editable.rebuild = true
editable.mode = "redirect"
cmake.build-type = "Debug"
@@ -379,4 +380,4 @@ manylinux-x86_64-image = "manylinux_2_28"
manylinux-pypy_x86_64-image = "manylinux_2_28"
manylinux-aarch64-image = "manylinux_2_28"
manylinux-pypy_aarch64-image = "manylinux_2_28"
enable = ["cpython-freethreading", "cpython-prerelease"]

1 change: 1 addition & 0 deletions src/duckdb_py/CMakeLists.txt
@@ -17,6 +17,7 @@ add_library(python_src OBJECT
duckdb_python.cpp
importer.cpp
map.cpp
module_state.cpp
path_like.cpp
pyconnection.cpp
pyexpression.cpp
35 changes: 31 additions & 4 deletions src/duckdb_py/duckdb_python.cpp
@@ -20,6 +20,7 @@
#include "duckdb_python/pybind11/conversions/python_udf_type_enum.hpp"
#include "duckdb_python/pybind11/conversions/python_csv_line_terminator_enum.hpp"
#include "duckdb/common/enums/statement_type.hpp"
#include "duckdb_python/module_state.hpp"

#include "duckdb.hpp"

@@ -31,6 +32,16 @@ namespace py = pybind11;

namespace duckdb {

// Private function to initialize module state
void InitializeModuleState(py::module_ &m) {
auto state_ptr = new DuckDBPyModuleState();
SetModuleState(state_ptr);

// https://pybind11.readthedocs.io/en/stable/advanced/misc.html#module-destructors
auto capsule = py::capsule(state_ptr, [](void *p) { delete static_cast<DuckDBPyModuleState *>(p); });
m.attr("__duckdb_state") = capsule;
}

enum PySQLTokenType : uint8_t {
PY_SQL_TOKEN_IDENTIFIER = 0,
PY_SQL_TOKEN_NUMERIC_CONSTANT,
@@ -1007,7 +1018,22 @@ static void RegisterExpectedResultType(py::handle &m) {
expected_return_type.export_values();
}

PYBIND11_MODULE(DUCKDB_PYTHON_LIB_NAME, m) { // NOLINT
// Only mark mod_gil_not_used for 3.14t or later
// This is to not add support for 3.13t
// Py_GIL_DISABLED check is not strictly necessary
#if defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030e0000
PYBIND11_MODULE(DUCKDB_PYTHON_LIB_NAME, m, py::mod_gil_not_used(),
py::multiple_interpreters::not_supported()) { // NOLINT
#else
PYBIND11_MODULE(DUCKDB_PYTHON_LIB_NAME, m,
py::multiple_interpreters::not_supported()) { // NOLINT
#endif

// Initialize module state completely during initialization
// PEP 489 wants calls for state to be module local, but currently
// static via g_module_state.
InitializeModuleState(m);

py::enum_<duckdb::ExplainType>(m, "ExplainType")
.value("STANDARD", duckdb::ExplainType::EXPLAIN_STANDARD)
.value("ANALYZE", duckdb::ExplainType::EXPLAIN_ANALYZE)
@@ -1046,9 +1072,10 @@ PYBIND11_MODULE(DUCKDB_PYTHON_LIB_NAME, m) { // NOLINT
m.attr("__version__") = std::string(DuckDB::LibraryVersion()).substr(1);
m.attr("__standard_vector_size__") = DuckDB::StandardVectorSize();
m.attr("__git_revision__") = DuckDB::SourceID();
m.attr("__interactive__") = DuckDBPyConnection::DetectAndGetEnvironment();
m.attr("__jupyter__") = DuckDBPyConnection::IsJupyter();
m.attr("__formatted_python_version__") = DuckDBPyConnection::FormattedPythonVersion();
auto &module_state = GetModuleState();
m.attr("__interactive__") = module_state.environment != PythonEnvironmentType::NORMAL;
m.attr("__jupyter__") = module_state.environment == PythonEnvironmentType::JUPYTER;
m.attr("__formatted_python_version__") = module_state.formatted_python_version;
m.def("default_connection", &DuckDBPyConnection::DefaultConnection,
"Retrieve the connection currently registered as the default to be used by the module");
m.def("set_default_connection", &DuckDBPyConnection::SetDefaultConnection,
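The guard `PY_VERSION_HEX >= 0x030e0000` in the hunk above compares against a packed version number: the top byte is the major version, the next byte the minor version, then the micro version. `0x03` is 3 and `0x0e` is 14, so the constant stands for 3.14.0. A minimal sketch of that decoding, using only the standard library; `PyVersion`, `DecodeVersionHex` and `AtLeast` are hypothetical helper names for illustration and are not part of CPython, pybind11 or this PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (illustration only): the first three fields packed
// into a PY_VERSION_HEX-style value.
struct PyVersion {
    unsigned major_version; // bits 24-31
    unsigned minor_version; // bits 16-23
    unsigned micro_version; // bits 8-15
};

// Unpack the major, minor and micro bytes. The low byte (release level and
// serial) is ignored here.
constexpr PyVersion DecodeVersionHex(std::uint32_t hex) {
    return PyVersion{(hex >> 24) & 0xFFu, (hex >> 16) & 0xFFu, (hex >> 8) & 0xFFu};
}

// Same comparison the preprocessor guard performs: true when `hex` is at or
// above <major>.<minor>.0, including pre-releases of that minor version.
constexpr bool AtLeast(std::uint32_t hex, unsigned major_version, unsigned minor_version) {
    return hex >= ((static_cast<std::uint32_t>(major_version) << 24) |
                   (static_cast<std::uint32_t>(minor_version) << 16));
}
```

Because the comparison is a plain integer comparison, any 3.14 alpha, beta or release candidate also satisfies the guard, while every 3.13 build (free-threaded or not) falls into the `#else` branch.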
65 changes: 65 additions & 0 deletions src/duckdb_py/include/duckdb_python/module_state.hpp
@@ -0,0 +1,65 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb_python/module_state.hpp
//
//
//===----------------------------------------------------------------------===//

#pragma once

#include "duckdb_python/pybind11/pybind_wrapper.hpp"
#include "duckdb/common/shared_ptr.hpp"
#include "duckdb/main/db_instance_cache.hpp"
#include "duckdb/main/database.hpp"
#include "duckdb_python/import_cache/python_import_cache.hpp"
#include "duckdb_python/pyconnection/pyconnection.hpp"
#include <pybind11/critical_section.h>

namespace duckdb {


// Module state structure to hold per-interpreter state
struct DuckDBPyModuleState {
// TODO: Make private / move behind a thread-safe accessor
shared_ptr<DuckDBPyConnection> default_connection_ptr;
mutex default_connection_mutex;

// Python environment tracking
PythonEnvironmentType environment = PythonEnvironmentType::NORMAL;
string formatted_python_version;

DuckDBPyModuleState();

shared_ptr<DuckDBPyConnection> GetDefaultConnection();
void SetDefaultConnection(shared_ptr<DuckDBPyConnection> connection);
void ClearDefaultConnection();

PythonImportCache* GetImportCache();
void ClearImportCache();

DBInstanceCache* GetInstanceCache();

static DuckDBPyModuleState& GetGlobalModuleState();
static void SetGlobalModuleState(DuckDBPyModuleState* state);

private:
PythonImportCache import_cache;
std::unique_ptr<DBInstanceCache> instance_cache;
#ifdef Py_GIL_DISABLED
py::object lock_object;
#endif

// Static module state cache for performance optimization
// TODO: Replace with proper per-interpreter state for multi-interpreter support
static DuckDBPyModuleState* g_module_state;

// Non-copyable
DuckDBPyModuleState(const DuckDBPyModuleState &) = delete;
DuckDBPyModuleState &operator=(const DuckDBPyModuleState &) = delete;
};

DuckDBPyModuleState &GetModuleState();
void SetModuleState(DuckDBPyModuleState *state);

} // namespace duckdb
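The header only keeps a raw `g_module_state` pointer, so ownership of the state object has to live somewhere else. In this PR that is the `py::capsule` created in `InitializeModuleState`, which pairs a `void *` with a destructor callback that casts back to the real type. The sketch below models that (pointer, destructor) pairing with the standard library only; `FakeModuleState`, `DestroyFakeState`, `ErasedOwner` and `MakeStateOwner` are hypothetical names, and `std::unique_ptr` stands in for the capsule rather than reproducing pybind11 behavior:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for DuckDBPyModuleState. It counts live instances so
// that the cleanup can be observed.
struct FakeModuleState {
    static int live_instances;
    FakeModuleState() { ++live_instances; }
    ~FakeModuleState() { --live_instances; }
    // Non-copyable, like the real state struct in the header above.
    FakeModuleState(const FakeModuleState &) = delete;
    FakeModuleState &operator=(const FakeModuleState &) = delete;
};
int FakeModuleState::live_instances = 0;

// Destructor callback: the owner only sees a void pointer, so the callback is
// the one place that still knows the concrete type.
void DestroyFakeState(void *p) {
    delete static_cast<FakeModuleState *>(p);
}

// Type-erased owner: a void pointer plus a plain function pointer.
using ErasedOwner = std::unique_ptr<void, void (*)(void *)>;

ErasedOwner MakeStateOwner() {
    return ErasedOwner(new FakeModuleState(), &DestroyFakeState);
}
```

When the owner goes out of scope the callback runs exactly once and the state is destroyed. In the real module the equivalent moment is when the capsule stored under `__duckdb_state` is garbage-collected, at which point `g_module_state` would be left dangling unless it is reset as well.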
21 changes: 10 additions & 11 deletions src/duckdb_py/include/duckdb_python/pyconnection/pyconnection.hpp
@@ -28,6 +28,7 @@

namespace duckdb {
struct BoundParameterData;
struct DuckDBPyModuleState;

enum class PythonEnvironmentType { NORMAL, INTERACTIVE, JUPYTER };

@@ -172,8 +173,7 @@ struct DuckDBPyConnection : public enable_shared_from_this<DuckDBPyConnection> {
case_insensitive_set_t registered_objects;

public:
explicit DuckDBPyConnection() {
}
DuckDBPyConnection();
~DuckDBPyConnection();

public:
@@ -190,9 +190,17 @@ struct DuckDBPyConnection : public enable_shared_from_this<DuckDBPyConnection> {
static std::string FormattedPythonVersion();
static shared_ptr<DuckDBPyConnection> DefaultConnection();
static void SetDefaultConnection(shared_ptr<DuckDBPyConnection> conn);
static shared_ptr<DuckDBPyConnection> GetDefaultConnection();
static void ClearDefaultConnection();
static void ClearImportCache();
static PythonImportCache *ImportCache();
static bool IsInteractive();

// Instance methods for optimized module state access
bool IsJupyterInstance() const;
bool IsInteractiveInstance() const;
std::string FormattedPythonVersionInstance() const;

unique_ptr<DuckDBPyRelation> ReadCSV(const py::object &name, py::kwargs &kwargs);

py::list ExtractStatements(const string &query);
@@ -337,11 +345,6 @@ struct DuckDBPyConnection : public enable_shared_from_this<DuckDBPyConnection> {
py::list ListFilesystems();
bool FileSystemIsRegistered(const string &name);

//! Default connection to an in-memory database
static DefaultConnectionHolder default_connection;
//! Caches and provides an interface to get frequently used modules+subtypes
static shared_ptr<PythonImportCache> import_cache;

static bool IsPandasDataframe(const py::object &object);
static PyArrowObjectType GetArrowType(const py::handle &obj);
static bool IsAcceptedArrowObject(const py::object &object);
@@ -357,10 +360,6 @@ struct DuckDBPyConnection : public enable_shared_from_this<DuckDBPyConnection> {
bool side_effects);
void RegisterArrowObject(const py::object &arrow_object, const string &name);
vector<unique_ptr<SQLStatement>> GetStatements(const py::object &query);

static PythonEnvironmentType environment;
static std::string formatted_python_version;
static void DetectEnvironment();
};

template <typename T>
126 changes: 126 additions & 0 deletions src/duckdb_py/module_state.cpp
@@ -0,0 +1,126 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb_python/module_state.cpp
//
//
//===----------------------------------------------------------------------===//

#include "duckdb_python/module_state.hpp"
#include <stdexcept>
#include <chrono>
#include <thread>

// Enable debug prints for performance analysis
#define DEBUG_MODULE_STATE 1

namespace duckdb {

// Forward declaration from pyconnection.cpp
void InstantiateNewInstance(DuckDB &db);

// Out-of-class definition of the static member declared in module_state.hpp
DuckDBPyModuleState *DuckDBPyModuleState::g_module_state = nullptr;

// Module state constructor
DuckDBPyModuleState::DuckDBPyModuleState() {
// Create caches
instance_cache = make_uniq<DBInstanceCache>();
// import_cache: direct object due to frequent calls

#ifdef Py_GIL_DISABLED
// Initialize lock object for critical sections
// TODO: Consider moving to finer-grained locks
lock_object = py::none();
#endif

// Detect the Python environment and version during initialization
// Moved from DuckDBPyConnection::DetectEnvironment()
py::module_ sys = py::module_::import("sys");
py::object version_info = sys.attr("version_info");
int major = py::cast<int>(version_info.attr("major"));
int minor = py::cast<int>(version_info.attr("minor"));
formatted_python_version = std::to_string(major) + "." + std::to_string(minor);

// If __main__ does not have a __file__ attribute, we are in interactive mode
auto main_module = py::module_::import("__main__");
if (!py::hasattr(main_module, "__file__")) {
environment = PythonEnvironmentType::INTERACTIVE;

if (ModuleIsLoaded<IpythonCacheItem>()) {
// Check to see if we are in a Jupyter Notebook
auto get_ipython = import_cache.IPython.get_ipython();
if (get_ipython.ptr() != nullptr) {
auto ipython = get_ipython();
if (py::hasattr(ipython, "config")) {
py::dict ipython_config = ipython.attr("config");
if (ipython_config.contains("IPKernelApp")) {
environment = PythonEnvironmentType::JUPYTER;
}
}
}
}
}
}

DuckDBPyModuleState &DuckDBPyModuleState::GetGlobalModuleState() {
// TODO: Externalize this static cache when adding multi-interpreter support
// For now, single interpreter assumption allows simple static caching
if (!g_module_state) {
throw InternalException("Module state not initialized - call SetGlobalModuleState() during module init");
}
return *g_module_state;
}

void DuckDBPyModuleState::SetGlobalModuleState(DuckDBPyModuleState *state) {
#if DEBUG_MODULE_STATE
printf("DEBUG: SetGlobalModuleState() called - initializing static cache (built: %s %s)\n", __DATE__, __TIME__);
#endif
g_module_state = state;
}

DuckDBPyModuleState &GetModuleState() {
#if DEBUG_MODULE_STATE
printf("DEBUG: GetModuleState() called\n");
#endif
return DuckDBPyModuleState::GetGlobalModuleState();
}

void SetModuleState(DuckDBPyModuleState *state) {
DuckDBPyModuleState::SetGlobalModuleState(state);
}

shared_ptr<DuckDBPyConnection> DuckDBPyModuleState::GetDefaultConnection() {
lock_guard<mutex> guard(default_connection_mutex);
@ngoldbaum ngoldbaum Sep 18, 2025

Is the GIL held here? I don't know pybind11 well enough to tell. If it is then you have to explicitly release it before locking a mutex, which is a possibly blocking operation. Otherwise you might deadlock with the GIL (or the garbage collector on the free-threaded build).

This is what MutexExt in PyO3 does. I don't know if there's something equivalent in pybind11. There probably should be if there isn't!

If you don't hold the GIL at this point then disregard all that. But also double-check everywhere else you are doing a possibly-blocking call with a standard library synchronization primitive.
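
To make the hazard concrete, here is a rough model using only the standard library, not pybind11. A plain `std::mutex` stands in for the GIL, and `FakeGil`, `ScopedDetach`, `LockWithoutBlockingGil` and `RunHandoffDemo` are hypothetical names invented for this sketch. The idea is the one described above: try the lock first, and if it would block, drop the interpreter lock before waiting and take it back once the mutex is owned. In a pybind11 extension the detach step would correspond to `py::gil_scoped_release`.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Stand-in for the interpreter lock (assumption: modeled as a plain mutex).
using FakeGil = std::mutex;

// RAII helper: releases the stand-in GIL on construction, retakes it on
// destruction. The constructing thread must currently hold `gil`.
class ScopedDetach {
public:
    explicit ScopedDetach(FakeGil &gil) : gil_(gil) { gil_.unlock(); }
    ~ScopedDetach() { gil_.lock(); }
    ScopedDetach(const ScopedDetach &) = delete;
    ScopedDetach &operator=(const ScopedDetach &) = delete;

private:
    FakeGil &gil_;
};

// Precondition: the calling thread holds `gil`.
// Postcondition: the calling thread holds both `gil` and the returned lock.
std::unique_lock<std::mutex> LockWithoutBlockingGil(FakeGil &gil, std::mutex &target) {
    std::unique_lock<std::mutex> lock(target, std::try_to_lock);
    if (!lock.owns_lock()) {
        ScopedDetach detach(gil); // never wait on `target` while holding the GIL
        lock.lock();
    }
    return lock;
}

// Two-thread scenario: the worker owns `state_mutex` and needs the GIL to
// finish, while the main thread owns the GIL and wants `state_mutex`.
bool RunHandoffDemo() {
    FakeGil gil;
    std::mutex state_mutex;
    int shared_value = 0;
    std::promise<void> worker_has_state;

    gil.lock(); // main thread starts out "attached"
    std::thread worker([&] {
        std::unique_lock<std::mutex> state(state_mutex);
        worker_has_state.set_value();
        std::lock_guard<FakeGil> attached(gil); // blocks until main detaches
        shared_value = 42;
    });
    worker_has_state.get_future().wait();

    int seen = 0;
    {
        std::unique_lock<std::mutex> lock = LockWithoutBlockingGil(gil, state_mutex);
        seen = shared_value;
    }
    gil.unlock();
    worker.join();
    return seen == 42;
}
```

If `LockWithoutBlockingGil` were replaced by a plain `std::lock_guard` on `state_mutex`, the main thread would wait for the worker while the worker waits for the GIL, and the demo would hang.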

Owner Author


Thinking this through...

For context: I'm starting from code that is "GIL" safe, so mostly trying to preserve existing behavior / not change anything.

The GIL should be held in these paths; I didn't add a check, but I'll add a todo to consider a safety check.

The mutexes were just placeholders: next commit replaces them with an ifdef check for free threading and then either a mutex or a scoped_critical_section.

// Reproduce exact logic from original DefaultConnectionHolder::Get()
if (!default_connection_ptr || default_connection_ptr->con.ConnectionIsClosed()) {
py::dict config_dict;
default_connection_ptr = DuckDBPyConnection::Connect(py::str(":memory:"), false, config_dict);
}
return default_connection_ptr;
}

void DuckDBPyModuleState::SetDefaultConnection(shared_ptr<DuckDBPyConnection> connection) {
lock_guard<mutex> guard(default_connection_mutex);
default_connection_ptr = std::move(connection);
}

void DuckDBPyModuleState::ClearDefaultConnection() {
lock_guard<mutex> guard(default_connection_mutex);
default_connection_ptr = nullptr;
}

PythonImportCache *DuckDBPyModuleState::GetImportCache() {
return &import_cache;
}

void DuckDBPyModuleState::ClearImportCache() {
// Direct object will be cleaned up automatically by destructor
// TODO: If explicit clearing is needed, add Clear() method to PythonImportCache
}

DBInstanceCache *DuckDBPyModuleState::GetInstanceCache() {
return instance_cache.get();
}

} // namespace duckdb
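The three default-connection methods above share one pattern: a mutex-guarded `shared_ptr` slot that is lazily refilled when it is empty or when the held connection has been closed. A minimal standard-library sketch of that accessor pattern follows; `FakeConnection` and `DefaultConnectionSlot` are hypothetical stand-ins, and the real code additionally has to respect the GIL and creates the connection through `DuckDBPyConnection::Connect`:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for DuckDBPyConnection; it only tracks open/closed.
struct FakeConnection {
    bool closed = false;
};

// Keeps the pointer private and exposes it only through locked accessors.
class DefaultConnectionSlot {
public:
    // Returns the current default, creating a fresh one when the slot is
    // empty or the previous connection was closed.
    std::shared_ptr<FakeConnection> Get() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!connection_ || connection_->closed) {
            connection_ = std::make_shared<FakeConnection>();
        }
        return connection_;
    }

    void Set(std::shared_ptr<FakeConnection> connection) {
        std::lock_guard<std::mutex> guard(mutex_);
        connection_ = std::move(connection);
    }

    void Clear() {
        std::lock_guard<std::mutex> guard(mutex_);
        connection_.reset();
    }

private:
    std::mutex mutex_;
    std::shared_ptr<FakeConnection> connection_;
};
```

Returning the `shared_ptr` by value means callers keep the connection alive even if another thread calls `Set` or `Clear` right after `Get` returns. The lock protects only the slot itself, not the state of the connection it points to.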
3 changes: 3 additions & 0 deletions src/duckdb_py/native/python_conversion.cpp
@@ -961,6 +961,9 @@ void TransformPythonObjectInternal(py::handle ele, A &result, const B &param, bo
break;
}
if (conversion_target.id() == LogicalTypeId::UBIGINT) {
if (PyErr_Occurred()) {
PyErr_Clear();
}
throw InvalidInputException("Python Conversion Failure: Value out of range for type %s",
conversion_target);
}