
Segmentation fault on try_clone misuse #312

@max-celus

Description

After spending some time on a multithreaded program using duckdb, I ran into a segmentation fault. I can work around it by being more careful in my Rust code, but I would like to fix it in the library itself. To do so I need some guidance on how the underlying API works. The failing test can be viewed here. Its code:

    #[test]
    fn test_clone_after_close() {
        // Additional querying test to make sure our connections are still
        // usable. The original crash would happen without doing any queries.
        fn assert_can_query(conn: &Connection) {
            conn.execute("INSERT INTO test (c1) VALUES (1)", []).expect("insert");
        }

        // 1. Open owned connection
        let owned = checked_memory_handle();
        owned
            .execute_batch("create table test (c1 bigint)")
            .expect("create table");
        assert_can_query(&owned);

        // 2. Create a first clone from owned
        let clone1 = owned.try_clone().expect("first clone");
        assert_can_query(&owned);

        // 3. Close owned connection
        drop(owned);
        assert_can_query(&clone1);

        // 4. Create a second clone from the first clone. Crashes on the inner
        //    `duckdb_connect` with a segmentation fault.
        let clone2 = clone1.try_clone().expect("second clone");
        assert_can_query(&clone1);
        assert_can_query(&clone2);

        // 5. Small additional test
        drop(clone1);
        assert_can_query(&clone2);
    }

The problem is that InnerConnection::new(...) gets called while the ffi::duckdb_database has been set to a null pointer by the earlier close; the inner ffi::duckdb_connect then segfaults on that null handle.
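To make the failure mode concrete, here is a minimal sketch of the kind of guard I mean. The function name and error type are mine, and I am assuming ffi refers to the libduckdb-sys bindings used inside the crate; the real code lives in InnerConnection::new.

    use libduckdb_sys as ffi;

    /// Hypothetical guard around the connect step (not the actual duckdb-rs code).
    unsafe fn connect_guarded(db: ffi::duckdb_database) -> Result<ffi::duckdb_connection, String> {
        if db.is_null() {
            // After ffi::duckdb_close the handle is a null pointer; passing it
            // on to ffi::duckdb_connect is what produces the segmentation fault.
            return Err("database handle has already been closed".to_string());
        }
        let mut con: ffi::duckdb_connection = std::ptr::null_mut();
        // ... call ffi::duckdb_connect(db, &mut con) and check the returned
        //     duckdb_state here, as InnerConnection::new does today ...
        Ok(con)
    }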

I would like to fix it, but there is some overhead involved, and I need some guidance on the internals of the C interface to duckdb. We could put the ffi::duckdb_database behind reference counting (an Arc), and only call ffi::duckdb_close(...) when the last reference to that database object is dropped. This would introduce an extra pointer indirection through the Arc and the overhead of atomic operations.
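Roughly what I have in mind, as a sketch only (the struct names and error type are made up here and are not the current duckdb-rs types; ffi is assumed to be the libduckdb-sys bindings):

    use std::sync::Arc;
    use libduckdb_sys as ffi;

    /// The raw database handle lives in exactly one place, and duckdb_close
    /// runs exactly once, when the last Arc pointing at it is dropped.
    struct DatabaseHandle(ffi::duckdb_database);

    impl Drop for DatabaseHandle {
        fn drop(&mut self) {
            unsafe { ffi::duckdb_close(&mut self.0) };
        }
    }

    struct InnerConnection {
        db: Arc<DatabaseHandle>, // the extra indirection + atomic refcount mentioned above
        con: ffi::duckdb_connection,
    }

    impl InnerConnection {
        fn try_clone(&self) -> Result<InnerConnection, String> {
            // Cloning only bumps the refcount, so the database stays open for
            // as long as any connection derived from it is alive.
            let db = Arc::clone(&self.db);
            let con = std::ptr::null_mut();
            // ... open a fresh connection via ffi::duckdb_connect against db.0 here ...
            Ok(InnerConnection { db, con })
        }
    }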

But to make the entire implementation a bit more sound, we would also need to change a couple of calls:

  • Connection::open_from_raw, as it would now have to accept an Arc-wrapped object.
  • InnerConnection::close(...) must be changed to consume self. That causes a problem in Connection::close(...), because Connection::db is a RefCell<InnerConnection> and we would have to consume that inner connection as well. One way to solve this (and to reflect the fact that one cannot share a Connection across threads anyway, only move it between them) is to make the methods on Connection take a &mut self parameter; see the sketch right after this list.
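For illustration, purely as stubs and not as a patch against the real duckdb-rs types, the shape I have in mind is something like:

    struct InnerConnection;

    // With &mut self on the public methods, the RefCell around InnerConnection
    // could go away entirely (stubbed here, not the real field layout).
    struct Connection {
        db: InnerConnection,
    }

    impl InnerConnection {
        fn close(self) -> Result<(), String> {
            // Consuming self guarantees a closed inner connection can never
            // be touched again.
            Ok(())
        }
    }

    impl Connection {
        fn close(self) -> Result<(), String> {
            // close has to take the Connection by value so the inner
            // connection can be consumed as well.
            self.db.close()
        }

        fn execute_batch(&mut self, _sql: &str) -> Result<(), String> {
            // Queries go through &mut self instead of RefCell::borrow_mut.
            Ok(())
        }
    }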

This is where I stopped digging into the code, as I would first like to hear whether you would even want such far-reaching changes to the API.

Perhaps there is another (simpler) solution. Some options, with varying degrees of ugliness:

  • Have an OwnedConnection that actually owns the InnerConnection; only that OwnedConnection supports try_clone. This could be implemented with typestate as Connection<Owned>.
  • Just return an Err from try_clone when we detect that the "owned" InnerConnection has been closed. This would require some thread-safe sharing of a flag, with the implied overhead; a sketch follows right after this list.
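A sketch of that last option, with all names invented for illustration (this is not the current duckdb-rs layout):

    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};

    struct SharedState {
        // Set once the "owned" connection has closed the database.
        closed: AtomicBool,
    }

    struct Connection {
        state: Arc<SharedState>,
        owns_db: bool,
    }

    impl Connection {
        fn try_clone(&self) -> Result<Connection, String> {
            if self.state.closed.load(Ordering::Acquire) {
                // Instead of segfaulting in duckdb_connect, refuse to clone
                // from a connection whose database has already been closed.
                return Err("cannot clone: the owning connection was closed".to_string());
            }
            Ok(Connection { state: Arc::clone(&self.state), owns_db: false })
        }
    }

    impl Drop for Connection {
        fn drop(&mut self) {
            if self.owns_db {
                // This is the implied overhead: an Arc per connection plus an
                // atomic store when the owned connection goes away.
                self.state.closed.store(true, Ordering::Release);
            }
        }
    }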

I'd love to hear from you. For now I'll just keep the "owned" InnerConnection in a very special place in my code :).
