Skip to content

Conversation

@James-Gilbert-
Copy link

Changes

  • Adds basic support for DuckDB in file based mode using go-duckdb with the x_bento_extra tag
  • Upgrades to Go 1.24

Testing

  • I verified some basic functionality with reading/exporting to parquet by mounting a writeable volume. According to the go-duckdb docs, JSON + Parquet support should be included in the binaries.
  • Purely in-memory behavior wasn't tested. As far as I can tell, there is some remaining effort required for safe concurrent access via the go bindings.

Notes

Due to some ABI compatibility issues with the pre-built static binaries, a newer base image and changes to the Dockerfile.cgo were needed. It is possible to build each of the dependencies from source (either dynamically or statically).

From experiments, it's possible to get a statically linked binary down to ~200MB building from source, with build time and licensing requirements being the primary trade-off.

Copy link
Collaborator

@gregfurman gregfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks for the PR! Have left some comments.

I was giving the original thread and linked issues a read. It appears that duckdb v1.3.1 will be fixing this. See marcboeker/go-duckdb#454 (comment)

Should we perhaps just wait until this has been fixed and see if that'll solve the CGO dependency? Would be great to have this in the pure Go bento build 😄

}

func sqlOpenWithReworks(ctx context.Context, logger *service.Logger, driver, dsn string, connSettings *connSettings) (*sql.DB, error) {
if !slices.Contains(sql.Drivers(), driver) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach! I think, however, that we should leverage build tags to dynamically set how we should be validating this. For example, see the driverField variable above at

var driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner", "duckdb").
Description("A database [driver](#drivers) to use.")

What if instead we added a LintRule that (by default) returned this error message you've added below except, we only set this when the x_bento_extra tag has NOT been used.

So we add a conn_fields_pure.go where we overwrite this driverField to raise a lint error when not in CGO/bento_extra:

//go:build !x_bento_extra

package sql

func init() {
   driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner").
	Description("A database [driver](#drivers) to use.").LintRule(`root = if this == "duckdb" { [ "Cannot use DuckDB driver outside of CGO build. See ..." ] }`)
}

@James-Gilbert-
Copy link
Author

Wow thanks for the PR! Have left some comments.

I was giving the original thread and linked issues a read. It appears that duckdb v1.3.1 will be fixing this. See marcboeker/go-duckdb#454 (comment)

Should we perhaps just wait until this has been fixed and see if that'll solve the CGO dependency? Would be great to have this in the pure Go bento build 😄

Due to C++ deps, CGO builds are needed for duckdb. But there are a few strategies to reduce overall image and/or binary size.

@gregfurman
Copy link
Collaborator

Can you run a make docs and commit the updated.md files?

Comment on lines 7 to 10
func init() {
driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner").
Description("A database [driver](#drivers) to use.").LintRule(`root = if this == "duckdb" { [ "Cannot use DuckDB driver outside of CGO build. Use a binary built with tag x_bento_extra, or the CGO Docker image" ] }`)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this locally and all looks good!

@@ -1,36 +1,72 @@
FROM golang:1.23 AS build
FROM ubuntu:24.04 AS build
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not stay on an official golang image when building?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be able to now with the duckdb update, I had a PR there that downgraded the C++ ABI for better compatibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sick. Have these upstream changes been merged? Or should we address in a follow up


# Pack
FROM debian:latest
FROM ubuntu:24.04 AS final
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question as above: why the move over to ubuntu?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed a newer C++ runtime environment for the duckdb dependencies since it's not a fully static build

Comment on lines +36 to +37
# Copy the full source and build
COPY . ./
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wont this also copy the entire website/, .git/, resources/ etc.?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, has the same behavior as before but uses the relative path.


# Install minimal runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
libzmq5 ca-certificates \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why install ca-certificates both in builder and again in final?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a holdover from trying to use slimmer images, can just copy it over again

Copy link
Author

@James-Gilbert- James-Gilbert- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is the goal of the build process to produce a fully static binary? Or is it to create an image with the extra CGO runtime dependencies?

Comment on lines +36 to +37
# Copy the full source and build
COPY . ./
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, has the same behavior as before but uses the relative path.


# Pack
FROM debian:latest
FROM ubuntu:24.04 AS final
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed a newer C++ runtime environment for the duckdb dependencies since it's not a fully static build


# Install minimal runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
libzmq5 ca-certificates \
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a holdover from trying to use slimmer images, can just copy it over again

@@ -1,36 +1,72 @@
FROM golang:1.23 AS build
FROM ubuntu:24.04 AS build
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be able to now with the duckdb update, I had a PR there that downgraded the C++ ABI for better compatibility.

@gregfurman
Copy link
Collaborator

Or is it to create an image with the extra CGO runtime dependencies?

@James-Gilbert- I suppose that since we don't release the CGO/x_bento_extra builds as a static binary, the goal would be to create an image with runtime dependencies baked in.

To clarify, is the goal of the build process to produce a fully static binary?

Curious, would something like this be possible? Perhaps using embed with those C dependencies being statically linked. I'm no expert here so excuse the ignorance 🤷

@gregfurman
Copy link
Collaborator

Also, lmk if you need me to re-build the new SQL documentation since it's failing in your CI 😄

@James-Gilbert-
Copy link
Author

Or is it to create an image with the extra CGO runtime dependencies?

@James-Gilbert- I suppose that since we don't release the CGO/x_bento_extra builds as a static binary, the goal would be to create an image with runtime dependencies baked in.

To clarify, is the goal of the build process to produce a fully static binary?

Curious, would something like this be possible? Perhaps using embed with those C dependencies being statically linked. I'm no expert here so excuse the ignorance 🤷

I think it'd be possible to statically-cross compile, but with significant effort (probably some upstream effort w/DuckDB). Having a final layer with runtime dependencies is much less of a change.

@gregfurman
Copy link
Collaborator

Or is it to create an image with the extra CGO runtime dependencies?

@James-Gilbert- I suppose that since we don't release the CGO/x_bento_extra builds as a static binary, the goal would be to create an image with runtime dependencies baked in.

To clarify, is the goal of the build process to produce a fully static binary?

Curious, would something like this be possible? Perhaps using embed with those C dependencies being statically linked. I'm no expert here so excuse the ignorance 🤷

I think it'd be possible to statically-cross compile, but with significant effort (probably some upstream effort w/DuckDB). Having a final layer with runtime dependencies is much less of a change.

Gotcha! Once you run a make docs and commit the generated files, it should unblock the CI 🟢

Otherwise, lmk re: your capacity to address the remainder of image comments. I'm happy to also follow up if you'd prefer 🙂

@James-Gilbert-
Copy link
Author

It seems that because of the build tag init(), the "duckdb" driver string is dropped after re-running make docs

@gregfurman gregfurman added this to the 1.11 milestone Aug 28, 2025
@gregfurman gregfurman modified the milestones: 1.11, 1.13 Oct 22, 2025
@aronchick
Copy link
Contributor

would really love to have this! lmk if i can help at all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants