Skip to content

perf: eliminate per-cell CGO calls in getEnum via pre-built reverse dictionary#122

Open
hellower wants to merge 5 commits intoduckdb:mainfrom
hellower:perf/enum-dict-lookup
Open

perf: eliminate per-cell CGO calls in getEnum via pre-built reverse dictionary#122
hellower wants to merge 5 commits intoduckdb:mainfrom
hellower:perf/enum-dict-lookup

Conversation

@hellower
Copy link

Summary

  • getEnum() was calling 3 CGO functions per cell (VectorGetColumnTypeEnumDictionaryValueDestroyLogicalType) to convert an enum index to its string name
  • initEnum() already builds namesDict (map[string]uint32) for the setter path, but the getter had no reverse mapping and fell back to CGO every time
  • Added enumDict (map[uint32]string) — the reverse of namesDict — built once during initEnum(), replacing 3 CGO round-trips with a single Go map lookup
  • The ENUM dictionary is immutable for the lifetime of the type (CREATE TYPE ... AS ENUM), so caching is safe

Changes

File Change
type_info.go Added enumDict map[uint32]string field to vectorTypeInfo
vector.go initEnum() builds both namesDict and enumDict in one pass; added make() capacity hints
vector_getters.go getEnum() now does vec.enumDict[idx] instead of 3 CGO calls
enum_bench_test.go Benchmark comparing Dict (new) vs CGO (old) paths

Benchmark

goos: linux
goarch: amd64

BenchmarkEnumGetValue/Dict/N=10000     184   6,586,813 ns/op   164,574 B/op   10,092 allocs/op
BenchmarkEnumGetValue/CGO/N=10000       60  19,522,868 ns/op   388,492 B/op   40,092 allocs/op
BenchmarkEnumGetValue/Dict/N=100000     18  62,224,891 ns/op 1,635,777 B/op  100,664 allocs/op
BenchmarkEnumGetValue/CGO/N=100000       6 188,925,219 ns/op 3,876,269 B/op  400,664 allocs/op
Metric Improvement
Throughput ~3x faster
Allocations 75% fewer (3 allocs/cell eliminated)
Heap usage 2.4x less

Test plan

  • All existing tests pass (go test ./...)
  • Benchmark included (go test -bench=BenchmarkEnumGetValue -benchmem)
  • Reviewer: verify ENUM read correctness with edge cases (NULL values, large dictionaries)

🤖 Generated with Claude Code

…ictionary

Problem:

getEnum() was calling 3 CGO functions on every cell read:
  1. mapping.VectorGetColumnType(vec.vec)  — creates a LogicalType from C
  2. mapping.EnumDictionaryValue(lt, idx)  — looks up the enum string via C API
  3. mapping.DestroyLogicalType(&lt)       — frees the LogicalType back to C

For a 100K-row query with an ENUM column, this means 300,000 CGO
round-trips just to convert enum indices to strings — even though the
entire dictionary is already available in Go memory.

Root cause:

initEnum() already builds namesDict (map[string]uint32) for the setter
path (string → index), but the getter path had no reverse mapping and
fell back to CGO on every cell.

Fix:

1. Added enumDict (map[uint32]string) field to vectorTypeInfo — the
   reverse of namesDict (index → name).
2. In initEnum(), both maps are now built in a single pass over the C
   dictionary. Also added capacity hints to avoid rehashing.
3. In getEnum(), replaced the 3 CGO calls with a single Go map lookup.

The ENUM dictionary is immutable for the lifetime of the type
(CREATE TYPE ... AS ENUM), so caching is safe. This is consistent with
the setter path which already uses the cached namesDict.

Benchmark results (enum_bench_test.go):

  goos: linux, goarch: amd64

  BenchmarkEnumGetValue/Dict/N=10000     184   6,586,813 ns/op   164,574 B/op   10,092 allocs/op
  BenchmarkEnumGetValue/CGO/N=10000       60  19,522,868 ns/op   388,492 B/op   40,092 allocs/op
  BenchmarkEnumGetValue/Dict/N=100000     18  62,224,891 ns/op 1,635,777 B/op  100,664 allocs/op
  BenchmarkEnumGetValue/CGO/N=100000       6 188,925,219 ns/op 3,876,269 B/op  400,664 allocs/op

  - Throughput: ~3x faster
  - Allocations: 75% fewer (3 allocs/cell eliminated)
  - Memory: 2.4x less heap usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@taniabogatsch taniabogatsch requested a review from mlafeldt March 14, 2026 15:08
@taniabogatsch taniabogatsch added the feature / enhancement Code improvements or a new feature label Mar 14, 2026
Copy link
Member

@mlafeldt mlafeldt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general.

Three open issues:

  1. Tidy checks are failing

  2. Reviewer: verify ENUM read correctness with edge cases (NULL values, large dictionaries)

Since you're already using AI, it would make more sense for you to add those tests. In particular, there should be one for >255 enum values to exercise USMALLINT/UINTEGER branches.

  1. See my remark on using []string.

type_info.go Outdated
tagDict map[uint32]string
// enumDict is the reverse of namesDict for ENUM types (index → name).
// Built once during initEnum to avoid per-cell CGO calls in getEnum.
enumDict map[uint32]string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this mirrors tagDict, but wouldn't a simple string slice be faster since we don't really need hashing etc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched enumDict to []string as suggested. Also added tests for NULL values and >255 entries (USMALLINT branch).

…stion

Enum indices are dense integers starting at 0, so a slice provides
O(1) access without hash overhead. Also adds edge-case tests for
NULL enum values and large dictionaries (>255 entries, USMALLINT path).
@mlafeldt
Copy link
Member

@hellower Thanks for the changes. Tidy checks are still failing though.

…ation

Go's defer statement evaluates function arguments immediately at the
defer site, not when the deferred function executes. This means:

    var types []mapping.LogicalType        // types is nil
    defer destroyLogicalTypes(types)       // captures nil — cleanup is a no-op
    types = append(types, ...)             // types now points to new backing array

When the function returns, destroyLogicalTypes receives the nil slice
that was captured at the defer site, iterates over zero elements, and
never calls DestroyLogicalType on any of the C objects. The LogicalType
objects allocated via the C API are permanently leaked.

Affected functions:
  - logicalStructType (type_info.go) — leaks one LogicalType per STRUCT field
  - logicalUnionType  (type_info.go) — leaks one LogicalType per UNION member
  - inferSliceLogicalTypeAndValue (value.go) — leaks one LogicalType per
    LIST/ARRAY element type inferred during value binding

These functions are called from Appender (STRUCT/UNION columns),
scalar/table UDFs, and any code path that constructs composite
LogicalTypes from Go-side TypeInfo.

Fix: wrap the defer call in a closure so that it captures the variable
by reference, not by value:

    defer func() { destroyLogicalTypes(types) }()

The closure captures the `types` variable itself (not its current value),
so when it executes at function return, it sees the fully-populated
slice and correctly destroys all LogicalType objects.

Verified with a standalone program:

    var s []int
    defer fmt.Println("direct:", len(s))          // prints 0
    defer func() { fmt.Println("closure:", len(s)) }()  // prints 3
    s = append(s, 1, 2, 3)
The same bug pattern exists for destroyValueSlice in value.go:
  - inferSliceLogicalTypeAndValue (line 398): values := make([]Value, 0, length)
  - createSliceValue             (line 459): var values []Value
  - createStructValue            (line 487): var values []Value

All three use `defer destroyValueSlice(values)` where `values` is
nil or len=0 at the defer site. The deferred cleanup iterates over
zero elements and never calls DestroyValue on the C-allocated
duckdb_value objects.

These functions are called when binding STRUCT, LIST, or ARRAY values
via Appender or prepared statements. Each call leaks one C Value
object per element/field.

Not a double-free risk: DuckDB's Create*Value functions (CreateListValue,
CreateArrayValue, CreateStructValue) deep-copy the input values via
allocValues → duckdb_create_*_value. The original Value objects remain
owned by the caller and must be destroyed separately.
- Remove redundant uint32 cast in vector_getters.go (unconvert)
- Wrap defer Close() calls in require.NoError for errcheck compliance
  in enum_bench_test.go (conn, dkRows, Stmt) and types_test.go (rows x2)
@hellower hellower force-pushed the perf/enum-dict-lookup branch 2 times, most recently from b6d76cc to 7e0d908 Compare March 18, 2026 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature / enhancement Code improvements or a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants