Commit b676bb0

Merge pull request #1 from hackafterdark/fix-duplicate-rows

adjustments made to getting unique row id

2 parents 65d1a2d + 2293378

File tree

8 files changed: +390 −43 lines changed

DESIGN_PHILOSOPHIES.md

Lines changed: 47 additions & 0 deletions
# Design Philosophies

This document describes the design philosophies and decisions behind this package, illustrating scenarios you may encounter when using it.

## Approach

Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)), which is useful for organizations with strict code review processes.

## Comparison to Related Projects

#### GORM

Carta is NOT an object-relational mapper (ORM).

#### sqlx

Sqlx does not track has-many relationships when mapping SQL data. This works fine when all of your relationships are at most has-one (a Blog has one Author), i.e., each SQL row corresponds to one struct. Handling has-many relationships (a Blog has many Posts), however, requires running many queries or manually post-processing the result. Carta handles these complexities automatically.

## Protection vs. Graceful Handling

A core design principle of the `carta` mapper is to prioritize **user protection and clarity** over attempting a "graceful" but potentially incorrect guess. The library's guiding philosophy is to proceed only when the user's intent is perfectly clear. If there is any ambiguity in the mapping operation, `carta` will **fail fast** by returning an error, forcing the developer to be more explicit.

Making a guess might seem helpful, but it can hide serious, silent bugs. The following scenarios illustrate the balance between failing on ambiguous operations (Protection) and handling well-defined transformations (Graceful Handling).

---
### Scenario 1: Multi-column Query to a Basic Slice (Protection)

- **Query:** `SELECT name, email FROM users`
- **Destination:** `var data []string`
- **Behavior:** `carta.Map` **returns an error immediately**: `carta: when mapping to a slice of a basic type, the query must return exactly one column (got 2)`.
- **Why this is Protection:** The library has no way of knowing if the user intended to map the `name` or the `email` column. A "graceful" solution might be to pick the first column arbitrarily, but this could lead to the wrong data being silently loaded into the slice. By failing fast, `carta` forces the developer to write an unambiguous query (e.g., `SELECT name FROM users`), ensuring the result is guaranteed to be correct.
---
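The single-column rule can be sketched as a standalone check. This is a hypothetical simplification for illustration; the actual validation lives inside carta's `allocateColumns`:

```go
package main

import "fmt"

// checkBasicSliceColumns is a hypothetical standalone version of the
// single-column rule: mapping into []string, []int, etc. requires the
// result set to project exactly one column.
func checkBasicSliceColumns(cols []string) error {
	if len(cols) != 1 {
		return fmt.Errorf(
			"carta: when mapping to a slice of a basic type, "+
				"the query must return exactly one column (got %d)", len(cols))
	}
	return nil
}

func main() {
	fmt.Println(checkBasicSliceColumns([]string{"name", "email"})) // ambiguous: returns an error
	fmt.Println(checkBasicSliceColumns([]string{"name"}))          // unambiguous: returns nil
}
```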
### Scenario 2: SQL `NULL` to a Non-nullable Go Field (Protection)

- **Query:** `SELECT id, NULL AS name FROM users`
- **Destination:** `var users []User` (where `User.Name` is a `string`)
- **Behavior:** `carta.Map` **returns an error during scanning** (e.g., `carta: cannot load null value to type string for column name`).
- **Why this is Protection:** A standard Go `string` cannot represent a `NULL` value. A "graceful" but incorrect solution would be to use the zero value (`""`), which is valid data and semantically different from "no data". This can cause subtle bugs in application logic. By failing, `carta` forces the developer to explicitly handle nullability in their Go struct by using a pointer (`*string`) or a nullable type (`sql.NullString`), making the code more robust and correct.
---
### Scenario 3: Merging `JOIN`ed Rows into Structs (Graceful Handling)

- **Query:** `SELECT b.id, p.id FROM blogs b JOIN posts p ON b.id = p.blog_id`
- **Destination:** `var blogs []BlogWithPosts`
- **Behavior:** `carta` **gracefully handles** the fact that the same blog ID appears in multiple rows. It creates one `Blog` object and appends each unique `Post` to its `Posts` slice.
- **Why this is Graceful:** This is the core purpose of the library. There is no ambiguity. The library uses the unique ID of the `Blog` (the `b.id` column) to understand that these rows all describe the same parent entity. This is a well-defined transformation, not a guess.
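Conceptually, the merge carta performs is equivalent to this manual grouping by parent ID. This is a sketch with hypothetical types, not carta's internal representation:

```go
package main

import "fmt"

// row is a hypothetical flattened JOIN result: (blog id, post id).
type row struct{ blogID, postID int }

// mergeRows groups joined rows by blog ID, mimicking how carta folds
// repeated parent rows into one entity with a has-many slice.
func mergeRows(rows []row) map[int][]int {
	posts := make(map[int][]int)
	for _, r := range rows {
		posts[r.blogID] = append(posts[r.blogID], r.postID)
	}
	return posts
}

func main() {
	joined := []row{{1, 10}, {1, 11}, {2, 12}} // blog 1 appears in two rows
	fmt.Println(mergeRows(joined))             // one entry per blog, posts grouped
}
```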

README.md

Lines changed: 11 additions & 24 deletions
@@ -1,9 +1,9 @@
 # Carta
 [![codecov](https://codecov.io/github/hackafterdark/carta/graph/badge.svg?token=TYvbPGGlcL)](https://codecov.io/github/hackafterdark/carta)
 
-Dead simple SQL data mapper for complex Go structs.
+A simple SQL data mapper for complex Go structs. Load SQL data onto Go structs while keeping track of has-one and has-many relationships.
 
-Load SQL data onto Go structs while keeping track of has-one and has-many relationships
+Carta is not an object-relational mapper (ORM). With large and complex datasets, using ORMs becomes restrictive and reduces performance when working with complex queries. [Read more about the design philosophy.](#design-philosophy)
 
 ## Examples
 Using carta is very simple. All you need to do is:
@@ -20,11 +20,11 @@ blogs := []Blog{}
 carta.Map(rows, &blogs)
 ```
 
-Assume that in above exmple, we are using a schema containing has-one and has-many relationships:
+Assume that in the above example, we are using a schema containing has-one and has-many relationships:
 
 ![schema](https://i.ibb.co/SPH3zhQ/Schema.png)
 
-And here is our SQL query along with the corresponging Go struct:
+And here is our SQL query along with the corresponding Go struct:
 ```
 select
 b.id,
@@ -93,15 +93,6 @@ blogs:
 }]
 ```
 
-
-## Comparison to Related Projects
-
-#### GORM
-Carta is NOT an an object-relational mapper(ORM). Read more in [Approach](#Approach)
-
-#### sqlx
-Sqlx does not track has-many relationships when mapping SQL data. This works fine when all your relationships are at most has-one (Blog has one Author) ie, each SQL row corresponds to one struct. However, handling has-many relationships (Blog has many Posts), requires running many queries or running manual post-processing of the result. Carta handles these complexities automatically.
-
 ## Guide
 
 ### Column and Field Names
@@ -233,18 +224,14 @@ Other types, such as TIME, will will be converted from plain text in future vers
 go get -u github.com/hackafterdark/carta
 ```
 
+## Design Philosophy
 
-## Important Notes
+The `carta` package follows a "fail-fast" philosophy to ensure that mapping operations are unambiguous and to protect users from silent bugs. For a detailed explanation of the error handling approach and the balance between user protection and graceful handling, please see the [Design Philosophies](./DESIGN_PHILOSOPHIES.md) document.
 
-Carta removes any duplicate rows. This is a side effect of the data mapping as it is unclear which object to instantiate if the same data arrives more than once.
-If this is not a desired outcome, you should include a uniquely identifiable columns in your query and the corresponding fields in your structs.
-
-To prevent relatively expensive reflect operations, carta caches the structure of your struct using the column mames of your query response as well as the type of your struct.
-
-## Approach
-Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)) which is useful among organizations with strict code review processes.
+## Important Notes
 
-Carta is not an object-relational mapper(ORM). With large and complex datasets, using ORMs becomes restrictive and reduces performance when working with complex queries.
+When mapping to **slices of structs**, Carta removes duplicate entities. This is a side effect of the data mapping process, which merges rows that identify the same entity (e.g., a `Blog` with the same ID appearing in multiple rows due to a `JOIN`). To ensure correct mapping, you should always include uniquely identifiable columns (like a primary key) in your query for each struct entity.
 
-### License
-Apache License
+When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), every row from the query is treated as a unique element, and **no de-duplication occurs**.
+
+To prevent relatively expensive reflect operations, carta caches the structure of your struct using the column names of your query response as well as the type of your struct.

ai_plans/FIX_DUPLICATE_ROWS.md

Lines changed: 50 additions & 0 deletions
# Plan: Fix Incorrect De-duplication for Basic Slices

## 1. Problem Summary

The `carta` library was incorrectly de-duplicating rows when mapping to a slice of a basic type (e.g., `[]string`). The logic, designed to merge `JOIN`ed rows for slices of structs, was misapplied, causing data loss. This also meant the `m.IsBasic` code path was entirely untested.

The goal was to modify the library to correctly preserve all rows, including duplicates, when mapping to a basic slice, and to add the necessary test coverage.

## 2. Evolution of the Solution

The final solution was reached through an iterative process of implementation and refinement based on code review feedback.

### Initial Implementation

The first version of the fix introduced two key changes:

1. **Position-Based Unique IDs:** In `load.go`, when `m.IsBasic` is true, `loadRow` generates a per-result-set unique ID (e.g., "row-0", "row-1") from the zero-based row index, ensuring every row is treated as unique within that mapping operation.
2. **Single-Column Rule:** In `column.go`, the `allocateColumns` function was updated to enforce a strict rule: if the destination is a basic slice, the SQL query must return **exactly one column**. This prevents ambiguity.
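The position-based ID generation can be sketched as follows (a hypothetical standalone version of the logic that lives in `loadRow`):

```go
package main

import (
	"fmt"
	"strconv"
)

// basicRowID builds the per-result-set unique ID used for basic mappers,
// so every scanned row (duplicates included) gets its own identity.
func basicRowID(rowCount int) string {
	// strconv.Itoa avoids the overhead of fmt.Sprintf for this hot path.
	return "row-" + strconv.Itoa(rowCount)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(basicRowID(i))
	}
}
```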
### Refinements from Code Review

Feedback from code review prompted several improvements:

- **Performance:** In `load.go`, `fmt.Sprintf` was replaced with `strconv.Itoa` for generating the position-based unique ID.
- **Idiomatic Go:** Error creation now uses `fmt.Errorf` instead of `errors.New(fmt.Sprintf(...))`.
- **Clearer errors:** The single-column rule error message includes the actual number of columns found.
- **Test coverage:** A negative test was added to `mapper_test.go` to ensure the single-column rule correctly returns an error.

### Final Fix: Handling Nested Basic Mappers

The most critical refinement came from identifying a flaw in the single-column rule: it did not correctly handle **nested** basic slices (e.g., a struct field like `Tags []string`). The initial logic would have incorrectly failed if other columns for the parent struct were present.

The final patch corrected this by making the logic in `allocateColumns` more nuanced:

- **For top-level basic slices** (`len(m.AncestorNames) == 0`), the result set must contain exactly one projected column (as labeled by the driver after alias resolution). Expressions are allowed if aliased to a single column.
- **For nested basic slices**, the function now searches the remaining columns for exactly one that matches the ancestor-qualified name (e.g., `tags`). It returns an error if zero or more than one match is found.

This final change ensures the logic is robust for both top-level and nested use cases.
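The nested matching rule can be sketched like this (a hypothetical simplification; the real implementation builds the candidate set via `getColumnNameCandidates` using the delimiter and ancestor names):

```go
package main

import "fmt"

// matchNestedBasic returns the single column matching the candidate set,
// erroring on zero or multiple matches, mirroring the nested-basic rule.
func matchNestedBasic(columns []string, candidates map[string]bool) (string, error) {
	var matched []string
	for _, c := range columns {
		if candidates[c] {
			matched = append(matched, c)
		}
	}
	if len(matched) != 1 {
		return "", fmt.Errorf("expected exactly one matching column, got %d matches", len(matched))
	}
	return matched[0], nil
}

func main() {
	// Parent struct columns coexist with the one column the nested
	// basic slice should claim.
	cols := []string{"id", "title", "tags"}
	cands := map[string]bool{"tags": true, "post_tags": true}
	fmt.Println(matchNestedBasic(cols, cands))
}
```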
## 3. Summary of Changes Executed

1. **Modified `load.go`**:
   - Updated `loadRow` to accept a `rowCount` parameter and propagate it to nested mappers.
   - For `m.IsBasic`, generate a per-row unique ID from `rowCount` to preserve duplicates (applies to nested basics as well).
   - Refactored error handling and string formatting based on code review feedback.
2. **Modified `column.go`**:
   - Updated `allocateColumns` to differentiate between top-level and nested basic mappers, enforcing the correct single-column matching rule for each.
   - Improved the error message to be more descriptive.
3. **Modified `mapper.go`**:
   - Corrected the logic in `determineFieldsNames` to properly handle casing in `carta` tags, ensuring ancestor names are generated correctly.
4. **Added Tests to `mapper_test.go`**:
   - Top-level `[]string`: verifies duplicates are preserved.
   - Top-level `[]string` (negative): multi-column queries produce an error.
   - Nested `PostWithTags.Tags []string`: verifies correct column matching and mapping.
   - Nested (negative): zero or multiple matching columns produce an error.
5. **Updated Documentation**:
   - Updated `README.md` to clarify the difference in de-duplication behavior.
   - Created `DESIGN_PHILOSOPHIES.md` to document the "fail-fast" error handling approach.

column.go

Lines changed: 56 additions & 4 deletions
@@ -2,6 +2,7 @@ package carta
 
 import (
 	"database/sql"
+	"fmt"
 	"sort"
 	"strings"
 )
@@ -14,19 +15,70 @@ type column struct {
 	i fieldIndex
 }
 
+// allocateColumns maps result set columns into the given Mapper's fields and its sub-mappers.
+// It populates m.PresentColumns and m.SortedColumnIndexes, sets AncestorNames on sub-maps,
+// and removes claimed entries from the provided columns map.
+//
+// For a mapper marked IsBasic:
+//   - Top-level (no ancestors): requires exactly one remaining column overall and binds it.
+//   - Nested (has ancestors): requires exactly one matching ancestor-qualified column among
+//     the remaining columns and binds it.
+//
+// Otherwise it returns an error.
+// For non-basic mappers, it matches basic fields by name using getColumnNameCandidates
+// (honoring the mapper/sub-map delimiter and ancestor names) and records each matched
+// column (including the field index). After collecting direct-field mappings it sorts
+// the resulting column indexes for m.SortedColumnIndexes and then recursively allocates
+// columns for each sub-map.
+//
+// The function mutates the Mapper structures and the input columns map. It returns any
+// error returned by recursive allocation or an error when the IsBasic column constraint
+// is violated.
 func allocateColumns(m *Mapper, columns map[string]column) error {
 	presentColumns := map[string]column{}
 	if m.IsBasic {
-		candidates := getColumnNameCandidates("", m.AncestorNames, m.Delimiter)
-		for cName, c := range columns {
-			if _, ok := candidates[cName]; ok {
+		if len(m.AncestorNames) == 0 {
+			// Top-level basic mapper: must map exactly one column overall
+			if len(columns) != 1 {
+				return fmt.Errorf(
+					"carta: when mapping to a slice of a basic type, "+
+						"the query must return exactly one column (got %d)",
+					len(columns),
+				)
+			}
+			for cName, c := range columns {
 				presentColumns[cName] = column{
 					typ:         c.typ,
 					name:        cName,
 					columnIndex: c.columnIndex,
 				}
-				delete(columns, cName) // dealocate claimed column
+				delete(columns, cName)
+				break
+			}
+		} else {
+			// Nested basic mapper: pick exactly one matching ancestor-qualified column
+			candidates := getColumnNameCandidates("", m.AncestorNames, m.Delimiter)
+			var matched []string
+			for cName := range columns {
+				if candidates[cName] {
+					matched = append(matched, cName)
+				}
+			}
+			if len(matched) != 1 {
+				return fmt.Errorf(
+					"carta: basic sub-mapper for %v expected exactly one matching column "+
						"(ancestors %v), got %d matches",
+					m.Typ, m.AncestorNames, len(matched),
+				)
+			}
+			cName := matched[0]
+			c := columns[cName]
+			presentColumns[cName] = column{
+				typ:         c.typ,
+				name:        cName,
+				columnIndex: c.columnIndex,
 			}
+			delete(columns, cName)
 		}
 	} else {
 		for i, field := range m.Fields {

load.go

Lines changed: 25 additions & 7 deletions
@@ -2,9 +2,9 @@ package carta
 
 import (
 	"database/sql"
-	"errors"
 	"fmt"
 	"reflect"
+	"strconv"
 
 	"github.com/hackafterdark/carta/value"
 )
@@ -18,16 +18,18 @@ func (m *Mapper) loadRows(rows *sql.Rows, colTyps []*sql.ColumnType) (*resolver,
 		colTypNames[i] = colTyps[i].DatabaseTypeName()
 	}
 	rsv := newResolver()
+	rowCount := 0
 	for rows.Next() {
 		for i := 0; i < len(colTyps); i++ {
 			row[i] = value.NewCell(colTypNames[i])
 		}
 		if err = rows.Scan(row...); err != nil {
 			return nil, err
 		}
-		if err = loadRow(m, row, rsv); err != nil {
+		if err = loadRow(m, row, rsv, rowCount); err != nil {
 			return nil, err
 		}
+		rowCount++
 	}
 	if err := rows.Err(); err != nil {
 		return nil, err
@@ -55,17 +57,33 @@ func (m *Mapper) loadRows(rows *sql.Rows, colTyps []*sql.ColumnType) (*resolver,
 //
 // for example, if a blog has many Authors
 //
-// rows are actually []*Cell, theu are passed here as interface since sql scan requires []interface{}
-func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
+// loadRow maps a single scanned SQL row into the resolver using the provided Mapper.
+//
+// It creates or reuses an element in rsv based on a computed unique id:
+//   - For basic mappers (m.IsBasic) the id is "row-<rowCount>" (ensures per-row identity).
+//   - For non-basic mappers the id is derived from the row values via getUniqueId.
+//
+// The function expects row to contain the scanned values as []*value.Cell (passed as []interface{} because sql.Scan requires that shape).
+// For each present column it converts the corresponding Cell into the destination field (handling pointers, nullable types, basic primitives, and known struct wrappers such as Time, NullBool, NullString, etc.).
+// If a column is NULL, loadRow enforces that the destination is either a pointer or a type listed in value.NullableTypes; otherwise it returns an error.
+// After populating the element it initializes per-submap resolvers (if any) and recursively calls loadRow for each non-nil subMap, passing the same rowCount.
+//
+// Returns an error on conversion failures, attempts to load null into non-nullable destinations, or on any recursive loadRow error.
+func loadRow(m *Mapper, row []interface{}, rsv *resolver, rowCount int) error {
 	var (
 		err      error
 		dstField reflect.Value // destination field to be set with
 		cell     *value.Cell
 		elem     *element
 		found    bool
+		uid      uniqueValId
 	)
 
-	uid := getUniqueId(row, m)
+	if m.IsBasic {
+		uid = uniqueValId("row-" + strconv.Itoa(rowCount))
+	} else {
+		uid = getUniqueId(row, m)
+	}
 
 	if elem, found = rsv.elements[uid]; !found {
 		// unique row mapping found, new object
@@ -103,7 +121,7 @@ func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
 		if cell.IsNull() {
 			_, nullable := value.NullableTypes[typ]
 			if !(isDstPtr || nullable) {
-				return errors.New(fmt.Sprintf("carta: cannot load null value to type %s for column %s", typ, col.name))
+				return fmt.Errorf("carta: cannot load null value to type %s for column %s", typ, col.name)
 			}
 			// no need to set destination if cell is null
 		} else {
@@ -216,7 +234,7 @@ func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
 		if subMap.isNil(row) {
 			continue
 		}
-		if err = loadRow(subMap, row, elem.subMaps[i]); err != nil {
+		if err = loadRow(subMap, row, elem.subMaps[i], rowCount); err != nil {
 			return err
 		}
 	}

load_test.go

Lines changed: 6 additions & 6 deletions
@@ -49,7 +49,7 @@ func TestLoadRow(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -98,7 +98,7 @@ func TestLoadRowNullValue(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -167,7 +167,7 @@ func TestLoadRowDataTypes(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -225,7 +225,7 @@ func TestLoadRowConversionError(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err == nil {
 		t.Fatalf("expected a conversion error, but got nil")
 	}
@@ -263,7 +263,7 @@ func TestLoadRowNullToNonNull(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err == nil {
 		t.Fatalf("expected an error when loading a null value to a non-nullable field, but got nil")
 	}
@@ -297,7 +297,7 @@ func TestLoadRowNullTypes(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
