Commit b676bb0

Merge pull request #1 from hackafterdark/fix-duplicate-rows

adjustments made to getting unique row id

2 parents 65d1a2d + 2293378

File tree

8 files changed: +390 −43 lines changed

DESIGN_PHILOSOPHIES.md

Lines changed: 47 additions & 0 deletions
# Design Philosophies

This document describes the design philosophies and decisions behind this package, illustrating scenarios you may encounter when using it.

## Approach

Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)), which is useful for organizations with strict code review processes.

## Comparison to Related Projects

#### GORM

Carta is NOT an object-relational mapper (ORM).

#### sqlx

Sqlx does not track has-many relationships when mapping SQL data. This works fine when all of your relationships are at most has-one (a Blog has one Author), i.e., each SQL row corresponds to one struct. Handling has-many relationships (a Blog has many Posts), however, requires running many queries or manually post-processing the result. Carta handles these complexities automatically.

## Protection vs. Graceful Handling

A core design principle of the `carta` mapper is to prioritize **user protection and clarity** over attempting a "graceful" but potentially incorrect guess. The library's guiding philosophy is to proceed only when the user's intent is perfectly clear. If there is any ambiguity in the mapping operation, `carta` will **fail fast** by returning an error, forcing the developer to be more explicit.

Making a guess might seem helpful, but it can hide serious, silent bugs. The following scenarios illustrate the balance between failing on ambiguous operations (Protection) and handling well-defined transformations (Graceful Handling).

---
### Scenario 1: Multi-column Query to a Basic Slice (Protection)

- **Query:** `SELECT name, email FROM users`
- **Destination:** `var data []string`
- **Behavior:** `carta.Map` **returns an error immediately**: `carta: when mapping to a slice of a basic type, the query must return exactly one column (got 2)`.
- **Why this is Protection:** The library has no way of knowing if the user intended to map the `name` or the `email` column. A "graceful" solution might be to pick the first column arbitrarily, but this could lead to the wrong data being silently loaded into the slice. By failing fast, `carta` forces the developer to write an unambiguous query (e.g., `SELECT name FROM users`), ensuring the result is guaranteed to be correct.
---
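The single-column rule can be sketched as a standalone check. This is a hypothetical simplification for illustration; the actual validation lives inside carta's `allocateColumns`:

```go
package main

import "fmt"

// checkBasicSliceColumns is a hypothetical standalone version of the
// single-column rule: mapping into []string, []int, etc. requires the
// result set to project exactly one column.
func checkBasicSliceColumns(cols []string) error {
	if len(cols) != 1 {
		return fmt.Errorf(
			"carta: when mapping to a slice of a basic type, "+
				"the query must return exactly one column (got %d)", len(cols))
	}
	return nil
}

func main() {
	fmt.Println(checkBasicSliceColumns([]string{"name", "email"})) // ambiguous: returns an error
	fmt.Println(checkBasicSliceColumns([]string{"name"}))          // unambiguous: returns nil
}
```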
### Scenario 2: SQL `NULL` to a Non-nullable Go Field (Protection)

- **Query:** `SELECT id, NULL AS name FROM users`
- **Destination:** `var users []User` (where `User.Name` is a `string`)
- **Behavior:** `carta.Map` **returns an error during scanning** (e.g., `carta: cannot load null value to type string for column name`).
- **Why this is Protection:** A standard Go `string` cannot represent a `NULL` value. A "graceful" but incorrect solution would be to use the zero value (`""`), which is valid data and semantically different from "no data". This can cause subtle bugs in application logic. By failing, `carta` forces the developer to explicitly handle nullability in their Go struct by using a pointer (`*string`) or a nullable type (`sql.NullString`), making the code more robust and correct.
---
### Scenario 3: Merging `JOIN`ed Rows into Structs (Graceful Handling)

- **Query:** `SELECT b.id, p.id FROM blogs b JOIN posts p ON b.id = p.blog_id`
- **Destination:** `var blogs []BlogWithPosts`
- **Behavior:** `carta` **gracefully handles** the fact that the same blog ID appears in multiple rows. It creates one `Blog` object and appends each unique `Post` to its `Posts` slice.
- **Why this is Graceful:** This is the core purpose of the library. There is no ambiguity. The library uses the unique ID of the `Blog` (the `b.id` column) to understand that these rows all describe the same parent entity. This is a well-defined transformation, not a guess.
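Conceptually, the merge carta performs is equivalent to this manual grouping by parent ID. This is a sketch with hypothetical types, not carta's internal representation:

```go
package main

import "fmt"

// row is a hypothetical flattened JOIN result: (blog id, post id).
type row struct{ blogID, postID int }

// mergeRows groups joined rows by blog ID, mimicking how carta folds
// repeated parent rows into one entity with a has-many slice.
func mergeRows(rows []row) map[int][]int {
	posts := make(map[int][]int)
	for _, r := range rows {
		posts[r.blogID] = append(posts[r.blogID], r.postID)
	}
	return posts
}

func main() {
	joined := []row{{1, 10}, {1, 11}, {2, 12}} // blog 1 appears in two rows
	fmt.Println(mergeRows(joined))             // one entry per blog, posts grouped
}
```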

README.md

Lines changed: 11 additions & 24 deletions
@@ -1,9 +1,9 @@
 # Carta
 [![codecov](https://codecov.io/github/hackafterdark/carta/graph/badge.svg?token=TYvbPGGlcL)](https://codecov.io/github/hackafterdark/carta)
 
-Dead simple SQL data mapper for complex Go structs.
+A simple SQL data mapper for complex Go structs. Load SQL data onto Go structs while keeping track of has-one and has-many relationships.
 
-Load SQL data onto Go structs while keeping track of has-one and has-many relationships
+Carta is not an object-relational mapper (ORM). With large and complex datasets, using ORMs becomes restrictive and reduces performance when working with complex queries. [Read more about the design philosophy.](#design-philosophy)
 
 ## Examples
 Using carta is very simple. All you need to do is:
@@ -20,11 +20,11 @@ blogs := []Blog{}
 carta.Map(rows, &blogs)
 ```
 
-Assume that in above exmple, we are using a schema containing has-one and has-many relationships:
+Assume that in the above example, we are using a schema containing has-one and has-many relationships:
 
 ![schema](https://i.ibb.co/SPH3zhQ/Schema.png)
 
-And here is our SQL query along with the corresponging Go struct:
+And here is our SQL query along with the corresponding Go struct:
 ```
 select
 b.id,
@@ -93,15 +93,6 @@ blogs:
 }]
 ```
 
-
-## Comparison to Related Projects
-
-#### GORM
-Carta is NOT an an object-relational mapper(ORM). Read more in [Approach](#Approach)
-
-#### sqlx
-Sqlx does not track has-many relationships when mapping SQL data. This works fine when all your relationships are at most has-one (Blog has one Author) ie, each SQL row corresponds to one struct. However, handling has-many relationships (Blog has many Posts), requires running many queries or running manual post-processing of the result. Carta handles these complexities automatically.
-
 ## Guide
 
 ### Column and Field Names
@@ -233,18 +224,14 @@ Other types, such as TIME, will will be converted from plain text in future vers
 go get -u github.com/hackafterdark/carta
 ```
 
+## Design Philosophy
 
-## Important Notes
+The `carta` package follows a "fail-fast" philosophy to ensure that mapping operations are unambiguous and to protect users from silent bugs. For a detailed explanation of the error handling approach and the balance between user protection and graceful handling, please see the [Design Philosophies](./DESIGN_PHILOSOPHIES.md) document.
 
-Carta removes any duplicate rows. This is a side effect of the data mapping as it is unclear which object to instantiate if the same data arrives more than once.
-If this is not a desired outcome, you should include a uniquely identifiable columns in your query and the corresponding fields in your structs.
-
-To prevent relatively expensive reflect operations, carta caches the structure of your struct using the column mames of your query response as well as the type of your struct.
-
-## Approach
-Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)) which is useful among organizations with strict code review processes.
+## Important Notes
 
-Carta is not an object-relational mapper(ORM). With large and complex datasets, using ORMs becomes restrictive and reduces performance when working with complex queries.
+When mapping to **slices of structs**, Carta removes duplicate entities. This is a side effect of the data mapping process, which merges rows that identify the same entity (e.g., a `Blog` with the same ID appearing in multiple rows due to a `JOIN`). To ensure correct mapping, you should always include uniquely identifiable columns (like a primary key) in your query for each struct entity.
 
-### License
-Apache License
+When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), every row from the query is treated as a unique element, and **no de-duplication occurs**.
+
+To prevent relatively expensive reflect operations, carta caches the structure of your struct using the column names of your query response as well as the type of your struct.

ai_plans/FIX_DUPLICATE_ROWS.md

Lines changed: 50 additions & 0 deletions
# Plan: Fix Incorrect De-duplication for Basic Slices

## 1. Problem Summary

The `carta` library was incorrectly de-duplicating rows when mapping to a slice of a basic type (e.g., `[]string`). The logic, designed to merge `JOIN`ed rows for slices of structs, was misapplied, causing data loss. This also meant the `m.IsBasic` code path was entirely untested.

The goal was to modify the library to correctly preserve all rows, including duplicates, when mapping to a basic slice, and to add the necessary test coverage.

## 2. Evolution of the Solution

The final solution was reached through an iterative process of implementation and refinement based on code review feedback.

### Initial Implementation

The first version of the fix introduced two key changes:

1. **Position-Based Unique IDs:** In `load.go`, when `m.IsBasic` is true, `loadRow` generates a per-result-set unique ID (e.g., "row-0", "row-1") from the zero-based row index, ensuring every row is treated as unique within that mapping operation.
2. **Single-Column Rule:** In `column.go`, the `allocateColumns` function was updated to enforce a strict rule: if the destination is a basic slice, the SQL query must return **exactly one column**. This prevents ambiguity.
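The position-based ID generation can be sketched as follows (a hypothetical standalone version of the logic that lives in `loadRow`):

```go
package main

import (
	"fmt"
	"strconv"
)

// basicRowID builds the per-result-set unique ID used for basic mappers,
// so every scanned row (duplicates included) gets its own identity.
func basicRowID(rowCount int) string {
	// strconv.Itoa avoids the overhead of fmt.Sprintf for this hot path.
	return "row-" + strconv.Itoa(rowCount)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(basicRowID(i))
	}
}
```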
### Refinements from Code Review

Feedback from code review prompted several improvements:

- **Performance:** In `load.go`, `fmt.Sprintf` was replaced with `strconv.Itoa` for generating the position-based unique ID.
- **Idiomatic Go:** Error creation now uses `fmt.Errorf` instead of `errors.New(fmt.Sprintf(...))`.
- **Clearer errors:** The single-column rule error message includes the actual number of columns found.
- **Test coverage:** A negative test was added to `mapper_test.go` to ensure the single-column rule correctly returns an error.

### Final Fix: Handling Nested Basic Mappers

The most critical refinement came from identifying a flaw in the single-column rule: it did not correctly handle **nested** basic slices (e.g., a struct field like `Tags []string`). The initial logic would have incorrectly failed if other columns for the parent struct were present.

The final patch corrected this by making the logic in `allocateColumns` more nuanced:

- **For top-level basic slices** (`len(m.AncestorNames) == 0`), the result set must contain exactly one projected column (as labeled by the driver after alias resolution). Expressions are allowed if aliased to a single column.
- **For nested basic slices**, the function now searches the remaining columns for exactly one that matches the ancestor-qualified name (e.g., `tags`). It returns an error if zero or more than one match is found.

This final change ensures the logic is robust for both top-level and nested use cases.
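The nested matching rule can be sketched like this (a hypothetical simplification; the real implementation builds the candidate set via `getColumnNameCandidates` using the delimiter and ancestor names):

```go
package main

import "fmt"

// matchNestedBasic returns the single column matching the candidate set,
// erroring on zero or multiple matches, mirroring the nested-basic rule.
func matchNestedBasic(columns []string, candidates map[string]bool) (string, error) {
	var matched []string
	for _, c := range columns {
		if candidates[c] {
			matched = append(matched, c)
		}
	}
	if len(matched) != 1 {
		return "", fmt.Errorf("expected exactly one matching column, got %d matches", len(matched))
	}
	return matched[0], nil
}

func main() {
	// Parent struct columns coexist with the one column the nested
	// basic slice should claim.
	cols := []string{"id", "title", "tags"}
	cands := map[string]bool{"tags": true, "post_tags": true}
	fmt.Println(matchNestedBasic(cols, cands))
}
```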
## 3. Summary of Changes Executed

1. **Modified `load.go`**:
   - Updated `loadRow` to accept a `rowCount` parameter and propagate it to nested mappers.
   - For `m.IsBasic`, generate a per-row unique ID from `rowCount` to preserve duplicates (applies to nested basics as well).
   - Refactored error handling and string formatting based on code review feedback.
2. **Modified `column.go`**:
   - Updated `allocateColumns` to differentiate between top-level and nested basic mappers, enforcing the correct single-column matching rule for each.
   - Improved the error message to be more descriptive.
3. **Modified `mapper.go`**:
   - Corrected the logic in `determineFieldsNames` to properly handle casing in `carta` tags, ensuring ancestor names are generated correctly.
4. **Added Tests to `mapper_test.go`**:
   - Top-level `[]string`: verifies duplicates are preserved.
   - Top-level `[]string` (negative): multi-column queries produce an error.
   - Nested `PostWithTags.Tags []string`: verifies correct column matching and mapping.
   - Nested (negative): zero or multiple matching columns produce an error.
5. **Updated Documentation**:
   - Updated `README.md` to clarify the difference in de-duplication behavior.
   - Created `DESIGN_PHILOSOPHIES.md` to document the "fail-fast" error handling approach.

column.go

Lines changed: 56 additions & 4 deletions
@@ -2,6 +2,7 @@ package carta
 
 import (
 	"database/sql"
+	"fmt"
 	"sort"
 	"strings"
 )
@@ -14,19 +15,70 @@ type column struct {
 	i fieldIndex
 }
 
+// allocateColumns maps result set columns into the given Mapper's fields and its sub-mappers.
+// It populates m.PresentColumns and m.SortedColumnIndexes, sets AncestorNames on sub-maps,
+// and removes claimed entries from the provided columns map.
+//
+// For a mapper marked IsBasic:
+//   - Top-level (no ancestors): requires exactly one remaining column overall and binds it.
+//   - Nested (has ancestors): requires exactly one matching ancestor-qualified column among
+//     the remaining columns and binds it.
+//
+// Otherwise it returns an error.
+// For non-basic mappers, it matches basic fields by name using getColumnNameCandidates
+// (honoring the mapper/sub-map delimiter and ancestor names) and records each matched
+// column (including the field index). After collecting direct-field mappings it sorts
+// the resulting column indexes for m.SortedColumnIndexes and then recursively allocates
+// columns for each sub-map.
+//
+// The function mutates the Mapper structures and the input columns map. It returns any
+// error returned by recursive allocation or an error when the IsBasic column constraint
+// is violated.
 func allocateColumns(m *Mapper, columns map[string]column) error {
 	presentColumns := map[string]column{}
 	if m.IsBasic {
-		candidates := getColumnNameCandidates("", m.AncestorNames, m.Delimiter)
-		for cName, c := range columns {
-			if _, ok := candidates[cName]; ok {
+		if len(m.AncestorNames) == 0 {
+			// Top-level basic mapper: must map exactly one column overall
+			if len(columns) != 1 {
+				return fmt.Errorf(
+					"carta: when mapping to a slice of a basic type, "+
+						"the query must return exactly one column (got %d)",
+					len(columns),
+				)
+			}
+			for cName, c := range columns {
 				presentColumns[cName] = column{
 					typ:         c.typ,
 					name:        cName,
 					columnIndex: c.columnIndex,
 				}
-				delete(columns, cName) // dealocate claimed column
+				delete(columns, cName)
+				break
+			}
+		} else {
+			// Nested basic mapper: pick exactly one matching ancestor-qualified column
+			candidates := getColumnNameCandidates("", m.AncestorNames, m.Delimiter)
+			var matched []string
+			for cName := range columns {
+				if candidates[cName] {
+					matched = append(matched, cName)
+				}
+			}
+			if len(matched) != 1 {
+				return fmt.Errorf(
+					"carta: basic sub-mapper for %v expected exactly one matching column "+
						"(ancestors %v), got %d matches",
+					m.Typ, m.AncestorNames, len(matched),
+				)
+			}
+			cName := matched[0]
+			c := columns[cName]
+			presentColumns[cName] = column{
+				typ:         c.typ,
+				name:        cName,
+				columnIndex: c.columnIndex,
 			}
+			delete(columns, cName)
 		}
 	} else {
 		for i, field := range m.Fields {

load.go

Lines changed: 25 additions & 7 deletions
@@ -2,9 +2,9 @@ package carta
 
 import (
 	"database/sql"
-	"errors"
 	"fmt"
 	"reflect"
+	"strconv"
 
 	"github.com/hackafterdark/carta/value"
 )
@@ -18,16 +18,18 @@ func (m *Mapper) loadRows(rows *sql.Rows, colTyps []*sql.ColumnType) (*resolver,
 		colTypNames[i] = colTyps[i].DatabaseTypeName()
 	}
 	rsv := newResolver()
+	rowCount := 0
 	for rows.Next() {
 		for i := 0; i < len(colTyps); i++ {
 			row[i] = value.NewCell(colTypNames[i])
 		}
 		if err = rows.Scan(row...); err != nil {
 			return nil, err
 		}
-		if err = loadRow(m, row, rsv); err != nil {
+		if err = loadRow(m, row, rsv, rowCount); err != nil {
 			return nil, err
 		}
+		rowCount++
 	}
 	if err := rows.Err(); err != nil {
 		return nil, err
@@ -55,17 +57,33 @@ func (m *Mapper) loadRows(rows *sql.Rows, colTyps []*sql.ColumnType) (*resolver,
 //
 // for example, if a blog has many Authors
 //
-// rows are actually []*Cell, theu are passed here as interface since sql scan requires []interface{}
-func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
+// loadRow maps a single scanned SQL row into the resolver using the provided Mapper.
+//
+// It creates or reuses an element in rsv based on a computed unique id:
+//   - For basic mappers (m.IsBasic) the id is "row-<rowCount>" (ensures per-row identity).
+//   - For non-basic mappers the id is derived from the row values via getUniqueId.
+//
+// The function expects row to contain the scanned values as []*value.Cell (passed as []interface{} because sql.Scan requires that shape).
+// For each present column it converts the corresponding Cell into the destination field (handling pointers, nullable types, basic primitives, and known struct wrappers such as Time, NullBool, NullString, etc.).
+// If a column is NULL, loadRow enforces that the destination is either a pointer or a type listed in value.NullableTypes; otherwise it returns an error.
+// After populating the element it initializes per-submap resolvers (if any) and recursively calls loadRow for each non-nil subMap, passing the same rowCount.
+//
+// Returns an error on conversion failures, attempts to load null into non-nullable destinations, or on any recursive loadRow error.
+func loadRow(m *Mapper, row []interface{}, rsv *resolver, rowCount int) error {
 	var (
 		err      error
 		dstField reflect.Value // destination field to be set with
 		cell     *value.Cell
 		elem     *element
 		found    bool
+		uid      uniqueValId
 	)
 
-	uid := getUniqueId(row, m)
+	if m.IsBasic {
+		uid = uniqueValId("row-" + strconv.Itoa(rowCount))
+	} else {
+		uid = getUniqueId(row, m)
+	}
 
 	if elem, found = rsv.elements[uid]; !found {
 		// unique row mapping found, new object
@@ -103,7 +121,7 @@ func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
 		if cell.IsNull() {
 			_, nullable := value.NullableTypes[typ]
 			if !(isDstPtr || nullable) {
-				return errors.New(fmt.Sprintf("carta: cannot load null value to type %s for column %s", typ, col.name))
+				return fmt.Errorf("carta: cannot load null value to type %s for column %s", typ, col.name)
 			}
 			// no need to set destination if cell is null
 		} else {
@@ -216,7 +234,7 @@ func loadRow(m *Mapper, row []interface{}, rsv *resolver) error {
 		if subMap.isNil(row) {
 			continue
 		}
-		if err = loadRow(subMap, row, elem.subMaps[i]); err != nil {
+		if err = loadRow(subMap, row, elem.subMaps[i], rowCount); err != nil {
 			return err
 		}
 	}

load_test.go

Lines changed: 6 additions & 6 deletions
@@ -49,7 +49,7 @@ func TestLoadRow(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -98,7 +98,7 @@ func TestLoadRowNullValue(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -167,7 +167,7 @@ func TestLoadRowDataTypes(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
@@ -225,7 +225,7 @@ func TestLoadRowConversionError(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err == nil {
 		t.Fatalf("expected a conversion error, but got nil")
 	}
@@ -263,7 +263,7 @@ func TestLoadRowNullToNonNull(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err == nil {
 		t.Fatalf("expected an error when loading a null value to a non-nullable field, but got nil")
 	}
@@ -297,7 +297,7 @@ func TestLoadRowNullTypes(t *testing.T) {
 	}
 
 	rsv := newResolver()
-	err = loadRow(m, row, rsv)
+	err = loadRow(m, row, rsv, 0)
 	if err != nil {
 		t.Fatalf("error loading row: %s", err)
 	}
