|
1 | | -# Problem: Incorrect De-duplication When Mapping to Basic Slices |
2 | | - |
3 | | -## Summary |
4 | | -The `carta` library is designed to de-duplicate entities when mapping SQL rows to slices of structs (e.g., `[]User`). This is achieved by generating a unique ID for each entity based on the content of its primary key columns. This behavior is correct for handling `JOIN`s where a single entity might appear across multiple rows. |
5 | | - |
6 | | -However, this same logic is incorrectly applied when the mapping destination is a slice of a basic type (e.g., `[]string`, `[]int`). In this scenario, rows with duplicate values are treated as the same entity and are de-duplicated, which is incorrect. The desired behavior is to preserve every row from the result set, including duplicates. |
7 | | - |
8 | | -This issue is the root cause for the following problems: |
9 | | -1. The `if m.IsBasic` code path in `load.go` lacks test coverage because no tests exist for mapping to basic slices. |
10 | | -2. Attempts to write such tests lead to infinite loops and incorrect behavior because the column allocation and unique ID generation logic are not designed to handle this case. |
11 | | - |
12 | | -## Proposed Solution |
13 | | -The solution is to create a distinct execution path for "basic mappers" (`m.IsBasic == true`) that ensures every row is treated as a unique element. |
14 | | - |
15 | | -This will be accomplished in two main steps: |
16 | | - |
17 | | -### 1. Fix Column Allocation (`allocateColumns`) |
18 | | -The logic will be modified to enforce a clear rule for basic slices: the source SQL query must return **exactly one column**. |
19 | | - |
20 | | -- If `m.IsBasic` is true, the function will bypass the existing name-matching logic. |
21 | | -- It will validate that only one column is present in the query result. |
22 | | -- This single column will be assigned as the `PresentColumn` for the mapper. |
23 | | -- If more than one column is found, the function will return an error to prevent ambiguity. |
24 | | - |
25 | | -### 2. Fix Unique ID Generation (`loadRow`) |
26 | | -The logic will be modified to generate a unique ID based on the row's position rather than its content. |
27 | | - |
28 | | -- If `m.IsBasic` is true, the call to `getUniqueId(row, m)` will be bypassed. |
29 | | -- A new, position-based unique ID will be generated for each row (e.g., using a simple counter that increments with each row processed). |
30 | | -- This ensures that every row, regardless of its content, is treated as a distinct element to be added to the destination slice. |
31 | | - |
32 | | -This approach preserves the existing, correct behavior for struct mapping while introducing a new, robust path for handling basic slices correctly. |
33 | | - |
34 | | -## Plan |
35 | | -1. **Modify `column.go`**: Update the `allocateColumns` function to implement the single-column rule for basic mappers. |
36 | | -2. **Modify `load.go`**: Update the `loadRow` function to use a position-based counter for unique ID generation when `m.IsBasic` is true. |
37 | | -3. **Add Tests**: Create a new test case in `mapper_test.go` that maps a query result to a slice of a basic type (e.g., `[]string`) to validate the fix and provide coverage for the `m.IsBasic` code path. |
| 1 | +# Plan: Fix Incorrect De-duplication for Basic Slices |
| 2 | + |
| 3 | +## 1. Problem Summary |
| 4 | +The `carta` library was incorrectly de-duplicating rows when mapping to a slice of a basic type (e.g., `[]string`). The logic, designed to merge `JOIN`ed rows for slices of structs, was misapplied, causing data loss. This also meant the `m.IsBasic` code path was entirely untested. |
| 5 | + |
| 6 | +The goal was to modify the library to correctly preserve all rows, including duplicates, when mapping to a basic slice, and to add the necessary test coverage. |
| 7 | + |
| 8 | +## 2. Evolution of the Solution |
| 9 | + |
| 10 | +The final solution was reached through an iterative process of implementation and refinement based on code review feedback. |
| 11 | + |
| 12 | +### Initial Implementation |
| 13 | +The first version of the fix introduced two key changes: |
| 14 | +1. **Position-Based Unique IDs:** In `load.go`, the `loadRow` function was modified. When `m.IsBasic` is true, it now generates a unique ID based on the row's position in the result set (e.g., "row-0", "row-1") instead of its content. This ensures every row is treated as a unique element. |
| 15 | +2. **Single-Column Rule:** In `column.go`, the `allocateColumns` function was updated to enforce a strict rule: if the destination is a basic slice, the SQL query must return **exactly one column**. This prevents ambiguity. |
| 16 | + |
| 17 | +### Refinements from Code Review |
| 18 | +Feedback from a code review (via Coderabbit) prompted several improvements: |
| 19 | +- **Performance:** In `load.go`, `fmt.Sprintf` was replaced with the more performant `strconv.Itoa` for generating the position-based unique ID. |
| 20 | +- **Idiomatic Go:** Error creation was changed from `errors.New(fmt.Sprintf(...))` to the more idiomatic `fmt.Errorf`. |
| 21 | +- **Clearer Errors:** The error message for the single-column rule was improved to include the actual number of columns found, aiding debugging. |
| 22 | +- **Test Coverage:** A negative test case was added to `mapper_test.go` to ensure the single-column rule correctly returns an error. |
| 23 | + |
| 24 | +### Final Fix: Handling Nested Basic Mappers |
| 25 | +The most critical refinement came from identifying a flaw in the single-column rule: it did not correctly handle **nested** basic slices (e.g., a struct field like `Tags []string`). The initial logic would have incorrectly failed if other columns for the parent struct were present. |
| 26 | + |
| 27 | +The final patch corrected this by making the logic in `allocateColumns` more nuanced: |
| 28 | +- **For top-level basic slices** (`len(m.AncestorNames) == 0`), the query must still contain exactly one column overall. |
| 29 | +- **For nested basic slices**, the function now searches the remaining columns for exactly one that matches the ancestor-qualified name (e.g., `tags`). It returns an error if zero or more than one match is found. |
| 30 | + |
| 31 | +This final change ensures the logic is robust for both top-level and nested use cases. |
| 32 | + |
| 33 | +## 3. Summary of Changes Executed |
| 34 | +1. **Modified `load.go`**: |
| 35 | + - Updated `loadRow` to accept a `rowCount` parameter. |
| 36 | + - Implemented logic to generate a unique ID from `rowCount` when `m.IsBasic` is true. |
| 37 | + - Refactored error handling and string formatting based on code review feedback. |
| 38 | +2. **Modified `column.go`**: |
| 39 | + - Updated `allocateColumns` to differentiate between top-level and nested basic mappers, enforcing the correct single-column matching rule for each. |
| 40 | + - Improved the error message to be more descriptive. |
| 41 | +3. **Modified `mapper.go`**: |
| 42 | + - Corrected the logic in `determineFieldsNames` to properly handle casing in `carta` tags, ensuring ancestor names are generated correctly. |
| 43 | +4. **Added Tests to `mapper_test.go`**: |
| 44 | + - Added a test for a top-level basic slice (`[]string`) to verify that duplicates are preserved. |
| 45 | + - Added a negative test to ensure an error is returned for a multi-column query to a top-level basic slice. |
| 46 | + - Added a test for a nested basic slice (`PostWithTags.Tags []string`) to verify correct mapping. |
| 47 | + - Added negative tests to ensure errors are returned for nested basic slices with zero or multiple matching columns. |
| 48 | +5. **Updated Documentation**: |
| 49 | + - Updated `README.md` to clarify the difference in de-duplication behavior. |
| 50 | + - Created `DESIGN_PHILOSOPHIES.md` to document the "fail-fast" error handling approach. |
0 commit comments