Skip to content

Complete schema-based detection: Remove remaining hardcoded field lists in json.go #61

@richardwooding

Description

@richardwooding

Summary

Follow-up to #20 and #59: PR #60 addressed the main JSON field detection in shouldUseJSONPath() by using schema information, but multiple hardcoded lists remain throughout json.go. These create the same maintainability, security, and scalability issues as the original problem.

Remaining Hardcoded Lists

1. Array Field Detection (isJSONArrayField() - lines 219-252)

jsonArrayFields := map[string][]string{
    "json_users":         {"tags", "scores", "attributes"},
    "json_products":      {"features", "reviews", "categories"},
    "users":              {"preferences", "profile"},
    "products":           {"metadata", "details"},
    "information_assets": {"metadata", "properties", "classification", "content_structure"},
    "documents":          {"content", "structure", "taxonomy", "analytics"},
}

arrayFieldNames := []string{"tags", "permissions", "features", "categories", "scores", "attributes"}

2. JSONB vs JSON Detection (isJSONBField() - lines 264-304)

jsonbFields := map[string][]string{
    "json_users":         {"settings", "tags", "scores"},
    "json_products":      {"features", "reviews", "properties"},
    "information_assets": {"metadata", "classification"},
    "documents":          {"content", "taxonomy"},
}

jsonbParentFields := []string{"settings", "properties"}

3. Array Element Type (getJSONArrayFunction() - lines 307-352)

simpleArrayFields := []string{"tags", "scores", "categories"}
complexArrayFields := []string{"attributes", "features", "reviews"}

4. Numeric JSON Fields (isNumericJSONField() - lines 110-121)

numericFields := []string{"level", "score", "value", "count", "amount", "price", 
                         "rating", "age", "size", "capacity", "megapixels", "cores", 
                         "threads", "ram", "storage", "vram", "weight", "frequency", "helpful"}

5. JSON Object Variables (isJSONObjectFieldAccess() - lines 189-207)

jsonObjectVars := []string{"attr", "item", "element", "obj", "feature", "review"}

6. Numeric Iteration Variables (needsNumericCasting() - lines 96-108)

numericIterationVars := []string{"score", "value", "num", "amount", "count", "level"}

7. Fallback in shouldUseJSONPath() (lines 40-47)

jsonFields := []string{"preferences", "metadata", "profile", "details", "settings", 
                      "properties", "analytics", "content", "structure", "taxonomy", 
                      "classification", "content_structure"}

8. hasJSONFieldInChain() (lines 66-72)

jsonFields := []string{"preferences", "metadata", "profile", "details", "settings", 
                      "properties", "analytics", "content", "structure", "taxonomy", 
                      "classification", "content_structure"}

Problems

  1. User schemas not supported - Users with fields like "config", "options", "data" won't get correct JSON handling
  2. Inconsistency - Lists differ between functions (some have "content_structure", others don't)
  3. Maintenance burden - Adding a new JSON column type requires updating 8+ locations
  4. Security risk - Fields not in hardcoded lists may be treated incorrectly
  5. Test-specific hardcoding - Maps like jsonArrayFields contain test table names

Proposed Solution

Extend pg.FieldSchema with complete type information:

type FieldSchema struct {
    Name        string
    Type        string        // PostgreSQL type name
    Repeated    bool          // true for arrays
    Schema      []FieldSchema // for composite types
    IsJSON      bool          // true for json/jsonb types (added in #60)
    
    // NEW: Additional type information
    IsJSONB     bool          // distinguish JSONB from JSON
    ElementType string        // for arrays: type of array elements
    IsNumeric   bool          // for JSON fields that contain numeric data
}

Alternative: Query-time schema lookup

func (con *converter) getFieldInfo(tableName, fieldName string) *FieldInfo {
    if con.schemas == nil {
        return nil // fall back to hardcoded
    }
    
    schema, ok := con.schemas[tableName]
    if !ok {
        return nil
    }
    
    for _, field := range schema {
        if field.Name == fieldName {
            return &FieldInfo{
                IsJSON:      field.IsJSON,
                IsJSONB:     field.Type == "jsonb",
                IsArray:     field.Repeated,
                ElementType: field.ElementType,
                IsNumeric:   field.IsNumeric,
            }
        }
    }
    
    return nil
}

Update all detection functions to use schema:

  1. isJSONArrayField() → check field.Repeated && field.IsJSON
  2. isJSONBField() → check field.Type == "jsonb"
  3. getJSONArrayFunction() → determine from field.ElementType
  4. isNumericJSONField() → check field.IsNumeric or infer from ElementType

Benefits

  • ✅ Works with any user schema without code changes
  • ✅ Single source of truth for field type information
  • ✅ No maintenance burden when adding new fields
  • ✅ Correct behavior guaranteed by schema
  • ✅ Removes all hardcoded lists from json.go

Migration Path

  1. Keep hardcoded lists as fallback when schemas == nil (backward compatibility)
  2. Extend FieldSchema with new fields
  3. Update LoadTableSchema() to populate new fields
  4. Update detection functions to check schema first, fall back to hardcoded
  5. Deprecation notice for users not passing schemas
  6. Eventually remove hardcoded lists (breaking change, major version bump)

Impact

Critical - Same as #20:

  • Affects correctness and maintainability
  • Users cannot use library with custom JSON column names
  • Multiple inconsistent lists throughout codebase

Related

Acceptance Criteria

  • All hardcoded field lists removed from json.go
  • FieldSchema extended with complete type information
  • All detection functions use schema information
  • Fallback to hardcoded lists when schemas not provided (backward compatibility)
  • Tests pass with diverse schema configurations
  • Documentation updated with schema-based approach
  • Examples show how to properly configure schemas

Note: This completes the work started in #20 and #59. PR #60 fixed the most critical path (shouldUseJSONPath), but 7+ other hardcoded lists remain.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions