Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
305 changes: 305 additions & 0 deletions usage-rules.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,305 @@
# Ecto Code Generation Rules for LLM

## CRITICAL RULES - ALWAYS FOLLOW

### SECURITY CRITICAL - NEVER VIOLATE
- **NEVER** cast parameters without explicit field listing: `cast(changeset, params, [:field1, :field2])` - NOT `cast(changeset, params, Map.keys(params))`
- **NEVER** interpolate user input directly into queries - always use `^` for parameter binding
- **ALWAYS** use `redact: true` on password fields and sensitive data in schemas
- **ALWAYS** validate and cast external input through changesets before database operations

### DATA INTEGRITY CRITICAL
- **ALWAYS** pair `unsafe_validate_unique` with `unique_constraint` - validations alone cannot prevent race conditions
- **ALWAYS** use database constraints (`unique_constraint`, `foreign_key_constraint`, `check_constraint`) as the source of truth
- **NEVER** implement "get or create" without proper `on_conflict` handling or constraint-based error recovery
- **ALWAYS** use transactions (`Repo.transaction` or `Ecto.Multi`) for operations that must be atomic

### PERFORMANCE CRITICAL
- **ALWAYS** preload associations that will be accessed in loops to prevent N+1 queries
- **PREFER** `insert_all`, `update_all`, `delete_all` for bulk operations over iterating individual operations
- **ALWAYS** add indexes for foreign keys: `create index(:table, [:foreign_key_id])`
- **USE** `on_conflict` for upserts instead of separate get/insert operations

## SCHEMA DEFINITION RULES

### Primary Schema Structure
```elixir
schema "table_name" do
field :name, :string # Use appropriate Ecto type
field :count, :integer, default: 0 # Specify defaults when needed
field :metadata, :map # Use :map for JSON/JSONB
field :tags, {:array, :string} # Arrays with element type
field :status, Ecto.Enum, values: [:draft, :published] # Use Ecto.Enum for constrained values
field :computed, :string, virtual: true # Virtual fields don't persist
field :password, :string, redact: true # Redact sensitive fields

belongs_to :user, User # Foreign key: user_id
has_many :posts, Post # One-to-many
has_one :profile, Profile # One-to-one
many_to_many :tags, Tag, join_through: "posts_tags" # Many-to-many

timestamps() # ALWAYS include for created_at/updated_at
end
```

### UUID Primary Keys (when needed)
```elixir
@primary_key {:id, :binary_id, autogenerate: true}
@foreign_key_type :binary_id
```

### Embedded Schemas Pattern
```elixir
# USE embedded schemas for: tightly coupled data, value objects, form data
embeds_one :address, Address
embeds_many :line_items, LineItem

# SEPARATE embedded schema module:
defmodule Address do
use Ecto.Schema
embedded_schema do # Note: embedded_schema, not schema
field :street, :string
field :city, :string
end
end
```

## CHANGESET PATTERNS - STRICT ORDERING

### Standard Changeset Pipeline
```elixir
def changeset(struct, params) do
struct
|> cast(params, [:field1, :field2]) # 1. ALWAYS list fields explicitly
|> validate_required([:field1]) # 2. Required fields
|> validate_length(:field1, min: 3) # 3. Format validations
|> validate_format(:email, ~r/@/) # 4. More format validations
|> unsafe_validate_unique(:email, Repo) # 5. DB validations (optional optimization)
|> unique_constraint(:email) # 6. CRITICAL: Always add constraint
|> foreign_key_constraint(:user_id) # 7. Referential integrity
end
```

### Multiple Changesets for Different Operations
```elixir
# PREFER separate changesets over conditional logic
def registration_changeset(user, params) do
user
|> cast(params, [:email, :password])
|> validate_required([:email, :password])
|> unique_constraint(:email)
end

def update_changeset(user, params) do
user
|> cast(params, [:name, :bio])
|> validate_length(:bio, max: 500)
end
```

## QUERY PATTERNS

### Query Composition Rules
```elixir
# BUILD queries incrementally
def base_query, do: from(p in Post, where: p.published == true)
def by_author(query, author_id), do: from(q in query, where: q.author_id == ^author_id)
def recent(query), do: from(q in query, order_by: [desc: q.inserted_at])

# COMPOSE queries
Post |> base_query() |> by_author(123) |> recent() |> Repo.all()
```

### Dynamic Query Building
```elixir
# USE dynamic for complex conditional queries
def filter(query, params) do
dynamic = true

dynamic =
if params[:name] do
dynamic([p], ^dynamic and p.name == ^params[:name])
else
dynamic
end

from(q in query, where: ^dynamic)
end
```

### Preloading Strategies
```elixir
# PREFER query-level preloads
Repo.all(from p in Post, preload: [:author, comments: :user])

# USE Repo.preload for existing structs
posts |> Repo.preload([:author, :comments])

# AVOID nested preloads unless necessary
# BAD: preload: [author: [posts: :comments]]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily bad, even if you have N posts this will only be 3 queries. It's not ideal to do if you don't need those relations.

```

## MIGRATION PATTERNS

### Migration Structure
```elixir
def change do
create table(:posts) do
add :title, :string, null: false # Specify constraints
add :body, :text # Use :text for long strings
add :view_count, :integer, default: 0
add :user_id, references(:users, on_delete: :delete_all), null: false

timestamps()
end

create index(:posts, [:user_id]) # ALWAYS index foreign keys
create unique_index(:posts, [:slug]) # Unique constraints need indexes
end
```

### Safe Production Migrations
```elixir
# USE concurrent index creation for large tables
create index(:posts, [:created_at], concurrently: true)

# SEPARATE structure and data migrations
# Migration 1: Add column with default
# Migration 2: Backfill data
# Migration 3: Add not null constraint
```

## REPOSITORY OPERATIONS

### Insert/Update Patterns
```elixir
# PATTERN: Handle success and failure
case Repo.insert(changeset) do
{:ok, struct} -> # success path
{:error, changeset} -> # handle errors
end

# USE bang functions only when failure is unexpected
user = Repo.insert!(changeset) # Will raise on error
```

### Upsert Patterns
```elixir
# ALWAYS specify conflict_target and on_conflict
Repo.insert(changeset,
on_conflict: [set: [updated_at: DateTime.utc_now()]],
conflict_target: [:email],
returning: true
)

# BULK upserts
Repo.insert_all(Post, posts,
on_conflict: :replace_all,
conflict_target: [:external_id]
)
```

## TESTING PATTERNS

### Factory Pattern
```elixir
def user_factory(attrs \\ %{}) do
%User{
email: "user#{System.unique_integer()}@example.com", # ENSURE uniqueness
name: "Test User"
}
|> Map.merge(attrs)
end

def insert_user(attrs \\ %{}) do
user_factory(attrs) |> Repo.insert!()
end
```

### SQL Sandbox Setup
```elixir
setup do
:ok = Ecto.Adapters.SQL.Sandbox.checkout(Repo)

# For async tests with other processes
Ecto.Adapters.SQL.Sandbox.mode(Repo, {:shared, self()})
end
```

## MULTI-TENANCY PATTERNS

### Foreign Key Multi-tenancy
```elixir
schema "posts" do
field :org_id, :integer # Tenant identifier
field :title, :string
belongs_to :user, User,
foreign_key: :user_id,
references: :id,
with: [org_id: :org_id] # Composite foreign key
end

# ALWAYS scope queries

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one will bump into the newish phoenix scoping stuff.

def posts_for_org(org_id) do
from(p in Post, where: p.org_id == ^org_id)
end
```

### Query Prefix Multi-tenancy
```elixir
# USE for schema-based isolation
Repo.all(Post, prefix: "tenant_#{tenant_id}")
Repo.insert(changeset, prefix: tenant_prefix)
```

## ERROR HANDLING PATTERNS

### Changeset Error Formatting
```elixir
# TRANSFORM errors for API/UI consumption
def format_errors(changeset) do
Ecto.Changeset.traverse_errors(changeset, fn {msg, opts} ->
Enum.reduce(opts, msg, fn {key, value}, acc ->
String.replace(acc, "%{#{key}}", to_string(value))
end)
end)
end
```

## TRANSACTION PATTERNS

### Ecto.Multi for Complex Operations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one might be better served with the newer Repo.transact

```elixir
Multi.new()
|> Multi.insert(:user, user_changeset)
|> Multi.run(:profile, fn repo, %{user: user} ->
# Dependent operation
repo.insert(%Profile{user_id: user.id})
end)
|> Repo.transaction()
|> case do
{:ok, %{user: user, profile: profile}} -> # Success
{:error, failed_step, changeset, _} -> # Handle failure
end
```

## PRIORITY INDICATORS FOR CODE GENERATION

When generating Ecto code:
1. **SECURITY FIRST**: Never compromise on parameter casting, always use changesets
2. **DATA INTEGRITY SECOND**: Always use database constraints, handle race conditions
3. **PERFORMANCE THIRD**: Prevent N+1 queries, use bulk operations when possible
4. **CLARITY FOURTH**: Separate concerns, use multiple changesets, compose queries

## COMMON LLM MISTAKES TO AVOID
Copy link

@jeregrine jeregrine Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common Ecto LLM mistakes I see:

  • unnecessary validations, ex: Ecto.Enum and validates_includes
  • extra indexes all over with no query or usage pattern to justify
  • LLM's changing migrations/schemas to fix related tests(lolsob)
  • way over testing schema definitions (ie test to verify defaults, test to see of Ecto.Association is set on specific fields in the schema metadata, tests to verify repo.transaction works etc)
  • using varchar instead of text
  • putting size limits on text columns that was absurdly small via validations (based on nothing but vibes AFAICT)
  • String.to_atom to change a Ecto.Enum into atom before cast


- DO NOT generate `cast(params, Map.keys(params))` - this is a critical security flaw
- DO NOT forget `timestamps()` in schemas
- DO NOT use `validate_unique` without `unique_constraint`
- DO NOT query in loops without preloading
- DO NOT mix embedded schemas and associations incorrectly
- DO NOT forget to index foreign keys
- DO NOT use string interpolation in queries - always use `^` for binding
- DO NOT implement get-or-create without proper race condition handling
- DO NOT use a single changeset for all operations - separate by use case
- DO NOT ignore the return values of Repo operations - handle both success and error cases