This document serves as a quick reference for developers working on the sql_saga project, summarizing key concepts from the official PostgreSQL documentation for writing C-language extensions.
Reference: PostgreSQL Documentation: C-Language Functions
- Dynamic Loading: C functions are compiled into shared libraries (
.sofiles) and loaded by the server on demand. Each shared library must contain aPG_MODULE_MAGIC;macro in one of its source files. - Version 1 Calling Convention: This is the standard for C functions.
- Functions must be declared as
Datum funcname(PG_FUNCTION_ARGS). - A
PG_FUNCTION_INFO_V1(funcname);macro call is required for each function. - Arguments are accessed via
PG_GETARG_...()macros. - Results are returned via
PG_RETURN_...()macros.
- Functions must be declared as
- Memory Management:
- Use
palloc()andpfree()instead ofmalloc()andfree(). - Memory allocated with
palloc()is managed within PostgreSQL's memory contexts and is automatically freed at the end of a transaction, preventing leaks. - For data that must persist across multiple function calls (e.g., in aggregates or set-returning functions), use the appropriate long-lived memory context (e.g.,
multi_call_memory_ctx).
- Use
- Data Types:
- Data can be passed by value or by reference.
- Variable-length types (like
textor ranges) have a 4-byte header containing the total length. UseVARHDRSZandSET_VARSIZEmacros to work with them.
- NULL Handling:
- Functions can be declared
STRICTin SQL, which means they are not called if any input is NULL. PostgreSQL handles this automatically. - For non-
STRICTfunctions, usePG_ARGISNULL(arg_index)to check for NULLs andPG_RETURN_NULL()to return a NULL value.
- Functions can be declared
Reference: PostgreSQL Documentation: User-Defined Aggregates
- State Machine: Aggregates operate as a state machine.
stype: The data type of the internal state. For complex states without a SQL equivalent,internalis used.sfunc(State Transition Function): Called for each input row. It takes the current state and the input value(s) and returns the new state.finalfunc(Final Function): Called after all rows are processed. It takes the final state and returns the aggregate result. This is optional.
- State Management in C:
- The
AggCheckCallContextfunction can be used to determine if a C function is being called as part of an aggregate. - This allows for optimizations, such as modifying the state value in-place, which is safe for transition values.
- The second argument to
AggCheckCallContextcan retrieve the memory context for aggregate state values, which is crucial for data that must persist between calls.covers_without_gaps.cuses this for its state management.
- The
finalfunc_extra: This option inCREATE AGGREGATEpasses extra arguments to the final function, corresponding to the aggregate's input arguments. This is useful when the final function needs access to arguments that are constant across the aggregation group.covers_without_gapsuses this to pass thetargetrange to its final function.
- Normal Aggregates and Ordering: For most aggregates, if an
ORDER BYclause is provided within the aggregate call (e.g.,SUM(x ORDER BY y)), PostgreSQL sorts the input rows before feeding them to the transition function. This process is handled by the PostgreSQL executor and is transparent to the aggregate's C functions. covers_without_gapsAnomaly: Thesql_saga.covers_without_gapsaggregate is an exception. It is a normal aggregate, but its internal logic requires that the input ranges are passed to it in sorted order based on their start bounds.- Developer Responsibility: The C code in
covers_without_gaps_transfnexplicitly checks if the current range's start is before the previous one and raises an error if the input is unsorted. - User Responsibility: Consequently, any SQL query using
covers_without_gapsmust include anORDER BYclause on the range column to ensure correctly sorted input. For example:SELECT sql_saga.covers_without_gaps(my_range, target_range ORDER BY my_range) FROM my_table;
- This is a critical implementation detail for both users and developers of
sql_saga. The aggregate does not perform the sorting; it validates that the sorting has been done.
- Developer Responsibility: The C code in
PostgreSQL provides several ways to inspect constraint definitions, and choosing the correct one is critical for robust event triggers that handle DDL changes.
-
pg_constraint.consrc(Deprecated and Removed): This column stored a human-readable text representation of aCHECKconstraint's expression. It was removed in PostgreSQL 12 because it was not reliably updated when referenced objects were renamed, making it unsafe for use in event triggers. -
pg_get_constraintdef(oid)(For Display): This function returns a formatted, human-readable string representing a constraint's definition (e.g.,CHECK ((col1 < col2))). While useful for display, it is not ideal for programmatic logic because the exact formatting can change between PostgreSQL versions, and it may not accurately reflect the state of renamed objects within the same transaction that the trigger is handling.sql_saga'srename_followingtrigger initially used this, leading to brittle string comparisons. -
pg_constraint.conbinandpg_get_expr()(Robust forCHECKconstraints): Theconbincolumn stores the internal, parsed expression tree for aCHECKconstraint. Thepg_get_expr(conbin, conrelid)function decompiles this tree back into a canonical string representation (e.g.,(col1 < col2)). This is the most reliable way to inspect aCHECKconstraint's logic, as it is independent of display formatting.sql_saganow uses this method. -
pg_constraint.conkey(Robust forUNIQUE,PRIMARY KEY,FOREIGN KEY): This column stores an array of the attribute numbers (attnum) that make up the constraint. This is the most robust way to identify these types of constraints, as it is completely independent of column names.sql_sagauses this method to track renamedUNIQUEconstraints. -
EXCLUDEConstraints: These are more complex. While they haveconkey, they also haveconexclop(an array of operator OIDs). Reconstructing the definition programmatically is difficult. For these,sql_sagastill relies on a pattern match (LIKE) againstpg_get_constraintdef(), which is a pragmatic compromise.
A deep understanding of PostgreSQL's data visibility rules for triggers is essential for developing sql_saga. The documentation reveals that there are two distinct models of behavior, and the choice between them has profound consequences for our implementation.
The core distinction lies between regular AFTER triggers and CONSTRAINT TRIGGERs.
Reference: Visibility of Data Changes
A regular row-level AFTER trigger sees the final state of the data after the DML operation is complete. The documentation states: "When a row-level AFTER trigger is fired, all data changes made by the outer command are already complete, and are visible to the invoked trigger function."
The example in the documentation (A Complete Trigger Example) confirms this. When a regular AFTER DELETE trigger runs SELECT count(*), it sees that the row count has been decremented.
Reference: Overview of Trigger Behavior
CONSTRAINT TRIGGERs, which sql_saga uses, are subject to a much stricter visibility rule. The documentation is explicit: "A query started by a constraint trigger will see the state of the database as of the start of the statement that caused the trigger to fire, regardless of the SET CONSTRAINTS mode."
This means that even if a constraint trigger is DEFERRABLE and its execution is postponed until the end of the transaction, any query it runs operates on a data snapshot from the past. It is blind to the changes made by the statement that queued it, and any subsequent statements in the same transaction.
sql_saga's entire design is dictated by the trade-offs between these two models.
-
Why
sql_sagaMUST UseCONSTRAINT TRIGGERs: The primary reason is to get theDEFERRABLEproperty. This allowssql_sagato validate temporal foreign keys at the end of a transaction, permitting multi-statement updates (like anUPDATEfollowed by anINSERTthat "fills the gap") to succeed if the final state is consistent. RegularAFTERtriggers are not deferrable and would fail immediately. -
Consequence 1: The Multi-Statement Update Limitation: By choosing
CONSTRAINT TRIGGERs,sql_sagais locked into the stricter "sees initial state" visibility model. This fully explains why the multi-statementUPDATEs in28_with_exclusion_constraints.sqlfail. The trigger for the firstUPDATEis queued, but its view of the database is frozen. It cannot see the changes from the secondUPDATE. At the end of the transaction, it executes its query against its old snapshot, sees a temporary gap, and correctly (from its perspective) reports a violation. This is a fundamental and documented limitation of PostgreSQL's trigger architecture. -
Consequence 2: The
uk_delete_check_cLogic: This clarifies why theuk_delete_check_cfunction must explicitly exclude theOLDrow from its validation query.- A regular
AFTER DELETEtrigger (Model 1) would see the final state where the row is already gone, and would not need special logic. - However,
uk_delete_check_cis aCONSTRAINT TRIGGER(Model 2). Its query sees the initial state of the table, which includes the row that is about to be deleted. Without excluding theOLDrow, the validation query would incorrectly find that the row still provides coverage, and the check would always pass. The current implementation is therefore correct and necessary.
- A regular
Analysis of PostgreSQL's trigger system reveals important details that directly influence sql_saga's design and limitations.
Reference: PostgreSQL Documentation: Overview of Trigger Behavior
- Definition: A
CONSTRAINT TRIGGERis a special type ofAFTER ROWtrigger whose execution timing can be controlled withSET CONSTRAINTS. They can beDEFERRABLE, allowing their execution to be postponed until the end of the transaction. - Usage in
sql_saga: All ofsql_saga's temporal foreign key triggers are implemented as constraint triggers. This is what allows multi-statement updates to proceed without failing immediately, by deferring the check. - The MVCC Snapshot Rule for Constraint Triggers: There is a subtle but critical distinction between when a trigger executes and what data it sees.
- Execution Time: As described in the
CREATE TRIGGERdocumentation, aDEFERRABLEconstraint trigger's execution can be postponed until the end of the transaction. This is why multi-statement updates don't fail immediately. - Data Visibility: However, the "Overview of Trigger Behavior" chapter provides the crucial context on data visibility: "A query started by a constraint trigger will see the state of the database as of the start of the statement that caused the trigger to fire, regardless of the SET CONSTRAINTS mode."
- Reconciling the Two: This means that although the trigger function runs at commit time, any query it executes operates on a data snapshot from the past—specifically from the beginning of the statement that originally queued it. It is blind to its own statement's changes and any subsequent statements in the transaction. This is the root cause of the validation failures for multi-statement updates seen in
28_with_exclusion_constraints_test.sql.
- Execution Time: As described in the
- Definition: A statement-level trigger fires only once per statement, regardless of the number of rows affected. The optional
REFERENCINGclause allows the trigger to access all modified rows in "transition tables" (e.g.,OLD TABLEandNEW TABLE). - Use Case: This mechanism is the correct way to validate complex changes that occur within a single statement. For example, an
UPDATE ... FROM ...that modifies multiple rows could be validated correctly by a statement-level trigger that examines the final state of all affected rows in theNEW TABLEtransition table. - Limitation for
sql_saga: While powerful, this does not solve the multi-statement transaction problem. A statement-level trigger still fires at the end of each statement, not at the end of the transaction, so it remains blind to changes from subsequent statements within the same transaction block.
The core of the multi-statement update problem lies in the difference between PostgreSQL's built-in DEFERRABLE foreign keys and the user-defined CONSTRAINT TRIGGERs that sql_saga must use.
-
Built-in Foreign Keys: These are deeply integrated into the PostgreSQL executor. When deferred, their final validation can re-check the constraint against the actual, final state of the data at the end of the transaction. They have access to internal state management that is not exposed to user-defined functions. This is why they work as users intuitively expect.
-
User-Defined
CONSTRAINT TRIGGERs: These are a more limited mechanism. As documented, they are bound by a strict MVCC snapshot rule: their queries always see the state of the database as of the start of the statement that queued them, even when execution is deferred to the end of the transaction. They cannot see the final, transaction-complete state. -
The Previous
pl/pgsqlIllusion: The oldpl/pgsqltriggers insql_sagadid not have the same semantics as built-in FKs. They only appeared to work for the multi-statementUPDATEcase because of a bug. For updates that only changed the time period, the validation query failed to exclude theOLDrow from its coverage check. This effectively disabled the validation for that specific scenario, masking the underlying limitation. The correct C implementation fixed this bug, which brought the architectural limitation to light.
From first principles, we have learned that there is a fundamental limitation in PostgreSQL's trigger architecture for validating multi-statement transactions that are only consistent at the very end. Standard DEFERRABLE foreign keys can handle this because they use internal mechanisms not available to user-defined triggers.
The correct way to validate a complex change affecting multiple rows within a single DML statement is with a statement-level trigger.
- An
AFTER UPDATE ... FOR EACH STATEMENTtrigger with aREFERENCING OLD TABLE AS old_rows, NEW TABLE AS new_rowsclause can see the complete "before" and "after" state of all rows modified by the statement. - This trigger could then identify the full set of unique keys that were affected and re-validate all foreign keys that reference them. This would correctly handle complex operations like
UPDATE ... FROM ...orMERGE.
The critical limitation is that statement-level triggers are not deferrable to the end of a transaction. They execute at the end of each statement. In a transaction consisting of an UPDATE followed by an INSERT, the statement-level trigger for the UPDATE would fire before the INSERT occurs, see a temporary gap in coverage, and report a violation.
sql_sagausesCONSTRAINT TRIGGERs because they are the only user-definable trigger type that isDEFERRABLEto the end of the transaction.- However, these triggers are restricted to
FOR EACH ROWand, crucially, operate on a data snapshot from the start of the statement, not the transaction. - This is a pragmatic compromise. It correctly handles the common cases of single-row changes and simple multi-statement transactions (e.g.,
UPDATEthenINSERTto fill a gap), but it cannot support complex multi-statementUPDATEs as demonstrated in28_with_exclusion_constraints.sql.
Therefore, the decision to comment out these tests is correct. The architectural pattern they represent is fundamentally incompatible with the validation mechanisms available to user-defined triggers in PostgreSQL.
The correct and most robust way to perform complex temporal modifications is to encapsulate the logic within a single, high-level procedural function. This approach, validated by the external statbus_speed project, is the strategic path forward for sql_saga.
The "Plan and Execute" Pattern with "Add-Then-Modify" Ordering:
The definitive solution for complex temporal modifications is the "Plan and Execute" pattern, implemented within a single C or pl/pgsql function.
-
The Plan Phase (Read-Only): The function first reads all source and target data and calculates a complete DML plan (
DELETEs,UPDATEs,INSERTs) that is guaranteed to result in a consistent final state. This phase is pure and does not write to the database or queue any triggers. -
The Execute Phase (Write-Only): The function then executes this plan using a critical "add-then-modify" order. It first performs all
INSERToperations to add new timeline segments. This creates a temporary state of overlapping periods. It then performs allUPDATEs andDELETEs to truncate or remove the old segments.
Why This Is the Correct Solution:
- Deferred Exclusion Constraints: The temporary overlaps created by the "add-first" approach would normally fail. However, they are correctly handled by setting the table's exclusion constraints to
DEFERRED. - Working with the Snapshot Rule: The
sql_sagaCONSTRAINT TRIGGERs queued by the laterUPDATEorDELETEstatements will see a snapshot that contains both the new rows (from the earlierINSERTs) and the old rows. The trigger's validation logic (all_rows_in_snapshot - OLD_row + NEW_row) will therefore be checking a set of rows that correctly represents the final, consistent state of the timeline. - A Single Top-Level Statement: All triggers are deferred until the top-level function call completes, at which point the planner has already guaranteed the timeline is consistent. The triggers act as a powerful, low-level safety net that verifies the planner's logic.
This "add-then-modify" execution strategy is the principled way to solve the multi-statement update problem. It avoids the MVCC snapshot limitation by creating an intermediate state that allows the trigger's validation logic to see the future.
Reference: PostgreSQL Documentation: Chapter 38. Event Triggers
This section summarizes key concepts about event triggers, focusing on their behavior and how they are used within sql_saga.
- Scope: Unlike regular triggers, which are attached to a single table, event triggers are global to a database.
- Events: They capture DDL (Data Definition Language) events, not DML (Data Manipulation Language) events.
- Execution Order: If multiple event triggers are defined for the same event, they are fired in alphabetical order by name.
- Creation: They are created with
CREATE EVENT TRIGGERand require a special function that returns theevent_triggerpseudo-type.
sql_saga relies on the precise timing of two key events:
- Fires: After a DDL command has executed and its changes have been written to the system catalogs, but before the transaction commits.
sql_sagaUsage: Used by therename_followingtrigger. Because it fires after the catalog changes, the trigger can inspect the "new" state of renamed objects (like columns or constraints) and updatesql_saga's metadata tables accordingly.- Context Function:
pg_event_trigger_ddl_commands()returns a list of the DDL commands that were executed.
- Fires: Just before the
ddl_command_endevent for any command that drops objects. - Critical Timing Detail: The documentation states: "the trigger is executed after the objects have been deleted from the system catalogs, so it's not possible to look them up anymore." This is a fundamental point for understanding
sql_saga's protection logic. sql_sagaUsage: Used by thedrop_protectiontrigger. The trigger's logic is a direct consequence of this timing rule:- To detect dropped top-level objects (like a table with an
era): The trigger must use thepg_event_trigger_dropped_objects()function, which provides a list of objects that were just deleted. This is used for cleanup. - To protect dependent objects (like constraints or triggers created by
sql_saga): The protection logic must query the system catalogs and check if the object is missing. A query usingWHERE NOT EXISTS (SELECT FROM pg_constraint ...)is the correct and necessary way to detect if a protected constraint was dropped, because the drop has already happened when the trigger code runs.
- To detect dropped top-level objects (like a table with an
- Interface: C functions do not receive normal arguments. Instead, they get a context pointer to an
EventTriggerDatastruct. - Macro: The
CALLED_AS_EVENT_TRIGGER(fcinfo)macro must be used to verify that the function was called as an event trigger. EventTriggerDatastruct: This struct provides theeventname (e.g.,"sql_drop") and the commandtag(e.g.,"DROP TABLE").- Return Value: The function must return
NULLviaPG_RETURN_NULL().
Reference: PostgreSQL Documentation: Server Programming Interface
- Execution Context: SPI allows C functions to execute SQL commands within the server. All SPI operations must be wrapped between
SPI_connect()andSPI_finish(). - Error Handling: A critical feature of SPI is its error handling model. If any command executed via SPI (e.g.,
SPI_execute) fails, control is not returned to the C function. Instead, the current transaction or subtransaction is immediately rolled back, and an error is raised, unwinding the stack. Documented error-return codes from SPI functions are only for errors detected within the SPI layer itself, not for errors from the executed SQL. - Memory Management: Memory for query results (like
SPI_tuptable) is allocated in a context that is automatically freed bySPI_finish(). For data that needs to be returned from the function, it must be copied into the upper executor context using functions likeSPI_pallocorSPI_copytuple.