Measures and Dimensional Calculations in Morel #344
julianhyde
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Disclaimer: The following was written by Claude, after it had read "Measures in SQL". I agree with most of its conclusions.
Design notes for adding measure support to Morel, informed by Measures in SQL (Hyde & Fremlin, SIGMOD-Companion '24).
Background
In SQL, a measure is a new kind of column that encapsulates an aggregate calculation and attaches it to a table. Unlike a regular column (which has a value per row), a measure's value depends on an evaluation context — a predicate over the table's dimensions. The
AToperator modifies this context, enabling concise dimensional calculations like year-over-year growth or proportion-of-total without correlated subqueries or self-joins.BI languages (MDX, DAX, Tableau LOD) have had this concept for decades. The paper shows how to bring it into SQL while preserving composability and closure. The question here is: how should it work in a functional language like Morel?
The Core Insight: Measures Are Lambdas
The paper reveals this directly in its formal semantics. For a measure
sumRevenuedefined asSUM(revenue)on tableOrders, the system generates an auxiliary function:that evaluates
SUM(revenue)over the rows satisfyingpred. In Morel terms, a measure is a closure over a base relation:The entire SQL machinery — evaluation context,
AToperator,VISIBLE— is syntactic sugar for constructing and transforming that predicate argument.Challenges for a Functional Language
1. Implicit vs. explicit context
The power of measures in SQL comes from the evaluation context flowing implicitly from
GROUP BYand query structure. InSELECT prodName, AGGREGATE(sumRevenue) FROM ... GROUP BY prodName, the contextprodName = 'Widgets'is derived automatically. A functional language makes things explicit — which is normally a virtue, but here it would force the user to manually construct predicates, destroying the conciseness.2. Measures are neither values nor functions — they're deferred aggregations
A regular column
e.salhas typeintat every row. A measureavgSalhas typeintonly after specifying which rows to aggregate over. This is a phase distinction. The type system must prevent accidentally using a measure as a scalar (avgSal + 1is meaningless without context) while allowing it insidecomputeoratexpressions.3. The "attached to a relation" problem
A measure is not free-standing — it's bound to a specific base table. When you
from e in enhancedOrders, the measuree.sumRevenueis available but carries a hidden reference back toorders. This doesn't exist in ML's type system. A record field is just a value; there's no notion of a field that's actually a suspended computation over the record's source relation.4. Composability through views
The paper emphasizes that tables-with-measures are closed under query operations — a query over a table with measures returns a table with measures. This "measures propagate through the query pipeline" property requires careful design so that
from ... yield ... where ...transformations preserve measure availability.5. The
atoperator breaks referential transparency (locally)profitMargin at {year = year - 1}evaluates the same expression in a different context. This is essentially dynamic scoping — the result depends on which "year" the caller provides. In functional terms it maps tolocalin a reader monad, but introducing monadic plumbing everywhere would be heavy.Proposed Design: Lambdas + Syntactic Sugar
Measures are functions under the hood, with syntactic sugar that makes the common patterns concise.
Defining Measures
Introduce a
definestep infromexpressions that attaches measures to a relation:definedoesn't change the row type or filter rows — it attaches named aggregate computations. The result type would be something like:{orderDate: date, prodName: string, revenue: real, cost: real, sumRevenue: real measure, sumCost: real measure, profitMargin: real measure} listwhere
real measureis a new type constructor meaning "an aggregation that produces areal."Evaluating Measures
Inside
group ... compute, measures evaluate against the group's context automatically:The desugaring:
profitMarginexpands to the aggregate expression(sum over o.revenue - sum over o.cost) / sum over o.revenue, scoped to rows whereprodNamematches the group key. No new evaluationmachinery needed — it reuses
compute.Without
group, a barecompute o.sumRevenuegives the grand total.The
atOperator for Dimensional CalculationsIntroduce
atas a postfix operator on measures:The
atmodifier transforms the evaluation context:at {year = year - 1}at {all year}at {all prodName, all year}Desugaring
atmeasure at {year = year - 1}desugars to a correlated subquery-like expansion. In the lambda formulation,atmodifies the predicate:Year-over-Year Growth: Full Example
Why Not Plain Lambdas?
You could do all of this with explicit lambdas today, without any new syntax:
This works but has serious problems:
at-style context modification requires writing out the full predicate.The sugar adds three things: (a) the measure definition is co-located with the data, (b) the evaluation context is derived from query structure automatically, (c)
atlets you modify one dimension without restating the others.Type System Sketch
Rules:
defineclause,agg over ehas typet measureagg : t list -> tcompute, a measure evaluates to its base typetAGGREGATEm at {...}has the same type asmm1 + m2 : int measureif both areint measureOpen Questions
Should
definebe afromstep or a separate declaration? Afromstep composes naturally in pipelines. But a standalonevalbinding would allow reuse across queries.Semi-additive measures (e.g., inventory that sums across products but uses
last_valueacross time) — these need per-dimension aggregate specifications. Probably out of scope initially.Interaction with Morel's
fromsteps —where,yield,orderafterdefineshould propagate the measures.groupwithoutcomputedrops them (the dimensions changed).yieldthat projects away a dimension changes the measure's dimensionality.The
visiblemodifier — shouldwhereclauses restrict measures? The paper's default is no (measures see the full base table);visibleopts in to the query's filters. For Morel, the default matters for user expectations.Interaction with
union/except/intersect— set operations on relations with measures need clear semantics. Presumably measures are only preserved if both sides define the same measures over the same base table.References
Beta Was this translation helpful? Give feedback.
All reactions