
[LIQ] Add new aggregate functions, aliases, and queryable aggregate registry #1891

Merged
zefhemel merged 28 commits into silverbulletmd:main from mjf:liq-improve-aggregates
Mar 19, 2026

@mjf
Contributor

@mjf mjf commented Mar 17, 2026

  • Extend with 13 new built-in aggregates: product, string_agg, yaml_agg, json_agg, bit_and, bit_or, bit_xor, bool_and, bool_or, stddev_pop, stddev_samp, var_pop and var_samp.

  • Introduce aggregate.alias API allowing users to define custom aliases for any aggregate. Standard aliases (every, std, stddev and variance) are now defined via this API rather than hardcoded.

  • Add index.aggregates queryable collection so users can discover all available aggregates directly from LIQ queries.
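To make the alias and registry ideas concrete, here is a minimal TypeScript sketch of a registry where built-in aggregates and aliases live side by side and can be listed as rows, roughly the shape a queryable `index.aggregates` collection could expose. All names (`AggregateSpec`, `AggregateRegistry`, `define`, `alias`, `list`, `run`) are hypothetical and not SilverBullet's actual API. Mapping `every` to `bool_and` is an assumption that mirrors Postgres, and returning null for empty input is likewise assumed.

```typescript
// Hypothetical aggregate shape: initialize / iterate / finish.
interface AggregateSpec {
  initialize: () => unknown;
  iterate: (state: unknown, value: unknown) => unknown;
  finish: (state: unknown) => unknown;
}

// One row of the listing: target is null for built-ins.
interface RegistryEntry {
  name: string;
  target: string | null;
}

class AggregateRegistry {
  private specs = new Map<string, AggregateSpec>();
  private aliases = new Map<string, string>();

  define(name: string, spec: AggregateSpec): void {
    this.specs.set(name, spec);
  }

  // Register `name` as another name for a defined (non-alias) aggregate.
  alias(name: string, target: string): void {
    if (!this.specs.has(target)) {
      throw new Error(`Unknown aggregate: ${target}`);
    }
    this.aliases.set(name, target);
  }

  resolve(name: string): AggregateSpec | undefined {
    return this.specs.get(this.aliases.get(name) ?? name);
  }

  // Rows that a queryable collection could expose, sorted by name.
  list(): RegistryEntry[] {
    const rows: RegistryEntry[] = [];
    for (const name of this.specs.keys()) rows.push({ name, target: null });
    for (const [name, target] of this.aliases) rows.push({ name, target });
    return rows.sort((a, b) => a.name.localeCompare(b.name));
  }

  run(name: string, values: unknown[]): unknown {
    const spec = this.resolve(name);
    if (!spec) throw new Error(`Unknown aggregate: ${name}`);
    let state = spec.initialize();
    for (const v of values) state = spec.iterate(state, v);
    return spec.finish(state);
  }
}

const registry = new AggregateRegistry();
// bool_and: null until the first row, then logical AND of truthiness.
registry.define("bool_and", {
  initialize: () => null,
  iterate: (s, v) => (s === null ? true : (s as boolean)) && Boolean(v),
  finish: (s) => s,
});
registry.alias("every", "bool_and");
```

Defining the standard aliases through the same call users have, rather than special-casing them, means the listing needs no extra logic to show them.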

UPDATE:

  • Add guards against `LIQ_NULL` leaking in various ways in many places!

Details are in the commit messages. :)

…egistry

* Extend with 13 new built-in aggregates: `product`, `string_agg`,
  `yaml_agg`, `json_agg`, `bit_and`, `bit_or`, `bit_xor`, `bool_and`,
  `bool_or`, `stddev_pop`, `stddev_samp`, `var_pop` and `var_samp`.

* Introduce `aggregate.alias` API allowing users to define custom
  aliases for any aggregate. Standard aliases (`every`, `std`, `stddev`
  and `variance`) are now defined via this API rather than hardcoded.

* Add `index.aggregates` queryable collection so users can discover
  all available aggregates directly from LIQ queries.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
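The four variance and standard-deviation aggregates differ only in their final step. A common single-pass way to compute them is Welford's online algorithm, used in the TypeScript sketch below; the PR text does not say which algorithm the actual implementation uses, and returning `null` when there are too few rows is an assumption modeled on Postgres.

```typescript
// Running state for Welford's online mean/variance algorithm.
interface WelfordState {
  n: number;
  mean: number;
  m2: number; // sum of squared deviations from the running mean
}

function welfordInit(): WelfordState {
  return { n: 0, mean: 0, m2: 0 };
}

function welfordIterate(s: WelfordState, x: number): WelfordState {
  const n = s.n + 1;
  const delta = x - s.mean;
  const mean = s.mean + delta / n;
  // Uses the old delta and the new mean.
  const m2 = s.m2 + delta * (x - mean);
  return { n, mean, m2 };
}

// Population variance divides by n, sample variance by n - 1.
function varPop(s: WelfordState): number | null {
  return s.n === 0 ? null : s.m2 / s.n;
}

function varSamp(s: WelfordState): number | null {
  return s.n < 2 ? null : s.m2 / (s.n - 1);
}

function stddevPop(s: WelfordState): number | null {
  const v = varPop(s);
  return v === null ? null : Math.sqrt(v);
}

function stddevSamp(s: WelfordState): number | null {
  const v = varSamp(s);
  return v === null ? null : Math.sqrt(v);
}

const state = [2, 4, 4, 4, 5, 5, 7, 9].reduce(welfordIterate, welfordInit());
```

Welford's update avoids the cancellation error that the naive `sum(x^2) - sum(x)^2 / n` formula can suffer when values are large relative to their spread.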
@mjf mjf marked this pull request as draft March 17, 2026 08:44
@mjf mjf marked this pull request as ready for review March 17, 2026 10:31
@mjf
Contributor Author

mjf commented Mar 17, 2026

This also fixes custom aggregates, which in fact never worked.

…rage

`config.set` uses `LuaNativeJSFunction`, which calls `luaValueToJS` on
all arguments. This converted the aggregate `LuaTable` to a plain JS
object and wrapped `LuaFunction` callbacks in JS functions that also
converted their return values via `luaValueToJS`. As a result, the
state returned by `initialize` (a `LuaTable`) was converted to a plain
JS object before being passed to `iterate`. Therefore, Lua operations
like `table.insert` on that state failed, because they expected a
`LuaTable` and not a plain JS array.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
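The failure mode can be reproduced in miniature. In this TypeScript sketch, `MiniLuaTable`, `toPlainJS` and `tableInsert` are hypothetical stand-ins for `LuaTable`, `luaValueToJS` and Lua's `table.insert`; they model only the relevant behavior: converting the state at the call boundary strips the table type that Lua-side helpers depend on, while passing the state through untouched keeps it working.

```typescript
// Hypothetical stand-in for a Lua table with an array part.
class MiniLuaTable {
  private items: unknown[] = [];
  insert(value: unknown): void {
    this.items.push(value);
  }
  get length(): number {
    return this.items.length;
  }
  toArray(): unknown[] {
    return [...this.items];
  }
}

// Hypothetical converter: tables become plain JS arrays.
function toPlainJS(value: unknown): unknown {
  return value instanceof MiniLuaTable ? value.toArray() : value;
}

// Mimics table.insert: only accepts real tables.
function tableInsert(t: unknown, v: unknown): void {
  if (!(t instanceof MiniLuaTable)) {
    throw new TypeError("table.insert expects a table");
  }
  t.insert(v);
}

// A user-style iterate that appends each value to the state table.
function iterate(state: unknown, value: unknown): unknown {
  tableInsert(state, value);
  return state;
}

// convertState = true models the buggy path, false the fixed one.
function runAggregate(values: unknown[], convertState: boolean): unknown {
  let state: unknown = new MiniLuaTable();
  for (const v of values) {
    const passed = convertState ? toPlainJS(state) : state;
    state = iterate(passed, v);
  }
  return state;
}
```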
@mjf
Contributor Author

mjf commented Mar 17, 2026

This also fixes table functions not working on intermediate aggregate state.

mjf added 5 commits March 17, 2026 15:51
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
…egistry

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@mjf mjf marked this pull request as draft March 18, 2026 07:39
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@mjf mjf marked this pull request as ready for review March 18, 2026 07:55
mjf added 4 commits March 18, 2026 09:23
This preserves `null`/`undefined` as-is (both map to Lua nil) and
prevents them from falling through to the `typeof` "object" branch.

For this PR, it means that a null `target` in `aggregates` entries now
correctly shows as empty (`nil`) in query results rather than as `{}`.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
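The check order matters because in JavaScript `typeof null` is `"object"`, so a converter that tests `typeof` first sends `null` down the table-building branch. A minimal sketch of the corrected order follows; `jsToLuaSketch` is a hypothetical name, a `Map` stands in for a Lua table, and `undefined` stands in for Lua nil (nil-valued keys are dropped, as in Lua).

```typescript
// Hypothetical converter: Map models a Lua table, undefined models nil.
function jsToLuaSketch(value: unknown): unknown {
  // Must come first: typeof null === "object".
  if (value === null || value === undefined) {
    return undefined;
  }
  if (typeof value === "object") {
    const table = new Map<string, unknown>();
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      const converted = jsToLuaSketch(v);
      // Assigning nil to a Lua table key removes it, so skip it here.
      if (converted !== undefined) table.set(k, converted);
    }
    return table;
  }
  return value;
}
```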
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
… leaks

* `sum()` and `product()` now return null when no rows match (matching
  Postgres semantics) instead of returning 0 and 1, respectively.

* Query result columns that hold null are internally preserved using
  a `LIQ_NULL` sentinel so that column keys survive in `LuaTable`
  storage.  This sentinel was leaking into Lua code as "userdata"
  through three read paths:

  * `luaIndexValue`: `rawGet` returned the sentinel directly to Lua when
    accessing table fields,

  * `rawget` (stdlib): the builtin `rawget` function exposed the
    sentinel without converting it back to `nil`,

  * `createAugmentedEnv`: string interpolation unpacked table values via
    `rawGet` into local variables, making the sentinel visible in
    template expressions like `${var}`.

  All three now convert `LIQ_NULL` to `nil` at the read boundary,
  keeping the sentinel internal to table storage where it belongs.

* Update affected test expectations accordingly.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
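The sentinel pattern above can be sketched as follows. `LIQ_NULL` is the name used in this PR, but `NullPreservingTable`, `rawGetInternal`, `get` and `sumOrNull` are hypothetical: stored nulls become the sentinel so the key survives, and the public read path converts it back to `undefined` (standing in for Lua nil). Skipping null inputs in `sumOrNull` follows Postgres and is an assumption about this PR.

```typescript
const LIQ_NULL = Symbol("LIQ_NULL");

class NullPreservingTable {
  private store = new Map<string, unknown>();

  // Store the sentinel instead of null so the column key survives.
  set(key: string, value: unknown): void {
    this.store.set(key, value === null ? LIQ_NULL : value);
  }

  keys(): string[] {
    return [...this.store.keys()];
  }

  // Internal read: may return the sentinel; must not reach user code.
  rawGetInternal(key: string): unknown {
    return this.store.get(key);
  }

  // Read boundary: the sentinel is turned back into nil.
  get(key: string): unknown {
    const v = this.store.get(key);
    return v === LIQ_NULL ? undefined : v;
  }
}

// Postgres-style sum: null when there are no non-null inputs.
function sumOrNull(values: Array<number | null>): number | null {
  let total: number | null = null;
  for (const v of values) {
    if (v === null) continue;
    total = (total ?? 0) + v;
  }
  return total;
}
```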
…regate `iterate`s

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@mjf mjf marked this pull request as draft March 18, 2026 12:24
@mjf
Contributor Author

mjf commented Mar 18, 2026

Switched back to draft; working on `LIQ_NULL` leaking in many places.

mjf added 7 commits March 18, 2026 13:27
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
`JSON.stringify` turns a `Symbol` inside an array into `null` only
incidentally. That is behavior we **MUST NOT** rely on. An explicit
null push makes the intent clear and avoids surprises if the sentinel's
`Symbol` representation ever changes.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
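A minimal sketch of the explicit mapping, using a hypothetical `jsonAggFinish` helper and a local `LIQ_NULL` symbol standing in for the real sentinel:

```typescript
const LIQ_NULL = Symbol("LIQ_NULL");

// Map the sentinel to null explicitly instead of relying on how
// JSON.stringify happens to treat symbols.
function jsonAggFinish(values: unknown[]): string {
  const out: unknown[] = [];
  for (const v of values) {
    out.push(v === LIQ_NULL ? null : v);
  }
  return JSON.stringify(out);
}
```

The implicit behavior is also context dependent: `JSON.stringify([Symbol()])` yields `"[null]"`, but `JSON.stringify({ a: Symbol() })` yields `"{}"` because symbol-valued properties are omitted.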
`js-yaml` has no knowledge of the `LIQ_NULL` symbol. Passing null makes
it emit YAML null (or `~`), which is the correct YAML representation of
a missing value and matches standard `json_agg`/`yaml_agg`
NULL-inclusion semantics.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
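Because `js-yaml` cannot see the sentinel, one way to apply the same rule to nested values is to normalize the whole structure before passing it to the serializer (for example `js-yaml`'s `dump`). The recursive `replaceSentinel` helper below is a hypothetical sketch; the commit only states that null is passed instead of `LIQ_NULL`.

```typescript
const LIQ_NULL = Symbol("LIQ_NULL");

// Recursively replace the sentinel with null so that any serializer
// (YAML or JSON) sees a plain null.
function replaceSentinel(value: unknown): unknown {
  if (value === LIQ_NULL) return null;
  if (Array.isArray(value)) return value.map(replaceSentinel);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = replaceSentinel(v);
    }
    return out;
  }
  return value;
}
```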
Without this, `LIQ_NULL` sort keys would fall through to `valA < valB`,
which cannot meaningfully order `Symbol`s and so breaks the
`nulls first`/`nulls last` contract.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
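A comparator that resolves null-like keys before falling back to `<` might look like the sketch below. `compareWithNulls`, `isNullish` and the `nullsFirst` flag are hypothetical names, and for brevity the sketch assumes non-null keys are numbers.

```typescript
const LIQ_NULL = Symbol("LIQ_NULL");

function isNullish(v: unknown): boolean {
  return v === null || v === undefined || v === LIQ_NULL;
}

// Place null-like keys first or last; only non-null keys reach `<`.
function compareWithNulls(a: unknown, b: unknown, nullsFirst: boolean): number {
  const aNull = isNullish(a);
  const bNull = isNullish(b);
  if (aNull && bNull) return 0;
  if (aNull) return nullsFirst ? -1 : 1;
  if (bNull) return nullsFirst ? 1 : -1;
  // Assumption for this sketch: non-null sort keys are numbers.
  const x = a as number;
  const y = b as number;
  return x < y ? -1 : x > y ? 1 : 0;
}
```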
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
…NULL` sentinel

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
…visible text

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@mjf mjf marked this pull request as ready for review March 18, 2026 13:33
@mjf
Contributor Author

mjf commented Mar 18, 2026

I believe most real issues are now solved.

@mjf mjf marked this pull request as draft March 18, 2026 20:59
mjf added 3 commits March 19, 2026 08:25
…egates

Extra arguments (2nd, 3rd, etc.) to aggregate functions were evaluated
against the outer query environment where the object variable is not
bound. This caused multi-argument aggregates like `covar_samp(data.y,
data.x)` to fail with nil reference errors. This commit addresses this
by evaluating extra args per-item inside the iterate loop using the item
environment so all arguments resolve correctly.

We also add a few common aggregates:

- `covar_pop`, `covar_samp`, `corr`: population/sample covariance and
  correlation coefficient using online co-moment algorithm.

- `quantile(value, q, method)`: general quantile with the interpolation
  methods `lower`, `higher`, `nearest`, `midpoint` and `linear` (default).

- `percentile_cont(value, q)`: continuous percentile (linear)

- `percentile_disc(value, q)`: discrete percentile (lower)
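The online co-moment update behind `covar_pop`, `covar_samp` and `corr` extends Welford's algorithm to two variables. This TypeScript sketch follows the standard single-pass formulas; the state layout and function names are hypothetical, the `(y, x)` argument order follows the `covar_samp(data.y, data.x)` example above, and returning null when a variance is zero is an assumption.

```typescript
interface CoMomentState {
  n: number;
  meanX: number;
  meanY: number;
  c: number; // co-moment: sum of (x - meanX) * (y - meanY)
  m2x: number; // sum of squared deviations of x
  m2y: number; // sum of squared deviations of y
}

const coInit = (): CoMomentState => ({ n: 0, meanX: 0, meanY: 0, c: 0, m2x: 0, m2y: 0 });

function coUpdate(s: CoMomentState, y: number, x: number): CoMomentState {
  const n = s.n + 1;
  const dx = x - s.meanX; // deviation from the old mean
  const dy = y - s.meanY;
  const meanX = s.meanX + dx / n;
  const meanY = s.meanY + dy / n;
  return {
    n,
    meanX,
    meanY,
    c: s.c + dx * (y - meanY),
    m2x: s.m2x + dx * (x - meanX),
    m2y: s.m2y + dy * (y - meanY),
  };
}

function covarPop(s: CoMomentState): number | null {
  return s.n === 0 ? null : s.c / s.n;
}

function covarSamp(s: CoMomentState): number | null {
  return s.n < 2 ? null : s.c / (s.n - 1);
}

function corr(s: CoMomentState): number | null {
  if (s.n === 0) return null;
  const d = Math.sqrt(s.m2x * s.m2y);
  return d === 0 ? null : s.c / d;
}
```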

Note: `percentile_cont` and `percentile_disc` share the `quantile`
implementation through `ctx.name` at initialize time.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
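The `quantile` methods can be sketched over a sorted copy of the input using the position `h = (n - 1) * q`, in the style of NumPy's method names. Whether this PR uses exactly this position formula is an assumption, as are the `quantileSketch`, `percentileCont` and `percentileDisc` names; the mapping of `percentile_cont` to `linear` and `percentile_disc` to `lower` follows the list above. Ties in `nearest` round up here via `Math.round`, while other implementations may round half to even.

```typescript
type Method = "linear" | "lower" | "higher" | "nearest" | "midpoint";

function quantileSketch(values: number[], q: number, method: Method = "linear"): number | null {
  if (values.length === 0 || q < 0 || q > 1) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const h = (sorted.length - 1) * q; // fractional position
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  const a = sorted[lo];
  const b = sorted[hi];
  switch (method) {
    case "lower":
      return a;
    case "higher":
      return b;
    case "nearest":
      return sorted[Math.round(h)];
    case "midpoint":
      return (a + b) / 2;
    default: // "linear": interpolate between neighbours
      return a + (b - a) * (h - lo);
  }
}

// Both percentiles reuse the quantile code, differing only in method.
const percentileCont = (v: number[], q: number) => quantileSketch(v, q, "linear");
const percentileDisc = (v: number[], q: number) => quantileSketch(v, q, "lower");
```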
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@mjf mjf marked this pull request as ready for review March 19, 2026 07:35
@mjf
Contributor Author

mjf commented Mar 19, 2026

OK, I was wrong; now I believe we are good to go... 🎆

@zefhemel
Collaborator

Very cool, but could you add some tests for the new aggregates?

mjf added 3 commits March 19, 2026 08:59
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
@zefhemel zefhemel merged commit 6cbb61c into silverbulletmd:main Mar 19, 2026
2 checks passed

2 participants