Commit d740a83

Merge remote-tracking branch 'github/main' into polars_isin
2 parents d73b4bb + b454256 commit d740a83

63 files changed, +3434 -1116 lines changed

CHANGELOG.md

Lines changed: 35 additions & 0 deletions
@@ -4,6 +4,41 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [2.16.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.15.0...v2.16.0) (2025-08-20)
+
+
+### Features
+
+* Add `bigframes.pandas.options.display.precision` option ([#1979](https://github.com/googleapis/python-bigquery-dataframes/issues/1979)) ([15e6175](https://github.com/googleapis/python-bigquery-dataframes/commit/15e6175ec0aeb1b7b02d0bba9e8e1e018bd11c31))
+* Add level, inplace params to reset_index ([#1988](https://github.com/googleapis/python-bigquery-dataframes/issues/1988)) ([3446950](https://github.com/googleapis/python-bigquery-dataframes/commit/34469504b79a082d3380f9f25c597483aef2068a))
+* Add ML code samples from dbt blog post ([#1978](https://github.com/googleapis/python-bigquery-dataframes/issues/1978)) ([ebaa244](https://github.com/googleapis/python-bigquery-dataframes/commit/ebaa244a9eb7b87f7f9fd9c3bebe5c7db24cd013))
+* Add where, coalesce, fillna, casewhen, invert local impl ([#1976](https://github.com/googleapis/python-bigquery-dataframes/issues/1976)) ([f7f686c](https://github.com/googleapis/python-bigquery-dataframes/commit/f7f686cf85ab7e265d9c07ebc7f0cd59babc5357))
+* Adjust anywidget CSS to prevent overflow ([#1981](https://github.com/googleapis/python-bigquery-dataframes/issues/1981)) ([204f083](https://github.com/googleapis/python-bigquery-dataframes/commit/204f083a2f00fcc9fd1500dcd7a738eda3904d2f))
+* Format page number in table widget ([#1992](https://github.com/googleapis/python-bigquery-dataframes/issues/1992)) ([e83836e](https://github.com/googleapis/python-bigquery-dataframes/commit/e83836e8e1357f009f3f95666f1661bdbe0d3751))
+* Or, And, Xor can execute locally ([#1994](https://github.com/googleapis/python-bigquery-dataframes/issues/1994)) ([59c52a5](https://github.com/googleapis/python-bigquery-dataframes/commit/59c52a55ebea697855eb4c70529e226cc077141f))
+* Support callable bigframes function for dataframe where ([#1990](https://github.com/googleapis/python-bigquery-dataframes/issues/1990)) ([44c1ec4](https://github.com/googleapis/python-bigquery-dataframes/commit/44c1ec48cc4db1c4c9c15ec1fab43d4ef0758e56))
+* Support callable for series where method ([#2005](https://github.com/googleapis/python-bigquery-dataframes/issues/2005)) ([768b82a](https://github.com/googleapis/python-bigquery-dataframes/commit/768b82af96a5dd0c434edcb171036eb42cfb9b41))
+* When using `repr_mode = "anywidget"`, numeric values align right ([15e6175](https://github.com/googleapis/python-bigquery-dataframes/commit/15e6175ec0aeb1b7b02d0bba9e8e1e018bd11c31))
+
+
+### Bug Fixes
+
+* Address the packages issue for bigframes function ([#1991](https://github.com/googleapis/python-bigquery-dataframes/issues/1991)) ([68f1d22](https://github.com/googleapis/python-bigquery-dataframes/commit/68f1d22d5ed8457a5cabc7751ed1d178063dd63e))
+* Correct pypdf dependency specifier for remote PDF functions ([#1980](https://github.com/googleapis/python-bigquery-dataframes/issues/1980)) ([0bd5e1b](https://github.com/googleapis/python-bigquery-dataframes/commit/0bd5e1b3c004124d2100c3fbec2fbe1e965d1e96))
+* Enable default retries in calls to BQ Storage Read API ([#1985](https://github.com/googleapis/python-bigquery-dataframes/issues/1985)) ([f25d7bd](https://github.com/googleapis/python-bigquery-dataframes/commit/f25d7bd30800dffa65b6c31b0b7ac711a13d790f))
+* Fix the copyright year in dbt sample files ([#1996](https://github.com/googleapis/python-bigquery-dataframes/issues/1996)) ([fad5722](https://github.com/googleapis/python-bigquery-dataframes/commit/fad57223d129f0c95d0c6a066179bb66880edd06))
+
+
+### Performance Improvements
+
+* Faster session startup by deferring anon dataset fetch ([#1982](https://github.com/googleapis/python-bigquery-dataframes/issues/1982)) ([2720c4c](https://github.com/googleapis/python-bigquery-dataframes/commit/2720c4cf070bf57a0930d7623bfc41d89cc053ee))
+
+
+### Documentation
+
+* Add examples of running bigframes in kaggle ([#2002](https://github.com/googleapis/python-bigquery-dataframes/issues/2002)) ([7d89d76](https://github.com/googleapis/python-bigquery-dataframes/commit/7d89d76976595b75cb0105fbe7b4f7ca2fdf49f2))
+* Remove preview warning from partial ordering mode sample notebook ([#1986](https://github.com/googleapis/python-bigquery-dataframes/issues/1986)) ([132e0ed](https://github.com/googleapis/python-bigquery-dataframes/commit/132e0edfe9f96c15753649d77fcb6edd0b0708a3))
+
 ## [2.15.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.14.0...v2.15.0) (2025-08-11)
 

bigframes/core/compile/polars/compiler.py

Lines changed: 4 additions & 0 deletions
@@ -198,6 +198,10 @@ def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
     def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
         return l_input | r_input
 
+    @compile_op.register(bool_ops.XorOp)
+    def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
+        return l_input ^ r_input
+
     @compile_op.register(num_ops.AddOp)
     def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
         return l_input + r_input
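
The hunk above adds one more handler to a per-op registry, so that `XorOp` can be compiled locally. The registration style resembles Python's `functools.singledispatchmethod`; whether bigframes uses exactly that decorator is an assumption here. A minimal sketch of the pattern, with hypothetical stand-in op classes and plain Python booleans in place of `pl.Expr`:

```python
import functools
from dataclasses import dataclass


# Hypothetical stand-ins for the real op classes (bool_ops.OrOp, bool_ops.XorOp).
@dataclass(frozen=True)
class OrOp:
    pass


@dataclass(frozen=True)
class XorOp:
    pass


@dataclass(frozen=True)
class UnregisteredOp:
    pass


class ToyCompiler:
    """Dispatches on the type of `op`, mirroring the register-per-op style."""

    @functools.singledispatchmethod
    def compile_op(self, op, l_input, r_input):
        # Fallback when no handler has been registered for this op type.
        raise NotImplementedError(f"no local implementation for {type(op).__name__}")

    @compile_op.register(OrOp)
    def _(self, op, l_input, r_input):
        return l_input | r_input

    @compile_op.register(XorOp)
    def _(self, op, l_input, r_input):
        return l_input ^ r_input


compiler = ToyCompiler()
xor_result = compiler.compile_op(XorOp(), True, False)
```

Adding a new op is then a matter of registering one more handler; ops without a handler fall through to the `NotImplementedError` branch.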

bigframes/core/compile/sqlglot/aggregate_compiler.py

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515

1616
import sqlglot.expressions as sge
1717

18-
from bigframes.core import expression
18+
from bigframes.core import expression, window_spec
1919
from bigframes.core.compile.sqlglot.aggregations import (
2020
binary_compiler,
2121
nullary_compiler,
@@ -56,3 +56,21 @@ def compile_aggregate(
5656
return binary_compiler.compile(aggregate.op, left, right)
5757
else:
5858
raise ValueError(f"Unexpected aggregation: {aggregate}")
59+
60+
61+
def compile_analytic(
62+
aggregate: expression.Aggregation,
63+
window: window_spec.WindowSpec,
64+
) -> sge.Expression:
65+
if isinstance(aggregate, expression.NullaryAggregation):
66+
return nullary_compiler.compile(aggregate.op)
67+
if isinstance(aggregate, expression.UnaryAggregation):
68+
column = typed_expr.TypedExpr(
69+
scalar_compiler.compile_scalar_expression(aggregate.arg),
70+
aggregate.arg.output_type,
71+
)
72+
return unary_compiler.compile(aggregate.op, column, window)
73+
elif isinstance(aggregate, expression.BinaryAggregation):
74+
raise NotImplementedError("binary analytic operations not yet supported")
75+
else:
76+
raise ValueError(f"Unexpected analytic operation: {aggregate}")

bigframes/core/compile/sqlglot/aggregations/unary_compiler.py

Lines changed: 14 additions & 1 deletion
@@ -18,6 +18,7 @@
 
 import sqlglot.expressions as sge
 
+from bigframes import dtypes
 from bigframes.core import window_spec
 import bigframes.core.compile.sqlglot.aggregations.op_registration as reg
 from bigframes.core.compile.sqlglot.aggregations.windows import apply_window_if_present
@@ -36,14 +37,26 @@ def compile(
     return UNARY_OP_REGISTRATION[op](op, column, window=window)
 
 
+@UNARY_OP_REGISTRATION.register(agg_ops.CountOp)
+def _(
+    op: agg_ops.CountOp,
+    column: typed_expr.TypedExpr,
+    window: typing.Optional[window_spec.WindowSpec] = None,
+) -> sge.Expression:
+    return apply_window_if_present(sge.func("COUNT", column.expr), window)
+
+
 @UNARY_OP_REGISTRATION.register(agg_ops.SumOp)
 def _(
     op: agg_ops.SumOp,
     column: typed_expr.TypedExpr,
     window: typing.Optional[window_spec.WindowSpec] = None,
 ) -> sge.Expression:
+    expr = column.expr
+    if column.dtype == dtypes.BOOL_DTYPE:
+        expr = sge.Cast(this=column.expr, to="INT64")
     # Will be null if all inputs are null. Pandas defaults to zero sum though.
-    expr = apply_window_if_present(sge.func("SUM", column.expr), window)
+    expr = apply_window_if_present(sge.func("SUM", expr), window)
     return sge.func("IFNULL", expr, ir._literal(0, column.dtype))
 
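
The `SumOp` change above casts boolean columns to `INT64` before `SUM` and wraps the result in `IFNULL(..., 0)`, because `SUM` yields NULL when every input is NULL. The sketch below reproduces that SQL shape with Python's built-in `sqlite3` module. SQLite is only a stand-in chosen so the example runs anywhere; it is not BigQuery, `CAST(... AS INTEGER)` replaces `INT64`, and the table and function names are hypothetical.

```python
import sqlite3


def bool_sum(values):
    """Return (raw SUM, IFNULL-guarded SUM) over a nullable boolean column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (flag BOOLEAN)")
        conn.executemany(
            "INSERT INTO t (flag) VALUES (?)",
            # Store booleans as 0/1 and keep None as SQL NULL.
            [(None if v is None else int(v),) for v in values],
        )
        # Plain SUM: NULL when all inputs are NULL (or the table is empty).
        raw = conn.execute(
            "SELECT SUM(CAST(flag AS INTEGER)) FROM t"
        ).fetchone()[0]
        # Guarded SUM: falls back to 0, matching the pandas default noted in the diff.
        guarded = conn.execute(
            "SELECT IFNULL(SUM(CAST(flag AS INTEGER)), 0) FROM t"
        ).fetchone()[0]
        return raw, guarded
    finally:
        conn.close()


mixed = bool_sum([True, None, True])
all_null = bool_sum([None, None])
```

For the all-NULL column the raw aggregate comes back as `None`, while the guarded form returns `0`.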

bigframes/core/compile/sqlglot/compiler.py

Lines changed: 69 additions & 0 deletions
@@ -298,6 +298,75 @@ def compile_aggregate(
 
         return child.aggregate(aggregations, by_cols, tuple(dropna_cols))
 
+    @_compile_node.register
+    def compile_window(
+        self, node: nodes.WindowOpNode, child: ir.SQLGlotIR
+    ) -> ir.SQLGlotIR:
+        window_spec = node.window_spec
+        if node.expression.op.order_independent and window_spec.is_unbounded:
+            # notably percentile_cont does not support ordering clause
+            window_spec = window_spec.without_order()
+
+        window_op = aggregate_compiler.compile_analytic(node.expression, window_spec)
+
+        inputs: tuple[sge.Expression, ...] = tuple(
+            scalar_compiler.compile_scalar_expression(expression.DerefOp(column))
+            for column in node.expression.column_references
+        )
+
+        clauses: list[tuple[sge.Expression, sge.Expression]] = []
+        if node.expression.op.skips_nulls and not node.never_skip_nulls:
+            for column in inputs:
+                clauses.append((sge.Is(this=column, expression=sge.Null()), sge.Null()))
+
+        if window_spec.min_periods and len(inputs) > 0:
+            if node.expression.op.skips_nulls:
+                # Most operations do not count NULL values towards min_periods
+                not_null_columns = [
+                    sge.Not(this=sge.Is(this=column, expression=sge.Null()))
+                    for column in inputs
+                ]
+                # All inputs must be non-null for observation to count
+                if not not_null_columns:
+                    is_observation_expr: sge.Expression = sge.convert(True)
+                else:
+                    is_observation_expr = not_null_columns[0]
+                    for expr in not_null_columns[1:]:
+                        is_observation_expr = sge.And(
+                            this=is_observation_expr, expression=expr
+                        )
+                is_observation = ir._cast(is_observation_expr, "INT64")
+                observation_count = windows.apply_window_if_present(
+                    sge.func("SUM", is_observation), window_spec
+                )
+            else:
+                # Operations like count treat even NULLs as valid observations
+                # for the sake of min_periods notnull is just used to convert
+                # null values to non-null (FALSE) values to be counted.
+                is_observation = ir._cast(
+                    sge.Not(this=sge.Is(this=inputs[0], expression=sge.Null())),
+                    "INT64",
+                )
+                observation_count = windows.apply_window_if_present(
+                    sge.func("COUNT", is_observation), window_spec
+                )
+
+            clauses.append(
+                (
+                    observation_count < sge.convert(window_spec.min_periods),
+                    sge.Null(),
+                )
+            )
+        if clauses:
+            when_expressions = [sge.When(this=cond, true=res) for cond, res in clauses]
+            window_op = sge.Case(ifs=when_expressions, default=window_op)
+
+        # TODO: check if we can directly window the expression.
+        return child.window(
+            window_op=window_op,
+            output_column_id=node.output_name.sql,
+        )
+
 
 def _replace_unsupported_ops(node: nodes.BigFrameNode):
     node = nodes.bottom_up(node, rewrite.rewrite_slice)
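
`compile_window` wraps the windowed aggregate in a `CASE` with up to two guards: return NULL when the current input is NULL (only when the op skips nulls and `never_skip_nulls` is false), and return NULL when the window holds fewer than `min_periods` non-null observations. The pure-Python sketch below models that branch order for a trailing-row sum. The function name and the fixed-size trailing frame are illustrative assumptions, not part of the commit, and the sketch models the generated `CASE` rather than pandas' own rolling semantics.

```python
from typing import List, Optional, Sequence


def guarded_rolling_sum(
    values: Sequence[Optional[int]], window: int, min_periods: int
) -> List[Optional[int]]:
    """Trailing-window sum with the two NULL guards applied in CASE order."""
    out: List[Optional[int]] = []
    for i, current in enumerate(values):
        # Rows in the frame: the current row plus up to window - 1 preceding rows.
        frame = values[max(0, i - window + 1) : i + 1]
        # Equivalent of SUM(CAST(x IS NOT NULL AS INT64)) OVER (...).
        observation_count = sum(1 for v in frame if v is not None)
        if current is None:
            # First WHEN clause: a NULL input yields NULL.
            out.append(None)
        elif observation_count < min_periods:
            # Second WHEN clause: too few observations yields NULL.
            out.append(None)
        else:
            # ELSE branch: the aggregate itself, skipping NULLs.
            out.append(sum(v for v in frame if v is not None))
    return out


strict = guarded_rolling_sum([1, None, 3, 4], window=2, min_periods=2)
lenient = guarded_rolling_sum([1, None, 3, 4], window=2, min_periods=1)
```

With `min_periods=2` only the last row has two non-null observations in its frame, so it is the only row that produces a value.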
