Commit c4c7fa5

feat: add st_buffer, st_centroid, and st_convexhull and their corresponding GeoSeries methods (#1963)
* chore: create a specs folder for llm-driven development
* include test instructions
* add steps for adding an operator
* create high-level spec
* reformat detailed list
* add detailed steps
* WIP: implement ops for st_buffer, st_centroid, and st_convexhull
* wip: continue implementation
* add note about doctest
* apply new function
* fix doctest
* be more forceful regarding spec-driven development
* feat: implement GeoSeries scalar operators
* revert scalar_op_compiler.py troubles
* switch back to unary
* avoid option type in st_buffer
* Update specs/2025-08-04-geoseries-scalars.md
* Apply suggestions from code review
* remove keyword-only arguments
* fix warnings and mypy errors
* make buffer doctest more robust
1 parent c67ac28 commit c4c7fa5

11 files changed: +996 -60 lines changed

GEMINI.md

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
# Contribution guidelines, tailored for LLM agents

## Testing

We use `nox` to run our tests.

- To test your changes, run unit tests with `nox`:

  ```bash
  nox -r -s unit
  ```

- To run a single unit test:

  ```bash
  nox -r -s unit-3.13 -- -k <name of test>
  ```

- To run system tests, you can execute:

      # Run all system tests
      $ nox -r -s system

      # Run a single system test
      $ nox -r -s system-3.13 -- -k <name of test>

- The codebase must have better coverage than it had previously after each
  change. You can test coverage via `nox -s unit system cover` (takes a long
  time).

## Code Style

- We use the automatic code formatter `black`. You can run it using
  the nox session `format`. This will eliminate many lint errors. Run via:

  ```bash
  nox -r -s format
  ```

- PEP8 compliance is required, with exceptions defined in the linter configuration.
  If you have `nox` installed, you can test that you have not introduced
  any non-compliant code via:

  ```bash
  nox -r -s lint
  ```

## Documentation

If a method or property implements the same interface as a third-party
package such as pandas or scikit-learn, place the relevant docstring in the
corresponding `third_party/bigframes_vendored/package_name` directory, not in
the `bigframes` directory. Implementations may be placed in the `bigframes`
directory, though.

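As a rough illustration of this docstring/implementation split, the sketch below keeps the docstring on a stand-in for the vendored class and the implementation on a subclass. The class names `VendoredGeoSeries` and `GeoSeries` and the method body are hypothetical and are not the actual BigFrames classes. The only real API relied on is the standard library's `inspect.getdoc`, which falls back to the parent class's docstring when an override has none.

```python
import inspect


class VendoredGeoSeries:
    """Hypothetical stand-in for a class under third_party/bigframes_vendored."""

    def centroid(self):
        """Return the centroid of each geometry (the docstring lives here)."""
        raise NotImplementedError("Interface only; see the implementing class.")


class GeoSeries(VendoredGeoSeries):
    """Hypothetical stand-in for the implementing class under bigframes/."""

    def centroid(self):
        # No docstring here on purpose: the documentation is inherited.
        return "centroid computed"


# For a bound method without its own docstring, inspect.getdoc walks the
# class hierarchy and returns the first docstring it finds.
inherited_doc = inspect.getdoc(GeoSeries().centroid)
print(inherited_doc)
```

Under these assumptions the documentation has a single home next to the interface it mirrors, while the behavior lives with the implementation.
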
### Testing code samples

Code samples are very important for accurate documentation. We use the "doctest"
framework to ensure the samples are functioning as expected. After adding a code
sample, please ensure it is correct by running doctest. To run the doctests
for just a single method, refer to the following example:

```bash
pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut
```

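For a sense of what doctest actually checks, here is a minimal, self-contained sketch that uses only the standard library. The `double` function is hypothetical and unrelated to BigFrames; it only shows how the `>>>` examples in a docstring are collected and compared against their expected output.

```python
import doctest


def double(value):
    """Return twice the input (hypothetical helper used only for this sketch).

    >>> double(21)
    42
    >>> double("ab")
    'abab'
    """
    return value * 2


# Collect the examples found in the docstring of ``double`` and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(double, name="double", globs={"double": double}):
    runner.run(test)

# ``results`` reports how many examples were attempted and how many failed.
results = runner.summarize(verbose=False)
print(results)
```

If an example's printed output drifts from what the docstring shows, the runner counts it as a failure, which is why a sample should be re-run after every edit.
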
## Tips for implementing common BigFrames features

### Adding a scalar operator

For an example, see commit
[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425).

To add a new scalar operator, follow these steps:

1. **Define the operation dataclass:**
   - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one.
   - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary
     operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp`
     for ternary operators, or `base_ops.NaryOp` for operators with many
     arguments. Note that these classes count the number of column-like
     arguments. A function that takes only a single column but several literal
     values would still be a `UnaryOp`.
   - Define the `name` of the operation and any parameters it requires.
   - Implement the `output_type` method to specify the data type of the result.

2. **Export the new operation:**
   - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list.

3. **Implement the user-facing function (pandas-like):**

   - Identify the canonical function from pandas / geopandas / awkward array /
     other popular Python package that this operator implements.
   - Find the corresponding class in BigFrames. For example, the implementation
     for most geopandas.GeoSeries methods is in
     `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented
     in `bigframes/series.py` or one of the accessors, such as `StringMethods`
     in `bigframes/operations/strings.py`.
   - Create the user-facing function that will be called by users (e.g., `length`).
   - If the SQL method differs from pandas or geopandas in a way that can't be
     made the same, raise a `NotImplementedError` with an appropriate message and
     link to the feedback form.
   - Add the docstring to the corresponding file in
     `third_party/bigframes_vendored`, modeled after pandas / geopandas.

4. **Implement the user-facing function (SQL-like):**

   - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one.
   - Create the user-facing function that will be called by users (e.g., `st_length`).
   - This function should take a `Series` for any column-like inputs, plus any other parameters.
   - Inside the function, call `series._apply_unary_op`,
     `series._apply_binary_op`, or similar, passing the operation dataclass you
     created.
   - Add a comprehensive docstring with examples.
   - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list.

5. **Implement the compilation logic:**
   - In `bigframes/core/compile/scalar_op_compiler.py`:
     - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method.
     - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature.
     - Create a new compiler implementation function (e.g., `geo_length_op_impl`).
     - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`.
     - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression.

6. **Add Tests:**
   - Add system tests in the `tests/system/` directory to verify the end-to-end
     functionality of the new operator. Test various inputs, including edge cases
     and `NULL` values.

     Where possible, run the same test code against pandas or GeoPandas and
     compare that the outputs are the same (except for dtypes if BigFrames
     differs from pandas).
   - If you are overriding a pandas or GeoPandas property, add a unit test to
     ensure the correct behavior (e.g., raising `NotImplementedError` if the
     functionality is not supported).

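The steps above can be sketched, in heavily simplified form, as a toy registry. Everything in this block (`UnaryOp`, `ToyBufferOp`, `register_unary_op`, `compile_unary`, and the string-based SQL output) is a hypothetical stand-in written for illustration. It is not the BigFrames implementation, which compiles to Ibis expressions rather than SQL strings, but it shows how an op dataclass with literal parameters can be paired with a registered compiler function.

```python
import dataclasses
from typing import Callable, ClassVar, Dict, Type


@dataclasses.dataclass(frozen=True)
class UnaryOp:
    """Toy base class: an op that consumes exactly one column-like input."""

    name: ClassVar[str] = "unary"

    def output_type(self, input_type: str) -> str:
        raise NotImplementedError


@dataclasses.dataclass(frozen=True)
class ToyBufferOp(UnaryOp):
    """One column input plus a literal parameter, so still a unary op."""

    name: ClassVar[str] = "toy_buffer"
    buffer_radius: float = 0.0

    def output_type(self, input_type: str) -> str:
        return "geography"


# Toy registry mapping an op class to the function that compiles it.
_COMPILERS: Dict[Type[UnaryOp], Callable[[UnaryOp, str], str]] = {}


def register_unary_op(op_type):
    """Decorator that records ``impl`` as the compiler for ``op_type``."""

    def decorator(impl):
        _COMPILERS[op_type] = impl
        return impl

    return decorator


@register_unary_op(ToyBufferOp)
def toy_buffer_op_impl(op: ToyBufferOp, column_sql: str) -> str:
    # Translate the op and its literal parameter into a SQL expression string.
    return f"ST_BUFFER({column_sql}, {op.buffer_radius})"


def compile_unary(op: UnaryOp, column_sql: str) -> str:
    """Look up the registered compiler for this op's class and apply it."""
    return _COMPILERS[type(op)](op, column_sql)


sql = compile_unary(ToyBufferOp(buffer_radius=100.0), "`geo_col`")
print(sql)
```

Keeping literal parameters on the dataclass, as in this sketch, is what lets an operator with one column input and several scalar arguments remain a unary op.
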
## Constraints

- Only add git commits. Do not change git history.
- Follow the spec file for development.
  - Check off items in the "Acceptance criteria" and "Detailed steps"
    sections with `[x]` as they are completed.
  - Refer back to the spec after each step.

bigframes/bigquery/__init__.py

Lines changed: 11 additions & 5 deletions
@@ -29,6 +29,9 @@
 )
 from bigframes.bigquery._operations.geo import (
     st_area,
+    st_buffer,
+    st_centroid,
+    st_convexhull,
     st_difference,
     st_distance,
     st_intersection,
@@ -54,11 +57,18 @@
     # approximate aggregate ops
     "approx_top_count",
     # array ops
-    "array_length",
     "array_agg",
+    "array_length",
     "array_to_string",
+    # datetime ops
+    "unix_micros",
+    "unix_millis",
+    "unix_seconds",
     # geo ops
     "st_area",
+    "st_buffer",
+    "st_centroid",
+    "st_convexhull",
     "st_difference",
     "st_distance",
     "st_intersection",
@@ -81,8 +91,4 @@
     "sql_scalar",
     # struct ops
     "struct",
-    # datetime ops
-    "unix_micros",
-    "unix_millis",
-    "unix_seconds",
 ]

bigframes/bigquery/_operations/geo.py

Lines changed: 181 additions & 0 deletions
@@ -103,6 +103,187 @@ def st_area(
     return series


+def st_buffer(
+    series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries],
+    buffer_radius: float,
+    num_seg_quarter_circle: float = 8.0,
+    use_spheroid: bool = False,
+) -> bigframes.series.Series:
+    """
+    Computes a `GEOGRAPHY` that represents all points whose distance from the
+    input `GEOGRAPHY` is less than or equal to `buffer_radius` meters.
+
+    .. note::
+        BigQuery's Geography functions, like `st_buffer`, interpret the geometry
+        data type as a point set on the Earth's surface. A point set is a set
+        of points, lines, and polygons on the WGS84 reference spheroid, with
+        geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data
+
+    **Examples:**
+
+        >>> import bigframes.geopandas
+        >>> import bigframes.pandas as bpd
+        >>> import bigframes.bigquery as bbq
+        >>> from shapely.geometry import Point
+        >>> bpd.options.display.progress_bar = None
+
+        >>> series = bigframes.geopandas.GeoSeries(
+        ...     [
+        ...         Point(0, 0),
+        ...         Point(1, 1),
+        ...     ]
+        ... )
+        >>> series
+        0    POINT (0 0)
+        1    POINT (1 1)
+        dtype: geometry
+
+        >>> buffer = bbq.st_buffer(series, 100)
+        >>> bbq.st_area(buffer) > 0
+        0    True
+        1    True
+        dtype: boolean
+
+    Args:
+        series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries):
+            A series containing geography objects.
+        buffer_radius (float):
+            The distance in meters.
+        num_seg_quarter_circle (float, optional):
+            Specifies the number of segments that are used to approximate a
+            quarter circle. The default value is 8.0.
+        use_spheroid (bool, optional):
+            Determines how this function measures distance. If use_spheroid is
+            FALSE, the function measures distance on the surface of a perfect
+            sphere. The use_spheroid parameter currently only supports the
+            value FALSE. The default value of use_spheroid is FALSE.
+
+    Returns:
+        bigframes.pandas.Series:
+            A series of geography objects representing the buffered geometries.
+    """
+    op = ops.GeoStBufferOp(
+        buffer_radius=buffer_radius,
+        num_seg_quarter_circle=num_seg_quarter_circle,
+        use_spheroid=use_spheroid,
+    )
+    series = series._apply_unary_op(op)
+    series.name = None
+    return series
+
+
+def st_centroid(
+    series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries],
+) -> bigframes.series.Series:
+    """
+    Computes the geometric centroid of a `GEOGRAPHY` type.
+
+    For `POINT` and `MULTIPOINT` types, this is the arithmetic mean of the
+    input coordinates. For `LINESTRING` and `POLYGON` types, this is the
+    center of mass. For `GEOMETRYCOLLECTION` types, this is the center of
+    mass of the collection's elements.
+
+    .. note::
+        BigQuery's Geography functions, like `st_centroid`, interpret the geometry
+        data type as a point set on the Earth's surface. A point set is a set
+        of points, lines, and polygons on the WGS84 reference spheroid, with
+        geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data
+
+    **Examples:**
+
+        >>> import bigframes.geopandas
+        >>> import bigframes.pandas as bpd
+        >>> import bigframes.bigquery as bbq
+        >>> from shapely.geometry import Polygon, LineString, Point
+        >>> bpd.options.display.progress_bar = None
+
+        >>> series = bigframes.geopandas.GeoSeries(
+        ...     [
+        ...         Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
+        ...         LineString([(0, 0), (1, 1), (0, 1)]),
+        ...         Point(0, 1),
+        ...     ]
+        ... )
+        >>> series
+        0    POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
+        1              LINESTRING (0 0, 1 1, 0 1)
+        2                             POINT (0 1)
+        dtype: geometry
+
+        >>> bbq.st_centroid(series)
+        0    POINT (0.03333 0.06667)
+        1    POINT (0.49998 0.70712)
+        2                POINT (0 1)
+        dtype: geometry
+
+    Args:
+        series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries):
+            A series containing geography objects.
+
+    Returns:
+        bigframes.pandas.Series:
+            A series of geography objects representing the centroids.
+    """
+    series = series._apply_unary_op(ops.geo_st_centroid_op)
+    series.name = None
+    return series
+
+
+def st_convexhull(
+    series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries],
+) -> bigframes.series.Series:
+    """
+    Computes the convex hull of a `GEOGRAPHY` type.
+
+    The convex hull is the smallest convex set that contains all of the
+    points in the input `GEOGRAPHY`.
+
+    .. note::
+        BigQuery's Geography functions, like `st_convexhull`, interpret the geometry
+        data type as a point set on the Earth's surface. A point set is a set
+        of points, lines, and polygons on the WGS84 reference spheroid, with
+        geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data
+
+    **Examples:**
+
+        >>> import bigframes.geopandas
+        >>> import bigframes.pandas as bpd
+        >>> import bigframes.bigquery as bbq
+        >>> from shapely.geometry import Polygon, LineString, Point
+        >>> bpd.options.display.progress_bar = None
+
+        >>> series = bigframes.geopandas.GeoSeries(
+        ...     [
+        ...         Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
+        ...         LineString([(0, 0), (1, 1), (0, 1)]),
+        ...         Point(0, 1),
+        ...     ]
+        ... )
+        >>> series
+        0    POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
+        1              LINESTRING (0 0, 1 1, 0 1)
+        2                             POINT (0 1)
+        dtype: geometry
+
+        >>> bbq.st_convexhull(series)
+        0    POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
+        1          POLYGON ((0 0, 1 1, 0 1, 0 0))
+        2                             POINT (0 1)
+        dtype: geometry
+
+    Args:
+        series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries):
+            A series containing geography objects.
+
+    Returns:
+        bigframes.pandas.Series:
+            A series of geography objects representing the convex hulls.
+    """
+    series = series._apply_unary_op(ops.geo_st_convexhull_op)
+    series.name = None
+    return series
+
+
 def st_difference(
     series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries],
     other: Union[

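To make the centroid and convex hull docstrings above more concrete, here is a flat-plane sketch of both computations in plain Python. The helper names (`polygon_centroid`, `linestring_centroid`, `convex_hull`) are hypothetical and this is not how BigQuery evaluates `ST_CENTROID` or `ST_CONVEXHULL`: BigQuery works on a sphere with geodesic edges, so its results can differ slightly from these planar values.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def polygon_centroid(ring: Sequence[Coord]) -> Coord:
    """Area-weighted centroid of a simple planar polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    closed = list(ring[1:]) + [ring[0]]
    for (x0, y0), (x1, y1) in zip(ring, closed):
        cross = x0 * y1 - x1 * y0
        area2 += cross  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def linestring_centroid(points: Sequence[Coord]) -> Coord:
    """Length-weighted centroid of a planar polyline."""
    total = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        total += length
        cx += length * (x0 + x1) / 2.0
        cy += length * (y0 + y1) / 2.0
    return (cx / total, cy / total)


def convex_hull(points: Sequence[Coord]) -> List[Coord]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Coord, a: Coord, b: Coord) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Coord] = []
    upper: List[Coord] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


triangle = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]
line = [(0, 0), (1, 1), (0, 1)]

tri_centroid = polygon_centroid(triangle)
line_centroid = linestring_centroid(line)
line_hull = convex_hull(line)
print(tri_centroid, line_centroid, line_hull)
```

On the docstring inputs, the planar triangle centroid lands near (0.03333, 0.06667) and the planar line centroid near (0.5, 0.70711). The `st_centroid` doctest shows (0.49998, 0.70712) for the line, and that small gap is consistent with the spherical model described in the notes.
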