
@Kontinuation
Member

@Kontinuation Kontinuation commented Sep 29, 2025

This PR migrates the generic geo algorithms in the geo-traits-ext and geo-generic-alg crates of wherobots/geo into sedona-db, and switches the geo, wkb, and wkt dependencies to official release versions.

Detailed changes involved:

  • geo-traits-ext: moved from wherobots/geo, with the traits implemented for wkb
  • geo-generic-alg: moved from wherobots/geo; removed the not-yet-ported non-generic algorithms that were copied directly from geo
  • geo-test-fixtures: copied from georust/geo for testing the generic algorithms in geo-generic-alg
  • wkb: upgraded to the latest version and fixed breaking API changes related to WKB writer options; ported the WKB-to-GEOS conversion to the sedona-geos crate
  • adbc_core: upgraded to the latest version and fixed breaking API changes

Upstream changes:

@Kontinuation Kontinuation force-pushed the migrate-generic-alg branch 3 times, most recently from 90a859c to 94ae57a Compare September 30, 2025 08:41
@Kontinuation Kontinuation marked this pull request as ready for review September 30, 2025 16:20
@Kontinuation Kontinuation changed the title [WIP] Eliminate forked dependencies: geo, wkb, wkt and adbc_core chore: Eliminate forked dependencies: geo, wkb, wkt and adbc_core Sep 30, 2025
Member

@paleolimbot paleolimbot left a comment

I know this is the tip of the iceberg on a huge, huge amount of work and that the distance work in particular was essential to ensuring we had reasonable performance for our first release!

This is a huge amount of code, and it's unclear to me which pieces are copied and which pieces are original. It's difficult to review this at once, although I understand that we needed this PR in order to ensure all the pieces were there.

I do think there are some orthogonal sets of changes here that would enable the community to more effectively understand and/or review these changes:

  • ADBC changes (I'm happy to copy these over since that one was my fault)
  • WKB/GEOS changes. These could use an isolated/duplicate wkb dependency in a subcrate if the rest of the crates can't handle the new wkb version. I wonder if the wkb dependency to support the WkbExecutor will be a problem here (isolating that is probably a good idea anyway).
  • Test fixtures (https://github.com/apache/sedona-testing is perhaps a better place for these, or perhaps there is a way to expose them from the original crate from whence they came?)
  • sedona-geo-traits-ext (without updating the top-level Cargo.toml). There are a few pieces of this that are missing docstrings and this might be a good opportunity to provide those.
  • sedona-geo-generic-alg (without updating the top-level Cargo.toml). I think we shouldn't include anything we're not specifically using yet (we can copy over the implementation from the fork when we use it, at which point we can run the benchmarks to check the level of complexity). There are some things that duplicate internals we already have (bounds, for example) that could be folded in.
  • Update the top-level Cargo.toml and use the new crates everywhere

I'm happy to review it all at once as well but I think the project will be able to better engage the community if we split it up a bit!

Comment on lines 215 to 217
// Binary of the WKB are not the same, but geometries could be topologically the same.
let expected = wkb::reader::read_wkb(expected_wkb);
let actual = wkb::reader::read_wkb(actual_wkb);
Member

I would prefer to have this be a different function (in other words, I would like to know when I am asserting topological equality vs. coordinate-for-coordinate equality (within some tolerance) or byte-for-byte equality).

@@ -52,3 +52,4 @@ sedona-expr = { path = "../sedona-expr" }
sedona-schema = { path = "../sedona-schema" }
wkb = { workspace = true }
wkt = { workspace = true }
geo = { workspace = true }
Member

I would prefer to keep geo out of sedona-testing, which is otherwise algorithm library-neutral (individual tests can always use geo if they want!)

Member Author

How should we implement topological equality without depending on geo? Do we need to implement a simplified version of the topological equality check in sedona-testing?

Member

Maybe just feature flag it? It can even be in the default features (but then add default-features = false to the sedona dependency on sedona-testing, which is a little strange but we have it because of the random geometry generator).

Cargo.toml Outdated
Comment on lines 25 to 27
"rust/geo-traits-ext",
"rust/geo-generic-alg",
"rust/geo-test-fixtures",
Member

Should we rename these to sedona-*?

Member Author

Yes. It will help avoid package name conflicts on crates.io.

sedona = { path = "../../rust/sedona" }
sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] }
rstest = { workspace = true }
geo = { workspace = true }
Member

Does this need geo or would geo-types suffice? (It seems odd to include geo here?)

Member Author

We only need geo-types here so I'll switch to use geo-types instead of geo. This is a test dependency so I thought that it should be fine.

@Kontinuation
Member Author

The WKB/GEOS change cannot be done without the geo-traits-ext and geo-generic-alg change in place. Upgrading the wkb crate in sedona-geos will introduce conflicts of geo-traits versions. I can do this in the final step where we upgrade the top-level Cargo.toml.

The other steps look fine. I'll submit several smaller PRs to gradually replace/upgrade the dependencies.

@paleolimbot
Member

> The WKB/GEOS change cannot be done without the geo-traits-ext and geo-generic-alg change in place.

Got it! I'm not worried about that change (it's all your code and I've already reviewed it once 🙂 )

@jiayuasu
Member

jiayuasu commented Oct 6, 2025

@Kontinuation are we ready to merge this?

@Kontinuation Kontinuation marked this pull request as draft October 7, 2025 00:31
@Kontinuation
Member Author

> @Kontinuation are we ready to merge this?

No. I'll break it into 5 smaller PRs. I have converted it to a draft to prevent it from being accidentally merged.

Kontinuation added a commit that referenced this pull request Oct 7, 2025
…t API changes (#190)

This is part of the forked/non-crates.io dependency elimination plan: #165. We switched the adbc_core version from a git rev to the latest release version and fixed API breaking changes.
Kontinuation added a commit that referenced this pull request Oct 8, 2025
…WKBs topologically (#192)

This is part of the forked dependency elimination plan #165. We've hit some overlay operation result assertion errors after upgrading geo to the latest 0.31.0 release. The failures were caused by geo producing different, but topologically equivalent results, and we are asserting on the byte-level equality of resulting WKBs.

This patch adds `assert_scalar_equal_wkb_geometry_topologically` to sedona-testing behind a feature flag "geo". We'll switch to use `assert_scalar_equal_wkb_geometry_topologically` for testing overlay ST functions (ST_Union_Aggr and ST_Intersection_Aggr) after upgrading geo. This function is also useful in many other scenarios, so we add this function in its dedicated PR.
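
To make the three levels of equality discussed in this thread concrete, here is a minimal std-only sketch. It is not how `assert_scalar_equal_wkb_geometry_topologically` is implemented: the `Ring` type, `encode`, and all function names below are hypothetical, the byte encoding is a toy stand-in rather than real WKB, and the third check only recognizes the same ring written from a different start vertex or in the opposite direction, which is a much weaker notion than a full topological comparison.

```rust
/// A closed ring stored without the repeated closing vertex (hypothetical type).
type Ring = Vec<(f64, f64)>;

/// Toy little-endian encoding of the coordinates. NOT real WKB; it only
/// gives us some bytes to compare.
fn encode(ring: &Ring) -> Vec<u8> {
    let mut out = Vec::with_capacity(ring.len() * 16);
    for &(x, y) in ring {
        out.extend_from_slice(&x.to_le_bytes());
        out.extend_from_slice(&y.to_le_bytes());
    }
    out
}

/// Level 1: byte-for-byte equality of two encoded buffers.
fn bytes_equal(a: &[u8], b: &[u8]) -> bool {
    a == b
}

/// Level 2: coordinate-for-coordinate equality within a tolerance.
/// Vertex order and start vertex must match.
fn coords_equal(a: &Ring, b: &Ring, tol: f64) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(p, q)| (p.0 - q.0).abs() <= tol && (p.1 - q.1).abs() <= tol)
}

/// Level 3 (simplified stand-in for a topological check): the rings match
/// up to the choice of start vertex and traversal direction.
fn rings_equivalent(a: &Ring, b: &Ring, tol: f64) -> bool {
    if a.len() != b.len() {
        return false;
    }
    if a.is_empty() {
        return true;
    }
    let n = a.len();
    let reversed: Ring = b.iter().rev().copied().collect();
    // Try every rotation of `b` and of its reversal.
    [b, &reversed].iter().any(|candidate| {
        (0..n).any(|shift| {
            let rotated: Ring = (0..n).map(|i| candidate[(i + shift) % n]).collect();
            coords_equal(a, &rotated, tol)
        })
    })
}
```

For a unit square written starting at `(0, 0)` and the same square written starting at `(1, 1)`, the first two checks report a mismatch while the third reports a match. That is the kind of discrepancy described above, where the overlay results changed in representation but not in shape.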
@Kontinuation
Member Author

The geo-test-fixtures crate is not published to crates.io, and it is only used as a dev-dependency in geo. I have tried depending on geo-test-fixtures using geo-test-fixtures = { git = "https://github.com/georust/geo.git", tag = "geo-0.31.0"} because we only use it as a dev-dependency as well, so having a git dependency does not block us from publishing the crates. However, I hit a strange error caused by duplicating the geo-types dependency (git rev vs. released), and there's no easy workaround.

I don't think asking georust/geo to publish geo-test-fixtures is a valid request, so I'll copy the wkt files to apache/sedona-testing, and migrate lib.rs to rust/sedona-testing.

Kontinuation added a commit that referenced this pull request Oct 8, 2025
…rate in georust/geo (#193)

This is part of the forked dependency elimination plan: #165.

We'll maintain our own generic geo algorithms, refactored from georust/geo, so we also need the test fixtures that check the correctness of the refactored implementation. Unfortunately, geo-test-fixtures is not published to crates.io, so we have to copy the fixtures into our own project to use them.

The WKT files were already committed to the apache/sedona-testing repository (see apache/sedona-testing#9), which is a submodule of sedona-db. We add fixtures for loading the WKT files in the submodule to sedona-testing. These newly added fixtures will be used by future PRs.
Kontinuation added a commit that referenced this pull request Oct 9, 2025
This is part of the forked dependency elimination plan: #165. This PR depends on #193.

This PR moves geo-traits-ext from wherobots/geo to sedona-db and renames it to sedona-geo-traits-ext. Currently, it is a standalone crate and can be compiled using `cd rust/sedona-geo-traits-ext && cargo build`. We'll update the Cargo.toml files in the final step to make it live.
Kontinuation added a commit that referenced this pull request Oct 9, 2025
This is part of the forked dependency elimination plan: #165. This PR depends on #194.

This PR moves geo-generic-alg from wherobots/geo to sedona-db and renames it to sedona-geo-generic-alg. Currently, it is a standalone crate and can be compiled using `cd rust/sedona-geo-generic-alg && cargo build`. We'll update the Cargo.toml files in the final step to make it live.

The code moved to sedona-db only contains the algorithms actually used by other parts of the project. There's a backup containing all ported algorithms here: https://github.com/Kontinuation/sedona-db/tree/full-migrate-generic-alg.