-
Notifications
You must be signed in to change notification settings - Fork 30
chore: Eliminate forked dependencies: geo, wkb, wkt and adbc_core #165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
90a859c to
94ae57a
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the tip of the iceberg on a huge, huge amount of work and that the distance work in particular was essential to ensuring we had reasonable performance for our first release!
This is a huge amount of code, and it's unclear to me which pieces are copied and which pieces are original. It's difficult to review this at once, although I understand that we needed this PR in order to ensure all the pieces were there.
I do think there are some orthogonal sets of changes here that would enable the community to more effectively understand and/or review these changes:
- ADBC changes (I'm happy to copy these over since that one was my fault)
- WKB/GEOS changes. These could use an isolated/duplicate
wkbdependency in a subcrate if the rest of the crates can't handle the new wkb version. I wonder if the wkb dependency to support the WkbExecutor will be a problem here (isolating that is probably a good idea anyway). - Test fixtures (https://github.com/apache/sedona-testing is perhaps a better place for these, or perhaps there is a way to expose them from the original crate from whence they came?)
- sedona-geo-traits-ext (without updating the top-level Cargo.toml). There are a few pieces of this that are missing docstrings and this might be a good opportunity to provide those.
- sedona-geo-generic-alg (without updating the top-level Cargo.toml). I think we shouldn't include anything we're not specifically using yet (we can copy over the implementation from the fork when we use it, at which point we can run the benchmarks to check the level of complexity. There are some things that duplicate internals we already have (bounds, for example) that could be folded in.
- Update the top-level Cargo.toml and use the new crates everywhere
I'm happy to review it all at once as well but I think the project will be able to better engage the community if we split it up a bit!
rust/sedona-testing/src/compare.rs
Outdated
| // Binary of the WKB are not the same, but geometries could be topologically the same. | ||
| let expected = wkb::reader::read_wkb(expected_wkb); | ||
| let actual = wkb::reader::read_wkb(actual_wkb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have this be a different function (in other words, I would like to know when I am asserting topological equality vs. coordiante-for-coordinate equality (within some tolerance) or byte-for-byte equality).
rust/sedona-testing/Cargo.toml
Outdated
| @@ -52,3 +52,4 @@ sedona-expr = { path = "../sedona-expr" } | |||
| sedona-schema = { path = "../sedona-schema" } | |||
| wkb = { workspace = true } | |||
| wkt = { workspace = true } | |||
| geo = { workspace = true } | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep geo out of sedona-testing, which is otherwise algorithm library-neutral (individual tests can always use geo if they want!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we implement topological equality without depending on geo? Do we need to implement a simplified version of the topological equality check in sedona-testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just feature flag it? It can even be in the default features (but then add default-feautres = false to the sedona dependency on sedona-testing, which is a little strange but we have it because of the random geometry generator).
Cargo.toml
Outdated
| "rust/geo-traits-ext", | ||
| "rust/geo-generic-alg", | ||
| "rust/geo-test-fixtures", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename these to sedona-*?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It will help avoid package name conflicts on crates.io.
c/sedona-geos/Cargo.toml
Outdated
| sedona = { path = "../../rust/sedona" } | ||
| sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] } | ||
| rstest = { workspace = true } | ||
| geo = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need geo or would geo-types suffice? (It seems odd to include geo here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need geo-types here so I'll switch to use geo-types instead of geo. This is a test dependency so I thought that it should be fine.
|
The WKB/GEOS change cannot be done without the geo-traits-ext and geo-generic-alg change in place. Upgrading the wkb crate in The other steps looks fine. I'll submit several smaller PRs to gradually replace/upgrade the dependencies. |
Got it! I'm not worried about that change (it's all your code and I've already reviewed it once 🙂 ) |
|
@Kontinuation are we ready to merge this? |
No. I'll break it into 5 smaller PRs. I have converted it to a draft to prevent it from being accidentally merged. |
- Applied Apache license headers to 63 Rust source files in geo-generic-alg and geo-traits-ext crates - Added license_ignore.txt to exclude .wkt test fixture files from license checks - Fixes RAT (Release Audit Tool) violations for packaging workflow
e53efe4 to
f02cb77
Compare
…WKBs topologically (#192) This is part of the forked dependency elimination plan #165. We've hit some overlay operation result assertion errors after upgrading geo to the latest 0.31.0 release. The failures were caused by geo producing different, but topologically equivalent results, and we are asserting on the byte-level equality of resulting WKBs. This patch adds `assert_scalar_equal_wkb_geometry_topologically` to sedona-testing behind a feature flag "geo". We'll switch to use `assert_scalar_equal_wkb_geometry_topologically` for testing overlay ST functions (ST_Union_Aggr and ST_Intersection_Aggr) after upgrading geo. This function is also useful in many other scenarios, so we add this function in its dedicated PR.
The I don't think asking georust/geo to publish geo-test-fixtures is a valid request, so I'll copy the wkt files to apache/sedona-testing, and migrate lib.rs to rust/sedona-testing. |
…rate in georust/geo (#193) This is part of the forked dependency elimination plan: #165. We'll maintain our generic geo algorithms refactored from georust/geo, the test fixtures for checking the correctness of our refactored implementation is also needed. Unfortunately, geo-test-fixtures is not published to crates.io, so we have to copy it to our own projects to use them. The WKT files were already committed to the apache/sedona-testing repository (see apache/sedona-testing#9), which is a submodule of sedona-db. We add fixtures for loading the WKT files in the submodule to sedona-testing. These newly added fixtures will be used by future PRs.
This is part of the forked dependency elimination plan: #165. This PR depends on #193. This PR moves geo-traits-ext from wherobots/geo to sedona-db and renamed it to sedona-geo-traits-ext. Currently, it is a standalone crate and can be compiled using `cd rust/sedona-geo-traits-ext && cargo build`. We'll update the Cargo.toml files in the final step to make it live.
This is part of the forked dependency elimination plan: #165. This PR depends on #194. This PR moves geo-generic-alg from wherobots/geo to sedona-db and renamed it to sedona-geo-generic-alg. Currently it is a standalone crate and can be compiled using `cd rust/sedona-geo-generic-alg && cargo build`. We'll update the Cargo.toml files in the final step to make it live. The code moved to sedona-db only contains the algorithms actually used by other parts of the project. There's a backup containing all ported algorithms here: https://github.com/Kontinuation/sedona-db/tree/full-migrate-generic-alg.
This PR migrates generic geo algorithms in wherobots/geo's geo-traits-ext and geo-generic-alg crates into sedona-db, and switches depended versions of geo, wkb and wkt to official release versions.
Detailed changes involved:
Upstream changes: