Indexes 4: Adds Indexes for SPK repositories#1339
Indexes 4: Adds Indexes for SPK repositories#1339dcookspi wants to merge 1 commit intoindex-3-flatbuffer-and-solver-package-specfrom
Conversation
|
|
||
| /// Whether to get the solver to use repository indexes instead of | ||
| /// the repository directly. | ||
| pub use_indexes: bool, | ||
|
|
| // The default number of tables seems to be 1 million, and | ||
| // that isn't enough for the current index. Tried 100 | ||
| // million and that didn't error, but this is not a great | ||
| // solution long term, especially once VersionFilters are | ||
| // converted from strings. | ||
| max_tables: MAX_FLATBUFFER_TABLES, |
There was a problem hiding this comment.
This may become a problem longer term in repositories that do no delete old unused deprecated packages. It may also be a problem if VersionFilters are properly represented in the flatbuffer schema, instead of leaving them as strings (I think that'll be worth it, but it's something to keep in mind).
There was a problem hiding this comment.
Along the lines of the idea to do the verification after writing a new index file, can we find out how many tables were used and print a warning if it is some percentage of the maximum?
| // Does not store any more details because the package | ||
| // is deprecated. |
There was a problem hiding this comment.
This means that deprecated packages in the index cannot be safely used in current solves as things stand. Because they have no install requirements in the index, they will short-circuit any solve they are resolved in. This is a problem for solves that ask for a deprecated build specifically. However, the number of times this happens is vanishingly small in our experience. There are several options to address this:
- add a
--deprecatedflag that disables index use if deprecated packages are wanted (it turns out this does not exist for solves at the moment) add a--deprecatedflag that disables index use if deprecated packages are wanted (it turns out this does not exist for solves at the moment). These solves would lack the benefit of any index - add a way of marking a specific request as allowing deprecated builds, like the --deprecated but for that request only, like spk has for prerelease policies in requests. This could either disable index use, or we could split the solvers processing to first ask the repo if the build is deprecated, and then load the package data from the index or underlying repo as required.
- include the full solver require data for deprecated packages in the index, which increases the index size, loading time and generation time. The increases might be managable but the data would be unused for most solves.
- something else?
This probably needs a separate discussion.
There was a problem hiding this comment.
- Another option: have 2 indexes, one for non-deprecated and one for deprecated things, and access the appropriate index as required. It's unlikely that lots of deprecated builds will be pulled into most solves. This idea might be suitable to use with the "ask the repo if the build is deprecated before deciding whether to fully load it" option above.
There was a problem hiding this comment.
And another (a subset of the second one above) reverse and split the solvers processing to: first, ask the repo if the build is deprecated, and then load the package data if it needs the package. An Indexed repo, when asked to read a pacakge, could get the data from the index for non-deprecated packages, and from underlying repo for deprecated ones. The solver wouldn't need to know where the package is coming from, but it would need to change the current order of operations for deprecated checks in the current solvers.
There was a problem hiding this comment.
This comment is out of date now. I tested adding all the deprecated data to the index, and while it increases the index side, it doesn't impact the solve times (see PR5 #1340 for some times).
- Update the comment in the code
| // TODO: might want to add something to load them from | ||
| // underlying repo if required, such as when the | ||
| // --deprecate flag is set. |
There was a problem hiding this comment.
It turns out there isn't one (a --deprecate flag) for spk env, build or explain, just for ls and search.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
| pub enum RepoIndex { | ||
| Flat(FlatBufferRepoIndex), | ||
| } |
There was a problem hiding this comment.
There were more kinds of index during development and testing. There may be more in future.
00757f2 to
8530adc
Compare
|
I can't replicate the test failures locally at the moment. I'll dig into them further tomororw. |
47ea722 to
af8dcc2
Compare
68bb519 to
1e28bf4
Compare
| assert_repo_and_index_have_same_packages(repo2, indexed_repo2).await; | ||
| } | ||
|
|
||
| // TODO: add rest of solves sample repos to this as tests |
There was a problem hiding this comment.
The repos used in these tests (so far) are copied from the solver_test.rs tests. There are only a few of them, but the others could be duplicated into more tests here, or all the test repo setups could be moved to somewhere shared between spk-storage and spk-solve.
At the moment, there is a difference between the repo setup for the standard Repository tests (that make_repo() fixture function), and the Solver tests (that use the make_repo!() macro). That difference should probably be removed in future, it's a minor irritation when adding tests.
1e28bf4 to
854446a
Compare
The issue was related to deprecated packages and the migration-to-components feature. I've put in a temporary fix so the tests pass for now. But it should not get merged as is. A direction needs to be decided with regard to deprecated packages, solves, and what to index in order to replace the temp fix with a better solution. See #1339 (comment) comment above. |
af8dcc2 to
53f24d1
Compare
a58142a to
b32582c
Compare
53f24d1 to
cb733cd
Compare
… flabuffers index and configuration Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
b32582c to
1a43f64
Compare
| /// Name of the solver whose output to show when multiple solvers are being run. | ||
| pub solver_to_show: String, | ||
|
|
||
| /// Whether to get the solver to use repository indexes instead of |
There was a problem hiding this comment.
| /// Whether to get the solver to use repository indexes instead of | |
| /// Whether to let the solver use repository indexes instead of |
This sounds more natural IMO.
| /// Whether to validate flatbuffers index data before using it. | ||
| /// Validating is safer but adds some overhead at the start of a | ||
| /// solve when using an index. | ||
| pub verify_flatbuffers_index_before_use: bool, |
There was a problem hiding this comment.
It's a bit awkward to both try to be generic about what index types are available but also have "top-level" configuration options that are specific to one index type.
For that reason I would suggest having a sub-struct for containing options pertaining to the flatbuffers index.
Calling it the flatbuffers index is a separate concern. Who's to say some other index wouldn't get created and also choose to use flatbuffers for the storage representation? In practice I don't see it being likely that we'll have competing index implementations, but it is nice to leave the possibility open.
| // Exclude this deprecated build. | ||
| let reason = provider | ||
| .pool | ||
| .intern_string(format!("{} is deprecated", ident)); |
There was a problem hiding this comment.
| .intern_string(format!("{} is deprecated", ident)); | |
| .intern_string(format!("{ident} is deprecated")); |
| #[case::step(step_solver(), false)] | ||
| #[case::step_indexed(step_solver(), true)] | ||
| #[case::resolvo(resolvo_solver(), false)] | ||
| #[case::resolvo_indexed(resolvo_solver(), true)] | ||
| #[tokio::test] | ||
| async fn test_solver_package_with_no_recipe( | ||
| #[case] mut solver: SolverImpl, | ||
| #[case] use_index: bool, | ||
| random_build_id: BuildId, | ||
| ) { |
There was a problem hiding this comment.
| #[case::step(step_solver(), false)] | |
| #[case::step_indexed(step_solver(), true)] | |
| #[case::resolvo(resolvo_solver(), false)] | |
| #[case::resolvo_indexed(resolvo_solver(), true)] | |
| #[tokio::test] | |
| async fn test_solver_package_with_no_recipe( | |
| #[case] mut solver: SolverImpl, | |
| #[case] use_index: bool, | |
| random_build_id: BuildId, | |
| ) { | |
| #[case::step(step_solver())] | |
| #[case::resolvo(resolvo_solver())] | |
| #[tokio::test] | |
| async fn test_solver_package_with_no_recipe( | |
| #[case] mut solver: SolverImpl, | |
| #[values(true, false)] use_index: bool, | |
| random_build_id: BuildId, | |
| ) { |
I'd prefer this pattern for simplicity / extensibility. I was thinking we'd have a declarative macro to help with removing boilerplate on every test.
| repo.publish_package(&spec.into(), &components) | ||
| .await | ||
| .unwrap(); | ||
| let repo = wrap_repo_for_test(repo, use_index).await; |
There was a problem hiding this comment.
It's not ideal that every test has to remember to call this, though I guess there would be a lint that use_index is unused if that's the case.
| #[tokio::test] | ||
| async fn test_solver_component_embedded_multiple_versions( | ||
| #[values(step_solver(), resolvo_solver())] mut solver: SolverImpl, | ||
| #[values(false, true)] use_index: bool, |
There was a problem hiding this comment.
Ayyy...
You could argue this is better than defining cases for the solver flavors.
| // The default number of tables seems to be 1 million, and | ||
| // that isn't enough for the current index. Tried 100 | ||
| // million and that didn't error, but this is not a great | ||
| // solution long term, especially once VersionFilters are | ||
| // converted from strings. | ||
| max_tables: MAX_FLATBUFFER_TABLES, |
There was a problem hiding this comment.
Along the lines of the idea to do the verification after writing a new index file, can we find out how many tables were used and print a warning if it is some percentage of the maximum?
| } | ||
|
|
||
| // Create the new index file with the correct permissions | ||
| if let Err(err) = tokio::fs::write(&filepath, builder.finished_data()).await { |
There was a problem hiding this comment.
We need to write to a temp file first and rename if successful.
| use crate::{IndexedRepository, RepoWalkerBuilder, RepoWalkerItem, RepositoryHandle}; | ||
|
|
||
| // A copy of the one in spk-solve-macros because using it directly | ||
| // here cases an import loop |
There was a problem hiding this comment.
| // here cases an import loop | |
| // here causes an import loop |
| matches!(self, Self::Runtime(_)) | ||
| } | ||
|
|
||
| pub fn is_indexed(&self) -> bool { |
There was a problem hiding this comment.
If you add Variantly to the derive list you get this method for free.
These have been fixed now. |
Note: for info on benefits of indexing for spk solves see #1340 (5 of 5). Maybe start there and work back down to this PR if you prefer to review PRs top down.
This adds a new
Indexedrepository type,RepositoryIndextraits, aRepoIndexenum, and a flatbuffer index implementation inFlatBufferRepoIndex, along with some configuration. The indexing is designed to help the solvers. An index only contains information useful for solving. It does not contain enough information to do other things, such as build or test the packages.The hierarchy of new types is roughly:
This also updates the solver and repo based tests to include the new Indexed repo. This does not add command line tools or solver interaction for indexes. Those are in the last PR in this chain.
This is the change that adds indexes and index based packages to Spk repositories.
This is 4 of 5 chained PRs for adding indexes to spk solves:
new_unchecked()constructors to spk schema objects #1337spk repo indexsubcommand for index generation and updates #1340