Skip to content

Indexes 4: Adds Indexes for SPK repositories#1339

Open
dcookspi wants to merge 1 commit intoindex-3-flatbuffer-and-solver-package-specfrom
index-4-indexed-repository-and-fbindex
Open

Indexes 4: Adds Indexes for SPK repositories#1339
dcookspi wants to merge 1 commit intoindex-3-flatbuffer-and-solver-package-specfrom
index-4-indexed-repository-and-fbindex

Conversation

@dcookspi
Copy link
Copy Markdown
Collaborator

@dcookspi dcookspi commented Mar 19, 2026

Note: for info on benefits of indexing for spk solves see #1340 (5 of 5). Maybe start there and work back down to this PR if you prefer to review PRs top down.

This adds a new Indexed repository type, RepositoryIndex traits, a RepoIndex enum, and a flatbuffer index implementation in FlatBufferRepoIndex, along with some configuration. The indexing is designed to help the solvers. An index only contains information useful for solving. It does not contain enough information to do other things, such as build or test the packages.

The hierarchy of new types is roughly:

RepositoryHandle (existing spk repos)
   Indexed 
       RepoIndex
           FlatBufferRepoIndex  (makes SolverPackageSpecs, a v0 package alternate, see PR 3 of 5 below)
                flatbuffers schema (see PR 3 of 5 below)

This also updates the solver and repo based tests to include the new Indexed repo. This does not add command line tools or solver interaction for indexes. Those are in the last PR in this chain.

This is the change that adds indexes and index based packages to Spk repositories.

This is 4 of 5 chained PRs for adding indexes to spk solves:

  1. Indexes 1: Change Package and related traits to not return references to fields #1336
  2. Indexes 2: Add new_unchecked() constructors to spk schema objects #1337
  3. Indexes 3: Adds flatbuffers schema and SolverPackageSpec for indexes to spk #1338
  4. this PR
  5. Indexes 5: Adds spk repo index subcommand for index generation and updates #1340

@dcookspi dcookspi self-assigned this Mar 19, 2026
@dcookspi dcookspi added enhancement New feature or request SPI AOI Area of interest for SPI pr-chain This PR doesn't target the main branch, don't merge! labels Mar 19, 2026
Comment on lines +90 to +94

/// Whether to get the solver to use repository indexes instead of
/// the repository directly.
pub use_indexes: bool,

Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all of these config settings are used in the PR, some are used in the last PR in the chain: #1340

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs for these have been added to PR 5 #1340

Comment on lines +89 to +94
// The default number of tables seems to be 1 million, and
// that isn't enough for the current index. Tried 100
// million and that didn't error, but this is not a great
// solution long term, especially once VersionFilters are
// converted from strings.
max_tables: MAX_FLATBUFFER_TABLES,
Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may become a problem longer term in repositories that do no delete old unused deprecated packages. It may also be a problem if VersionFilters are properly represented in the flatbuffer schema, instead of leaving them as strings (I think that'll be worth it, but it's something to keep in mind).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the lines of the idea to do the verification after writing a new index file, can we find out how many tables were used and print a warning if it is some percentage of the maximum?

Comment on lines +686 to +687
// Does not store any more details because the package
// is deprecated.
Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that deprecated packages in the index cannot be safely used in current solves as things stand. Because they have no install requirements in the index, they will short-circuit any solve they are resolved in. This is a problem for solves that ask for a deprecated build specifically. However, the number of times this happens is vanishingly small in our experience. There are several options to address this:

  • add a --deprecated flag that disables index use if deprecated packages are wanted (it turns out this does not exist for solves at the moment) add a --deprecated flag that disables index use if deprecated packages are wanted (it turns out this does not exist for solves at the moment). These solves would lack the benefit of any index
  • add a way of marking a specific request as allowing deprecated builds, like the --deprecated but for that request only, like spk has for prerelease policies in requests. This could either disable index use, or we could split the solvers processing to first ask the repo if the build is deprecated, and then load the package data from the index or underlying repo as required.
  • include the full solver require data for deprecated packages in the index, which increases the index size, loading time and generation time. The increases might be managable but the data would be unused for most solves.
  • something else?

This probably needs a separate discussion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Another option: have 2 indexes, one for non-deprecated and one for deprecated things, and access the appropriate index as required. It's unlikely that lots of deprecated builds will be pulled into most solves. This idea might be suitable to use with the "ask the repo if the build is deprecated before deciding whether to fully load it" option above.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And another (a subset of the second one above) reverse and split the solvers processing to: first, ask the repo if the build is deprecated, and then load the package data if it needs the package. An Indexed repo, when asked to read a pacakge, could get the data from the index for non-deprecated packages, and from underlying repo for deprecated ones. The solver wouldn't need to know where the package is coming from, but it would need to change the current order of operations for deprecated checks in the current solvers.

Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out of date now. I tested adding all the deprecated data to the index, and while it increases the index side, it doesn't impact the solve times (see PR5 #1340 for some times).

  • Update the comment in the code

Comment on lines +688 to +690
// TODO: might want to add something to load them from
// underlying repo if required, such as when the
// --deprecate flag is set.
Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out there isn't one (a --deprecate flag) for spk env, build or explain, just for ls and search.

Comment on lines +58 to +60
pub enum RepoIndex {
Flat(FlatBufferRepoIndex),
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were more kinds of index during development and testing. There may be more in future.

@dcookspi
Copy link
Copy Markdown
Collaborator Author

I can't replicate the test failures locally at the moment. I'll dig into them further tomororw.

@dcookspi dcookspi requested review from jrray and rydrman March 20, 2026 00:50
@dcookspi dcookspi force-pushed the index-3-flatbuffer-and-solver-package-spec branch from 47ea722 to af8dcc2 Compare March 20, 2026 19:29
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch 2 times, most recently from 68bb519 to 1e28bf4 Compare March 20, 2026 19:53
assert_repo_and_index_have_same_packages(repo2, indexed_repo2).await;
}

// TODO: add rest of solves sample repos to this as tests
Copy link
Copy Markdown
Collaborator Author

@dcookspi dcookspi Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repos used in these tests (so far) are copied from the solver_test.rs tests. There are only a few of them, but the others could be duplicated into more tests here, or all the test repo setups could be moved to somewhere shared between spk-storage and spk-solve.

At the moment, there is a difference between the repo setup for the standard Repository tests (that make_repo() fixture function), and the Solver tests (that use the make_repo!() macro). That difference should probably be removed in future, it's a minor irritation when adding tests.

@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from 1e28bf4 to 854446a Compare March 21, 2026 01:07
@dcookspi
Copy link
Copy Markdown
Collaborator Author

I can't replicate the test failures locally at the moment. I'll dig into them further tomororw.

The issue was related to deprecated packages and the migration-to-components feature. I've put in a temporary fix so the tests pass for now. But it should not get merged as is. A direction needs to be decided with regard to deprecated packages, solves, and what to index in order to replace the temp fix with a better solution. See #1339 (comment) comment above.

@dcookspi dcookspi force-pushed the index-3-flatbuffer-and-solver-package-spec branch from af8dcc2 to 53f24d1 Compare March 25, 2026 01:07
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch 2 times, most recently from a58142a to b32582c Compare March 25, 2026 17:08
@dcookspi dcookspi force-pushed the index-3-flatbuffer-and-solver-package-spec branch from 53f24d1 to cb733cd Compare March 27, 2026 19:09
… flabuffers index and configuration

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from b32582c to 1a43f64 Compare March 27, 2026 19:30
/// Name of the solver whose output to show when multiple solvers are being run.
pub solver_to_show: String,

/// Whether to get the solver to use repository indexes instead of
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Whether to get the solver to use repository indexes instead of
/// Whether to let the solver use repository indexes instead of

This sounds more natural IMO.

/// Whether to validate flatbuffers index data before using it.
/// Validating is safer but adds some overhead at the start of a
/// solve when using an index.
pub verify_flatbuffers_index_before_use: bool,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward to both try to be generic about what index types are available but also have "top-level" configuration options that are specific to one index type.

For that reason I would suggest having a sub-struct for containing options pertaining to the flatbuffers index.

Calling it the flatbuffers index is a separate concern. Who's to say some other index wouldn't get created and also choose to use flatbuffers for the storage representation? In practice I don't see it being likely that we'll have competing index implementations, but it is nice to leave the possibility open.

// Exclude this deprecated build.
let reason = provider
.pool
.intern_string(format!("{} is deprecated", ident));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.intern_string(format!("{} is deprecated", ident));
.intern_string(format!("{ident} is deprecated"));

Comment on lines +197 to 206
#[case::step(step_solver(), false)]
#[case::step_indexed(step_solver(), true)]
#[case::resolvo(resolvo_solver(), false)]
#[case::resolvo_indexed(resolvo_solver(), true)]
#[tokio::test]
async fn test_solver_package_with_no_recipe(
#[case] mut solver: SolverImpl,
#[case] use_index: bool,
random_build_id: BuildId,
) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[case::step(step_solver(), false)]
#[case::step_indexed(step_solver(), true)]
#[case::resolvo(resolvo_solver(), false)]
#[case::resolvo_indexed(resolvo_solver(), true)]
#[tokio::test]
async fn test_solver_package_with_no_recipe(
#[case] mut solver: SolverImpl,
#[case] use_index: bool,
random_build_id: BuildId,
) {
#[case::step(step_solver())]
#[case::resolvo(resolvo_solver())]
#[tokio::test]
async fn test_solver_package_with_no_recipe(
#[case] mut solver: SolverImpl,
#[values(true, false)] use_index: bool,
random_build_id: BuildId,
) {

I'd prefer this pattern for simplicity / extensibility. I was thinking we'd have a declarative macro to help with removing boilerplate on every test.

repo.publish_package(&spec.into(), &components)
.await
.unwrap();
let repo = wrap_repo_for_test(repo, use_index).await;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal that every test has to remember to call this, though I guess there would be a lint that use_index is unused if that's the case.

#[tokio::test]
async fn test_solver_component_embedded_multiple_versions(
#[values(step_solver(), resolvo_solver())] mut solver: SolverImpl,
#[values(false, true)] use_index: bool,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ayyy...

You could argue this is better than defining cases for the solver flavors.

Comment on lines +89 to +94
// The default number of tables seems to be 1 million, and
// that isn't enough for the current index. Tried 100
// million and that didn't error, but this is not a great
// solution long term, especially once VersionFilters are
// converted from strings.
max_tables: MAX_FLATBUFFER_TABLES,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the lines of the idea to do the verification after writing a new index file, can we find out how many tables were used and print a warning if it is some percentage of the maximum?

}

// Create the new index file with the correct permissions
if let Err(err) = tokio::fs::write(&filepath, builder.finished_data()).await {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to write to a temp file first and rename if successful.

use crate::{IndexedRepository, RepoWalkerBuilder, RepoWalkerItem, RepositoryHandle};

// A copy of the one in spk-solve-macros because using it directly
// here cases an import loop
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// here cases an import loop
// here causes an import loop

matches!(self, Self::Runtime(_))
}

pub fn is_indexed(&self) -> bool {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add Variantly to the derive list you get this method for free.

@dcookspi
Copy link
Copy Markdown
Collaborator Author

I can't replicate the test failures locally at the moment. I'll dig into them further tomororw.

These have been fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request pr-chain This PR doesn't target the main branch, don't merge! SPI AOI Area of interest for SPI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants