SOLR-18144: Create .system collection at runtime if missing #4188

Open
epugh wants to merge 8 commits into apache:branch_9x from epugh:SOLR-18144

Conversation

Contributor

@epugh epugh commented Mar 4, 2026

https://issues.apache.org/jira/browse/SOLR-18144

Description

Fixes a regression that affects 9x only.

Solution

Kiro nailed it in five minutes. Then I manually tested and ran the unit tests. Then I simplified the test setup, which has the side effect of confirming the fix!

Tests

Existing tests, which I then improved.

Contributor

Copilot AI left a comment

Pull request overview

Fixes a Solr 9.x regression for Schema Designer’s blob-store usage by ensuring the .system collection exists at runtime so sample-document storage and cleanup don’t fail when the collection is missing.

Changes:

  • Add a helper to create the .system collection on demand and wait for it to appear in cluster state.
  • Call the helper before deleting and storing sample docs in the blob store.
  • Treat certain blob-read failures as “no stored docs” by catching SolrException and returning an empty list.
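
The first change above (create on demand, then wait for the collection to show up in cluster state) could look roughly like the sketch below. It is not the PR's code: `FakeCluster`, `waitFor`, and `ensureExists` are hypothetical names, and the in-memory cluster stands in for Solr's cluster state so the example runs without Solr.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

final class EnsureCollectionSketch {

  /** Hypothetical in-memory stand-in for cluster state; not a Solr class. */
  static final class FakeCluster {
    private final Set<String> collections = ConcurrentHashMap.newKeySet();

    boolean hasCollection(String name) {
      return collections.contains(name);
    }

    void createCollection(String name) {
      collections.add(name);
    }
  }

  /** Polls until the condition holds, or fails once the timeout has elapsed. */
  static void waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new IllegalStateException("Timed out after " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  /** Returns true if this call created the collection, false if it already existed. */
  static boolean ensureExists(FakeCluster cluster, String name, long timeoutMs) {
    if (cluster.hasCollection(name)) {
      return false;
    }
    cluster.createCollection(name);
    // A real cluster applies the creation asynchronously, hence the wait.
    waitFor(() -> cluster.hasCollection(name), timeoutMs, 10);
    return true;
  }
}
```

Returning a boolean is a choice made for this sketch only; it makes the "already exists" path easy to observe in a test.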

private void ensureSystemCollectionExists() throws IOException, SolrServerException {
  if (!zkStateReader().getClusterState().hasCollection(BLOB_STORE_ID)) {
    log.info("Creating {} collection for blob storage", BLOB_STORE_ID);
    CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, 1).process(cloudClient());

Copilot AI Mar 4, 2026

Creating the .system collection with a hard-coded replication factor of 1 may be less resilient than Solr's built-in auto-create path (e.g., HttpSolrCall.autoCreateSystemColl uses min(3, liveNodes) for replication factor). Consider deriving the replication factor from live node count or cluster defaults to avoid creating an under-replicated blob store collection on multi-node clusters.

Suggested change
- CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, 1).process(cloudClient());
+ int liveNodes = zkStateReader().getClusterState().getLiveNodes().size();
+ int replicationFactor = Math.max(1, Math.min(3, liveNodes));
+ CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, replicationFactor)
+     .process(cloudClient());
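
To make the clamp in this suggestion concrete, here is a standalone sketch of the arithmetic only. `chooseReplicationFactor` is a hypothetical helper name, not something in the PR; the cap of 3 and the floor of 1 are taken from the suggested change.

```java
final class ReplicationFactorSketch {

  /**
   * Clamps the replication factor to the range [1, 3]:
   * never more than 3 replicas, never more than the live node count,
   * and never fewer than 1 even if no live nodes are reported.
   */
  static int chooseReplicationFactor(int liveNodes) {
    return Math.max(1, Math.min(3, liveNodes));
  }
}
```

With this rule a single-node cluster still gets a replication factor of 1, so the behaviour only differs from the hard-coded value on clusters with two or more live nodes.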

Contributor Author

I mean... okay... I guess I think of the .system collection as just supporting the schema designer, and does it need "all that"? But on the other hand, okay...

Comment on lines 475 to 479
void deleteStoredSampleDocs(String configSet) {
  try {
    ensureSystemCollectionExists();
    cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + "_sample/*", 10);
  } catch (IOException | SolrServerException | SolrException exc) {

Copilot AI Mar 4, 2026

deleteStoredSampleDocs now auto-creates .system before attempting a delete. If .system is missing, deletion is effectively a no-op, but this will instead create a new system collection during cleanup flows (e.g., SchemaDesignerAPI.cleanupTemp). Consider skipping ensureSystemCollectionExists() here and treating missing collection/blob as nothing to delete (similar to getStoredSampleDocs).
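
One way to read this suggestion is sketched below: when the blob store collection is absent, the delete returns without creating anything. This is an illustration under assumptions, not the PR's implementation. `FakeCluster` is a hypothetical in-memory map from collection name to document ids, and the prefix match stands in for the `id:<configSet>_sample/*` query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class DeleteIfPresentSketch {

  /** Hypothetical in-memory cluster: collection name -> document ids. Not a Solr class. */
  static final class FakeCluster {
    final Map<String, Set<String>> collections = new HashMap<>();
  }

  /** Deletes the sample docs for a config set and returns how many were removed. */
  static int deleteStoredSampleDocs(FakeCluster cluster, String blobStoreId, String configSet) {
    Set<String> ids = cluster.collections.get(blobStoreId);
    if (ids == null) {
      // Missing collection: nothing to delete, and cleanup must not create it.
      return 0;
    }
    String prefix = configSet + "_sample/";
    int before = ids.size();
    ids.removeIf(id -> id.startsWith(prefix));
    return before - ids.size();
  }
}
```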

Comment on lines 525 to +530
void storeSampleDocs(final String configSet, List<SolrInputDocument> docs) throws IOException {
  try {
    ensureSystemCollectionExists();
  } catch (SolrServerException e) {
    throw new IOException("Failed to ensure .system collection exists", e);
  }

Copilot AI Mar 4, 2026

This PR adds runtime creation of .system, but the existing schema designer tests always pre-create the blob store collection in @BeforeClass, so the new behavior isn't exercised. Consider updating/adding a test that starts without .system and asserts that storeSampleDocs (and/or other blob interactions) creates it successfully.

Contributor

@janhoy janhoy left a comment

Sound comments from Copilot, I believe all of them.

Since this was a regression not caught by tests (since tests bootstrap the collection in a different way), perhaps change some test so that it fails without the fix, then we get increased test coverage and avoid future regressions.

@epugh epugh changed the title Create .system collection at runtime if missing SOLR-18144: Create .system collection at runtime if missing Mar 5, 2026
epugh and others added 6 commits March 5, 2026 07:21
Contributor Author

epugh commented Mar 5, 2026

> Sound comments from Copilot, I believe all of them.
>
> Since this was a regression not caught by tests (since tests bootstrap the collection in a different way), perhaps change some test so that it fails without the fix, then we get increased test coverage and avoid future regressions.

Since this is only on 9x, and I am not sure any of this moves to 10x, I'm not really feeling the "add test coverage and avoid future regressions" argument. I could of course regret this. Can we live with that and merge this?

@janhoy janhoy requested a review from dsmiley March 6, 2026 07:47
Contributor

janhoy commented Mar 6, 2026

Added @dsmiley as reviewer since you had opinions on the JIRA

Contributor Author

epugh commented Mar 6, 2026

I did think about a bats integration test to confirm the multiple steps between back end and front end that could be applied to main...

Contributor

janhoy commented Mar 6, 2026

> I did think about a bats integration test to confirm the multiple steps between back end and front end that could be applied to main...

Agree that as a 9x only solution, and isolated to Schema designer, this is pretty confined. Some test coverage is ok, but I'm ok with some test that reproduces the current bug (missing coll), and then becomes green. Could be a simple test that checks that .system is not there, then calls some method and verifies the collection gets created.
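
The three steps described here (confirm `.system` is absent, call a method, verify the collection now exists) might take the following shape. This is only a sketch of the test's structure: `FakeBlobStore` and its methods are hypothetical in-memory stand-ins that borrow the names used in this PR, and a real test would presumably run against a test Solr cluster rather than a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MissingSystemCollectionTestSketch {
  static final String BLOB_STORE_ID = ".system";

  /** Hypothetical in-memory blob store; stands in for the cluster in this sketch. */
  static final class FakeBlobStore {
    // collection name -> (blob name -> stored docs)
    private final Map<String, Map<String, List<String>>> collections = new HashMap<>();

    boolean hasCollection(String name) {
      return collections.containsKey(name);
    }

    /** Creates the blob store collection on first use, then stores the docs. */
    void storeSampleDocs(String configSet, List<String> docs) {
      collections
          .computeIfAbsent(BLOB_STORE_ID, k -> new HashMap<>())
          .put(configSet + "_sample", new ArrayList<>(docs));
    }

    /** A missing collection or blob is treated as "no stored docs". */
    List<String> getStoredSampleDocs(String configSet) {
      Map<String, List<String>> blobs = collections.get(BLOB_STORE_ID);
      if (blobs == null) {
        return List.of();
      }
      return blobs.getOrDefault(configSet + "_sample", List.of());
    }
  }

  /** Runs the three-step check and throws AssertionError on the first failure. */
  static void runRegressionCheck() {
    FakeBlobStore store = new FakeBlobStore();
    check(!store.hasCollection(BLOB_STORE_ID), "precondition: .system must be absent");
    store.storeSampleDocs("films", List.of("doc1", "doc2"));
    check(store.hasCollection(BLOB_STORE_ID), ".system should exist after storing docs");
    check(store.getStoredSampleDocs("films").size() == 2, "stored docs should be readable");
  }

  private static void check(boolean ok, String message) {
    if (!ok) {
      throw new AssertionError(message);
    }
  }
}
```

The point of asserting the precondition first is that the test fails loudly if some setup step starts pre-creating the collection again, which is how the original regression went unnoticed.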

@github-actions github-actions bot added the tests label Mar 6, 2026
Contributor

@dsmiley dsmiley left a comment

Please see my proposed simple solution in JIRA.
