SOLR-18144: Create .system collection at runtime if missing#4188
SOLR-18144: Create .system collection at runtime if missing#4188epugh wants to merge 8 commits intoapache:branch_9xfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a Solr 9.x regression for Schema Designer’s blob-store usage by ensuring the .system collection exists at runtime so sample-document storage and cleanup don’t fail when the collection is missing.
Changes:
- Add a helper to create the
.systemcollection on demand and wait for it to appear in cluster state. - Call the helper before deleting and storing sample docs in the blob store.
- Treat certain blob-read failures as “no stored docs” by catching
SolrExceptionand returning an empty list.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java
Outdated
Show resolved
Hide resolved
| private void ensureSystemCollectionExists() throws IOException, SolrServerException { | ||
| if (!zkStateReader().getClusterState().hasCollection(BLOB_STORE_ID)) { | ||
| log.info("Creating {} collection for blob storage", BLOB_STORE_ID); | ||
| CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, 1).process(cloudClient()); |
There was a problem hiding this comment.
Creating the .system collection with a hard-coded replication factor of 1 may be less resilient than Solr's built-in auto-create path (e.g., HttpSolrCall.autoCreateSystemColl uses min(3, liveNodes) for replication factor). Consider deriving the replication factor from live node count or cluster defaults to avoid creating an under-replicated blob store collection on multi-node clusters.
| CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, 1).process(cloudClient()); | |
| int liveNodes = zkStateReader().getClusterState().getLiveNodes().size(); | |
| int replicationFactor = Math.max(1, Math.min(3, liveNodes)); | |
| CollectionAdminRequest.createCollection(BLOB_STORE_ID, null, 1, replicationFactor) | |
| .process(cloudClient()); |
There was a problem hiding this comment.
I mean... okay.. I guess I think of the .system as just supporting the schema designer and does it need "all that", but on the other okay...
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java
Outdated
Show resolved
Hide resolved
| void deleteStoredSampleDocs(String configSet) { | ||
| try { | ||
| ensureSystemCollectionExists(); | ||
| cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + "_sample/*", 10); | ||
| } catch (IOException | SolrServerException | SolrException exc) { |
There was a problem hiding this comment.
deleteStoredSampleDocs now auto-creates .system before attempting a delete. If .system is missing, deletion is effectively a no-op, but this will instead create a new system collection during cleanup flows (e.g., SchemaDesignerAPI.cleanupTemp). Consider skipping ensureSystemCollectionExists() here and treating missing collection/blob as nothing to delete (similar to getStoredSampleDocs).
| void storeSampleDocs(final String configSet, List<SolrInputDocument> docs) throws IOException { | ||
| try { | ||
| ensureSystemCollectionExists(); | ||
| } catch (SolrServerException e) { | ||
| throw new IOException("Failed to ensure .system collection exists", e); | ||
| } |
There was a problem hiding this comment.
This PR adds runtime creation of .system, but the existing schema designer tests always pre-create the blob store collection in @BeforeClass, so the new behavior isn't exercised. Consider updating/adding a test that starts without .system and asserts that storeSampleDocs (and/or other blob interactions) creates it successfully.
janhoy
left a comment
There was a problem hiding this comment.
Sound comments from Copilot, I believe all of them.
Since this was a regression not caught by tests (since tests bootstrap the collection in a different way), perhaps change some test so that it fails without the fix, then we get increased test coverage and avoid future regressions.
…gnerConfigSetHelper.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…gnerConfigSetHelper.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Since this is only on 9x, and I am not sure any of this moves to 10x, I'm not really feeling the "add test coverage and avoid future regressiosn".. I could of course regret this. Can we live with that and merge this? |
|
Added @dsmiley as reviewer since you had opinions on the JIRA |
|
I did think about a bats integration test to confirm the multiple steps between back end and front end that could be applied to |
Agree that as a 9x only solution, and isolated to Schema designer, this is pretty confined. Some test coverage is ok, but I'm ok with some test that reproduces the current bug (missing coll), and then becomes green. Could be a simple test that checks that |
…up to not pre create the .system
dsmiley
left a comment
There was a problem hiding this comment.
Please see my proposed simple solution in JIRA.
https://issues.apache.org/jira/browse/SOLR-18144
Description
Fix regression that is 9x only.
Solution
Kiro nailed it in five minutes. Then I manually tested and ran the unit tests. Then, simplified the test setup, which has the side effect of confirming the fix!
Tests
existing, but then improved.