Support setting-aware project identifier casing with case-insensitive uniqueness#22406
Support setting-aware project identifier casing with case-insensitive uniqueness#22406
Conversation
dc0a497 to
dbed01e
Compare
855b75c to
36bacc6
Compare
54a420a to
e55e2c6
Compare
36bacc6 to
df753f1
Compare
df753f1 to
128ae0a
Compare
7bb5773 to
1babb4b
Compare
128ae0a to
6541f9b
Compare
There was a problem hiding this comment.
Pull request overview
Adds setting-aware normalization and case-insensitive uniqueness for Project.identifier, aligning storage casing with Setting::WorkPackageIdentifier and updating related autofix preview and MCP search behaviors.
Changes:
- Normalize
Project.identifierto uppercase (alphanumeric mode) or lowercase (numeric mode) and make uniqueness case-insensitive. - Add a
LOWER(identifier)unique index (including a concurrent migration) and update initial table definition for fresh installs. - Extend identifier-autofix preview classification + translations, and update MCP tool schemas/specs for case-insensitive identifier searches.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| app/models/projects/identifier.rb | Implements casing normalization, case-insensitive uniqueness, FriendlyId input normalization, and updates historical identifier checks. |
| lib/open_project/acts_as_url/adapter/op_active_record.rb | Adds optional post_process support to apply controlled casing transforms to generated slugs. |
| db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb | Adds concurrent migration switching to a LOWER(identifier) unique index. |
| db/migrate/tables/projects.rb | Updates initial schema definition to use the LOWER(identifier) unique index. |
| app/services/work_packages/identifier_autofix/preview_query.rb | Expands “problematic identifier” detection and error classification for semantic identifiers. |
| config/locales/en.yml | Adds new autofix preview error labels for additional classifications. |
| spec/services/work_packages/identifier_autofix/preview_query_spec.rb | Updates preview query specs to bypass normalization and cover new classifications. |
| spec/models/projects/identifier_spec.rb | Adds normalization and case-insensitive collision specs; switches to rails_helper. |
| spec/features/projects/create_spec.rb | Strengthens feature coverage for alphanumeric identifier auto-suggestion behavior. |
| app/services/mcp_tools/search_projects.rb | Updates tool schema description to case-insensitive identifier matching. |
| app/services/mcp_tools/search_programs.rb | Updates tool schema description to case-insensitive identifier matching. |
| app/services/mcp_tools/search_portfolios.rb | Updates tool schema description to case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_projects_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_programs_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_portfolios_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/models/project_spec.rb | Removes a stray blank line at EOF. |
spec/services/work_packages/identifier_autofix/preview_query_spec.rb
Outdated
Show resolved
Hide resolved
6541f9b to
6aedcab
Compare
Introduce case-insensitive handling for project identifiers: - Add `normalizes :identifier` to automatically upcase (alphanumeric mode) or downcase (numeric mode) identifiers on assignment - Add `parse_friendly_id` to normalize FriendlyId lookups for case-insensitive finder queries - Switch uniqueness validation to `case_sensitive: false` - Replace inline `exclusion:` validator with explicit `identifier_not_reserved` that checks case-insensitively - Consolidate alphanumeric format validators into a single `identifier_alphanumeric_format` method with early return to prevent cascading error messages - Use case-insensitive LOWER() comparison in historical identifier reservation check - Add `post_process` support to the OpActiveRecord acts_as_url adapter with an allowlist of safe transforms (upcase/downcase) - Add migration to replace the unique index on projects.identifier with a case-insensitive LOWER(identifier) index - Update table definition to match the new index Includes corresponding test additions for normalization, case- insensitive uniqueness, reserved identifier rejection, and the create_spec fixture fix for alphanumeric mode.
a1d5166 to
2ca7187
Compare
Expand PreviewQuery to detect additional alphanumeric identifier format violations: case-mismatch, underscores, leading digits, and purely numerical identifiers. Restructure error classification with FORMAT_RULES for clarity and add corresponding locale strings. Update MCP tool descriptions for search_projects, search_programs, and search_portfolios to reflect case-insensitive identifier matching, with updated test expectations.
2ca7187 to
91362fb
Compare
Separate scope-building, identifier classification, and exclusion set logic into a reusable class that both PreviewQuery (admin UI) and the future ApplyHandlesJob (batch migration) can compose from. Key changes: - ProblematicIdentifiers owns FORMAT_RULES, problematic scope, error classification, and a dual-mode exclusion set (DB-backed ExclusionSet for preview, preloaded Set for batch) - PreviewQuery becomes a thin orchestrator delegating to ProblematicIdentifiers - Add deterministic ordering (.order(:id)) to preview results - Rename :not_uppercase → :not_fully_uppercased to match scope method naming, update corresponding en.yml translation key - ExclusionSet uses raw SQL to bypass Rails normalizes on :identifier which would transform query parameters based on current setting mode
Replace the TODO stub in ProblematicIdentifiers#reserved_identifiers with a real query against the friendly_id_slugs table. Historical slugs (identifiers a project used in the past but has since changed) are now excluded from suggestion generation and classified as :reserved in error_reason output. The query excludes slugs that match a current active project identifier (those are already covered by the in_use exclusion path).
Drop the DB-backed ExclusionSet in favour of a simple Set from pluck. This is a one-off admin migration — the brief memory cost of loading all non-problematic identifiers is not worth the added complexity of a custom duck-typed wrapper. Also adds performance notes documenting index considerations for the regex scope conditions and the eager-load trade-off.
839cea0 to
79209e2
Compare
The production spec explicitly allows underscores in alphanumeric identifiers. PreviewQuery's :special_characters rule and SQL scope incorrectly flagged them. Updated regex from [^a-zA-Z0-9] to [^a-zA-Z0-9_] in both the FORMAT_RULES lambda and the contains_non_alphanumeric scope. Also added a comment to set_raw_identifier explaining why Arel.sql is necessary (update_all applies normalizes in this Rails version).
…served LOWER(slug) = LOWER(?) is unnecessary because normalizes :identifier ensures consistent casing before FriendlyId records the slug. Plain equality uses the existing composite index on (slug, sluggable_type).
Corrected error_too_long message from "fewer than 5" to "10 or fewer"
to match SEMANTIC_IDENTIFIER_MAX_LENGTH.
Replaced be_present with specific assertions on item count and
identifier in MCP search_{projects,programs,portfolios} specs to
ensure the correct resource is returned, not just any resource.
Per code review: the model should not auto-transform identifier casing on every read/write. Format validators already enforce correct casing (lowercase for numeric, uppercase for alphanumeric), and the background job will handle migrating existing identifiers. Restores LOWER() in identifier_not_historically_reserved since without normalizes, slugs and identifiers may differ in case.
Without parse_friendly_id normalizing case, MCP search lookups are case-sensitive again. Reverts description changes and removes the case-variant identifier test contexts.
These were built on the normalizes premise which has been removed. In alphanumeric mode, SuggestionGenerator handles identifier generation rather than acts_as_url. The API PR (22417) will handle the proper wiring between modes.
The underscore fix (allowing _ in identifiers per spec) needs to target ProblematicIdentifiers instead of PreviewQuery after the extraction refactor on the target branch.
0738f98 to
5c7bed1
Compare
| add_index :projects, "LOWER(identifier)", | ||
| unique: true, | ||
| name: "index_projects_on_lower_identifier", | ||
| algorithm: :concurrently, |
There was a problem hiding this comment.
I wonder about this. Do you think it's necessary to have concurrently built index here? I mean, the projects table should be rather small (<10.000), so index creation should be fast...
https://www.postgresql.org/docs/current/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished.
In a concurrent index build, the index is actually entered as an “invalid” index into the system catalogs in one transaction, then two table scans occur in two more transactions. [...]
If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index. This index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead.
Not sure what the gain is but the pain could be potentially high (having an invalid index that does not work)
There was a problem hiding this comment.
IIRC, there are instances with > 10k projects- I also see that we use concurrent indexes widely across historical migrations.
There was a problem hiding this comment.
I asked Claude to do a risk analysis on this, and the takeaway is:
- The risk of CONCURRENTLY leaving an invalid index on failure is real but manageable — you just drop the invalid index and rerun. The migration already uses if_not_exists / if_exists guards for re-runnability.
- For small tables (<10k), the write-lock from a regular index build would be imperceptible, so CONCURRENTLY isn't strictly necessary there. But for larger instances (which do exist) it avoids any write-lock window, and it's consistent with the ~19 other migrations in the codebase that use this pattern.
That said -- you surfaced a much more important issue I'd missed: if any existing projects have case-colliding identifiers (e.g., MyProject and myproject), the unique index creation will fail regardless of CONCURRENTLY or not. I'll add a duplicate-cleanup step before the index creation, similar to what AddUniquenessForLdapAuthSourceNames does.
💡 TIL! Really nice catch.
|
|
||
| expect do | ||
| Project.where(id: create(:project).id).update_all(identifier: "EXISTING") | ||
| end.to raise_error(ActiveRecord::RecordNotUnique) |
There was a problem hiding this comment.
👍 thanks for adding this test!
| let(:display_count) { described_class::DISPLAY_COUNT } | ||
|
|
||
| # Store identifiers bypassing normalizes (which would downcase/upcase them) | ||
| def set_raw_identifier(project, identifier) |
There was a problem hiding this comment.
Do we still need that? At least the comment is outdated 😅
There was a problem hiding this comment.
Good catch- updated. we still bypass validations via update_all in the units to reduce the need for explict work_packages_identifier: "numeric" | "alphanumeric" settings- when we need to simulate datasets in mixed modes.
| expect(project).to be_valid | ||
| end | ||
|
|
||
| it "rejects reserved identifiers case-insensitively" do |
| adapter: OpenProject::ActsAsUrl::Adapter::OpActiveRecord # use a custom adapter able to handle edge cases | ||
|
|
||
| ### Validators for the legacy underscored identifier format (e.g. "project_one") | ||
| validates :identifier, |
There was a problem hiding this comment.
Nit: In order to make all this simpler, how about we also centralize all legacy slug validations into an identifier_slug_format?
| length: { maximum: SEMANTIC_IDENTIFIER_MAX_LENGTH }, | ||
| if: ->(p) { p.identifier_changed? && p.identifier.present? && Setting::WorkPackageIdentifier.alphanumeric? } | ||
| # Validators for the alphanumeric identifier format (e.g. "PROJ1") | ||
| validate :identifier_alphanumeric_format, |
There was a problem hiding this comment.
Shall we already switch to semantic across the code? (We can still discuss the name further, but for now we should at least remain consistent.)
Of course this is probably better via a follow-up PR.
There was a problem hiding this comment.
I know we haven't found naming scheme that would sit 100% well, but I would definitely at least get rid of "numeric/alphanumeric", since the "numeric" part does not really click within the context of this file.
There was a problem hiding this comment.
I took the "numeric/alphanumeric" wording to be consistent with Setting::WorkPackageIdentifier.{numeric?,alphanumeric?}- which should make it easier to rename once we decide/adopt new names. I definitely prefer to keep the renaming isolated into a separate PR.
There was a problem hiding this comment.
I agree. We should try to be consistent somehow. But renaming the setting probably requires a database migration 😅 so maybe it's easier if we settle on numeric / alphanumeric even if I don't like it as much 🥲
|
|
||
| already_existing = FriendlyId::Slug | ||
| .where(slug: identifier, sluggable_type: self.class.to_s) | ||
| .where("LOWER(slug) = LOWER(?)", identifier) |
There was a problem hiding this comment.
🤔 The reason for us leveraging that friendly_id history feature was that we will be able to resolve historical URLs.
If I am not mistaken, this lowercase conversion is addressing a case that shouldn't ever happen with links generated by OpenProject. Therefore, we're really only adding tolerance for typos in the URL. This sets an odd precedent, since the non-historical URLs do require proper casing
Unless there is a specific scenario we want to support, I suggest to drop ths.
There was a problem hiding this comment.
This is only a validation. So this code is never involved when we are resolving values from URLs. This validation will only prevent clashes when an identifier is added / changed.
Consider the following scenario:
Project A with historic identifier "proj" (current identifier "sthelse")
Project B trying to set identifier "PROJ" should not work
just out of fear that it could be misleading for people if /projects/proj resolves to Project A but /projects/PROJ to Project B.
| .select(:id, :name, :identifier) | ||
| .limit(DISPLAY_COUNT) | ||
| .to_a | ||
| analysis = ProblematicIdentifiers.new |
7295e82 to
2bf5740
Compare
Adds declarative specs for case-insensitive uniqueness, max length validation, and the LOWER(identifier) unique database index.
PostgreSQL enforces uniqueness constraints at index build time — when CREATE INDEX encounters duplicate key values, the entire operation fails. For concurrent index builds (algorithm: :concurrently), this is particularly problematic: a failed build leaves behind an "invalid" index entry in pg_class that is ignored for queries but still incurs write overhead on every INSERT/UPDATE, and must be explicitly dropped before retrying. Since this migration introduces a unique index on LOWER(identifier), any pre-existing case collisions (e.g. "MyProject" and "myproject") would cause the index build to fail. This adds a deduplication step that resolves collisions before the index is created: the oldest project (lowest id) keeps its identifier unchanged while newer duplicates receive a "-N" suffix via a window function partitioned over LOWER(identifier).
Ticket
https://community.openproject.org/wp/73205
What are you trying to accomplish?
Add case-insensitive uniqueness enforcement for
Project.identifier, ensuring no two projects can have identifiers that differ only in case (e.g. "proj" and "PROJ").Includes:
What approach did you choose and why?
The core of this change is a
LOWER(identifier)functional index on theprojectstable, paired with auniqueness: { case_sensitive: false }validation.Merge checklist