Skip to content

Support setting-aware project identifier casing with case-insensitive uniqueness#22406

Open
akabiru wants to merge 15 commits intodevfrom
open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage
Open

Support setting-aware project identifier casing with case-insensitive uniqueness#22406
akabiru wants to merge 15 commits intodevfrom
open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage

Conversation

@akabiru
Copy link
Member

@akabiru akabiru commented Mar 18, 2026

Ticket

https://community.openproject.org/wp/73205

What are you trying to accomplish?

Add case-insensitive uniqueness enforcement for Project.identifier, ensuring no two projects can have identifiers that differ only in case (e.g. "proj" and "PROJ").

Includes:

What approach did you choose and why?

The core of this change is a LOWER(identifier) functional index on the projects table, paired with a uniqueness: { case_sensitive: false } validation.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

@akabiru akabiru self-assigned this Mar 19, 2026
@akabiru akabiru changed the title [#73149] Support setting-aware identifier casing with case-insensitive uniqueness [#73149] Support setting-aware project identifier casing with case-insensitive uniqueness Mar 19, 2026
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from dc0a497 to dbed01e Compare March 19, 2026 06:11
Base automatically changed from implementation/72917-save-the-previous-value-when-a-project-identifier-changes to dev March 19, 2026 08:56
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch 4 times, most recently from 855b75c to 36bacc6 Compare March 20, 2026 11:25
@akabiru akabiru changed the base branch from dev to refactor/extract-projects-identifier-concern March 20, 2026 11:27
@akabiru akabiru force-pushed the refactor/extract-projects-identifier-concern branch from 54a420a to e55e2c6 Compare March 20, 2026 12:48
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 36bacc6 to df753f1 Compare March 20, 2026 13:06
@akabiru akabiru changed the base branch from refactor/extract-projects-identifier-concern to refactor/strip-ascii-control-chars-from-identifier March 20, 2026 13:06
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from df753f1 to 128ae0a Compare March 20, 2026 13:19
@akabiru akabiru force-pushed the refactor/strip-ascii-control-chars-from-identifier branch from 7bb5773 to 1babb4b Compare March 20, 2026 13:54
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 128ae0a to 6541f9b Compare March 20, 2026 14:11
@akabiru akabiru marked this pull request as ready for review March 20, 2026 14:13
@akabiru akabiru requested a review from a team March 20, 2026 14:13
@akabiru akabiru changed the title [#73149] Support setting-aware project identifier casing with case-insensitive uniqueness Support setting-aware project identifier casing with case-insensitive uniqueness Mar 20, 2026
@opf opf deleted a comment from github-actions bot Mar 20, 2026
@akabiru akabiru requested a review from Copilot March 20, 2026 14:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds setting-aware normalization and case-insensitive uniqueness for Project.identifier, aligning storage casing with Setting::WorkPackageIdentifier and updating related autofix preview and MCP search behaviors.

Changes:

  • Normalize Project.identifier to uppercase (alphanumeric mode) or lowercase (numeric mode) and make uniqueness case-insensitive.
  • Add a LOWER(identifier) unique index (including a concurrent migration) and update initial table definition for fresh installs.
  • Extend identifier-autofix preview classification + translations, and update MCP tool schemas/specs for case-insensitive identifier searches.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
app/models/projects/identifier.rb Implements casing normalization, case-insensitive uniqueness, FriendlyId input normalization, and updates historical identifier checks.
lib/open_project/acts_as_url/adapter/op_active_record.rb Adds optional post_process support to apply controlled casing transforms to generated slugs.
db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb Adds concurrent migration switching to a LOWER(identifier) unique index.
db/migrate/tables/projects.rb Updates initial schema definition to use the LOWER(identifier) unique index.
app/services/work_packages/identifier_autofix/preview_query.rb Expands “problematic identifier” detection and error classification for semantic identifiers.
config/locales/en.yml Adds new autofix preview error labels for additional classifications.
spec/services/work_packages/identifier_autofix/preview_query_spec.rb Updates preview query specs to bypass normalization and cover new classifications.
spec/models/projects/identifier_spec.rb Adds normalization and case-insensitive collision specs; switches to rails_helper.
spec/features/projects/create_spec.rb Strengthens feature coverage for alphanumeric identifier auto-suggestion behavior.
app/services/mcp_tools/search_projects.rb Updates tool schema description to case-insensitive identifier matching.
app/services/mcp_tools/search_programs.rb Updates tool schema description to case-insensitive identifier matching.
app/services/mcp_tools/search_portfolios.rb Updates tool schema description to case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_projects_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_programs_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_portfolios_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/models/project_spec.rb Removes a stray blank line at EOF.

Base automatically changed from refactor/strip-ascii-control-chars-from-identifier to dev March 20, 2026 14:56
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 6541f9b to 6aedcab Compare March 20, 2026 15:01
@opf opf deleted a comment from github-actions bot Mar 20, 2026
Introduce case-insensitive handling for project identifiers:

- Add `normalizes :identifier` to automatically upcase (alphanumeric
  mode) or downcase (numeric mode) identifiers on assignment
- Add `parse_friendly_id` to normalize FriendlyId lookups for
  case-insensitive finder queries
- Switch uniqueness validation to `case_sensitive: false`
- Replace inline `exclusion:` validator with explicit
  `identifier_not_reserved` that checks case-insensitively
- Consolidate alphanumeric format validators into a single
  `identifier_alphanumeric_format` method with early return to
  prevent cascading error messages
- Use case-insensitive LOWER() comparison in historical identifier
  reservation check
- Add `post_process` support to the OpActiveRecord acts_as_url
  adapter with an allowlist of safe transforms (upcase/downcase)
- Add migration to replace the unique index on projects.identifier
  with a case-insensitive LOWER(identifier) index
- Update table definition to match the new index

Includes corresponding test additions for normalization, case-
insensitive uniqueness, reserved identifier rejection, and the
create_spec fixture fix for alphanumeric mode.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch 2 times, most recently from a1d5166 to 2ca7187 Compare March 20, 2026 17:33
Expand PreviewQuery to detect additional alphanumeric identifier
format violations: case-mismatch, underscores, leading digits, and
purely numerical identifiers. Restructure error classification with
FORMAT_RULES for clarity and add corresponding locale strings.

Update MCP tool descriptions for search_projects, search_programs,
and search_portfolios to reflect case-insensitive identifier
matching, with updated test expectations.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 2ca7187 to 91362fb Compare March 20, 2026 18:35
akabiru added 3 commits March 20, 2026 22:55
Separate scope-building, identifier classification, and exclusion set
logic into a reusable class that both PreviewQuery (admin UI) and the
future ApplyHandlesJob (batch migration) can compose from.

Key changes:
- ProblematicIdentifiers owns FORMAT_RULES, problematic scope, error
  classification, and a dual-mode exclusion set (DB-backed ExclusionSet
  for preview, preloaded Set for batch)
- PreviewQuery becomes a thin orchestrator delegating to
  ProblematicIdentifiers
- Add deterministic ordering (.order(:id)) to preview results
- Rename :not_uppercase → :not_fully_uppercased to match scope method
  naming, update corresponding en.yml translation key
- ExclusionSet uses raw SQL to bypass Rails normalizes on :identifier
  which would transform query parameters based on current setting mode
Replace the TODO stub in ProblematicIdentifiers#reserved_identifiers
with a real query against the friendly_id_slugs table. Historical slugs
(identifiers a project used in the past but has since changed) are now
excluded from suggestion generation and classified as :reserved in
error_reason output.

The query excludes slugs that match a current active project identifier
(those are already covered by the in_use exclusion path).
Drop the DB-backed ExclusionSet in favour of a simple Set from pluck.
This is a one-off admin migration — the brief memory cost of loading
all non-problematic identifiers is not worth the added complexity of
a custom duck-typed wrapper.

Also adds performance notes documenting index considerations for the
regex scope conditions and the eager-load trade-off.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 839cea0 to 79209e2 Compare March 23, 2026 06:22
akabiru added 7 commits March 23, 2026 17:22
The production spec explicitly allows underscores in alphanumeric
identifiers. PreviewQuery's :special_characters rule and SQL scope
incorrectly flagged them. Updated regex from [^a-zA-Z0-9] to
[^a-zA-Z0-9_] in both the FORMAT_RULES lambda and the
contains_non_alphanumeric scope.

Also added a comment to set_raw_identifier explaining why Arel.sql
is necessary (update_all applies normalizes in this Rails version).
…served

LOWER(slug) = LOWER(?) is unnecessary because normalizes :identifier
ensures consistent casing before FriendlyId records the slug. Plain
equality uses the existing composite index on (slug, sluggable_type).
Corrected error_too_long message from "fewer than 5" to "10 or fewer"
to match SEMANTIC_IDENTIFIER_MAX_LENGTH.

Replaced be_present with specific assertions on item count and
identifier in MCP search_{projects,programs,portfolios} specs to
ensure the correct resource is returned, not just any resource.
Per code review: the model should not auto-transform identifier casing
on every read/write. Format validators already enforce correct casing
(lowercase for numeric, uppercase for alphanumeric), and the background
job will handle migrating existing identifiers.

Restores LOWER() in identifier_not_historically_reserved since without
normalizes, slugs and identifiers may differ in case.
Without parse_friendly_id normalizing case, MCP search lookups are
case-sensitive again. Reverts description changes and removes the
case-variant identifier test contexts.
These were built on the normalizes premise which has been removed.
In alphanumeric mode, SuggestionGenerator handles identifier generation
rather than acts_as_url. The API PR (22417) will handle the proper
wiring between modes.
The underscore fix (allowing _ in identifiers per spec) needs to target
ProblematicIdentifiers instead of PreviewQuery after the extraction
refactor on the target branch.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 0738f98 to 5c7bed1 Compare March 23, 2026 14:25
@akabiru akabiru requested review from judithroth and thykel March 23, 2026 14:38
add_index :projects, "LOWER(identifier)",
unique: true,
name: "index_projects_on_lower_identifier",
algorithm: :concurrently,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about this. Do you think it's necessary to have concurrently built index here? I mean, the projects table should be rather small (<10.000), so index creation should be fast...

https://www.postgresql.org/docs/current/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished.

In a concurrent index build, the index is actually entered as an “invalid” index into the system catalogs in one transaction, then two table scans occur in two more transactions. [...]

If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index. This index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead.

Not sure what the gain is but the pain could be potentially high (having an invalid index that does not work)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, there are instances with > 10k projects- I also see that we use concurrent indexes widely across historical migrations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Claude to do a risk analysis on this, and the takeaway is:

  • The risk of CONCURRENTLY leaving an invalid index on failure is real but manageable — you just drop the invalid index and rerun. The migration already uses if_not_exists / if_exists guards for re-runnability.
  • For small tables (<10k), the write-lock from a regular index build would be imperceptible, so CONCURRENTLY isn't strictly necessary there. But for larger instances (which do exist) it avoids any write-lock window, and it's consistent with the ~19 other migrations in the codebase that use this pattern.

That said -- you surfaced a much more important issue I'd missed: if any existing projects have case-colliding identifiers (e.g., MyProject and myproject), the unique index creation will fail regardless of CONCURRENTLY or not. I'll add a duplicate-cleanup step before the index creation, similar to what AddUniquenessForLdapAuthSourceNames does.

💡 TIL! Really nice catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


expect do
Project.where(id: create(:project).id).update_all(identifier: "EXISTING")
end.to raise_error(ActiveRecord::RecordNotUnique)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for adding this test!

let(:display_count) { described_class::DISPLAY_COUNT }

# Store identifiers bypassing normalizes (which would downcase/upcase them)
def set_raw_identifier(project, identifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need that? At least the comment is outdated 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch- updated. we still bypass validations via update_all in the units to reduce the need for explict work_packages_identifier: "numeric" | "alphanumeric" settings- when we need to simulate datasets in mixed modes.

expect(project).to be_valid
end

it "rejects reserved identifiers case-insensitively" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a nice catch! 🎇

adapter: OpenProject::ActsAsUrl::Adapter::OpActiveRecord # use a custom adapter able to handle edge cases

### Validators for the legacy underscored identifier format (e.g. "project_one")
validates :identifier,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: In order to make all this simpler, how about we also centralize all legacy slug validations into an identifier_slug_format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow wym 👀

length: { maximum: SEMANTIC_IDENTIFIER_MAX_LENGTH },
if: ->(p) { p.identifier_changed? && p.identifier.present? && Setting::WorkPackageIdentifier.alphanumeric? }
# Validators for the alphanumeric identifier format (e.g. "PROJ1")
validate :identifier_alphanumeric_format,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we already switch to semantic across the code? (We can still discuss the name further, but for now we should at least remain consistent.)

Of course this is probably better via a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we haven't found naming scheme that would sit 100% well, but I would definitely at least get rid of "numeric/alphanumeric", since the "numeric" part does not really click within the context of this file.

Copy link
Member Author

@akabiru akabiru Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the "numeric/alphanumeric" wording to be consistent with Setting::WorkPackageIdentifier.{numeric?,alphanumeric?}- which should make it easier to rename once we decide/adopt new names. I definitely prefer to keep the renaming isolated into a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should try to be consistent somehow. But renaming the setting probably requires a database migration 😅 so maybe it's easier if we settle on numeric / alphanumeric even if I don't like it as much 🥲


already_existing = FriendlyId::Slug
.where(slug: identifier, sluggable_type: self.class.to_s)
.where("LOWER(slug) = LOWER(?)", identifier)
Copy link
Contributor

@thykel thykel Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 The reason for us leveraging that friendly_id history feature was that we will be able to resolve historical URLs.

If I am not mistaken, this lowercase conversion is addressing a case that shouldn't ever happen with links generated by OpenProject. Therefore, we're really only adding tolerance for typos in the URL. This sets an odd precedent, since the non-historical URLs do require proper casing

Unless there is a specific scenario we want to support, I suggest to drop ths.

Copy link
Contributor

@judithroth judithroth Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a validation. So this code is never involved when we are resolving values from URLs. This validation will only prevent clashes when an identifier is added / changed.

Consider the following scenario:
Project A with historic identifier "proj" (current identifier "sthelse")
Project B trying to set identifier "PROJ" should not work

just out of fear that it could be misleading for people if /projects/proj resolves to Project A but /projects/PROJ to Project B.

.select(:id, :name, :identifier)
.limit(DISPLAY_COUNT)
.to_a
analysis = ProblematicIdentifiers.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 7295e82 to 2bf5740 Compare March 23, 2026 15:21
akabiru added 2 commits March 23, 2026 18:53
Adds declarative specs for case-insensitive uniqueness, max length
validation, and the LOWER(identifier) unique database index.
PostgreSQL enforces uniqueness constraints at index build time — when
CREATE INDEX encounters duplicate key values, the entire operation
fails. For concurrent index builds (algorithm: :concurrently), this is
particularly problematic: a failed build leaves behind an "invalid"
index entry in pg_class that is ignored for queries but still incurs
write overhead on every INSERT/UPDATE, and must be explicitly dropped
before retrying.

Since this migration introduces a unique index on LOWER(identifier),
any pre-existing case collisions (e.g. "MyProject" and "myproject")
would cause the index build to fail. This adds a deduplication step
that resolves collisions before the index is created: the oldest
project (lowest id) keeps its identifier unchanged while newer
duplicates receive a "-N" suffix via a window function partitioned
over LOWER(identifier).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants