Skip to content

Migrate to Index editor API#12357

Draft
greg0ire wants to merge 1 commit intodoctrine:3.7.xfrom
greg0ire:fw-compat-dbal-5
Draft

Migrate to Index editor API#12357
greg0ire wants to merge 1 commit intodoctrine:3.7.xfrom
greg0ire:fw-compat-dbal-5

Conversation

@greg0ire
Copy link
Member

This was not caught before because PHPStan does not report an error when we use an internal method from the same root namespace.

This was not caught before because PHPStan does not report an error when
we use an internal method from the same root namespace.
assert($colName !== '');

return new IndexedColumn(
UnqualifiedName::unquoted($colName),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if I should use this or quoted here.

Copy link
Member

@morozov morozov Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the value of $colName. It could be first_name (unquoted) or "first_name" (quoted, the quotes are part of the value). If the ORM accepts names with quotes in the string form and wants to pass them to the DBAL as objects, it needs to parse them. See the implementation of Index#parseColumns() as a BC layer in DBAL 4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that getIndexColumns resorts to a "quote strategy", and that the default implementation supports both ways:

return isset($class->fieldMappings[$fieldName]->quoted)
? $platform->quoteSingleIdentifier($class->fieldMappings[$fieldName]->columnName)
: $class->fieldMappings[$fieldName]->columnName;

Let me parse them then.

Copy link
Member Author

@greg0ire greg0ire Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on… Parsers::getUnqualifiedNameParser() is internal. Is it OK to use it?

And should we deprecate the need for parsing, that is, disallow quoting at the ORM metadata level? Since DBAL is going to quote everything, if I understood correctly.

Copy link
Member

@morozov morozov Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "quoted" is a bit overloaded, even in the SQL standard. This article and the one it references in the beginning should answer all the questions on quoting in the DBAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on… Parsers::getUnqualifiedNameParser() is internal. Is it OK to use it?

I guess, yes, by the ORM. The plan is to migrate all DBAL APIs to names as objects – at that point parsing will be no longer necessary, but this is a lengthy process which will likely span major releases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that getIndexColumns resorts to a "quote strategy", and that the default implementation supports both ways [...]

Let me parse them then.

Well, maybe you don't need to. Let's at least try to explore this possibility.

In the DefaultQuoteStrategy, the actual value of the column name is taken from one property and whether it needs to be quoted is taken from another. This is exactly the structure and the separation of concerns used in the new DBAL API.

In the AnsiQuoteStrategy, the "quoted" property is ignored, so the name effectively will be put into the SQL as is. Um... that's where you have to parse the name.

So yeah, in the worst case scenario you need to parse. Not parsing in the default case would be solely a runtime optimization, it won't allow to save the engineering effort.


return new IndexedColumn(
UnqualifiedName::unquoted($colName),
null,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether I should specify a length here.

Copy link
Member

@morozov morozov Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ORM supports $indexData['options']['lengths'], then it needs to parse them and pass the length of each column with the column definition (the BC layer for this is also implemented in Index#parseColumns()).

Copy link
Member Author

@greg0ire greg0ire Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is supported, I have been able to specify a negative length in a test and see a deprecation. Then I wondered how this was even possible given I did not write code to parse the length. I dug deeper and found that even though we are creating an Index instance, we don't pass it to a DBAL API. Instead, we… extract columns from it, and pass options alongside it a few lines below:

$table->addUniqueIndex(self::getIndexedColumns($uniqIndex), is_numeric($indexName) ? null : $indexName, $indexData['options'] ?? []);

So… parsing options is useless? 😐

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's worse than that. Table::addUniqueIndex() isn't yet deprecated formally (because there's no full upgrade path) but it shouldn't be used since it mutates the table object. One of the goals of the DBAL API rework is to make the schema objects immutable.

Instead of calling Table::addUniqueIndex(), you need to:

  1. Create the index
  2. Edit the table
  3. Add the index via the editor
  4. Create a new instance of the table.

Note that the index constructor is internal, and its signature will change in 5.0 (use the editor instead).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is created like this:

$table = $schema->createTable($this->quoteStrategy->getTableName($class, $this->platform));

So now that I have a new instance created in 4., I need to 5. Replace the previous instance with the new one in $schema. Is that even possible?

Copy link
Member

@morozov morozov Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one of the reasons why the old mutable API is not yet fully deprecated. Yes, we would replace the table in the schema in the same way as we would perform any modifications in the table itself, but there's no API for that.

The Schema#createTable() method itself as part of the DBAL API doesn't make sense to me personally. It adds a table with no columns leaving the schema and the table in an invalid state. Ideally, the ORM (as well as any DBAL API consumer) should build the schema at once w/o mutations: create the unique index or constraint definitions for all tables, create all tables and then create a schema with all those tables.

You can edit tables but you cannot edit the schema (yet), so you at least need to pre-create all the tables and then create the schema.

$editor->setColumns(...$columns);
$editor->setType(IndexType::UNIQUE);
if (isset($indexData['options']['predicate'])) {
$editor->setPredicate($indexData['options']['predicate']);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other options that I should be handling here? And does predicate actually need to be handled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other options that I should be handling here?

I don't think so. The Index constructor in DBAL 4 is the source of truth about supported options.

And does predicate actually need to be handled?

To take a step back, why does the ORM translate "unique constraints" to "unique indexes"? Speaking of indexes, I don't think a unique index can have a predicate (i.e. this value must be unique if it matches this and this predicate).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the ORM translate "unique constraints" to "unique indexes"

Because it can span several columns I suppose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A constraint can also span several columns. I think the real reason is that support for unique constraints was added later than in was supported by the ORM. Or just because they are functionally very close (the difference is primarily in semantics).

Copy link
Member Author

@greg0ire greg0ire Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ashamed to say I just did not know about unique constraints other than as implemented with keywords added to a single column definition. Or maybe I did but I forgot 😅

Or just because they are functionally very close (the difference is primarily in semantics).

Oh? Do constraints help with performance? If not I'd rather say that would be the primary difference.

Copy link
Member

@morozov morozov Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: semantically, unique constraints are about intention (I want the values in these columns to be unique), unique indexes are about the physical implementation (build me a B-tree or hash and make sure all values are unique and use it for the queries where appropriate).

For example (Postgres):

PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table. The index [...] is the mechanism that enforces the constraint.

IIRC, for MySQL the two are just synonyms (you can create one and then drop the other).

Do constraints help with performance?

Not the constraints themselves, but the underlying unique indexes, as a side effect, definitely. But if you want to optimize queries, it's better if you create an index because you know which queries you want to optimize and what type of index to create. If you only want to enforce uniqueness, you create a unique constraint.

Relying on unique constraints for performance is similar to using transient dependencies in your code w/o declaring the dependency explicitly. It works, but may have its issues.

@greg0ire greg0ire requested review from derrabus and morozov January 26, 2026 22:13
@morozov
Copy link
Member

morozov commented Jan 27, 2026

I believe that a purely mechanical translation of the existing ORM API to the DBAL 5 API may be painful. The DBAL 5 is very strict in terms of types and capabilities while the ORM API is as loose as the DBAL 4 API.

If the requirement is to allow loose ORM API on top of the strict DBAL API, then the solution will be to duplicate the whole BC layer from DBAL 4 to the ORM. This BC layer could be thought of as a reference implementation.

@greg0ire
Copy link
Member Author

If the requirement is to allow loose ORM API on top of the strict DBAL API, then the solution will be to duplicate the whole BC layer from DBAL 4 to the ORM. This BC layer could be thought of as a reference implementation.

ORM may become stricter in 4.0.x, but on 3.7.x I think it's going to need to stay as loose as possible. Unless we decide to do conditional breaking changes 🤔

In any case, I need to take a step back to understand when those indices come from (probably some metadata), and how loose the current API is. Thanks for the review!

@greg0ire greg0ire marked this pull request as draft January 27, 2026 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants