Skip to content

Baseline for Drupal metadataΒ #838

@Niklan

Description

@Niklan

This was initially mentioned in #818, but since it closed with the PR automatically and I can't reopen it, I'll duplicate my proposal/question.

Problem/motivation

Currently, some Drupal-specific information is collected through parsing the codebase or is simply hardcoded. This can actually lead to unwanted problems and potentially create bugs in the code instead of finding them.

A good example of this is services. The extension simply hardcodes some of them (like synthetic ones) that are defined in Drupal runtime and parses *.services.yml files to collect information about the others. This opens a whole new world of bugs and uncertainty.

Here are several examples that come to mind:

  1. The extension treats all *.services.yml files as given. This means that even services from disabled modules are considered valid. If you reference a service from a disabled module in your code, PHPStan will not detect the problem, because it assumes the service is available.

  2. It cannot detect services registered during service container compilation (in runtime). Anything added through a Compiler Pass is considered non-existent in this case.

  3. Similar to the previous point, it cannot detect services that are removed or altered from the container for any reason during a Compiler Pass.

  4. It only supports specific features for services, such as decorators and aliases. The problem is that some of these features are now available through PHP attributes, which introduces a new issue. Consider this example, which will not be processed properly:

     Drupal\foo\Bar: {}
    #[AsDecorator(decorates: 'baz_service', priority: -100)]
    final readonly class Bar {
    
      public function __construct(
        #[AutowireDecorated]
        private Baz $inner,
      ) {}
    
    }
  5. Or a new Drupal hook system based on PHP attributes. The entire src/Hook directory is essentially registered as services. You could technically parse the whole folder and achieve the same result, but why? Every time something similar happens in Drupal, it will require emulating that in the extension.

To support all these cases, the extension would essentially have to start emulating all this functionality on its own to understand what's going on. However, some parts are simply unrealistic to emulate, such as a Compiler Pass, which can significantly alter the container based on other factors like the database or specific environment variables. It's simply unrealistic to support all this functionality through emulation, as the amount of work required would be insane.

Proposed solution

This is nothing new in the PHP world: PHPStorm does this (phpstorm-meta), PHPStan does this (baseline). We can build a baseline/environment-knowledge/drupal-meta (it doesn't matter what to call it). The main idea is to create a command for the extension that will dump all the necessary information from the runtime directly.

Let's take services as an example. This command can simply load the service container from the Drupal database and then dump it in a PHP file with an array inside (exactly as it done by Chi-teck/drupal-code-generator), where all the necessary information for the extension will be stored. Such as the service name, the assigned class, whether it's a decorator, and everything like that.

Yes, it would require some manual work, and yes, at least for the first steps, it should be optional. But it solves basically all the problems. You don't need a properly bootstrapped Drupal for that in CI, because it will be in the repository as a phpstan-drupal-baseline.php, for example. It is much easier to parse, it contains all the relevant information about the project. It must be much easier to write rules for such a file than it is now, when you have to write a lot of other parsers, finders, etc.

If this approach works well, it can be extended to other subsystems, such as entity types. With dumped data, it can be properly checked in multiple cases (since entities can be altered, but the extension can't detect it).

Theoretically, this approach has great potential regarding fields. If the extension knows exactly which entity has which fields of which types, whether they are optional, what entity reference type they have, what FieldItemInterface the field type is using, and everything like that, it will likely detect many new problems in the codebases. Not to mention, it will allow writing much simpler code without worrying about phpstan-ignore-next-line simply because it doesn't understand what ->first() means for a specific field of a specific entity type.

With proper metadata, it would be possible to understand what ->first()->get('value')->getValue() means (DataType API). Currently, this forces developers to write a huge wall of code with asserts or ignore the call completely, but it could contain bugs!

For example: $node->get('field_tags')->first()->get('entity')->getValue(). Even if the field is required, it doesn't mean that the referenced tag is still present. The getValue() should return ?TermInterface in that case (which will highlight problems if there is no check for entity in the code), but it's currently impossible to explain this to PHPStan.

Having detailed metadata would allow PHPStan to correctly interpret such complex chains of method calls and provide accurate type information at each step, significantly improving code quality and reducing potential bugs.

This method could significantly improve the analysis capabilities and reduce the need for manual exceptions in the code.


This is just a proposal and some thoughts, not a call to action. It feels like the current approach is quite limited and, in fact, much more expensive to maintain and extend.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions