Getting a repository from a specific entity manager #2131

@nepster-web

Hello everyone,

I have a relatively old project in which I'm upgrading Doctrine, in particular DBAL 3 to DBAL 4.

In my specific case, I'm getting rid of the old (removed) logger:

$config = $this->manager->getConnection()->getConfiguration();
$config->setSQLLogger(null); // removed in dbal4
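
In DBAL 4, logging is attached through driver middlewares rather than `setSQLLogger()`: `Configuration::getMiddlewares()` and `Configuration::setMiddlewares()` expose the list, and `Doctrine\DBAL\Logging\Middleware` is the PSR-3 logging middleware. A minimal sketch of dropping selected middlewares from such a list follows. The helper name `withoutMiddlewares()` is hypothetical, and which middleware classes are actually registered in a given project is an assumption to verify.

```php
<?php

/**
 * Hypothetical helper: returns the given middlewares minus every instance
 * of the excluded classes. Plain PHP, no Doctrine dependency.
 *
 * @param list<object>       $middlewares
 * @param list<class-string> $excluded
 *
 * @return list<object>
 */
function withoutMiddlewares(array $middlewares, array $excluded): array
{
    return array_values(array_filter(
        $middlewares,
        static function (object $middleware) use ($excluded): bool {
            foreach ($excluded as $class) {
                // instanceof accepts a class name held in a string
                if ($middleware instanceof $class) {
                    return false;
                }
            }

            return true;
        },
    ));
}
```

Assuming `$config` is a copy of the configuration that will be used to build a new connection, the call could look like `$config->setMiddlewares(withoutMiddlewares($config->getMiddlewares(), [\Doctrine\DBAL\Logging\Middleware::class]));`. Middlewares wrap the driver when the connection is created, so changing the list on the configuration of an already created connection is not expected to affect that connection.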

Doctrine has opened up its interfaces for initializing the connection and the entity manager, so I created my own factory:

<?php

declare(strict_types=1);

namespace App\Doctrine\ORM;

use App\Doctrine\DBAL\Connection;
use App\Doctrine\DBAL\ConnectionFactory;
use App\Doctrine\DBAL\DbalArrayLogger;
use Doctrine\ORM\Decorator\EntityManagerDecorator;
use Doctrine\ORM\EntityManager;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\Persistence\ManagerRegistry;

class EntityManagerFactory
{
    public function __construct(
        private readonly ManagerRegistry $registry,
        private readonly ConnectionFactory $connectionFactory
    ) {
    }

    public function getManager(): EntityManagerInterface
    {
        /** @var EntityManagerInterface $em */
        $em = $this->registry->getManager();

        return $em;
    }

    public function create(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null,
        ?Connection $connection = null
    ): EntityManagerInterface {
        $em = $this->receiveEm($registryOrEm);

        $finalConnection = $connection ?: $em->getConnection();

        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return new EntityManager($finalConnection, $config, $eventManager);
    }

    /**
     * @param ManagerRegistry|EntityManagerInterface|null $registryOrEm
     * @return array{0:EntityManager,1:DbalArrayLogger}
     * @throws \Doctrine\DBAL\Exception
     * @throws \Doctrine\ORM\Exception\MissingMappingDriverImplementation
     */
    public function createWithLogger(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null
    ): array {
        $em = $this->receiveEm($registryOrEm);

        /** @var Connection $connection */
        $connection = $em->getConnection();

        [$newConnection, $logger] = $this->connectionFactory->createWithLogger($connection);

        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return [
            new EntityManager($newConnection, $config, $eventManager),
            $logger,
        ];
    }

    public function createWithoutLogger(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null
    ): EntityManagerInterface {
        $em = $this->receiveEm($registryOrEm);

        /** @var Connection $connection */
        $connection = $em->getConnection();

        $newConnection = $this->connectionFactory->createWithoutLogger($connection);

        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return new EntityManager($newConnection, $config, $eventManager);
    }

    private function receiveEm(ManagerRegistry|EntityManagerInterface|null $registryOrEm): EntityManagerInterface
    {
        if ($registryOrEm instanceof EntityManagerInterface) {
            return $registryOrEm;
        }

        if ($registryOrEm instanceof ManagerRegistry) {
            /** @var EntityManagerInterface $em */
            $em = $registryOrEm->getManager();

            return $em;
        }

        /** @var EntityManagerInterface $em */
        $em = $this->registry->getManager();

        return $em;
    }
}

Everything works fine so far. While refactoring my code, however, I found some very interesting behavior.

Let's say you have a command that works with a large amount of data, for example fetching batches of 10_000 records from a database and processing them. Let's assume there is a lot of data (millions of records).

#[AsCommand(name: 'app:test')]
class CopyOldMessageLogDataToClickHouseCommand extends Command
{

    private EntityManagerInterface $em;

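    public function __construct(
        private readonly EntityManagerFactory $entityManagerFactory,
        private readonly LoggerInterface $logger, // Psr\Log\LoggerInterface, used in execute()
        ?string $name = null
    ) {
        // Dedicated entity manager whose connection has no logger attached.
        $this->em = $this->entityManagerFactory->createWithoutLogger();

        parent::__construct($name);
    }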
  
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        foreach ($this->getData() as $items) {

            // TODO: some processing for items $items
    
            $this->em->flush();
            $this->em->clear();
    
            $this->logger->debug(
                sprintf('[%s]: Batch copied', self::class),
                [
                    'memoryUsage' => round(memory_get_usage(true) / 1048576, 2) . ' MB',
                    'emSize' => $this->em->getUnitOfWork()->size(),
                ],
            );
        }
        
        return self::SUCCESS;
    }

    public function getData(): \Generator
    {
        /** @var SomeEntityRepository $repo */
        $repo = $this->em->getRepository(SomeEntity::class);
        
        $qb = $repo
            ->createQueryBuilder('e')   
        ...
    }
}

Next, I run the command and processing starts. Everything seems to work, but memory leaks very quickly (usage literally doubles on each iteration).
I checked the unit of work and entities accumulate in it, but how is that possible after $this->em->clear()?

The thing is that when we get a repository:

 $repo = $this->em->getRepository(SomeEntity::class);

ContainerRepositoryFactory will simply take it from the container, without taking into account the new entity manager:
https://github.com/doctrine/DoctrineBundle/blob/3.0.x/src/Repository/ContainerRepositoryFactory.php#L56

Thus, the command clears one entity manager while the entities accumulate in another (the default one from the container).
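
The effect can be reproduced without Doctrine at all. The following is a minimal sketch in plain PHP. `FakeEntityManager`, `FakeRepository` and `FakeRepositoryFactory` are hypothetical stand-ins invented for illustration; they model only which manager ends up tracking the loaded objects, not how `ContainerRepositoryFactory` is actually implemented.

```php
<?php

// Hypothetical stand-ins, not Doctrine classes.

final class FakeRepositoryFactory
{
    /** @var array<string, FakeRepository> repositories registered as "services" */
    private array $services = [];

    public function register(string $entityClass, FakeRepository $repository): void
    {
        $this->services[$entityClass] = $repository;
    }

    public function getRepository(FakeEntityManager $em, string $entityClass): FakeRepository
    {
        // A registered service wins and the requesting manager ($em) is ignored.
        return $this->services[$entityClass] ?? new FakeRepository($em);
    }
}

final class FakeEntityManager
{
    /** @var list<object> objects this manager currently tracks */
    private array $managed = [];

    public function __construct(private readonly FakeRepositoryFactory $factory)
    {
    }

    public function getRepository(string $entityClass): FakeRepository
    {
        return $this->factory->getRepository($this, $entityClass);
    }

    public function track(object $entity): void
    {
        $this->managed[] = $entity;
    }

    public function clear(): void
    {
        $this->managed = [];
    }

    public function size(): int
    {
        return count($this->managed);
    }
}

final class FakeRepository
{
    public function __construct(private readonly FakeEntityManager $em)
    {
    }

    /** @return list<object> */
    public function loadBatch(int $count): array
    {
        $batch = [];
        for ($i = 0; $i < $count; $i++) {
            $entity = new stdClass();
            $this->em->track($entity); // tracked by the manager bound at construction
            $batch[] = $entity;
        }

        return $batch;
    }
}

$factory = new FakeRepositoryFactory();
$default = new FakeEntityManager($factory);
$factory->register('SomeEntity', new FakeRepository($default)); // service bound to the default manager
$custom = new FakeEntityManager($factory);

for ($i = 0; $i < 3; $i++) {
    $custom->getRepository('SomeEntity')->loadBatch(2);
    $custom->clear(); // clears the manager that tracked nothing
}

echo $custom->size(), PHP_EOL;  // 0
echo $default->size(), PHP_EOL; // 6
```

The custom manager stays empty on every iteration while the default one keeps growing, which matches the memory pattern seen in the command.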

This is very non-obvious behavior. When you call:

$this->em->getRepository(SomeEntity::class);

you end up working with a repository that is bound to a different entity manager.

The problem is aggravated by the fact that when the factory fetches the repository from the container:

 $repository = $this->container->get($customRepositoryName);

there is no public interface for replacing the entity manager it uses.
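
One possible workaround, sketched under assumptions, is to bypass `getRepository()` and build the query from the specific entity manager, since `EntityManagerInterface::createQueryBuilder()` returns a builder bound to the manager it is called on. The function name `iterateBatches()` is hypothetical, and the sketch assumes the entity has a field named `id` that gives a stable ordering.

```php
<?php

/**
 * Hypothetical helper: yields batches of entities loaded through the given
 * entity manager, so that flush() and clear() on that same manager apply to them.
 *
 * @param class-string $entityClass
 *
 * @return \Generator<int, array<int, object>>
 */
function iterateBatches(
    \Doctrine\ORM\EntityManagerInterface $em,
    string $entityClass,
    int $batchSize = 10_000
): \Generator {
    $offset = 0;

    do {
        $batch = $em->createQueryBuilder()
            ->select('e')
            ->from($entityClass, 'e')
            ->orderBy('e.id', 'ASC') // assumption: the entity has an "id" field
            ->setFirstResult($offset)
            ->setMaxResults($batchSize)
            ->getQuery()
            ->getResult();

        $fetched = count($batch);
        if ($fetched > 0) {
            yield $batch;
        }

        $offset += $fetched;
    } while ($fetched === $batchSize); // a short batch means the data is exhausted
}
```

In the command above, `getData()` could then delegate to `iterateBatches($this->em, SomeEntity::class)`. Custom methods defined on `SomeEntityRepository` are not available this way, so this only covers queries that can be expressed directly with the query builder.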
