Description
Hello everyone,
I have a relatively old project in which I'm upgrading Doctrine, in particular DBAL 3 to DBAL 4.
In my specific case, I'm getting rid of the old (removed) logger:
$config = $this->manager->getConnection()->getConfiguration();
$config->setSQLLogger(null); // removed in dbal4
Doctrine has opened up its interfaces for initializing the connection and the entity manager, so I created my own factory:
<?php

declare(strict_types=1);

namespace App\Doctrine\ORM;

use App\Doctrine\DBAL\Connection;
use App\Doctrine\DBAL\ConnectionFactory;
use App\Doctrine\DBAL\DbalArrayLogger;
use Doctrine\ORM\Decorator\EntityManagerDecorator;
use Doctrine\ORM\EntityManager;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\Persistence\ManagerRegistry;

class EntityManagerFactory
{
    public function __construct(
        private readonly ManagerRegistry $registry,
        private readonly ConnectionFactory $connectionFactory
    ) {
    }

    public function getManager(): EntityManagerInterface
    {
        /** @var EntityManagerInterface $em */
        $em = $this->registry->getManager();

        return $em;
    }

    public function create(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null,
        ?Connection $connection = null
    ): EntityManagerInterface {
        $em = $this->receiveEm($registryOrEm);
        $finalConnection = $connection ?: $em->getConnection();
        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return new EntityManager($finalConnection, $config, $eventManager);
    }

    /**
     * @param ManagerRegistry|EntityManagerInterface|null $registryOrEm
     * @return array{0:EntityManager,1:DbalArrayLogger}
     * @throws \Doctrine\DBAL\Exception
     * @throws \Doctrine\ORM\Exception\MissingMappingDriverImplementation
     */
    public function createWithLogger(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null
    ): array {
        $em = $this->receiveEm($registryOrEm);
        /** @var Connection $connection */
        $connection = $em->getConnection();
        [$newConnection, $logger] = $this->connectionFactory->createWithLogger($connection);
        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return [
            new EntityManager($newConnection, $config, $eventManager),
            $logger,
        ];
    }

    public function createWithoutLogger(
        ManagerRegistry|EntityManagerInterface|null $registryOrEm = null
    ): EntityManagerInterface {
        $em = $this->receiveEm($registryOrEm);
        /** @var Connection $connection */
        $connection = $em->getConnection();
        $newConnection = $this->connectionFactory->createWithoutLogger($connection);
        $config = $em->getConfiguration();
        $eventManager = $em->getEventManager();

        return new EntityManager($newConnection, $config, $eventManager);
    }

    private function receiveEm(ManagerRegistry|EntityManagerInterface|null $registryOrEm): EntityManagerInterface
    {
        if ($registryOrEm instanceof EntityManagerInterface) {
            return $registryOrEm;
        }

        if ($registryOrEm instanceof ManagerRegistry) {
            /** @var EntityManagerInterface $em */
            $em = $registryOrEm->getManager();

            return $em;
        }

        /** @var EntityManagerInterface $em */
        $em = $this->registry->getManager();

        return $em;
    }
}
Okay, no problem, everything works fine. But while refactoring my code, I ran into some very interesting behavior.
Let's say you have a command that works with a large amount of data: for example, it takes batches of 10_000 records from a database and processes them somehow. Let's assume there is a lot of data (millions of records).
#[AsCommand(name: 'app:test')]
class CopyOldMessageLogDataToClickHouseCommand extends Command
{
    private EntityManagerInterface $em;

    public function __construct(
        private readonly EntityManagerFactory $entityManagerFactory,
        // Injected so that $this->logger used in execute() is defined.
        private readonly LoggerInterface $logger,
        ?string $name = null
    ) {
        $this->em = $this->entityManagerFactory->createWithoutLogger();
        parent::__construct($name);
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        foreach ($this->getData() as $items) {
            // TODO: some processing for items $items
            $this->em->flush();
            $this->em->clear();
            $this->logger->debug(
                sprintf('[%s]: Batch copied', self::class),
                [
                    'memoryUsage' => round(memory_get_usage(true) / 1048576, 2) . ' MB',
                    'emSize' => $this->em->getUnitOfWork()->size(),
                ],
            );
        }

        return self::SUCCESS;
    }

    public function getData(): \Generator
    {
        /** @var SomeEntityRepository $repo */
        $repo = $this->em->getRepository(SomeEntity::class);
        $qb = $repo
            ->createQueryBuilder('e')
        ...
    }
}
Next, I run the command. Processing starts and everything seems to work, but memory usage grows very quickly (it literally doubles on each iteration).
I checked the unit of work, and entities are accumulating in it. But how is that possible after $this->em->clear()?
The thing is that when we get a repository:
$repo = $this->em->getRepository(SomeEntity::class);
ContainerRepositoryFactory will simply take it from the container, without taking into account the new entity manager:
https://github.com/doctrine/DoctrineBundle/blob/3.0.x/src/Repository/ContainerRepositoryFactory.php#L56
Thus, in our command we clear one entity manager, while the data accumulates in another (the default one from the container).
This is very non-obvious behavior. When you call:
$this->em->getRepository(SomeEntity::class);
you end up working with a repository that is bound to a different entity manager than the one you called it on.
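To make the mechanism easier to see, here is a minimal toy model of it. None of these classes come from Doctrine or DoctrineBundle: ToyEntityManager, ToyRepository and ToyRepositoryFactory are hypothetical stand-ins, and the only assumption carried over from the behavior described above is that a repository registered as a service stays bound to the manager it was built with, whichever manager asks for it.

```php
<?php

declare(strict_types=1);

// Toy stand-ins only: every class name here is hypothetical, not a Doctrine API.

final class ToyEntityManager
{
    /** @var list<object> entities currently tracked by this manager */
    private array $managed = [];

    public function __construct(private readonly ToyRepositoryFactory $factory)
    {
    }

    public function track(object $entity): void
    {
        $this->managed[] = $entity;
    }

    public function clear(): void
    {
        $this->managed = [];
    }

    public function size(): int
    {
        return count($this->managed);
    }

    public function getRepository(string $entityClass): ToyRepository
    {
        // The manager passes itself to the factory when asking for a repository.
        return $this->factory->getRepository($this, $entityClass);
    }
}

final class ToyRepository
{
    public function __construct(private readonly ToyEntityManager $em)
    {
    }

    /** Loads $count entities; they are tracked by the repository's own manager. */
    public function loadBatch(int $count): void
    {
        for ($i = 0; $i < $count; $i++) {
            $this->em->track(new stdClass());
        }
    }
}

final class ToyRepositoryFactory
{
    /** @var array<string, ToyRepository> repositories pre-built as "container services" */
    private array $services = [];

    public function register(string $entityClass, ToyRepository $repository): void
    {
        $this->services[$entityClass] = $repository;
    }

    public function getRepository(ToyEntityManager $em, string $entityClass): ToyRepository
    {
        // When a service exists, the requesting manager ($em) is ignored.
        return $this->services[$entityClass] ?? new ToyRepository($em);
    }
}

$factory = new ToyRepositoryFactory();
$defaultEm = new ToyEntityManager($factory);
$factory->register('SomeEntity', new ToyRepository($defaultEm));

// A second manager, like the one the custom factory creates for the command.
$customEm = new ToyEntityManager($factory);

for ($batch = 0; $batch < 3; $batch++) {
    $customEm->getRepository('SomeEntity')->loadBatch(10);
    $customEm->clear(); // clears the custom manager only
}

printf("custom: %d, default: %d\n", $customEm->size(), $defaultEm->size());
// prints "custom: 0, default: 30"
```

In this sketch, clear() on the custom manager always reports zero tracked entities, while the default manager keeps growing with every batch, which matches the symptom seen in the command.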
The problem is further aggravated by the fact that when receiving a repository from the factory:
$repository = $this->container->get($customRepositoryName);
we have no public interface for replacing its entity manager.