Skip to content

I2C Tx DMA race condition causes no Tx-complete IRQ #98

@george-palmsens

Description

@george-palmsens

Describe the set-up
STM32H753 in custom product, compiled with arm-none-eabi-gcc 14.3 rel1

Describe the bug

if (dmaxferstatus == HAL_OK)
{
/* Send Slave Address */
/* Set NBYTES to write and reload if hi2c->XferCount > MAX_NBYTE_SIZE and generate RESTART */
I2C_TransferConfig(hi2c, DevAddress, (uint8_t)(hi2c->XferSize + 1U),
xfermode, I2C_GENERATE_START_WRITE);
/* Update XferCount value */
hi2c->XferCount -= hi2c->XferSize;
/* Process Unlocked */
__HAL_UNLOCK(hi2c);
/* Note : The I2C interrupts must be enabled after unlocking current process
to avoid the risk of I2C interrupt handle execution before current
process unlock */
/* Enable ERR and NACK interrupts */
I2C_Enable_IRQ(hi2c, I2C_XFER_ERROR_IT);
/* Enable DMA Request */
hi2c->Instance->CR1 |= I2C_CR1_TXDMAEN;
}

The modification to CR1 on line 2075 races with the IRQ that is enabled on 2072.

Even though the TX DMA has not been set up, the transfer has. That means that the peripheral will begin the addressing phase, which may lead to a NACK interrupt.

If that NACK interrupt occurs between the read and write of the |= operation on 2075, then line 2075 ends up reverting the interrupt enable bits configured by the IRQ handler.

In practice, this means that the NACK and ERR interrupts are reenabled, and the STOP interrupt is cleared. This is then a critical failure because a NACK has already occurred, and the HAL needs to see a STOP in order to call the Tx complete callback.

In the end, this leads to a situation where the Tx complete callback is never called and you have to abort the HAL to get back to a working state.

How To Reproduce

This is heavily time dependant. This use case is with -Og optimisation, a 210MHz sysclk, and a 52.5MHz APB clock.

This issue can be most easily exposed by adding a delay.

Change the line:

hi2c->Instance->CR1 |= I2C_CR1_TXDMAEN;

To:

uint32_t const stored = hi2c->Instance->CR1;
HAL_Delay(10);
if (hi2c->Instance->CR1 != stored) {
  __BKPT();
}
hi2c->Instance->CR1 |= I2C_CR1_TXDMAEN;

Then perform an I2C transmit to an address that is not present on the bus, and therefore immediately generates a NACK.

If the code were race free, the transformation above would never fail. In this case it will fail reliably.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workinghalHAL-LL driver-related issue or pull-request.i2cInter-Integrated Circuit interface

Type

Projects

Status

To do

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions