Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 44 additions & 5 deletions core/drivers/zynqmp_huk.c
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,11 @@
#include <drivers/zynqmp_huk.h>
#include <drivers/zynqmp_pm.h>
#include <io.h>
#include <kernel/delay.h>
#include <kernel/tee_common_otp.h>
#include <mm/core_memprot.h>
#include <string_ext.h>
#include <tee/cache.h>
#include <tee/tee_cryp_utl.h>
#include <trace.h>
#include <utee_defines.h>
Expand Down Expand Up @@ -114,6 +116,7 @@ TEE_Result tee_otp_get_hw_unique_key(struct tee_hw_unique_key *hwkey)
uint8_t dst[ZYNQMP_CSU_AES_DST_LEN(sizeof(src))]
__aligned_csuaes = { 0 };
TEE_Result ret = TEE_ERROR_GENERIC;
uint32_t retry_counter = 0;
uint32_t status = 0;

static_assert(sizeof(device_dna) == ZYNQMP_GCM_IV_SIZE);
Expand Down Expand Up @@ -201,11 +204,47 @@ TEE_Result tee_otp_get_hw_unique_key(struct tee_hw_unique_key *hwkey)
goto cleanup;
}

if (memcmp(src, sha, sizeof(sha))) {
EMSG("PUF not ready, can't create HUK");
ret = TEE_ERROR_GENERIC;
goto cleanup;
}
/*
* This retry loop here is a workaround for hardware issue where
* CSU-DMA transfer is not yet delivered to DDR memory (in variable
* 'src').
*
* CSU-AES/CSU-DMA code is implemented in a way that it waits for DONE
* signal which indicates that memory transfer has left the DMA engine.
*/
retry_counter = 0;
do {
if (memcmp(src, sha, sizeof(sha)) == 0)
break;

/*
* If the transfer is not completed within 1 second then there
* is a real failure.
*/
if (retry_counter >= 1000) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: move this to while() condition and check for errors via checking the retry counter afterwards. Does not need to be applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK to change the construct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please could you extract the workaround section into a workaround function? makes it easier to understand

EMSG("PUF not ready, can't create HUK");
ret = TEE_ERROR_GENERIC;
goto cleanup;
}

/*
* Give CPU's internal fifos some time to transfer data to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the CSUDMA transfer to RAM that is incorrectly signaled as completed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is black box more or less for us so we are somewhat guessing in here :|

Current theory is that there are several internal FIFOs between the domains and those "splits" the bus ack so that the sender (CSUDMA) thinks that they are complete. And now some internal timing on larger chip (or so) has changed so then the delay is so slightly larger for result to arrive in DDR.

In our testing we only got one round i.e. 1 ms enough but for went for larger retry.

That being said -- other operations also for the CSUDMA are subjects for the same issue. Would really like to hear from AMD/Xilinx folks what is the right thing to do in here.

We had similar issue with FSBL where we decided to disable CSUDMA usage altogether for "memcpy" operation optimization. This happened also in smaller chips.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. we have also observed similar issue when transferring data from FPGA to DDR that some data might just take a bit to transfer after we get IRQ so we had to implement some waiting logic and magic & counter checks to check that we know that data has finally arrived as complete before staring to work with that. This we concluded as normal operation. This could be the similar FIFO issue in both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prepared a different kind of change that still needs a testing in secured hardware -- but more or less I decided that 1 ms should be all enough for internal FIFO's to work their way -- and I relocated the fixup in CSU DMA driver:

diff --git a/core/drivers/zynqmp_csudma.c b/core/drivers/zynqmp_csudma.c
index 4e3b59c9c..e20a2798e 100644
--- a/core/drivers/zynqmp_csudma.c
+++ b/core/drivers/zynqmp_csudma.c
@@ -74,6 +74,28 @@ TEE_Result zynqmp_csudma_sync(enum zynqmp_csudma_channel channel)
                status = io_read32(dma + CSUDMA_I_STS_OFFSET);
                if ((status & CSUDMA_IXR_DONE_MASK) == CSUDMA_IXR_DONE_MASK) {
                        csudma_clear_intr(channel, CSUDMA_IXR_DONE_MASK);
+
+                       if (channel == ZYNQMP_CSUDMA_DST_CHANNEL) {
+                               /*
+                               * This retry extra delay here is a workaround for hardware
+                               * issue where CSU-DMA transfer is not yet delivered to DDR
+                               * memory.
+                               *
+                               * Even thou there is now a signal that transfer has left the
+                               * DMA engine there are still other CPU's internal FIFOs that
+                               * needs some time to transfer data to the final destination.
+                               *
+                               * 1 millisecond delay should be more than enough to transfer
+                               * the remaining data.
+                               *
+                               * Technically this should only be needed on DDR destination but
+                               * as operations in OP-TEE are few, the extra delay for crypto
+                               * destination should not cause much an issue. "Guarantee" that
+                               * the data is in destination before returning to caller is the
+                               * driver here.
+                               */
+                               mdelay(1);
+                       }
                        return TEE_SUCCESS;
                }
        }

What do you think -- should we actually panic() if HUK generation fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it makes sense. As long as the CSUDMA is not used excessively on the OP-TEE side this should not impact performance. I agree that getting some feedback from Xilinx/AMD would be great here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that alternative change did not work on the hardware tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based the csudma driver on https://github.com/Xilinx/embeddedsw/tree/master/XilinxProcessorIPLib/drivers/csudma . Could you check if there is any fix for the issue you are reporting there ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can check -- their CSU-DMA memcpy() has been broken for some releases -- we have a patch to workaround that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that alternative change did not work on the hardware tests.

Will retry this with cache line fix that had insufficient amount of bytes for size.

* DDR memory.
*/
mdelay(1);

/*
* Invalidate src buffer from caches as memcmp above has
* already loaded data from DDR to cache so that next compare
* will be with fresh data.
*/
ret = cache_operation(TEE_CACHEINVALIDATE, src, sizeof(src));
if (ret) {
EMSG("Failed to invalidate cache");
goto cleanup;
}
retry_counter++;
} while (1);

IMSG("HUK ready");

Expand Down
Loading