MemECC - misaligned memory access masks out ECC bits #2341

@billcau

Description

I'm doing some formal verification work on the ibex RTL code base and came across what looks like a real issue. When MemECC is enabled, misaligned or sub-word memory accesses, such as byte and halfword stores, can result in wrongly computed ECC bits being stored. Here are the details:

  • ibex_load_store_unit keeps the normal byte-enable logic even when MemECC=1 (src/rtl/ibex_load_store_unit.sv), but the ECC encoder prim_secded_inv_39_32_enc always emits parity for the entire 32-bit word (same file). During a byte/half-word store, only the selected bytes are updated in memory, while the parity bits are recomputed for the whole register operand. The stored ECC therefore no longer matches the actual word contents, so ECC is effectively masked out whenever partial writes occur.

  • Root cause and impact: ECC parity is derived from the RS2 operand irrespective of data_be, so any store narrower than a word leaves memory with mismatched parity bits. Future loads from that word will trigger false ECC errors, and conversely legitimate ECC faults may be masked because parity no longer reflects the real contents. The only safe options seem to be:
    (a) force all stores to become full-word writes when MemECC=1 (e.g., gate data_be to 4'b1111 and require software to issue word stores) or
    (b) implement a read‑modify‑write path that merges the existing word with the new byte/half-word before recomputing ECC.
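A minimal behavioral sketch of the two options, in Python rather than RTL. Everything here is hypothetical and for illustration only: `toy_ecc` is a stand-in 7-bit check function, not the real prim_secded_inv_39_32_enc matrix, and `force_full_word_be`, `merge_lanes` and `rmw_store` are made-up helper names, not anything in the Ibex sources.

```python
def toy_ecc(word: int) -> int:
    """Toy 7-bit check value over a 32-bit word.

    Assumption: a stand-in for illustration, NOT the real Ibex encoder.
    Bits 0..5 hold the XOR of (bit position + 1) over all set bits,
    bit 6 holds the overall parity of the word.
    """
    check = 0
    for i in range(32):
        if (word >> i) & 1:
            check ^= (i + 1)  # position code, 1..32, fits in 6 bits
            check ^= 0x40     # toggle overall parity in bit 6
    return check & 0x7F


def force_full_word_be(be: int, mem_ecc: bool) -> int:
    """Option (a): with ECC enabled, every store becomes a full-word write."""
    return 0b1111 if mem_ecc else (be & 0b1111)


def merge_lanes(old_word: int, wdata: int, be: int) -> int:
    """Merge the byte lanes selected by `be` from `wdata` into `old_word`."""
    merged = old_word & 0xFFFFFFFF
    for lane in range(4):
        if (be >> lane) & 1:
            mask = 0xFF << (8 * lane)
            merged = (merged & ~mask) | (wdata & mask)
    return merged


def rmw_store(old_word: int, wdata: int, be: int) -> tuple[int, int]:
    """Option (b): read-modify-write, then encode the merged word."""
    merged = merge_lanes(old_word, wdata, be)
    return merged, toy_ecc(merged)


# Byte store of 0xAA into lane 0 of a word that previously held 0x12345678.
merged, ecc = rmw_store(0x12345678, 0x000000AA, 0b0001)
```

With option (b) the check bits are always derived from the word that actually ends up in memory, at the cost of an extra read per partial store. Option (a) avoids the read but pushes the restriction onto software.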

On reads, the decoder just checks whether the 32-bit data word and its 7 parity bits still agree with the SECDED codeword that was written earlier. Those parity bits are created when the store happens. In ibex, the ECC encoder always takes the entire 32-bit write data (the RS2 operand after byte-lane shuffling) and regenerates a fresh 39-bit word, regardless of data_be. When the LSU issues a sub-word write (byte/half-word), only the selected lanes make it through to the memory array, but the seven ECC bits just produced are still written in full. The untouched lanes keep their old data, so the 32 data bits present in memory no longer match the new parity bits. The next load from that word sees an "uncorrectable ECC error" even though nothing was corrupted between write and read: the mismatch was baked in by the partial store. Conversely, real corruption could be hidden if software keeps rewriting bytes.

The ECC bits are stored unconditionally. On a store, data_wdata_core carries both the 32-bit data and the 7 ECC bits to the top level (src/rtl/ibex_top.sv). The byte lanes go through data_be, so only the selected bytes are written into the 4×8b data array (src/rtl/ibex_load_store_unit.sv). But the ECC path never sees data_be: prim_secded_inv_39_32_enc always encodes the entire 32-bit data_wdata word and produces seven parity bits (src/rtl/ibex_load_store_unit.sv), and those parity bits are written in full every time (src/rtl/ibex_top.sv). So after a byte store you end up with three old bytes plus one new byte in the data array, but the ECC array contains the parity bits corresponding to "all four bytes = the new RS2 values." Unless the untouched bytes happen to match the corresponding bytes of the write data, the next read will generally fail the ECC check.
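The behavior described above can be sketched with a small Python model of a single memory word. This is only an illustration under stated assumptions: `toy_ecc` is a hypothetical 7-bit check function standing in for prim_secded_inv_39_32_enc (it does not reproduce the real code matrix), and `EccMemWord` is a made-up class, not a model of the actual memory wrapper.

```python
def toy_ecc(word: int) -> int:
    """Toy 7-bit check value (assumption: NOT the real Ibex SECDED encoder)."""
    check = 0
    for i in range(32):
        if (word >> i) & 1:
            check ^= (i + 1)  # position code in bits 0..5
            check ^= 0x40     # overall parity in bit 6
    return check & 0x7F


class EccMemWord:
    """One 32-bit memory word plus 7 check bits (hypothetical model)."""

    def __init__(self, data: int = 0):
        self.data = data & 0xFFFFFFFF
        self.ecc = toy_ecc(self.data)

    def store(self, wdata: int, be: int) -> None:
        # As described in the issue: check bits are computed over the
        # full write data and never see the byte enables.
        new_ecc = toy_ecc(wdata & 0xFFFFFFFF)
        # Only the enabled byte lanes reach the data array.
        for lane in range(4):
            if (be >> lane) & 1:
                mask = 0xFF << (8 * lane)
                self.data = (self.data & ~mask) | (wdata & mask)
        # The check bits are written in full on every store.
        self.ecc = new_ecc

    def check_ok(self) -> bool:
        """True if the stored check bits match the stored data."""
        return toy_ecc(self.data) == self.ecc


word = EccMemWord(0x12345678)
ok_before = word.check_ok()          # consistent after initialization
word.store(0x000000AA, 0b0001)       # byte store into lane 0
ok_after_byte_store = word.check_ok()
word.store(0xCAFEF00D, 0b1111)       # full-word store
ok_after_word_store = word.check_ok()
```

In this model the byte store leaves 0x123456AA in the data array but check bits computed for 0x000000AA, so the consistency check fails until a full-word store overwrites all four lanes.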

I would have expected an issue like this to be caught long ago, so what I encountered here may be due to a misconfiguration or outdated sources on my side. I would love to hear your feedback about this.

cheers,
--bill

Observed Behavior

Expected Behavior

Steps to reproduce the issue

My Environment

EDA tool and version:

Operating system:

Version of the Ibex source code:
