Skip to content

MemECC - possible spurious store-response alerts #2342

@billcau

Description

@billcau

As I'm continuing some formal verification work looking into the ECC functionality in Ibex, I hit another issue here with my property below:

  // Property: no_store_ecc_alert
  // Intent: Store acknowledgements must never raise the store integrity error output because the
  //          ECC decoder only inspects read data.
  // Source: ibex_load_store_unit.sv lines 520-555 (analysis of MemECC handling).
  // Timing: Same-cycle implication on the LSU clk domain (posedge clk_i, active-low reset).
  // Status: intended_to_pass (BUG_WITH_ASSERTION once it fails).
  generate
    if (MemECC) begin : gen_store_ecc_assert
      property no_store_ecc_alert;
        @(posedge clk_i) disable iff (!rst_ni)
          (data_rvalid_i && data_we_q) |-> !store_resp_intg_err_o;
      endproperty
      ast_no_store_ecc_alert: assert property (no_store_ecc_alert);
    end
  endgenerate

Here are some thoughts/questions I was thinking and the detailed analysis:

  • Store acknowledgement marker: In ibex_load_store_unit, the FSM registers the operation direction in data_we_q (latched when the request issues). When the bus returns data_rvalid_i, the LSU still has data_we_q telling whether that outstanding access was a store (1 means store). See data_we_q assignments/uses in src/rtl/ibex_load_store_unit.sv.
  • “Arbitrary” data on store responses: The external bus returns a 32b payload even for writes. That payload is don't-care (typically the previous bus contents or zero), but because the LSU feeds data_rdata_i—including its 7 ECC bits—into prim_secded_inv_39_32_dec, any random combination of parity bits yields a non-zero syndrome (data_intg_err=1). With data_we_q=1, the logic sets store_resp_intg_err_o = data_intg_err & data_rvalid_i & data_we_q, so every store response can assert the integrity flag regardless of actual memory contents.
  • Why the alert exists even though stores can’t be checked: The design seems to implement symmetric reporting of “integrity errors on responses”, but it overlooked that the only integrity checker instantiated comes from the read-data path. There is no encoder/decoder on the returning data for stores—only the outgoing write data goes through prim_secded_inv_39_32_enc. Therefore, store_resp_intg_err_o has no real meaning; it simply mirrors the load-side data_intg_err. Because ibex_core ORs this flag into alert_major_bus_o (src/rtl/ibex_core.sv), the spurious ECC flag propagates up as an alert even though no ECC verification was performed for the store.
  • data_rvalid_i does toggle for stores in the design. The LSU is treating the incoming data_rvalid_i as “response complete” for both loads and stores (look at l_su_resp_valid_o = (ls_fsm_cs == IDLE) & data_rvalid_i & …_ in src/rtl/ibex_load_store_unit.sv). So a compliant bus will raise data_rvalid_i to acknowledge a write, even though it doesn’t return meaningful data. My formal run left the bus unconstrained, but the failure it found is exactly how the RTL runs: on a store response, data_rvalid_i=1, data_we_q=1 (latched direction), and the payload bits are don’t-care. The ECC decoder still chews on those 39 bits (prim_secded_inv_39_32_dec). Any junk parity/data combination gives a non-zero syndrome → data_intg_err=1. Because the code simply does store_resp_intg_err_o = data_intg_err & data_rvalid_i & data_we_q, the store acknowledgement propagates that bogus error all the way to alert_major_bus_o. So the bug isn’t a formal-only artifact—it’s the natural hardware behavior unless the bus HW “kindly” echoes a valid ECC word for every store, which it doesn’t by spec.

Looking at the actual datapath:

  • Read side only: The only ECC checker in the LSU is prim_secded_inv_39_32_dec u_data_intg_dec at src/rtl/ibex_load_store_unit.sv. Its input is data_rdata_i (bus read data). The comments immediately above it say “Read data integrity check … data_intg_err = |ecc_err;”. That’s the sole source of data_intg_err.
  • Write side: When MemECC is enabled, the write path instantiates prim_secded_inv_39_32_enc at src/rtl/ibex_load_store_unit.sv to generate parity bits for data_wdata_o. There is no decoder on returning store data; the protocol doesn’t send meaningful data back for writes.
  • Alert wiring: store_resp_intg_err_o is literally data_intg_err & data_rvalid_i & data_we_q (src/rtl/ibex_load_store_unit.sv). Since data_intg_err comes only from the read decoder, tying it to a write response has no physical basis. The intent is spelled out in the comment block just above: “Integrity errors are their own category… The data_intg_err signal is generated combinatorially from the incoming data_rdata_i.” No mention of a store checker because there isn’t one. So the design itself dictates “ECC errors are valid only during reads”: the checker exists only on read data, and the documentation around it explicitly says it uses data_rdata_i (read path) to generate errors. The store alert output simply reuses that read-only flag, which is why it seems bogus.

So what use is store_resp_intg_err_o? if the data_intg_err is only trigger from reads, then what possibly can the store_resp_intg_err_o mean?

• It effectively looks to be meaningless. store_resp_intg_err_o is meant to flag an integrity failure on a store response, but the circuit never actually checks store data— data_intg_err is computed exclusively from the read-return bus (prim_secded_inv_39_32_dec on data_rdata_i). When a write completes the bus still raises data_rvalid_i, but the returned “data” is don’t-care; feeding that junk through the read decoder is what produces the false alert. So store_resp_intg_err_o is just data_intg_err & data_we_q, i.e. “whatever the read checker said, but only when the outstanding transaction happened to be a store.” That’s why the assertion fails and why the top-level alert OR tree propagates bogus store faults: the signal doesn’t represent any real store-side ECC check currently.

The possible fix might either to drop store_resp_intg_err_o (tie it low) or to implement a genuine write-data return check—but the current RTL clearly intends only the read path to validate ECC.

Observed Behavior

Expected Behavior

Steps to reproduce the issue

My Environment

EDA tool and version:

Operating system:

Version of the Ibex source code:

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions