Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MemECC - possible spurious store-response alerts #2342

Copy link
Copy link
@billcau

Description

@billcau
Issue body actions

As I'm continuing some formal verification work looking into the ECC functionality in Ibex, I hit another issue here with my property below:

  // Property: no_store_ecc_alert
  // Intent: Store acknowledgements must never raise the store integrity error output because the
  //          ECC decoder only inspects read data.
  // Source: ibex_load_store_unit.sv lines 520-555 (analysis of MemECC handling).
  // Timing: Same-cycle implication on the LSU clk domain (posedge clk_i, active-low reset).
  // Status: intended_to_pass (BUG_WITH_ASSERTION once it fails).
  generate
    if (MemECC) begin : gen_store_ecc_assert
      property no_store_ecc_alert;
        @(posedge clk_i) disable iff (!rst_ni)
          (data_rvalid_i && data_we_q) |-> !store_resp_intg_err_o;
      endproperty
      ast_no_store_ecc_alert: assert property (no_store_ecc_alert);
    end
  endgenerate

Here are some thoughts/questions I was thinking and the detailed analysis:

  • Store acknowledgement marker: In ibex_load_store_unit, the FSM registers the operation direction in data_we_q (latched when the request issues). When the bus returns data_rvalid_i, the LSU still has data_we_q telling whether that outstanding access was a store (1 means store). See data_we_q assignments/uses in src/rtl/ibex_load_store_unit.sv.
  • “Arbitrary” data on store responses: The external bus returns a 32b payload even for writes. That payload is don't-care (typically the previous bus contents or zero), but because the LSU feeds data_rdata_i—including its 7 ECC bits—into prim_secded_inv_39_32_dec, any random combination of parity bits yields a non-zero syndrome (data_intg_err=1). With data_we_q=1, the logic sets store_resp_intg_err_o = data_intg_err & data_rvalid_i & data_we_q, so every store response can assert the integrity flag regardless of actual memory contents.
  • Why the alert exists even though stores can’t be checked: The design seems to implement symmetric reporting of “integrity errors on responses”, but it overlooked that the only integrity checker instantiated comes from the read-data path. There is no encoder/decoder on the returning data for stores—only the outgoing write data goes through prim_secded_inv_39_32_enc. Therefore, store_resp_intg_err_o has no real meaning; it simply mirrors the load-side data_intg_err. Because ibex_core ORs this flag into alert_major_bus_o (src/rtl/ibex_core.sv), the spurious ECC flag propagates up as an alert even though no ECC verification was performed for the store.
  • data_rvalid_i does toggle for stores in the design. The LSU is treating the incoming data_rvalid_i as “response complete” for both loads and stores (look at l_su_resp_valid_o = (ls_fsm_cs == IDLE) & data_rvalid_i & …_ in src/rtl/ibex_load_store_unit.sv). So a compliant bus will raise data_rvalid_i to acknowledge a write, even though it doesn’t return meaningful data. My formal run left the bus unconstrained, but the failure it found is exactly how the RTL runs: on a store response, data_rvalid_i=1, data_we_q=1 (latched direction), and the payload bits are don’t-care. The ECC decoder still chews on those 39 bits (prim_secded_inv_39_32_dec). Any junk parity/data combination gives a non-zero syndrome → data_intg_err=1. Because the code simply does store_resp_intg_err_o = data_intg_err & data_rvalid_i & data_we_q, the store acknowledgement propagates that bogus error all the way to alert_major_bus_o. So the bug isn’t a formal-only artifact—it’s the natural hardware behavior unless the bus HW “kindly” echoes a valid ECC word for every store, which it doesn’t by spec.

Looking at the actual datapath:

  • Read side only: The only ECC checker in the LSU is prim_secded_inv_39_32_dec u_data_intg_dec at src/rtl/ibex_load_store_unit.sv. Its input is data_rdata_i (bus read data). The comments immediately above it say “Read data integrity check … data_intg_err = |ecc_err;”. That’s the sole source of data_intg_err.
  • Write side: When MemECC is enabled, the write path instantiates prim_secded_inv_39_32_enc at src/rtl/ibex_load_store_unit.sv to generate parity bits for data_wdata_o. There is no decoder on returning store data; the protocol doesn’t send meaningful data back for writes.
  • Alert wiring: store_resp_intg_err_o is literally data_intg_err & data_rvalid_i & data_we_q (src/rtl/ibex_load_store_unit.sv). Since data_intg_err comes only from the read decoder, tying it to a write response has no physical basis. The intent is spelled out in the comment block just above: “Integrity errors are their own category… The data_intg_err signal is generated combinatorially from the incoming data_rdata_i.” No mention of a store checker because there isn’t one. So the design itself dictates “ECC errors are valid only during reads”: the checker exists only on read data, and the documentation around it explicitly says it uses data_rdata_i (read path) to generate errors. The store alert output simply reuses that read-only flag, which is why it seems bogus.

So what use is store_resp_intg_err_o? if the data_intg_err is only trigger from reads, then what possibly can the store_resp_intg_err_o mean?

• It effectively looks to be meaningless. store_resp_intg_err_o is meant to flag an integrity failure on a store response, but the circuit never actually checks store data— data_intg_err is computed exclusively from the read-return bus (prim_secded_inv_39_32_dec on data_rdata_i). When a write completes the bus still raises data_rvalid_i, but the returned “data” is don’t-care; feeding that junk through the read decoder is what produces the false alert. So store_resp_intg_err_o is just data_intg_err & data_we_q, i.e. “whatever the read checker said, but only when the outstanding transaction happened to be a store.” That’s why the assertion fails and why the top-level alert OR tree propagates bogus store faults: the signal doesn’t represent any real store-side ECC check currently.

The possible fix might either to drop store_resp_intg_err_o (tie it low) or to implement a genuine write-data return check—but the current RTL clearly intends only the read path to validate ECC.

Observed Behavior

Expected Behavior

Steps to reproduce the issue

My Environment

EDA tool and version:

Operating system:

Version of the Ibex source code:

Reactions are currently unavailable

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Morty Proxy This is a proxified and sanitized view of the page, visit original site.