The-OpenROAD-Project · maliberty · Nov 11, 2025 · Oct 27, 2025 · Oct 27, 2025 · Nov 7, 2025
diff --git a/docs/user/FlowVariables.md b/docs/user/FlowVariables.md
@@ -242,9 +242,11 @@ configuration file.
 | <a name="SYNTH_HDL_FRONTEND"></a>SYNTH_HDL_FRONTEND| Select an alternative language frontend to ingest the design. Available option is "slang". If the variable is empty, design is read with the Yosys read_verilog command.| |
 | <a name="SYNTH_HIERARCHICAL"></a>SYNTH_HIERARCHICAL| Enable to Synthesis hierarchically, otherwise considered flat synthesis.| 0|
 | <a name="SYNTH_HIER_SEPARATOR"></a>SYNTH_HIER_SEPARATOR| Separator used for the synthesis flatten stage.| .|
+| <a name="SYNTH_KEEP_MOCKED_MEMORIES"></a>SYNTH_KEEP_MOCKED_MEMORIES| Keep the mocked memories as separate modules (do not flatten them). This preserves some of the access logic complexity and avoids optimizations outside of the mocked memory.| 1|
 | <a name="SYNTH_KEEP_MODULES"></a>SYNTH_KEEP_MODULES| Mark modules to keep from getting removed in flattening.| |
-| <a name="SYNTH_MEMORY_MAX_BITS"></a>SYNTH_MEMORY_MAX_BITS| Maximum number of bits for memory synthesis.| 4096|
+| <a name="SYNTH_MEMORY_MAX_BITS"></a>SYNTH_MEMORY_MAX_BITS| Maximum number of bits for memory synthesis. Ideally, real RAM or realistic fakeram should be used for RAMs much larger than 1024 bits. To temporarily ignore the RAM concerns and investigate other aspects of the design, consider setting `SYNTH_MOCK_LARGE_MEMORIES=1`, or adjusting `SYNTH_MEMORY_MAX_BITS`.| 4096|
 | <a name="SYNTH_MINIMUM_KEEP_SIZE"></a>SYNTH_MINIMUM_KEEP_SIZE| For hierarchical synthesis, we keep modules of larger area than given by this variable and flatten smaller modules. The area unit used is the size of a basic nand2 gate from the platform's standard cell library. The default value is platform specific.| 0|
+| <a name="SYNTH_MOCK_LARGE_MEMORIES"></a>SYNTH_MOCK_LARGE_MEMORIES| Reduce Yosys inferred memories larger than SYNTH_MEMORY_MAX_BITS to 1 row. Yosys will generally infer memories from behavioral Verilog code, whether the memories are in standalone modules or instantiated within some larger module. fakeram and empty Verilog modules (blackboxes) for memories are not inferred as memories by Yosys and are therefore not affected by this variable. This is a convenient way to separate the concern of instantiating and placing memories from investigating other issues with a design, though it comes at the expense of the increased accuracy that using realistic fakeram would provide. Memories with a single row will of course have unrealistically good timing and area characteristics, but timing paths will still correctly terminate in a register. Memories with many ports, typically register files, will still retain a lot of the port logic, which can be useful for investigating issues. This can be especially useful during development of designs where the behavioral model comes first and suitable memories are matched up when the design RTL is stable. A typical use case would be Chisel, which generates a behavioral model for a memory with the required clocks, ports, etc., in addition to a machine-readable file with the specification of the memories that is used to [automatically](https://chipyard.readthedocs.io/en/stable/Tools/Barstools.html/) match up suitable memory macros later in the flow. During an architectural screening study, a large range of memory configurations can be investigated quickly with this option, without getting bogged down in the concern of how to realize the memories in silicon for ephemeral RTL configurations that exist only long enough to run through the ORFS flow to create a table of some characteristics of a design configuration.| 0|
 | <a name="SYNTH_NETLIST_FILES"></a>SYNTH_NETLIST_FILES| Skips synthesis and uses the supplied netlist files. If the netlist files contains duplicate modules, which can happen when using hierarchical synthesis on indvidual netlist files and combining here, subsequent modules are silently ignored and only the first module is used.| |
 | <a name="SYNTH_OPT_HIER"></a>SYNTH_OPT_HIER| Optimize constants across hierarchical boundaries.| |
 | <a name="SYNTH_RETIME_MODULES"></a>SYNTH_RETIME_MODULES| *This is an experimental option and may cause adverse effects.* *No effort has been made to check if the retimed RTL is logically equivalent to the non-retimed RTL.* List of modules to apply automatic retiming to. These modules must not get dissolved and as such they should either be the top module or be included in SYNTH_KEEP_MODULES. The main use case is to quickly identify if performance can be improved by manually retiming the input RTL. Retiming will treat module ports like register endpoints/startpoints. The objective function of retiming isn't informed by SDC, even the clock period is ignored. As such, retiming will optimize for best delay at potentially high register number cost. Automatic retiming can produce suboptimal results as its timing model is crude and it doesn't find the optimal distribution of registers on long pipelines. See OR discussion #8080.| |
@@ -281,9 +283,11 @@ configuration file.
 - [SYNTH_GUT](#SYNTH_GUT)
 - [SYNTH_HDL_FRONTEND](#SYNTH_HDL_FRONTEND)
 - [SYNTH_HIERARCHICAL](#SYNTH_HIERARCHICAL)
+- [SYNTH_KEEP_MOCKED_MEMORIES](#SYNTH_KEEP_MOCKED_MEMORIES)
 - [SYNTH_KEEP_MODULES](#SYNTH_KEEP_MODULES)
 - [SYNTH_MEMORY_MAX_BITS](#SYNTH_MEMORY_MAX_BITS)
 - [SYNTH_MINIMUM_KEEP_SIZE](#SYNTH_MINIMUM_KEEP_SIZE)
+- [SYNTH_MOCK_LARGE_MEMORIES](#SYNTH_MOCK_LARGE_MEMORIES)
 - [SYNTH_NETLIST_FILES](#SYNTH_NETLIST_FILES)
 - [SYNTH_OPT_HIER](#SYNTH_OPT_HIER)
 - [SYNTH_RETIME_MODULES](#SYNTH_RETIME_MODULES)

diff --git a/docs/user/LargeDesigns.md b/docs/user/LargeDesigns.md
@@ -0,0 +1,26 @@
+# Tips on building large designs
+
+Large designs can quickly result in unmanageable turnaround times for tweaking and fixing if the design contains behavioral memory models, because these memories are by default translated to flip flops.
+
+ORFS has a `SYNTH_MEMORY_MAX_BITS` variable that limits the size of inferred memories that are translated to flip flops. This avoids doomed synthesis runs that would "run forever"; instead, ORFS errors out early, normally within minutes.
+
+Behavioral models of memories are used in simulation, and FPGA tools often automatically combine hard memory macros with some extra logic to match the behavioral model. OpenROAD does not do such automatic memory inference and matching against real memories or fakemem.
+
+## Doing a screening build
+
+Before deciding how to set up a flow, it is useful to do a "screening build". All we're interested in here is which modules we have and their relative sizes. This can help us identify memories that have not been successfully inferred by Yosys; these manifest as very long synthesis times and appear in the OpenROAD hierarchical view with a large number of instances.
+
+The [minimal build configuration](flow/designs/asap7/minimal/README.md) can be useful for a screening build.
+
+The following options are useful for a screening build; see also [config.mk](flow/designs/asap7/minimal/config.mk):
+
+- `SYNTH_HIERARCHICAL=1` and `SYNTH_MINIMUM_KEEP_SIZE=0`, to see all modules in the hierarchical OpenROAD view.
+- `SYNTH_MEMORY_MAX_BITS=1024`, to set a low threshold initially and get an error with a list of the memories in the design that will need to be dealt with in some way.
+- `SYNTH_MOCK_LARGE_MEMORIES=1`, enabled after first seeing the error report with the memories. This sets the number of rows in memories larger than `SYNTH_MEMORY_MAX_BITS` to 1, so that synthesis will complete.
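+
+As a rough sketch, the options above could be collected in a design's `config.mk` as follows. Only the variables named in the list are used; where the fragment lives and the two-pass workflow in the comments are assumptions for illustration, not a prescribed setup:
+
+```make
+# Screening-build settings (a sketch; adapt to your own config.mk).
+# Keep every module visible in the hierarchical OpenROAD view.
+export SYNTH_HIERARCHICAL = 1
+export SYNTH_MINIMUM_KEEP_SIZE = 0
+
+# Low threshold: the first run is expected to stop early with an
+# error listing the memories larger than this many bits.
+export SYNTH_MEMORY_MAX_BITS = 1024
+
+# Second pass, after reviewing the error report: uncomment to reduce
+# the reported memories to 1 row so that synthesis completes.
+# export SYNTH_MOCK_LARGE_MEMORIES = 1
+```
+
+Keeping the mock setting commented out at first makes the error report the deliberate first result of the screening build.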
+
+## Next steps on memories
+
+- If you're taping out, write some wrapper Verilog for real memories.
+- fakemem can be a good option if available for your PDK. fakemem also needs manually written Verilog wrappers, just like real memories.
+- For architectural exploration, `SYNTH_MOCK_LARGE_MEMORIES=1` could give you adequate timing accuracy and is convenient.
diff --git a/flow/designs/sky130hd/microwatt/README.md b/flow/designs/sky130hd/microwatt/README.md
@@ -0,0 +1,190 @@
+# Mocking vs fake memories
+
+Configuring fake memories for this design would speed up the ORFS flow and increase accuracy of results, but some effort is required. Two methods for learning something about a design without setting up SRAM are explained here.
+
+## Synthesis with large flip flop memories
+
+By default `SYNTH_MEMORY_MAX_BITS=42000`, since no fake memories have been configured for this example design. This is simple, but results in slow builds, an unrealistically large number of flip flops, and significantly slower timing than fakeram or real RAMs.
+
+![RAMs as flops histogram](histogram-sram-as-flops.png)
+
+## Results with `SYNTH_MOCK_LARGE_MEMORIES=1`
+
+To ensure a quick synthesis run and to better understand the design without being slowed down by large memory blocks, we set `SYNTH_MEMORY_MAX_BITS=1024`. This helps us bypass potential memory-related issues and focus on other ORFS flow issues of the design.
+
+During synthesis, certain modules are reported in error messages when `SYNTH_MEMORY_MAX_BITS=1024` and `SYNTH_MOCK_LARGE_MEMORIES=0`. By explicitly listing these modules in `SYNTH_KEEP_MODULES`, we avoid further optimizations outside of the mocked memories that could obscure the behavior of the rest of the design.
+
+The goal of these settings is to enable a rapid exploration of the flow, providing insights into the design while minimizing complications from large memory structures.
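+
+A minimal sketch of these settings as a `config.mk` fragment. The module names are hypothetical placeholders (not taken from this design), and the fragment assumes that `SYNTH_KEEP_MODULES` accepts a space-separated list of module names:
+
+```make
+# Mock inferred memories larger than 1024 bits down to a single row.
+export SYNTH_MEMORY_MAX_BITS = 1024
+export SYNTH_MOCK_LARGE_MEMORIES = 1
+
+# Hypothetical module names: replace them with the modules reported in
+# the error messages of a run with SYNTH_MOCK_LARGE_MEMORIES=0.
+export SYNTH_KEEP_MODULES = example_cache_ram example_register_file
+```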
+
+![Mocked memory histogram](histogram-mock-memory.png)
+
+## Other ways to speed up synthesis
+
+There is a small synthesis-time advantage for `SYNTH_MOCK_LARGE_MEMORIES=1`: in a test on a laptop, it shaved ca. 1 minute off a 3 minute build. However, larger designs can have synthesis runs that take hours if memories are not managed with a bit of care.
+
+For synthesis, yosys-abc actually takes most of the time, and SRAMs don't generally change in a design, even if other RTL development continues. It is possible to keep the synthesized netlists for SRAMs, list them in `SYNTH_BLACKBOXES`, and simply concatenate the already built netlists onto the `1_synth.v` files before continuing the flow.
+
+If large modules that change rarely are kept in a large design and only a small part of the design changes during RTL development, then it is possible to set up a build flow that completes in minutes instead of hours.
+
+## A/B run times
+
+The difference in run-times for mocking and simply instantiating larger flip flop based RAMs is not large on this design, but on designs with bigger SRAMs, the difference can be substantial.
+
+    make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk SYNTH_MOCK_LARGE_MEMORIES=1 FLOW_VARIANT=mock
+    make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk
+
+| Step                      | Mock RAM (s) | Default (s) |
+|---------------------------|--------------|-------------|
+| 1_1_yosys_canonicalize    | 4          | 4         |
+| 1_2_yosys                 | 161        | 182       |
+| 1_3_synth                 |            | 1         |
+| 2_1_floorplan             | 72         | 83        |
+| 2_2_floorplan_macro       | 16         | 16        |
+| 2_3_floorplan_tapcell     | 1          | 0         |
+| 2_4_floorplan_pdn         | 7          | 9         |
+| 3_1_place_gp_skip_io      | 43         | 45        |
+| 3_2_place_iop             | 1          | 1         |
+| 3_3_place_gp              | 331        | 327       |
+| 3_4_place_resized         | 68         | 65        |
+| 3_5_place_dp              | 79         | 74        |
+| 4_1_cts                   | 152        | 180       |
+| 5_1_grt                   | 385        | 404       |
+| 5_2_route                 | 3827       | 3960      |
+
+## `SYNTH_MOCK_LARGE_MEMORIES=1` worst ext_clk path
+
+```
+Startpoint: soc0/processor/icache_0/rams:1.way/cache_ram_0
+            (rising edge-triggered flip-flop clocked by ext_clk)
+Endpoint: soc0/processor/icache_0/_163_[147]$_DFFE_PP_
+          (rising edge-triggered flip-flop clocked by ext_clk)
+Path Group: ext_clk
+Path Type: max
+
+  Delay    Time   Description
+---------------------------------------------------------
+   0.00    0.00   clock ext_clk (rise edge)
+   4.07    4.07   clock network delay (propagated)
+   0.00    4.07 ^ soc0/processor/icache_0/rams:1.way/cache_ram_0/CLK (RAM32_1RW1R)
+  11.44   15.51 v soc0/processor/icache_0/rams:1.way/cache_ram_0/Do1[31] (RAM32_1RW1R)
+   0.62   16.14 v soc0/processor/icache_0/rams:1.way/_43_/X (sky130_fd_sc_hd__mux2_4)
+   0.42   16.56 v soc0/processor/icache_0/_2550_/X (sky130_fd_sc_hd__mux2_4)
+   0.19   16.75 v place24125/X (sky130_fd_sc_hd__buf_12)
+   0.15   16.90 v soc0/processor/decode1_0/_2318_/Y (sky130_fd_sc_hd__nand2b_4)
+   0.38   17.27 v soc0/processor/decode1_0/_2375_/X (sky130_fd_sc_hd__or3_4)
+   0.14   17.41 ^ soc0/processor/decode1_0/_2737_/Y (sky130_fd_sc_hd__nor2_4)
+   0.07   17.48 v soc0/processor/decode1_0/_2738_/Y (sky130_fd_sc_hd__inv_2)
+   0.19   17.67 ^ soc0/processor/decode1_0/_2740_/Y (sky130_fd_sc_hd__a21oi_4)
+   0.24   17.91 v soc0/processor/decode1_0/_3744_/Y (sky130_fd_sc_hd__nand4b_1)
+   0.16   18.06 ^ soc0/processor/decode1_0/_4248_/Y (sky130_fd_sc_hd__nand2b_1)
+   0.16   18.22 ^ soc0/processor/_318_/X (sky130_fd_sc_hd__or2_4)
+   0.05   18.27 v soc0/processor/icache_0/_2130_/Y (sky130_fd_sc_hd__nor2_4)
+   0.09   18.36 ^ soc0/processor/icache_0/_2131_/Y (sky130_fd_sc_hd__nand2_4)
+   0.05   18.41 v soc0/processor/icache_0/_2132_/Y (sky130_fd_sc_hd__a211oi_4)
+   0.19   18.60 v rebuffer29423/X (sky130_fd_sc_hd__buf_12)
+   0.11   18.72 ^ soc0/processor/icache_0/_2133_/Y (sky130_fd_sc_hd__nand2_8)
+   0.06   18.78 v soc0/processor/icache_0/_2155_/Y (sky130_fd_sc_hd__inv_12)
+   0.14   18.92 v place19388/X (sky130_fd_sc_hd__buf_12)
+   0.14   19.06 v place19392/X (sky130_fd_sc_hd__buf_12)
+   0.14   19.19 v place19394/X (sky130_fd_sc_hd__buf_12)
+   0.16   19.35 v soc0/processor/icache_0/_2494_/X (sky130_fd_sc_hd__and2_4)
+   0.00   19.35 v soc0/processor/icache_0/_163_[147]$_DFFE_PP_/D (sky130_fd_sc_hd__edfxtp_1)
+          19.35   data arrival time
+
+  15.00   15.00   clock ext_clk (rise edge)
+   3.42   18.42   clock network delay (propagated)
+  -0.25   18.17   clock uncertainty
+   0.14   18.31   clock reconvergence pessimism
+          18.31 ^ soc0/processor/icache_0/_163_[147]$_DFFE_PP_/CLK (sky130_fd_sc_hd__edfxtp_1)
+  -0.24   18.07   library setup time
+          18.07   data required time
+---------------------------------------------------------
+          18.07   data required time
+         -19.35   data arrival time
+---------------------------------------------------------
+          -1.27   slack (VIOLATED)
+```
+
+## `SYNTH_MOCK_LARGE_MEMORIES=0` worst ext_clk path
+
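+As can be seen, there's no significant difference in the worst negative slack path for ext_clk: both paths start at the same `RAM32_1RW1R` macro, with a slack of -2.13 here versus -1.27 with `SYNTH_MOCK_LARGE_MEMORIES=1`.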
+
+```
+Startpoint: soc0/processor/icache_0/rams:1.way/cache_ram_0
+            (rising edge-triggered flip-flop clocked by ext_clk)
+Endpoint: soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_
+          (rising edge-triggered flip-flop clocked by ext_clk)
+Path Group: ext_clk
+Path Type: max
+
+  Delay    Time   Description
+---------------------------------------------------------
+   0.00    0.00   clock ext_clk (rise edge)
+   4.04    4.04   clock network delay (propagated)
+   0.00    4.04 ^ soc0/processor/icache_0/rams:1.way/cache_ram_0/CLK (RAM32_1RW1R)
+  11.44   15.48 v soc0/processor/icache_0/rams:1.way/cache_ram_0/Do1[59] (RAM32_1RW1R)
+   0.64   16.12 v soc0/processor/icache_0/rams:1.way/_76_/X (sky130_fd_sc_hd__mux2_4)
+   0.36   16.48 v soc0/processor/icache_0/_2544_/X (sky130_fd_sc_hd__mux2_4)
+   0.15   16.64 v place27067/X (sky130_fd_sc_hd__buf_6)
+   0.06   16.70 ^ soc0/processor/decode1_0/_3560_/Y (sky130_fd_sc_hd__inv_4)
+   0.16   16.85 ^ soc0/processor/decode1_0/_6875_/COUT (sky130_fd_sc_hd__ha_4)
+   0.07   16.92 v soc0/processor/decode1_0/_3695_/Y (sky130_fd_sc_hd__nand2b_4)
+   0.39   17.31 ^ soc0/processor/decode1_0/_3696_/Y (sky130_fd_sc_hd__nor3_4)
+   0.20   17.51 ^ place24130/X (sky130_fd_sc_hd__buf_6)
+   0.06   17.57 v soc0/processor/decode1_0/_5317_/Y (sky130_fd_sc_hd__nand2_4)
+   0.30   17.87 ^ soc0/processor/decode1_0/_5318_/Y (sky130_fd_sc_hd__a21oi_4)
+   0.19   18.06 ^ place23148/X (sky130_fd_sc_hd__buf_6)
+   0.22   18.28 v soc0/processor/decode1_0/_6350_/Y (sky130_fd_sc_hd__nand4b_1)
+   0.29   18.57 v place22875/X (sky130_fd_sc_hd__buf_6)
+   0.10   18.67 ^ soc0/processor/decode1_0/_6854_/Y (sky130_fd_sc_hd__nand2b_4)
+   0.15   18.82 ^ soc0/processor/_318_/X (sky130_fd_sc_hd__or2_4)
+   0.12   18.94 ^ place22433/X (sky130_fd_sc_hd__buf_12)
+   0.05   18.99 v soc0/processor/icache_0/_2130_/Y (sky130_fd_sc_hd__nor2_4)
+   0.16   19.15 v place22148/X (sky130_fd_sc_hd__buf_6)
+   0.08   19.23 ^ soc0/processor/icache_0/_2131_/Y (sky130_fd_sc_hd__nand2_4)
+   0.07   19.30 v soc0/processor/icache_0/_2132_/Y (sky130_fd_sc_hd__a211oi_4)
+   0.18   19.48 v place21617/X (sky130_fd_sc_hd__buf_12)
+   0.11   19.59 ^ soc0/processor/icache_0/_2133_/Y (sky130_fd_sc_hd__nand2_8)
+   0.06   19.65 v soc0/processor/icache_0/_2155_/Y (sky130_fd_sc_hd__inv_8)
+   0.15   19.80 v place21391/X (sky130_fd_sc_hd__buf_12)
+   0.13   19.93 v place21403/X (sky130_fd_sc_hd__buf_12)
+   0.14   20.07 v rebuffer32771/X (sky130_fd_sc_hd__buf_4)
+   0.19   20.26 ^ soc0/processor/icache_0/_2285_/Y (sky130_fd_sc_hd__mux2i_1)
+   0.15   20.40 ^ place21348/X (sky130_fd_sc_hd__buf_4)
+   0.04   20.44 v soc0/processor/icache_0/_2286_/Y (sky130_fd_sc_hd__nor2_1)
+   0.00   20.44 v soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_/D (sky130_fd_sc_hd__dfxtp_1)
+          20.44   data arrival time
+
+  15.00   15.00   clock ext_clk (rise edge)
+   3.53   18.53   clock network delay (propagated)
+  -0.25   18.28   clock uncertainty
+   0.14   18.42   clock reconvergence pessimism
+          18.42 ^ soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_/CLK (sky130_fd_sc_hd__dfxtp_1)
+  -0.11   18.31   library setup time
+          18.31   data required time
+---------------------------------------------------------
+          18.31   data required time
+         -20.44   data arrival time
+---------------------------------------------------------
+          -2.13   slack (VIOLATED)
+```
+
+## Histogram of mocked memories
+
+It can be useful to look at the Endpoint Slack Histogram of mocked memories to examine whether some of the paths are overly optimistic with a single-row mocked memory, or whether the paths through the memory have problems even with a single-row memory:
+
+Create a path group for the memories in question:
+
+    group_path -through *decode1_0* -name mocked
+
+The path group now appears in the dropdown in the Endpoint Slack Histogram:
+
+![Mocked memory Endpoint Slack Histogram](mocked-histogram.png)
+
+## Conclusion
+
+Above, there's no visible difference in the Endpoint Slack histogram for the two approaches. In other words, the design doesn't appear to be terribly sensitive to how RAMs are mocked; other factors dominate and merit further investigation.
+
+ORFS is built on `make`, which shines for simple, fast flows. For larger, more complicated designs with flows that take a long time to run, it is worth looking beyond `make` to [bazel-orfs](https://github.com/The-OpenROAD-Project/bazel-orfs).
+
+
diff --git a/flow/designs/sky130hd/microwatt/config.mk b/flow/designs/sky130hd/microwatt/config.mk
@@ -36,7 +36,15 @@ export SETUP_SLACK_MARGIN = 0.2
 # GRT non-default config
 export FASTROUTE_TCL = $(DESIGN_HOME)/$(PLATFORM)/$(DESIGN_NICKNAME)/fastroute.tcl
 
-# This is high, some SRAMs should probably be converted
-# to real SRAMs and not instantiated as flops
-export SYNTH_MEMORY_MAX_BITS = 42000
-
+ifeq ($(SYNTH_MOCK_LARGE_MEMORIES),1)
+    # ca. 3 minutes to run make synth
+    #
+    # These module names come from the error report when setting SYNTH_MEMORY_MAX_BITS=2048
+    # and SYNTH_MOCK_LARGE_MEMORIES=0
+    #
+    # The goal is to run through the flow quickly to learn what we can
+    # about the design without getting bogged down in memory issues.
+    export SYNTH_MEMORY_MAX_BITS ?= 1024
+else
+    export SYNTH_MEMORY_MAX_BITS ?= 42000
+endif
diff --git a/flow/designs/sky130hd/microwatt/histogram-mock-memory.png b/flow/designs/sky130hd/microwatt/histogram-mock-memory.png
diff --git a/flow/designs/sky130hd/microwatt/histogram-sram-as-flops.png b/flow/designs/sky130hd/microwatt/histogram-sram-as-flops.png
diff --git a/flow/designs/sky130hd/microwatt/mocked-histogram.png b/flow/designs/sky130hd/microwatt/mocked-histogram.png