Commit e511371

synth: mock memories doc and example
Signed-off-by: Øyvind Harboe <[email protected]>
1 parent bf47644 commit e511371

6 files changed, +217 -6 lines changed

docs/user/FlowVariables.md

Lines changed: 2 additions & 2 deletions
@@ -243,9 +243,9 @@ configuration file.
243243
| <a name="SYNTH_HIERARCHICAL"></a>SYNTH_HIERARCHICAL| Enable to synthesize hierarchically; otherwise synthesis is flat.| 0|
244244
| <a name="SYNTH_HIER_SEPARATOR"></a>SYNTH_HIER_SEPARATOR| Separator used for the synthesis flatten stage.| .|
245245
| <a name="SYNTH_KEEP_MODULES"></a>SYNTH_KEEP_MODULES| Mark modules to keep from getting removed in flattening.| |
246-
| <a name="SYNTH_MEMORY_MAX_BITS"></a>SYNTH_MEMORY_MAX_BITS| Maximum number of bits for memory synthesis.| 4096|
246+
| <a name="SYNTH_MEMORY_MAX_BITS"></a>SYNTH_MEMORY_MAX_BITS| Maximum number of bits for memory synthesis. Ideally, real RAM or realistic fakeram should be used for RAMs much larger than 1024 bits. To temporarily ignore the RAM concerns and investigate other aspects of the design, consider setting `SYNTH_MOCK_LARGE_MEMORIES=1`, or adjusting `SYNTH_MEMORY_MAX_BITS`.| 4096|
247247
| <a name="SYNTH_MINIMUM_KEEP_SIZE"></a>SYNTH_MINIMUM_KEEP_SIZE| For hierarchical synthesis, we keep modules of larger area than given by this variable and flatten smaller modules. The area unit used is the size of a basic nand2 gate from the platform's standard cell library. The default value is platform specific.| 0|
248-
| <a name="SYNTH_MOCK_LARGE_MEMORIES"></a>SYNTH_MOCK_LARGE_MEMORIES| Reduce memories larger than SYNTH_MEMORY_MAX_BITS to 1 row. This is useful to separate the concern of instantiating and placing memories from investigating other issues with a design. Memories with a single 1 row will of course have unrealistically good timing and area characteristics, but timing will still correctly terminate in a register. Also, large port memories, typically register files, will still have the retain a lot of the port logic that can be useful to investigate issues.| 0|
248+
| <a name="SYNTH_MOCK_LARGE_MEMORIES"></a>SYNTH_MOCK_LARGE_MEMORIES| Reduce memories larger than SYNTH_MEMORY_MAX_BITS to 1 row. This is a convenient way to separate the concern of instantiating and placing memories from investigating other issues with a design, though it comes at the expense of the increased accuracy that using realistic fakemem would provide. Memories with a single row will of course have unrealistically good timing and area characteristics, but timing will still correctly terminate in a register. Large port memories, typically register files, will still retain a lot of the port logic, which can be useful when investigating issues. Consider using SYNTH_KEEP_MODULES to keep the modules of the mocked memories so that code outside the mocked memories is not optimized as a consequence of mocking a memory, yielding better insight into issues running the rest of the design through the ORFS flow.| 0|
249249
| <a name="SYNTH_NETLIST_FILES"></a>SYNTH_NETLIST_FILES| Skips synthesis and uses the supplied netlist files. If the netlist files contain duplicate modules, which can happen when using hierarchical synthesis on individual netlist files and combining them here, subsequent modules are silently ignored and only the first module is used.| |
250250
| <a name="SYNTH_OPT_HIER"></a>SYNTH_OPT_HIER| Optimize constants across hierarchical boundaries.| |
251251
| <a name="SYNTH_RETIME_MODULES"></a>SYNTH_RETIME_MODULES| *This is an experimental option and may cause adverse effects.* *No effort has been made to check if the retimed RTL is logically equivalent to the non-retimed RTL.* List of modules to apply automatic retiming to. These modules must not get dissolved and as such they should either be the top module or be included in SYNTH_KEEP_MODULES. The main use case is to quickly identify if performance can be improved by manually retiming the input RTL. Retiming will treat module ports like register endpoints/startpoints. The objective function of retiming isn't informed by SDC, even the clock period is ignored. As such, retiming will optimize for best delay at potentially high register number cost. Automatic retiming can produce suboptimal results as its timing model is crude and it doesn't find the optimal distribution of registers on long pipelines. See OR discussion #8080.| |
Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
# Mocking vs fake memories

Configuring fake memories for this design would speed up the ORFS flow and increase the accuracy of results, but some effort is required. Two methods for learning something about a design without setting up SRAM are explained here.

## Synthesis with large flip flop memories

By default `SYNTH_MEMORY_MAX_BITS=42000`, since no fake memories have been configured for this example design. This is simple, but results in slow builds, an unrealistically large number of flip flops, and significantly slower timing than fakeram or real RAMs.

![RAMs as flops histogram](histogram-sram-as-flops.png)

## Results with `SYNTH_MOCK_LARGE_MEMORIES=1`

To ensure a quick synthesis run and to better understand the design without being slowed down by large memory blocks, we set `SYNTH_MEMORY_MAX_BITS=1024`. This helps us bypass potential memory-related issues and focus on other ORFS flow issues in the design.

During synthesis, certain modules are reported in error messages when `SYNTH_MEMORY_MAX_BITS=1024` and `SYNTH_MOCK_LARGE_MEMORIES=0`. By explicitly listing these modules in `SYNTH_KEEP_MODULES`, we avoid further optimizations outside of the mocked memories that could obscure the behavior of the rest of the design.

The goal of these settings is to enable rapid exploration of the flow, providing insights into the design while minimizing complications from large memory structures.

![Mocked memory histogram](histogram-mock-memory.png)

## Other ways to speed up synthesis

There is a small advantage in synthesis time for `SYNTH_MOCK_LARGE_MEMORIES=1`: it shaves off ca. 1 minute of a 3 minute build in a test on a laptop. However, larger designs can have synthesis run into hours if memories are not managed with a bit of care.

For synthesis, yosys-abc actually takes most of the time, and SRAMs don't generally change in a design, even if other RTL development continues. It is possible to keep the synthesized netlists for SRAMs, list them in `SYNTH_BLACKBOXES`, and simply concatenate the already built netlists onto the `1_synth.v` files before continuing the flow.

If large modules that change rarely are kept in a large design and only a small part of the design changes during RTL development, then it is possible to set up a build flow that completes in minutes instead of hours.
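
The concatenation step described above could be scripted along these lines. This is a minimal sketch and not part of ORFS: the function name `append_prebuilt_netlists`, the marker comment, and the directory of prebuilt netlists are all hypothetical, and it assumes the prebuilt modules are listed in `SYNTH_BLACKBOXES` as the text describes.

```shell
#!/bin/sh
# Sketch only: the file layout and helper name are assumptions,
# not ORFS conventions.

# append_prebuilt_netlists SYNTH_NETLIST PREBUILT_DIR
# Appends each *.v file in PREBUILT_DIR to SYNTH_NETLIST exactly once.
# A marker comment records which files were already appended.
append_prebuilt_netlists() {
  apn_target=$1
  apn_dir=$2
  for apn_file in "$apn_dir"/*.v; do
    [ -e "$apn_file" ] || continue   # directory holds no *.v files
    apn_marker="// prebuilt netlist: $(basename "$apn_file")"
    if grep -qxF -e "$apn_marker" "$apn_target"; then
      continue                       # appended on an earlier run
    fi
    # Leading newline guards against a target without a final newline.
    { printf '\n%s\n' "$apn_marker"; cat "$apn_file"; } >> "$apn_target"
  done
}
```

A call would look like `append_prebuilt_netlists path/to/1_synth.v path/to/prebuilt`, where both paths are placeholders. Making the helper safe to run twice matters because a `make` target may be re-run without the synthesis result being regenerated.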

## A/B run times

The difference in run times between mocking and simply instantiating larger flip flop based RAMs is not large on this design, but on designs with bigger SRAMs, the difference can be substantial.

    make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk SYNTH_MOCK_LARGE_MEMORIES=1 FLOW_VARIANT=mock
    make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk
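
To compare the two runs as a whole, the commands above can be wrapped in a small timing helper. The sketch below is an illustration only: `run_timed` is a hypothetical helper, it measures whole-run wall-clock time rather than the per-stage times, and it assumes a `date` that supports `+%s`.

```shell
#!/bin/sh
# run_timed LABEL COMMAND [ARGS...]
# Runs COMMAND, prints its wall-clock time in whole seconds, and returns
# COMMAND's exit status. Hypothetical helper, not part of ORFS.
run_timed() {
  rt_label=$1
  shift
  rt_start=$(date +%s)
  rt_status=0
  "$@" || rt_status=$?
  rt_end=$(date +%s)
  echo "$rt_label: $((rt_end - rt_start)) s (exit status $rt_status)"
  return "$rt_status"
}

# Example, assuming the current directory holds the ORFS Makefile:
#   run_timed mock    make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk SYNTH_MOCK_LARGE_MEMORIES=1 FLOW_VARIANT=mock
#   run_timed default make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk
```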

| Step                   | Mock RAM (s) | Default (s) |
|------------------------|--------------|-------------|
| 1_1_yosys_canonicalize | 4            | 4           |
| 1_2_yosys              | 161          | 182         |
| 1_3_synth              |              | 1           |
| 2_1_floorplan          | 72           | 83          |
| 2_2_floorplan_macro    | 16           | 16          |
| 2_3_floorplan_tapcell  | 1            | 0           |
| 2_4_floorplan_pdn      | 7            | 9           |
| 3_1_place_gp_skip_io   | 43           | 45          |
| 3_2_place_iop          | 1            | 1           |
| 3_3_place_gp           | 331          | 327         |
| 3_4_place_resized      | 68           | 65          |
| 3_5_place_dp           | 79           | 74          |
| 4_1_cts                | 152          | 180         |
| 5_1_grt                | 385          | 404         |
| 5_2_route              | 3827         | 3960        |
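
Summing the two columns gives a rough sense of the overall difference. The sketch below copies the per-step times from the table; the blank `1_3_synth` entry in the mock column is marked `-` and counted as 0 s, which is an assumption. The helper names are hypothetical.

```shell
#!/bin/sh
# Per-step run times in seconds, copied from the table:
#   step  mock  default
# "-" marks the blank 1_3_synth cell; it is counted as 0 s (assumption).
step_times() {
  printf '%s\n' \
    "1_1_yosys_canonicalize 4 4" \
    "1_2_yosys 161 182" \
    "1_3_synth - 1" \
    "2_1_floorplan 72 83" \
    "2_2_floorplan_macro 16 16" \
    "2_3_floorplan_tapcell 1 0" \
    "2_4_floorplan_pdn 7 9" \
    "3_1_place_gp_skip_io 43 45" \
    "3_2_place_iop 1 1" \
    "3_3_place_gp 331 327" \
    "3_4_place_resized 68 65" \
    "3_5_place_dp 79 74" \
    "4_1_cts 152 180" \
    "5_1_grt 385 404" \
    "5_2_route 3827 3960"
}

# Sum both columns and report the absolute and relative saving.
sum_times() {
  awk '{ mock += ($2 == "-" ? 0 : $2); base += $3 }
       END { printf "mock=%d default=%d saved=%d (%.1f%%)\n",
                    mock, base, base - mock, 100 * (base - mock) / base }'
}

step_times | sum_times
# prints: mock=5147 default=5351 saved=204 (3.8%)
```

Most of the total comes from `5_2_route`, so the overall saving on this design is small in relative terms.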

## `SYNTH_MOCK_LARGE_MEMORIES=1` worst ext_clk path

```
Startpoint: soc0/processor/icache_0/rams:1.way/cache_ram_0
            (rising edge-triggered flip-flop clocked by ext_clk)
Endpoint: soc0/processor/icache_0/_163_[147]$_DFFE_PP_
          (rising edge-triggered flip-flop clocked by ext_clk)
Path Group: ext_clk
Path Type: max

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock ext_clk (rise edge)
   4.07    4.07   clock network delay (propagated)
   0.00    4.07 ^ soc0/processor/icache_0/rams:1.way/cache_ram_0/CLK (RAM32_1RW1R)
  11.44   15.51 v soc0/processor/icache_0/rams:1.way/cache_ram_0/Do1[31] (RAM32_1RW1R)
   0.62   16.14 v soc0/processor/icache_0/rams:1.way/_43_/X (sky130_fd_sc_hd__mux2_4)
   0.42   16.56 v soc0/processor/icache_0/_2550_/X (sky130_fd_sc_hd__mux2_4)
   0.19   16.75 v place24125/X (sky130_fd_sc_hd__buf_12)
   0.15   16.90 v soc0/processor/decode1_0/_2318_/Y (sky130_fd_sc_hd__nand2b_4)
   0.38   17.27 v soc0/processor/decode1_0/_2375_/X (sky130_fd_sc_hd__or3_4)
   0.14   17.41 ^ soc0/processor/decode1_0/_2737_/Y (sky130_fd_sc_hd__nor2_4)
   0.07   17.48 v soc0/processor/decode1_0/_2738_/Y (sky130_fd_sc_hd__inv_2)
   0.19   17.67 ^ soc0/processor/decode1_0/_2740_/Y (sky130_fd_sc_hd__a21oi_4)
   0.24   17.91 v soc0/processor/decode1_0/_3744_/Y (sky130_fd_sc_hd__nand4b_1)
   0.16   18.06 ^ soc0/processor/decode1_0/_4248_/Y (sky130_fd_sc_hd__nand2b_1)
   0.16   18.22 ^ soc0/processor/_318_/X (sky130_fd_sc_hd__or2_4)
   0.05   18.27 v soc0/processor/icache_0/_2130_/Y (sky130_fd_sc_hd__nor2_4)
   0.09   18.36 ^ soc0/processor/icache_0/_2131_/Y (sky130_fd_sc_hd__nand2_4)
   0.05   18.41 v soc0/processor/icache_0/_2132_/Y (sky130_fd_sc_hd__a211oi_4)
   0.19   18.60 v rebuffer29423/X (sky130_fd_sc_hd__buf_12)
   0.11   18.72 ^ soc0/processor/icache_0/_2133_/Y (sky130_fd_sc_hd__nand2_8)
   0.06   18.78 v soc0/processor/icache_0/_2155_/Y (sky130_fd_sc_hd__inv_12)
   0.14   18.92 v place19388/X (sky130_fd_sc_hd__buf_12)
   0.14   19.06 v place19392/X (sky130_fd_sc_hd__buf_12)
   0.14   19.19 v place19394/X (sky130_fd_sc_hd__buf_12)
   0.16   19.35 v soc0/processor/icache_0/_2494_/X (sky130_fd_sc_hd__and2_4)
   0.00   19.35 v soc0/processor/icache_0/_163_[147]$_DFFE_PP_/D (sky130_fd_sc_hd__edfxtp_1)
          19.35   data arrival time

  15.00   15.00   clock ext_clk (rise edge)
   3.42   18.42   clock network delay (propagated)
  -0.25   18.17   clock uncertainty
   0.14   18.31   clock reconvergence pessimism
          18.31 ^ soc0/processor/icache_0/_163_[147]$_DFFE_PP_/CLK (sky130_fd_sc_hd__edfxtp_1)
  -0.24   18.07   library setup time
          18.07   data required time
---------------------------------------------------------
          18.07   data required time
         -19.35   data arrival time
---------------------------------------------------------
          -1.27   slack (VIOLATED)
```

## `SYNTH_MOCK_LARGE_MEMORIES=0` worst ext_clk path

As can be seen, there's no significant difference in the worst negative slack path for ext_clk.

```
Startpoint: soc0/processor/icache_0/rams:1.way/cache_ram_0
            (rising edge-triggered flip-flop clocked by ext_clk)
Endpoint: soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_
          (rising edge-triggered flip-flop clocked by ext_clk)
Path Group: ext_clk
Path Type: max

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock ext_clk (rise edge)
   4.04    4.04   clock network delay (propagated)
   0.00    4.04 ^ soc0/processor/icache_0/rams:1.way/cache_ram_0/CLK (RAM32_1RW1R)
  11.44   15.48 v soc0/processor/icache_0/rams:1.way/cache_ram_0/Do1[59] (RAM32_1RW1R)
   0.64   16.12 v soc0/processor/icache_0/rams:1.way/_76_/X (sky130_fd_sc_hd__mux2_4)
   0.36   16.48 v soc0/processor/icache_0/_2544_/X (sky130_fd_sc_hd__mux2_4)
   0.15   16.64 v place27067/X (sky130_fd_sc_hd__buf_6)
   0.06   16.70 ^ soc0/processor/decode1_0/_3560_/Y (sky130_fd_sc_hd__inv_4)
   0.16   16.85 ^ soc0/processor/decode1_0/_6875_/COUT (sky130_fd_sc_hd__ha_4)
   0.07   16.92 v soc0/processor/decode1_0/_3695_/Y (sky130_fd_sc_hd__nand2b_4)
   0.39   17.31 ^ soc0/processor/decode1_0/_3696_/Y (sky130_fd_sc_hd__nor3_4)
   0.20   17.51 ^ place24130/X (sky130_fd_sc_hd__buf_6)
   0.06   17.57 v soc0/processor/decode1_0/_5317_/Y (sky130_fd_sc_hd__nand2_4)
   0.30   17.87 ^ soc0/processor/decode1_0/_5318_/Y (sky130_fd_sc_hd__a21oi_4)
   0.19   18.06 ^ place23148/X (sky130_fd_sc_hd__buf_6)
   0.22   18.28 v soc0/processor/decode1_0/_6350_/Y (sky130_fd_sc_hd__nand4b_1)
   0.29   18.57 v place22875/X (sky130_fd_sc_hd__buf_6)
   0.10   18.67 ^ soc0/processor/decode1_0/_6854_/Y (sky130_fd_sc_hd__nand2b_4)
   0.15   18.82 ^ soc0/processor/_318_/X (sky130_fd_sc_hd__or2_4)
   0.12   18.94 ^ place22433/X (sky130_fd_sc_hd__buf_12)
   0.05   18.99 v soc0/processor/icache_0/_2130_/Y (sky130_fd_sc_hd__nor2_4)
   0.16   19.15 v place22148/X (sky130_fd_sc_hd__buf_6)
   0.08   19.23 ^ soc0/processor/icache_0/_2131_/Y (sky130_fd_sc_hd__nand2_4)
   0.07   19.30 v soc0/processor/icache_0/_2132_/Y (sky130_fd_sc_hd__a211oi_4)
   0.18   19.48 v place21617/X (sky130_fd_sc_hd__buf_12)
   0.11   19.59 ^ soc0/processor/icache_0/_2133_/Y (sky130_fd_sc_hd__nand2_8)
   0.06   19.65 v soc0/processor/icache_0/_2155_/Y (sky130_fd_sc_hd__inv_8)
   0.15   19.80 v place21391/X (sky130_fd_sc_hd__buf_12)
   0.13   19.93 v place21403/X (sky130_fd_sc_hd__buf_12)
   0.14   20.07 v rebuffer32771/X (sky130_fd_sc_hd__buf_4)
   0.19   20.26 ^ soc0/processor/icache_0/_2285_/Y (sky130_fd_sc_hd__mux2i_1)
   0.15   20.40 ^ place21348/X (sky130_fd_sc_hd__buf_4)
   0.04   20.44 v soc0/processor/icache_0/_2286_/Y (sky130_fd_sc_hd__nor2_1)
   0.00   20.44 v soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_/D (sky130_fd_sc_hd__dfxtp_1)
          20.44   data arrival time

  15.00   15.00   clock ext_clk (rise edge)
   3.53   18.53   clock network delay (propagated)
  -0.25   18.28   clock uncertainty
   0.14   18.42   clock reconvergence pessimism
          18.42 ^ soc0/processor/icache_0/_163_[14]$_SDFFE_PP0P_/CLK (sky130_fd_sc_hd__dfxtp_1)
  -0.11   18.31   library setup time
          18.31   data required time
---------------------------------------------------------
          18.31   data required time
         -20.44   data arrival time
---------------------------------------------------------
          -2.13   slack (VIOLATED)
```
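
In both reports the slack line is the data required time minus the data arrival time. The sketch below reproduces that arithmetic from the printed values; `slack` is a hypothetical helper. Because the reports print values rounded to two decimals, the first result comes out as -1.28 rather than the reported -1.27.

```shell
#!/bin/sh
# slack REQUIRED ARRIVAL
# Setup slack = data required time - data arrival time.
slack() {
  awk -v req="$1" -v arr="$2" 'BEGIN { printf "%.2f\n", req - arr }'
}

slack 18.07 19.35   # SYNTH_MOCK_LARGE_MEMORIES=1, prints -1.28
slack 18.31 20.44   # SYNTH_MOCK_LARGE_MEMORIES=0, prints -2.13
```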

## Conclusion

The Endpoint Slack histograms above show no visible difference between the two approaches. In other words, the design doesn't appear to be terribly sensitive to how RAMs are mocked; other factors dominate and merit further investigation.

ORFS is built on `make`, which shines for simple, fast flows. For larger, more complicated designs with flows that take a long time to run, it is worth looking beyond `make` to [bazel-orfs](https://github.com/The-OpenROAD-Project/bazel-orfs).

flow/designs/sky130hd/microwatt/config.mk

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,4 +36,23 @@ export SETUP_SLACK_MARGIN = 0.2
3636
# GRT non-default config
3737
export FASTROUTE_TCL = $(DESIGN_HOME)/$(PLATFORM)/$(DESIGN_NICKNAME)/fastroute.tcl
3838

39-
export SYNTH_MOCK_LARGE_MEMORIES = 1
39+
ifeq ($(SYNTH_MOCK_LARGE_MEMORIES),1)
40+
# ca. 3 minutes to run make synth
41+
#
42+
# These module names come from the error report when setting SYNTH_MEMORY_MAX_BITS=2048
43+
# and SYNTH_MOCK_LARGE_MEMORIES=0
44+
#
45+
# Keeping them avoids mocking them away, which would lead to further optimizations
46+
# that would obscure what is going on in the rest of the design.
47+
export SYNTH_KEEP_MODULES=decode1_0_bf8b4530d8d246dd74ac53a13471bba17941dff7 \
48+
decode1_0_bf8b4530d8d246dd74ac53a13471bba17941dff7 \
49+
fpu \
50+
decode1_0_bf8b4530d8d246dd74ac53a13471bba17941dff7 \
51+
decode1_0_bf8b4530d8d246dd74ac53a13471bba17941dff7 \
52+
decode1_0_bf8b4530d8d246dd74ac53a13471bba17941dff7
53+
# The goal is to run through the flow quickly to learn what we can
54+
# about the design without getting bogged down in memory issues.
55+
export SYNTH_MEMORY_MAX_BITS ?= 1024
56+
else
57+
export SYNTH_MEMORY_MAX_BITS ?= 42000
58+
endif
Two image files, 24.8 KB and 37.8 KB (not shown)

flow/scripts/variables.yaml

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -174,22 +174,36 @@ SYNTH_HIERARCHICAL:
174174
SYNTH_MEMORY_MAX_BITS:
175175
description: >
176176
Maximum number of bits for memory synthesis.
177+
178+
Ideally, real RAM or realistic fakeram should be used for RAMs
179+
much larger than 1024 bits.
180+
181+
To temporarily ignore the RAM concerns and investigate other
182+
aspects of the design, consider setting `SYNTH_MOCK_LARGE_MEMORIES=1`,
183+
or adjusting `SYNTH_MEMORY_MAX_BITS`.
177184
default: 4096
178185
stages:
179186
- synth
180187
SYNTH_MOCK_LARGE_MEMORIES:
181188
description: >
182189
Reduce memories larger than SYNTH_MEMORY_MAX_BITS to 1 row.
183190
184-
This is useful to separate the concern of instantiating and placing
185-
memories from investigating other issues with a design.
191+
This is useful and convenient to separate the concern of instantiating
192+
and placing memories from investigating other issues with a design,
193+
though it comes at the expense of the increased accuracy that using
194+
realistic fakemem would provide.
186195
187196
Memories with a single row will of course have unrealistically good
188197
timing and area characteristics, but timing will still correctly terminate
189198
in a register.
190199
191-
Also, large port memories, typically register files, will still have the
200+
Large port memories, typically register files, will still
192201
retain a lot of the port logic that can be useful to investigate issues.
202+
203+
Consider using SYNTH_KEEP_MODULES to keep the modules of the mocked
204+
memories so that code outside the mocked memories is not
205+
optimized as a consequence of mocking a memory, yielding better insight
206+
into issues running the rest of the design through the ORFS flow.
193207
default: 0
194208
stages:
195209
- synth
