ram: initial complete version#8124
Conversation
braydenlouie
commented
Aug 26, 2025
- Added hierarchical decoder
- Tested on x2, x4, and x8 sizes
- Added input buffers
- Created dbRows
- Added fill cells through Tcl
- Started work on adding bPins
- Verified through RTL simulation that behavioral logic is correct
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
|
what is the behavioral model of these SRAMs.? add a cmd to write out the behavioral model? (faster simulation, equivalence checks) How are properties modified: ports, synchronius read, common/shared read/write clock, are there any aspect ratio choices, pin placment options? |
|
Given that there are an enormous number of possible condifigurations and that openroad changes all the time, a one-off equivalence check is not enough: it there needs to be a way for users to add equivalence checks to a flow or when they generate the srams. |
|
This is an undergraduate project. "Complete" in the title is only meant that it is an actual working RAM with buffers and decoders, rather than just the memory array.
Same as https://github.com/AUCOHL/DFFRAM
We can add that to the TODO list
Not sure what this means. The ports are generated based on the generator options
The address port combinationally determines the output data, if that's what you're asking.
Read and write are on the same clock. We don't have plans to add separate clocks, but it could be put on the TODO list as a very low priority item. Timing is checked by STA so defining the constraints will not be straightforward.
No, there is no mux support yet.
It was checked for all word sizes which are supported in this commit (x2, x4, x8). Formal equivalence checking is low priority. |
|
Note that this is a DFF ram built out of standard cells so you could also simulate it at gate level. |
That would make sense for power simulations, maybe, but it would be awfully slow compared to a behavioral model. Pre-synthesis RTL for a pipelined multiplier vs netlist simulation after retiming comes to mind. There is a difference in simulation speed of orders of magnitude, not considering how long Verilator takes to build netlist. Also, the behavioral model is used for eqy(or similar) to validate that the behavioral model matches the netlist. |
|
I agree that a behavioral model is useful and I will put it on the TODO list. To give you an idea of scope, we currently have about 30 items on the TODO list for @braydenlouie to work on for as long as he is interested in the project. |
|
Lots of niggling little details, or it would be done already... |
4e31725 to
3fb1475
Compare
rovinski
left a comment
There was a problem hiding this comment.
@braydenlouie first set of changes:
- You need to fix the DCO. Click on the failing "DCO" test and run the commands it says to.
- You need to fix the merge conflicts. If you are comfortable doing it on your own, then you can do so. Otherwise I can show you how at the next meeting.
After that, you should be able to mark the PR as ready so that the CI can run.
| ########################################################################### | ||
| ## BSD 3-Clause License | ||
| ## | ||
| ## Copyright (c) 2023, Precision Innovations Inc. | ||
| ## All rights reserved. | ||
| ## | ||
| ## Redistribution and use in source and binary forms, with or without | ||
| ## modification, are permitted provided that the following conditions are met: | ||
| ## | ||
| ## * Redistributions of source code must retain the above copyright notice, this | ||
| ## list of conditions and the following disclaimer. | ||
| ## | ||
| ## * Redistributions in binary form must reproduce the above copyright notice, | ||
| ## this list of conditions and the following disclaimer in the documentation | ||
| ## and/or other materials provided with the distribution. | ||
| ## | ||
| ## * Neither the name of the copyright holder nor the names of its | ||
| ## contributors may be used to endorse or promote products derived from | ||
| ## this software without specific prior written permission. | ||
| ## | ||
| ## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
| ## AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
| ## IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| ## ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | ||
| ## LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
| ## CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
| ## SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| ## INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
| ## CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
| ## ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
| ## POSSIBILITY OF SUCH DAMAGE. | ||
| ########################################################################### |
There was a problem hiding this comment.
All copyright headers should be updated to the SPDX format, as seen in other files.
There was a problem hiding this comment.
@braydenlouie this copyright still needs to be adjusted
There was a problem hiding this comment.
This file likely has to be updated using make ok
src/ram/include/ram/ram.h
Outdated
| odb::dbMaster* and2_cell_; | ||
| odb::dbMaster* clock_gate_cell_; | ||
| odb::dbMaster* buffer_cell_; | ||
| odb::dbMaster* filler_cell_; |
There was a problem hiding this comment.
filler_cell_ doesn't seem to be used. Even when it is used, it should probably be a std::vector<odb::dbMaster*>. Remove for now.
src/ram/src/ram.cpp
Outdated
| block_ = odb::dbBlock::create(chip, ram_name.c_str()); | ||
| } | ||
|
|
||
| // 9 tracks for 8 bits per word plus |
There was a problem hiding this comment.
"track" has a specific meaning which I'm not sure applies here. Perhaps you mean columns?
src/ram/src/ram.cpp
Outdated
| } | ||
|
|
||
| // input bterms | ||
| int numImputs = std::log2(word_count); |
There was a problem hiding this comment.
std::log2 returns a float, which is then truncated when assigning to an int. e.g. std::log2(5) will return ~2.3, which gets truncated to 2. You need to round up the number of bits required.
| int numImputs = std::log2(word_count); | |
| int numImputs = std::ceil(std::log2(word_count)); |
src/ram/src/ram.cpp
Outdated
| dbBTerm* RamGen::makeBTerm(const std::string& name) | ||
| { | ||
| auto net = dbNet::create(block_, name.c_str()); | ||
| auto bTerm = dbBTerm::create(net, name.c_str()); | ||
|
|
||
| return bTerm; | ||
| } | ||
|
|
||
| dbBTerm* RamGen::makeOutputBTerm(const std::string& name) | ||
| { | ||
| auto net = dbNet::create(block_, name.c_str()); | ||
| auto bTerm = dbBTerm::create(net, name.c_str()); | ||
|
|
||
| bTerm->setIoType(odb::dbIoType::OUTPUT); | ||
| return bTerm; | ||
| } |
There was a problem hiding this comment.
It probably makes more sense to have just one makeBTerm function and then pass in the IoType as an argument. Less code that way.
3fb1475 to
c350141
Compare
Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
c350141 to
a03644e
Compare
Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
src/ram/src/ram.cpp
Outdated
| // if readports == 1, only have Q outputs | ||
| if (read_ports == 1) { | ||
| array<dbBTerm*, 8> d; | ||
| array<dbBTerm*, 8> q; |
There was a problem hiding this comment.
Where does the static 8 come from? Is it because we assume only byte-level masking? This is fine for now, but I don't think there's a reason why we couldn't support arbitrary levels of masking (or at least something reasonable like powers of 2).
For example, it may be desirable to have word-level masking instead of byte-level masking if byte-level masking isn't needed.
There was a problem hiding this comment.
At the moment, the static 8 is assuming byte-level masking because I've only been testing one-word DFFRAMs, but this can be changed moving forward.
src/ram/src/ram.cpp
Outdated
| @@ -480,27 +444,27 @@ void RamGen::generate(const int bytes_per_word, | |||
| array<dbBTerm*, 8> D; // array for b-term for external inputs | |||
There was a problem hiding this comment.
I would use a more descriptive name, like d_bterms.
@maliberty what is the convention for capitalization in snake case? Should it be d_bterms or can it be D_bterms?
src/ram/src/ram.cpp
Outdated
| // if readports == 1, only have Q outputs | ||
| if (read_ports == 1) { | ||
| array<dbBTerm*, 8> d; | ||
| array<dbBTerm*, 8> q; |
There was a problem hiding this comment.
Similarly, I would call this q_bterms
| D_nets[bit] = makeNet(fmt::format("D_nets[{}]", bit + col * 8), "net"); | ||
| } | ||
|
|
||
| vector<array<dbBTerm*, 8>> Q; |
There was a problem hiding this comment.
The name on this is tough because it represents Q bterms of multiple ports. Maybe instead name it read_port_bterms?
|
@maliberty very close to ready to review here |
6563d0c to
98ae3b6
Compare
Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
98ae3b6 to
598cfc9
Compare
Signed-off-by: Brayden Louie <braydenl9988@gmail.com>
|
@maliberty @vvbandeira ready for review |
| ########################################################################### | ||
| ## BSD 3-Clause License | ||
| ## | ||
| ## Copyright (c) 2023, Precision Innovations Inc. | ||
| ## All rights reserved. | ||
| ## | ||
| ## Redistribution and use in source and binary forms, with or without | ||
| ## modification, are permitted provided that the following conditions are met: | ||
| ## | ||
| ## * Redistributions of source code must retain the above copyright notice, this | ||
| ## list of conditions and the following disclaimer. | ||
| ## | ||
| ## * Redistributions in binary form must reproduce the above copyright notice, | ||
| ## this list of conditions and the following disclaimer in the documentation | ||
| ## and/or other materials provided with the distribution. | ||
| ## | ||
| ## * Neither the name of the copyright holder nor the names of its | ||
| ## contributors may be used to endorse or promote products derived from | ||
| ## this software without specific prior written permission. | ||
| ## | ||
| ## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
| ## AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
| ## IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| ## ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | ||
| ## LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
| ## CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
| ## SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| ## INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
| ## CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
| ## ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
| ## POSSIBILITY OF SUCH DAMAGE. | ||
| ########################################################################### |
There was a problem hiding this comment.
@braydenlouie this copyright still needs to be adjusted
src/ram/src/MakeRam.cpp
Outdated
| delete ram_gen; | ||
| } | ||
|
|
||
| void initRamGen(OpenRoad* openroad) |
There was a problem hiding this comment.
This function needs to be adjusted so that it matches the declaration after the merge. You can look at any other tool to see how to use the new style of initialization.
This is what's currently causing the build error.
src/OpenRoad.cc
Outdated
| resizer_ = rsz::makeResizer(); | ||
| opendp_ = dpl::makeOpendp(); | ||
| finale_ = fin::makeFinale(); | ||
| ram_gen_ = ram::makeRamGen(); |
There was a problem hiding this comment.
warning: no member named 'makeRamGen' in namespace 'ram'; did you mean simply 'makeRamGen'? [clang-diagnostic-error]
| ram_gen_ = ram::makeRamGen(); | |
| ram_gen_ = makeRamGen(); |
Additional context
src/ram/include/ram/MakeRam.h:13: 'makeRamGen' declared here
ram::RamGen* makeRamGen();
^
src/OpenRoad.cc
Outdated
| replace_, db_, sta_, resizer_, global_router_, logger_, tcl_interp); | ||
| initOpendp(opendp_, db_, logger_, tcl_interp); | ||
| initFinale(finale_, db_, logger_, tcl_interp); | ||
| initRamGen(ram_gen_, db_, logger_, tcl_interp); |
There was a problem hiding this comment.
warning: no matching function for call to 'initRamGen' [clang-diagnostic-error]
initRamGen(ram_gen_, db_, logger_, tcl_interp);
^Additional context
src/ram/include/ram/MakeRam.h:14: candidate function not viable: requires single argument 'openroad', but 4 arguments were provided
void initRamGen(OpenRoad* openroad);
^
src/ram/include/ram/ram.h
Outdated
| RamGen(); | ||
|
|
||
| void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger); | ||
| void generate(const int bytes_per_word, |
There was a problem hiding this comment.
warning: parameter 'bytes_per_word' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
| void generate(const int bytes_per_word, | |
| void generate(int bytes_per_word, |
src/ram/include/ram/ram.h
Outdated
|
|
||
| void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger); | ||
| void generate(const int bytes_per_word, | ||
| const int word_count, |
There was a problem hiding this comment.
warning: parameter 'word_count' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
| const int word_count, | |
| int word_count, |
src/ram/include/ram/ram.h
Outdated
| void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger); | ||
| void generate(const int bytes_per_word, | ||
| const int word_count, | ||
| const int read_ports, |
There was a problem hiding this comment.
warning: parameter 'read_ports' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
| const int read_ports, | |
| int read_ports, |
src/ram/src/layout.cpp
Outdated
| return cell_height; | ||
| } | ||
|
|
||
| const int Grid::getWidth() |
There was a problem hiding this comment.
warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
src/ram/src/layout.h:87:
- const int getWidth();
+ int getWidth();| const int Grid::getWidth() | |
| int Grid::getWidth() |
src/ram/src/layout.cpp
Outdated
| return cell_width; | ||
| } | ||
|
|
||
| const int Grid::numLayouts() |
There was a problem hiding this comment.
warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
src/ram/src/layout.h:89:
- const int numLayouts();
+ int numLayouts();| const int Grid::numLayouts() | |
| int Grid::numLayouts() |
src/ram/src/layout.cpp
Outdated
| return layouts_.size(); | ||
| } | ||
|
|
||
| const int Grid::getRowWidth() |
There was a problem hiding this comment.
warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
src/ram/src/layout.h:91:
- const int getRowWidth();
+ int getRowWidth();| const int Grid::getRowWidth() | |
| int Grid::getRowWidth() |
src/ram/src/ram.cpp
Outdated
| { | ||
| if (!inv_cell_) { | ||
| inv_cell_ = findMaster( | ||
| [this](sta::LibertyPort* port) { |
There was a problem hiding this comment.
warning: lambda capture 'this' is not used [clang-diagnostic-unused-lambda-capture]
| [this](sta::LibertyPort* port) { | |
| [](sta::LibertyPort* port) { |
src/ram/src/ram.cpp
Outdated
|
|
||
| if (!tristate_cell_) { | ||
| tristate_cell_ = findMaster( | ||
| [this](sta::LibertyPort* port) { |
There was a problem hiding this comment.
warning: lambda capture 'this' is not used [clang-diagnostic-unused-lambda-capture]
| [this](sta::LibertyPort* port) { | |
| [](sta::LibertyPort* port) { |
|
Please clean up the various action failures before review. |
|
@braydenlouie CI is failing due to compiler warnings; we consider them as errors in CI builds. |
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
How about an A/B comparison of synthesized RAM from behavioral Verilog to this implementation? |
|
Is there a |
|
@oharboe several steps left to make this usable. First is power grid, pin placement, routing, and abstracting. Technically we also need to do tapcell insertion, but that won't affect PPA too much. You are welcome to test drive it yourself and run some experiments once it's ready. |
|
I'm mainly interested in the time/area compared to flip flop. I thought the idea of this RAM implementation was to improve PPA in some way. Stands to reason that you'd want to measure that early to validate the advantages of this approach. Currently I'm mainly interested in register files(port dominated, so implemented via flip flops) and I'd need behavior verilog for simulation before this work becomes relevant for RAMs. |
We are 100% confident that structured placement and routing is better than synthesized. It is well established and also the experiments have already been run a while ago, see: https://github.com/AUCOHL/DFFRAM?tab=readme-ov-file#comparisons |
Well... it depends on the context and the quality of results you get from the tools in a context. The link below mentions "register files". My definition of register files is storage dominated by ports and access rather than storage bits. A paper written on this topic would be much more gripping if it has data on when it is worth using a dedicated RAM generator compared to synthesis with a tool like OpenROAD (lower bound for quality of results). Especially considering the downsides to RAM w.r.t not having an accurate behavioral model for simulation and the need for glue logic to match RAMs to use-case (In Chipyard they have the bartools to generate this glue logic).
Interesting. Though a RAM never exists in isolation, it is used in some context. What matters to me is to see what use-cases that make the effort of using this new RAM feature worth it considering that OpenROAD is known to reach 80%+ placement densities and that, especially as number of ports grow, ports and a more holistic flow can get better results than isolating separate parts of the system and joining at the end. As an example with a CPU, a register file(a 2R1W was classified as such in the link above) never exists in isolation, it comes with a bypass network(or you can't do back to back ALU operations). As number of ports grow, ports and bypass network dwarf the RAM/flip flops. Even for storage dominated use-cases, there's the matter of speed required clock period. I already know that Yosys will go off the rails(take a very, very long time) to do synthesis for a large RAM, but I'm thinking it would be possible to do synthesis for a RAM block and get a netlist, then use retiming on the glue logic + yosys retiming to combine many such RAMs into larger RAMs. I've done similar things and this should get me a large RAM in OpenROAD through floorplan in minutes and I find that global placement is easily able to place this sort of thing. |
Electrically speaking, a register file is when the storage medium is through latch-based devices (i.e. latch or flip-flop). SRAMs are when the storage medium is SRAM bit cells (i.e. 6T SRAM), and DRAM is stored through dynamic capacitive devices (i.e. 1T1C cells). There are generally access limitations implied as well, for example register files can perform same-cycle accesses, but DRAMs must have pre-charge and readout phases, etc. Architecturally speaking, a register file is anything that implements the architected set of registers. Anything more specific is a microarchitectural decision. Although a high number of read/write ports is typical for an architected register file, it is not really categorized as part of the taxonomy. It is simply a matter of how many ports does the RAM have.
The answer is always, as long as you have the right RAM generator. A bespoke RAM generator is always going to beat out a generic synthesis/placement/routing tool. RAM generators can generate non-CMOS logic which is not possible with synthesized netlists. Even non-SRAM latch-based RFs will use dynamic logic, sense amplifiers, etc. to outperform any CMOS configuration. As to whether it outperforms this RAM generator is purely a function of how much time is spent developing it. We only plan to incorporate logic which uses std cells. Technically it is already non-synthesizable because it uses a multi-driver bus for outputs, which is generally not synthesizable by tools.
OpenROAD does a lot of timing optimization and therefore uses more logic area to get those results. Anecdotally, I recall some experiments some years ago where we compared synthesized RFs vs generated RFs. The synthesized version used about 40% more logic area (ignoring utilization) just to struggle and fail to hit the same timing as the generated RAM.
This happens much more quickly in synthesized logic than it does with generated/bespoke logic. Synthesized logic will not produce multi-driver buses but instead use extremely wide muxes. This has rapidly diminishing returns.
Well, forwarded values typically exist outside the register file for in-order cores and they exist in the reorder buffer for out-of-order cores for that reason. RAM ports are simply fundamentally expensive, so many architectures workaround the need for large numbers of accesses within the same cycle by storing them in explicit structures. And yes, high-performance ROBs are hard to physically implement - I don't have any advice on that.
RAM generators will do better than this, 100% guaranteed. If it doesn't, then it's a problem with the RAM generator. |
|
If we implement enough of the planned features for this RAM generator, it will give you better PPA than any synthesized RAM, almost guaranteed (only exception is if library std cells suck). We are focusing on basic and common use cases right now, which are 1rw RAMs and 1r1w RAMs. With good fortune we can explore higher-port RAMs, but it makes no sense to explore the complicated cases before we've polished the simple ones. |
|
@rovinski Thank you for this excellent account. Very insightful... For out of order CPUs, the port count goes beyond 3R2W... I asked ChatGPT to create a scaling table... :-)
|
|
@rovinski Can this be made to work on asap7? Probably a silly question... I did look for some documentation (.md files), but didn't see any. |
I believe so, yes. The generator only requires these cells to work: 1) a flip-flop, 2) an AND gate, 3) a buffer and inverter, 4) a tristate driver, 5) a clock gate. I believe ASAP7 has all of those.
When we have something that's ready for an end-user, we'll try to add some.
Not sure where it's getting those complexities, but I wouldn't put too much stake in that, especially for the synthesized ones. |
|
For the open PDKs we don't have open RAM generators (openram hasn't seen wide adoption and some academic PDKs don't even have the building blocks). This beats the alternative which is general PnR not a RAM generator. |