Skip to content

Comments

ram: initial complete version#8124

Merged
maliberty merged 28 commits intoThe-OpenROAD-Project:masterfrom
braydenlouie:grid-branch
Oct 2, 2025
Merged

ram: initial complete version#8124
maliberty merged 28 commits intoThe-OpenROAD-Project:masterfrom
braydenlouie:grid-branch

Conversation

@braydenlouie
Copy link
Contributor

  • Added hierarchical decoder
    • Tested on x2, x4, and x8 sizes
  • Added input buffers
  • Created dbRows
  • Added fill cells through Tcl
  • Started work on adding bPins
  • Verified through RTL simulation that behavioral logic is correct

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@rovinski rovinski self-requested a review August 26, 2025 21:10
@oharboe
Copy link
Collaborator

oharboe commented Aug 27, 2025

what is the behavioral model of these SRAMs.?

add a cmd to write out the behavioral model? (faster simulation, equivalence checks)

How are properties modified: ports, synchronius read, common/shared read/write clock, are there any aspect ratio choices, pin placment options?

@oharboe
Copy link
Collaborator

oharboe commented Aug 27, 2025

Given that there are an enormous number of possible condifigurations and that openroad changes all the time, a one-off equivalence check is not enough: it there needs to be a way for users to add equivalence checks to a flow or when they generate the srams.

@rovinski
Copy link
Collaborator

This is an undergraduate project. "Complete" in the title is only meant that it is an actual working RAM with buffers and decoders, rather than just the memory array.

what is the behavioral model of these SRAMs.?

Same as https://github.com/AUCOHL/DFFRAM

add a cmd to write out the behavioral model?

We can add that to the TODO list

How are properties modified: ports

Not sure what this means. The ports are generated based on the generator options

synchronius read

The address port combinationally determines the output data, if that's what you're asking.

common/shared read/write clock

Read and write are on the same clock. We don't have plans to add separate clocks, but it could be put on the TODO list as a very low priority item. Timing is checked by STA so defining the constraints will not be straightforward.

are there any aspect ratio choices

No, there is no mux support yet.

pin placment options?

D and Q will be at the top edge, addr will be on the right edge (in the default orientation). We have plans to allow other configurations, but that is medium-to-low priority. The main configuration we would look to support is all pins on the same edge.

a one-off equivalence check is not enough

It was checked for all word sizes which are supported in this commit (x2, x4, x8). Formal equivalence checking is low priority.

@maliberty
Copy link
Member

Note that this is a DFF ram built out of standard cells so you could also simulate it at gate level.

@oharboe
Copy link
Collaborator

oharboe commented Aug 29, 2025

Note that this is a DFF ram built out of standard cells so you could also simulate it at gate level.

That would make sense for power simulations, maybe, but it would be awfully slow compared to a behavioral model.

Pre-synthesis RTL for a pipelined multiplier vs netlist simulation after retiming comes to mind. There is a difference in simulation speed of orders of magnitude, not considering how long Verilator takes to build netlist.

Also, the behavioral model is used for eqy(or similar) to validate that the behavioral model matches the netlist.

@rovinski
Copy link
Collaborator

I agree that a behavioral model is useful and I will put it on the TODO list. To give you an idea of scope, we currently have about 30 items on the TODO list for @braydenlouie to work on for as long as he is interested in the project.

@oharboe
Copy link
Collaborator

oharboe commented Aug 29, 2025

Lots of niggling little details, or it would be done already...

Copy link
Collaborator

@rovinski rovinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@braydenlouie first set of changes:

  1. You need to fix the DCO. Click on the failing "DCO" test and run the commands it says to.
  2. You need to fix the merge conflicts. If you are comfortable doing it on your own, then you can do so. Otherwise I can show you how at the next meeting.

After that, you should be able to mark the PR as ready so that the CI can run.

Comment on lines +1 to +32
###########################################################################
## BSD 3-Clause License
##
## Copyright (c) 2023, Precision Innovations Inc.
## All rights reserved.
##
## Redistribution and use in source and binary forms, with or without
## modification, are permitted provided that the following conditions are met:
##
## * Redistributions of source code must retain the above copyright notice, this
## list of conditions and the following disclaimer.
##
## * Redistributions in binary form must reproduce the above copyright notice,
## this list of conditions and the following disclaimer in the documentation
## and/or other materials provided with the distribution.
##
## * Neither the name of the copyright holder nor the names of its
## contributors may be used to endorse or promote products derived from
## this software without specific prior written permission.
##
## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
## AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
## IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
## ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
## LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
## CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
## SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
## INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
## CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
## ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
## POSSIBILITY OF SUCH DAMAGE.
###########################################################################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All copyright headers should be updated to the SPDX format, as seen in other files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@braydenlouie this copyright still needs to be adjusted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file likely has to be updated using make ok

odb::dbMaster* and2_cell_;
odb::dbMaster* clock_gate_cell_;
odb::dbMaster* buffer_cell_;
odb::dbMaster* filler_cell_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filler_cell_ doesn't seem to be used. Even when it is used, it should probably be a std::vector<odb::dbMaster*>. Remove for now.

block_ = odb::dbBlock::create(chip, ram_name.c_str());
}

// 9 tracks for 8 bits per word plus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"track" has a specific meaning which I'm not sure applies here. Perhaps you mean columns?

}

// input bterms
int numImputs = std::log2(word_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::log2 returns a float, which is then truncated when assigning to an int. e.g. std::log2(5) will return ~2.3, which gets truncated to 2. You need to round up the number of bits required.

Suggested change
int numImputs = std::log2(word_count);
int numImputs = std::ceil(std::log2(word_count));

Comment on lines 100 to 115
dbBTerm* RamGen::makeBTerm(const std::string& name)
{
auto net = dbNet::create(block_, name.c_str());
auto bTerm = dbBTerm::create(net, name.c_str());

return bTerm;
}

dbBTerm* RamGen::makeOutputBTerm(const std::string& name)
{
auto net = dbNet::create(block_, name.c_str());
auto bTerm = dbBTerm::create(net, name.c_str());

bTerm->setIoType(odb::dbIoType::OUTPUT);
return bTerm;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes more sense to have just one makeBTerm function and then pass in the IoType as an argument. Less code that way.

Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
// if readports == 1, only have Q outputs
if (read_ports == 1) {
array<dbBTerm*, 8> d;
array<dbBTerm*, 8> q;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the static 8 come from? Is it because we assume only byte-level masking? This is fine for now, but I don't think there's a reason why we couldn't support arbitrary levels of masking (or at least something reasonable like powers of 2).

For example, it may be desirable to have word-level masking instead of byte-level masking if byte-level masking isn't needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the static 8 is assuming byte-level masking because I've only been testing one-word DFFRAMs, but this can be changed moving forward.

@@ -480,27 +444,27 @@ void RamGen::generate(const int bytes_per_word,
array<dbBTerm*, 8> D; // array for b-term for external inputs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a more descriptive name, like d_bterms.
@maliberty what is the convention for capitalization in snake case? Should it be d_bterms or can it be D_bterms?

// if readports == 1, only have Q outputs
if (read_ports == 1) {
array<dbBTerm*, 8> d;
array<dbBTerm*, 8> q;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I would call this q_bterms

D_nets[bit] = makeNet(fmt::format("D_nets[{}]", bit + col * 8), "net");
}

vector<array<dbBTerm*, 8>> Q;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name on this is tough because it represents Q bterms of multiple ports. Maybe instead name it read_port_bterms?

@rovinski
Copy link
Collaborator

@maliberty very close to ready to review here

Signed-off-by: braydenl9988 <braydenl9988@gmail.com>
Signed-off-by: Brayden Louie <braydenl9988@gmail.com>
@braydenlouie braydenlouie marked this pull request as ready for review September 19, 2025 19:11
@rovinski
Copy link
Collaborator

@maliberty @vvbandeira ready for review

Comment on lines +1 to +32
###########################################################################
## BSD 3-Clause License
##
## Copyright (c) 2023, Precision Innovations Inc.
## All rights reserved.
##
## Redistribution and use in source and binary forms, with or without
## modification, are permitted provided that the following conditions are met:
##
## * Redistributions of source code must retain the above copyright notice, this
## list of conditions and the following disclaimer.
##
## * Redistributions in binary form must reproduce the above copyright notice,
## this list of conditions and the following disclaimer in the documentation
## and/or other materials provided with the distribution.
##
## * Neither the name of the copyright holder nor the names of its
## contributors may be used to endorse or promote products derived from
## this software without specific prior written permission.
##
## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
## AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
## IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
## ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
## LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
## CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
## SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
## INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
## CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
## ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
## POSSIBILITY OF SUCH DAMAGE.
###########################################################################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@braydenlouie this copyright still needs to be adjusted

delete ram_gen;
}

void initRamGen(OpenRoad* openroad)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function needs to be adjusted so that it matches the declaration after the merge. You can look at any other tool to see how to use the new style of initialization.

This is what's currently causing the build error.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 31. Check the log or trigger a new build to see more.

src/OpenRoad.cc Outdated
resizer_ = rsz::makeResizer();
opendp_ = dpl::makeOpendp();
finale_ = fin::makeFinale();
ram_gen_ = ram::makeRamGen();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no member named 'makeRamGen' in namespace 'ram'; did you mean simply 'makeRamGen'? [clang-diagnostic-error]

Suggested change
ram_gen_ = ram::makeRamGen();
ram_gen_ = makeRamGen();
Additional context

src/ram/include/ram/MakeRam.h:13: 'makeRamGen' declared here

ram::RamGen* makeRamGen();
             ^

src/OpenRoad.cc Outdated
replace_, db_, sta_, resizer_, global_router_, logger_, tcl_interp);
initOpendp(opendp_, db_, logger_, tcl_interp);
initFinale(finale_, db_, logger_, tcl_interp);
initRamGen(ram_gen_, db_, logger_, tcl_interp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no matching function for call to 'initRamGen' [clang-diagnostic-error]

  initRamGen(ram_gen_, db_, logger_, tcl_interp);
  ^
Additional context

src/ram/include/ram/MakeRam.h:14: candidate function not viable: requires single argument 'openroad', but 4 arguments were provided

void initRamGen(OpenRoad* openroad);
     ^

RamGen();

void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger);
void generate(const int bytes_per_word,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 'bytes_per_word' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
void generate(const int bytes_per_word,
void generate(int bytes_per_word,


void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger);
void generate(const int bytes_per_word,
const int word_count,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 'word_count' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
const int word_count,
int word_count,

void init(odb::dbDatabase* db, sta::dbNetwork* network, Logger* logger);
void generate(const int bytes_per_word,
const int word_count,
const int read_ports,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 'read_ports' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
const int read_ports,
int read_ports,

return cell_height;
}

const int Grid::getWidth()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]

src/ram/src/layout.h:87:

-   const int getWidth();
+   int getWidth();
Suggested change
const int Grid::getWidth()
int Grid::getWidth()

return cell_width;
}

const int Grid::numLayouts()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]

src/ram/src/layout.h:89:

-   const int numLayouts();
+   int numLayouts();
Suggested change
const int Grid::numLayouts()
int Grid::numLayouts()

return layouts_.size();
}

const int Grid::getRowWidth()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]

src/ram/src/layout.h:91:

-   const int getRowWidth();
+   int getRowWidth();
Suggested change
const int Grid::getRowWidth()
int Grid::getRowWidth()

{
if (!inv_cell_) {
inv_cell_ = findMaster(
[this](sta::LibertyPort* port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: lambda capture 'this' is not used [clang-diagnostic-unused-lambda-capture]

Suggested change
[this](sta::LibertyPort* port) {
[](sta::LibertyPort* port) {


if (!tristate_cell_) {
tristate_cell_ = findMaster(
[this](sta::LibertyPort* port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: lambda capture 'this' is not used [clang-diagnostic-unused-lambda-capture]

Suggested change
[this](sta::LibertyPort* port) {
[](sta::LibertyPort* port) {

@maliberty
Copy link
Member

Please clean up the various action failures before review.

@vvbandeira
Copy link
Member

@braydenlouie CI is failing due to compiler warnings; we consider them as errors in CI builds.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 1, 2025

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Oct 1, 2025

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@maliberty maliberty enabled auto-merge October 1, 2025 20:41
@github-actions
Copy link
Contributor

github-actions bot commented Oct 1, 2025

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 1, 2025

clang-tidy review says "All clean, LGTM! 👍"

@maliberty maliberty disabled auto-merge October 2, 2025 04:29
@maliberty maliberty merged commit 346855b into The-OpenROAD-Project:master Oct 2, 2025
11 of 13 checks passed
@oharboe
Copy link
Collaborator

oharboe commented Oct 2, 2025

How about an A/B comparison of synthesized RAM from behavioral Verilog to this implementation?

@oharboe
Copy link
Collaborator

oharboe commented Oct 2, 2025

@hzeller
Copy link
Collaborator

hzeller commented Oct 2, 2025

Is there a ram-py.i file missing as building with bazel results in the following issue:

bazel build src/ram/...
ERROR: ~/OpenROAD/src/ram/BUILD:68:15: Action src/ram/swig-py.cc failed: missing input file '//src/ram:src/ram-py.i'
ERROR: ~/OpenROAD/src/ram/BUILD:68:15: Action src/ram/swig-py.cc failed: 1 input file(s) do not exist
Use --verbose_failures to see the command lines of failed build steps.
ERROR: ~/OpenROAD/src/ram/BUILD:68:15 Action src/ram/swig-py.cc failed: 1 input file(s) do not exist

@rovinski
Copy link
Collaborator

rovinski commented Oct 2, 2025

@oharboe several steps left to make this usable. First is power grid, pin placement, routing, and abstracting. Technically we also need to do tapcell insertion, but that won't affect PPA too much.

You are welcome to test drive it yourself and run some experiments once it's ready.

@oharboe
Copy link
Collaborator

oharboe commented Oct 2, 2025

I'm mainly interested in the time/area compared to flip flop.

I thought the idea of this RAM implementation was to improve PPA in some way. Stands to reason that you'd want to measure that early to validate the advantages of this approach.

Currently I'm mainly interested in register files(port dominated, so implemented via flip flops) and I'd need behavior verilog for simulation before this work becomes relevant for RAMs.

@rovinski
Copy link
Collaborator

rovinski commented Oct 2, 2025

I thought the idea of this RAM implementation was to improve PPA in some way. Stands to reason that you'd want to measure that early to validate the advantages of this approach.

We are 100% confident that structured placement and routing is better than synthesized. It is well established and also the experiments have already been run a while ago, see: https://github.com/AUCOHL/DFFRAM?tab=readme-ov-file#comparisons

@oharboe
Copy link
Collaborator

oharboe commented Oct 3, 2025

I thought the idea of this RAM implementation was to improve PPA in some way. Stands to reason that you'd want to measure that early to validate the advantages of this approach.

We are 100% confident that structured placement and routing is better than synthesized.

Well... it depends on the context and the quality of results you get from the tools in a context. The link below mentions "register files". My definition of register files is storage dominated by ports and access rather than storage bits.

A paper written on this topic would be much more gripping if it has data on when it is worth using a dedicated RAM generator compared to synthesis with a tool like OpenROAD (lower bound for quality of results).

Especially considering the downsides to RAM w.r.t not having an accurate behavioral model for simulation and the need for glue logic to match RAMs to use-case (In Chipyard they have the bartools to generate this glue logic).

It is well established and also the experiments have already been run a while ago, see: https://github.com/AUCOHL/DFFRAM?tab=readme-ov-file#comparisons

Interesting. Though a RAM never exists in isolation, it is used in some context. What matters to me is to see what use-cases that make the effort of using this new RAM feature worth it considering that OpenROAD is known to reach 80%+ placement densities and that, especially as number of ports grow, ports and a more holistic flow can get better results than isolating separate parts of the system and joining at the end.

As an example with a CPU, a register file(a 2R1W was classified as such in the link above) never exists in isolation, it comes with a bypass network(or you can't do back to back ALU operations). As number of ports grow, ports and bypass network dwarf the RAM/flip flops.

Even for storage dominated use-cases, there's the matter of speed required clock period. I already know that Yosys will go off the rails(take a very, very long time) to do synthesis for a large RAM, but I'm thinking it would be possible to do synthesis for a RAM block and get a netlist, then use retiming on the glue logic + yosys retiming to combine many such RAMs into larger RAMs. I've done similar things and this should get me a large RAM in OpenROAD through floorplan in minutes and I find that global placement is easily able to place this sort of thing.

@rovinski
Copy link
Collaborator

rovinski commented Oct 4, 2025

My definition of register files is storage dominated by ports and access rather than storage bits.

Electrically speaking, a register file is when the storage medium is through latch-based devices (i.e. latch or flip-flop). SRAMs are when the storage medium is SRAM bit cells (i.e. 6T SRAM), and DRAM is stored through dynamic capacitive devices (i.e. 1T1C cells). There are generally access limitations implied as well, for example register files can perform same-cycle accesses, but DRAMs must have pre-charge and readout phases, etc.

Architecturally speaking, a register file is anything that implements the architected set of registers. Anything more specific is a microarchitectural decision.

Although a high number of read/write ports is typical for an architected register file, it is not really categorized as part of the taxonomy. It is simply a matter of how many ports does the RAM have.

when it is worth using a dedicated RAM generator compared to synthesis with a tool like OpenROAD

The answer is always, as long as you have the right RAM generator. A bespoke RAM generator is always going to beat out a generic synthesis/placement/routing tool. RAM generators can generate non-CMOS logic which is not possible with synthesized netlists. Even non-SRAM latch-based RFs will use dynamic logic, sense amplifiers, etc. to outperform any CMOS configuration.

As to whether it outperforms this RAM generator is purely a function of how much time is spent developing it. We only plan to incorporate logic which uses std cells. Technically it is already non-synthesizable because it uses a multi-driver bus for outputs, which is generally not synthesizable by tools.

OpenROAD is known to reach 80%+ placement densities

OpenROAD does a lot of timing optimization and therefore uses more logic area to get those results. Anecdotally, I recall some experiments some years ago where we compared synthesized RFs vs generated RFs. The synthesized version used about 40% more logic area (ignoring utilization) just to struggle and fail to hit the same timing as the generated RAM.

As number of ports grow, ports and bypass network dwarf the RAM/flip flops.

This happens much more quickly in synthesized logic than it does with generated/bespoke logic. Synthesized logic will not produce multi-driver buses but instead use extremely wide muxes. This has rapidly diminishing returns.

it comes with a bypass network(or you can't do back to back ALU operations).

Well, forwarded values typically exist outside the register file for in-order cores and they exist in the reorder buffer for out-of-order cores for that reason. RAM ports are simply fundamentally expensive, so many architectures workaround the need for large numbers of accesses within the same cycle by storing them in explicit structures. And yes, high-performance ROBs are hard to physically implement - I don't have any advice on that.

Even for storage dominated use-cases

RAM generators will do better than this, 100% guaranteed. If it doesn't, then it's a problem with the RAM generator.

@rovinski
Copy link
Collaborator

rovinski commented Oct 4, 2025

If we implement enough of the planned features for this RAM generator, it will give you better PPA than any synthesized RAM, almost guaranteed (only exception is if library std cells suck). We are focusing on basic and common use cases right now, which are 1rw RAMs and 1r1w RAMs. With good fortune we can explore higher-port RAMs, but it makes no sense to explore the complicated cases before we've polished the simple ones.

@oharboe
Copy link
Collaborator

oharboe commented Oct 4, 2025

@rovinski Thank you for this excellent account. Very insightful...

For out of order CPUs, the port count goes beyond 3R2W... I asked ChatGPT to create a scaling table... :-)

Metric Synthesized RF Generated RAM
Area O(n²) O(n)
Delay O(n) O(1)–O(n)
Energy O(n²) O(n)
Typical crossover point ~1R1W → 2R1W starts to explode Scales up to 3R2W gracefully

@oharboe
Copy link
Collaborator

oharboe commented Oct 4, 2025

@rovinski Can this be made to work on asap7?

Probably a silly question... I did look for some documentation (.md files), but didn't see any.

@rovinski
Copy link
Collaborator

rovinski commented Oct 4, 2025

@rovinski Can this be made to work on asap7?

I believe so, yes. The generator only requires these cells to work: 1) a flip-flop, 2) an AND gate, 3) a buffer and inverter, 4) a tristate driver, 5) a clock gate. I believe ASAP7 has all of those.

Probably a silly question... I did look for some documentation (.md files), but didn't see any.

When we have something that's ready for an end-user, we'll try to add some.

I asked ChatGPT to create a scaling table...

Not sure where it's getting those complexities, but I wouldn't put too much stake in that, especially for the synthesized ones.

@maliberty
Copy link
Member

For the open PDKs we don't have open RAM generators (openram hasn't seen wide adoption and some academic PDKs don't even have the building blocks). This beats the alternative which is general PnR not a RAM generator.

@braydenlouie braydenlouie deleted the grid-branch branch November 29, 2025 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants