Refactoring of the Resizer Module #6884

mguthaus · 2025-03-20T16:59:02Z

I'd like to suggest (and implement) a refactored architecture for the resizer
code. The proposed architecture would add significant flexibility by making it
easier to add optimization moves and to apply them in different orders and at
varying levels of perturbation (e.g., post-place, post-grt, post-cts,
post-drt).

baseMove: A baseline "move" class that implements a transform. At a minimum, a move will
contain methods to:

Apply the move
Journal/Undo a move (using beginEco/endEco by default, but special cases may exist for performance)
Evaluate the move (may be local evaluation rather than full incremental STA)
Legalize the move (may be nothing and default to legalization in subsequent phase)
For example, we would have moves: resizeMove, bufferMove, cloneMove,
pinSwapMove, etc. A common base class provides the structure that makes it easy
to add further moves, such as a relocateMove that simply moves a
gate. An ecoMove is also a good option for legalization, since some moves may
give legalization little consideration.
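A minimal C++ sketch of what the baseMove interface could look like follows. `Design`, `BaseMove`, and `ResizeMove` are hypothetical names for illustration only, not existing rsz classes; the toy `Design` just maps an instance name to a drive strength, and undo restores a remembered value instead of going through beginEco/endEco.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the design database: instance name -> drive strength.
struct Design {
  std::map<std::string, int> cellSize;
};

// Hypothetical base class with the four responsibilities listed above.
class BaseMove {
 public:
  virtual ~BaseMove() = default;
  // Apply the move to one instance; return false if it does not apply.
  virtual bool apply(Design& design, const std::string& inst) = 0;
  // Revert the last successful apply().
  virtual void undo(Design& design) = 0;
  // Local evaluation of the last applied move; higher is better.
  virtual double evaluate(const Design& design) const = 0;
  // Default: do nothing and leave legalization to a later phase.
  virtual void legalize(Design& /*design*/) {}
};

// Toy resizeMove: doubles the drive strength and remembers the old value for undo.
class ResizeMove : public BaseMove {
 public:
  bool apply(Design& design, const std::string& inst) override {
    auto it = design.cellSize.find(inst);
    if (it == design.cellSize.end()) return false;
    inst_ = inst;
    oldSize_ = it->second;
    it->second = oldSize_ * 2;  // stand-in for "pick the next larger cell"
    return true;
  }
  void undo(Design& design) override { design.cellSize[inst_] = oldSize_; }
  double evaluate(const Design& design) const override {
    // Placeholder metric: a stronger driver scores higher.
    return design.cellSize.at(inst_);
  }

 private:
  std::string inst_;
  int oldSize_ = 0;
};
```

Giving legalize() an empty default, rather than making it pure virtual, matches the idea that most moves defer legalization to a later phase.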

moveSequence: A class that implements a sequence of moves to be applied, in
order, to a gate/net. Currently, there is a single hard-coded sequence, with
some options to disable steps of it (e.g. skip_pin_swap,
skip_gate_cloning, etc.). For example, an early post-grt sequence could be:
[cloneMove, bufferMove, ecoMove, resizeMove]
where an early ECO may be desirable.
A later post-drt sequence could be:
[swapMove, resizeMove, ecoMove]
It is also possible to leave out legalization entirely and do it after the
resizer is done, in another step of the flow.
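The sequence itself can be little more than an ordered list of moves applied to one candidate. In this sketch, `Move`, `MoveSequence`, and `recordingMove` are hypothetical, and a move is modeled as a name plus a callable that reports whether it applied, which sidesteps the class hierarchy for brevity.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical design stand-in that just records which moves ran.
struct Design {
  std::vector<std::string> log;
};

// Hypothetical move: a name plus a callable that returns true if it applied.
struct Move {
  std::string name;
  std::function<bool(Design&, const std::string&)> apply;
};

// Applies its moves, in order, to a single candidate gate/net.
class MoveSequence {
 public:
  explicit MoveSequence(std::vector<Move> moves) : moves_(std::move(moves)) {}

  // Returns the names of the moves that actually applied.
  std::vector<std::string> run(Design& design, const std::string& candidate) const {
    std::vector<std::string> applied;
    for (const Move& move : moves_) {
      if (move.apply(design, candidate)) applied.push_back(move.name);
    }
    return applied;
  }

 private:
  std::vector<Move> moves_;
};

// Toy move that only logs "<name>@<candidate>" and always succeeds.
Move recordingMove(const std::string& name) {
  return {name, [name](Design& design, const std::string& candidate) {
            design.log.push_back(name + "@" + candidate);
            return true;
          }};
}
```

The early post-grt recipe above would then be a MoveSequence built from clone, buffer, eco, and resize moves, in that order.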

The standard move "recipes" will be predefined, but ultimately these sequences
could also be user-specified. Such capabilities would allow us to experiment
and dial in which sequences are best through tuning and regression. It is also
realistic that some users may want to customize the resizer in arbitrary ways
for their specific design needs.
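If recipes become user-specified, one possible (assumed) form is a comma-separated list of move names validated against a registry. `kKnownMoves` and `parseRecipe` are hypothetical; rejecting unknown names up front avoids failing halfway through an optimization run.

```cpp
#include <optional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical registry of move names a user may reference in a recipe.
const std::set<std::string> kKnownMoves = {"resize", "buffer", "clone", "swap", "eco"};

// Parse a recipe such as "clone,buffer,eco,resize" into an ordered list of
// move names. Returns std::nullopt if the recipe is empty or names an unknown move.
std::optional<std::vector<std::string>> parseRecipe(const std::string& text) {
  std::vector<std::string> sequence;
  std::stringstream stream(text);
  std::string name;
  while (std::getline(stream, name, ',')) {
    if (kKnownMoves.count(name) == 0) return std::nullopt;
    sequence.push_back(name);
  }
  if (sequence.empty()) return std::nullopt;
  return sequence;
}
```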

QoR: A "QoR" class would define which result metric is evaluated. At a
minimum, this would be one of: WNS or TNS (for setup or hold), area recovery,
DRV (max slew, max cap, fanout, length) but could also be a combination of the
above such as WNS with an area limit or area recovery with a positive slack
margin limit. For hold, for example, this could have
options to fix hold with or without allowing additional setup violations. By
simply changing the QoR class, both repair_timing and repair_design can be
implemented. This class may return results from a single corner of a
multi-corner STA.
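The basic metrics named above are simple to state over a set of endpoint slacks for one corner. The sketch below assumes slacks in ps and treats WNS as the worst slack clamped at zero, a common convention but an assumption here. `TimingSnapshot` and `wnsWithAreaLimitImproves` are hypothetical; the latter shows how a combined QoR (WNS with an area limit) can reduce to a comparison between two snapshots.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical input to a QoR object: endpoint slacks (ps) for one corner
// plus total cell area.
struct TimingSnapshot {
  std::vector<double> endpointSlacks;  // negative = violating
  double area = 0.0;
};

// WNS: the worst slack, clamped to 0 when nothing violates.
double wns(const TimingSnapshot& s) {
  double worst = 0.0;
  for (double slack : s.endpointSlacks) worst = std::min(worst, slack);
  return worst;
}

// TNS: the sum of all negative slacks.
double tns(const TimingSnapshot& s) {
  double total = 0.0;
  for (double slack : s.endpointSlacks) {
    if (slack < 0.0) total += slack;
  }
  return total;
}

// Hypothetical combined QoR, "improve WNS under an area limit": returns true
// if `after` should be preferred over `before`.
bool wnsWithAreaLimitImproves(const TimingSnapshot& before, const TimingSnapshot& after,
                              double maxArea) {
  if (after.area > maxArea) return false;
  return wns(after) > wns(before);
}
```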

moveCandidates: This class would perform selection and ordering of the
specific gates/nets to optimize. Depending on the QoR metric, you may want to
identify candidates based on DRV violation, WNS, TNS, positive slack (for area
recovery), etc. There may be options for guard banding (e.g., setup_slack_margin
and hold_slack_limit_ratio). This class will also have options for choosing
path-based or breadth-first candidates. Different orders give
different results depending on the number of near-critical paths, design
structure, etc. This class may select candidates from either a single
corner or multiple corners during timing closure, which could differ from the
QoR multi-corner settings. For example, you may want to focus on a single slow
corner but measure QoR on all the corners.
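Candidate selection with a guard band and a focus corner might look like the following. `Endpoint` and `selectCandidates` are hypothetical; the margin plays the role that setup_slack_margin plays today, and the corner index picks which corner drives the ordering, independently of how QoR is measured. Only worst-first ordering is shown, not path-based or breadth-first traversal.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical endpoint record with one slack (ps) per analysis corner.
struct Endpoint {
  std::string name;
  std::vector<double> slackPerCorner;
};

// Select endpoints whose slack in `focusCorner` is below `slackMargin`,
// ordered worst-first (ties keep their input order).
std::vector<std::string> selectCandidates(const std::vector<Endpoint>& endpoints,
                                          std::size_t focusCorner, double slackMargin) {
  std::vector<const Endpoint*> picked;
  for (const Endpoint& ep : endpoints) {
    if (ep.slackPerCorner.at(focusCorner) < slackMargin) picked.push_back(&ep);
  }
  std::stable_sort(picked.begin(), picked.end(),
                   [focusCorner](const Endpoint* a, const Endpoint* b) {
                     return a->slackPerCorner[focusCorner] < b->slackPerCorner[focusCorner];
                   });
  std::vector<std::string> names;
  for (const Endpoint* ep : picked) names.push_back(ep->name);
  return names;
}
```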

optimize (looking for a better name): This class performs the multiple passes and
iterations over the candidate lists. The arguments are a move sequence, a QoR metric,
and a candidate selection class, as well as options for the number of iterations and
passes, and whether QoR degradation is allowed in order to escape local minima. This
would be very similar to the loop in the current repairSetup. Ultimately, this may be
refactored further into separate single-pass and multi-pass routines.
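A possible shape for the pass/iteration loop is sketched below, assuming the moveSequence, QoR, and moveCandidates objects are reached through callbacks. `OptimizerHooks`, `OptimizerOptions`, and `optimize` are hypothetical names; `allowedDegradation` is one assumed way to express "negative QoR allowed", by accepting moves that lose at most that much QoR.

```cpp
#include <functional>
#include <vector>

// Hypothetical hooks into the other classes; real code would call the
// moveSequence, QoR, and moveCandidates objects directly.
struct OptimizerHooks {
  std::function<std::vector<int>()> candidates;  // ordered candidate ids
  std::function<bool(int)> applySequence;        // try the sequence on one candidate
  std::function<void()> undo;                    // revert the last applied sequence
  std::function<double()> qor;                   // higher is better
};

struct OptimizerOptions {
  int maxPasses = 10;
  int maxIterations = 1000;         // candidate visits across all passes
  double allowedDegradation = 0.0;  // > 0 accepts small QoR losses to escape minima
};

// Runs passes over the candidate list until a pass brings no improvement or a
// limit is hit. Returns the number of accepted sequences.
int optimize(const OptimizerHooks& hooks, const OptimizerOptions& opt) {
  int accepted = 0;
  int iterations = 0;
  for (int pass = 0; pass < opt.maxPasses; ++pass) {
    bool improved = false;
    for (int cand : hooks.candidates()) {
      if (iterations++ >= opt.maxIterations) return accepted;
      const double before = hooks.qor();
      if (!hooks.applySequence(cand)) continue;
      const double after = hooks.qor();
      if (after >= before - opt.allowedDegradation) {
        ++accepted;
        if (after > before) improved = true;
      } else {
        hooks.undo();
      }
    }
    if (!improved) break;  // converged: a full pass with no improvement
  }
  return accepted;
}
```

A real implementation would probably also record the best state seen and restore it at the end, since accepting degradations can otherwise leave the design worse than where it started.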

The current repair_timing and repair_design can easily be refactored to the
above without significantly modifying the functionality of the existing code at
first. The initial, default sequence would be taken from the existing
repairSetup.cc, repairHold.cc, and repairDesign.cc:

Setup:

remove driver (step 1)
upsize driver (step 2)
rebuffer (step 3)
swap pin (step 4)
split loads (step 5)
Hold:
insert delay
Design (DRV):
gain based buffering
make region repeaters
etc.

I would start by implementing only the repair_timing setup and hold optimizations
in the refactor, while leaving repair_design largely alone. The existing
repair_timing arguments would configure the moveSequence to provide the same
functionality. Once the main classes are functional, we can then extend them to
include repair_design.
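Mapping the existing options onto the default sequence could be as simple as filtering the setup step list above. The step names and the `kSkipOptionForStep` table are hypothetical; only skip_pin_swap is wired up here, and other skip options would be added to the table the same way.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Default setup steps, in the order listed in the proposal.
const std::vector<std::string> kDefaultSetupSequence = {
    "remove_driver", "upsize_driver", "rebuffer", "swap_pin", "split_loads"};

// Hypothetical table: which repair_timing option disables which step.
const std::map<std::string, std::string> kSkipOptionForStep = {
    {"swap_pin", "skip_pin_swap"}};

// Build the setup moveSequence (as step names) from the options the user gave.
std::vector<std::string> buildSetupSequence(const std::set<std::string>& options) {
  std::vector<std::string> sequence;
  for (const std::string& step : kDefaultSetupSequence) {
    auto skip = kSkipOptionForStep.find(step);
    if (skip != kSkipOptionForStep.end() && options.count(skip->second) > 0) continue;
    sequence.push_back(step);
  }
  return sequence;
}
```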

@maliberty

maliberty · 2025-03-20T18:59:35Z

Maintainer

@precisionmoon @povik @QuantamHD

6 replies

precisionmoon Mar 20, 2025
Collaborator

@mguthaus, thanks much for putting a proposal together. Yes, OR will definitely benefit from such refactoring. A few suggestions:

change "move" to "transform" or "xform". In the future, we may need a cell movement transform to fix IR drop violations.
"optimize" can change to "apply" / "commit"
all transforms need to satisfy some template for their application, evaluation and undo. Very nice!
QoR evaluation involving TNS may be overkill, as TNS needs a complete timing update. We can define a local "sand box" (1 level of fanin + cell under eval + 2 levels of fanout?) and define pass/fail metrics based on this structure.
it'd be nice to go beyond this fixed sequence of transforms such that we can try multiple transforms in parallel and pick the best one. But this will be a stretch. Also, we want to be able to stop the sequence in the middle. For example, for hold fixing, if cell sizing fixed the violation, we don't need buffering.
I'm not sure if we want to include legalization as a transform. It can create non-local logical changes that are hard to track. It's probably best to do legalization as one batch step after optimization.
some nice-to-have transforms include
VT swap (for timing and for power)
multi-cell transforms (replace buffer pair with inverter pair)
negative transform (this damages QoR but opens up opportunities for others) For example, for power optimization, a low switching cell can be upsized to create positive slack that can be consumed by high switching cells.
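The local "sand box" suggested above can be gathered with a breadth-first walk in each direction. `Netlist`, `collect`, and `sandbox` are hypothetical, and gates stand in for pins and nets; a real version would likely also want the side loads on the fanin nets, since changing the cell's input capacitance affects them.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Adjacency = std::map<std::string, std::vector<std::string>>;

// Hypothetical gate-level connectivity in both directions.
struct Netlist {
  Adjacency fanout;  // gate -> gates it drives
  Adjacency fanin;   // gate -> gates driving it
};

// Breadth-first walk from `start` along `edges`, adding every gate within
// `depth` levels (including `start`) to `out`.
void collect(const Adjacency& edges, const std::string& start, int depth,
             std::set<std::string>& out) {
  std::queue<std::pair<std::string, int>> frontier;
  std::set<std::string> seen{start};
  frontier.push({start, 0});
  while (!frontier.empty()) {
    const std::string gate = frontier.front().first;
    const int level = frontier.front().second;
    frontier.pop();
    out.insert(gate);
    if (level == depth) continue;
    auto it = edges.find(gate);
    if (it == edges.end()) continue;
    for (const std::string& next : it->second) {
      if (seen.insert(next).second) frontier.push({next, level + 1});
    }
  }
}

// Sand box = fanin levels + the cell under evaluation + fanout levels.
std::set<std::string> sandbox(const Netlist& nl, const std::string& cell,
                              int faninLevels = 1, int fanoutLevels = 2) {
  std::set<std::string> window;
  collect(nl.fanin, cell, faninLevels, window);
  collect(nl.fanout, cell, fanoutLevels, window);
  return window;
}
```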

mguthaus Mar 20, 2025
Author

@povik:

repair_timing vs repair_design: It's not just the objective, but also the QoR selected, the xforms used, and their sequence. The application of the iterations/passes is essentially the same in all of these methods. You can definitely do early termination. I get what you are asking, though: the evaluation code in each xform has to correspond to the QoR used.
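Early termination (for example, skipping buffering once resizing has already fixed a hold violation) fits naturally into the sequence runner as a check between steps. `Step` and `runUntilFixed` are hypothetical; `fixed` stands for whatever QoR-specific test applies, such as the endpoint's hold slack being non-negative.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical step: a named action on the current candidate.
struct Step {
  std::string name;
  std::function<void()> apply;
};

// Run the steps in order, checking `fixed` before each one, and stop as soon
// as the violation is resolved. Returns the names of the steps that ran.
std::vector<std::string> runUntilFixed(const std::vector<Step>& steps,
                                       const std::function<bool()>& fixed) {
  std::vector<std::string> ran;
  for (const Step& step : steps) {
    if (fixed()) break;
    step.apply();
    ran.push_back(step.name);
  }
  return ran;
}
```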

repair_power could also use the same infrastructure, yes.

In terms of smaller steps, I suspect:

Create the xform classes and use that in the existing code.
Create the sequence/apply routines and replace what we have for repair_timing -setup
Create a very simple QoR class for WNS, TNS and update the above using it
Then do the same for repair_timing -hold
Then we can discuss repair_design again.
I'd say that all of the above would be without legalization for now, since we'd probably be focusing on post-grt and post-cts for the most bang for the buck.

@precisionmoon:

I agree, xform is probably a better name. I was thinking "move" in terms of the search space, not moving a cell but xform is more general.
TNS can be very useful early on or when doing multi-corner later on. Some benchmarks (e.g. highly datapath-centric ones) will have many nearly equal paths, so looking only at WNS will just bounce between paths as each in turn becomes the most critical. My research at IBM was on statistical gate sizing, which, like TNS, gives a view into the "depth" of criticality that WNS cannot provide. It is sometimes better to make a move that gives a small WNS improvement but fixes many paths at once.

The sand box approach is definitely a good option and can be implemented in an xform's evaluate function since this changes depending on the xform. I did this with gate sizing previously.

I hadn't seen this option anywhere before, but it seems like a reasonable extension. It would require some benchmarking to determine whether it is worth it vs. just running several different sequences.
This actually becomes very critical as you get more detailed in the flow. If you resize, for example, but it doesn't fit, you may actually make the design worse after legalization. This can result in a lot of churn and needless moves.
Yes to all of them! There are many many options here.
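A small worked example with made-up slacks (in ps, four nearly equal violating paths) shows why TNS can rank moves differently from WNS. Move A helps only the worst path; move B helps all four paths a little.

```latex
\mathrm{WNS} = \min\bigl(0,\ \min_i s_i\bigr), \qquad \mathrm{TNS} = \sum_i \min(0,\ s_i)

\text{before: } s = (-10, -9, -9, -9) \;\Rightarrow\; \mathrm{WNS} = -10,\ \mathrm{TNS} = -37

\text{move A: } s = (-8, -9, -9, -9) \;\Rightarrow\; \mathrm{WNS} = -9,\ \mathrm{TNS} = -35

\text{move B: } s = (-9.5, -8, -8, -8) \;\Rightarrow\; \mathrm{WNS} = -9.5,\ \mathrm{TNS} = -33.5
```

A WNS-only QoR prefers A (WNS improves by 1 ps versus 0.5 ps), while a TNS-based QoR prefers B (TNS improves by 3.5 ps versus 2 ps).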

povik Mar 20, 2025
Collaborator

Not to bikeshed but I prefer "move" given that it'll be all over the code -- having a monosyllabic everyday word can be nice. We can have "relocate move" or similar once we need a move which represents physical cell movement.

maliberty Mar 20, 2025
Maintainer

I think terminology is very important to get right. Move may connote spatial translation rather than optimization. Perhaps step, tweak, tactic, action, or edit? Personally, I like step, as it fits naturally with sequence.

mguthaus Mar 20, 2025
Author

I'd probably stick with move. I do like sequence. The QoR could be evaluator, maybe. I'm still not sure about the optimize/pass/iterate/transform routine.

maliberty · 2025-03-20T22:10:46Z

Maintainer

I like the proposal. I think a good next step could be to make a PR that just defines the new interfaces in C++ so that we can agree on them in more detail. That could be a simple PR to start.


maliberty · 2025-03-20T22:10:52Z

Maintainer

@gadfort FYI


maliberty · 2025-03-20T22:19:13Z

Maintainer

It might be useful to look at dpo. It has a set of transforms and a mini-language to sequence them. It also defines objective functions (evaluators). We aren't taking full advantage of it and just use a fixed recipe, but the architecture has a somewhat similar feel. I wonder if we can generalize this mechanism enough to make it not rsz-specific. I think iEDA has done a good job of separating out a library of evaluation classes.

I think you should also consider that placement legalization is not part of rsz. If you want it to be undo-able you will need to rely on odb's journaling as the mechanism.

I did some work on parallel transforms in the past (pick one of N trials). The db support for that gets very complex quickly and the bugs truly horrendous when it fails. I suggest we punt that for quite a while.

2 replies

mguthaus Mar 20, 2025
Author

Did a quick glance at dpo and that is similar, yes. I'll take more of a look.

The journaling and legalization are technically two separate things. I agree we should reuse things where possible.

A common legalization technique for single moves during detailed optimization is a "spiral" search to find the closest free space to the ideal location where the cell fits (and is legal). It is a single-gate legalization that is quite fast and doesn't disturb other gates. It can also be applied during global optimization, but it is usually less needed there, because space is a softer constraint after global placement.
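A sketch of such a single-gate search follows. It is not dpl's implementation: it assumes a uniform site grid where a cell occupies `width` consecutive sites in one row, and it visits locations in rings of increasing Manhattan distance, so the first fit found is among the closest in that metric. `Grid`, `fits`, and `nearestFreeSite` are hypothetical names.

```cpp
#include <cstdlib>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical site grid: grid[row][col] is true when that site is occupied.
using Grid = std::vector<std::vector<bool>>;

// True if `width` consecutive free sites start at (row, col).
bool fits(const Grid& grid, int row, int col, int width) {
  if (row < 0 || row >= static_cast<int>(grid.size()) || col < 0) return false;
  if (col + width > static_cast<int>(grid[row].size())) return false;
  for (int c = col; c < col + width; ++c) {
    if (grid[row][c]) return false;
  }
  return true;
}

// Visit rings of increasing Manhattan distance around the ideal (row, col) and
// return the first location where the cell fits, or std::nullopt if nothing
// fits within maxRadius. Only this one cell is placed; neighbors never move.
std::optional<std::pair<int, int>> nearestFreeSite(const Grid& grid, int row, int col,
                                                   int width, int maxRadius) {
  for (int r = 0; r <= maxRadius; ++r) {
    for (int dr = -r; dr <= r; ++dr) {
      const int dc = r - std::abs(dr);
      // Each row offset has two ring points (left and right), or one when dc == 0.
      if (fits(grid, row + dr, col - dc, width)) return std::make_pair(row + dr, col - dc);
      if (dc != 0 && fits(grid, row + dr, col + dc, width)) {
        return std::make_pair(row + dr, col + dc);
      }
    }
  }
  return std::nullopt;
}
```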

The journaling would just need to remember that we put the cell there, reroute (possibly), and then move it back to the original location if we reject the move. If we're doing it post-drt, this is definitely something we don't want to handle in rsz by itself!
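The placement side of this journaling reduces to remembering each cell's original location. `Point`, `Placement`, and `RelocationJournal` are hypothetical stand-ins (in OpenROAD, odb's journaling would presumably play this role), and rerouting is not modeled.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Point {
  int x = 0;
  int y = 0;
};

// Hypothetical placement store: instance name -> location.
using Placement = std::map<std::string, Point>;

// Records original locations so a rejected relocation can be rolled back.
class RelocationJournal {
 public:
  void move(Placement& placement, const std::string& inst, Point to) {
    entries_.push_back({inst, placement.at(inst)});
    placement[inst] = to;
  }
  // Reject: restore in reverse order so repeated moves of one cell unwind correctly.
  void undo(Placement& placement) {
    for (auto it = entries_.rbegin(); it != entries_.rend(); ++it) {
      placement[it->first] = it->second;
    }
    entries_.clear();
  }
  // Accept: forget the history.
  void commit() { entries_.clear(); }

 private:
  std::vector<std::pair<std::string, Point>> entries_;
};
```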

maliberty Mar 20, 2025
Maintainer

dpl does that already (called the diamond search). It does have fallback strategies if the spiral fails, but those could be disabled during rsz calls. We are already using it in post-grt opt in a more basic form (pushing cells off macros/blockages, as rsz isn't smart about them).
