Cts: create new balance latencies method#7823
Cts: create new balance latencies method#7823maliberty merged 77 commits intoThe-OpenROAD-Project:masterfrom
Conversation
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…avg arrival time Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…e the lantancies Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…tree to insert delay buffers Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…balance_latencies_class
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…avg arrival time Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…e the lantancies Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
…tree to insert delay buffers Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
Signed-off-by: arthurjolo <arthurjolo@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
Is there a metrics update PR? |
| gated_clock1 | ||
| gated_clock2 | ||
| gated_clock3 | ||
| gated_clock4 | ||
| gated_clock5 |
src/cts/src/TritonCTS.cpp
Outdated
| techChar_->getLengthUnit(), | ||
| capPerDBU); | ||
| totalDelayBuff += balancer.run(); | ||
| parasitics_guard.update(); |
There was a problem hiding this comment.
Move the guard to just inside the if-statement and the end of the scope will handle the update. The goal of the guard is to avoid needing to manually call update and to be exception safe.
src/cts/src/LatencyBalancer.cpp
Outdated
| std::string newNetName | ||
| = "delaynet_" + std::to_string(delayBufIndex_) + "_" + clockName; | ||
| odb::dbNet* newNet = odb::dbNet::create(block, newNetName.c_str()); | ||
| newNet->setSigType(odb::dbSigType::CLOCK); | ||
|
|
||
| // create a new delay buffer | ||
| std::string newBufName | ||
| = "delaybuf_" + std::to_string(delayBufIndex_++) + "_" + clockName; | ||
| odb::dbMaster* master = db_->findMaster(options_->getRootBuffer().c_str()); | ||
| odb::dbInst* newBuf = odb::dbInst::create(block, master, newBufName.c_str()); |
There was a problem hiding this comment.
This code should be the same as what we already have in CTS for inserting delay buffers (TritonCTS::insertDelayBuffer), I see that this code has changed, probably to deal with hierarchy. So probably what we have on this PR should be changed as well.
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
|
||
| // Create a new delay buffer and connect output pin of driver to input pin of | ||
| // new buffer. Output pin of new buffer will be connected later. | ||
| odb::dbInst* LatencyBalancer::createDelayBuffer(odb::dbNet* driverNet, |
There was a problem hiding this comment.
@jhkim-pii can you give your review on this function to insure hierarchy is fine.
I tried to maintain the idea you used on TritonCTS::insertDelayBuffer, so the delay buffers are on the same module as their drivers, this way we avoid creating any extra ports on the modules for the worst cases.
There was a problem hiding this comment.
It looks good. As you mentioned, this makes a flat connection b/w the driver and the new buffer. So it does not need any hierarchy handling.
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
|
||
| // If it is not a leaf node compute the amount of buffers needed for its | ||
| // children | ||
| std::vector<odb::dbITerm*> sinksInput; |
There was a problem hiding this comment.
In a hierarchical flow, sinksInput can be located at different hierarchies, right?
Then, a hierarchy handling is required when the sink pin is disconnected and reconnected to the output pin of the new buffer because the hierarchy of the driver and the sink instance are different.
A simple solution is to use dbNetwork::hierarchicalConnect(dbITerm* source_pin, dbITerm* dest_pin, const char* connection_name) for the connection b/w the buffer output and the sink. hierarchicalConnect() automatically handles the hierarchy.
But its drawback is that it does not reuse the existing hierarchical ports and punches many new hierarchical ports.
There was a problem hiding this comment.
@arthurjolo Could you please add a test case with hierarchical flow? I think we can find issues from the hierarchical test case.
There was a problem hiding this comment.
There is a testcase hier_insertion_delay but maybe it is too simple.
There was a problem hiding this comment.
Is it necessary to do anything else at the disconnect? If i disconnect the pins and create a new hierarchical port than the previous connection port is dangling right?
There was a problem hiding this comment.
Yes. hierarchicalConnect() will create a new port and make the previous dangling. It is the drawback of the current hierarchicalConnect().
I will enhance hierarchicalConnect() to fully reuse the existing nets and ports to minimize the port punching.
There was a problem hiding this comment.
I modified testcase gated_clock4 to have a simple hierarchy and added hierarchicalConnect() for the delay buffers and its sinks. On the verilog output of gated_clock4 it looks like there the dangling ports are deleted but @jhkim-pii if you could take a look to confirm it is ok.
There was a problem hiding this comment.
Yes. It is the expected behavior. hierarchicalConnect() removes unused ports and pins in the end. I forgot to mention it yesterday.
void dbEditHierarchy::hierarchicalConnect(dbITerm* source_pin,
dbITerm* dest_pin,
const char* connection_name)
...
// 3.6. During the addition of new ports and new wiring we may
// leave orphaned pins, clean them up.
std::set<dbModInst*> cleaned_up;
for (auto module_to_clean_up : source_parent_tree) {
dbModInst* mi = module_to_clean_up->getModInst();
if (mi) {
dlogHierConnCleaningUpSrc(mi);
mi->removeUnusedPortsAndPins();
cleaned_up.insert(mi);
}
}
for (auto module_to_clean_up : dest_parent_tree) {
dbModInst* mi = module_to_clean_up->getModInst();
if (mi) {
if (cleaned_up.find(mi) == cleaned_up.end()) {
dlogHierConnCleaningUpDst(mi);
mi->removeUnusedPortsAndPins();
cleaned_up.insert(mi);
}
}
}
}
dlogHierConnDone();
}
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
Signed-off-by: arthurjolo <arthurjl@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
Started a new CI to update the metrics. I was having an error in DPL after grt for one design, but it looks like it was fixed. |
|
If the metrics are fine we can merge |
|
Metrics are fine, there is one PR to update some small failures. However, I have one timeout in the private CI. |
Needs #7821
This PR creates a new clock latencies balancer. Currently CTS only balance latencies between macros and registers trees, but when we have multiple sub-trees (eg.: multiple clock gaters) the sub-trees are not balanced between them eg.:
unity test gated_clock3 with master

This PR proposes a new balance latencies method, that takes into account all sub-trees connected to a same clock source, the LatenciesBalancer. This class builds a graph that connects all sub-trees of the clock and travels throw it to insert delay buffers where its needed. The delay buffers are inserted as high up in the tree as possible, and try to share the mod amount of buffers as possible as well. A more detailed explanation of the algorithm can be found here.
This changes help improve skew specially for designs with multiple clock gaters, where the problem mention tends to appear, eg.:
unity test gated_clock3 with this PR
