Replies: 14 comments 1 reply
Critique
I thought this paper was a fun read. Denali uses a code generator backed by a SAT solver to arrive at almost mathematically optimal machine code. It does this by constructing an equivalence graph (E-graph) from a set of hand-coded axioms, then performing a binary search over the question "is there a program that computes the goal term within K cycles?". This methodology (at least to me) is very sensible and provably correct, under the assumption that the hand-coded axioms are sufficiently comprehensive. However, I am unconvinced about the real-world performance impact of the resulting code sequence. The paper emphasizes optimal code with respect to an abstract machine model, but does not provide benchmarks on physical hardware to demonstrate the end-to-end speedup. I'm not sure about the state of micro-architectural optimizations when the paper was published, but I feel that concepts such as caching, data forwarding, hazards, and out-of-order execution would have a noticeable effect on the wall-clock execution of such programs. It might be the case that the fastest code is not the shortest in terms of abstract machine cycles. I was also excited about Denali's application in tracing JITs, but I'm not sure an optimization pass that takes multiple minutes is acceptable.
Question
Could super-optimizers be employed by JITs to emit faster machine code? A JIT would know which code segments are executed frequently enough to benefit from a mathematically optimal solution. My main concern is that this still takes too long / too much compute (even if running in a separate thread). This leads me to the next question: could super-optimizers be employed in an offline setting to enhance traditional compiler infrastructure? (For example, we could run super-optimizers on common code patterns to come up with peephole optimizations.)
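The binary-search-over-K loop described above can be sketched in miniature. This is a toy stand-in, not Denali: a brute-force enumeration replaces the SAT query, the three op names and the "compute 5*x" goal are invented, and every op costs one cycle on 8-bit values.

```python
from itertools import product

# Toy "machine": each op takes one cycle; values are 8-bit.
# These op names and the goal (compute 5*x) are invented for illustration.
OPS = {
    "dbl":  lambda v, x: (v * 2) & 0xFF,   # v := v + v
    "addx": lambda v, x: (v + x) & 0xFF,   # v := v + x  (x is the input)
    "inc":  lambda v, x: (v + 1) & 0xFF,   # v := v + 1
}

def goal(x):
    return (5 * x) & 0xFF  # the goal term: 5*x

def run(seq, x):
    """Execute an op sequence on input x."""
    v = x
    for op in seq:
        v = OPS[op](v, x)
    return v

def feasible(k):
    """Denali's core query: is there a program of at most k cycles that
    computes the goal?  Here answered by brute force, not by SAT."""
    for length in range(1, k + 1):
        for seq in product(OPS, repeat=length):
            if all(run(seq, x) == goal(x) for x in range(256)):
                return seq
    return None

def min_cycles(k_max=6):
    """Binary search for the smallest feasible cycle count."""
    lo, hi, best = 1, k_max, None
    while lo <= hi:
        mid = (lo + hi) // 2
        prog = feasible(mid)
        if prog:
            best, hi = prog, mid - 1
        else:
            lo = mid + 1
    return best

print(min_cycles())  # → ('dbl', 'dbl', 'addx'): (2x)*2 + x = 5x in 3 cycles
```

Since `feasible(k)` is monotone in k, the binary search finds the true minimum; in Denali each `feasible` call is a SAT instance rather than an enumeration.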
Critique
'Dead-architectures' {expletive} paper. Funny to see Compaq (formerly DEC) Alpha and Intel Itanium mentioned as the cutting edge. More seriously, it's a pretty cool paper, if a bit light on evaluation (which makes sense, considering how specific the use case is). I guess the effectiveness of the superoptimisation concept depends on how Amdahl's-law-bound real programs are: do they genuinely spend 90% of their time in one loop, or is the time more distributed? And if it is more distributed, do the superoptimisations make a tangible enough difference to warrant their runtime?
Questions
Has something like this been applied to embedded systems before? They tend to be more deterministic (thus constraining the search space), and shaving a few cycles off ISR handling or an RTOS context-switching procedure could be huge.
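The Amdahl's-law arithmetic behind that worry is easy to make concrete. A minimal sketch, assuming (hypothetically) that superoptimisation doubles the speed of the covered code; the 90%/10% coverage figures are the ones posed in the question, not numbers from the paper:

```python
def amdahl(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1 / ((1 - p) + p / s)

# If 90% of the time really is in one superoptimised loop (2x faster):
print(round(amdahl(0.90, 2.0), 2))  # → 1.82

# If the hot time is distributed and the optimised fragments cover only 10%:
print(round(amdahl(0.10, 2.0), 2))  # → 1.05
```

So the whole-program payoff collapses quickly as coverage drops, which is exactly why the "one hot loop" assumption matters.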
Question
Forgetting about optimization for a second: what is the most practical way to incorporate formal methods into real engineering workflows, and are formally correct code generators a better path to correctness than verifying hand-written implementations after the fact?
Critique
The paper presents Denali, a superoptimizer that combines E-graph-based equivalence reasoning with SAT-based scheduling to generate near-optimal straight-line code. While the approach is elegant, it relies on a large set of hand-written axioms, and it's unclear how maintainable or scalable that approach would be for modern ISAs. The E-graph matching phase also seems like it could grow uncontrollably, and the paper gives little sense of how often heuristic cutoffs risk missing better implementations. The "correct-by-design" claim also feels fragile, since any incorrect axiom or encoding silently breaks correctness. And it's not obvious how frequently developers actually need to hand-optimize new critical assembly loops today, which makes it harder to gauge the practical impact of a tool like this.
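The growth concern can be made concrete with a minimal sketch. This is not Denali's E-graph (an E-graph shares subterms to stay compact); it is a naive term-by-term closure under two invented axioms, which shows how even a tiny axiom set multiplies the set of equivalent terms:

```python
def step(term):
    """All terms reachable from `term` by one application of a toy axiom."""
    out = set()
    if isinstance(term, tuple):
        op, a, b = term
        if op in ('+', '*'):                 # axiom 1: commutativity
            out.add((op, b, a))
        if op == '*' and b == '4':           # axiom 2: strength reduction, t*4 == t<<2
            out.add(('<<', a, '2'))
        for ra in step(a):                   # axioms also apply inside subterms
            out.add((op, ra, b))
        for rb in step(b):
            out.add((op, a, rb))
    return out

def closure(term, max_terms=1000):
    """The equivalence class of `term` under the axioms -- what an E-graph
    represents compactly, here materialised term by term (with a cutoff)."""
    seen, frontier = {term}, {term}
    while frontier and len(seen) < max_terms:
        frontier = {t for f in frontier for t in step(f)} - seen
        seen |= frontier
    return seen

cls = closure(('*', ('+', 'x', 'y'), '4'))
print(len(cls))  # → 6 distinct but equivalent terms, from just two axioms
```

With associativity, distributivity, and dozens of architectural axioms in play, the class sizes explode, which is why heuristic cutoffs (like `max_terms` here) become unavoidable and why missed-better-implementation risk is worth quantifying.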
Critique
I found this paper on Denali to be really cool. It's impressive that the system can rival or even surpass expert-level human coding in specific cases. However, my primary critique is the paper's limited benchmarking suite. The authors present only a few benchmarks, such as byte swap and checksum. While these are instructive for demonstrating Denali's capabilities, the small and relatively simple set of examples makes it difficult to assess the system's general performance and practicality.
Questions
How does Denali's performance scale with the complexity of the input code?
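For readers who haven't seen the byte-swap benchmark, the flavour of straight-line code being searched for looks like this. This is a generic shift-and-mask byte swap checked against a naive reference, not Denali's actual output:

```python
def bswap32_naive(x):
    """Reference: reorder the four bytes of a 32-bit word."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")

def bswap32_fast(x):
    """Shift-and-mask byte swap -- the kind of short, branch-free
    sequence a superoptimizer searches for."""
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8)  |
            ((x & 0x00FF0000) >> 8)  |
            ((x & 0xFF000000) >> 24))

# Spot-check the two implementations against each other.
for v in (0, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert bswap32_fast(v) == bswap32_naive(v)

print(hex(bswap32_fast(0x12345678)))  # → 0x78563412
```

Even this small example hints at the scaling question: every extra input bit-width, register, or instruction multiplies the search space, so it would be valuable to know where Denali's minutes-long runtimes start to blow up.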
Critique
Denali introduces an optimizer that generates close-to-optimal machine code for small but critical code snippets where typical compiler optimizations fall short. Denali uses SAT solving to search for code that comes mathematically closest to the optimal cycle count on modern architectures. My main concern is that Denali relies heavily on manually written axioms, which does not scale. When new architectures are inevitably developed, these axioms would likely need to be updated or rewritten, making Denali very high-maintenance.
Critique
I sort of find it funny that this paper answers a question similar to the one I asked in class when we discussed function inlining: could we try all possible combinations of inlining until we found an optimal one? The problem of "superoptimizing" makes total sense, and I'm also happy with the approach of automatically proving the correctness of potential reorderings, as opposed to the test-case approach from prior work.
Question
Since the early 2000s we have developed WAY more powerful techniques (SMT) and systems (Lean, Z3) for automatic theorem proving and verification. Do these make a difference at all for superoptimizing with verification? How is this done these days?
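The verification step those tools automate can be sketched as follows. This is a brute-force stand-in that enumerates all 8-bit inputs; a real pipeline would hand the same question to an SMT solver such as Z3 over bit-vector theory, which proves it symbolically for 32- or 64-bit words:

```python
MASK = 0xFF  # model 8-bit machine words

def equivalent(f, g):
    """Decide whether two candidate code fragments compute the same function.
    Here: exhaustive enumeration (feasible only at 8 bits); in practice: an
    SMT query over bit-vectors, which scales to realistic word sizes."""
    return all(f(x) & MASK == g(x) & MASK for x in range(256))

# A classic peephole rewrite: multiply-by-two vs. left shift.
print(equivalent(lambda x: x * 2, lambda x: x << 1))            # → True

# And a rewrite that is NOT valid: (x+1)*2 vs. x*2 + 1.
print(equivalent(lambda x: (x + 1) * 2, lambda x: x * 2 + 1))   # → False
```

The contrast with the test-case approach from prior work is that an SMT "unsat" result rules out *every* counterexample, not just the inputs someone thought to try.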
Critique
A big part of making Denali work seems to be coming up with good axioms (both mathematical and architectural) so that the matching algorithm stays tractable and actually finds good results. I'll propose a similar critique of the SAT-solving section of the paper. Maybe I've been blinded by hindsight, as these techniques (as others above mention) have become much more popular nowadays, but it feels like finding these constraints is the hard part, and yet the authors give no methodical way of finding them and make no strong argument for why they can't. Really, this issue seems pretty glossed over to me.
Question
Section 7 on additional constraints gives an example of a GMA that might require additional constraints, but doesn't explain them concretely. What might the concrete constraint for the given example be? In general, how would this and other constraints change based on the architecture Denali is targeting?
Questions
What are some similarities/differences between modern superoptimizers and Denali? How would Denali perform if rebuilt today?
Reading: Denali: A Goal-Directed Superoptimizer
Discussion Leads: Thomas McFarland (@tf-mac), Nipat Chenthanakij (@Mond45), Vesal Bakhtazad (@VesalBakhtazad)