(Question) Which jb phases in Soot are required for creating the Jimple body and full callgraph, and which are optional optimizations? #2270
Replies: 2 comments
Well, this is a tough question, which might be why there was no answer yet. I'll try to give my take on this, maybe @StevenArzt has some more insights.
Well, this depends on whether you want to have typed Jimple variables. If not, the output of the method source itself is already valid Jimple code. Given that there could be bugs in any subsequent phase, it might even be the most correct one.
I think it would be best to take a deep look into the implementation and comments of each pass to see why it was implemented in the first place. Generally, each pass should retain the semantics of the Jimple code, so the passes could be applied later without any problem. Currently, the best way I know to check whether a specific configuration with disabled Soot phases works is to actually read in a large number of applications, write them out again, run them, and test whether they still work and are functionally equivalent to the original program.
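A minimal sketch of how such a "disabled phases" configuration could be assembled for a round-trip run, assuming Soot's command-line options `-cp`, `-process-dir`, `-f` and `-p PHASE enabled:false`. The class name `JbPhaseArgs` and the paths are hypothetical, and the sketch only builds the argument list; handing it to Soot and comparing the rewritten program against the original is left out. Note that disabling a phase this way does not stop a method source from running transformers on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Soot: it only assembles command-line arguments.
final class JbPhaseArgs {
    /** Builds Soot arguments that read classes, write class files, and switch off the given phases. */
    static List<String> build(String classpath, String processDir, List<String> disabledPhases) {
        List<String> args = new ArrayList<>();
        args.add("-cp");
        args.add(classpath);
        args.add("-process-dir");
        args.add(processDir);
        args.add("-f");
        args.add("class"); // write class files so the result can be run again
        for (String phase : disabledPhases) {
            args.add("-p");
            args.add(phase);
            args.add("enabled:false");
        }
        return args;
    }

    public static void main(String[] argv) {
        // Paths are placeholders; jb.ule is one of the phases named in the question.
        List<String> args = build("lib/rt.jar", "build/classes", Arrays.asList("jb.ule"));
        System.out.println(String.join(" ", args));
    }
}
```

The resulting list could be passed to Soot's command-line entry point; running the same inputs once with the default options and once with a phase disabled gives the two outputs to compare.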
Not that I know of. As I said, it's all pretty much intertwined.
Soot has evolved substantially over time. Initially, these transformations were all optional, which is in line with considering Soot a compiler framework. However, Soot has now turned into a program analysis framework and has taken on new input formats such as Dalvik (for Android). That means our focus shifted. The new primary goal is to create expressive Jimple code that downstream analyses can use.
Commonly, downstream analyses want typed code. Depending on the input, this requires preprocessing such as variable splitting, 0/null distinguishing, int/float distinguishing, etc. After that, you technically have valid code, but it's super ugly. Therefore, we condense the locals again.
Other transformations might be necessary or might be pure optimization. For example, removing dead code might simply make the code nicer. Or it may help downstream analyses avoid unnecessary effort. Or it might even be required, because Dalvik doesn't require certain types of unreachable code to type-check. That means that unless you remove the dead code, there might be apps for which there is no valid typing.
I suggest running Soot with the default options. The method sources will run some transformers on their own, and there's usually a good reason for it. You don't need to apply the
By the way, I have a strong research interest in finding an "optimal" (whatever this means) analysis configuration for a given input program and task at hand. That's not trivial, even if you just take a single binary decision that produces equivalent output. In other words, even picking the more efficient of two equivalent options is hard to estimate a priori.
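To illustrate why variable splitting helps with typing, here is a toy sketch for straight-line code only. It is not Soot's local splitter, which has to handle control flow; the class `LocalSplitSketch` and the statement encoding are made up for this example. A bytecode slot that first holds an int and later a reference cannot get one type, but after every write is renamed to a fresh local, each local has a single definition and can be typed on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; assumes straight-line code without branches.
final class LocalSplitSketch {
    /**
     * Each statement is encoded as a String array: element 0 is the slot
     * being written, the remaining elements are the slots being read.
     */
    static List<String[]> split(List<String[]> body) {
        Map<String, Integer> version = new HashMap<>();
        List<String[]> out = new ArrayList<>();
        for (String[] stmt : body) {
            String[] renamed = new String[stmt.length];
            // Reads refer to the most recent definition of the slot (#0 if none yet).
            for (int i = 1; i < stmt.length; i++) {
                renamed[i] = stmt[i] + "#" + version.getOrDefault(stmt[i], 0);
            }
            // Every write starts a fresh local.
            int v = version.merge(stmt[0], 1, Integer::sum);
            renamed[0] = stmt[0] + "#" + v;
            out.add(renamed);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical body: slot r0 first holds an int, later a reference.
        List<String[]> body = Arrays.asList(
            new String[] {"r0"},         // r0 = 5
            new String[] {"r1", "r0"},   // r1 = r0 + 1
            new String[] {"r0"},         // r0 = "text"
            new String[] {"r2", "r0"});  // r2 = r0.length()
        for (String[] stmt : split(body)) {
            System.out.println(Arrays.toString(stmt));
        }
    }
}
```

After splitting, the int-valued and the reference-valued uses of `r0` end up in different locals, which is also why the code looks uglier afterwards and why the locals get condensed again once types are known.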
I apologize for creating an issue to clarify doubts about how Soot works, but I was unable to send it through the discussion list linked in the wiki.
I'm using Soot to analyze Java bytecode, generate Jimple representations, and build call graphs using both CHA and SPARK approaches.
While exploring the jb pack (Jimple Body creation and optimization), I came across multiple transformation phases such as jb.ls, jb.tr, jb.ule, etc. However, it's unclear which of these are strictly required to construct a valid Jimple body, and which are optional optimizations that can be applied later.
Here’s the full list I’m reviewing:
🔗 Soot Phase Options Documentation (4.5.0)
My questions:
Which of these jb phases are essential for building a correct and complete Jimple body from bytecode?
Which ones are purely optimization passes and can be safely skipped or applied later?
Is there any official documentation, list, or rule of thumb to distinguish between “construction” and “optimization” phases in Soot?
I'm aiming to build a custom Soot pipeline, and I’d like to better understand the role, dependencies, and impact of each transformation phase to avoid unnecessary overhead.
Any clarification, insight, or reference would be highly appreciated!