Skip to content

Using Geometric Encoding based Context Sensitive Points to Analysis (geomPTA)

Xiao Xiao edited this page Apr 27, 2014 · 31 revisions

Contributed by Xiao Xiao

Introduction

geomPTA is a context sensitive points-to analysis based on SPARK. It uses the call graph generated by SPARK to build the full context sensitivity model for subsequent analysis. The call graph cycles are handled by a special form of 1CFA for better precision, because Java programs intend to have large cycles. The algorithm details can be found in our technical report, where the evaluation results are only for reference because we are continuing to improve the precision and quality of geomPTA.

Running geomPTA

Starting geomPTA is simply by specifying the options "cg.spark enabled:true" and "cg.spark geom-pta:true". An useful option is "cg.spark geom-runs", which is 1 by default. This option controls how many times the geomPTA will be iterated, where more iterations mean better precision. Usually, setting "cg.spark geom-runs:3" is enough for high precision.

By default, geomPTA does not compute the context sensitive points-to information for all variables that are seen by SPARK. geomPTA artificially marks the functions in the packages with the names starting by "java.", "sun.", and etc. (For full list, please see function isJavaLibraryClass in class SootClass) as library functions. For rest of the functions, they are called application functions, although not all are really from user's code. geomPTA first identifies the pointers in library functions that potentially impact the points-to information of pointers in application code. Only these pointers and the pointers in application code will be updated by geomPTA, and we call them core pointers. In order to soundly refine the call graph, geomPTA also marks the base pointers at virtual callsites as core pointers.

When geomPTA finished, only the core pointers gain the context sensitive points-to information. The non-core pointers are still kept for context insensitive points-to information computed by SPARK. If you want to change this default behavior to computing points-to information for all pointers, please put the following code before running soot:

Parameters.seedPts = Constants.seedPts_all;

Both the classes Parameters and Constants are in the package soot.jimple.spark.geom.geomPA.

geomPTA also updates the call graph computed by SPARK, which can be obtained by Scene.v().getCallGraph(). After geomPTA, the virtual calls (including the interface calls) will be refined with up-to-date points-to information. The spurious call edges will be removed and after the removal, the unreachable methods will also be cleaned. Generally, the refined call graph will be much precise than that generated by SPARK.

Note: If you iterate the call graph edges obtained by Scene.v().getCallGraph().listener(), the call edges removed in the updated call graph may also present in the iterator. This is because all the edges ever added to the call graph are kept tracked.

Querying

Querying with SPARK interface

SPARK provides various formats of reachingObject for querying points-to information for a given local variable, global variable, or an object field. The result is returned in PointsToSet, an interface that provides basic utilities for visiting the points-to result. However, it doesn't permit programmers iterating the objects contained in the set. To do this, you can always safely cast a PointsToSet object returned by reachingObject to the type PointsToSetInternal, which has a forall utility. Sample code is as follows:

PointsToSetInternal pts = vn.getP2Set(); pts.forall( new P2SetVisitor() { public void visit(Node n) { // Do what you like with n, which is in the type of AllocNode });

Querying with geomPTA interface

In addition to the SPARK querying system, geomPTA provides its own interface for querying context sensitive points-to information in more sophisticated usage scenarios.

TBC......

Clone this wiki locally