Skip to content

Commit c099dbd

Browse files
committed
Python: Expand notes around bound methods self argument passing
1 parent 02b3a1b commit c099dbd

File tree

1 file changed

+45
-3
lines changed

1 file changed

+45
-3
lines changed

python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll

Lines changed: 45 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1151,6 +1151,40 @@ predicate normalCallArg(CallNode call, Node arg, ArgumentPosition apos) {
11511151
*
11521152
* Note: If `Bar.meth` and `Foo.meth` resolves to the same function, we will end up
11531153
* sending both `self` arguments to that function, which is by definition the right thing to do.
1154+
*
1155+
* ### Bound methods
1156+
*
1157+
* For bound methods, such as `bm = x.m; bm()`, it's a little unclear whether we should
1158+
* still use the object in the attribute lookup (`x.m`) as the self argument in the
1159+
* call (`bm()`). We currently do this, but there might also be cases where we don't
1160+
* want to do this.
1161+
*
1162+
* In the example below, we want to clear taint from the list before it reaches the
1163+
* sink, but because we don't have a use of `l` in the `clear()` call, we currently
1164+
* don't have any way to achieve our goal. (Note that this is a contrived example)
1165+
*
1166+
* ```py
1167+
* l = list()
1168+
* clear = l.clear
1169+
* l.append(tainted)
1170+
* clear()
1171+
* sink(l)
1172+
* ```
1173+
*
1174+
* To make the above even worse, bound-methods have a `__self__` property that refers to
1175+
* the object of the bound-method, so we can re-write the code as:
1176+
*
1177+
* ```py
1178+
* l = list()
1179+
* clear = l.clear
1180+
* clear.__self__.append(tainted)
1181+
* clear()
1182+
* sink(l)
1183+
* ```
1184+
*
1185+
* One idea to solve this is to track the object in a synthetic data-flow node every
1186+
* time the bound method is used, such that the `clear()` call would essentially be
1187+
* translated into `l.clear()`, and we can still have use-use flow.
11541188
*/
11551189
cached
11561190
predicate getCallArg(CallNode call, Function target, CallType type, Node arg, ArgumentPosition apos) {
@@ -1160,16 +1194,24 @@ predicate getCallArg(CallNode call, Function target, CallType type, Node arg, Ar
11601194
type instanceof CallTypePlainFunction and
11611195
normalCallArg(call, arg, apos)
11621196
or
1163-
// self argument for normal method calls
1197+
// self argument for normal method calls -- see note above about bound methods
11641198
type instanceof CallTypeNormalMethod and
11651199
apos.isSelf() and
11661200
resolveMethodCall(call, target, type, arg) and
1167-
// dataflow lib has requirement that arguments and calls are in same enclosing callable.
1201+
// dataflow lib has requirement that arguments and calls are in same enclosing
1202+
// callable. This requirement would be broken if we used `my_obj` as the self
1203+
// argument in the `f()` call in the example below:
1204+
// ```py
1205+
// def call_func(f):
1206+
// f()
1207+
//
1208+
// call_func(my_obj.some_method)
1209+
// ```
11681210
exists(CfgNode cfgNode | cfgNode.getNode() = call |
11691211
cfgNode.getEnclosingCallable() = arg.getEnclosingCallable()
11701212
)
11711213
or
1172-
// cls argument for classmethod calls
1214+
// cls argument for classmethod calls -- see ntoe above about bound methods
11731215
type instanceof CallTypeClassMethod and
11741216
apos.isSelf() and
11751217
resolveMethodCall(call, target, type, arg) and

0 commit comments

Comments
 (0)