[php2cpg] feat: add multi array dimension assignment support#5887
[php2cpg] feat: add multi array dimension assignment support#5887
Conversation
|
@ml86 Since the PHP parser doesn't provide code to use for populating the CODE property, getting this became a bit tricky in this implementation. See For an expression like |
| assignTwo.order shouldBe 2 | ||
|
|
||
| assignThree.name shouldBe Operators.assignment | ||
| assignThree.code shouldBe "$foo@tmp-3 = array(2 => $foo@tmp-1)" |
There was a problem hiding this comment.
This lowering is fine in the case where $xs does not already exist, but I think it could be problematic if $xs[1][2] is an array that was already tainted. This step plus the next would then overwrite the tainted array.
The lowering in general looks to be the other way around from what we discussed, but I also wasn't involved in later discussions. Did the plan change after we last spoke?
There was a problem hiding this comment.
Hmm, do you mean in that case, the previous taint information would be lost?
Regarding the lowering, the plan initially didn't change but when I started with the implementation, it was a bit more intuitive implementing it going top-to-bottom instead of bottom-to-top.
There was a problem hiding this comment.
The lowering looks good to me. If an array is tainted with a wildcard access path the assignment $xs[1][2]=... does not remove the taint.
There was a problem hiding this comment.
For $xs[][2][] = $val; this would be fine, but for $xs[1][2][] = $val we run into an issue. As it is lowered here, the first couple of assignments effectively reduce to
$xs[1] = array(2 => $val)
which means if we had something like $xs[1] = array(1 => "bad data"), that first element would be removed after the lowered assignment above.
There was a problem hiding this comment.
Edit to add the two specific lines in the lowering I'm concerned about:
assignTwo.code shouldBe "$foo@tmp-1 = array($foo@tmp-0)" // effectively = array(val)
assignFour.code shouldBe "$xs[1] = $foo@tmp-3" // $foo@tmp-3 == array(2 => foo@tmp-1)
To give a concrete example, it's the difference between
<?php
$xs = array(array(1 => "bad"));
$xs[0][2] = "good";
var_dump($xs);
// output
array(1) {
[0]=>
array(2) {
[1]=>
string(3) "bad"
[2]=>
string(4) "good"
}
}and
<?php
$xs = array(array(1 => "bad"));
$xs[0] = array(2 => "good");
var_dump($xs);
// output
array(1) {
[0]=>
array(1) {
[2]=>
string(4) "good"
}
}There was a problem hiding this comment.
I've been thinking some more and I think I didn't really understand the implications of using wildcard access paths for taint. If we're just using wildcards, then
xs[0][2] = ...
and
xs[0] = array(2 => ...)
should be equivalent since from the DF engine's perspective, it's unknown what's being overwritten in any case.
There was a problem hiding this comment.
Very good catch! I missed this. This poses a problem anytime an index in the chain is pointing to an array. Whoops.
There was a problem hiding this comment.
Ok, now I see your point @johannescoetzee. I think we can relatively easy fix this buy just emitting a normal access operator chain for every up to the first [] from where on we know that no old taint can be overwritten.
| "assignments using the multi array dimension fetch syntax with last dimension should be rewritten as multiple assignments" in { | ||
| val cpg = code("""<?php | ||
| |function foo($val) { | ||
| | $xs[1][2][] = $val; |
There was a problem hiding this comment.
A test for something like $xs[1][][2] not ending with a dimensionless index access is also missing.
|
|
||
| rhsIdxAccess.methodFullName shouldBe Operators.indexAccess | ||
| rhsIdxAccess.code shouldBe s"$$foo@tmp-0[${arrayIndex}]" // $foo@tmp-0 is the name of the variable holding the current value | ||
| rhsIdxAccess.code shouldBe s"$$foo@tmp-1[${arrayIndex}]" // $foo@tmp-1 is the name of the variable holding the current value |
There was a problem hiding this comment.
I'm guessing these changing are due to the duplicate ASTs being generated for the code? I do think we should find another option there. Duplicating work generating the AST isn't ideal, especially if it's only for the code field
There was a problem hiding this comment.
These are changing due to my updates to the getNewVarTmp method in Scope. The method previously used the counter from the top scope if it was a method-, class- or namespace- scope. If the top scope was a block scope, a new counter would be used and but the tmp var name would still be dependent on the enclosing method, type, or namespace name. So then it was possible to get a name that wasn't unique.
There was a problem hiding this comment.
Please separate the temp variable name fix into a separate PR.
| retIden.name shouldBe "foo@tmp-0" | ||
| retIden.code shouldBe "$foo@tmp-0" | ||
| retIden.order shouldBe 5 |
There was a problem hiding this comment.
This is also not correct. This is saying that $val appears as the last child in the method body, but that should not be the case
There was a problem hiding this comment.
The whole assignment expression is represented by the BLOCK node and so I've made the $val the last child of that. I think you may have confused that block as being the method block.
There was a problem hiding this comment.
You're right, I did miss one of the astChildrens in the query.
f15c613 to
04129d6
Compare
| inside(block.astChildren.not(_.isLocal).l) { | ||
| case List(assignOne: Call, assignTwo: Call, assignThree: Call, arrayPushCall: Call, retIden: Identifier) => | ||
| assignOne.name shouldBe Operators.assignment | ||
| assignOne.code shouldBe "$foo@tmp-0 = $val" |
There was a problem hiding this comment.
Lets avoid creating a temp variable also for the right hand side of the assignment. We already create one for each array invocation which makes the lowering complicated enough.
| arrayPushCall.code shouldBe "$xs[] = $foo@tmp-3" | ||
| arrayPushCall.order shouldBe 4 | ||
|
|
||
| retIden.name shouldBe "foo@tmp-0" |
There was a problem hiding this comment.
Use the right hande side of assignment in this case $val as last identifier of the block.
There was a problem hiding this comment.
The Ast for the right-hand side of the assignment is generated somewhere within the recursive method traverseArrayDimExpr. I didn't want to over-complicated the implementation. I initially called astForExpr for the RHS just before creating the Ast for the block but then since the RHS expression gets passed into traverseArrayDimExpr, astForExpr will be called again on the RHS expression.
I assign it to a tmp variable since calling astForExpr twice on the same PhpVariable isn't as bad as doing it on whatever complexion expression might be on the RHS
04129d6 to
9ac995b
Compare
Closes https://github.com/ShiftLeftSecurity/codescience/issues/8691