[php2cpg] feat: add multi array dimension assignment support #5887

Open
TNSelahle wants to merge 7 commits into master from tebogo/nested-empty-array-access

Conversation

@TNSelahle
Member

@TNSelahle TNSelahle commented Mar 16, 2026

  • Add multi array dimension assignment support
  • Fix tmp variable numbering not being unique when a tmp variable is generated inside a block scope

Closes https://github.com/ShiftLeftSecurity/codescience/issues/8691

@TNSelahle TNSelahle requested a review from ml86 March 16, 2026 11:23
@TNSelahle
Member Author

@ml86 Since the PHP parser doesn't provide code to use for populating the CODE property, getting this became a bit tricky in this implementation. See codeForSingleArrayElemAssign and codeForMultiArrayDimAccess in AstCreatorHelper.scala

For an expression like $arr[$expr][], I generated the ASTs for the array access dimensions simply to get the code, which is wasteful. However, I couldn't think of a better way to do this. An improvement would be to keep a cache that maps a PhpExpr node to its generated code so it can be reused in other places.
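
The caching idea could look roughly like the sketch below. It is only an illustration, not code from this PR: `PhpExpr`, `Variable`, `ArrayDimFetch`, `CodeCache`, and `render` are hypothetical, simplified stand-ins for the real php2cpg types, and the cache is keyed by structural equality, which a real implementation might need to replace with node identity or source position.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for parser nodes (not the real php2cpg types).
sealed trait PhpExpr
final case class Variable(name: String) extends PhpExpr
final case class ArrayDimFetch(base: PhpExpr, dim: Option[PhpExpr]) extends PhpExpr

// Memoizes the code string per expression so each expression is rendered at most once.
final class CodeCache {
  private val cache = mutable.Map.empty[PhpExpr, String]
  var renderCount = 0 // only here to make the reuse observable

  def codeFor(expr: PhpExpr): String =
    cache.get(expr) match {
      case Some(code) => code
      case None =>
        val code = render(expr)
        cache.update(expr, code)
        code
    }

  // Builds the code string; sub-expressions go through the cache as well.
  private def render(expr: PhpExpr): String = {
    renderCount += 1
    expr match {
      case Variable(name) => "$" + name
      case ArrayDimFetch(base, dim) =>
        codeFor(base) + "[" + dim.map(codeFor).getOrElse("") + "]"
    }
  }
}
```

With something like this, the CODE string for an array access could be looked up wherever it is needed without building the dimension ASTs a second time.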

assignTwo.order shouldBe 2

assignThree.name shouldBe Operators.assignment
assignThree.code shouldBe "$foo@tmp-3 = array(2 => $foo@tmp-1)"

Contributor

This lowering is fine in the case where $xs does not already exist, but I think it could be problematic if $xs[1][2] is an array that was already tainted. This step plus the next would then overwrite the tainted array.

The lowering in general looks to be the other way around from what we discussed, but I also wasn't involved in later discussions. Did the plan change after we last spoke?

Member Author

Hmm, do you mean in that case, the previous taint information would be lost?

Regarding the lowering, the plan initially didn't change but when I started with the implementation, it was a bit more intuitive implementing it going top-to-bottom instead of bottom-to-top.

Contributor

The lowering looks good to me. If an array is tainted with a wildcard access path, the assignment $xs[1][2]=... does not remove the taint.

Contributor

@johannescoetzee johannescoetzee Mar 16, 2026

For $xs[][2][] = $val; this would be fine, but for $xs[1][2][] = $val we run into an issue. As it is lowered here, the first couple of assignments effectively reduce to

$xs[1] = array(2 => $val)

which means if we had something like $xs[1] = array(1 => "bad data"), that first element would be removed after the lowered assignment above.

Contributor

@johannescoetzee johannescoetzee Mar 16, 2026

Edit to add the two specific lines in the lowering I'm concerned about:

assignTwo.code shouldBe "$foo@tmp-1 = array($foo@tmp-0)" // effectively = array(val)
assignFour.code shouldBe "$xs[1] = $foo@tmp-3"           // $foo@tmp-3 == array(2 => $foo@tmp-1)

To give a concrete example, it's the difference between

<?php

$xs = array(array(1 => "bad"));

$xs[0][2] = "good";

var_dump($xs);

// output
array(1) {
  [0]=>
  array(2) {
    [1]=>
    string(3) "bad"
    [2]=>
    string(4) "good"
  }
}

and

<?php

$xs = array(array(1 => "bad"));

$xs[0] = array(2 => "good");

var_dump($xs);

// output
array(1) {
  [0]=>
  array(1) {
    [2]=>
    string(4) "good"
  }
}

Contributor

I've been thinking some more and I think I didn't really understand the implications of using wildcard access paths for taint. If we're just using wildcards, then

xs[0][2] = ...

and

xs[0] = array(2 => ...)

should be equivalent since from the DF engine's perspective, it's unknown what's being overwritten in any case.

Member Author

Very good catch! I missed this. This poses a problem anytime an index in the chain is pointing to an array. Whoops.

Contributor

Ok, now I see your point @johannescoetzee. I think we can fix this relatively easily by just emitting a normal access operator chain for every dimension up to the first [], from where on we know that no old taint can be overwritten.
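
One way to read this suggestion is to split the dimension list at the first `[]`: everything before it stays an ordinary index-access chain, and only the remainder needs the temp-variable lowering. A minimal sketch, assuming dimensions are modelled as `Option[String]` with `None` standing for `[]`; `DimSplit`, `splitAtFirstPush`, and `prefixCode` are hypothetical names and not functions from this PR.

```scala
object DimSplit {
  // Each dimension is Some(indexCode) for `[idx]` or None for the dimensionless `[]`.
  type Dim = Option[String]

  // Splits the dimensions into the indices before the first `[]`
  // and the remaining dimensions (which start at that `[]`, if any).
  def splitAtFirstPush(dims: List[Dim]): (List[String], List[Dim]) = {
    val (prefix, rest) = dims.span(_.isDefined)
    (prefix.flatten, rest)
  }

  // Renders the prefix as a plain index-access chain on the base variable.
  def prefixCode(base: String, prefix: List[String]): String =
    prefix.foldLeft(base)((acc, idx) => s"$acc[$idx]")
}
```

For `$xs[1][2][]` the prefix would render as `$xs[1][2]`, so existing entries under `$xs[1]` are accessed rather than replaced; for `$xs[1][][2]` only `$xs[1]` stays a plain access and the rest still goes through the lowering.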

"assignments using the multi array dimension fetch syntax with last dimension should be rewritten as multiple assignments" in {
val cpg = code("""<?php
|function foo($val) {
| $xs[1][2][] = $val;

Contributor

A test for something like $xs[1][][2] not ending with a dimensionless index access is also missing.


rhsIdxAccess.methodFullName shouldBe Operators.indexAccess
- rhsIdxAccess.code shouldBe s"$$foo@tmp-0[${arrayIndex}]" // $foo@tmp-0 is the name of the variable holding the current value
+ rhsIdxAccess.code shouldBe s"$$foo@tmp-1[${arrayIndex}]" // $foo@tmp-1 is the name of the variable holding the current value

Contributor

I'm guessing these changes are due to the duplicate ASTs being generated for the code? I do think we should find another option there. Duplicating work generating the AST isn't ideal, especially if it's only for the code field.

Member Author

These are changing due to my updates to the getNewVarTmp method in Scope. The method previously used the counter from the top scope if it was a method, class, or namespace scope. If the top scope was a block scope, a new counter would be used, but the tmp var name would still depend on the enclosing method, type, or namespace name. So it was possible to get a name that wasn't unique.
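
The collision described here can be reproduced with a small model. This is a sketch under assumptions: `TmpNames`, `ScopeKind`, `ScopeFrame`, `buggyTmpName`, and `fixedTmpName` are hypothetical and only mimic the behaviour described in this comment, not the actual `Scope` implementation.

```scala
object TmpNames {
  sealed trait ScopeKind
  case object MethodScope extends ScopeKind
  case object BlockScope extends ScopeKind

  // One frame of the scope stack; each frame carries its own counter.
  final class ScopeFrame(val kind: ScopeKind, val name: String) {
    var counter = 0
  }

  // Nearest enclosing non-block frame (the head of the list is the innermost scope).
  private def enclosing(stack: List[ScopeFrame]): ScopeFrame =
    stack.find(_.kind != BlockScope).getOrElse(stack.last)

  // Behaviour as described: the counter comes from the innermost frame,
  // but the name prefix comes from the enclosing method.
  def buggyTmpName(stack: List[ScopeFrame]): String = {
    val top = stack.head
    val n = top.counter
    top.counter += 1
    s"${enclosing(stack).name}@tmp-$n"
  }

  // One possible fix: take the counter from the same frame that provides the prefix.
  def fixedTmpName(stack: List[ScopeFrame]): String = {
    val owner = enclosing(stack)
    val n = owner.counter
    owner.counter += 1
    s"${owner.name}@tmp-$n"
  }
}
```

In this model, requesting one name in the method scope and one in a nested block yields the same name twice with `buggyTmpName`, while `fixedTmpName` hands out consecutive numbers. That would also explain why existing expectations shift from `tmp-0` to `tmp-1`.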

Contributor

Please separate the temp variable name fix into a separate PR.

Comment on lines +112 to +114
retIden.name shouldBe "foo@tmp-0"
retIden.code shouldBe "$foo@tmp-0"
retIden.order shouldBe 5

Contributor

This is also not correct. This is saying that $val appears as the last child in the method body, but that should not be the case.

Member Author

The whole assignment expression is represented by the BLOCK node, so I've made $val the last child of that. I think you may have mistaken that block for the method block.

Contributor

You're right, I did miss one of the astChildrens in the query.

@TNSelahle TNSelahle force-pushed the tebogo/nested-empty-array-access branch 2 times, most recently from f15c613 to 04129d6 Compare March 24, 2026 11:21
inside(block.astChildren.not(_.isLocal).l) {
case List(assignOne: Call, assignTwo: Call, assignThree: Call, arrayPushCall: Call, retIden: Identifier) =>
assignOne.name shouldBe Operators.assignment
assignOne.code shouldBe "$foo@tmp-0 = $val"

Contributor

Let's avoid also creating a temp variable for the right-hand side of the assignment. We already create one for each array invocation, which makes the lowering complicated enough.

arrayPushCall.code shouldBe "$xs[] = $foo@tmp-3"
arrayPushCall.order shouldBe 4

retIden.name shouldBe "foo@tmp-0"

Contributor

Use the right-hand side of the assignment, in this case $val, as the last identifier of the block.

Member Author

The AST for the right-hand side of the assignment is generated somewhere within the recursive method traverseArrayDimExpr. I didn't want to over-complicate the implementation. I initially called astForExpr for the RHS just before creating the AST for the block, but since the RHS expression gets passed into traverseArrayDimExpr, astForExpr would be called again on the RHS expression.

I assign it to a tmp variable since calling astForExpr twice on the same PhpVariable isn't as bad as doing it on whatever complex expression might be on the RHS.

@TNSelahle TNSelahle force-pushed the tebogo/nested-empty-array-access branch from 04129d6 to 9ac995b Compare April 1, 2026 14:52

3 participants