Skip to content

Conversation

@Hweinstock
Copy link
Contributor

Problem

The issue is here:

private updatePendingNodes() {
this.pollingSet.forEach(async (instanceId) => {
const childNode = this.ec2InstanceNodes.get(instanceId)!
await childNode.updateStatus()
if (!childNode.isPending()) {
this.pollingSet.delete(instanceId)
await childNode.refreshNode()
}
})
}

This fires off multiple asynchronous calls on each child, with the function returning before any are resolved. Thus, the child node could be deleted from the set after this function returned (potentially when another piece of code is using it). This leads to undefined errors within that function.

Solution

Refactor the code to make use of a for...of with await pattern.


License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Hweinstock Hweinstock marked this pull request as ready for review September 30, 2024 21:14
@Hweinstock Hweinstock requested a review from a team as a code owner September 30, 2024 21:14
Copy link
Contributor

@justinmk3 justinmk3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch. maybe we should ban .forEach, it's pretty useless

does this close an issue or it's just something you noticed?

@Hweinstock
Copy link
Contributor Author

Hweinstock commented Sep 30, 2024

Yeah, I set up an issue tracking this as part of the reliability work when I saw it caused a CI file on another branch. Luckily it seems like a quick fix.

Also, I would agree about the forEach. It lets you give it an async function but the behavior isn't what you'd expect. Could also write/reuse some of the asyncCollection utilities to make an async safe version.

@Hweinstock Hweinstock merged commit fde2fe5 into aws:master Oct 1, 2024
9 of 10 checks passed
@Hweinstock Hweinstock deleted the ec2/fixFlaky branch October 1, 2024 13:04
Hweinstock added a commit that referenced this pull request Oct 14, 2024
## Problem
Follow up to: #5698
Fix: #5750

### First Problem Identified
The current tests in
https://github.com/aws/aws-toolkit-vscode/blob/master/packages/core/src/awsService/ec2/explorer/ec2ParentNode.ts
do not clear the `PollingSet` in between tests, therefore the timer
continues to run. This not only results in inconsistent state between
the items in the `PollingSet` and the items stored on the parent node
itself, but also allows the `PollingSet` action to trigger unexpectedly
in the middle of another test. When this happens, and the states don't
match, the `instanceId` from the `PollingSet` could potentially not be
found on the parent node, resulting in an undefined node. Then, calling
`updateStatus` on this node could explain the error.

For this error to happen, the `PollingSet` timer must trigger at a very
specific point (in the middle of another test), making it difficult to
debug and reproduce, and causing occasional flaky test failures.

### Second Problem Identified
The tests in
https://github.com/aws/aws-toolkit-vscode/blob/master/packages/core/src/awsService/ec2/explorer/ec2InstanceNode.ts
setup an unnatural state. The `testNode` sets its parent to the
`testParentNode`, but `testParentNode` is unaware this child exists.
This is problematic because when the child reports its status as pending
to the parent, the parent will throw an error since it can't find the
child. (example:
https://d1ihu6zq92vp9p.cloudfront.net/7b1e96c7-f2b7-4a8e-927c-b7aa84437960/report.html)
This type of state should not be allowed.
## Solution
### Solution to first problem
- clear the set, and the timer in-between each test. 

### Solution to second problem 
- make this confusing state impossible by adding the child to the parent
and parent to the child together.
- stub the resulting API call its makes. 
- Disable the polling set timer in the second tests (via sinon stubbing)
since it isn't relevant to what's being tested.

### Tangential Work included in PR:
- Throw our own error when an `instanceId` is not in map on the
parentNode. This will make it easier to catch if something goes wrong.
- Clean up test file:
`src/test/awsService/ec2/explorer/ec2ParentNode.test.ts`.

---

<!--- REMINDER: Ensure that your PR meets the guidelines in
CONTRIBUTING.md -->

License: I confirm that my contribution is made under the terms of the
Apache 2.0 license.
tverney pushed a commit to tverney/aws-toolkit-vscode that referenced this pull request Oct 21, 2024
## Problem
Follow up to: aws#5698
Fix: aws#5750

### First Problem Identified
The current tests in
https://github.com/aws/aws-toolkit-vscode/blob/master/packages/core/src/awsService/ec2/explorer/ec2ParentNode.ts
do not clear the `PollingSet` in between tests, therefore the timer
continues to run. This not only results in inconsistent state between
the items in the `PollingSet` and the items stored on the parent node
itself, but also allows the `PollingSet` action to trigger unexpectedly
in the middle of another test. When this happens, and the states don't
match, the `instanceId` from the `PollingSet` could potentially not be
found on the parent node, resulting in an undefined node. Then, calling
`updateStatus` on this node could explain the error.

For this error to happen, the `PollingSet` timer must trigger at a very
specific point (in the middle of another test), making it difficult to
debug and reproduce, and causing occasional flaky test failures.

### Second Problem Identified
The tests in
https://github.com/aws/aws-toolkit-vscode/blob/master/packages/core/src/awsService/ec2/explorer/ec2InstanceNode.ts
setup an unnatural state. The `testNode` sets its parent to the
`testParentNode`, but `testParentNode` is unaware this child exists.
This is problematic because when the child reports its status as pending
to the parent, the parent will throw an error since it can't find the
child. (example:
https://d1ihu6zq92vp9p.cloudfront.net/7b1e96c7-f2b7-4a8e-927c-b7aa84437960/report.html)
This type of state should not be allowed.
## Solution
### Solution to first problem
- clear the set, and the timer in-between each test. 

### Solution to second problem 
- make this confusing state impossible by adding the child to the parent
and parent to the child together.
- stub the resulting API call its makes. 
- Disable the polling set timer in the second tests (via sinon stubbing)
since it isn't relevant to what's being tested.

### Tangential Work included in PR:
- Throw our own error when an `instanceId` is not in map on the
parentNode. This will make it easier to catch if something goes wrong.
- Clean up test file:
`src/test/awsService/ec2/explorer/ec2ParentNode.test.ts`.

---

<!--- REMINDER: Ensure that your PR meets the guidelines in
CONTRIBUTING.md -->

License: I confirm that my contribution is made under the terms of the
Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants