
Conversation

@Will-ShaoHua Will-ShaoHua commented Apr 24, 2025

Problem

This is a revision of #7134. It aims to address the comment from the previous PR, #7134 (comment), by extracting the logic into a shared module, so the diff against #7134 is expected to be large.

After the 4/21 deployment, the service enforces a strict 1 TPS throttling policy, and previously there was no caching for the API call. This impacts users whenever they have multiple IDE instances open, since every instance makes the list-profile call when the user attempts to sign in.

Solution

  1. Cache the API result for 60 seconds for reuse.
  2. Use a lock to prevent multiple instances from calling the service at the same time: only one instance makes the service call, and the rest wait until it finishes and then read from the cache.
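The two-part scheme above (a 60-second TTL cache plus a lock so only one instance refreshes it) can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the names (`CacheEntry`, `CachedResource`) are hypothetical, and a `Map` stands in for whatever shared storage the extension uses across IDE instances.

```typescript
// Hypothetical sketch of the TTL cache + lock described in the Solution section.
interface CacheEntry<T> {
    locked: boolean
    timestamp: number // ms since epoch when the result was written
    result?: T
}

class CachedResource<T> {
    constructor(
        // A Map stands in for cross-instance shared state (e.g. globalState on disk).
        private store: Map<string, CacheEntry<T>>,
        private key: string,
        private expirationInMilli: number,
        private fetch: () => Promise<T>
    ) {}

    async getResource(): Promise<T> {
        const entry = await this.acquireLock()
        try {
            // Fresh cache: return without a service call.
            if (entry.result !== undefined && Date.now() - entry.timestamp < this.expirationInMilli) {
                return entry.result
            }
            // Stale or empty: only the lock holder refreshes from the service.
            const result = await this.fetch()
            this.store.set(this.key, { locked: true, timestamp: Date.now(), result })
            return result
        } finally {
            // Release the lock on success, failure, and cache-hit alike.
            const latest = this.store.get(this.key)
            if (latest !== undefined) {
                this.store.set(this.key, { ...latest, locked: false })
            }
        }
    }

    private async acquireLock(): Promise<CacheEntry<T>> {
        // Poll until the lock is free; a real implementation would also time out
        // (the PR mentions a 15s wait) to survive a crashed lock holder.
        for (;;) {
            const entry = this.store.get(this.key) ?? { locked: false, timestamp: 0 }
            if (!entry.locked) {
                this.store.set(this.key, { ...entry, locked: true })
                return entry
            }
            await new Promise((r) => setTimeout(r, 50))
        }
    }
}
```

With this shape, a second call inside the TTL window returns the cached result without invoking `fetch` again, which is exactly what keeps concurrent sign-ins under the 1 TPS limit.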

  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

  • This pull request modifies code in src/* but no tests were added/updated.
    • Confirm whether tests should be added or ensure the PR description explains why tests are not required.

…se it will require users to reselect profile"

This reverts commit 3e50145.
@Will-ShaoHua Will-ShaoHua changed the title wip cache resource implementation fix(amazonq): cache resource implementation Apr 24, 2025
@Will-ShaoHua Will-ShaoHua reopened this Apr 24, 2025
@Will-ShaoHua Will-ShaoHua marked this pull request as ready for review April 25, 2025 05:14
@Will-ShaoHua Will-ShaoHua requested review from a team as code owners April 25, 2025 05:14
@Will-ShaoHua Will-ShaoHua requested a review from justinmk3 April 25, 2025 05:20
@Will-ShaoHua
Contributor Author

Will-ShaoHua commented Apr 25, 2025

@justinmk3 I will add tests as a follow-up.


if (!this.isConnected()) {
await this.regionProfileManager.invalidateProfile(this.regionProfileManager.activeRegionProfile?.arn)
await this.regionProfileManager.resetCache()
Contributor

should invalidateProfile do this internally?

Contributor Author

This is somewhat redundant, I think, and its name is definitely confusing (it should have been named releaseLock), but I will remove it.

Contributor Author

invalidateProfile doesn't need this, I think, as the source of truth for the profile is also this resourceCache. I originally added this line partly for debugging purposes and to avoid a deadlock scenario back when the code was problematic. However, releaseLock will be called:

  1. when getResource goes to pull the real response and executes successfully
  2. when getResource goes to pull the real response and fails with an exception
  3. when getResource returns the cached value
  4. if one process/IDE instance waits too long (currently set to 15s), it will go through either (1) or (2) and eventually release the lock again

so I think it's redundant.

Contributor Author

@Will-ShaoHua Will-ShaoHua Apr 25, 2025

This just reminds me that I need to add a clearCache call in onConnectionChanged. Cache expiration is only 60s at the moment, but it's worth adding one in case people change connections and end up using the wrong resultset.
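The follow-up described here might look roughly like the sketch below. All names are hypothetical (the actual event hook and cache API in the toolkit may differ); the point is only that entries from the old connection are dropped eagerly rather than waiting out the TTL:

```typescript
// Sketch: clear the cache on connection change so a resultset fetched under
// the previous connection is never served under the new one.
class ConnectionAwareCache {
    private entries = new Map<string, unknown>()

    set(key: string, value: unknown): void {
        this.entries.set(key, value)
    }

    get(key: string): unknown {
        return this.entries.get(key)
    }

    clearCache(): void {
        this.entries.clear()
    }

    // Wire this to the IDE's connection-changed event (hypothetical hook name).
    onConnectionChanged(): void {
        // Even with a 60s TTL, unexpired entries from the previous connection
        // would be the wrong resultset, so drop them eagerly.
        this.clearCache()
    }
}
```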

const _acquireLock = async () => {
const cachedValue = this.readCacheOrDefault()

if (!cachedValue.resource.locked) {
Contributor

shouldn't this (also) check resource.timestamp and skip lock acquisition if the cached value is new enough? what will prevent multiple vscode instances from sequentially waiting and then (redundantly) making the service call?

Comment on lines 59 to 69
const cachedValue = await this.tryLoadResourceAndLock()
const resource = cachedValue?.resource

// If cache is still fresh, return cached result, otherwise pull latest from the service
if (cachedValue && resource && resource.result) {
const duration = now() - resource.timestamp
if (duration < this.expirationInMilli) {
logger.info(`cache hit, duration(%sms) is less than expiration(%sms)`, duration, this.expirationInMilli)
// release the lock
await this.releaseLock(resource, cachedValue)
return resource.result
Contributor

@justinmk3 justinmk3 Apr 25, 2025

oh, I think I get it: multiple vscode instances will all acquire the lock, but the N+1 instances will each return here.

I guess that's fine. But it means they all must acquire a lock just to do the read. Maybe that's necessary, to avoid reading during an update.

Contributor Author

@Will-ShaoHua Will-ShaoHua Apr 25, 2025

Yeah, exactly. I should have mentioned it more explicitly in the doc string; it works like a read-write lock, if I understand it correctly.

But it's not that multiple vscode instances all acquire the lock at once: only one can hold the lock, and the rest must wait until the first one finishes pulling the real response and releases the lock. Then the rest acquire the lock one by one and return early here using the cached value.
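The behavior described here (only the first caller fetches; the waiters then read the cache one by one under the lock) can be demonstrated with a small sketch. The `AsyncMutex` below is a generic illustration, not the PR's cross-process lock, and the single-process mutex only approximates the real cross-instance coordination:

```typescript
// Sketch: N concurrent callers serialize on a mutex; exactly one fetches,
// the rest observe the cached value and return early.
class AsyncMutex {
    private tail: Promise<void> = Promise.resolve()

    async runExclusive<T>(fn: () => Promise<T>): Promise<T> {
        const prev = this.tail
        let release!: () => void
        this.tail = new Promise<void>((r) => (release = r))
        await prev // wait for the previous holder to release
        try {
            return await fn()
        } finally {
            release()
        }
    }
}

let cached: string | undefined
let fetches = 0
const mutex = new AsyncMutex()

async function listProfiles(): Promise<string> {
    return mutex.runExclusive(async () => {
        if (cached !== undefined) {
            return cached // waiters land here after the first holder fills the cache
        }
        fetches++ // only the first lock holder reaches the "service call"
        cached = 'profile-list' // simulated service response
        return cached
    })
}
```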

Contributor Author

There's definitely room to improve in the future, I think. I haven't had enough time to consider all the potential use cases and requirements, so currently it only meets the needs of listProfile / listCustomization.

Contributor

@justinmk3 justinmk3 left a comment

LGTM, thanks for the follow-up

@Will-ShaoHua Will-ShaoHua changed the title fix(amazonq): cache resource implementation fix(amazonq): fix enterprise users not able to sign in correctly if they have 2+ vscode instances open Apr 25, 2025
@nkomonen-amazon nkomonen-amazon merged commit 3549e33 into aws:master Apr 25, 2025
31 checks passed
@Will-ShaoHua Will-ShaoHua deleted the cache-impl branch April 25, 2025 19:02
Comment on lines +80 to +88
logger.debug(
`cache is stale, duration(%sms) is older than expiration(%sms), pulling latest resource %s`,
duration,
this.expirationInMilli,
this.key
)
}
} else {
logger.info(`cache miss, pulling latest resource %s`, this.key)
Contributor

When I open 2 vscode instances locally, I see "cache is stale" in one and "cache miss" in the other. Does that mean the fetch (service call) happens in both?

Contributor Author

Yeah, one of my commits today broke it... #7173

Will-ShaoHua added a commit to Will-ShaoHua/aws-toolkit-vscode that referenced this pull request Apr 25, 2025