@kzr-at-amazon
Contributor

@kzr-at-amazon kzr-at-amazon commented Sep 15, 2025

Problem

  • The domain account ID is required to build a more complete product metric on SMUS feature usage

Solution

  • Add domain account ID metadata (smusDomainAccountId) to the smus_login metric

Test

  • In core, run npm run generateTelemetry
  • Run npm test
  • Manually test telemetry
2025-09-16 01:21:25.358 [debug] telemetry: smus_login {
  Metadata: {
    metricId: '43dfab8d-d5a6-4791-9afb-b0b3454efb23',
    traceId: 'ba4dd30a-3ad6-4101-84a0-8ae558947fed',
    parentId: '4e6f9558-995e-444e-8461-d718bd6434a5',
    duration: '1763',
    result: 'Cancelled',
    reason: 'UserCancelled',
    reasonDesc: 'Failed to initiate login. | UserCancelled: User cancelled domain URL input',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:22:39.406 [debug] telemetry: smus_login {
  Metadata: {
    metricId: '4840c7e2-7a68-49a8-8bb6-710d629856ab',
    traceId: '65160024-7c03-47c8-80e3-9fdf8493047b',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    awsRegion: 'us-east-2',
    smusDomainAccountId: '050752642559',
    duration: '34889',
    result: 'Succeeded',
    awsAccount: 'not-set'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:22:48.391 [debug] telemetry: smus_accessProject {
  Metadata: {
    metricId: 'fad0c4d1-49bd-4a87-aed5-72a6e4d04c3a',
    traceId: '65160024-7c03-47c8-80e3-9fdf8493047b',
    parentId: 'cda698a1-1c85-477c-9722-ca5b81ae6f63',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusProjectId: 'c1wqm5rlzb150p',
    smusDomainRegion: 'us-east-2',
    smusDomainAccountId: '050752642559',
    duration: '10451',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:22:51.374 [debug] telemetry: smus_renderProjectChildrenNode {
  Metadata: {
    metricId: '0bc6e3e4-a54b-44a2-84d6-b033e656d91e',
    traceId: 'e280e0c3-54c5-4aae-8431-ef5724a20e42',
    smusToolkitEnv: 'local',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusDomainAccountId: '050752642559',
    smusProjectId: 'c1wqm5rlzb150p',
    smusDomainRegion: 'us-east-2',
    duration: '2979',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: true
}

2025-09-16 01:24:25.389 [debug] telemetry: smus_renderLakehouseNode {
  Metadata: {
    metricId: 'f50b2b04-ffe7-49f6-8d85-a1e5d59438cc',
    traceId: '83155b49-3d84-4896-bb04-5ed5182006e6',
    smusToolkitEnv: 'local',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusDomainAccountId: '050752642559',
    smusProjectId: 'c1wqm5rlzb150p',
    smusConnectionId: 'c1pnab9bdg3qjt',
    smusConnectionType: 'LAKEHOUSE',
    smusProjectRegion: 'us-east-2',
    duration: '2321',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}


2025-09-16 01:24:54.443 [debug] telemetry: smus_renderS3Node {
  Metadata: {
    metricId: '74f286b6-1feb-43aa-b15e-4e0919ab272a',
    traceId: '9db37019-272f-4963-9e9e-1da9d9f9a26e',
    smusToolkitEnv: 'local',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusDomainAccountId: '050752642559',
    smusProjectId: 'c1wqm5rlzb150p',
    smusConnectionId: 'c13ow7arqtblih',
    smusConnectionType: 'S3',
    smusProjectRegion: 'us-east-2',
    duration: '2',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:25:40.741 [debug] telemetry: smus_openRemoteConnection {
  Metadata: {
    metricId: '2f32a489-b013-4f47-951f-04d535bc2761',
    traceId: '1b90a3dc-f01f-40cc-b0bf-5920b7c6e9cc',
    smusSpaceKey: 'd-rxs4hhmzrnho__ce',
    smusDomainRegion: 'us-east-2',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusDomainAccountId: '050752642559',
    smusProjectId: 'c1wqm5rlzb150p',
    duration: '18000',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:26:52.258 [debug] telemetry: smus_stopSpace {
  Metadata: {
    metricId: 'b0903fd5-ffb5-4a9a-b131-99baf11cb3a7',
    traceId: 'c5ec5f41-ae58-46b3-af8e-47c1d8fb737d',
    smusSpaceKey: 'd-rxs4hhmzrnho__ce',
    smusDomainRegion: 'us-east-2',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    smusDomainAccountId: '050752642559',
    smusProjectId: 'c1wqm5rlzb150p',
    duration: '2308',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

2025-09-16 01:27:22.027 [debug] telemetry: smus_signOut {
  Metadata: {
    metricId: 'e41138b8-9949-4088-b175-73c4d48110cc',
    traceId: 'b6a34176-e2a2-47f1-b170-d092488c0d00',
    parentId: '51f87fa9-fc1c-4351-914b-c0ce077b8f1b',
    smusDomainId: 'dzd_bh80g0fbj1h7xl',
    awsRegion: 'us-east-2',
    smusDomainAccountId: '050752642559',
    duration: '476',
    result: 'Succeeded',
    awsAccount: 'not-set'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}


  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kzr-at-amazon kzr-at-amazon requested a review from a team as a code owner September 15, 2025 20:37
@amazon-inspector-ohio

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done

@github-actions

  • This pull request modifies code in src/* but no tests were added/updated.
    • Confirm whether tests should be added or ensure the PR description explains why tests are not required.

@amazon-inspector-ohio

✅ I finished the code review, and didn't find any security or code quality issues.

  // User cancelled
  logger.debug('User cancelled domain URL input')
- return
+ throw new ToolkitError('User cancelled domain URL input', {
Contributor

Is this typical? Why throw an error? This is just a normal scenario where the user cancelled the workflow, right?
Will this show up as a Fault in telemetry, dashboards, or alarms?

Contributor

And did we validate and test this? Are the telemetry events being emitted correctly? Can you share example events? On Slack is fine too.

Contributor Author

A simple return causes the toolkit to record result: 'Succeeded', which is why an error is thrown instead.
Yes, it is tested:

Metadata: {
    metricId: 'e07ce894-93e6-4f9b-a0b2-363f01a95c6e',
    traceId: '7fb087c7-dcd5-4e4e-bbf5-1f01abf0330f',
    command: 'aws.smus.login',
    duration: '3026',
    result: 'Cancelled',
    reason: 'Error',
    reasonDesc: 'Failed to initiate login. | User cancelled domain URL input',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: true
}

Contributor

What would the user see in this case? It's very common for a user to click log in first and then realize they don't have the domain URL ready, so they leave to find it. I don't think they should see an error when they come back.

Contributor Author

Users will see this error message when they don't provide a domain URL, and they can retry by providing it again. I was not able to override the result to Cancelled without a ToolkitError, so we rely on ToolkitError for now.

Screenshot 2025-09-15 at 5 14 20 PM

Contributor Author

Fixed it by skipping showErrorMessage when the error is UserCancelled.
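
The resolution of this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the toolkit's actual code: `CancellableError`, `isCancelled`, and `runLogin` are hypothetical stand-ins (the real PR uses ToolkitError and showErrorMessage), and the result values mirror the ones in the logged events.

```typescript
// Hypothetical stand-in for the toolkit's error type; only the fields used here are modeled.
class CancellableError extends Error {
    readonly code: string
    readonly cancelled: boolean

    constructor(message: string, code: string, cancelled: boolean) {
        super(message)
        this.code = code
        this.cancelled = cancelled
    }
}

// Type guard: true only for errors that were flagged as a user cancellation.
function isCancelled(err: unknown): err is CancellableError {
    return err instanceof Error && (err as Partial<CancellableError>).cancelled === true
}

interface LoginOutcome {
    result: 'Succeeded' | 'Cancelled' | 'Failed'
    reason?: string
    shownMessages: string[] // error messages that would have been shown to the user
}

// Runs the login prompt, derives the telemetry result from the outcome,
// and only surfaces an error message for real failures.
async function runLogin(promptForDomainUrl: () => Promise<string | undefined>): Promise<LoginOutcome> {
    const shownMessages: string[] = []
    try {
        const url = await promptForDomainUrl()
        if (url === undefined) {
            // A bare `return` here would be recorded as 'Succeeded'.
            throw new CancellableError('User cancelled domain URL input', 'UserCancelled', true)
        }
        return { result: 'Succeeded', shownMessages }
    } catch (err) {
        if (isCancelled(err)) {
            // Cancellation is a normal outcome: record it, but do not show an error.
            return { result: 'Cancelled', reason: err.code, shownMessages }
        }
        shownMessages.push(err instanceof Error ? err.message : String(err))
        return { result: 'Failed', reason: 'Error', shownMessages }
    }
}
```

With this shape, a user who dismisses the prompt produces a Cancelled result with reason UserCancelled and sees no error, while any other failure is still reported.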

Comment on lines 271 to 268
try {
    const derCredProvider = await authProvider.getDerCredentialsProvider()
    const stsClient = new DefaultStsClient(region, await derCredProvider.getCredentials())
    const callerIdentity = await stsClient.getCallerIdentity()
    span.record({
        smusDomainAccountId: callerIdentity.Account,
    })
} catch (err) {
    logger.error(
        `Failed to resolve AWS account ID via STS Client for domain ${domainId} in region ${region}: ${err}`
    )
}

Contributor

This is just being duplicated multiple times. Can we just store it as a property of the connection similar to domainId?

Contributor Author

Refactored the duplicated parts.
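
One way to avoid repeating the STS lookup at every call site is to resolve the account ID once and cache it on the provider. The sketch below assumes a hypothetical `IdentityResolver` interface and `DomainAuthProvider` class; only the method name `getDomainAccountId` comes from the PR, and the actual refactor may be structured differently.

```typescript
// Hypothetical interface: stands in for whatever resolves the caller identity (e.g. an STS client).
interface IdentityResolver {
    getAccountId(): Promise<string | undefined>
}

class DomainAuthProvider {
    private cachedAccountId: string | undefined
    private readonly resolver: IdentityResolver

    constructor(resolver: IdentityResolver) {
        this.resolver = resolver
    }

    // Resolves the account ID once and reuses it for later metrics.
    async getDomainAccountId(): Promise<string> {
        if (this.cachedAccountId !== undefined) {
            return this.cachedAccountId
        }
        const accountId = await this.resolver.getAccountId()
        if (!accountId) {
            throw new Error('Unable to resolve the domain account ID')
        }
        this.cachedAccountId = accountId
        return accountId
    }

    // Call on sign-out so a later login does not reuse a stale value.
    clearCache(): void {
        this.cachedAccountId = undefined
    }
}
```

Each metric callback can then call `getDomainAccountId()` without constructing its own client, and only the first call pays for the lookup.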

},
{
"type": "smusDomainAccountId",
"required": false
Contributor

Why do we set required to false? How does this work if it is set to true? Does it not emit the metric at all?

Contributor Author

If it is true, we must always provide it. It is set to false, but in practice the account ID is recorded whenever it is available. "required": false avoids any potential issue when the account ID is not available.
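
As a rough illustration of the answer above, assume that "required": false simply makes the field optional in the generated metadata type. `SmusMetadata` and `buildMetadata` are hypothetical names used only for this sketch; the field names follow the logged events.

```typescript
// Hypothetical shape for the metadata discussed above.
interface SmusMetadata {
    smusDomainId: string
    smusDomainAccountId?: string // optional, mirroring "required": false
}

// Builds the metadata, omitting the account ID when it could not be resolved.
function buildMetadata(domainId: string, accountId: string | undefined): SmusMetadata {
    const metadata: SmusMetadata = { smusDomainId: domainId }
    if (accountId !== undefined) {
        metadata.smusDomainAccountId = accountId
    }
    return metadata
}
```

Under that assumption the metric is still emitted when the lookup fails; it just lacks the account ID field.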

},
{
- "name": "smus_startSpace",
+ "name": "smus_openRemoteConnection",
Contributor

Will this break something on client side? Hoping not.

Contributor Author

Not as far as I know. The previous metric will be discontinued.

Comment on lines +1388 to +1389
"type": "smusDomainAccountId",
"required": false
Contributor

Do we not need this for all other SMUS metrics? It should be part of all SMUS metrics IMO.

Contributor Author

@kzr-at-amazon kzr-at-amazon Sep 16, 2025

It has been added to all SMUS metrics.

@kzr-at-amazon kzr-at-amazon force-pushed the master branch 4 times, most recently from 4c8031e to 62aa073 Compare September 16, 2025 17:01
* @returns An object containing the region and accountId
* @throws If the ARN format is invalid
*/
export function parseArn(arn: string): { region: string; accountId: string } {
Contributor

nit: this could use a better name

Contributor

+1, let's rename it to parseAccountIdFromSageMakerArn

Contributor Author

it is fixed
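
For readers without the diff, a function with the suggested name could look roughly like the sketch below. It assumes the standard ARN layout (arn:partition:service:region:account-id:resource) and is not the PR's actual implementation; the validation rules and error message are illustrative.

```typescript
// Illustrative only: the real function in the PR may differ.
function parseAccountIdFromSageMakerArn(arn: string): { region: string; accountId: string } {
    // Standard ARN layout: arn:partition:service:region:account-id:resource
    const parts = arn.split(':')
    if (parts.length < 6 || parts[0] !== 'arn' || parts[2] !== 'sagemaker') {
        throw new Error(`Invalid SageMaker ARN format: "${arn}"`)
    }
    const region = parts[3] ?? ''
    const accountId = parts[4] ?? ''
    // AWS account IDs are 12-digit numbers.
    if (region === '' || !/^\d{12}$/.test(accountId)) {
        throw new Error(`Invalid SageMaker ARN format: "${arn}"`)
    }
    return { region, accountId }
}
```

The narrower name makes it clear at the call site that only SageMaker ARNs are accepted and that the caller is mainly after the account ID.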

* @returns Promise resolving to the domain's AWS account ID
* @throws ToolkitError if unable to retrieve account ID
*/
public async getDomainAccountId(): Promise<string> {
Contributor

can follow up: let's add a unit test for this method

Contributor Author

it is added

public async getDomainAccountId(): Promise<string> {
const logger = getLogger()

if (!this.activeConnection) {
Contributor

The check on line 451 needs to come first, before the checks for the active connection and so on. Otherwise this code will break in a remote session.

*/
export function parseArn(arn: string): { region: string; accountId: string } {
// Strip any prefix before '@'
const cleanedArn = arn.includes('@') ? arn.split('@')[1] : arn
Contributor

What does this do?

Contributor Author

it is removed.

const accountId = await authProvider.getDomainAccountId()
span.record({
smusSpaceKey: node.resource.DomainSpaceKey,
smusDomainRegion: node.resource.regionCode,
Contributor

For Space metrics, shouldn't it be the projectDomainRegion? At the least, that should also be added. Feel free to follow up by adding ProjectAccountId.

Contributor

+1

Contributor

Adding to this: I think smusProjectAccountId is also helpful in this case. The same applies to the data node telemetry.
Basically, for a space or data connection, the resources can be in an associated account instead of the domain account. In that case, it is helpful to have projectRegion and projectAccountId.

throw new Error(`Region is undefined for domain ${domainId}`)
}
const accountId = await authProvider.getDomainAccountId()
span.record({
Contributor

Does this span object need to be closed or something for the metric to be emitted? Is that handled by the caller?

Contributor Author

The span emits its metadata to the wrapping metric. A run method for the specified metric wraps these callbacks.
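
The pattern described in this reply can be sketched as below. This is a simplified model under stated assumptions, not the toolkit's telemetry implementation: `run`, `Span`, and the in-memory `emitted` sink are hypothetical, and only `span.record` and the metric and field names appear in the PR.

```typescript
type Metadata = Record<string, string | undefined>

interface Span {
    record(data: Metadata): void
}

interface EmittedMetric {
    name: string
    metadata: Metadata
    result: 'Succeeded' | 'Failed'
}

// Hypothetical in-memory sink; a real client would send metrics to a telemetry service.
const emitted: EmittedMetric[] = []

// Wraps a callback: metadata recorded on the span is collected,
// and a single metric is emitted when the callback settles.
async function run<T>(name: string, fn: (span: Span) => Promise<T>): Promise<T> {
    const metadata: Metadata = {}
    const span: Span = {
        record: (data) => {
            Object.assign(metadata, data)
        },
    }
    try {
        const value = await fn(span)
        emitted.push({ name, metadata, result: 'Succeeded' })
        return value
    } catch (err) {
        emitted.push({ name, metadata, result: 'Failed' })
        throw err
    }
}
```

In this model the callback never closes the span itself. The wrapper emits the metric once the callback returns or throws, which is the caller-side handling the reviewer asked about.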

secretAccessKey: 'test-secret',
sessionToken: 'test-token',
}),
getDomainAccountId: async () => '123456789012',
Contributor

Not really validating that we are seeing it in telemetry?

Contributor Author

It is manually verified, but it is not validated in this unit test. There are other places where I validate telemetry via assertTelemetry.

@vpbhargav
Contributor

Changes LGTM. We should have added the projectAccountId and region while we were at it, but I'm approving this revision. Please fix the failing tests and lint issues.

@ashishrp-aws ashishrp-aws merged commit e06830b into aws:master Sep 17, 2025
31 checks passed
5 participants