-
Notifications
You must be signed in to change notification settings - Fork 730
telemetry: add domain acccount id telemetry #8048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done |
|
|
✅ I finished the code review, and didn't find any security or code quality issues. |
| // User cancelled | ||
| logger.debug('User cancelled domain URL input') | ||
| return | ||
| throw new ToolkitError('User cancelled domain URL input', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this typical? Why throw error, this is just a normal scenario where user cancelled workflow right?
Will this show up as a Fault or no in telemetry/dashboards/alarms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And did we validate and test? Are the telemetry events being emitted correctly? Can you share examples events? On Slack is fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simple return signal toolkit to overwrite result:succeeded.
Yes it is tested:
Metadata: {
metricId: 'e07ce894-93e6-4f9b-a0b2-363f01a95c6e',
traceId: '7fb087c7-dcd5-4e4e-bbf5-1f01abf0330f',
command: 'aws.smus.login',
duration: '3026',
result: 'Cancelled',
reason: 'Error',
reasonDesc: 'Failed to initiate login. | User cancelled domain URL input',
awsAccount: 'not-set',
awsRegion: 'us-east-1'
},
Value: 1,
Unit: 'None',
Passive: true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would user see in this case? It's a very common case for user to firstly click on log in and realize they didn't have the domain url ready. So they leave to find the domain url. I don't think in this case they should see an error when they come back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it by skipping showErrorMessage when UserCancelled
| try { | ||
| const derCredProvider = await authProvider.getDerCredentialsProvider() | ||
| const stsClient = new DefaultStsClient(region, await derCredProvider.getCredentials()) | ||
| const callerIdentity = await stsClient.getCallerIdentity() | ||
| span.record({ | ||
| smusDomainAccountId: callerIdentity.Account, | ||
| }) | ||
| } catch (err) { | ||
| logger.error( | ||
| `Failed to resolve AWS account ID via STS Client for domain ${domainId} in region ${region}: ${err}` | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just being duplicated multiple times. Can we just store it as a property of the connection similar to domainId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored the duplicated parts.
| }, | ||
| { | ||
| "type": "smusDomainAccountId", | ||
| "required": false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we set required to false? How does this work if set to true? Does it not emit metric at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If true, we must provide it. It is set false, but in practice whenever account id is available, it will be recorded. "required": false to avoid any potential issue if account id was not available.
| }, | ||
| { | ||
| "name": "smus_startSpace", | ||
| "name": "smus_openRemoteConnection", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this break something on client side? Hoping not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not afaik, the previous metric will be discontinued.
| "type": "smusDomainAccountId", | ||
| "required": false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need this for all other SMUS metrics? It should be part of all SMUS metrics IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is added to all metrics.
4c8031e to
62aa073
Compare
| * @returns An object containing the region, accountId, and resourceName | ||
| * @throws If the ARN format is invalid | ||
| */ | ||
| export function parseArn(arn: string): { region: string; accountId: string } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a better naming for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, lets rename to parseAccountIdFromSageMakerArn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is fixed
| * @returns Promise resolving to the domain's AWS account ID | ||
| * @throws ToolkitError if unable to retrieve account ID | ||
| */ | ||
| public async getDomainAccountId(): Promise<string> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can follow up: let's add a unit test for this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is added
| public async getDomainAccountId(): Promise<string> { | ||
| const logger = getLogger() | ||
|
|
||
| if (!this.activeConnection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check in line 451 needs to go first before all checks for active connection etc. If not, this code will break in remote session.
| */ | ||
| export function parseArn(arn: string): { region: string; accountId: string } { | ||
| // Strip any prefix before '@' | ||
| const cleanedArn = arn.includes('@') ? arn.split('@')[1] : arn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is removed.
| * @returns An object containing the region, accountId, and resourceName | ||
| * @throws If the ARN format is invalid | ||
| */ | ||
| export function parseArn(arn: string): { region: string; accountId: string } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, lets rename to parseAccountIdFromSageMakerArn
| const accountId = await authProvider.getDomainAccountId() | ||
| span.record({ | ||
| smusSpaceKey: node.resource.DomainSpaceKey, | ||
| smusDomainRegion: node.resource.regionCode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Space metrics, it should be the projectDomainRegion? Or at least that also should be added. Feel free to follow up with adding ProjectAccountId.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add to this, I think in this case, the smusProjectAccountId is also helpful. This is the same for the data node telemtry
Basically, if it's space or data connection, then the resources can be in associated account, instead of domain account. In this case, it is helpful to have projectRegion, and projectAccountId
| throw new Error(`Region is undefined for domain ${domainId}`) | ||
| } | ||
| const accountId = await authProvider.getDomainAccountId() | ||
| span.record({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this span object need to be closed or something for the metric to emitted? Is that handled by the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span will emits metatdata to the wrapper metric. There is a run method with specified metric that wraps these callbacks.
| secretAccessKey: 'test-secret', | ||
| sessionToken: 'test-token', | ||
| }), | ||
| getDomainAccountId: async () => '123456789012', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really validating that we are seeing it in telemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is manually verified, but here it is not validated in unittest. there are other places where i validated telemetry via assertTelemetry.
62aa073 to
79c458f
Compare
|
Changes LGTM, we should have just added the projectAccountId and region also while at it. But approving revision, please fix the failing tests and lint issues. |
79c458f to
a977a4f
Compare
a977a4f to
0c80633
Compare

Problem
Solution
Test
npm run generateTelemetrynpm testfeature/xbranches will not be squash-merged at release time.