diff --git a/P261194666.md b/P261194666.md new file mode 100644 index 00000000000..feafb3e7ce2 --- /dev/null +++ b/P261194666.md @@ -0,0 +1,630 @@ +# Root-cause SageMaker auth failure in Amazon Q for VSCode after v1.62.0 + +v1.63.0 of the extension introduced agentic chat and moved from directly calling the Q service using an AWS SDK client directly from the extension to indirectly +calling the Q service through the aws-lsp-codewhisperer service in language-servers (aka Flare). + +## Notes + +1. `isSageMaker` function used in many places in aws-toolkit-vscode codebase. `setContext` is used to set whether SMAI (`aws.isSageMaker`) or SMUS (`aws.isSageMakerUnifiedStudio`) is in use so that conditions testing those values can be used in package.json. +2. `aws-toolkit-vscode/packages/amazonq/src/lsp/chat/webviewProvider.ts` passes in booleans for SMAI and SMUS into MynahUI for Q Chat. +3. Files in `aws-toolkit-vscode/packages/core` apply to how the Amazon Q for VSCode extension used to call Q (formerly known as "CodeWhisperer") directly through an SDK client. It was this codebase that SageMaker depended on for authenticating with IAM credentials to Q. Files in `aws-toolkit-vscode/packages/amazonq` apply to how the extension now uses the LSP server (aka aws-lsp-codewhisperer in language-servers aka Flare) for indirectly accessing the Q service. The commit `938bb376647414776a55d7dd7d6761c863764c5c` is primarily what flipped over the extension from using core to amazonq leading to auth breaking for SageMaker. +4. Once we figure out how IAM credentials worked before (likely because core creates it's own SDK client and may do something fancy with auth that aws-lsp-codewhisperer does not), we may find that we need to apply a fix in aws-toolkit-vscode and/or language-servers. +5. Using the core (legacy) Q chat is not an option as the Amazon Q for VSCode team will not be maintaining it. +6. In user settings in VSCode, set `amazonq.trace.server` to `on` for more detailed logs from LSP server. +7. /Users/floralph/Source/P261194666.md contains A LOT of information about researching this issue so far. It can fill your context fast. We will refer to it from time to time and possibly migrate some of the most important information to this doc. You can ask about reading it, but don't read it unless I instruct you to do so and even then, you MUST stay focused only on what you've been asked to do with it. + +## This is CRITICAL + +When trying to root cause the issue, it is ABSOLUTELY CRITICAL that we follow the path of execution related to the CodeWhisperer LSP server from start onwards without dropping the trail. We CANNOT just assume things about other parts of code based on names nor should we assume they are even related to our issue if they are not in the specific code path that we're following. We have to be laser focused on following the code path and looking for issues, not jumping to conclusions and jumping to other code. As we have to use logging as our only means of tracing/debugging in the SageMaker instance, we can use that to follow the path of execution. + +## Repos on disk + +1. /Users/floralph/Source/aws-toolkit-vscode + 1. Branch from breaking commit 938bb376647414776a55d7dd7d6761c863764c5c for experimenting on: bug/sm-auth + 2. Branch where you tried to add IAM creds using /Users/floralph/Source/P261194666.md: floralph/P261194666 +2. /Users/floralph/Source/language-server-runtimes +3. /Users/floralph/Source/language-servers + +If we absolutely need to look at MynahUI code, I can try to track it down. It might be lingering in one of the repos above though. + +## Important files likely related to the issue and fix + +- aws-toolkit-vscode/packages/amazonq/src/extensionNode.ts + +## Git Bisect Results - Breaking Commit + +**Commit ID:** `938bb376647414776a55d7dd7d6761c863764c5c` +**Author:** Josh Pinkney +**Date:** Not specified in bisect output +**Title:** Enable Amazon Q LSP experiments by default + +### What This Commit Changed + +This commit flipped three experiment flags from `false` to `true`, fundamentally changing Amazon Q's architecture from legacy chat system to LSP-based system: + +```diff +diff --git a/packages/amazonq/src/extension.ts b/packages/amazonq/src/extension.ts +index fe5ce809c..345e6e646 100644 +--- a/packages/amazonq/src/extension.ts ++++ b/packages/amazonq/src/extension.ts +@@ -119,7 +119,7 @@ export async function activateAmazonQCommon(context: vscode.ExtensionContext, is + } + // This contains every lsp agnostic things (auth, security scan, code scan) + await activateCodeWhisperer(extContext as ExtContext) +- if (Experiments.instance.get('amazonqLSP', false)) { ++ if (Experiments.instance.get('amazonqLSP', true)) { + await activateAmazonqLsp(context) + } + +diff --git a/packages/amazonq/src/extensionNode.ts b/packages/amazonq/src/extensionNode.ts +index d3e98b025..5a8b5082c 100644 +--- a/packages/amazonq/src/extensionNode.ts ++++ b/packages/amazonq/src/extensionNode.ts +@@ -53,7 +53,7 @@ async function activateAmazonQNode(context: vscode.ExtensionContext) { + extensionContext: context, + } + +- if (!Experiments.instance.get('amazonqChatLSP', false)) { ++ if (!Experiments.instance.get('amazonqChatLSP', true)) { + const appInitContext = DefaultAmazonQAppInitContext.instance + const provider = new AmazonQChatViewProvider( + context, +diff --git a/packages/amazonq/src/lsp/client.ts b/packages/amazonq/src/lsp/client.ts +index e45a3fdac..12341ff17 100644 +--- a/packages/amazonq/src/lsp/client.ts ++++ b/packages/amazonq/src/lsp/client.ts +@@ -117,7 +117,7 @@ export async function startLanguageServer( + ) + } + +- if (Experiments.instance.get('amazonqChatLSP', false)) { ++ if (Experiments.instance.get('amazonqChatLSP', true)) { + await activate(client, encryptionKey, resourcePaths.ui) + } +``` + +### Commit: 6ce383258 - "feat(sagemaker): free tier Q Chat with auto-login for iam users and login option for pro tier users (#5886)" + +This commit shows how the Amazon Q for VSCode extension was updated (core only as the LSP server was not used by this extension at the time) to use +IAM credentials from SageMaker. + +**Author:** Ahmed Ali (azkali) +**Date:** October 29, 2024 + +#### Key Auth-Related Changes: + +1. **SageMaker Cookie-Based Authentication Detection** in `packages/core/src/auth/activation.ts`: + + ```typescript + interface SagemakerCookie { + authMode?: 'Sso' | 'Iam' + } + + export async function initialize(loginManager: LoginManager): Promise { + if (isAmazonQ() && isSageMaker()) { + // The command `sagemaker.parseCookies` is registered in VS Code Sagemaker environment. + const result = (await vscode.commands.executeCommand('sagemaker.parseCookies')) as SagemakerCookie + if (result.authMode !== 'Sso') { + initializeCredentialsProviderManager() + } + } + ``` + +2. **New Credentials Provider Manager Initialization** in `packages/core/src/auth/utils.ts`: + + ```typescript + export function initializeCredentialsProviderManager() { + const manager = CredentialsProviderManager.getInstance() + manager.addProviderFactory(new SharedCredentialsProviderFactory()) + manager.addProviders( + new Ec2CredentialsProvider(), + new EcsCredentialsProvider(), + new EnvVarsCredentialsProvider() + ) + } + ``` + +3. **Modified CodeWhisperer Auth Validation** in `packages/core/src/codewhisperer/util/authUtil.ts`: + + ```typescript + // BEFORE: + if (isSageMaker()) { + return isIamConnection(conn) + } + + // AFTER: + return ( + (isSageMaker() && isIamConnection(conn)) || + (isCloud9('codecatalyst') && isIamConnection(conn)) || + (isSsoConnection(conn) && hasScopes(conn, codeWhispererCoreScopes)) + ) + ``` + +4. **Amazon Q Connection Validation Enhanced**: + + ```typescript + export const isValidAmazonQConnection = (conn?: Connection): conn is Connection => { + return ( + (isSageMaker() && isIamConnection(conn)) || + ((isSsoConnection(conn) || isBuilderIdConnection(conn)) && + isValidCodeWhispererCoreConnection(conn) && + hasScopes(conn, amazonQScopes)) + ) + } + ``` + +5. **Dual Chat Client Implementation** in `packages/core/src/codewhispererChat/clients/chat/v0/chat.ts`: + + ```typescript + // New IAM-based chat method + async chatIam(chatRequest: SendMessageRequest): Promise { + const client = await createQDeveloperStreamingClient() + const response = await client.sendMessage(chatRequest) + // ... session handling + } + + // Existing SSO-based chat method + async chatSso(chatRequest: GenerateAssistantResponseRequest): Promise { + const client = await createCodeWhispererChatStreamingClient() + // ... existing logic + } + ``` + +6. **Chat Controller Route Selection** in `packages/core/src/codewhispererChat/controllers/chat/controller.ts`: + ```typescript + if (isSsoConnection(AuthUtil.instance.conn)) { + const { $metadata, generateAssistantResponseResponse } = await session.chatSso(request) + response = { $metadata: $metadata, message: generateAssistantResponseResponse } + } else { + const { $metadata, sendMessageResponse } = await session.chatIam(request as SendMessageRequest) + response = { $metadata: $metadata, message: sendMessageResponse } + } + ``` + +#### Key Findings: + +- **Two separate Q API clients**: `createQDeveloperStreamingClient()` for IAM, `createCodeWhispererChatStreamingClient()` for SSO +- **SageMaker cookie-based auth detection**: Uses `sagemaker.parseCookies` command to determine auth mode +- **Automatic credential provider setup**: Initializes EC2, ECS, and environment variable credential providers for IAM users +- **Route selection based on connection type**: SSO connections use old client, IAM connections use new Q Developer client + +## Related Historical Fix - CodeWhisperer SageMaker Authentication + +**Commit ID:** `b125a1bd3b135344d2aa24961e746a10e55702c6` +**Author:** Lei Gao +**Date:** March 18, 2024 +**Title:** "fix(codewhisperer): completion error in sagemaker #4545" + +### Problem Identified + +In SageMaker Code Editor, CodeWhisperer was failing with: + +``` +Unexpected key 'optOutPreference' found in params +``` + +### Root Cause + +SageMaker environments require **GenerateRecommendation** API calls instead of **ListRecommendation** API calls for SigV4 authentication to work properly. + +### Fix Applied + +Modified `packages/core/src/codewhisperer/service/recommendationHandler.ts`: + +```typescript +// BEFORE: Used pagination logic that triggered ListRecommendation +if (pagination) { + // ListRecommendation request - FAILS in SageMaker +} + +// AFTER: SageMaker detection forces GenerateRecommendation +if (pagination && !isSM) { + // Added !isSM condition + // ListRecommendation only for non-SageMaker +} else { + // GenerateRecommendation for SageMaker (and non-pagination cases) +} +``` + +## Key Insights + +### Pattern Recognition + +Both issues share the same fundamental problem: **SageMaker environments have different API authentication requirements** that break standard AWS SDK calls. + +### Hypothesis for Current Issue + +The Amazon Q LSP (enabled by default in v1.63.0) is likely making API calls that: + +1. Work fine in standard environments +2. Fail in SageMaker due to different credential passing mechanisms +3. Require SageMaker-specific request formatting (similar to CodeWhisperer fix) + +# Files + +## aws-toolkit-vscode/packages/amazonq/src/extension.ts + +Starts both the old "core" CodeWhisperer code with `await activateCodeWhisperer(extContext as ExtContext)` on line ~121, followed by the new LSP code for Amazon Q. Maybe the dev team is slowly migrating functionality from core to amazonq and this is how they have both running at once. The code below is one of 3 places where the 'amazonqLSP' experiment is set to on by default in the commit that broke SageMaker auth. + +```typescript +// This contains every lsp agnostic things (auth, security scan, code scan) +await activateCodeWhisperer(extContext as ExtContext) +if (Experiments.instance.get('amazonqLSP', true)) { + await activateAmazonqLsp(context) +} +``` + +`activateAmazonqLsp` downloads and installs the language-servers bundle then executes the CodeWhisperer start up script (we should find the specific name and path) and initializes the LSP server, including auth set up. + +## aws-toolkit-vscode/packages/core/src/auth/activation.ts + +This file appears critical to how the SageMaker auth worked. It is in core however, and not clear whether it is even in the code path for the LSP server or not. We should review this file closely to understand how IAM credentials worked as it should inform us on what needs to change in the amazonq package to support IAM credentials as well. The `sagemaker.parseCookies` code here also seems important in determining whether the SageMaker instance wants to use IAM or SSO, so that should probably be carried over into the amazonq package as well. + +The `Auth.instance.onDidChangeActiveConnection` handler code should be investigated further. It's not clear if it has anything to do with auth to Q or if it's just older "toolkit"-related auth stuff. + +## aws-toolkit-vscode/packages/core/src/auth/utils.ts + +This is a collection of utility functions and many are related to auth/security. However, it appears to be `initializeCredentialsProviderManager` in our code path, called by `aws-toolkit-vscode/packages/core/src/auth/activation.ts` that may be of importance. We should determine if we need this or similar functionality in amazonq package or if this is just a hold-over that updates the old "toolkit" (i.e. non-Amazon Q parts of the extension) stuff. + +## aws-toolkit-vscode/packages/amazonq/src/lsp/client.ts + +1. line ~68 sets `providesBearerToken: true` but doesn't appear to have anything similar for IAM credentials. +2. line ~93 to the end starts auth for LSP using the `AmazonQLspAuth` class. This all appears to be for SSO tokens, nothing for IAM credentials. + +## aws-toolkit-vscode/packages/amazonq/src/lsp/auth.ts + +1. Defines `AmazonQLspAuth` class that is only for SSO tokens, nothing about IAM credentials. +2. Some SSO token related functions are exported, but nothing similar for IAM credentials. + +## aws-toolkit-vscode/packages/core/src/codewhisperer/activation.ts + +`activate` in the old "core" Q implementation is called by `aws-toolkit-vscode/packages/amazonq/src/extension.ts` line ~121. + +Suspcious code that is still running in `activate` function. How does this not interfer with the new auth code in the amazonq package? + +```typescript +// initialize AuthUtil earlier to make sure it can listen to connection change events. +const auth = AuthUtil.instance +auth.initCodeWhispererHooks() +``` + +Further down in this file it still creates and uses `onst client = new codewhispererClient.DefaultCodeWhispererClient()` which makes it appear to be using both direct calls from the extension as well as the LSP to access the Q service. This bears further investigation into what this code is actually doing. + +## aws-toolkit-vscode/packages/core/src/codewhisperer/client/codewhisperer.ts + +This is the old "core" CodeWhisperer service client. There is likely important code here that informs how IAM authentication works with the service client that may be missing in the language-servers CodeWhisperer client. If my hunch is correct in that the "core" code is still in use for what hasn't been migrated yet, this code may not be actively used for Q Chat which was migrated (see the Experiments flags defaulting to true in the breaking commit) to the amazonq package and should be using the auth there and in language-servers. + +## aws-toolkit-vscode/packages/amazonq/src/extensionNode.ts + +The code below is one of 3 places where the 'amazonqChatLSP' experiment is set to on by default in the commit that broke SageMaker auth. There is some "auth"-related code in this file that should be investigated further to determine if it has any impact on the broken SageMaker auth. It isn't obvious that it does or doesn't. It may just be used in the MynahUI Q Chat webview, and not the LSP server. + +```typescript +if (!Experiments.instance.get('amazonqChatLSP', true)) { +``` + +## aws-toolkit-vscode/packages/core/src/auth/auth.ts + +This file was updated recently for the SMUS project. It may not be directly related to the broken SageMaker auth issue, but the comments on the added/changed functions are suspicious regarding how credentials are received. SMUS may be adding a different way to get IAM credentials than what SMAI used. + +```typescript +/** + * Returns true if credentials are provided by the environment (ex. via ~/.aws/) + * + * @param isC9 boolean for if Cloud9 is host + * @param isSM boolean for if SageMaker is host + * @returns boolean for if C9 "OR" SM + */ +export function hasVendedIamCredentials(isC9?: boolean, isSM?: boolean) { + isC9 ??= isCloud9() + isSM ??= isSageMaker() + return isSM || isC9 +} + +/** + * Returns true if credentials are provided by the metadata files in environment (ex. for IAM via ~/.aws/ and in a future case with SSO, from /cache or /sso) + * @param isSMUS boolean if SageMaker Unified Studio is host + * @returns boolean if SMUS + */ +export function hasVendedCredentialsFromMetadata(isSMUS?: boolean) { + isSMUS ??= isSageMaker('SMUS') + return isSMUS +} +``` + +There is also A LOT of other auth related functionality here, but it's in "core" and may not be directly related the code paths for LSP and breaking auth in SageMaker. + +## aws-toolkit-vscode/packages/core/src/codewhisperer/util/authUtil.ts + +There is some `isSageMaker`-related code here that we should investigate. It appears to be important to auth with SageMaker, but it's not clear if it or similar code is needed and has made it into the amazonq package. Once we confirm any of this code is in our code path of concern, it should be investigated further. + +## aws-toolkit-vscode/packages/amazonq/src/lsp/chat/webviewProvider.ts + +While there is special SageMaker handling in this file, it is not clear if it is related to IAM auth issues with the LSP or it is just related to the chat UI. If we find it is in our code path, we can investigate further. + +# Proposed Fix for SageMaker IAM Authentication in Amazon Q LSP + +> **NOTE:** We should start back tomorrow by addressing the issues and concerns raised in this document first thing, particularly the SageMaker cookie detection and connection metadata handling for IAM authentication. + +## Issue Summary + +The Amazon Q extension for VSCode fails to authenticate in SageMaker environments after v1.62.0 due to a change in architecture. The extension moved from directly calling the Q service using an AWS SDK client to indirectly calling it through the aws-lsp-codewhisperer service (Flare). While the old implementation had specific handling for SageMaker IAM credentials, the new LSP-based implementation only supports SSO token authentication. + +## Root Cause Analysis + +### Breaking Change + +Commit `938bb376647414776a55d7dd7d6761c863764c5c` enabled three experiment flags by default: + +1. `amazonqLSP` in `packages/amazonq/src/extension.ts` (line ~119) - Controls whether to activate the Amazon Q LSP +2. `amazonqChatLSP` in `packages/amazonq/src/extensionNode.ts` (line ~53) - Controls whether to use the legacy chat provider or the LSP-based chat provider +3. `amazonqChatLSP` in `packages/amazonq/src/lsp/client.ts` (line ~117) - Controls whether to activate the chat functionality in the LSP client + +This change moved the extension from using the core implementation to the LSP implementation, which lacks IAM credential support. + +### Recent IAM Support in Language-Servers Repository + +A significant recent commit in the language-servers repository adds IAM authentication support: + +**Commit ID:** 16b287b9e +**Author:** sdharani91 +**Date:** 2025-06-26 +**Title:** feat: enable iam auth for agentic chat (#1736) + +Key changes in this commit: + +1. **Environment Variable Flag**: + + ```typescript + // Added function to check for IAM auth mode + export function isUsingIAMAuth(): boolean { + return process.env.USE_IAM_AUTH === 'true' + } + ``` + +2. **Service Manager Selection**: + + ```typescript + // In qAgenticChatServer.ts + amazonQServiceManager = isUsingIAMAuth() ? getOrThrowBaseIAMServiceManager() : getOrThrowBaseTokenServiceManager() + ``` + +3. **IAM Credentials Handling**: + + ```typescript + // Added function to extract IAM credentials + export function getIAMCredentialsFromProvider(credentialsProvider: CredentialsProvider) { + if (!credentialsProvider.hasCredentials('iam')) { + throw new Error('Missing IAM creds') + } + + const credentials = credentialsProvider.getCredentials('iam') as Credentials + return { + accessKeyId: credentials.accessKeyId, + secretAccessKey: credentials.secretAccessKey, + sessionToken: credentials.sessionToken, + } + } + ``` + +4. **Unified Chat Response Interface**: + + ```typescript + // Created types to handle both auth flows + export type ChatCommandInput = SendMessageCommandInput | GenerateAssistantResponseCommandInputCodeWhispererStreaming + export type ChatCommandOutput = + | SendMessageCommandOutput + | GenerateAssistantResponseCommandOutputCodeWhispererStreaming + ``` + +5. **Source Parameter for IAM**: + ```typescript + // Added source parameter for IAM requests + request.source = 'IDE' + ``` + +This commit shows that IAM authentication support has been added to the language-servers repository, but the extension needs to set the `USE_IAM_AUTH` environment variable to `true` when running in SageMaker environments. + +## Proposed Fix + +Based on our investigation of the language-server-runtimes repository and the previous implementation attempt, here's a refined solution: + +1. **Set Environment Variable for IAM Auth**: + + ```typescript + // In packages/core/src/shared/lsp/utils/platform.ts + const env = { ...process.env } + if (isSageMaker()) { + // Check SageMaker cookie to determine auth mode + try { + const result = await vscode.commands.executeCommand('sagemaker.parseCookies') + if (result?.authMode !== 'Sso') { + env.USE_IAM_AUTH = 'true' + getLogger().info(`[SageMaker Debug] Setting USE_IAM_AUTH=true for language server process`) + } + } catch (err) { + getLogger().error('Failed to parse SageMaker cookies: %O', err) + // Default to IAM auth if cookie parsing fails + env.USE_IAM_AUTH = 'true' + getLogger().info(`[SageMaker Debug] Setting USE_IAM_AUTH=true for language server process (default)`) + } + } + + const lspProcess = new ChildProcess(bin, args, { + warnThresholds, + spawnOptions: { env }, + }) + ``` + +2. **Enhance `AmazonQLspAuth` Class** (`packages/amazonq/src/lsp/auth.ts`): + + ```typescript + async refreshConnection(force: boolean = false) { + const activeConnection = this.authUtil.conn + if (this.authUtil.isConnectionValid()) { + if (isSsoConnection(activeConnection)) { + // Existing SSO path + const token = await this.authUtil.getBearerToken() + await (force ? this._updateBearerToken(token) : this.updateBearerToken(token)) + } else if (isSageMaker() && isIamConnection(activeConnection)) { + // SageMaker IAM path + try { + const credentials = await this.authUtil.getCredentials() + if (credentials && credentials.accessKeyId && credentials.secretAccessKey) { + await (force ? this._updateIamCredentials(credentials) : this.updateIamCredentials(credentials)) + } else { + getLogger().error('Invalid IAM credentials: %O', credentials) + } + } catch (err) { + getLogger().error('Failed to get IAM credentials: %O', err) + } + } + } + } + + public updateIamCredentials = onceChanged(this._updateIamCredentials.bind(this)) + private async _updateIamCredentials(credentials: any) { + try { + // Extract only the required fields to match the expected format + const iamCredentials = { + accessKeyId: credentials.accessKeyId, + secretAccessKey: credentials.secretAccessKey, + sessionToken: credentials.sessionToken, + } + + const request = await this.createUpdateIamCredentialsRequest(iamCredentials) + await this.client.sendRequest(iamCredentialsUpdateRequestType.method, request) + this.client.info(`UpdateIamCredentials: Success`) + } catch (err) { + getLogger().error('Failed to update IAM credentials: %O', err) + } + } + ``` + +3. **Update Connection Metadata Handler** (`packages/amazonq/src/lsp/client.ts`): + + ```typescript + client.onRequest(notificationTypes.getConnectionMetadata.method, () => { + // For IAM auth, provide a default startUrl + if (process.env.USE_IAM_AUTH === 'true') { + return { + sso: { + startUrl: 'https://amzn.awsapps.com/start', // Default for IAM auth + }, + } + } + + // For SSO auth, use the actual startUrl + return { + sso: { + startUrl: AuthUtil.instance.auth.startUrl, + }, + } + }) + ``` + +4. **Modify Client Initialization** (`packages/amazonq/src/lsp/client.ts`): + + ```typescript + const useIamAuth = isSageMaker() && process.env.USE_IAM_AUTH === 'true' + + initializationOptions: { + // ... + credentials: { + providesBearerToken: !useIamAuth, + providesIam: useIamAuth, + }, + } + ``` + +5. **Ensure Auto-login Happens Early** (`packages/amazonq/src/lsp/activation.ts`): + ```typescript + export async function activate(ctx: vscode.ExtensionContext): Promise { + try { + // Check for SageMaker and auto-login if needed + if (isSageMaker()) { + try { + const result = await vscode.commands.executeCommand('sagemaker.parseCookies') + if (result?.authMode !== 'Sso') { + // Auto-login with IAM credentials + const sagemakerProfileId = asString({ + credentialSource: 'ec2', + credentialTypeId: 'sagemaker-instance', + }) + await Auth.instance.tryAutoConnect(sagemakerProfileId) + getLogger().info(`Automatically connected with SageMaker IAM credentials`) + } + } catch (err) { + getLogger().error('Failed to parse SageMaker cookies: %O', err) + } + } + + await lspSetupStage('all', async () => { + const installResult = await new AmazonQLspInstaller().resolve() + await lspSetupStage('launch', async () => await startLanguageServer(ctx, installResult.resourcePaths)) + }) + } catch (err) { + const e = err as ToolkitError + void vscode.window.showInformationMessage(`Unable to launch amazonq language server: ${e.message}`) + } + } + ``` + +This refined solution addresses the issues identified in the previous implementation attempt: + +1. It properly checks the SageMaker cookie to determine the auth mode +2. It ensures the IAM credentials are formatted correctly +3. It adds robust error handling +4. It ensures auto-login happens early in the initialization process + +# Next Steps + +## Plan for SageMaker Environment Testing + +We are going to set up a comprehensive testing environment on the SageMaker instance to debug and fix the IAM authentication issue: + +1. **Repository Setup**: + + - Clone aws-toolkit-vscode repository (already done locally) + - Clone language-servers repository to SageMaker instance + - Configure aws-toolkit-vscode to use local build of language-servers instead of downloaded version + +2. **Development Workflow**: + + - Make changes to language-servers codebase directly on SageMaker instance + - Add comprehensive logging throughout the authentication flow + - Test changes immediately in the SageMaker environment where the issue occurs + - Use `amazonq.trace.server` setting for detailed LSP server logs + +3. **Key Areas to Investigate**: + + - Verify that `USE_IAM_AUTH` environment variable is properly set and inherited + - Confirm IAM credentials are correctly passed from extension to language server + - Validate that language server selects correct service manager based on auth mode + - Test that SageMaker cookie detection works properly + +4. **Debugging Strategy**: + + - Follow the exact code execution path from extension activation to LSP authentication + - Add logging at each critical step to trace the authentication flow + - Capture and analyze any errors or failures in the authentication process + - Compare behavior between working SSO environments and failing SageMaker IAM environment + +5. **Implementation Priority**: + - First implement SageMaker cookie detection to determine auth mode + - Add IAM credential handling to AmazonQLspAuth class + - Ensure proper environment variable setting for language server process + - Test and validate the complete authentication flow + +This approach will allow us to make real-time changes and immediately test them in the actual environment where the authentication failure occurs, giving us the best chance to identify and fix the root cause. + +## Critical Issues to Address First + +The document emphasizes that we should **"start back tomorrow by addressing the issues and concerns raised in this document first thing, particularly the SageMaker cookie detection and connection metadata handling for IAM authentication."** + +The most critical missing pieces are: + +1. **SageMaker cookie detection** to determine when to use IAM vs SSO auth +2. **Connection metadata handling** for IAM authentication +3. **Proper error handling** throughout the authentication flow + +These should be implemented before testing the solution in a SageMaker environment. diff --git a/packages/amazonq/src/lsp/auth.ts b/packages/amazonq/src/lsp/auth.ts index 0bfee98f2e2..161ba4d9762 100644 --- a/packages/amazonq/src/lsp/auth.ts +++ b/packages/amazonq/src/lsp/auth.ts @@ -5,6 +5,7 @@ import { bearerCredentialsUpdateRequestType, + iamCredentialsUpdateRequestType, ConnectionMetadata, NotificationType, RequestType, @@ -17,8 +18,8 @@ import { LanguageClient } from 'vscode-languageclient' import { AuthUtil } from 'aws-core-vscode/codewhisperer' import { Writable } from 'stream' import { onceChanged } from 'aws-core-vscode/utils' -import { getLogger, oneMinute } from 'aws-core-vscode/shared' -import { isSsoConnection } from 'aws-core-vscode/auth' +import { getLogger, oneMinute, isSageMaker } from 'aws-core-vscode/shared' +import { isSsoConnection, isIamConnection } from 'aws-core-vscode/auth' export const encryptionKey = crypto.randomBytes(32) @@ -78,10 +79,16 @@ export class AmazonQLspAuth { */ async refreshConnection(force: boolean = false) { const activeConnection = this.authUtil.conn - if (this.authUtil.isConnectionValid() && isSsoConnection(activeConnection)) { - // send the token to the language server - const token = await this.authUtil.getBearerToken() - await (force ? this._updateBearerToken(token) : this.updateBearerToken(token)) + if (this.authUtil.isConnectionValid()) { + if (isSsoConnection(activeConnection)) { + // Existing SSO path + const token = await this.authUtil.getBearerToken() + await (force ? this._updateBearerToken(token) : this.updateBearerToken(token)) + } else if (isSageMaker() && isIamConnection(activeConnection)) { + // New SageMaker IAM path + const credentials = await this.authUtil.getCredentials() + await (force ? this._updateIamCredentials(credentials) : this.updateIamCredentials(credentials)) + } } } @@ -92,9 +99,7 @@ export class AmazonQLspAuth { public updateBearerToken = onceChanged(this._updateBearerToken.bind(this)) private async _updateBearerToken(token: string) { - const request = await this.createUpdateCredentialsRequest({ - token, - }) + const request = await this.createUpdateBearerCredentialsRequest(token) // "aws/credentials/token/update" // https://github.com/aws/language-servers/blob/44d81f0b5754747d77bda60b40cc70950413a737/core/aws-lsp-core/src/credentials/credentialsProvider.ts#L27 @@ -103,6 +108,26 @@ export class AmazonQLspAuth { this.client.info(`UpdateBearerToken: ${JSON.stringify(request)}`) } + public updateIamCredentials = onceChanged(this._updateIamCredentials.bind(this)) + private async _updateIamCredentials(credentials: any) { + getLogger().info( + `[SageMaker Debug] Updating IAM credentials - credentials received: ${credentials ? 'YES' : 'NO'}` + ) + if (credentials) { + getLogger().info( + `[SageMaker Debug] IAM credentials structure: accessKeyId=${credentials.accessKeyId ? 'present' : 'missing'}, secretAccessKey=${credentials.secretAccessKey ? 'present' : 'missing'}, sessionToken=${credentials.sessionToken ? 'present' : 'missing'}` + ) + } + + const request = await this.createUpdateIamCredentialsRequest(credentials) + + // "aws/credentials/iam/update" + await this.client.sendRequest(iamCredentialsUpdateRequestType.method, request) + + this.client.info(`UpdateIamCredentials: ${JSON.stringify(request)}`) + getLogger().info(`[SageMaker Debug] IAM credentials update request sent successfully`) + } + public startTokenRefreshInterval(pollingTime: number = oneMinute / 2) { const interval = setInterval(async () => { await this.refreshConnection().catch((e) => this.logRefreshError(e)) @@ -110,8 +135,9 @@ export class AmazonQLspAuth { return interval } - private async createUpdateCredentialsRequest(data: any): Promise { - const payload = new TextEncoder().encode(JSON.stringify({ data })) + private async createUpdateBearerCredentialsRequest(token: string): Promise { + const bearerCredentials = { token } + const payload = new TextEncoder().encode(JSON.stringify({ data: bearerCredentials })) const jwt = await new jose.CompactEncrypt(payload) .setProtectedHeader({ alg: 'dir', enc: 'A256GCM' }) @@ -127,4 +153,24 @@ export class AmazonQLspAuth { encrypted: true, } } + + private async createUpdateIamCredentialsRequest(credentials: any): Promise { + // Extract IAM credentials structure + const iamCredentials = { + accessKeyId: credentials.accessKeyId, + secretAccessKey: credentials.secretAccessKey, + sessionToken: credentials.sessionToken, + } + const payload = new TextEncoder().encode(JSON.stringify({ data: iamCredentials })) + + const jwt = await new jose.CompactEncrypt(payload) + .setProtectedHeader({ alg: 'dir', enc: 'A256GCM' }) + .encrypt(encryptionKey) + + return { + data: jwt, + // Omit metadata for IAM credentials since startUrl is undefined for non-SSO connections + encrypted: true, + } + } } diff --git a/packages/amazonq/src/lsp/chat/messages.ts b/packages/amazonq/src/lsp/chat/messages.ts index 918abb46f40..b60c40e25b6 100644 --- a/packages/amazonq/src/lsp/chat/messages.ts +++ b/packages/amazonq/src/lsp/chat/messages.ts @@ -285,6 +285,31 @@ export function registerMessageListeners( } const chatRequest = await encryptRequest(chatParams, encryptionKey) + + // Add detailed logging for SageMaker debugging + if (process.env.USE_IAM_AUTH === 'true') { + languageClient.info(`[SageMaker Debug] Making chat request with IAM auth`) + languageClient.info(`[SageMaker Debug] Chat request method: ${chatRequestType.method}`) + languageClient.info( + `[SageMaker Debug] Original chat params: ${JSON.stringify( + { + tabId: chatParams.tabId, + prompt: chatParams.prompt, + // Don't log full textDocument content, just metadata + textDocument: chatParams.textDocument + ? { uri: chatParams.textDocument.uri } + : undefined, + context: chatParams.context ? `${chatParams.context.length} context items` : undefined, + }, + undefined, + 2 + )}` + ) + languageClient.info( + `[SageMaker Debug] Environment context: USE_IAM_AUTH=${process.env.USE_IAM_AUTH}, AWS_REGION=${process.env.AWS_REGION}` + ) + } + try { const chatResult = await languageClient.sendRequest( chatRequestType.method, @@ -294,6 +319,26 @@ export function registerMessageListeners( }, cancellationToken.token ) + + // Add response content logging for SageMaker debugging + if (process.env.USE_IAM_AUTH === 'true') { + languageClient.info(`[SageMaker Debug] Chat response received - type: ${typeof chatResult}`) + if (typeof chatResult === 'string') { + languageClient.info( + `[SageMaker Debug] Chat response (string): ${chatResult.substring(0, 200)}...` + ) + } else if (chatResult && typeof chatResult === 'object') { + languageClient.info( + `[SageMaker Debug] Chat response (object keys): ${Object.keys(chatResult)}` + ) + if ('message' in chatResult) { + languageClient.info( + `[SageMaker Debug] Chat response message: ${JSON.stringify(chatResult.message).substring(0, 200)}...` + ) + } + } + } + await handleCompleteResult( chatResult, encryptionKey, diff --git a/packages/amazonq/src/lsp/client.ts b/packages/amazonq/src/lsp/client.ts index e9066678698..9d810862bbc 100644 --- a/packages/amazonq/src/lsp/client.ts +++ b/packages/amazonq/src/lsp/client.ts @@ -16,6 +16,7 @@ import { RenameFilesParams, ResponseMessage, WorkspaceFolder, + ConnectionMetadata, } from '@aws/language-server-runtimes/protocol' import { AuthUtil, @@ -181,10 +182,11 @@ export async function startLanguageServer( contextConfiguration: { workspaceIdentifier: extensionContext.storageUri?.path, }, - logLevel: toAmazonQLSPLogLevel(globals.logOutputChannel.logLevel), + logLevel: isSageMaker() ? 'debug' : toAmazonQLSPLogLevel(globals.logOutputChannel.logLevel), }, credentials: { providesBearerToken: true, + providesIam: isSageMaker(), // Enable IAM credentials for SageMaker environments }, }, /** @@ -210,6 +212,32 @@ export async function startLanguageServer( toDispose.push(disposable) await client.onReady() + // Set up connection metadata handler + client.onRequest(notificationTypes.getConnectionMetadata.method, () => { + // For IAM auth, provide a default startUrl + if (process.env.USE_IAM_AUTH === 'true') { + getLogger().info( + `[SageMaker Debug] Connection metadata requested - returning hardcoded startUrl for IAM auth` + ) + return { + sso: { + // TODO P261194666 Replace with correct startUrl once identified + startUrl: 'https://amzn.awsapps.com/start', // Default for IAM auth + }, + } + } + + // For SSO auth, use the actual startUrl + getLogger().info( + `[SageMaker Debug] Connection metadata requested - returning actual startUrl for SSO auth: ${AuthUtil.instance.auth.startUrl}` + ) + return { + sso: { + startUrl: AuthUtil.instance.auth.startUrl, + }, + } + }) + const auth = await initializeAuth(client) await onLanguageServerReady(extensionContext, auth, client, resourcePaths, toDispose) diff --git a/packages/core/package.json b/packages/core/package.json index 6c2ec740915..baa8446aea9 100644 --- a/packages/core/package.json +++ b/packages/core/package.json @@ -455,7 +455,7 @@ "webpackDev": "webpack --mode development", "serveVue": "ts-node ./scripts/build/checkServerPort.ts && webpack serve --port 8080 --config-name vue --mode development", "watch": "npm run clean && npm run buildScripts && npm run compileOnly -- --watch", - "lint": "ts-node ./scripts/lint/testLint.ts", + "lint": "node --max-old-space-size=8192 -r ts-node/register ./scripts/lint/testLint.ts", "generateClients": "ts-node ./scripts/build/generateServiceClient.ts ", "generateIcons": "ts-node ../../scripts/generateIcons.ts", "generateTelemetry": "node ../../node_modules/@aws-toolkits/telemetry/lib/generateTelemetry.js --extraInput=src/shared/telemetry/vscodeTelemetry.json --output=src/shared/telemetry/telemetry.gen.ts" diff --git a/packages/core/scripts/lint/testLint.ts b/packages/core/scripts/lint/testLint.ts index d215e57f675..52b22d12fd6 100644 --- a/packages/core/scripts/lint/testLint.ts +++ b/packages/core/scripts/lint/testLint.ts @@ -9,7 +9,9 @@ void (async () => { try { console.log('Running linting tests...') - const mocha = new Mocha() + const mocha = new Mocha({ + timeout: 5000, + }) const testFiles = await glob('dist/src/testLint/**/*.test.js') for (const file of testFiles) { diff --git a/packages/core/src/auth/auth.ts b/packages/core/src/auth/auth.ts index 5cc9f810baa..6962b85bfa9 100644 --- a/packages/core/src/auth/auth.ts +++ b/packages/core/src/auth/auth.ts @@ -1032,6 +1032,16 @@ export class Auth implements AuthService, ConnectionManager { } } } + + // Add conditional auto-login logic for SageMaker (jmkeyes@ guidance) + if (hasVendedIamCredentials() && isSageMaker()) { + // SageMaker auto-login logic - use 'ec2' source since SageMaker uses EC2-like instance credentials + const sagemakerProfileId = asString({ credentialSource: 'ec2', credentialTypeId: 'sagemaker-instance' }) + if ((await tryConnection(sagemakerProfileId)) === true) { + getLogger().info(`auth: automatically connected with SageMaker credentials`) + return + } + } } /** diff --git a/packages/core/src/shared/lsp/utils/platform.ts b/packages/core/src/shared/lsp/utils/platform.ts index fadeefb7e68..6928a6eb0ce 100644 --- a/packages/core/src/shared/lsp/utils/platform.ts +++ b/packages/core/src/shared/lsp/utils/platform.ts @@ -3,11 +3,18 @@ * SPDX-License-Identifier: Apache-2.0 */ +import * as vscode from 'vscode' import { ToolkitError } from '../../errors' import { Logger } from '../../logger/logger' import { ChildProcess } from '../../utilities/processUtils' import { waitUntil } from '../../utilities/timeoutUtils' import { isDebugInstance } from '../../vscode/env' +import { isSageMaker } from '../../extensionUtilities' +import { getLogger } from '../../logger/logger' + +interface SagemakerCookie { + authMode?: 'Sso' | 'Iam' +} export function getNodeExecutableName(): string { return process.platform === 'win32' ? 'node.exe' : 'node' @@ -101,7 +108,40 @@ export function createServerOptions({ args.unshift('--inspect=6080') } - const lspProcess = new ChildProcess(bin, args, { warnThresholds }) + // Set USE_IAM_AUTH environment variable for SageMaker environments based on cookie detection + // This tells the language server to use IAM authentication mode instead of SSO mode + const env = { ...process.env } + if (isSageMaker()) { + try { + // The command `sagemaker.parseCookies` is registered in VS Code SageMaker environment + const result = (await vscode.commands.executeCommand('sagemaker.parseCookies')) as SagemakerCookie + if (result.authMode !== 'Sso') { + env.USE_IAM_AUTH = 'true' + getLogger().info( + `[SageMaker Debug] Setting USE_IAM_AUTH=true for language server process (authMode: ${result.authMode})` + ) + } else { + getLogger().info(`[SageMaker Debug] Using SSO auth mode, not setting USE_IAM_AUTH`) + } + } catch (err) { + getLogger().warn(`[SageMaker Debug] Failed to parse SageMaker cookies, defaulting to IAM auth: ${err}`) + env.USE_IAM_AUTH = 'true' + } + + // Log important environment variables for debugging + getLogger().info(`[SageMaker Debug] Environment variables for language server:`) + getLogger().info(`[SageMaker Debug] USE_IAM_AUTH: ${env.USE_IAM_AUTH}`) + getLogger().info( + `[SageMaker Debug] AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: ${env.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI}` + ) + getLogger().info(`[SageMaker Debug] AWS_DEFAULT_REGION: ${env.AWS_DEFAULT_REGION}`) + getLogger().info(`[SageMaker Debug] AWS_REGION: ${env.AWS_REGION}`) + } + + const lspProcess = new ChildProcess(bin, args, { + warnThresholds, + spawnOptions: { env }, + }) // this is a long running process, awaiting it will never resolve void lspProcess.run() diff --git a/packages/core/src/testLint/eslint.test.ts b/packages/core/src/testLint/eslint.test.ts index ccf670a1cee..fc3607008bb 100644 --- a/packages/core/src/testLint/eslint.test.ts +++ b/packages/core/src/testLint/eslint.test.ts @@ -12,6 +12,8 @@ describe('eslint', function () { it('passes eslint', function () { const result = runCmd( [ + 'node', + '--max-old-space-size=8192', '../../node_modules/.bin/eslint', '-c', '../../.eslintrc.js',