
fix: turn atlas-connect-cluster async #343


Merged
merged 26 commits into from
Jul 10, 2025
Changes from 11 commits
2 changes: 2 additions & 0 deletions src/logger.ts
@@ -17,6 +17,8 @@ export const LogId = {
atlasDeleteDatabaseUserFailure: mongoLogId(1_001_002),
atlasConnectFailure: mongoLogId(1_001_003),
atlasInspectFailure: mongoLogId(1_001_004),
atlasConnectAttempt: mongoLogId(1_001_005),
atlasConnectSucceeded: mongoLogId(1_001_006),

telemetryDisabled: mongoLogId(1_002_001),
telemetryEmitFailure: mongoLogId(1_002_002),
128 changes: 123 additions & 5 deletions src/tools/atlas/metadata/connectCluster.ts
@@ -11,18 +11,42 @@ const EXPIRY_MS = 1000 * 60 * 60 * 12; // 12 hours
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}

export class ConnectClusterTool extends AtlasToolBase {
protected name = "atlas-connect-cluster";
- protected description = "Connect to MongoDB Atlas cluster";
+ protected description = "Connect to / Inspect connection of MongoDB Atlas cluster";
protected operationType: OperationType = "metadata";
protected argsShape = {
projectId: z.string().describe("Atlas project ID"),
clusterName: z.string().describe("Atlas cluster name"),
};

- protected async execute({ projectId, clusterName }: ToolArgs<typeof this.argsShape>): Promise<CallToolResult> {
- await this.session.disconnect();
private async queryConnection(
projectId: string,
clusterName: string
): Promise<"connected" | "disconnected" | "connecting" | "connected-to-other-cluster"> {
if (!this.session.connectedAtlasCluster) {
return "disconnected";
}

if (
this.session.connectedAtlasCluster.projectId !== projectId ||
this.session.connectedAtlasCluster.clusterName !== clusterName
) {
return "connected-to-other-cluster";
}

if (!this.session.serviceProvider) {
return "connecting";
}

await this.session.serviceProvider.runCommand("admin", {
ping: 1,
});
return "connected";
Copilot AI Jul 8, 2025

Wrap the runCommand('admin', { ping: 1 }) call in a try/catch so transient ping errors don’t bubble up and trigger a full reconnection flow prematurely.

Suggested change
- await this.session.serviceProvider.runCommand("admin", {
- ping: 1,
- });
- return "connected";
+ try {
+ await this.session.serviceProvider.runCommand("admin", {
+ ping: 1,
+ });
+ return "connected";
+ } catch (error) {
+ logger.warn(LogId.ConnectionPingError, `Ping command failed: ${error.message}`);
+ return "connecting";
+ }

Collaborator Author

No need, I'm letting the error bubble up; `execute` catches it, logs it, and falls through to create a new connection.

}

private async prepareClusterConnection(projectId: string, clusterName: string): Promise<string> {
const cluster = await inspectCluster(this.session.apiClient, projectId, clusterName);

if (!cluster.connectionString) {
@@ -83,9 +107,20 @@ export class ConnectClusterTool extends AtlasToolBase {
cn.searchParams.set("authSource", "admin");
const connectionString = cn.toString();

return connectionString;
Collaborator

Suggested change
- const connectionString = cn.toString();
- return connectionString;
+ return cn.toString();

}

private async connectToCluster(connectionString: string): Promise<void> {
let lastError: Error | undefined = undefined;

- for (let i = 0; i < 20; i++) {
+ logger.debug(
+ LogId.atlasConnectAttempt,
+ "atlas-connect-cluster",
+ `attempting to connect to cluster: ${this.session.connectedAtlasCluster?.clusterName}`
+ );
+
+ for (let i = 0; i < 600; i++) {
+ // try for 5 minutes
Copilot AI Jul 8, 2025

Extract the retry count (600) and delay (500ms) into named constants to improve readability and ease future adjustments.

Suggested change
- for (let i = 0; i < 600; i++) {
- // try for 5 minutes
+ for (let i = 0; i < RETRY_COUNT; i++) {
+ // try for RETRY_COUNT attempts
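
A minimal sketch of what the suggestion could look like once both literals are named. `RETRY_COUNT`, `RETRY_DELAY_MS`, and the `retry` helper are hypothetical names chosen for illustration; the PR itself only shows the literal 600, and the 500 ms delay is taken from the review comment rather than from the visible diff.

```typescript
// Hypothetical constants: 600 attempts x 500 ms is roughly 5 minutes.
const RETRY_COUNT = 600;
const RETRY_DELAY_MS = 500;

function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls `attempt` until it resolves, and rethrows
// the last error once the retry budget is spent.
async function retry<T>(
    attempt: () => Promise<T>,
    retries: number = RETRY_COUNT,
    delayMs: number = RETRY_DELAY_MS
): Promise<T> {
    let lastError: Error = new Error("retry: no attempts were made");
    for (let i = 0; i < retries; i++) {
        try {
            return await attempt();
        } catch (err: unknown) {
            lastError = err instanceof Error ? err : new Error(String(err));
            // Do not sleep after the final failed attempt.
            if (i < retries - 1) {
                await sleep(delayMs);
            }
        }
    }
    throw lastError;
}
```

With named constants, the "try for 5 minutes" comment can be derived from the two values instead of being maintained by hand.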


try {
await this.session.connectToMongoDB(connectionString, this.config.connectOptions);
lastError = undefined;
@@ -106,14 +141,97 @@ export class ConnectClusterTool extends AtlasToolBase {
}

if (lastError) {
void this.session.apiClient
.deleteDatabaseUser({
params: {
path: {
groupId: this.session.connectedAtlasCluster?.projectId || "",
username: this.session.connectedAtlasCluster?.username || "",
Collaborator

If those are not set, does it make sense to make that call at all?

databaseName: "admin",
},
},
})
.catch((err: unknown) => {
const error = err instanceof Error ? err : new Error(String(err));
logger.debug(
LogId.atlasConnectFailure,
"atlas-connect-cluster",
`error deleting database user: ${error.message}`
);
});
this.session.connectedAtlasCluster = undefined;
throw lastError;
}

logger.debug(
LogId.atlasConnectSucceeded,
"atlas-connect-cluster",
`connected to cluster: ${this.session.connectedAtlasCluster?.clusterName}`
);
}
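
The reviewer's question about calling `deleteDatabaseUser` with empty identifiers could be addressed with an early return, roughly as sketched here. The `ConnectedCluster` and `DeleteUserClient` interfaces and the `cleanupDatabaseUser` function are hypothetical, trimmed stand-ins rather than the project's real types, and this version awaits the call where the PR fires it without awaiting.

```typescript
// Hypothetical shapes, reduced to what the sketch needs.
interface ConnectedCluster {
    projectId?: string;
    username?: string;
}

interface DeleteUserClient {
    deleteDatabaseUser(args: {
        params: { path: { groupId: string; username: string; databaseName: string } };
    }): Promise<void>;
}

// Issues the delete only when both identifiers are known.
// Returns true if a request was sent, false if it was skipped.
async function cleanupDatabaseUser(
    client: DeleteUserClient,
    cluster: ConnectedCluster | undefined
): Promise<boolean> {
    const projectId = cluster?.projectId;
    const username = cluster?.username;
    if (!projectId || !username) {
        // Nothing identifies the temporary user, so skip the API call.
        return false;
    }
    await client.deleteDatabaseUser({
        params: { path: { groupId: projectId, username, databaseName: "admin" } },
    });
    return true;
}
```

Skipping the call avoids sending a request with `""` as the group ID or username, which could not match the temporary user anyway.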

protected async execute({ projectId, clusterName }: ToolArgs<typeof this.argsShape>): Promise<CallToolResult> {
try {
const state = await this.queryConnection(projectId, clusterName);
switch (state) {
case "connected":
return {
content: [
{
type: "text",
text: "Cluster is already connected.",
},
],
};
case "connecting":
return {
content: [
{
type: "text",
text: "Cluster is connecting...",
},
],
};
case "connected-to-other-cluster":
case "disconnected":
default:
// fall through to create new connection
break;
}
} catch (err: unknown) {
const error = err instanceof Error ? err : new Error(String(err));
logger.debug(
LogId.atlasConnectFailure,
"atlas-connect-cluster",
`error querying cluster: ${error.message}`
);
// fall through to create new connection
}

await this.session.disconnect();
const connectionString = await this.prepareClusterConnection(projectId, clusterName);
process.nextTick(async () => {
try {
await this.connectToCluster(connectionString);
} catch (err: unknown) {
const error = err instanceof Error ? err : new Error(String(err));
logger.debug(
LogId.atlasConnectFailure,
"atlas-connect-cluster",
`error connecting to cluster: ${error.message}`
);
}
});

return {
content: [
{
type: "text",
- text: `Connected to cluster "${clusterName}"`,
+ text: `Attempting to connect to cluster "${clusterName}"...`,
},
{
type: "text",
text: `Warning: Check again in a few seconds.`,
},
],
};
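
The flow in `execute` (reply immediately, keep connecting in the background, report status on later calls) can be illustrated in isolation. The following is a simplified sketch, not the tool's implementation: `BackgroundConnector` and its `settled` hook are hypothetical, and it tracks state in a field, whereas the PR derives state from the session and schedules the work with `process.nextTick`.

```typescript
type ConnectionState = "disconnected" | "connecting" | "connected";

// Hypothetical class showing the "answer now, connect later" pattern.
class BackgroundConnector {
    private state: ConnectionState = "disconnected";
    private pending: Promise<void> | undefined;
    private readonly connect: () => Promise<void>;

    constructor(connect: () => Promise<void>) {
        this.connect = connect;
    }

    // Returns a status message at once; the connection attempt keeps running.
    request(): string {
        if (this.state === "connected") {
            return "Cluster is already connected.";
        }
        if (this.state === "connecting") {
            return "Cluster is connecting...";
        }
        this.state = "connecting";
        this.pending = this.connect().then(
            () => {
                this.state = "connected";
            },
            () => {
                // A real tool would also log the error here.
                this.state = "disconnected";
            }
        );
        return "Attempting to connect...";
    }

    // Hook for tests: resolves once the background attempt has settled.
    settled(): Promise<void> {
        return this.pending ?? Promise.resolve();
    }
}
```

Because failures are handled inside the background promise, a caller never sees a rejected promise and only learns the outcome by asking again, which is why the tool's response tells the client to check again in a few seconds.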
20 changes: 18 additions & 2 deletions tests/integration/tools/atlas/clusters.test.ts
@@ -188,8 +188,24 @@ describeWithAtlas("clusters", (integration) => {
arguments: { projectId, clusterName },
})) as CallToolResult;
expect(response.content).toBeArray();
- expect(response.content).toHaveLength(1);
- expect(response.content[0]?.text).toContain(`Connected to cluster "${clusterName}"`);
+ expect(response.content).toHaveLength(2);
+ expect(response.content[0]?.type).toEqual("text");
+ expect(response.content[0]?.text).toContain(`Attempting to connect to cluster "${clusterName}"...`);

for (let i = 0; i < 600; i++) {
Collaborator

Now that we added the 30 second retry logic in the connect tool, this may be a bit excessive - worst case scenario, this will result in 5 hours of waiting for the test to fail.

Collaborator

@gagik gagik Jul 9, 2025

I'd expect Jest has some default test timeouts. Maybe we can set an explicit timeout here and turn this into a while loop, or create a waitFor / retry helper.

Collaborator Author

@fmenezes fmenezes Jul 10, 2025

This is actually not the case. On subsequent calls we don't retry for 30 seconds: we know a background process is already running, so we return the "Attempting ..." message straight away.

Collaborator Author

Changed to reflect what we discussed offline: we now always wait 30 seconds, and the test loop is reduced to 10 iterations.
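
The waitFor / retry helper proposed in this thread might look something like the sketch below. The name `waitFor`, its signature, and the timeout values are assumptions made for illustration; nothing in the PR shows such a helper being added.

```typescript
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: polls `check` until it returns true, and throws
// once `timeoutMs` has elapsed without success.
async function waitFor(
    check: () => Promise<boolean>,
    timeoutMs: number,
    intervalMs: number
): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (true) {
        if (await check()) {
            return;
        }
        if (Date.now() >= deadline) {
            throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
        }
        await sleep(intervalMs);
    }
}
```

In the integration test, `check` would call the `atlas-connect-cluster` tool and return whether the response text includes "Cluster is already connected.". A wall-clock deadline keeps the worst case bounded no matter how long each tool call takes, which is the concern raised about a fixed iteration count.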

const response = (await integration.mcpClient().callTool({
name: "atlas-connect-cluster",
arguments: { projectId, clusterName },
})) as CallToolResult;
expect(response.content).toBeArray();
expect(response.content).toHaveLength(1);
expect(response.content[0]?.type).toEqual("text");
const c = response.content[0] as { text: string };
if (c.text.includes("Cluster is already connected.")) {
break; // success
}
await sleep(500);
}
});
});
});