# SageMaker Unified Studio MCP for Spark Upgrade

A fully managed remote MCP server that provides specialized tools and guidance for upgrading Apache Spark applications on Amazon EMR. This server accelerates Spark version upgrades through automated analysis, code transformation, and validation capabilities.

**Important Note**: Not all MCP clients currently support remote servers. Make sure that your client supports remote MCP servers, or that you have a suitable proxy set up, before using this server.

## Key Features & Capabilities

- **Project Analysis & Planning**: Deep analysis of Spark application structure, dependencies, and API usage to generate a comprehensive step-by-step upgrade plan with risk assessment
- **Automated Code Transformation**: Automated PySpark and Scala code updates for version compatibility, handling API changes and deprecations
- **Dependency & Build Management**: Update and manage Maven/SBT/pip dependencies and build environments for target Spark versions, with iterative error resolution
- **Comprehensive Testing & Validation**: Run unit tests, integration tests, and EMR validation jobs to validate the upgraded application against the target Spark version
- **Data Quality Validation**: Ensure data integrity throughout the upgrade process with validation rules
- **EMR Integration & Monitoring**: Submit and monitor EMR jobs for upgrade validation across Amazon EMR on EC2 and Amazon EMR Serverless
- **Observability & Progress Tracking**: Track upgrade progress, analyze results, and provide detailed insights throughout the upgrade process

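As a toy illustration of the kind of change the code-transformation step handles, the sketch below rewrites two long-deprecated PySpark DataFrame calls (`unionAll` and `registerTempTable`) to their Spark 3.x replacements using simple pattern substitution. This helper is purely hypothetical; the actual server performs far richer semantic analysis than a regex pass.

```python
import re

# Hypothetical sketch: map deprecated Spark 2.x DataFrame methods to their
# Spark 3.x replacements. A real upgrade needs semantic analysis, not regexes.
DEPRECATED_APIS = {
    r"\.unionAll\(": ".union(",  # unionAll is a deprecated alias of union
    r"\.registerTempTable\(": ".createOrReplaceTempView(",  # removed in Spark 3.x
}

def rewrite_spark_code(source: str) -> str:
    """Apply the substitutions above to a PySpark source string."""
    for pattern, replacement in DEPRECATED_APIS.items():
        source = re.sub(pattern, replacement, source)
    return source

before = "df_all = df1.unionAll(df2)\ndf_all.registerTempTable('events')"
print(rewrite_spark_code(before))
# df_all = df1.union(df2)
# df_all.createOrReplaceTempView('events')
```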
## Architecture

The upgrade agent has three main components: any MCP-compatible AI Assistant in your development environment for interaction, the [MCP Proxy for AWS](https://github.com/aws/mcp-proxy-for-aws) that handles secure communication between your client and the MCP server, and the Amazon SageMaker Unified Studio Managed MCP Server (in preview) that provides specialized Spark upgrade tools for Amazon EMR. This diagram illustrates how you interact with the Amazon SageMaker Unified Studio Managed MCP Server through your AI Assistant.

![Architecture of the Spark upgrade agent](https://docs.aws.amazon.com/images/emr/latest/ReleaseGuide/images/SparkUpgradeIntroduction.png)

The AI assistant orchestrates the upgrade using specialized tools provided by the MCP server, following these steps:

- **Planning**: The agent analyzes your project structure and generates or revises an upgrade plan that guides the end-to-end Spark upgrade process.
- **Compile & Build**: The agent updates the build environment and dependencies, compiles the project, and iteratively fixes build and test failures.
- **Code Editing**: The Spark code edit tool applies targeted code updates to resolve Spark version incompatibilities, fixing both build-time and runtime errors.
- **Execution & Validation**: The agent submits remote validation jobs to EMR, monitors execution and logs, and iteratively fixes runtime and data-quality issues.
- **Observability**: The agent tracks upgrade progress using EMR observability tools and lets you view upgrade analyses and status at any time.

Please refer to [Using Spark Upgrade Tools](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-upgrade-agent-tools.html) for a list of the major tools for each step.

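The iterative build-fix-validate cycle above can be sketched as a simple retry loop. The helpers below are toy stand-ins for the MCP server's tools, not its real API; the "project" is a plain dict whose dependency names drive the toy build step.

```python
# Hypothetical sketch of the agent's iterative plan -> build -> validate loop.
# compile_project / apply_fix are toy stand-ins for MCP tools, not a real API.

def compile_project(project):
    """Toy build step: report any dependencies still pinned to Spark 2.x."""
    return [d for d in project["deps"] if d.startswith("spark-2")]

def apply_fix(project, error):
    """Toy fix step: bump the offending Spark 2.x dependency to 3.5."""
    project["deps"] = ["spark-3.5" if d == error else d for d in project["deps"]]

def upgrade_loop(project, max_iterations=5):
    """Build, fix reported errors, and retry until clean or out of attempts."""
    for _ in range(max_iterations):
        errors = compile_project(project)
        if not errors:
            return "upgrade complete"
        for error in errors:
            apply_fix(project, error)
    return "manual review needed"

project = {"deps": ["spark-2.4", "hadoop-3.3"]}
print(upgrade_loop(project))  # upgrade complete
```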
### Supported Upgrade Paths

We support Apache Spark upgrades from version 2.4 to 3.5. The corresponding deployment mode mappings are as follows:

- **EMR Release Upgrades**:
  - For EMR-EC2:
    - Source version: EMR 5.20.0 and later
    - Target version: EMR 7.12.0 and earlier, and newer than EMR 5.20.0
  - For EMR-Serverless:
    - Source version: EMR Serverless 6.6.0 and later
    - Target version: EMR Serverless 7.12.0 and earlier

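A quick way to sanity-check a candidate path against the ranges above is a version-tuple comparison, sketched below. This helper is not part of the server; it also assumes the target must be strictly newer than the source, which the table does not state explicitly.

```python
def parse(version: str) -> tuple:
    """Turn an EMR release label like '7.12.0' into (7, 12, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))

# Supported ranges taken from the mapping above.
RANGES = {
    "emr-ec2": {"min_source": "5.20.0", "max_target": "7.12.0"},
    "emr-serverless": {"min_source": "6.6.0", "max_target": "7.12.0"},
}

def is_supported(mode: str, source: str, target: str) -> bool:
    """Check a source -> target EMR upgrade against the documented ranges."""
    r = RANGES[mode]
    return (
        parse(source) >= parse(r["min_source"])
        and parse(target) <= parse(r["max_target"])
        and parse(target) > parse(source)  # assumption: upgrades only
    )

print(is_supported("emr-ec2", "6.0.0", "7.12.0"))        # True
print(is_supported("emr-serverless", "6.5.0", "7.0.0"))  # False: source too old
```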
## Configuration

**Note:** The specific configuration format varies by MCP client. Below is an example for [Kiro CLI](https://kiro.dev/).

**Kiro CLI**

```json
{
  "mcpServers": {
    "spark-upgrade": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "mcp-proxy-for-aws@latest",
        "https://sagemaker-unified-studio-mcp.us-east-1.api.aws/spark-upgrade/mcp",
        "--service",
        "sagemaker-unified-studio-mcp",
        "--profile",
        "spark-upgrade-profile",
        "--region",
        "us-east-1",
        "--read-timeout",
        "180"
      ],
      "timeout": 180000,
      "disabled": false
    }
  }
}
```

See [Using the Upgrade Agent](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-upgrade-agent-using.html) for configuration guidance for different MCP clients such as Kiro, Cline, and GitHub Copilot.

## Usage Examples

1. **Run the Spark upgrade analysis**:
   - EMR-S
     ```
     Help me upgrade my Spark application in <project-path> from EMR-S version 6.6.0 to 7.12.0. You can use EMR-S Application id xxg017hmd2agxxxx and execution role <role name> to run the validation and the S3 path s3://s3-staging-path to store updated application artifacts.
     ```
   - EMR-EC2
     ```
     Upgrade my Spark application <local-project-path> from EMR-EC2 version 6.0.0 to 7.12.0. Use EMR-EC2 Cluster j-PPXXXXTG09XX to run the validation and the S3 path s3://s3-staging-path to store updated application artifacts.
     ```
2. **List the analyses**:
   ```
   Provide me a list of analyses performed by the Spark agent
   ```
3. **Describe an analysis**:
   ```
   Can you explain the analysis 439715b3-xxxx-42a6-xxxx-3bf7f1fxxxx
   ```
4. **Reuse a plan for another analysis**:
   ```
   Use my upgrade_plan spark_upgrade_plan_xxx.json to upgrade my project in <project-path>
   ```

## AWS Authentication

### Step 1: Configure AWS CLI Profile

```
aws configure set profile.spark-upgrade-profile.role_arn ${IAM_ROLE}
aws configure set profile.spark-upgrade-profile.source_profile <AWS CLI profile used to assume the IAM role - e.g., default>
aws configure set profile.spark-upgrade-profile.region ${SMUS_MCP_REGION}
```

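After Step 1, your `~/.aws/config` should contain a profile entry along these lines (the account ID and role name below are placeholders; substitute your own values):

```ini
# Resulting profile in ~/.aws/config (example values, not real resources)
[profile spark-upgrade-profile]
role_arn = arn:aws:iam::123456789012:role/SparkUpgradeAgentRole
source_profile = default
region = us-east-1
```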
### Step 2: If you are using Kiro CLI, use the following command to add the MCP configuration

```
kiro-cli-chat mcp add \
  --name "spark-upgrade" \
  --command "uvx" \
  --args "[\"mcp-proxy-for-aws@latest\",\"https://sagemaker-unified-studio-mcp.${SMUS_MCP_REGION}.api.aws/spark-upgrade/mcp\", \"--service\", \"sagemaker-unified-studio-mcp\", \"--profile\", \"spark-upgrade-profile\", \"--region\", \"${SMUS_MCP_REGION}\", \"--read-timeout\", \"180\"]" \
  --timeout 180000 \
  --scope global
```

For more information, refer to the [AWS docs](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-upgrade-agent-setup.html).

## Data Usage

This server processes your code and configuration files to provide upgrade recommendations. No sensitive data is stored permanently, and all processing follows AWS data protection standards.

## FAQs

### 1. Which Spark versions are supported?

- For EMR-EC2:
  - Source version: EMR 5.20.0 and later
  - Target version: EMR 7.12.0 and earlier, and newer than EMR 5.20.0
- For EMR-Serverless:
  - Source version: EMR Serverless 6.6.0 and later
  - Target version: EMR Serverless 7.12.0 and earlier

### 2. Can I use this for Scala applications?

Yes, the agent supports both PySpark and Scala Spark applications, including the Maven and SBT build systems.

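For a Maven project, the dependency bump involved looks roughly like this `pom.xml` fragment (the version numbers are illustrative; note that Spark 3.5 artifacts use the Scala 2.12 or 2.13 suffix, whereas Spark 2.4 commonly used 2.11):

```xml
<!-- Illustrative pom.xml fragment: bumping Spark from 2.4.x to 3.5.x -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <!-- Scala suffix changes with the Spark upgrade -->
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.1</version>
    <scope>provided</scope>
</dependency>
```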
### 3. What about custom libraries and UDFs?

The agent analyzes custom dependencies and provides guidance for updating user-defined functions and third-party libraries.

### 4. How does data quality validation work?

The agent compares output data between old and new Spark versions using validation rules and statistical analysis.

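As a toy illustration of that comparison (plain Python over small row lists; the server's actual validation rules and statistics are richer and not specified here):

```python
from statistics import mean

# Toy sketch: compare outputs of the old and new Spark versions.
# Real validation runs on EMR with configurable rules; this is illustrative only.

def outputs_match(old_rows, new_rows, column, tolerance=1e-6):
    """Check row-count equality and that a numeric column's mean agrees within tolerance."""
    if len(old_rows) != len(new_rows):
        return False
    old_mean = mean(row[column] for row in old_rows)
    new_mean = mean(row[column] for row in new_rows)
    return abs(old_mean - new_mean) <= tolerance

old = [{"amount": 10.0}, {"amount": 20.0}]
new = [{"amount": 10.0}, {"amount": 20.0}]
print(outputs_match(old, new, "amount"))  # True
```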
### 5. Can I customize the upgrade process?

Yes, you can modify upgrade plans, exclude specific transformations, and customize validation criteria based on your requirements.

### 6. What if the automated upgrade fails?
The agent provides detailed error analysis, suggested fixes, and fallback strategies. You maintain full control over all changes.
