
Commit 9f043f4

feat(jobs): Add cancel_dataproc_job tool with comprehensive job cancellation support (#34)
🎯 Overview

This PR adds a new **`cancel_dataproc_job`** MCP tool to provide emergency job cancellation capabilities for the Dataproc MCP Server. This addresses a critical need for users to stop runaway or long-running jobs to control costs and manage resources.

✨ Key Features:
- Emergency job cancellation with minimal parameters (only jobId required)
- Intelligent state handling (only attempts cancellation for PENDING/RUNNING jobs)
- Comprehensive error handling with clear messages for all job states
- Job tracking integration and knowledge base indexing
- Enhanced documentation with 17 total tools (was 16)

🧪 Testing:
- 15 comprehensive unit tests covering all scenarios
- All 26 unit tests passing
- Golden Command validation: `npm run pre-push` passed
- Build, lint, format, type-check: All passed
- Security audit: Clean
- Documentation: All 209 links validated

💡 Use Cases:
- Emergency cost control: Stop expensive runaway jobs
- Pipeline management: Cancel dependent jobs when upstream fails
- Development workflows: Quick cancellation during testing
- Resource management: Free up cluster resources
1 parent ad977de commit 9f043f4

14 files changed: +747 -16 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -131,3 +131,4 @@ dataproc-ops-test-report.json
 enhanced-prompt-demo.js
 test-spark-job.py
 verification-report.json
+state/dataproc-state.json

README.md

Lines changed: 5 additions & 4 deletions
@@ -87,7 +87,7 @@ npx @dipseth/dataproc-mcp-server@latest
 ## ✨ Features
 
 ### 🎯 **Core Capabilities**
-- **21 Production-Ready MCP Tools** - Complete Dataproc management suite
+- **22 Production-Ready MCP Tools** - Complete Dataproc management suite
 - **🧠 Knowledge Base Semantic Search** - Natural language queries with optional Qdrant integration
 - **🚀 Response Optimization** - 60-96% token reduction with Qdrant storage
 - **🔄 Generic Type Conversion System** - Automatic, type-safe data transformations
@@ -132,7 +132,7 @@ npx @dipseth/dataproc-mcp-server@latest
 - **Troubleshooting Guides** - Common issues and solutions
 - **IDE Integration** - TypeScript support
 
-## 🛠️ Complete MCP Tools Suite (21 Tools)
+## 🛠️ Complete MCP Tools Suite (22 Tools)
 
 > **🔄 Enhanced with Generic Type Conversion**: All tools now benefit from automatic, type-safe data transformations with intelligent compression and field mapping.
 
@@ -148,11 +148,12 @@ npx @dipseth/dataproc-mcp-server@latest
 | `delete_cluster` | Delete existing clusters | ✅ Project/region defaults | Safe deletion |
 | `get_zeppelin_url` | Get Zeppelin notebook URL | ✅ Auto-discovery | Web interface access |
 
-### 💼 **Job Management (6 Tools)**
+### 💼 **Job Management (7 Tools)**
 | Tool | Description | Smart Defaults | Key Features |
 |------|-------------|----------------|--------------|
 | `submit_hive_query` | Submit Hive queries to clusters | ✅ 70% fewer params | Async support, timeouts |
 | `submit_dataproc_job` | Submit Spark/PySpark/Presto jobs | ✅ 75% fewer params | Multi-engine support, **Local file staging** |
+| `cancel_dataproc_job` | Cancel running or pending jobs | ✅ JobID only needed | **Emergency cancellation**, cost control |
 | `get_job_status` | Get job execution status | ✅ JobID only needed | Real-time monitoring |
 | `get_job_results` | Get job outputs and results | ✅ Auto-pagination | Result formatting |
 | `get_query_status` | Get Hive query status | ✅ Minimal params | Query tracking |
@@ -216,7 +217,7 @@ my-company-analytics-prod-1234:
 - **[Knowledge Base Semantic Search](https://dipseth.github.io/dataproc-mcp/KNOWLEDGE_BASE_SEMANTIC_SEARCH/)** - Natural language queries and setup
 - **[Generic Type Conversion System](docs/GENERIC_TYPE_CONVERTER.md)** - Architectural design and implementation
 - **[Generic Converter Migration Guide](docs/GENERIC_TYPE_CONVERTER.md)** - Migration from manual conversions
-- **[API Reference](https://dipseth.github.io/dataproc-mcp/api/)** - Complete tool documentation
+- **[API Reference](https://dipseth.github.io/dataproc-mcp/API_REFERENCE/)** - Complete tool documentation
 - **[Configuration Examples](https://dipseth.github.io/dataproc-mcp/CONFIGURATION_EXAMPLES/)** - Real-world configurations
 - **[Security Guide](https://dipseth.github.io/dataproc-mcp/security/)** - Best practices and compliance
 - **[Installation Guide](https://dipseth.github.io/dataproc-mcp/INSTALLATION_GUIDE/)** - Detailed setup instructions

docs/API_REFERENCE.md

Lines changed: 77 additions & 8 deletions
@@ -7,13 +7,13 @@ permalink: /API_REFERENCE/
 
 # 📚 API Reference
 
-Complete reference for all 16 Dataproc MCP Server tools with practical examples and usage patterns.
+Complete reference for all 17 Dataproc MCP Server tools with practical examples and usage patterns.
 
 ## Overview
 
-The Dataproc MCP Server provides 16 comprehensive tools organized into four categories:
+The Dataproc MCP Server provides 17 comprehensive tools organized into four categories:
 - **Cluster Management** (6 tools)
-- **Job Execution** (5 tools)
+- **Job Execution** (6 tools)
 - **Profile Management** (3 tools)
 - **Monitoring & Utilities** (2 tools)
 
@@ -542,9 +542,78 @@ Gets the results of a completed Dataproc job.
 }
 ```
 
+### 12. cancel_dataproc_job
+
+Cancels a running or pending Dataproc job with intelligent status handling and job tracking integration.
+
+**Parameters:**
+- `jobId` (string, required): The ID of the Dataproc job to cancel
+- `projectId` (string, optional): GCP project ID (uses defaults if not provided)
+- `region` (string, optional): Dataproc region (uses defaults if not provided)
+- `verbose` (boolean, optional): Return full response without filtering (default: false)
+
+**🛑 CANCELLATION WORKFLOW:**
+- Attempts to cancel jobs in PENDING or RUNNING states
+- Provides informative messages for jobs already in terminal states
+- Updates internal job tracking when cancellation succeeds
+
+**📊 STATUS HANDLING:**
+- **PENDING/RUNNING** → Cancellation attempted
+- **DONE/ERROR/CANCELLED** → Informative message returned
+- **Job not found** → Clear error message
+
+**💡 MONITORING:**
+After cancellation, use `get_job_status("jobId")` to confirm the job reaches CANCELLED state.
+
+**Example:**
+```json
+{
+  "tool": "cancel_dataproc_job",
+  "arguments": {
+    "jobId": "Clean_Places_sub_group_base_1_cleaned_places_13b6ec3f"
+  }
+}
+```
+
+**Successful Cancellation Response:**
+```json
+{
+  "content": [
+    {
+      "type": "text",
+      "text": "🛑 Job Cancellation Status\n\nJob ID: Clean_Places_sub_group_base_1_cleaned_places_13b6ec3f\nStatus: 3\nMessage: Cancellation request sent for job Clean_Places_sub_group_base_1_cleaned_places_13b6ec3f."
+    }
+  ]
+}
+```
+
+**Job Already Completed Response:**
+```json
+{
+  "content": [
+    {
+      "type": "text",
+      "text": "Cannot cancel job Clean_Places_sub_group_base_1_cleaned_places_13b6ec3f in state: 'DONE'; cancellable states: '[PENDING, RUNNING]'"
+    }
+  ]
+}
+```
+
+**Use Cases:**
+- **Emergency Cancellation**: Stop runaway jobs consuming excessive resources
+- **Pipeline Management**: Cancel dependent jobs when upstream processes fail
+- **Cost Control**: Terminate expensive long-running jobs
+- **Development Workflow**: Cancel test jobs during development iterations
+
+**Best Practices:**
+1. **Monitor job status** before and after cancellation attempts
+2. **Use with get_job_status** to verify cancellation completion
+3. **Handle gracefully** when jobs are already in terminal states
+4. **Consider dependencies** before cancelling pipeline jobs
+
 ## Profile Management Tools
 
-### 12. list_profiles
+### 13. list_profiles
 
 Lists available cluster configuration profiles.
 
@@ -573,7 +642,7 @@ Lists available cluster configuration profiles.
 }
 ```
 
-### 13. get_profile
+### 14. get_profile
 
 Gets details for a specific cluster configuration profile.
 
@@ -590,7 +659,7 @@ Gets details for a specific cluster configuration profile.
 }
 ```
 
-### 14. list_tracked_clusters
+### 15. list_tracked_clusters
 
 Lists clusters that were created and tracked by this MCP server.
 
@@ -609,7 +678,7 @@ Lists clusters that were created and tracked by this MCP server.
 
 ## Monitoring & Utilities
 
-### 15. get_zeppelin_url
+### 16. get_zeppelin_url
 
 Gets the Zeppelin notebook URL for a cluster (if enabled).
 
@@ -642,7 +711,7 @@ Gets the Zeppelin notebook URL for a cluster (if enabled).
 }
 ```
 
-### 16. check_active_jobs
+### 17. check_active_jobs
 
 🚀 Quick status check for all active and recent jobs with intelligent response optimization.
 
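
Reviewer note: the status handling documented for `cancel_dataproc_job` in this file can be sketched as a small pure function. This is only an illustration under stated assumptions, not the handler shipped in this commit: `decideCancellation`, `CancelDecision`, and the "not found" wording are hypothetical names, while the state names and the two other message formats are taken from the documentation in the diff.

```typescript
// Hypothetical sketch of the documented state handling; not the project's actual handler.
type JobState = 'PENDING' | 'RUNNING' | 'DONE' | 'ERROR' | 'CANCELLED';

interface CancelDecision {
  attempt: boolean; // true when a cancellation request should be sent
  message: string;  // text to return to the caller
}

function decideCancellation(jobId: string, state: JobState | undefined): CancelDecision {
  // Job not found: return a clear error message (wording is an assumption).
  if (state === undefined) {
    return { attempt: false, message: `Job ${jobId} not found.` };
  }
  // Only PENDING and RUNNING jobs are cancellable.
  if (state === 'PENDING' || state === 'RUNNING') {
    return { attempt: true, message: `Cancellation request sent for job ${jobId}.` };
  }
  // DONE, ERROR, CANCELLED: informative message, no cancellation attempt.
  return {
    attempt: false,
    message: `Cannot cancel job ${jobId} in state: '${state}'; cancellable states: '[PENDING, RUNNING]'`,
  };
}
```

Keeping the decision separate from the API call makes each branch of the documented table easy to unit test without a Dataproc client.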

docs/QUICK_START.md

Lines changed: 10 additions & 0 deletions
@@ -191,6 +191,16 @@ Show me all my Dataproc clusters
 Submit a Spark job to process data from gs://my-bucket/data.csv
 ```
 
+### Cancel a Running Job
+```
+Cancel the job with ID "my-long-running-job-12345"
+```
+
+### Monitor Job Status
+```
+Check the status of job "my-job-67890"
+```
+
 ### Try Semantic Search (if Qdrant enabled)
 ```
 Show me clusters with machine learning packages installed
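
Reviewer note: the "cancel, then check status" flow added to the quick start could be automated roughly as follows. This is a minimal sketch under assumptions: `waitForCancelled` and the `fetchState` callback are hypothetical, and `fetchState` stands in for whatever returns the job state (for example a wrapper around `get_job_status`). The loop treats any state other than CANCELLED, DONE, or ERROR as still in progress.

```typescript
// Hypothetical polling helper; the callback is supplied by the caller.
type StateFetcher = (jobId: string) => Promise<string>;

interface PollOptions {
  attempts?: number; // maximum number of status checks
  delayMs?: number;  // pause between checks
}

async function waitForCancelled(
  jobId: string,
  fetchState: StateFetcher,
  { attempts = 10, delayMs = 1000 }: PollOptions = {},
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    const state = await fetchState(jobId);
    if (state === 'CANCELLED') return true;                  // cancellation confirmed
    if (state === 'DONE' || state === 'ERROR') return false; // finished some other way
    // Any other state is treated as "not finished yet": wait and retry.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // gave up before reaching a terminal state
}
```

A bounded attempt count keeps a stuck job from blocking the caller indefinitely; the defaults here are arbitrary and would need tuning.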

scripts/setup.js

Lines changed: 1 addition & 0 deletions
@@ -140,6 +140,7 @@ async function createMCPTemplate() {
   "submit_dataproc_job",
   "get_job_status",
   "get_job_results",
+  "cancel_dataproc_job",
   "get_zeppelin_url"
 ],
 "env": {

src/handlers/cluster-handlers.ts

Lines changed: 3 additions & 2 deletions
@@ -5,6 +5,7 @@
 
 import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
 import { logger } from '../utils/logger.js';
+import { deepMerge } from '../utils/object-utils.js';
 import SecurityMiddleware from '../security/middleware.js';
 import {
   StartDataprocClusterSchema,
@@ -767,7 +768,7 @@ export async function handleCreateClusterFromYaml(args: any, deps: HandlerDepend
 
 // Apply overrides if provided
 if (overrides && typeof overrides === 'object') {
-  clusterConfig = { ...clusterConfig, ...overrides };
+  clusterConfig = deepMerge(clusterConfig, overrides);
 }
 
 // Use existing cluster creation logic with properly extracted config
@@ -824,7 +825,7 @@ export async function handleCreateClusterFromProfile(args: any, deps: HandlerDep
 
 // Apply overrides if provided
 if (overrides && typeof overrides === 'object') {
-  clusterConfig = { ...clusterConfig, ...overrides };
+  clusterConfig = deepMerge(clusterConfig, overrides);
 }
 
 // Use existing cluster creation logic
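
Reviewer note: the switch from object spread to `deepMerge` matters when an override touches only part of a nested object. The project's `deepMerge` in `object-utils` is not shown in this diff, so the sketch below uses a hypothetical stand-in, `deepMergeSketch`, which may differ from the real helper (for instance in how arrays are treated). The config values are illustrative only.

```typescript
type PlainObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical stand-in for the project's deepMerge: nested plain objects are merged
// key by key; arrays and primitives from `overrides` replace the base value.
function deepMergeSketch(base: PlainObject, overrides: PlainObject): PlainObject {
  const result: PlainObject = { ...base };
  for (const key of Object.keys(overrides)) {
    const baseValue = result[key];
    const overrideValue = overrides[key];
    result[key] =
      isPlainObject(baseValue) && isPlainObject(overrideValue)
        ? deepMergeSketch(baseValue, overrideValue)
        : overrideValue;
  }
  return result;
}

// Illustrative cluster config and a partial override of masterConfig.
const clusterConfig = {
  masterConfig: { machineTypeUri: 'n1-standard-4', numInstances: 1 },
  workerConfig: { numInstances: 2 },
};
const overrides = { masterConfig: { machineTypeUri: 'n1-standard-8' } };

// Old behaviour: the whole masterConfig object is replaced, so numInstances is dropped.
const shallow = { ...clusterConfig, ...overrides };
// New behaviour (as sketched): only machineTypeUri changes, numInstances is kept.
const deep = deepMergeSketch(clusterConfig, overrides);

console.log(shallow.masterConfig);
console.log(deep.masterConfig);
```

With the spread version, a caller overriding one field of `masterConfig` would silently lose every sibling field from the YAML or profile, which is presumably the bug this change addresses.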

src/handlers/index.ts

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,7 @@ import {
   handleGetQueryResults,
   handleGetJobResults,
   handleCheckActiveJobs,
+  handleCancelDataprocJob,
 } from './job-handlers.js';
 import {
   handleListProfiles,
@@ -97,6 +98,8 @@ export async function handleToolCall(toolName: string, args: any, deps: AllHandl
   return handleGetJobResults(args, deps);
 case 'check_active_jobs':
   return handleCheckActiveJobs(args, deps);
+case 'cancel_dataproc_job':
+  return handleCancelDataprocJob(args, deps);
 
 // Profile handlers
 case 'list_profiles':
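
Reviewer note: the routing added above follows the existing `switch` pattern in `handleToolCall`. A stripped-down sketch of that pattern is shown below, with stub handlers in place of the real ones. Everything here is hypothetical except the two tool names; in particular, rejecting unknown tool names in the `default` branch is an assumption, since that part of the file is not in the diff.

```typescript
// Simplified response shape, modelled on the JSON examples in the API reference.
type ToolResult = { content: { type: 'text'; text: string }[] };
type ToolArgs = Record<string, unknown>;

// Stub handlers standing in for the real job handlers.
async function checkActiveJobsStub(_args: ToolArgs): Promise<ToolResult> {
  return { content: [{ type: 'text', text: 'No active jobs.' }] };
}

async function cancelJobStub(args: ToolArgs): Promise<ToolResult> {
  return { content: [{ type: 'text', text: `Cancellation request sent for job ${String(args.jobId)}.` }] };
}

// Route a tool name to its handler; unknown names are rejected (assumed behaviour).
async function dispatchTool(toolName: string, args: ToolArgs): Promise<ToolResult> {
  switch (toolName) {
    case 'check_active_jobs':
      return checkActiveJobsStub(args);
    case 'cancel_dataproc_job':
      return cancelJobStub(args);
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}
```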
