- `async` (boolean, optional): Whether to submit asynchronously

**🔧 LOCAL FILE STAGING:**

The `baseDirectory` parameter in the local file staging system controls how relative file paths are resolved when using the template syntax `{@./relative/path}` or direct relative paths in job configurations.

**Configuration:**

The `baseDirectory` parameter is configured in `config/default-params.json` with a default value of `"."`, which refers to the **current working directory** where the MCP server process is running (typically the project root directory).
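
For illustration, the relevant entry might be as small as `"baseDirectory": "."` (the other keys in `config/default-params.json` are omitted here); pointing it at an absolute path instead would anchor all relative file references under that directory.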

**Path Resolution Logic:**

1. **Absolute Paths**: If a file path is already absolute (starts with `/`), it is used as-is.
2. **Relative Path Resolution**: For relative paths, the system (see the sketch after this list):
   - Gets the `baseDirectory` value from configuration (default: `"."`)
   - Resolves `baseDirectory` if it is itself relative:
     - First tries the directory given by the `DATAPROC_CONFIG_PATH` environment variable
     - Falls back to `process.cwd()` (the current working directory)
   - Combines the resolved `baseDirectory` with the relative file path
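
A minimal sketch of this resolution order, assuming Node.js `path` semantics; the helper name and signature are illustrative, not the server's actual internals:

```typescript
import * as path from "path";

// Hypothetical helper mirroring the resolution steps above.
function resolveLocalFilePath(filePath: string, baseDirectory = "."): string {
  // 1. Absolute paths are used as-is.
  if (path.isAbsolute(filePath)) {
    return filePath;
  }
  // 2. A relative baseDirectory is anchored at the directory containing
  //    DATAPROC_CONFIG_PATH when that variable is set, otherwise at the
  //    current working directory.
  let base = baseDirectory;
  if (!path.isAbsolute(base)) {
    const configPath = process.env.DATAPROC_CONFIG_PATH;
    const anchor = configPath ? path.dirname(configPath) : process.cwd();
    base = path.resolve(anchor, base);
  }
  // 3. Combine the resolved base directory with the relative file path.
  return path.join(base, filePath);
}
```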

**File:** `docs/examples/queries/hive-query-examples.md`

To set Hive properties for the query:

```
  ...
    },
    "timeoutMs": 300000
  }
}
```

## PySpark Job Examples with Local File Staging

The MCP server supports automatic local file staging for PySpark jobs, allowing you to reference local Python files using template syntax. Files are automatically uploaded to GCS and the job configuration is transformed to use the staged files.

- **Template Syntax**: Use `{@./relative/path}` or `{@/absolute/path}` to reference local files
- **Automatic Upload**: Files are automatically staged to the cluster's GCS staging bucket
- **Unique Naming**: Staged files get unique names with timestamps to avoid conflicts
- **Cleanup**: Staged files are automatically cleaned up after job completion
- **Supported Extensions**: `.py`, `.jar`, `.sql`, and `.R` files are supported (see the example after this list)
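
For example, a submission payload using the template syntax might look like the sketch below. `mainPythonFileUri` is the standard Dataproc PySpark field; the surrounding field names (`jobType`, `jobConfig`) follow the Hive example earlier in this guide and should be checked against the tool schema:

```typescript
// Illustrative payload only. The {@...} template marks a local file that the
// server stages to GCS before submission; after staging, mainPythonFileUri is
// rewritten to a unique gs:// URI (e.g. a timestamped name in the cluster's
// staging bucket; the exact naming is an implementation detail).
const pysparkJobRequest = {
  jobType: "pyspark",
  jobConfig: {
    mainPythonFileUri: "{@./test-spark-job.py}",
    args: ["--mode", "test"],
  },
  timeoutMs: 300000,
};
```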

### Example Test Job Output

When using the `test-spark-job.py` file, you can expect output similar to:
```
=== PySpark Local File Staging Test ===
Spark version: 3.1.3
Arguments received: ['--mode', 'test']

=== Sample Data ===
+---------+---+---------+
|     name|age|     role|
+---------+---+---------+
|    Alice| 25| Engineer|
|      Bob| 30|  Manager|
|  Charlie| 35|  Analyst|
|    Diana| 28| Designer|
+---------+---+---------+

=== Data Analysis ===
Total records: 4
Average age: 29.5

=== Role Distribution ===
+---------+-----+
|     role|count|
+---------+-----+
| Engineer|    1|
|  Manager|    1|
|  Analyst|    1|
| Designer|    1|
+---------+-----+

=== Test Completed Successfully! ===
Local file staging is working correctly.
```

### Successful Test Cases

The following job IDs demonstrate successful local file staging:
- Job ID: `db620480-135f-4de6-b9a6-4045b308fe97` - Basic PySpark job with local file
- Job ID: `36ed88b2-acad-4cfb-8fbf-88ad1ba22ad7` - PySpark job with multiple local files

These examples show that local file staging works seamlessly with the Dataproc MCP server, providing the same experience as using `gcloud dataproc jobs submit pyspark` with local files.
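
For comparison, the equivalent direct invocation would be along the lines of `gcloud dataproc jobs submit pyspark ./test-spark-job.py --cluster=<cluster> --region=<region> -- --mode test` (cluster and region are placeholders); the MCP server performs the same local-file upload step that `gcloud` does before submitting the job.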
0 commit comments