|
1 | | -# spark_log_parser |
2 | | -The **Spark log parser** parses unmodified Spark output logs. |
| 1 | +# log_parser |
| 2 | +The **Parser for Apache Spark** parses unmodified Apache Spark History Server event logs. |
3 | 3 |
|
4 | | -Parsed logs contain metadata pertaining to your Spark application execution. Particularly, the runtime for a task, the amount of data read & written, the amount of memory used, etc. These logs do not contain |
5 | | -sensitive information such as the data that your Spark application is processing. Below is an example of the output of the log parser |
6 | | - |
| 4 | +Parsed logs contain metadata about your Apache Spark application's execution, for example the runtime of each task, the amount of data read and written, and the amount of memory used. These logs do not contain |
| 5 | +sensitive information such as the data that your Apache Spark application is processing. Below is an example of the output of the log parser. |
| 6 | + |
7 | 7 |
|
8 | 8 | # Installation |
9 | 9 | Clone this repo to the desired directory. |
10 | 10 |
|
11 | 11 | # Getting Started |
12 | | -### Step 0: Generate the appropriate Apache Spark EMR log |
13 | | -If you have not already done so, complete the [instructions](https://github.com/synccomputingcode/spark_log_parser/blob/main/docs/event_log_download.pdf) to download the spark event log. |
| 12 | +### Step 0: Generate the appropriate Apache Spark History Server Event log |
| 13 | +If you have not already done so, complete the [instructions](docs/event_log_download.pdf) to download the Apache Spark event log. |
14 | 14 |
|
15 | 15 | ### Step 1: Parse the log to strip away sensitive information |
16 | 16 | 1. To process a log file, execute the `parse.py` script in the `sync_parser` folder and provide the |
17 | 17 | log file location with the `-d` flag. |
18 | 18 |
|
19 | 19 | `python3 sync_parser/parse.py -d [log file location]` |
20 | 20 |
|
21 | | - The parsed file `[log file name].spk` will appear in the sync_parser/results directory. |
22 | | - |
23 | | - To re-process and overwrite a previously generated parsed log add the -o flag: |
| 21 | + The parsed file `parsed-[log file name]` will appear in the results directory. |
24 | 22 |
|
25 | | - `python3 sync_parser/parse.py -d [log file location] -o` |
26 | 23 |
|
27 | | -3. Send Sync Computing the parsed log |
| 24 | +2. Send Sync Computing the parsed log |
28 | 25 |
|
29 | | - The parsed file `[log file name].spk` will appear in the sync_parser/results directory. Email |
30 | | -your contact at Sync Computing the parsed file. |
| 26 | +Email the parsed event log to Sync Computing (or upload it to the Sync Auto-tuner). |