34 changes: 24 additions & 10 deletions README.md
@@ -35,26 +35,40 @@ The script relies on AWS CLI to retreive the data.
```

`<cluster id>` is the cluster id that you are interested in parsing. The cluster id is prefixed with 'j-'.
`<region>` represents [the region the cluster ran in](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions).

<img src="https://user-images.githubusercontent.com/59929718/147899913-c1305da0-aab5-4882-8faa-3beeff710ec8.png" width="50%" height="50%">
New EMR console | Old EMR Console
:-------------------------:|:-------------------------:
<img src="https://user-images.githubusercontent.com/4088105/223570783-1a729e33-e270-4e4b-82bc-e380fed764ef.png"> | <img src="https://user-images.githubusercontent.com/4088105/223570400-ef23916f-dab5-465e-8ff5-ca1a57f137be.png">

`<region>` represents [the region the cluster ran in](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) (e.g. `us-east-1`). The script does not assume that the region configured in your AWS config matches the region the cluster actually ran in.
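
As a rough illustration of the two arguments above, the sketch below checks that a cluster id carries the `j-` prefix and builds an AWS CLI call that passes the region explicitly instead of relying on the AWS config. The function name `describe_cluster_command` and the id `j-EXAMPLE` are hypothetical, and the script in this repository may invoke the AWS CLI differently; `aws emr describe-cluster` is used here only as a familiar command that accepts both values.

```python
from __future__ import annotations


def describe_cluster_command(cluster_id: str, region: str) -> list[str]:
    """Build an `aws emr describe-cluster` call with an explicit region.

    Hypothetical helper for illustration; not part of this repository.
    """
    # The README states that EMR cluster ids are prefixed with 'j-'.
    if not cluster_id.startswith("j-"):
        raise ValueError(f"cluster id should start with 'j-': {cluster_id!r}")
    # Pass the region explicitly so the AWS config default is never consulted.
    if not region:
        raise ValueError("region must be given explicitly, e.g. 'us-east-1'")
    return [
        "aws", "emr", "describe-cluster",
        "--cluster-id", cluster_id,
        "--region", region,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so no AWS credentials are needed.
    print(" ".join(describe_cluster_command("j-EXAMPLE", "us-east-1")))
```

The returned list can be handed to `subprocess.run` once AWS credentials are configured.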

## Step 2: Retrieve EMR Spark logs and upload into Autotuner

## Step 2: Retrieve EMR Spark logs and upload into Autotuner step #2
1. Go to the EMR console in AWS, and find the cluster that ran the job you are interested in optimizing. Click on the cluster name to view details of the cluster.

1. Assure that you have spark.eventLog.enabled set to true for any jobs you are interested in optimizing.
2. Verify that `spark.eventLog.enabled` is set to `true` for any jobs you are interested in optimizing. The Sync Autotuner needs a Spark event log from a job run in order to provide optimized cluster configurations for the job.

2. Go to the EMR console in AWS, and find the cluster that ran the job you are interested in optimizing. Click on the cluster name to view details of the cluster.
<img src="https://user-images.githubusercontent.com/59929718/147900986-44b68adf-8f7d-4fda-b84b-2c54f6015fc5.png" width="50%" height="50%">
New EMR console | Old EMR Console
:-------------------------:|:-------------------------:
<img src="https://user-images.githubusercontent.com/4088105/223572670-4ee02e08-3a2e-4021-add6-185f645838fe.png"> | <img src="https://user-images.githubusercontent.com/4088105/223572532-aa20eb49-a010-401f-a63b-cab035efea5a.png">

3. Once you are in the cluster information page, click on the “Application user interfaces” tab, and click on “Spark history server” (in red below) under “Persistent application user interfaces.”
<img src="https://user-images.githubusercontent.com/59929718/147901007-81f08b39-1c20-468f-b57c-57dcfe4e46d5.png" width="50%" height="50%">
3. If `spark.eventLog.dir` is set and specifies an S3 location, download the Spark event log from that location, then skip to step 7.

4. If `spark.eventLog.dir` is **not set**, follow the steps below to download the Spark event log from the Spark history server.

4. A new tab should open up with the Spark history server. It may take a minute to load. Click the download button under the event log column to download the Spark event log. Upload this log into the Autotuner in step #2.
5. Once you are in the cluster information page, click on the “Application user interfaces” tab, and click on “Spark history server” (in red below) under “Persistent application user interfaces.”

New EMR console | Old EMR Console
:-------------------------:|:-------------------------:
<img src="https://user-images.githubusercontent.com/4088105/223585554-df6b249d-10ca-41ef-b8a9-ff328a709a9f.png"> | <img src="https://user-images.githubusercontent.com/4088105/223585649-8f32d7dd-e20d-49af-b307-dba292fcebcd.png">

6. A new tab should open up with the Spark history server. It may take a minute to load. Click the download button under the event log column to download the Spark event log.
<img src="https://user-images.githubusercontent.com/59929718/147901014-2c111ad3-3a74-4786-971c-880e578c9257.png" width="50%" height="50%">

7. Upload the Spark event log into the Autotuner.
<img src="https://user-images.githubusercontent.com/4088105/223587432-013fd96d-597a-49c0-969b-edf0e406706b.png" width="50%" height="50%">
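
The branching in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the repository: the function names `event_log_source` and `s3_download_command` are hypothetical, the bucket and paths are placeholders, and treating any non-S3 `spark.eventLog.dir` the same as an unset one is an assumption, since the steps only cover the "S3" and "not set" cases.

```python
from __future__ import annotations

# URI schemes treated as "an S3 location" (assumption for this sketch).
S3_SCHEMES = ("s3://", "s3a://", "s3n://")


def event_log_source(spark_conf: dict[str, str]) -> str:
    """Return where to fetch the Spark event log from.

    'disabled'       -> event logging is off, so no log exists for the job
    's3'             -> download directly from spark.eventLog.dir
    'history-server' -> download through the Spark history server UI
    """
    # Spark's default for spark.eventLog.enabled is false.
    enabled = spark_conf.get("spark.eventLog.enabled", "false").strip().lower()
    if enabled != "true":
        return "disabled"
    log_dir = spark_conf.get("spark.eventLog.dir", "").strip()
    if log_dir.startswith(S3_SCHEMES):
        return "s3"
    # Unset (or, by assumption here, non-S3) directory: use the history server.
    return "history-server"


def s3_download_command(log_dir: str, local_dir: str) -> list[str]:
    """Build an `aws s3 cp --recursive` call that copies the log directory."""
    return ["aws", "s3", "cp", log_dir, local_dir, "--recursive"]


if __name__ == "__main__":
    conf = {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": "s3://my-bucket/spark-logs/",  # placeholder bucket
    }
    if event_log_source(conf) == "s3":
        print(" ".join(s3_download_command(conf["spark.eventLog.dir"], "./event-logs")))
```

Copying the whole directory may pull logs for several applications; pointing the command at a single event log object avoids that.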



# Databricks Tools

@@ -118,4 +132,4 @@ Instructions for finding a cluster-id through the Databricks console can be foun
],
"total_count": 22
}
```
```