
Commit 215c06f

Revise README for interview preparation clarity
Updated README to clarify interview preparation goals and structure.
1 parent 2d18902 commit 215c06f

File tree: 1 file changed

README.md — 43 additions, 39 deletions
````diff
@@ -3,22 +3,24 @@
 This coding challenge is a collection of _Python_ jobs that are supposed to extract, transform and load data.
 These jobs are using _PySpark_ to process larger volumes of data and are supposed to run on a _Spark_ cluster (via `spark-submit`).
 
-## Gearing Up for the Pairing Session
+## Preparing for the interview
 
-**✅ Goals**
+> [!WARNING]
+> The exercises will be given at the time of interview, and **solved by pairing with the interviewer**.
+> Please do not solve the exercises before the interview.
 
-1. **Get a working environment.** See [local setup](#local-setup).
-2. **Get a high-level understanding of the code and test dataset structure.**
-3. Have your preferred text editor or IDE set up and ready to go.
-
-**❌ Non-Goals**
+**✅ Goals:**
 
-- solving the exercises / writing code
-> ⚠️ The exercises will be given at the time of interview, and solved by pairing with the interviewer.
+1. **Get a [working environment set up](#setup-the-environment).** You can set up a [local environment](#option-1-local-setup), use a [devcontainer](#option-2-devcontainer-setup) or use [Github codespaces](#option-3-github-codespaces).
+2. **Get a high-level understanding of the code and test dataset structure.**
+3. Have your preferred text editor or IDE set up and ready to go.
+4. ⚠️ Don't solve the exercises before the interview. ⚠️
 
-### Local Setup
+## Setup the environment
+### Option 1: Local Setup
 
-> 💡 Use the [Devcontainer setup](#devcontainer-setup) if you encounter issues.
+> [!TIP]
+> Use the [Devcontainer setup](#option-2-devcontainer-setup) if you encounter issues.
 
 #### Pre-requisites
 
````

````diff
@@ -32,23 +34,23 @@ Please make sure you have the following installed and can run them
 
 We recommend using WSL 2 on Windows for this exercise, due to the [lack of support](https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems) of windows paths from Hadoop/Spark.
 
-Follow instructions on the [Windows official page](https://learn.microsoft.com/en-us/windows/wsl/setup/environment) and then the linux install.
-
-> 💡 Use the [Devcontainer setup](#devcontainer-setup) if you encounter issues.
+Follow instructions on the [Windows official page](https://learn.microsoft.com/en-us/windows/wsl/setup/environment) and then the linux install.
+Use the [Devcontainer setup](#option-2-devcontainer-setup) if you encounter issues.
 
 #### Install all dependencies
 
 ```bash
 poetry install
 ```
 
-### Devcontainer setup
+### Option 2: Devcontainer setup
 
 Configuration to use dev containers is provided in `.devcontainer`
 
-> ⚠️ this take up to 7 minutes to setup, make sure to have things running before the interview.
+> [!WARNING]
+> This takes up to 7 minutes to set up; make sure to have things running before the interview.
 
-### In Github codespaces
+### Option 3: Github codespaces
 
 1. [Fork](https://github.com/techops-recsys-lateral-hiring/dataengineer-transformations-python/fork) this repository.
 2. Follow [codespace instructions](https://docs.github.com/en/codespaces/developing-in-a-codespace/creating-a-codespace-for-a-repository#the-codespace-creation-process) from the forked repository, to create the environment.
````
````diff
@@ -59,23 +61,23 @@ This requires a working local docker setup matching your OS and licensing situat
 
 If you have all of these, follow instructions in https://code.visualstudio.com/docs/devcontainers/containers. Otherwise, consider using codespaces.
 
-### Verify setup
+## Verify setup
 
-> All of the following commands should be running successfully
+All of the following tests should run successfully
 
-#### Run unit tests
+### Run unit tests
 
 ```bash
 poetry run pytest tests/unit
 ```
 
-#### Run integration tests
+### Run integration tests
 
 ```bash
 poetry run pytest tests/integration
 ```
 
-#### Run style checks
+### Run style checks
 
 ```bash
 poetry run mypy --ignore-missing-imports --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs \
````
````diff
@@ -84,24 +86,29 @@ poetry run mypy --ignore-missing-imports --disallow-untyped-calls --disallow-unt
 poetry run ruff format && poetry run ruff check
 ```
 
-### Anything else?
+### Done!
 
 All commands are passing?
 You are good to go!
 
-> ⚠️ do not try to solve the exercises ahead of the interview
+> [!WARNING]
+> Remember, do not try to solve the exercises ahead of the interview.
+
+> [!TIP]
+> You are allowed to customize your environment (having the tests run in vscode directly, for example): feel free to spend the time making this comfortable for you. This is not an expectation.
 
-You are allowed to customize your environment (having the test in vscode directly for example): feel free to spend the time making this comfortable for you. This is not an expectation.
 
-## Jobs
 
-There are two exercises in this repo: Word Count, and Citibike.
+## Interview Exercises
+
+There are two exercises in this repo: [Word Count](#word-count), and [Citibike](#citibike).
 
 Currently, these exist as skeletons, and have some **initial test cases** which are defined but some are skipped.
 
-The following section provides context over them.
+The following section provides context over them. Read this before the interview to familiarise yourself with the exercises and their structure.
 
-> ⚠️ do not try to solve the exercises ahead of the interview
+> [!WARNING]
+> Please, do not try to solve the exercises ahead of the interview.
 
 ### Code walk
 
````
````diff
@@ -191,7 +198,7 @@ flowchart TD
 
 There is a dump of the datalake for this under `resources/citibike/citibike.csv` with historical data.
 
-#### Ingest
+#### 1. Ingest
 
 Reads a `*.csv` file and transforms it to parquet format. The column names will be sanitized (whitespaces replaced).
 
````
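The diff above only describes the sanitization rule in prose. A minimal pure-Python sketch of what it might look like is below; the function name is hypothetical and the underscore replacement character is an assumption (the README only says whitespaces are replaced), so the repo's skeleton may differ:

```python
def sanitize_column_name(name: str) -> str:
    """Replace whitespace runs in a column name with underscores.

    Hypothetical helper: the replacement character is an assumption,
    not confirmed by the README.
    """
    return "_".join(name.split())


# Citibike CSV headers contain spaces, e.g.:
print(sanitize_column_name("start station name"))  # start_station_name
```

In the Spark job the same rule could be applied to every column, e.g. `df.toDF(*(sanitize_column_name(c) for c in df.columns))`, before writing the parquet output.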
````diff
@@ -226,14 +233,13 @@ poetry build && poetry run spark-submit \
 <OUTPUT_PATH>
 ```
 
-#### Distance calculation
+#### 2. Distance calculation
 
 This job takes bike trip information and adds the "as the crow flies" distance traveled for each trip.
 It reads the previously ingested data parquet files.
 
-Hint:
-
-- For distance calculation, consider using [**Haversine formula**](https://www.movable-type.co.uk/scripts/latlong.html) as an option.
+> [!TIP]
+> For distance calculation, consider using [**Haversine formula**](https://www.movable-type.co.uk/scripts/latlong.html) as an option.
 
 ##### Input
 
````
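For context on the hint, a plain-Python sketch of the Haversine formula is shown below. This is an illustration only, not the repository's solution; the function name and the mean-Earth-radius constant are choices made for this sketch:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ("as the crow flies") distance in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude at the equator is roughly 111.2 km:
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```

In the actual job the same arithmetic would typically be expressed with `pyspark.sql.functions` column expressions so Spark can compute the distance for every trip row without a Python UDF.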
````diff
@@ -266,13 +272,11 @@ poetry build && poetry run spark-submit \
 <OUTPUT_PATH>
 ```
 
----
+> [!WARNING]
+> One last time: do not try to solve the exercises ahead of the interview. 😅
 
-> ⚠️ do not try to solve the exercises ahead of the interview
-
----
 
-## Reading List
+## Resources / Reading list
 
 If you are unfamiliar with some of the tools used here, we recommend some resources to get started
 
````