Commit 0c4880a

Base pyspark-jupyter image in official jupyter/pyspark-notebook
1 parent 700dbc7 commit 0c4880a

5 files changed, 11 additions and 45 deletions

infra/pyspark-jupyter/Dockerfile

Lines changed: 6 additions & 23 deletions
@@ -1,24 +1,7 @@
-FROM luisbelloch/spark
+FROM jupyter/pyspark-notebook
 LABEL maintainer="Luis Belloch <[email protected]>"
-
-ENV DEBIAN_FRONTEND=noninteractive
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends python3-pip && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN pip3 install --upgrade setuptools wheel && \
-    rm -rf ~/.cache/*
-
-RUN pip3 install --upgrade jupyterlab pandas && \
-    rm -rf ~/.cache/*
-
-ENV PYSPARK_DRIVER_PYTHON=jupyter
-ENV PYSPARK_DRIVER_PYTHON_OPTS="lab --ip $(awk \'END{print $1}\' /etc/hosts) --allow-root --port 8888"
-
-WORKDIR /opt/notebook
-COPY entrypoint.sh /opt/notebook
-
-EXPOSE 8888
-
-CMD ["/opt/notebook/entrypoint.sh"]
-
+ENV JUPYTER_ENABLE_LAB=yes
+RUN git clone https://github.com/luisbelloch/data_processing_course.git && \
+    mv data_processing_course/data . && \
+    mv data_processing_course/spark ./ejemplos && \
+    rm -rf data_processing_course

infra/pyspark-jupyter/Makefile

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ push:
 	docker push luisbelloch/pyspark-jupyter
 
 run:
-	./pyspark-jupyter.sh
+	docker run -p 8888:8888 -p 4040:4040 luisbelloch/pyspark-jupyter
 
 list:
 	docker images luisbelloch/pyspark-jupyter

infra/pyspark-jupyter/README.md

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,8 @@
 # PySpark + Jupyter
 
-This folder contains a docker container with PySpark ready to be run from a Jupyter Notebook.
+This folder contains a docker container with PySpark ready to be run from a Jupyter Notebook, specifically customized for the course.
+
+For more general uses, we recommend to use the official [Jupyter Docker Stacks](https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html). This image itself is derived from `jupyter/pyspark-notebook` one.
 
 To run it, simply do:
 
@@ -10,7 +12,7 @@ docker run -p 8888:8888 -ti luisbelloch/pyspark-jupyter
 
 And navigate to [http://localhost:8888](http://localhost:8888). The password token will be displayed in the terminal.
 
-There's a simple script that will also mount `data` folder used in samples. You can easily access to it from the notebook:
+This image contains `data` folder used in the examples. You can easily access to it from the notebook:
 
 ```python
 rdd = sc.textFile('./data/compras_tiny.csv')

infra/pyspark-jupyter/entrypoint.sh

Lines changed: 0 additions & 7 deletions
This file was deleted.

infra/pyspark-jupyter/pyspark-jupyter.sh

Lines changed: 0 additions & 12 deletions
This file was deleted.