
Commit 4bb727b

Added a dockerfile for building and testing the package (#195)
* Added a Dockerfile for building and testing the package

This Dockerfile can be used to set up and run the tests in the Python Deequ package, so none of the dependencies need to be installed in a local workspace. Right now, it only builds against Spark version 3.3; other versions will be added in a future PR. Verified that the `docker run` output is the same as that of the PR workflow.

* Locked Poetry version to 1.7.1
1 parent: 7fd0cff; commit: 4bb727b


3 files changed (+39 -2 lines)


.github/workflows/base.yml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ jobs:
 SPARK_VERSION: ${{matrix.PYSPARK_VERSION}}
 run: |
 pip install --upgrade pip
-pip install poetry
+pip install poetry==1.7.1
 poetry install
 poetry add pyspark==$SPARK_VERSION
 poetry run python -m pytest -s tests

Dockerfile

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+FROM ubuntu:22.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update
+RUN apt-get install -y software-properties-common
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get install -y python3.8 python3-pip
+RUN apt-get install -y python3.8-distutils
+RUN apt-get install -y openjdk-11-jdk
+
+# Update symlink to point to latest
+RUN rm /usr/bin/python3 && ln -s /usr/bin/python3.8 /usr/bin/python3
+RUN python3 --version
+RUN pip3 --version
+RUN java -version
+RUN pip install poetry==1.7.1
+
+COPY . /python-deequ
+WORKDIR python-deequ
+
+RUN poetry lock --no-update
+RUN poetry install
+RUN poetry add pyspark==3.3
+
+ENV SPARK_VERSION=3.3
+CMD poetry run python -m pytest -s tests

README.md

Lines changed: 11 additions & 1 deletion
@@ -244,4 +244,14 @@ Take a look at tests in `tests/dataquality` and `tests/jobs`
 
 ```bash
 $ poetry run pytest
-```
+```
+
+## Running Tests Locally (Docker)
+
+If you have issues installing the dependencies listed above, another way to run the tests and verify your changes is through Docker. There is a Dockerfile that will install the required dependencies and run the tests in a container.
+
+```
+docker build . -t spark-3.3-docker-test
+docker run spark-3.3-docker-test
+```
+
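As a hedged convenience sketch (not part of this commit), the two README commands can be wrapped in a shell function. The function also removes the container after the run and, when given arguments, overrides the image's default `CMD` so that extra options reach pytest. The function name `run_deequ_tests` and the `some_keyword` placeholder are hypothetical; `docker run --rm` and replacing `CMD` by passing a command after the image name are standard Docker behavior.

```bash
# Hypothetical helper (not part of this commit): build the test image from the
# repository root, then run it and remove the container afterwards.
run_deequ_tests() {
  local image="spark-3.3-docker-test"  # tag used in the README instructions

  # Stop early (returning docker's exit status) if the build fails
  docker build . -t "$image" || return

  if [ "$#" -eq 0 ]; then
    # No arguments: use the CMD from the Dockerfile (the full test suite)
    docker run --rm "$image"
  else
    # Arguments given: replace the CMD and forward them to pytest
    docker run --rm "$image" poetry run python -m pytest -s tests "$@"
  fi
}

# Usage (run from the repository root; "some_keyword" is a placeholder):
#   run_deequ_tests
#   run_deequ_tests -k some_keyword
```

Overriding `CMD` leaves the Dockerfile unchanged, and the `SPARK_VERSION` environment variable set in the image still applies to the overridden command.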
