If you spot a problem with the docs, first search whether an issue already exists. If a related issue doesn't exist, you can open a new one.
Scan through our existing issues to find one that interests you. If you find an issue to work on, make sure that no one else is already working on it so that you can get assigned. After that, you are welcome to open a PR with a fix.
You can find a list of good first issues, which can help you get familiar with the project's code base.
We have a workflow that automatically assigns issues to users who comment 'take' on an issue. It is configured in the .github/workflows/assign-on-comment.yml file. When a user comments 'take' on an issue, a GitHub Action runs and assigns the issue to that user if it is not already assigned.
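For reference, a workflow of this general shape can implement that behavior. This is an illustrative sketch, not necessarily the repository's exact file:

```yaml
# Illustrative sketch of an assign-on-comment workflow; the real file may differ.
name: Assign on comment
on:
  issue_comment:
    types: [created]
jobs:
  assign:
    # Only react to the literal comment 'take'.
    if: github.event.comment.body == 'take'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            const issue = context.payload.issue;
            // Assign the commenter only if the issue is still unassigned.
            if (!issue.assignees || issue.assignees.length === 0) {
              await github.rest.issues.addAssignees({
                ...context.repo,
                issue_number: issue.number,
                assignees: [context.payload.comment.user.login],
              });
            }
```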
To start contributing, you should fork this repository and then clone your fork. If you accidentally cloned the original repository instead of your fork, you can fix the remote at any time with one of these commands:

```shell
# for HTTPS
git remote set-url origin https://github.com/your-github-name/quinn.git
# for SSH
git remote set-url origin git@github.com:your-github-name/quinn.git
```

After cloning the project you should install all the dependencies. We are using poetry as a build tool. You can install poetry by following these instructions.
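If you want to double-check the remote fix above, you can rehearse it in a throwaway repository; all URLs here are placeholders:

```shell
# Rehearse the remote fix in a temporary repo; URLs are placeholders.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git remote add origin https://github.com/some-upstream/quinn.git   # the wrong remote
git remote set-url origin https://github.com/your-github-name/quinn.git
git remote get-url origin   # now prints your fork's URL
```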
You can create a virtualenv with poetry. The recommended version of Python is 3.9:

```shell
poetry env use python3.9
```

After that, you should install all the dependencies, including the development ones:

```shell
make install_deps
```

To run the Spark tests you need properly configured Java. Apache Spark currently mainly supports Java 8 (1.8). You can find instructions on how to set up Java here. When you run the Spark tests, the JAVA_HOME environment variable should point to your Java 8 installation.
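A quick way to sanity-check the Java version is to parse the `java -version` banner. The snippet below uses a hard-coded example banner so it is self-contained; in practice you would capture the output of `"$JAVA_HOME/bin/java" -version` instead:

```shell
# Check that a `java -version` banner reports Java 8 (1.8).
# The banner is a hard-coded example; normally you'd capture it with:
#   banner=$("$JAVA_HOME/bin/java" -version 2>&1 | head -n 1)
banner='java version "1.8.0_292"'
major_minor=$(printf '%s' "$banner" | sed -n 's/.*"\([0-9]*\.[0-9]*\).*/\1/p')
if [ "$major_minor" = "1.8" ]; then
    echo "Java 8 detected"
else
    echo "expected Java 8, got: $banner" >&2
fi
```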
We use pre-commit hooks to ensure code quality. The configuration for pre-commit hooks is in the .pre-commit-config.yaml file. To install pre-commit, run:
```shell
poetry shell
poetry run pre-commit install
```

To run the pre-commit hooks manually, use:

```shell
pre-commit run --all-files
```

This project uses pytest and chispa for running the Spark tests. Please run all the tests before creating a pull request. When you are working on new functionality, you should also add new tests for it.
You can run the tests as follows:

```shell
make test
```

You can run GitHub Actions locally using the act tool. The configuration for GitHub Actions is in the .github/workflows/ci.yml file. To install act, follow the instructions here. To run a specific job, use:

```shell
act -j <job-name>
```

For example, to run the test job, use:

```shell
act -j test
```

If you need help with act, use:

```shell
act --help
```

For MacBooks with M1 processors, you might have to add the --container-architecture flag:

```shell
act -j <job-name> --container-architecture linux/arm64
```

To run the Spark-Connect tests locally, follow the steps below. Please note that this only works on Mac/UNIX-based systems.
- Set up the required environment variables: the following variables need to be set so that the shell script that installs the Spark-Connect binary and starts the server picks up the right version. The version can be either 3.5.1 or 3.4.3, as those are the ones used in our CI:

  ```shell
  export SPARK_VERSION=3.5.1
  export SPARK_CONNECT_MODE_ENABLED=1
  ```

- Check that the required environment variables are set:

  ```shell
  echo $SPARK_VERSION
  echo $SPARK_CONNECT_MODE_ENABLED
  ```

- Install the required system packages: run the command below to install wget.

  For Mac users:

  ```shell
  brew install wget
  ```

  For Ubuntu users:

  ```shell
  sudo apt-get install wget
  ```

- Execute the shell script: run the command below to execute the shell script that installs Spark-Connect and starts the server:

  ```shell
  sh scripts/run_spark_connect_server.sh
  ```

- Run the tests: run the command below to execute the tests using Spark-Connect:

  ```shell
  make test
  ```

- Clean up: after running the tests, you can stop the Spark-Connect server and unset the environment variables:

  ```shell
  unset SPARK_VERSION
  unset SPARK_CONNECT_MODE_ENABLED
  ```
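Since only 3.5.1 and 3.4.3 are exercised in our CI, a small guard like the following (illustrative, not part of the repository) can catch an unsupported SPARK_VERSION before you start the server:

```shell
# Illustrative guard: accept only the Spark versions used in CI.
check_spark_version() {
    case "$1" in
        3.5.1|3.4.3) echo "ok: $1" ;;
        *) echo "unsupported SPARK_VERSION: $1" >&2; return 1 ;;
    esac
}

# Falls back to 3.5.1 if SPARK_VERSION is unset.
check_spark_version "${SPARK_VERSION:-3.5.1}"
```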
This project follows the PySpark style guide. All public functions and methods should be documented in README.md and should also have docstrings in Sphinx format:

```python
"""[Summary]

:param [ParamName]: [ParamDescription], defaults to [DefaultParamVal]
:type [ParamName]: [ParamType](, optional)
...
:raises [ErrorType]: [ErrorDescription]
...
:return: [ReturnDescription]
:rtype: [ReturnType]
"""
```

We are using isort and ruff as linters. You can find instructions on how to set up and use these tools here:
- Install the Ruff extension by Astral Software from the VSCode marketplace (Extension ID: charliermarsh.ruff).
- Open the command palette (Ctrl+Shift+P) and select `Preferences: Open Settings (JSON)`.
- Add the following configuration to your settings.json file:

  ```json
  {
      "python.linting.ruffEnabled": true,
      "python.linting.enabled": true,
      "python.formatting.provider": "none",
      "editor.formatOnSave": true
  }
  ```

The above settings enable linting with Ruff and format your code with Ruff on save.
To set up Ruff in PyCharm using poetry, follow these steps:
- Find the path to your `poetry` executable:
  - Open a terminal.
  - For macOS/Linux, use the command `which poetry`.
  - For Windows, use the command `where poetry`.
  - Note down the path returned by the command.
- Open the `Preferences` window (Cmd+, on macOS).
- Navigate to `Tools` > `External Tools`.
- Click the `+` icon to add a new external tool.
- Fill in the following details:
  - Name: `Ruff`
  - Program: the path to your `poetry` executable that you noted earlier.
  - Arguments: `run ruff check --fix $FilePathRelativeToProjectRoot$`
  - Working directory: `$ProjectFileDir$`
- Click `OK` to save the configuration.
- To run Ruff, right-click on a file or directory in the project view, select `External Tools`, and then select `Ruff`.
When you're finished with the changes, create a pull request, also known as a PR.
- Don't forget to link the PR to the issue if you are solving one.
- As you update your PR and apply changes, mark each conversation as resolved.
- If you run into any merge issues, check out this git tutorial to help you resolve merge conflicts and other issues.