
Commit 6491bdf

Update documentation for setting up with uv
1 parent 29437b3 commit 6491bdf


4 files changed: 61 additions & 79 deletions


README.md

Lines changed: 31 additions & 31 deletions
@@ -138,7 +138,13 @@ See [examples](examples/README.md) for more information.
 
 - [Serialize query plans using Substrait](https://github.com/apache/datafusion-python/blob/main/examples/substrait.py)
 
-## How to install (from pip)
+## How to install
+
+### uv
+
+```bash
+uv add datafusion
+```
 
 ### Pip
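As a quick sanity check of the `uv add datafusion` path added above (not part of this commit; it assumes the `SessionContext` API that current `datafusion` releases expose), the dependency can be exercised straight from the shell:

```bash
# run a trivial query through datafusion to confirm the install works end to end
uv run python -c "from datafusion import SessionContext; print(SessionContext().sql('SELECT 1 + 1').collect())"
```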

@@ -164,24 +170,21 @@ You can verify the installation by running:
 
 ## How to develop
 
-This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin).
+This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). The Maturin tools used in this workflow can be installed either via `uv` or `pip`. Both approaches should offer the same experience. It is recommended to use `uv` since it offers significant performance improvements
+over `pip`.
 
-The Maturin tools used in this workflow can be installed either via Conda or Pip. Both approaches should offer the same experience. Multiple approaches are only offered to appease developer preference. Bootstrapping for both Conda and Pip are as follows.
-
-Bootstrap (Conda):
+Bootstrap (`uv`):
 
 ```bash
 # fetch this repo
 git clone git@github.com:apache/datafusion-python.git
-# create the conda environment for dev
-conda env create -f ./conda/environments/datafusion-dev.yaml -n datafusion-dev
-# activate the conda environment
-conda activate datafusion-dev
+# create the virtual environment
+uv sync --dev --no-install-package datafusion
+# activate the environment
+source .venv/bin/activate
 ```
 
-Or alternatively, if you are on an OS that supports CUDA Toolkit, you can use `-f ./conda/environments/datafusion-cuda-dev.yaml`.
-
-Bootstrap (Pip):
+Bootstrap (`pip`):
 
 ```bash
 # fetch this repo
@@ -192,33 +195,40 @@ python3 -m venv venv
 source venv/bin/activate
 # update pip itself if necessary
 python -m pip install -U pip
-# install dependencies (for Python 3.8+)
-python -m pip install -r requirements.in
+# install dependencies
+python -m pip install -r pyproject.toml
 ```
 
 The tests rely on test data in git submodules.
 
 ```bash
-git submodule init
-git submodule update
+git submodule update --init
 ```
 
 Whenever rust code changes (your changes or via `git pull`):
 
 ```bash
 # make sure you activate the venv using "source venv/bin/activate" first
-maturin develop
+maturin develop --uv
 python -m pytest
 ```
 
+Alternatively, if you are using `uv` you can do the following without
+needing to activate the virtual environment:
+
+```bash
+uv run maturin develop --uv
+uv run pytest .
+```
+
 ### Running & Installing pre-commit hooks
 
-arrow-datafusion-python takes advantage of [pre-commit](https://pre-commit.com/) to assist developers with code linting to help reduce
+`datafusion-python` takes advantage of [pre-commit](https://pre-commit.com/) to assist developers with code linting to help reduce
 the number of commits that ultimately fail in CI due to linter errors. Using the pre-commit hooks is optional for the
 developer but certainly helpful for keeping PRs clean and concise.
 
 Our pre-commit hooks can be installed by running `pre-commit install`, which will install the configurations in
-your ARROW_DATAFUSION_PYTHON_ROOT/.github directory and run each time you perform a commit, failing to complete
+your DATAFUSION_PYTHON_ROOT/.github directory and run each time you perform a commit, failing to complete
 the commit if an offending lint is found allowing you to make changes locally before pushing.
 
 The pre-commit hooks can also be run adhoc without installing them by simply running `pre-commit run --all-files`
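For readers following along, the two commands mentioned in the paragraphs above are typically used like this (illustrative only; the exact hooks come from the repository's pre-commit configuration):

```bash
# install the git hooks so they run automatically on each commit
pre-commit install
# or run every configured hook once across the whole tree without installing
pre-commit run --all-files
```
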
@@ -236,18 +246,8 @@ There are scripts in `ci/scripts` for running Rust and Python linters.
 
 ## How to update dependencies
 
-To change test dependencies, change the `requirements.in` and run
-
-```bash
-# install pip-tools (this can be done only once), also consider running in venv
-python -m pip install pip-tools
-python -m piptools compile --generate-hashes -o requirements-310.txt
-```
-
-To update dependencies, run with `-U`
+To change test dependencies, change the `pyproject.toml` and run
 
 ```bash
-python -m piptools compile -U --generate-hashes -o requirements-310.txt
+uv sync --dev --no-install-package datafusion
 ```
-
-More details [here](https://github.com/jazzband/pip-tools)
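As a sketch of the updated dependency workflow described above (the package name is purely hypothetical; `uv add --dev` records it in `pyproject.toml` before the sync shown in the diff):

```bash
# add a hypothetical dev-only dependency to pyproject.toml
uv add --dev pytest-benchmark
# re-create the development environment from the updated pyproject.toml
uv sync --dev --no-install-package datafusion
```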

dev/release/README.md

Lines changed: 2 additions & 21 deletions
@@ -218,28 +218,9 @@ uploading them using `twine`:
 twine upload --repository pypi dist-release/*
 ```
 
-### Publish Python Artifacts to Anaconda
+### Publish Python Artifacts to conda-forge
 
-Publishing artifacts to Anaconda is similar to PyPi. First, Download the source tarball created in the previous step and untar it.
-
-```bash
-# Assuming you have an existing conda environment named `datafusion-dev` if not see root README for instructions
-conda activate datafusion-dev
-conda build .
-```
-
-This will setup a virtual conda environment and build the artifacts inside of that virtual env. This step can take a few minutes as the entire build, host, and runtime environments are setup. Once complete a local filesystem path will be emitted for the location of the resulting package. Observe that path and copy to your clipboard.
-
-Ex: `/home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2`
-
-Now you are ready to publish this resulting package to anaconda.org. This can be accomplished in a few simple steps.
-
-```bash
-# First login to Anaconda with the datafusion credentials
-anaconda login
-# Upload the package
-anaconda upload /home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2
-```
+PyPI packages are automatically uploaded to conda-forge via the [datafusion feedstock](https://github.com/conda-forge/datafusion-feedstock).
 
 ### Push the Release Tag
 
dev/release/verify-release-candidate.sh

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ setup_tempdir() {
 }
 
 test_source_distribution() {
-# install rust toolchain in a similar fashion like test-miniconda
+# install rust toolchain
 export RUSTUP_HOME=$PWD/test-rustup
 export CARGO_HOME=$PWD/test-rustup
 
docs/mdbook/src/installation.md

Lines changed: 27 additions & 26 deletions
@@ -18,44 +18,45 @@
 
 DataFusion is easy to install, just like any other Python library.
 
-## Using pip
+## Using uv
 
-``` bash
-pip install datafusion
-```
+If you do not yet have a virtual environment, create one:
 
-## Conda & JupyterLab setup
+```bash
+uv venv
+```
 
-This section explains how to install DataFusion in a conda environment with other libraries that allow for a nice Jupyter workflow. This setup is completely optional. These steps are only needed if you'd like to run DataFusion in a Jupyter notebook and have an interface like this:
+You can add `datafusion` to your virtual environment as usual:
 
-![DataFusion in Jupyter](https://github.com/MrPowers/datafusion-book/raw/main/src/images/datafusion-jupyterlab.png)
+```bash
+uv pip install datafusion
+```
 
-Create a conda environment with DataFusion, Jupyter, and other useful dependencies in the `datafusion-env.yml` file:
+Or, to add to a project:
 
+```bash
+uv add datafusion
 ```
-name: datafusion-env
-channels:
-  - conda-forge
-  - defaults
-dependencies:
-  - python=3.9
-  - ipykernel
-  - nb_conda
-  - jupyterlab
-  - jupyterlab_code_formatter
-  - isort
-  - black
-  - pip
-  - pip:
-      - datafusion
 
+## Using pip
+
+``` bash
+pip install datafusion
 ```
 
-Create the environment with `conda env create -f datafusion-env.yml`.
+## uv & JupyterLab setup
 
-Activate the environment with `conda activate datafusion-env`.
+This section explains how to install DataFusion in a uv environment with other libraries that allow for a nice Jupyter workflow. This setup is completely optional. These steps are only needed if you'd like to run DataFusion in a Jupyter notebook and have an interface like this:
 
-Run `jupyter lab` or open the [JupyterLab Desktop application](https://github.com/jupyterlab/jupyterlab-desktop) to start running DataFusion in a Jupyter notebook.
+![DataFusion in Jupyter](https://github.com/MrPowers/datafusion-book/raw/main/src/images/datafusion-jupyterlab.png)
+
+Create a virtual environment with DataFusion, Jupyter, and other useful dependencies, then start JupyterLab:
+
+```bash
+uv venv
+uv pip install datafusion jupyterlab jupyterlab_code_formatter
+uv run jupyter lab
+```
 
 ## Examples
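A minimal check that the uv-based installation above worked, assuming the package exposes `__version__` as recent releases do:

```bash
# import the library inside the uv-managed environment and print its version
uv run python -c "import datafusion; print(datafusion.__version__)"
```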
