
Commit 6491bdf

Update documentation for setting up with uv
1 parent 29437b3 commit 6491bdf


4 files changed: 61 additions & 79 deletions


README.md

Lines changed: 31 additions & 31 deletions
@@ -138,7 +138,13 @@ See [examples](examples/README.md) for more information.
 
 - [Serialize query plans using Substrait](https://github.com/apache/datafusion-python/blob/main/examples/substrait.py)
 
-## How to install (from pip)
+## How to install
+
+### uv
+
+```bash
+uv add datafusion
+```
 
 ### Pip
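As a quick sanity check of the `uv add datafusion` path added above (not part of this commit; it assumes the `SessionContext` API that current `datafusion` releases expose), the dependency can be exercised straight from the shell:

```bash
# run a trivial query through datafusion to confirm the install works end to end
uv run python -c "from datafusion import SessionContext; print(SessionContext().sql('SELECT 1 + 1').collect())"
```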

@@ -164,24 +170,21 @@ You can verify the installation by running:
 
 ## How to develop
 
-This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin).
+This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). The Maturin tools used in this workflow can be installed either via `uv` or `pip`. Both approaches should offer the same experience. It is recommended to use `uv` since it offers significant performance improvements
+over `pip`.
 
-The Maturin tools used in this workflow can be installed either via Conda or Pip. Both approaches should offer the same experience. Multiple approaches are only offered to appease developer preference. Bootstrapping for both Conda and Pip are as follows.
-
-Bootstrap (Conda):
+Bootstrap (`uv`):
 
 ```bash
 # fetch this repo
 git clone git@github.com:apache/datafusion-python.git
-# create the conda environment for dev
-conda env create -f ./conda/environments/datafusion-dev.yaml -n datafusion-dev
-# activate the conda environment
-conda activate datafusion-dev
+# create the virtual environment
+uv sync --dev --no-install-package datafusion
+# activate the environment
+source .venv/bin/activate
 ```
 
-Or alternatively, if you are on an OS that supports CUDA Toolkit, you can use `-f ./conda/environments/datafusion-cuda-dev.yaml`.
-
-Bootstrap (Pip):
+Bootstrap (`pip`):
 
 ```bash
 # fetch this repo
@@ -192,33 +195,40 @@ python3 -m venv venv
 source venv/bin/activate
 # update pip itself if necessary
 python -m pip install -U pip
-# install dependencies (for Python 3.8+)
-python -m pip install -r requirements.in
+# install dependencies
+python -m pip install -r pyproject.toml
 ```
 
 The tests rely on test data in git submodules.
 
 ```bash
-git submodule init
-git submodule update
+git submodule update --init
 ```
 
 Whenever rust code changes (your changes or via `git pull`):
 
 ```bash
 # make sure you activate the venv using "source venv/bin/activate" first
-maturin develop
+maturin develop --uv
 python -m pytest
 ```
 
+Alternatively, if you are using `uv` you can do the following without
+needing to activate the virtual environment:
+
+```bash
+uv run maturin develop --uv
+uv run pytest .
+```
+
 ### Running & Installing pre-commit hooks
 
-arrow-datafusion-python takes advantage of [pre-commit](https://pre-commit.com/) to assist developers with code linting to help reduce
+`datafusion-python` takes advantage of [pre-commit](https://pre-commit.com/) to assist developers with code linting to help reduce
 the number of commits that ultimately fail in CI due to linter errors. Using the pre-commit hooks is optional for the
 developer but certainly helpful for keeping PRs clean and concise.
 
 Our pre-commit hooks can be installed by running `pre-commit install`, which will install the configurations in
-your ARROW_DATAFUSION_PYTHON_ROOT/.github directory and run each time you perform a commit, failing to complete
+your DATAFUSION_PYTHON_ROOT/.github directory and run each time you perform a commit, failing to complete
 the commit if an offending lint is found allowing you to make changes locally before pushing.
 
 The pre-commit hooks can also be run adhoc without installing them by simply running `pre-commit run --all-files`
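For readers following along, the two commands mentioned in the paragraphs above are typically used like this (illustrative only; the exact hooks come from the repository's pre-commit configuration):

```bash
# install the git hooks so they run automatically on each commit
pre-commit install
# or run every configured hook once across the whole tree without installing
pre-commit run --all-files
```
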
@@ -236,18 +246,8 @@ There are scripts in `ci/scripts` for running Rust and Python linters.
 
 ## How to update dependencies
 
-To change test dependencies, change the `requirements.in` and run
-
-```bash
-# install pip-tools (this can be done only once), also consider running in venv
-python -m pip install pip-tools
-python -m piptools compile --generate-hashes -o requirements-310.txt
-```
-
-To update dependencies, run with `-U`
+To change test dependencies, change the `pyproject.toml` and run
 
 ```bash
-python -m piptools compile -U --generate-hashes -o requirements-310.txt
+uv sync --dev --no-install-package datafusion
 ```
-
-More details [here](https://github.com/jazzband/pip-tools)
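As a sketch of the updated dependency workflow described above (the package name is purely hypothetical; `uv add --dev` records it in `pyproject.toml` before the sync shown in the diff):

```bash
# add a hypothetical dev-only dependency to pyproject.toml
uv add --dev pytest-benchmark
# re-create the development environment from the updated pyproject.toml
uv sync --dev --no-install-package datafusion
```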

dev/release/README.md

Lines changed: 2 additions & 21 deletions
@@ -218,28 +218,9 @@ uploading them using `twine`:
 twine upload --repository pypi dist-release/*
 ```
 
-### Publish Python Artifacts to Anaconda
+### Publish Python Artifacts to conda-forge
 
-Publishing artifacts to Anaconda is similar to PyPi. First, Download the source tarball created in the previous step and untar it.
-
-```bash
-# Assuming you have an existing conda environment named `datafusion-dev` if not see root README for instructions
-conda activate datafusion-dev
-conda build .
-```
-
-This will setup a virtual conda environment and build the artifacts inside of that virtual env. This step can take a few minutes as the entire build, host, and runtime environments are setup. Once complete a local filesystem path will be emitted for the location of the resulting package. Observe that path and copy to your clipboard.
-
-Ex: `/home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2`
-
-Now you are ready to publish this resulting package to anaconda.org. This can be accomplished in a few simple steps.
-
-```bash
-# First login to Anaconda with the datafusion credentials
-anaconda login
-# Upload the package
-anaconda upload /home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2
-```
+PyPI packages are automatically uploaded to conda-forge via the [datafusion feedstock](https://github.com/conda-forge/datafusion-feedstock).
 
 ### Push the Release Tag
 
dev/release/verify-release-candidate.sh

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ setup_tempdir() {
 }
 
 test_source_distribution() {
-# install rust toolchain in a similar fashion like test-miniconda
+# install rust toolchain
 export RUSTUP_HOME=$PWD/test-rustup
 export CARGO_HOME=$PWD/test-rustup
 
docs/mdbook/src/installation.md

Lines changed: 27 additions & 26 deletions
@@ -18,44 +18,45 @@
 
 DataFusion is easy to install, just like any other Python library.
 
-## Using pip
+## Using uv
 
-``` bash
-pip install datafusion
-```
+If you do not yet have a virtual environment, create one:
 
-## Conda & JupyterLab setup
+```bash
+uv venv
+```
 
-This section explains how to install DataFusion in a conda environment with other libraries that allow for a nice Jupyter workflow. This setup is completely optional. These steps are only needed if you'd like to run DataFusion in a Jupyter notebook and have an interface like this:
+You can add `datafusion` to your virtual environment as usual:
 
-![DataFusion in Jupyter](https://github.com/MrPowers/datafusion-book/raw/main/src/images/datafusion-jupyterlab.png)
+```bash
+uv pip install datafusion
+```
 
-Create a conda environment with DataFusion, Jupyter, and other useful dependencies in the `datafusion-env.yml` file:
+Or, to add to a project:
 
+```bash
+uv add datafusion
 ```
-name: datafusion-env
-channels:
-  - conda-forge
-  - defaults
-dependencies:
-  - python=3.9
-  - ipykernel
-  - nb_conda
-  - jupyterlab
-  - jupyterlab_code_formatter
-  - isort
-  - black
-  - pip
-  - pip:
-      - datafusion
 
+## Using pip
+
+``` bash
+pip install datafusion
 ```
 
-Create the environment with `conda env create -f datafusion-env.yml`.
+## uv & JupyterLab setup
 
-Activate the environment with `conda activate datafusion-env`.
+This section explains how to install DataFusion in a uv environment with other libraries that allow for a nice Jupyter workflow. This setup is completely optional. These steps are only needed if you'd like to run DataFusion in a Jupyter notebook and have an interface like this:
 
-Run `jupyter lab` or open the [JupyterLab Desktop application](https://github.com/jupyterlab/jupyterlab-desktop) to start running DataFusion in a Jupyter notebook.
+![DataFusion in Jupyter](https://github.com/MrPowers/datafusion-book/raw/main/src/images/datafusion-jupyterlab.png)
+
+Create a virtual environment with DataFusion, Jupyter, and other useful dependencies, then start JupyterLab:
+
+```bash
+uv venv
+uv pip install datafusion jupyterlab jupyterlab_code_formatter
+uv run jupyter lab
+```
 
 ## Examples
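A minimal check that the uv-based installation above worked, assuming the package exposes `__version__` as recent releases do:

```bash
# import the library inside the uv-managed environment and print its version
uv run python -c "import datafusion; print(datafusion.__version__)"
```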
