
@scosman (Contributor) commented Mar 19, 2025

Fixes #274

pyarrow has a few issues:

  • It's huge: about 100MB uncompressed
  • It's not compatible with all systems (Intel Macs)

This change allows clients to exclude the pyarrow dependency if they don't need it. It's only used for Parquet file validation, which not all users need.

Note: I'm not removing the dependency, just making it a runtime import. It still works as expected for all users unless they go out of their way to manually exclude this dependency.
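
A minimal sketch of the runtime-import pattern, assuming a validator shaped like the `_check_parquet` function quoted in the review below. The name `check_parquet`, the returned dict keys, and the use of `pyarrow.parquet.read_metadata` are illustrative assumptions, not the library's actual code. Because the import sits inside the function body, importing the module that defines it never loads pyarrow; only calling it does.

```python
from pathlib import Path
from typing import Any, Dict


def check_parquet(file: Path) -> Dict[str, Any]:
    """Hypothetical Parquet check; pyarrow is loaded only when this is called."""
    # Runtime (in-function) import: defining or importing this function does
    # not require pyarrow, so installs that exclude it keep working until
    # Parquet validation is actually requested.
    from pyarrow import parquet

    # Reading the footer metadata is a cheap check that the file parses as Parquet.
    metadata = parquet.read_metadata(file)
    return {"is_check_passed": True, "num_rows": metadata.num_rows}
```

In an environment without pyarrow, importing the defining module succeeds and the `ImportError` surfaces only at the `check_parquet(...)` call.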

Have you read the Contributing Guidelines?
yes

Issue #274

This allows clients to exclude the pyarrow dependency if they don't need it, saving ~80MB and improving compatibility with older systems.

Users who exclude it will still get a runtime error if they then try to use it.

It still works as expected unless users go out of their way to manually exclude this dependency (I'm not removing the dependency; you need to exclude it manually).

@orangetin (Member)

@azahed98 @artek0chumak could you review this?

@scosman (Contributor Author) commented May 5, 2025

@orangetin I'd love to get this reviewed and integrated (or to hear that it's not going to make it, so I can maintain my fork). It should be a quick two-minute review if you know the right folks.

@orangetin (Member) left a comment

Thanks for the PR! I'd like some changes before we can merge this:

  1. Make pyarrow an optional dependency in a new group in the pyproject.toml file so it isn't installed by default
  2. Add the try/except wrapper (see comment below)
  3. Add a small note in the readme about this
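
With pyarrow moved into an optional group (point 1), default installs won't include it. A minimal sketch, assuming calling code wants to check up front whether the extra is present (for example, to skip Parquet validation instead of failing): `importlib.util.find_spec` locates a module without importing it. The helper name `has_optional_dependency` is hypothetical, not part of the together library.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Hypothetical helper: True if `module_name` can be found for import.

    find_spec locates the module without executing it, so probing for a
    heavy package such as pyarrow does not pay its import cost. Pass a
    top-level name: for dotted names, find_spec imports the parent package.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pyarrow available:", has_optional_dependency("pyarrow"))
```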


```python
def _check_parquet(file: Path) -> Dict[str, Any]:
    # in method import - this allows client to exclude the pyarrow dep if they don't need it. Saved ~80MB and more compatible with older systems.
    from pyarrow import ArrowInvalid, parquet
```

@orangetin (Member)

Can you wrap this in a try/except with details on how to install it with the dependency group? Something like `pip install together[parquet]`.
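
A minimal sketch of the requested try/except. It assumes the extra ends up named `pyarrow`, as in the example error and install command later in this thread (the reviewer's `together[parquet]` was only a suggestion), and the helper `import_pyarrow_parquet` is hypothetical. The key pieces are catching `ImportError`, re-raising with install instructions, and chaining with `from exc` so the original cause stays in the traceback.

```python
from types import ModuleType

# Assumed extra name, matching the install command used later in the thread.
INSTALL_HINT = "Please install it via `pip install together[pyarrow]`"


def import_pyarrow_parquet() -> ModuleType:
    """Hypothetical helper: import pyarrow.parquet or explain how to get it."""
    try:
        from pyarrow import parquet
    except ImportError as exc:
        # Re-raise with install instructions; `from exc` keeps the original
        # error attached as __cause__ so the root cause stays visible.
        raise ImportError(
            "pyarrow is not installed and is required to use parquet files. "
            + INSTALL_HINT
        ) from exc
    return parquet
```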

… to use parquet files.

Example Error
```
$ uv run python test_pyarrow.py
Expected ImportError: pyarrow is not installed and is required to use parquet files. Please install it via `pip install together[pyarrow]`
```
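
The contents of `test_pyarrow.py` aren't shown here. One way such a script could reproduce the missing-dependency error without uninstalling anything (an assumption, not necessarily what the author did) is to set `sys.modules["pyarrow"]` to `None`: Python then raises `ModuleNotFoundError`, a subclass of `ImportError`, on any later `import pyarrow`. The context manager name `block_import` is hypothetical.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def block_import(module_name: str) -> Iterator[None]:
    """Temporarily make `import module_name` fail, then restore sys.modules."""
    sentinel = object()
    saved = sys.modules.get(module_name, sentinel)
    # A None entry in sys.modules makes the import system raise
    # ModuleNotFoundError instead of searching for the module.
    sys.modules[module_name] = None
    try:
        yield
    finally:
        # Put back whatever was there before (or nothing at all).
        if saved is sentinel:
            sys.modules.pop(module_name, None)
        else:
            sys.modules[module_name] = saved


if __name__ == "__main__":
    with block_import("pyarrow"):
        try:
            import pyarrow  # noqa: F401
        except ImportError as err:
            print(f"pyarrow blocked: {err}")
```

Run it in a fresh process: code that already holds a reference to the pyarrow module is unaffected by the `sys.modules` entry.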

Confirmed that installing the extra resolves the issue:
```
uv pip install "dist/together-1.5.0-py3-none-any.whl[pyarrow]"
Resolved 33 packages in 394ms
Installed 1 package in 30ms
 + pyarrow==20.0.0
```

@scosman (Contributor Author) commented Jun 2, 2025

@orangetin I've made those changes. It should be ready.

@orangetin (Member) left a comment

lgtm!

@orangetin (Member)

@scosman Thanks for making the changes, but it looks like the pre-merge checks are failing. Could you run `poetry lock` and the formatter (instructions here)?

@scosman (Contributor Author) commented Jun 2, 2025

@orangetin done!

@orangetin orangetin merged commit 7e93fbc into togethercomputer:main Jun 2, 2025
4 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

Make pyarrow dependency optional

2 participants