Hi, thank you for taking the time to improve Snowflake's Snowpark Python or Snowpark pandas APIs!
Many questions can be answered by checking our docs or looking for already existing bug reports and enhancement requests on our issue tracker.
Please start by checking these first!
If you could not find what you were looking for, we'd love to hear from you! Please open a new issue to get in touch with us.
We encourage everyone to first open a new issue to discuss any feature work or bug fixes with one of the maintainers. The following should help guide contributors through potential pitfalls.
We require our contributors to sign a CLA, available at https://github.com/snowflakedb/CLA/blob/main/README.md. A GitHub Actions bot will assist you when you open a pull request.
```bash
git clone <YOUR_FORKED_REPO>
cd snowpark-python
```
Create a new Python virtual environment with any Python version that we support.

- The Snowpark Python API supports Python 3.9, Python 3.10, Python 3.11, Python 3.12 and Python 3.13.
- The Snowpark pandas API supports Python 3.9, Python 3.10, and Python 3.11. Additionally, Snowpark pandas requires Modin 0.36.x or 0.37.x, and pandas 2.2.x or 2.3.x.

For example,

```bash
conda create --name snowpark-dev python=3.9
```
Activate the new Python virtual environment. For example,

```bash
conda activate snowpark-dev
```
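Before installing, it can help to confirm that the active interpreter is one of the supported versions listed above. The helper below is a hypothetical convenience (not part of Snowpark), hard-coding the version matrix from this guide:

```python
import sys

# Version matrix from this guide; these sets are illustrative constants,
# not anything shipped with Snowpark itself.
SNOWPARK_PYTHON_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}
SNOWPARK_PANDAS_VERSIONS = {(3, 9), (3, 10), (3, 11)}

def is_supported(version=sys.version_info, pandas=False):
    """Return True if (major, minor) is supported by the chosen API."""
    allowed = SNOWPARK_PANDAS_VERSIONS if pandas else SNOWPARK_PYTHON_VERSIONS
    return (version[0], version[1]) in allowed

print(is_supported())  # whether your current interpreter works for Snowpark Python
```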
Go to the cloned repository root folder.
To install the Snowpark Python API in edit/development mode, use:

```bash
python -m pip install -e ".[development, pandas]"
```

To install the Snowpark pandas API in edit/development mode, use:

```bash
python -m pip install -e ".[modin-development]"
```

The `-e` tells `pip` to install the library in edit, or development, mode.
You can use PyCharm, VS Code, or any other similar IDE. The following steps assume you use PyCharm or VS Code.
Download the newest community version of PyCharm and follow the installation instructions.
Download and install the latest version of VS Code.
Open the project and browse to the cloned git directory. Then right-click the `src` directory in PyCharm
and select "Mark Directory as" -> "Source Root". NOTE: VS Code doesn't have "Source Root", so you can skip this step if you use VS Code.
Configure the PyCharm interpreter or the VS Code interpreter to use the previously created Python virtual environment.
This section covers guidelines for developers who wish to contribute code to `Session`, `ServerConnection`, `MockServerConnection` and other related objects that are critical to the correct functioning of snowpark-python.
- If the config parameter is set once during initialization and never changed, it is safe to add the parameter to the `Session` object.
- If the config parameter can be updated by the user, and the update has side effects during compilation (i.e. `analyzer.analyze()`, `analyzer.resolve()`, etc.), add a warning at config update using `warn_session_config_update_in_multithreaded_mode`.
Once you have decided that the new component being added requires protection during concurrent access, the following can be used:
- `Session._thread_store` and `ServerConnection._thread_store` are `threading.local()` objects which can be used to store a per-thread instance of the component. The Python connector cursor object is an example of this.
- `Session._lock` and `ServerConnection._lock` are `RLock` objects which can be used to serialize access to shared resources. `Session.query_tag` is an example of this.
- `Session._package_lock` is an `RLock` object which can be used to protect `packages` and `imports` for stored procedures and user defined functions.
- `Session._plan_lock` is an `RLock` object which can be used to serialize `SnowflakePlan` and `Selectable` method calls. `SnowflakePlan.plan_state` is an example.
- `QueryHistory(session, include_thread_id=True)` can be used to log the query history with thread id.
An example PR to make auto temp table cleaner thread-safe can be found here.
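The two most common patterns above (per-thread storage and a re-entrant lock around shared state) can be sketched with the standard library alone. This is a hypothetical, simplified illustration; `SessionLike` is not a real Snowpark class:

```python
import threading

class SessionLike:
    """Illustrative stand-in for Session's concurrency patterns."""

    def __init__(self):
        # Mirrors Session._thread_store: each thread lazily gets its own
        # instance of the component (e.g. a connector cursor).
        self._thread_store = threading.local()
        # Mirrors Session._lock: an RLock serializing access to shared
        # mutable state such as the query tag.
        self._lock = threading.RLock()
        self._query_tag = None

    @property
    def cursor(self):
        # Create one instance per thread on first access.
        if not hasattr(self._thread_store, "cursor"):
            self._thread_store.cursor = object()
        return self._thread_store.cursor

    @property
    def query_tag(self):
        with self._lock:
            return self._query_tag

    @query_tag.setter
    def query_tag(self, value):
        with self._lock:
            self._query_tag = value


if __name__ == "__main__":
    session = SessionLike()
    session.query_tag = "my-tag"

    main_cursor = session.cursor
    seen = []
    t = threading.Thread(target=lambda: seen.append(session.cursor))
    t.start()
    t.join()
    # The worker thread received its own cursor instance:
    print(main_cursor is seen[0])  # False
```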
If you are an open-source developer modifying existing Snowpark APIs (such as by adding a parameter to a `DataFrame` API), or creating new Snowpark APIs in your PR, please request a review from the snowpark-ir team and add the `snowpark-ast` label. You can also raise an issue on our issue tracker with the `snowpark-ast` label and assign it to the snowpark-ir team to request a review. We will add code to support detailed logging of the usage of your modified or newly created API if relevant to your PR. After we do so, we will also update your PR description by completing the required AST support acknowledgement checkbox.
If you are an internal developer, please ensure you complete the PR checklist for AST support found in the Snowpark Python AST developer guide, before completing the AST support acknowledgement checkbox.
The README under the `tests` folder explains how to set up your environment to run tests.
If this happens to you, do not panic! Any PRs originating from a fork will fail some automated tests. This is because forks do not have access to our repository's secrets. A maintainer will manually review your changes and then kick off the rest of our testing suite. Feel free to tag @snowflakedb/snowpark-python-api or @snowflakedb/snowpark-pandas-api if you feel like we are taking too long to get to your PR.
The following tree diagram shows the high-level structure of Snowpark pandas.
```
snowflake
└── snowpark
    └── modin
        └── pandas          ← pandas API frontend layer
        └── core
            ├── dataframe   ← folder containing abstraction
            │                 for Modin frontend to DF-algebra
            ├── execution   ← additional patching for I/O
        └── plugin
            ├── _internal   ← Snowflake specific internals
            ├── io          ← Snowpark pandas IO functions
            ├── compiler    ← query compiler, Modin -> Snowpark pandas DF
            └── utils       ← util classes from Modin, logging, …
```