Kunaljubce/add decorators for functions #253
kunaljubce wants to merge 45 commits into mrpowers-io:planning-1.0-release from
Conversation
…a function and as a decorator
…d func definition of validate_schema()
…r code snippets relevant to quinn without making them visible to git
update column extension function names and desc in readme
I am looking into the pre-commit failures!
quinn/dataframe_validator.py (Outdated)

    ) -> function:
    required_schema: StructType,
    ignore_nullable: bool = False,
    _df: DataFrame = None,
Why are we using a private-variable naming convention (I mean `_df`) for a public API (I mean function arguments)?
Duh! Sorry, I meant to change this and completely forgot. Let me fix this.
Meanwhile, can we not have ruff-format as one of the pre-commit hooks? First, it's experimental and called out as such; second, it seems to reformat a whole lot of files that are not part of this PR when I run it locally.
@SemyonSinchenko Renamed _df to df_to_be_validated
@kunaljubce In the CI/CD pipeline I see that only 1 file is improperly formatted. I see your .pre-commit-config.yaml contains unstaged changes; my guess is that this is causing your issue. Why was it changed, and what does the pre-commit-config.yaml look like?
@fpgmaas The unstaged changes are because I added `- id: ruff-format` to my pre-commit-config.yaml. Here's a screen recording of my changes: pre-commit succeeding before ruff-format vs. pre-commit failing and fixing 9 files after ruff-format - https://ufile.io/792yfg0c
I could not upload the video here itself due to size restrictions.
I don't think you should edit that file manually, did you try pulling or rebasing on top of planning-1.0-release?
If I look at your branch, you are still using an outdated version of ruff, see here. That is likely causing the issue you are seeing. In order to prevent that you have to update your branch with the changes on planning-1.0-release from this repo. e.g.
git fetch upstream
git rebase -i upstream/planning-1.0-release
@kunaljubce Ah, that explains it! Understandable mistake :) Good you were able to figure it out.
* According to one of the points in issue mrpowers-io#199 by @MrPowers, this function should never have been created.
* This particular commit removes this function and its references from the quinn repo.
Deprecate and remove `exists` and `forall` functions from the codebase.
* Remove `exists` and `forall` functions from `quinn/functions.py`.
* Remove import statements for `exists` and `forall` from `quinn/__init__.py`.
* Remove tests related to `exists` and `forall` functions from `tests/test_functions.py`.
- Update deps
- Update Ruff
- Corresponding updates of pyproject
- Slightly update Makefile
- Drop support of Spark 2
- Drop support of Spark 3.1-3.2
- Minimal Python is 3.10 from now on
- Apply new ruff rules
- Apply ruff linter

On branch 202-drop-pyspark2
Changes to be committed:
  modified: .github/workflows/ci.yml
  modified: Makefile
  modified: poetry.lock
  modified: pyproject.toml
  modified: quinn/append_if_schema_identical.py
  modified: quinn/dataframe_helpers.py
  modified: quinn/keyword_finder.py
  modified: quinn/math.py
  modified: quinn/schema_helpers.py
  modified: quinn/split_columns.py
  modified: quinn/transformations.py
This commit introduces the following changes:
* Updates the `ci.yml` file by introducing a new step under the `test` job to perform tests using Spark-Connect.
* Creates a shell script that downloads and installs Spark binaries and then runs the Spark-Connect server.
* Creates a pytest module/file that tests a very simple function on Spark-Connect.
* Updates the Makefile to add a new step for the Spark-Connect tests.
As per the review comment, the recently added dependencies such as PyArrow, Pandas, etc. are optional and not required for Spark-Classic. Update the pyproject.toml to reflect that and regenerate the Poetry lock file.
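The commit message above describes marking these dependencies optional in pyproject.toml. A minimal sketch of how that is typically declared with Poetry extras (the package names come from the commit message; the version constraints and the extra's name are assumptions, not taken from the PR):

```toml
[tool.poetry.dependencies]
# Assumed sketch: optional deps, not pulled in for plain Spark-Classic installs.
pyarrow = { version = "*", optional = true }
pandas = { version = "*", optional = true }

[tool.poetry.extras]
# Hypothetical extra name; users would opt in via `pip install quinn[connect]`.
connect = ["pyarrow", "pandas"]
```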
Apply hotfix; update lock file
Proposed changes
Took another stab at #140, as an extension to #144.
Types of changes
What types of changes does your code introduce to quinn?
Put an `x` in the boxes that apply.

Further comments: Implementation details for `validate_schema`

So we are implementing a decorator factory here so that our function can be used both as a decorator as well as a callable function. In this implementation:

* The `validate_schema` function acts as both a decorator factory and a decorator. It takes `required_schema`, `ignore_nullable`, and an optional `_df` argument.
* If `_df` is None, it means the function is being used as a decorator factory, and it returns the `decorator` decorator.
* If `_df` is not None, it means the function is being called directly with a DataFrame, and it applies the decorator to `_df` immediately.

When `validate_schema` is called directly with a DataFrame, the validation logic gets executed by wrapping the DataFrame in a lambda function and immediately calling the decorator.
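The dual decorator/direct-call behaviour described above can be sketched in plain Python. This is an illustrative sketch of the pattern, not quinn's actual implementation: `FakeDF` stands in for a Spark DataFrame, and the schema check is a simple equality comparison rather than a `StructType` comparison.

```python
from functools import wraps


def validate_schema(required_schema, ignore_nullable=False, _df=None):
    """Sketch of a dual-use decorator factory / callable function."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            df = func(*args, **kwargs)
            # Illustrative check; the real quinn version compares StructTypes.
            if df.schema != required_schema:
                raise ValueError("DataFrame does not match required_schema")
            return df

        return wrapper

    if _df is None:
        # Decorator-factory path: used as @validate_schema(schema)
        return decorator
    # Direct-call path: wrap the DataFrame in a lambda and invoke immediately.
    return decorator(lambda: _df)()


# Minimal stand-in for a DataFrame, just enough to exercise the pattern.
class FakeDF:
    def __init__(self, schema):
        self.schema = schema


@validate_schema("schema_a")
def make_df():
    return FakeDF("schema_a")


decorated_result = make_df()  # decorator path
direct_result = validate_schema("schema_a", _df=FakeDF("schema_a"))  # direct path
```

Both paths return the validated DataFrame, and either raises `ValueError` when the schema does not match, which is why the single function can serve as both entry points.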