feat(pkg-py): Add a new .data module
#124
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| -from seaborn import load_dataset | ||
| from querychat import QueryChat | ||
| +from querychat.data import titanic | ||
| | ||
| -titanic = load_dataset("titanic") | ||
| +titanic = titanic() | ||
| qc = QueryChat(titanic, "titanic") | ||
| | ||
| app = qc.app() | ||

cpsievert marked this conversation as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| """ | ||
| Sample datasets for getting started with querychat. | ||
| | ||
| This module provides easy access to sample datasets that can be used with QueryChat | ||
| to quickly get started without needing to install additional dependencies. | ||
| """ | ||
| | ||
| from __future__ import annotations | ||
| | ||
| from importlib.resources import files | ||
| | ||
| import pandas as pd | ||
| | ||
| | ||
| def titanic() -> pd.DataFrame: | ||
| """ | ||
| Load the Titanic dataset. | ||
| | ||
| This dataset contains information about passengers on the Titanic, including | ||
| whether they survived, their class, age, sex, and other demographic information. | ||
| | ||
| Returns | ||
| ------- | ||
| pandas.DataFrame | ||
| A DataFrame with 891 rows and 15 columns containing Titanic passenger data. | ||
| | ||
| Examples | ||
| -------- | ||
| >>> from querychat.data import titanic | ||
| >>> from querychat import QueryChat | ||
| >>> df = titanic() | ||
| >>> qc = QueryChat(df, "titanic") | ||
| >>> app = qc.app() | ||
| | ||
| """ | ||
| # Get the path to the gzipped CSV file using importlib.resources | ||
| data_file = files("querychat.data") / "titanic.csv.gz" | ||
| return pd.read_csv(str(data_file), compression="gzip") | ||
| | ||
| | ||
| def tips() -> pd.DataFrame: | ||
| """ | ||
| Load the tips dataset. | ||
| | ||
| This dataset contains information about restaurant tips, including the total | ||
| bill, tip amount, and information about the party (sex, smoker status, day, | ||
| time, and party size). | ||
| | ||
| Returns | ||
| ------- | ||
| pandas.DataFrame | ||
| A DataFrame with 244 rows and 7 columns containing restaurant tip data. | ||
| | ||
| Examples | ||
| -------- | ||
| >>> from querychat.data import tips | ||
| >>> from querychat import QueryChat | ||
| >>> df = tips() | ||
| >>> qc = QueryChat(df, "tips") | ||
| >>> app = qc.app() | ||
| | ||
| """ | ||
| # Get the path to the gzipped CSV file using importlib.resources | ||
| data_file = files("querychat.data") / "tips.csv.gz" | ||
| return pd.read_csv(str(data_file), compression="gzip") | ||
| | ||
| | ||
| __all__ = ["tips", "titanic"] |
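
Both loaders in this module follow the same pattern: locate a gzipped CSV, then parse it. As a rough illustration of that pattern without pandas or the packaged data files, the sketch below writes a tiny gzipped CSV to a temporary directory and reads it back with the standard library. The helper name `load_gzipped_csv`, the file name `sample.csv.gz`, and the two rows are invented for the example and are not part of querychat.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def load_gzipped_csv(path) -> list:
    """Read a gzipped CSV into a list of row dicts (hypothetical helper)."""
    # "rt" opens the compressed stream in text mode; newline="" is what csv expects.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))


# Build a throwaway data file; the rows are invented for illustration.
_tmp_dir = tempfile.TemporaryDirectory()
sample_path = Path(_tmp_dir.name) / "sample.csv.gz"
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["total_bill", "tip", "size"])
    writer.writerow(["16.99", "1.01", "2"])
    writer.writerow(["10.34", "1.66", "3"])

rows = load_gzipped_csv(sample_path)
print(len(rows), rows[0]["tip"])
```

Unlike `pd.read_csv`, `csv.DictReader` leaves every value as a string, so this sketch only mirrors the decompress-then-parse step, not the type inference a DataFrame provides.
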
Binary file not shown.
Binary file not shown.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,127 @@ | ||
| """Tests for the querychat.data module.""" | ||
| | ||
| import pandas as pd | ||
| from querychat.data import tips, titanic | ||
| | ||
| | ||
| def test_titanic_returns_dataframe(): | ||
| """Test that titanic() returns a pandas DataFrame.""" | ||
| df = titanic() | ||
| assert isinstance(df, pd.DataFrame) | ||
| | ||
| | ||
| def test_titanic_has_expected_shape(): | ||
| """Test that the Titanic dataset has the expected number of rows and columns.""" | ||
| df = titanic() | ||
| assert df.shape == (891, 15), f"Expected (891, 15) but got {df.shape}" | ||
| | ||
| | ||
| def test_titanic_has_expected_columns(): | ||
| """Test that the Titanic dataset has the expected column names.""" | ||
| df = titanic() | ||
| expected_columns = [ | ||
| "survived", | ||
| "pclass", | ||
| "sex", | ||
| "age", | ||
| "sibsp", | ||
| "parch", | ||
| "fare", | ||
| "embarked", | ||
| "class", | ||
| "who", | ||
| "adult_male", | ||
| "deck", | ||
| "embark_town", | ||
| "alive", | ||
| "alone", | ||
| ] | ||
| assert list(df.columns) == expected_columns | ||
| | ||
| | ||
| def test_titanic_data_integrity(): | ||
| """Test basic data integrity of the Titanic dataset.""" | ||
| df = titanic() | ||
| | ||
| # Check that survived column has only 0 and 1 values | ||
| assert set(df["survived"].dropna().unique()) <= {0, 1} | ||
| | ||
| # Check that pclass has only 1, 2, 3 | ||
| assert set(df["pclass"].dropna().unique()) <= {1, 2, 3} | ||
| | ||
| # Check that sex has only 'male' and 'female' | ||
| assert set(df["sex"].dropna().unique()) <= {"male", "female"} | ||
| | ||
| # Check that fare is non-negative | ||
| assert (df["fare"].dropna() >= 0).all() | ||
| | ||
| | ||
| def test_titanic_creates_new_copy(): | ||
| """Test that titanic() returns a new copy each time it's called.""" | ||
| df1 = titanic() | ||
| df2 = titanic() | ||
| | ||
| # They should not be the same object | ||
| assert df1 is not df2 | ||
| | ||
| # But they should have the same data | ||
| assert df1.equals(df2) | ||
| | ||
| | ||
| def test_tips_returns_dataframe(): | ||
| """Test that tips() returns a pandas DataFrame.""" | ||
| df = tips() | ||
| assert isinstance(df, pd.DataFrame) | ||
| | ||
| | ||
| def test_tips_has_expected_shape(): | ||
| """Test that the tips dataset has the expected number of rows and columns.""" | ||
| df = tips() | ||
| assert df.shape == (244, 7), f"Expected (244, 7) but got {df.shape}" | ||
| | ||
| | ||
| def test_tips_has_expected_columns(): | ||
| """Test that the tips dataset has the expected column names.""" | ||
| df = tips() | ||
| expected_columns = [ | ||
| "total_bill", | ||
| "tip", | ||
| "sex", | ||
| "smoker", | ||
| "day", | ||
| "time", | ||
| "size", | ||
| ] | ||
| assert list(df.columns) == expected_columns | ||
| | ||
| | ||
| def test_tips_data_integrity(): | ||
| """Test basic data integrity of the tips dataset.""" | ||
| df = tips() | ||
| | ||
| # Check that total_bill is positive | ||
| assert (df["total_bill"] > 0).all() | ||
| | ||
| # Check that tip is non-negative | ||
| assert (df["tip"] >= 0).all() | ||
| | ||
| # Check that sex has only expected values | ||
| assert set(df["sex"].dropna().unique()) <= {"Male", "Female"} | ||
| | ||
| # Check that smoker has only expected values | ||
| assert set(df["smoker"].dropna().unique()) <= {"Yes", "No"} | ||
| | ||
| # Check that size is positive | ||
| assert (df["size"] > 0).all() | ||
| | ||
| | ||
| def test_tips_creates_new_copy(): | ||
| """Test that tips() returns a new copy each time it's called.""" | ||
| df1 = tips() | ||
| df2 = tips() | ||
| | ||
| # They should not be the same object | ||
| assert df1 is not df2 | ||
| | ||
| # But they should have the same data | ||
| assert df1.equals(df2) |
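
The `*_creates_new_copy` tests check that two calls give distinct objects with equal contents, which is what lets a caller mutate one result without affecting another. A small stand-in shows the same property, assuming a hypothetical loader `load_rows` that parses in-memory gzipped CSV bytes on every call rather than reading a packaged file; the sample bytes are invented.

```python
import csv
import gzip
import io

# Invented sample data standing in for a packaged .csv.gz file.
RAW = gzip.compress(b"survived,pclass\n0,3\n1,1\n")


def load_rows() -> list:
    """Parse the gzipped bytes from scratch on every call (hypothetical loader)."""
    text = gzip.decompress(RAW).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


first = load_rows()
second = load_rows()

# Distinct objects with equal contents, mirroring the assertions in the tests.
assert first is not second
assert first == second

# Mutating one result leaves the other untouched.
first[0]["survived"] = "1"
assert second[0]["survived"] == "0"
```

A loader that cached and returned a single shared object would fail the identity check and would leak mutations between callers.
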
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -60,4 +60,3 @@ def test_querychat_custom_id(sample_df): | ||
| ) | ||
| | ||
| assert qc.id == "custom_id" | ||
| | ||