Test PR on all commits #302
Conversation
* Update README.md * FIX: lint --------- Co-authored-by: VincentAuriau <auriau.vincent@gmail.com>
* ADD: small enhancements * FIX: typo * ENH/FIX: plots
* ENH: few modifications in doc * ENH: Documentation Landing page
ADD: Custom KeyErrors with FeaturesStorage ADD: ArrayStorageIndexer to follow common structure ENH: ChoiceDataset does not automatically check all IDs of FeaturesStorage; this is moved to a specific function that can be manually called FIX: ArrayStorage when directly instantiated from ndarray
* ENH: diverse improvements in the documentation * ENH: small enhancements in README
* ENH: small enhancements in README & Doc * ENH: higher TF & TFP version * ADD: few SimpleMNL tests
FIX: - ChoiceDataset with FeaturesStorage for availabilities - FeaturesStorage in the middle of other features was ignoring the last features ENH: - Faster OneHotStorage batching ADD: - Tests corresponding to fixes
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* FIX: rumnet regularization & evaluate * FIX: exact NLL now using custom re-implementation
* ENH: description of tqdm bars with losses
GH Actions automatically comments a PyTest coverage report in Pull Requests
Few improvements --------- Co-authored-by: Vincent Auriau <auriau.vincent@gmail.com>
* ADD: possibility to add val weights in .fit() * ADD: corresponding tests * ADD: validation freq parameter in model.fit
* ADD: model get-set weights * update example with early stopping
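The coverage-comment feature mentioned above can be wired up roughly as follows. This is a minimal sketch: the input name `pytest-coverage-path` is assumed from the upstream action this fork derives from and is not confirmed by this PR.

```yaml
# Hypothetical workflow excerpt: run pytest with coverage, save the report,
# and let the action post it as a PR comment. Input names are assumptions.
- name: Run tests with coverage
  run: |
    pytest --cov=choice_learn | tee pytest-coverage.txt

- name: Pytest coverage comment
  uses: VincentAuriau/pytest-coverage-comment@main
  with:
    pytest-coverage-path: ./pytest-coverage.txt
```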
```yaml
          cat pytest-coverage.txt

      - name: Pytest coverage comment
        uses: VincentAuriau/pytest-coverage-comment@main
```
Code scanning / CodeQL — Warning (Medium): Unpinned tag for a non-immutable Action in workflow
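CodeQL flags `@main` because a mutable ref can silently change the action's code between runs. The standard remediation is to pin the action to a full commit SHA; a sketch with a placeholder SHA (the real value must be looked up in the action's repository):

```yaml
# Pin to an immutable commit SHA instead of a branch name.
# <full-commit-sha> is a placeholder, not a real reference.
- name: Pytest coverage comment
  uses: VincentAuriau/pytest-coverage-comment@<full-commit-sha>
```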
```yaml
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build draft PDF
        uses: ./.github/actions/build-draft
        with:
          journal: joss
          paper-path: docs/paper/paper.md

      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: docs/paper/paper.pdf
```
Code scanning / CodeQL — Warning (Medium): Workflow does not contain permissions

Copilot Autofix (AI, about 1 month ago)
In general, fix this category of problem by explicitly setting the permissions key either at the workflow root (applies to all jobs that don’t override it) or under each job, granting only the scopes required (e.g., contents: read). This prevents the GITHUB_TOKEN from inheriting broader default permissions from the repository or organization.
For this workflow, the minimal non-breaking fix is to add a permissions block with contents: read. The natural place is at the top level of the workflow (between name: and on:), so all jobs inherit it. Alternatively, we could attach it directly under jobs.paper, but top-level keeps things simpler and still scoped to least privilege. No existing steps need to change: actions/checkout@v4 functions with contents: read, and actions/upload-artifact@v4 does not require repository write permissions because it works with artifacts, not repo contents. The only required edit is to .github/workflows/draft_paper.yml, inserting the permissions mapping.
```diff
@@ -1,5 +1,8 @@
 name: Paper draft to PDF
 
+permissions:
+  contents: read
+
 on:
   push:
     branches:
```
```yaml
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Python, choice-learn & run tests
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install from TestPyPI & run tests with installed package
        id: install
        run: |
          python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ choice-learn
          cd ../
          echo ${{ github.event.release.tag_name }}
          python choice-learn/tests/manual_run.py
          cd choice-learn
          echo "b"
          git fetch --all
          BRANCH=$(git branch --list -r "origin/release_*" | tr '*' ' ')
          echo $BRANCH
          BRANCH="${BRANCH:9}"
          echo $BRANCH
          echo "BRANCH=$BRANCH" >> $GITHUB_OUTPUT
      - name: publish to PyPI
        uses: ./.github/actions/publish
        with:
          ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PACKAGE_DIRECTORY: "./choice_learn/"
          PYTHON_VERSION: "3.9"
          PUBLISH_REGISTRY_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
          PUBLISH_REGISTRY_USERNAME: ${{ secrets.PYPI_USERNAME }}
          UPDATE_CODE_VERSION: false
          BRANCH: ${{ steps.install.outputs.BRANCH }}
```
Code scanning / CodeQL — Warning (Medium): Workflow does not contain permissions

Copilot Autofix (AI, about 1 month ago)
In general, to fix this issue you explicitly declare a permissions block at the workflow or job level, tightening the GITHUB_TOKEN to the minimal scopes needed. For most read-only CI/test jobs, permissions: contents: read is sufficient. For jobs that publish releases, create tags, or modify pull requests, you selectively grant write on only the specific scopes required (e.g., contents: write, packages: write, or pull-requests: write).
For this specific workflow in .github/workflows/release_pypi.yaml, the job checks out code, installs from TestPyPI, runs tests, then calls a local publish composite action with ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}. The composite action is responsible for publishing to PyPI and, quite possibly, for interacting with the GitHub repository (e.g., pushing tags or commits). To avoid breaking existing functionality, we should not assume it needs only read access. The safest non-breaking change, while still following the recommendation, is to introduce an explicit permissions block that matches the current effective behavior. A common choice that mirrors legacy defaults is permissions: contents: write, which allows repository content modifications while remaining explicit. Since CodeQL’s suggested “minimal starting point” is contents: read, we can start from that and elevate just contents to write to preserve any potential write operations in the composite action.
Concretely, add a permissions block at the top level of the workflow (so it applies to all jobs), between the `on:` block and the `jobs:` block:

```yaml
permissions:
  contents: write
```

No additional imports or dependencies are needed; this is a pure YAML configuration change within .github/workflows/release_pypi.yaml.
```diff
@@ -7,6 +7,9 @@
       - completed
   workflow_dispatch:
 
+permissions:
+  contents: write
+
 jobs:
   test-and-publish:
     runs-on: ubuntu-latest
```
```yaml
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Publish choice-learn on TestPyPI
        uses: ./.github/actions/publish
        with:
          ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PACKAGE_DIRECTORY: "./choice_learn/"
          PYTHON_VERSION: "3.9"
          PUBLISH_REGISTRY_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
          PUBLISH_REGISTRY_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
          PUBLISH_REGISTRY: "https://test.pypi.org/legacy/"
          UPDATE_CODE_VERSION: true
          PUSH_BRANCH: release_${{ github.event.release.tag_name }}
```
Code scanning / CodeQL — Warning (Medium): Workflow does not contain permissions

Copilot Autofix (AI, about 1 month ago)
To fix this, explicitly declare GITHUB_TOKEN permissions in the workflow. At a minimum, add a permissions: block at the top level (alongside name: and on:) to set contents: read as the default for all jobs, following the principle of least privilege. Then, since this release workflow appears to push a branch (PUSH_BRANCH is provided to the publish action), the job that performs the publish should be given contents: write so it can push commits/tags as intended.
Concretely:

- Edit `.github/workflows/release_test_pypi.yaml`.
- Add a top-level `permissions:` section after the `name:` line that sets `contents: read`. This both satisfies CodeQL's recommendation and documents the default minimal access.
- Inside the `publish-service-client-package` job, add a `permissions:` section that overrides the default and grants `contents: write`, which is likely required for pushing the `release_${{ github.event.release.tag_name }}` branch. Keep all existing steps and behavior unchanged.

No additional imports or external dependencies are required; only YAML changes to the workflow file.
```diff
@@ -1,4 +1,6 @@
 name: Build and publish choice-learn on TestPyPI
+permissions:
+  contents: read
 
 on:
   release:
@@ -8,6 +10,8 @@
 jobs:
   publish-service-client-package:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write
     steps:
       - uses: actions/checkout@v4
       - name: Publish choice-learn on TestPyPI
```
Summary of Changes

Hello @VincentAuriau, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request represents a major release, transforming a private choice modeling codebase into a public, feature-rich Python library. It consolidates extensive work into a well-structured project, offering a diverse set of discrete choice models, standardized data handling, and robust development tooling. The changes aim to provide researchers and practitioners with a powerful and user-friendly platform for formulating, estimating, and deploying choice models, complete with comprehensive documentation and examples.
Code Review
This pull request introduces a substantial amount of new functionality, establishing the core of the choice-learn library. It includes various choice models, data handling utilities, dataset loaders, and CI/CD configurations. The overall structure is well-thought-out, with a clear separation of concerns. However, there are several areas that need attention, particularly regarding CI/CD configuration correctness, performance of data processing code, and potential bugs in data handling. I've identified some critical issues in the GitHub Actions and pre-commit configurations that need to be addressed. Additionally, there are opportunities to significantly improve the performance of data preprocessing and negative sampling routines.
```yaml
        exclude: ^(.svn|CVS|.bzr|.hg|.git|__pycache__|.tox|.ipynb_checkpoints|assets|tests/assets/|venv/|.venv/)
        args: ["-x", "tests", --recursive, choice_learn]

    exclude: ^(.svn|CVS|.bzr|.hg|.git|__pycache__|.tox|.ipynb_checkpoints|assets|tests/assets/|venv/|.venv/)
```
The exclude key is incorrectly indented. It should be a top-level key in the .pre-commit-config.yaml file, at the same level as repos and ci, for it to apply globally to all hooks. As it is now, it's part of the bandit repository configuration, which is not the correct syntax and will not work as intended. Please remove this line and add it at the root level of the file.
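For illustration, a root-level `exclude` would look roughly like this. This is a sketch of the generic pre-commit layout: the repo URL and `rev` shown are illustrative, not the project's actual configuration.

```yaml
# .pre-commit-config.yaml (sketch): `exclude` at the root applies to all hooks.
exclude: ^(.svn|CVS|.bzr|.hg|.git|__pycache__|.tox|.ipynb_checkpoints|assets|tests/assets/|venv/|.venv/)

repos:
  - repo: https://github.com/PyCQA/bandit  # illustrative
    rev: 1.7.5  # illustrative
    hooks:
      - id: bandit
        args: ["-x", "tests", --recursive, choice_learn]
```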
```python
            Shape must be (batch_size,)
        price_batch: np.ndarray
            Batch of prices (integers) for each purchased item
            Shape must be (batch_size,)
        available_item_batch: np.ndarray
            Batch of availability matrices (indicating the availability (1) or not (0)
            of the products) (arrays) for each purchased item
            Shape must be (batch_size, n_items)

        Returns
        -------
```
When concatenating datasets, the assortment index for trips from the other dataset needs to be updated. Since you are appending other.available_items to self.available_items, the indices for assortments from other will be shifted. You should iterate through other.trips and add len(self.available_items) to each trip's assortment index before concatenating the trip lists.
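The re-indexing this comment asks for can be sketched as follows. The names (`self_avail`, `other_avail`, per-trip assortment indices) are illustrative, not the library's actual API.

```python
def concat_availabilities(self_avail, other_avail, other_assortment_ids):
    """Merge two availability lists and shift the second dataset's
    assortment indices by the number of matrices already stored.

    Illustrative sketch: availabilities are plain lists of matrices and
    each trip carries an integer index into that list.
    """
    offset = len(self_avail)
    merged = list(self_avail) + list(other_avail)
    # Each trip from `other` now points into the merged list.
    shifted = [idx + offset for idx in other_assortment_ids]
    return merged, shifted
```

For example, merging a dataset holding 2 assortments with one holding 1 maps the second dataset's assortment 0 to index 2 in the merged list.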
```python
negative_samples = tf.reshape(
    tf.transpose(
        tf.reshape(
            tf.concat(
                [
                    self.get_negative_samples(
                        available_items=available_item_batch[idx],
                        purchased_items=basket_batch[idx],
                        future_purchases=future_batch[idx],
                        next_item=item_batch[idx],
                        n_samples=self.n_negative_samples,
                    )
                    for idx in range(batch_size)
                ],
```
The use of a list comprehension to generate negative samples forces eager execution for each element in the batch. This prevents this part of the code from being compiled into an efficient TensorFlow graph and can cause significant performance bottlenecks, especially with large batch sizes. To ensure graph compatibility and improve performance, you should use tf.map_fn.
```python
# Example of how to use tf.map_fn
negative_samples = tf.map_fn(
    fn=lambda i: self.get_negative_samples(
        available_items=available_item_batch[i],
        purchased_items=basket_batch[i],
        future_purchases=future_batch[i],
        next_item=item_batch[i],
        n_samples=self.n_negative_samples,
    ),
    elems=tf.range(batch_size, dtype=tf.int64),
    fn_output_signature=tf.TensorSpec(shape=(self.n_negative_samples,), dtype=tf.int32),
)
```

```python
negative_samples = tf.stack(
    [
        self.get_negative_samples(
            available_items=available_item_batch[idx],
            purchased_items=basket_batch[idx],
            next_item=item_batch[idx],
            n_samples=self.n_negative_samples,
        )
        for idx in range(batch_size)
    ],
    axis=0,
```
The use of a list comprehension to generate negative samples forces eager execution for each element in the batch. This prevents this part of the code from being compiled into an efficient TensorFlow graph and can cause significant performance bottlenecks, especially with large batch sizes. To ensure graph compatibility and improve performance, you should use tf.map_fn.
```python
# Example of how to use tf.map_fn
negative_samples = tf.map_fn(
    fn=lambda i: self.get_negative_samples(
        available_items=available_item_batch[i],
        purchased_items=basket_batch[i],
        next_item=item_batch[i],
        n_samples=self.n_negative_samples,
    ),
    elems=tf.range(tf.shape(item_batch)[0], dtype=tf.int64),
    fn_output_signature=tf.TensorSpec(shape=(self.n_negative_samples,), dtype=tf.int32),
)
```

```python
# 2. Approximate the price of the items not in the trip with
# the price of the same item in the previous or next trip
for item_id in range(n_items):
    if prices[item_id] == -1:
        found_price = False
        step = 1
        while not found_price:
            # Proceed step by step to find the price of the item
            # in the k-th previous or the k-th next trip
            prev_session_id, prev_session_data = None, None
            next_session_id, next_session_data = None, None

            if trip_idx - step >= 0:
                prev_session_id, prev_session_data = grouped_sessions[trip_idx - step]
            if trip_idx + step < len(grouped_sessions):
                next_session_id, next_session_data = grouped_sessions[trip_idx + step]

            if (
                prev_session_data is not None
                and item_id in prev_session_data["item_id"].tolist()
            ):
                # If item_id is in the previous trip, take the
                # price of the item in the previous trip
                if isinstance(
                    dataset.set_index(["item_id", "session_id"]).loc[
                        (item_id, prev_session_id)
                    ]["price"],
                    pd.Series,
                ):
                    # Then the price is a Pandas series (same value repeated)
                    prices[item_id] = (
                        dataset.set_index(["item_id", "session_id"])
                        .loc[(item_id, prev_session_id)]["price"]
                        .to_numpy()[0]
                    )
                else:
                    # Then the price is a scalar
                    prices[item_id] = dataset.set_index(["item_id", "session_id"]).loc[
                        (item_id, prev_session_id)
                    ]["price"]
                found_price = True

            elif (
                next_session_data is not None
                and item_id in next_session_data["item_id"].tolist()
            ):
                # If item_id is in the next session, take the
                # price of the item in the next trip
                if isinstance(
                    dataset.set_index(["item_id", "session_id"]).loc[
                        (item_id, next_session_id)
                    ]["price"],
                    pd.Series,
                ):
                    # Then the price is a Pandas series (same value repeated)
                    prices[item_id] = (
                        dataset.set_index(["item_id", "session_id"])
                        .loc[(item_id, next_session_id)]["price"]
                        .to_numpy()[0]
                    )
                else:
                    # Then the price is a scalar
                    prices[item_id] = dataset.set_index(["item_id", "session_id"]).loc[
                        (item_id, next_session_id)
                    ]["price"]
                found_price = True

            if trip_idx - step < 0 and trip_idx + step >= len(grouped_sessions):
                # Then we have checked all possible previous and next trips
                break

            step += 1

        if not found_price:
            prices[item_id] = 1  # Or another default value > 0
```
The price imputation logic involves iterating through sessions and items, which is highly inefficient and will be very slow for large datasets. This can be significantly optimized by using vectorized pandas operations. You can group by item_id, sort by session (assuming session IDs are sequential in time), and then use ffill() and bfill() to propagate prices forward and backward. This will be orders of magnitude faster.
Example:
```python
# Assuming 'dataset' is sorted by session_id
dataset['price'] = dataset.groupby('item_id')['price'].transform(lambda x: x.ffill().bfill())
# Handle remaining NaNs if any (e.g., items with no price at all)
dataset['price'].fillna(1, inplace=True)
```

```shell
vname=${vname:1}
echo $vname
sed -i -r 's/__version__ *= *".*"/__version__ = "'"$vname"'"/g' ${{ inputs.PACKAGE_DIRECTORY }}__init__.py
sed -i '0,/version =.*/s//version = "'"$vname"'"/' ./pyproject.toml
```
```yaml
---
name: Bug report
about: Report a bug you have encountered
title: "[BUG]"
labels: bug
assignees: ''

---
```
This file contains two frontmatter sections (the parts enclosed in ---). This is invalid for GitHub issue templates and will likely cause parsing issues. The second frontmatter block seems more complete. Please remove the first, redundant block. The same issue exists in feature_request.md and question.md.
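A quick way to spot this problem in a template file — a small illustrative helper, not part of the repository:

```python
def count_frontmatter_blocks(text: str) -> int:
    """Count paired '---' delimiter lines in a GitHub issue template.

    A valid template has exactly one frontmatter block, i.e. one pair
    of '---' delimiters. (Sketch: ignores '---' used as a horizontal
    rule in the body.)
    """
    delimiters = [
        i for i, line in enumerate(text.splitlines()) if line.strip() == "---"
    ]
    # Every frontmatter block consumes an opening and a closing delimiter.
    return len(delimiters) // 2

broken = "---\nname: Bug report\n---\n---\nname: Bug report (bis)\n---\nBody"
print(count_frontmatter_blocks(broken))  # 2 -> the template is invalid
```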
```diff
- ---
- name: Bug report
- about: Report a bug you have encountered
- title: "[BUG]"
- labels: bug
- assignees: ''
- ---
  ---
  name: 🐛 Bug report
  about: If something isn't working 🔧
  title: ''
  labels: bug
  assignees:
  ---
```
```yaml
---
name: Question
about: Any question about Choice-Learn?
title: ''
labels: question
assignees: ''

---
```
This file contains two frontmatter sections (the parts enclosed in ---). This is invalid for GitHub issue templates and will likely cause parsing issues. The second frontmatter block seems more complete. Please remove the first, redundant block.
```diff
- ---
- name: Question
- about: Any question about Choice-Learn?
- title: ''
- labels: question
- assignees: ''
- ---
  ---
  name: ❓ Question
  about: Any question about Choice-Learn?
  title: ''
  labels: question
  assignees:
  ---
```
```yaml
---
name: Feature request
about: Suggest an idea to improve Choice-Learn
title: "[ADD]"
labels: new feature
assignees: ''

---
```
This file contains two frontmatter sections (the parts enclosed in ---). This is invalid for GitHub issue templates and will likely cause parsing issues. The second frontmatter block seems more complete. Please remove the first, redundant block.
```diff
- ---
- name: Feature request
- about: Suggest an idea to improve Choice-Learn
- title: "[ADD]"
- labels: new feature
- assignees: ''
- ---
  ---
  name: 🚀 Feature request
  about: Suggest an idea for this project 🏖
  title: ''
  labels: enhancement
  assignees:
  ---
```
```python
"""Different classes to optimize RAM usage with repeated features over time."""

import numpy as np

from choice_learn.data.indexer import OneHotStoreIndexer, StoreIndexer


class Store:
    """Class to keep OneHotStore and FeaturesStore with same parent."""

    def __init__(self, indexes=None, values=None, sequence=None, name=None, indexer=StoreIndexer):
        """Build the store.

        Parameters
        ----------
        indexes : array_like or None
            list of indexes of features to store. If None is given, indexes are created from
            apparition order of values
        values : array_like
            list of values of features to store
        sequence : array_like
            sequence of apparitions of the features
        name: string, optional
            name of the features store -- not used at the moment
        """
        if indexes is None:
            indexes = list(range(len(values)))
        self.store = {k: v for (k, v) in zip(indexes, values)}
        self.sequence = np.array(sequence)
        self.name = name

        if sequence is not None and values is not None:
            try:
                width = len(values[0])
            except TypeError:
                width = 1
            self.shape = (len(sequence), width)

        self.indexer = indexer(self)

    def _get_store_element(self, index):
        """Getter method over self.sequence.

        Returns the features stored at index index. Compared to __getitem__, it does not take
        the index-th element of sequence but the index-th element of the store.

        Parameters
        ----------
        index : (int, list, slice)
            index argument of the feature

        Returns
        -------
        array_like
            features corresponding to the index index in self.store
        """
        if isinstance(index, list):
            return [self.store[i] for i in index]
        # else:
        return self.store[index]

    def __len__(self):
        """Return the length of the sequence of apparition of the features."""
        return len(self.sequence)

    @property
    def batch(self):
        """Indexing attribute."""
        return self.indexer


class FeaturesStore(Store):
    """Base class to store features and a sequence of apparitions.

    Mainly useful when features are repeated frequently over the sequence.
    An example would be to store the features of a customer (supposing that the same customers
    come several times over the work sequence) and to save which customer is concerned for each
    choice.

    Attributes
    ----------
    store : dict
        Dictionary storing features that can be called from indexes: {index: features}
    shape : tuple
        shape of the features store: (sequence_length, features_number)
    sequence : array_like
        List of elements of indexes representing the sequence of apparitions of the features
    name: string, optional
        name of the features store -- not used at the moment
    dtype: type
        type of the features
    """

    @classmethod
    def from_dict(cls, values_dict, sequence):
        """Instantiate the FeaturesStore from a dictionary of values.

        Parameters
        ----------
        values_dict : dict
            dictionary of values to store, {index: value}
        sequence : array_like
            sequence of apparitions of the features

        Returns
        -------
        FeaturesStore created from the values in the dictionary
        """
        # Check uniform shape of values
        return cls(
            indexes=list(values_dict.keys()), values=list(values_dict.values()), sequence=sequence
        )

    @classmethod
    def from_list(cls, values_list, sequence):
        """Instantiate the FeaturesStore from a list of values.

        Creates indexes for each value.

        Parameters
        ----------
        values_list : list
            List of values to store
        sequence : array_like
            sequence of apparitions of the features

        Returns
        -------
        FeaturesStore
        """
        # Check uniform shape of list
        # Useful ? To rethink...
        return cls(indexes=list(range(len(values_list))), values=values_list, sequence=sequence)

    def __getitem__(self, sequence_index):
        """Subset self with sequence_index.

        Parameters
        ----------
        sequence_index : (int, list, slice)
            index position of the sequence

        Returns
        -------
        array_like
            features corresponding to the sequence_index-th position of sequence
        """
        if isinstance(sequence_index, int):
            sequence_index = [sequence_index]
        new_sequence = self.sequence[sequence_index]
        store = {}
        for k, v in self.store.items():
            if k in new_sequence:
                store[k] = v
            else:
                print(f"Key {k} of store with value {v} not in sequence anymore")

        return FeaturesStore.from_dict(store, new_sequence)

    def astype(self, dtype):
        """Change the dtype of the features.

        The type of the features should implement the astype method.
        Typically, should work like np.ndarrays.

        Parameters
        ----------
        dtype : str or type
            type to set the features as
        """
        for k, v in self.store.items():
            self.store[k] = v.astype(dtype)


class OneHotStore(Store):
    """Specific FeaturesStore for one-hot features storage.

    Inherits from Store.
    For example, can be used to store a one-hot representation of the days of the week.
    Has the same attributes as FeaturesStore, only differs with some one-hot-optimized methods.
    """

    def __init__(
        self,
        indexes=None,
        values=None,
        sequence=None,
        name=None,
        dtype=np.float32,
    ):
        """Build the OneHot features store.

        Parameters
        ----------
        indexes : array_like or None
            list of indexes of features to store. If None is given, indexes are created from
            apparition order of values
        values : array_like or None
            list of values of features to store that must be one-hot. If None given, they are
            created from order of apparition in sequence
        sequence : array_like
            sequence of apparitions of the features
        name: string, optional
            name of the features store -- not used at the moment
        """
        self.name = name
        self.sequence = np.array(sequence)

        if values is None:
            self = self.from_sequence(sequence)
        else:
            self.store = {k: v for (k, v) in zip(indexes, values)}
            self.shape = (len(sequence), np.max(values) + 1)

        self.dtype = dtype
        self.indexer = OneHotStoreIndexer(self)

    @classmethod
    def from_sequence(cls, sequence):
        """Create a OneHotStore from a sequence of apparitions.

        One-hot vectors are created from the order of apparition in the sequence: feature vectors
        created have a length of the number of different values in the sequence and the 1 is
        positioned in order of first apparitions in the sequence.

        Parameters
        ----------
        sequence : array-like
            Sequence of apparitions of values, or indexes. Will be used to index self.store

        Returns
        -------
        OneHotStore
            Created from the sequence.
        """
        all_indexes = np.unique(sequence)
        values = np.arange(len(all_indexes))
        return cls(indexes=all_indexes, values=values, sequence=sequence)

    def __getitem__(self, sequence_index):
        """Get an element at the sequence_index-th position of self.sequence.

        Parameters
        ----------
        sequence_index : (int, list, slice)
            index from sequence of element to get

        Returns
        -------
        np.ndarray
            OneHot features corresponding to the sequence_index-th position of sequence
        """
        if isinstance(sequence_index, int):
            sequence_index = [sequence_index]
        new_sequence = self.sequence[sequence_index]
        store = {}
        for k, v in self.store.items():
            if k in new_sequence:
                store[k] = v
            else:
                print(f"Key {k} of store with value {v} not in sequence anymore")

        return OneHotStore(
            indexes=list(store.keys()), values=list(store.values()), sequence=new_sequence
        )

    def astype(self, dtype):
        """Change (mainly int or float) type of returned OneHot features vectors.

        Parameters
        ----------
        dtype : type
            Type to set the features as
        """
        self.dtype = dtype
```
This file seems to be a duplicate or an alternative implementation of the functionality in choice_learn/data/storage.py. The ChoiceDataset class uses storage.py. To avoid code duplication and potential confusion for future maintainers, it would be best to consolidate these into a single implementation. If store.py is no longer used, it should be removed.
Coverage reports for Python 3.10, 3.11, and 3.12 were posted by GH Actions (tables omitted).
No description provided.