Skip to content

Conversation

@Eshaan-byte
Copy link

Implements #765 by adding two new methods to manage cache cleanup:

  • clear_cache(): Clears entire dataset cache including global event dataframe and all task caches
  • clear_task_cache(task=None): Clears only the specified task's cache while preserving global event cache and other task caches

Both methods handle non-existent caches gracefully and provide comprehensive logging.

Implements sunlabuiuc#765 by adding two new methods to manage cache cleanup:

- clear_cache(): Clears entire dataset cache including global event
  dataframe and all task caches
- clear_task_cache(task=None): Clears only the specified task's cache
  while preserving global event cache and other task caches

Both methods handle non-existent caches gracefully and provide
comprehensive logging.
Copy link
Collaborator

@EricSchrock EricSchrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for these new methods to tests/core/test_caching.py, both to prove it works as expected and to act as example code?

- Extract task cache directory path generation into _get_task_cache_dir()
  helper method for consistency between set_task() and clear_task_cache()
- Update clear_task_cache() to use the helper method
- Clarify in docstring that clear_task_cache() only clears default cache
  location, not custom cache_dir paths
- Add 5 comprehensive tests to tests/core/test_caching.py:
  * test_clear_cache_removes_all_caches
  * test_clear_cache_handles_nonexistent_cache
  * test_clear_task_cache_removes_only_specified_task
  * test_clear_task_cache_handles_nonexistent_cache
  * test_get_task_cache_dir_consistency

Addresses review feedback from @Logiquo and @EricSchrock on PR sunlabuiuc#770
@Logiquo
Copy link
Collaborator

Logiquo commented Jan 6, 2026

CI has failed.

Store cache paths as strings before calling clear methods to prevent
the cache_dir property from recreating directories when accessed after
clearing. This ensures the tests correctly verify that caches are removed.

Fixes CI test failures in test_clear_cache_removes_all_caches and
test_clear_task_cache_removes_only_specified_task.
@Logiquo Logiquo added core Core functionality (Patient API, BaseDataset, event stream format, etc.) component: dataset Contribute a new dataset to PyHealth labels Jan 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: dataset Contribute a new dataset to PyHealth core Core functionality (Patient API, BaseDataset, event stream format, etc.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants