Commit 710e3e2

Detector datasets splits (#103)
* Sort categories dict in loaded dataset by key
* Add types to licenses in COCO schema
* Add image width and height as required in COCO file (for consistency with schema)
* Convert to torch datasets
* Add examples infrastructure and proto example
* Add subsetsum notebook
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* Dataset split with subset sum
* Add time estimate
* Draft detectors datasets module
* Split by video-pair
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* Update .gitignore for examples
* Delete execution times file. Do not mock torch in docs build
* Add torch_dataset_to_annotations_dataset (WIP)
* Rename example
* Polish annotations as torch dataset example
* Select top 2 categories. Make sharp bits more explicit
* Simplify example. Add bbox conversion using torch utils
* Add note to example
* Use Counter rather than itertools (itertools counts consecutive instances). Use tolerance in number of samples. Use more intuitive variable names. Sort by count if not shuffled
* Add dataset splits example (WIP)
* Rename approximate_subset_sum to _approximate_subset_sum
* Define examples order in doc config
* Sort examples by filename
* Update example title
* Add loguru dependency and clean up Sphinx gallery configuration
* Improved approximate subset sum (WIP)
* Extend and filter list in one step
* Update example
* Refactor dataset splitting logic to handle empty subsets and maintain input order in return values.
* Updates to example
* Reduce example
* Move SubsetDict type definition
* Remove torch conversion bits. Rename examples and set order in docs config.
* Add some tests (WIP)
* More tests (WIP)
* Add docstring. Fix group ids as int
* Simplify casting group IDs as int
* Update docstring example
* Rename module. Update docstrings
* Add check and warning to random split. Return subsets in order of fractions. Rename input parameter
* Rename ds input parameter in saving function
* Expand tests
* Add test for warning
* Add pytest-loguru to development dependencies
* Review and small edits to coco_bbox_ethology_and_movement.py
* Edit example intro
* Add sklearn groupkfold version and preliminary tests
* Small edits to COCO example
* Refactor dataset splitting functions and update tests to use new names. The functions `split_dataset_group_by_sklearn` and `split_dataset_group_by` have been renamed to `_split_dataset_group_by_kfold` and `_split_dataset_group_by_apss`, respectively. Corresponding test functions have also been updated to reflect these changes, and new tests for k-fold splitting with seed functionality have been added.
* Add docstring with examples
* Add wrapper function to split group by
* Fix tests
* Add test to check method dispatch
* Add test to check auto delegates correctly
* Add test for unknown method
* Fix docstring example
* Small edits to API
* Add logger info for method and test
* Fix movement link in docs
* Review example
* Fix movement link
* Fix movement link

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
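The commit message describes splitting a dataset by group (for example by video) so that each group falls entirely into one subset, with the subset sizes approximating requested fractions. The sketch below illustrates that idea only. It uses a simple greedy heuristic as a stand-in, not the commit's `_approximate_subset_sum` or `_split_dataset_group_by_apss`, whose implementations are not shown on this page. The function name `split_by_group` and the sample data are hypothetical.

```python
from collections import Counter


def split_by_group(group_ids, fraction):
    """Greedy sketch (hypothetical helper): assign whole groups to a subset
    until roughly `fraction` of the samples is reached, never exceeding it."""
    # Counter tallies every occurrence of a group ID, whether or not the
    # occurrences are consecutive in the input.
    counts = Counter(group_ids)
    target = fraction * len(group_ids)

    chosen, total = set(), 0
    # Visit the largest groups first; ties are broken by group ID.
    for group, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if total + n <= target:
            chosen.add(group)
            total += n

    # Return sample indices in their original input order.
    subset = [i for i, g in enumerate(group_ids) if g in chosen]
    rest = [i for i, g in enumerate(group_ids) if g not in chosen]
    return subset, rest


# Hypothetical data: one video ID per annotated image.
video_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
subset_idx, rest_idx = split_by_group(video_ids, fraction=0.3)
```

Because groups are indivisible, the achieved fraction can fall short of the requested one. That is presumably why the commit mentions a tolerance on the number of samples.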
1 parent 67ff878 commit 710e3e2

10 files changed

+1924
-35
lines changed

MANIFEST.in

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ recursive-exclude * *.py[co]
 recursive-exclude docs *
 recursive-exclude tests *
 recursive-exclude examples *
-recursive-include docs *.md *.rst *.py
+recursive-include docs *

 # Include json schemas
 recursive-include ethology/io/annotations/json_schemas/schemas *.json

docs/source/conf.py

Lines changed: 12 additions & 3 deletions
@@ -1,13 +1,14 @@
-# Configuration file for the Sphinx documentation builder.
 """Sphinx configuration for ethology documentation."""

 import os
 import sys
 from importlib.metadata import version as get_version

+from sphinx_gallery import sorting
+
 # Used when building API docs, put the dependencies
 # of any class you are documenting here
-autodoc_mock_imports: list[str] = ["cv2", "torch"]
+autodoc_mock_imports: list[str] = ["cv2"]

 # Add the module path to sys.path here.
 # If the directory is relative to the documentation root,
@@ -179,7 +180,8 @@
     "matplotlib": ("https://matplotlib.org/stable/", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
     "pandera": ("https://pandera.readthedocs.io/en/stable/", None),
-    "movement": ("https://movement.neuroinformatics.dev/", None),
+    "movement": ("https://movement.neuroinformatics.dev/latest/", None),
+    "sklearn": ("https://scikit-learn.org/stable/", None),
 }


@@ -207,6 +209,13 @@

 sphinx_gallery_conf = {
     "examples_dirs": ["../../examples"],
+    "within_subsection_order": sorting.ExplicitOrder(
+        [
+            "coco_bbox_ethology_and_movement.py",
+            "approximate_subset_sum_split.py",
+            "*",
+        ]
+    ),
     "filename_pattern": "/*.py",  # which files to execute before inclusion
     "gallery_dirs": ["examples"],  # output directory
     "run_stale_examples": True,  # re-run examples on each build

ethology/__init__.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,14 @@
11
from importlib.metadata import PackageNotFoundError, version
2-
32
import xarray as xr
3+
from pathlib import Path
44

55
# Set xarray attributes collapsed by default
66
xr.set_options(display_expand_attrs=False)
77

8+
# Set cache directory for ethology package
9+
ETHOLOGY_CACHE_DIR = Path.home() / ".ethology"
10+
ETHOLOGY_CACHE_DIR.mkdir(parents=True, exist_ok=True)
11+
812
try:
913
__version__ = version("ethology")
1014
except PackageNotFoundError:
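
The hunk above creates the cache directory as a side effect of importing the package. The sketch below shows the same `mkdir(parents=True, exist_ok=True)` call wrapped in a function, run against a temporary directory to show that repeated calls are harmless. The helper name `ensure_cache_dir` is hypothetical and is not part of ethology. It is only meant to show how the call behaves, and how directory creation could be deferred until first use if import-time side effects were a concern.

```python
import tempfile
from pathlib import Path


def ensure_cache_dir(base: Path) -> Path:
    """Hypothetical helper: create `<base>/.ethology` if missing, return it."""
    cache_dir = base / ".ethology"
    # parents=True creates missing parent directories;
    # exist_ok=True makes the call a no-op if the directory already exists.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


# A temporary directory stands in for Path.home() so the sketch does not
# touch the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_cache_dir(Path(tmp))
    second = ensure_cache_dir(Path(tmp))  # second call raises no error
    created = first.is_dir()
    same_path = first == second
```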

ethology/datasets/__init__.py

Whitespace-only changes.
