Labels
datasets, enhancement, good first issue
Description
Describe the feature or idea you want to propose
Include loaders for the MONSTER datasets in the datasets module.
The only downside is that we would have to add Hugging Face (huggingface_hub) as an optional dependency. I'm not sure whether there are other channels to load the datasets from that would avoid another dependency.
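To make the optional-dependency concern concrete, here is a minimal sketch of a lazy import guard, so that the package would only be needed when a MONSTER loader is actually called. The helper name `require_optional` and the error wording are hypothetical, and this is a generic pattern, not aeon's own soft-dependency mechanism.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency, or raise an error saying how to get it.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"'{module_name}' is needed for this loader but is not installed. "
            f"Install it with: pip install {install_hint}"
        ) from err
```

A loader could call `require_optional("huggingface_hub", "huggingface_hub")` as its first line, so importing the datasets module itself would not require the extra package.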
Describe your proposed solution
Code to load the datasets from Hugging Face:
import logging

import numpy as np
from aeon.utils.numba.general import z_normalise_series_3d
from huggingface_hub import hf_hub_download

logger = logging.getLogger(__name__)

univariate_monster_datasets = [
    "CornellWhaleChallenge",
    "AudioMNIST",
    "WhaleSounds",
    "Pedestrian",
    "FruitFlies",
    "AudioMNIST-DS",
    "Traffic",
    "LakeIce",
    "MosquitoSound",
    "InsectSound",
]


def load_monster(dataset_name, fold, normalize=True):
    """Return (X_train, y_train, X_test, y_test) for one fold of a dataset."""
    repo_id = f"monster-monash/{dataset_name}"

    # Download data
    data_path = hf_hub_download(
        repo_id=repo_id, filename=f"{dataset_name}_X.npy", repo_type="dataset"
    )
    X = np.load(data_path, mmap_mode="r")  # (n_samples, n_channels, length)
    if normalize:
        X = z_normalise_series_3d(X)

    # Download labels: try "_Y.npy" first, then fall back to "_y.npy"
    label_filename = f"{dataset_name}_Y.npy"
    try:
        label_path = hf_hub_download(
            repo_id=repo_id, filename=label_filename, repo_type="dataset"
        )
    except Exception:  # avoid a bare except, which also traps KeyboardInterrupt
        label_filename = f"{dataset_name}_y.npy"
        label_path = hf_hub_download(
            repo_id=repo_id, filename=label_filename, repo_type="dataset"
        )
    y = np.load(label_path)

    # Load test indices for the requested fold
    try:
        test_index_path = hf_hub_download(
            repo_id=repo_id,
            filename=f"test_indices_fold_{fold}.txt",
            repo_type="dataset",
        )
        test_index = np.loadtxt(test_index_path, dtype=int)
    except Exception as e:
        logger.error(f"Failed to load test indices: {e}")
        raise

    # Boolean mask: True marks test cases, everything else is training data
    test_bool_index = np.zeros(len(y), dtype=bool)
    test_bool_index[test_index] = True
    return (
        X[~test_bool_index],
        y[~test_bool_index],
        X[test_bool_index],
        y[test_bool_index],
    )

Describe alternatives you've considered, if relevant
No response
Additional context
No response