Description
I've been trying out various cornac models and had some success. However, running the EASE model on a training set with ~600k ratings never succeeds. Memory consumption starts at around 500MB, then quickly grows to ~60GB until, at some point, the process is killed.
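For reference, the peak memory of the process can be checked with the standard library alone. This is a quick sketch for confirming the growth described above (note that `ru_maxrss` is reported in bytes on macOS but in kilobytes on Linux):

```python
import platform
import resource

# Peak resident set size of the current process so far.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# macOS reports ru_maxrss in bytes; Linux reports it in kilobytes.
scale = 1 if platform.system() == "Darwin" else 1024
print(f"peak RSS: {peak * scale / 1e9:.2f} GB")
```

Calling this periodically (or right before the process would be killed) makes it easy to see how quickly the footprint climbs past physical memory.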

In which platform does it happen?
macOS 15.2 running on an M1 Pro with 16 GB of memory.
How do we replicate the issue?
Minimal example:
import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import EASE
from cornac.metrics import NDCG
import pandas as pd
path = "training_data/training_data_ratings_20241218_224839.parquet.snappy"
df_original = pd.read_parquet(path)
print("Loaded data")
# Convert dataframe to list of tuples (user_id, item_id, rating)
data = df_original[["userID", "itemID", "rating"]].values.tolist()
rs = RatioSplit(data, test_size=0.15, val_size=0.1, rating_threshold=3.0)
print(f"{len(data)} ratings: {rs.train_size} training and {rs.test_size} test")
ndcg = NDCG(k=10)
metrics = [ndcg]
ease = EASE()
models = [ease]
cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True, verbose=True).run()
print("Done!")
ease.recommend("my_user_id", k=10)  # Never reaches this point
Expected behavior (i.e. solution)
The experiment should run successfully and output the results. 600k training samples isn't that much, and the model is extremely simple. I don't see how it would need this much memory.
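One likely explanation is that EASE's memory cost scales with the number of distinct items squared, not with the number of ratings: the closed-form solution materializes a dense item-item Gram matrix and its inverse. The sketch below illustrates the closed form from Steck's "Embarrassingly Shallow Autoencoders" paper on tiny random data (this is an illustration of the algorithm's shape, not cornac's actual implementation, and `lamb` is an assumed regularization value):

```python
import numpy as np

# Tiny illustration of why EASE's memory scales with n_items**2,
# independent of the number of ratings: training builds a dense
# item-item Gram matrix G = X^T X and inverts it.
n_users, n_items, lamb = 50, 20, 500.0
rng = np.random.default_rng(0)
X = (rng.random((n_users, n_items)) > 0.8).astype(np.float64)  # binary interactions

G = X.T @ X                          # dense (n_items, n_items)
G[np.diag_indices(n_items)] += lamb  # L2 regularization on the diagonal
P = np.linalg.inv(G)                 # a second dense (n_items, n_items) matrix
B = P / (-np.diag(P))                # item-item weight matrix
np.fill_diagonal(B, 0.0)             # EASE constraint: diag(B) = 0

# Back-of-the-envelope check: with ~100k distinct items, a single
# float64 item-item matrix is 100_000**2 * 8 bytes ~= 80 GB, which
# is consistent with the ~60 GB blow-up observed here.
print(f"one dense item-item matrix: {n_items**2 * 8} bytes")
```

So even with only 600k ratings, a catalog with a large number of distinct items would make the dense `G`, its inverse, and `B` individually enormous; checking `df_original["itemID"].nunique()` would confirm whether that is the cause.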