Replies: 1 comment 1 reply
-
@korchi - Could you figure it out? This seems like a big issue in getting the online performance on par with the offline one.
-
Hi! First of all, thank you for the great tool you are developing. However, I have been puzzled for a week about how to replicate the `trainer.evaluate()` results when running inference with the model. My initial idea was to truncate every session by one (removing the last `item_id`), call `trainer.predict(truncated_sessions)`, and then compute `recall(last_item_ids, predictions[:20])`.
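The recall computation the poster describes can be sketched as follows. This is a minimal illustration, not code from Transformers4Rec: the `recall_at_k` helper, its argument shapes, and the toy item ids are all hypothetical.

```python
def recall_at_k(last_item_ids, predictions, k=20):
    """Fraction of sessions whose held-out last item appears among
    the top-k predicted item ids for that session."""
    hits = sum(
        1 for target, preds in zip(last_item_ids, predictions)
        if target in preds[:k]
    )
    return hits / len(last_item_ids)

# Toy example: 3 sessions, with only the top-3 predictions shown.
targets = [7, 2, 9]
topk = [[7, 1, 3], [4, 5, 6], [9, 0, 1]]
print(recall_at_k(targets, topk, k=3))  # sessions 1 and 3 are hits
```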
However, I am getting a different recall metric. The only way I managed to "replicate" the `evaluate()` results is by (1) providing non-truncated inputs to `trainer.predict()` and (2) changing `-1` into `-2` in `Transformers4Rec/transformers4rec/torch/model/prediction_task.py` (Line 460 in 348c963). I am puzzled why, but this was the only way I could ensure that the `x` in `Transformers4Rec/transformers4rec/torch/model/prediction_task.py` (Line 464 in 348c963) matches the `x` in `Transformers4Rec/transformers4rec/torch/model/prediction_task.py` (Line 444 in 348c963). Is it because `trainer.evaluate()` shifts the inputs to the left by one position? Or what am I doing incorrectly? Could anyone give me insights on how to do it "correctly", please? Thanks a lot.
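The -1 versus -2 observation is consistent with a causal next-item predictor, where the scores at position `t` predict the item at position `t+1`. The sketch below only illustrates the indexing with a fixed array of fake logits; it assumes (not confirmed from the source) that Transformers4Rec behaves causally, so that dropping the last input item leaves the earlier positions' scores unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_items = 5, 10
# Fake per-position next-item scores: [batch, position, item].
logits = rng.normal(size=(1, seq_len, n_items))

# With the full (non-truncated) session as input, the scores that
# predict the session's *last* item sit at the second-to-last
# position -- hence indexing with -2 instead of -1.
scores_full = logits[:, -2, :]

# With the session truncated by one item, those same scores sit at
# the last position (for a causal model, earlier positions are not
# affected by dropping the final input, so slicing stands in for
# actually re-running the model):
scores_truncated = logits[:, :-1, :][:, -1, :]

assert np.array_equal(scores_full, scores_truncated)
```

Under this assumption, both strategies (truncate and take `-1`, or feed the full session and take `-2`) read out the same scores, which would explain why the posted patch reproduces the `evaluate()` numbers.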