[BUG] Single k in top_ks metric leads to an error caused by iteration over 0-d tensor #464

@juriwiens

Description

Bug description

Passing a single k as the top_ks parameter of a metric, e.g. RecallAt(top_ks=[4]), raises TypeError: iteration over a 0-d tensor. The error occurs when zip(metric, topks[name]) runs in the NextItemPredictionTask.compute_metrics method, because metric is a 0-d (torch) tensor in this case.
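The failure can be shown in isolation, without Transformers4Rec. The snippet below is only a sketch: try_zip is a hypothetical helper that mimics the zip(metric, topks[name]) call, and the tensors stand in for metric results (1-d for several k values, 0-d for a single k, as described above).

```python
import torch


def try_zip(metric, top_ks):
    """Hypothetical stand-in for the zip(metric, topks[name]) step.

    Returns (k, value) pairs, or the error message if iteration fails.
    """
    try:
        return [(k, float(value)) for value, k in zip(metric, top_ks)]
    except TypeError as err:
        return str(err)


# Several k values: the metric result is a 1-d tensor, so zip works.
multi = try_zip(torch.tensor([0.25, 0.5]), [2, 4])

# Single k: the metric result is a 0-d tensor, which cannot be iterated.
single = try_zip(torch.tensor(0.5), [4])

print(multi)
print(single)
```

The second call returns the TypeError message instead of pairs, which matches the error seen during evaluation.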

Steps/Code to reproduce bug

  1. Import the torch variant and create a NextItemPredictionTask with a metric that uses a single-value top_ks, e.g.:
from transformers4rec import torch as tr

prediction_task = tr.NextItemPredictionTask(
    hf_format=True,
    weight_tying=True,
    metrics=[tr.ranking_metric.RecallAt(top_ks=[4], labels_onehot=True)]
)
  2. Use prediction_task to run training with a Trainer that has compute_metrics=True, using the fit_and_evaluate function from examples_utils:
from transformers4rec.torch.utils.examples_utils import fit_and_evaluate

input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=20,
    aggregation="concat",
    d_output=64,
    masking="clm",
)

transformer_config = tr.XLNetConfig.build(
    d_model=64, n_head=1, n_layer=1, total_seq_length=20
)

model = transformer_config.to_torch_model(input_module, prediction_task)

training_args = tr.trainer.T4RecTrainingArguments(
    output_dir="./tmp",
    max_sequence_length=20,
    data_loader_engine='nvtabular',
    num_train_epochs=10,
    dataloader_drop_last=False,
    per_device_train_batch_size = 384,
    per_device_eval_batch_size = 512,
    learning_rate=0.0005,
    fp16=True,
    report_to = [],
    logging_steps=200
)

recsys_trainer = tr.Trainer(
    model=model,
    args=training_args,
    schema=schema,
    compute_metrics=True
)

# SESSIONS_BY_DAY_PATH = ...

aot_results = fit_and_evaluate(
    recsys_trainer,
    start_time_index=1,
    end_time_index=3,
    input_dir=SESSIONS_BY_DAY_PATH
)

Expected behavior

Metrics evaluation should not lead to an error.

Environment details

  • Transformers4Rec version: 0.1.8
  • Platform: linux
  • Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0]
  • Huggingface Transformers version: 4.12.0
  • PyTorch version (GPU?): 1.12.0a0+bd13bc6
  • Tensorflow version (GPU?): /

Additional context

Maybe just metric.unsqueeze(0) if metric.dim() == 0? :)
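A minimal sketch of that suggestion, assuming the metric result is a torch tensor that is 0-d only when a single k is given. pair_metric_with_top_ks is a hypothetical helper written for illustration, not the actual compute_metrics code:

```python
import torch


def pair_metric_with_top_ks(metric, top_ks):
    """Hypothetical helper: map each k to its metric value.

    A 0-d tensor is promoted to shape (1,) first so that zip can iterate it.
    """
    if metric.dim() == 0:
        metric = metric.unsqueeze(0)
    return {k: float(value) for value, k in zip(metric, top_ks)}


# Single k: works after the unsqueeze instead of raising TypeError.
single_k = pair_metric_with_top_ks(torch.tensor(0.5), [4])

# Several k values: behavior is unchanged.
multi_k = pair_metric_with_top_ks(torch.tensor([0.25, 0.5]), [2, 4])

print(single_k)
print(multi_k)
```

The guard only touches the 0-d case, so results for metrics with several k values stay as they were.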

Labels: P0, bug (Something isn't working), s3