Missing data preprocessing file and bug in MRC training task

Hi.
I am trying to fine-tune the ViDeBERTa model for an MRC task on the ViQuAD dataset. However, the file _Finetuning/QA/extractive-qa-mrc/utils/preprocess.py_, which the provided code refers to, is missing from the repository.

![Screenshot from 2023-04-24 17-53-23](https://user-images.githubusercontent.com/59108875/233976626-7aead4c5-e054-4d3f-b992-2c5e9689ae76.png)
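
To make clear what kind of step seems to be missing: I would expect a preprocessing file for this task to flatten the nested SQuAD-style JSON into one record per question before tokenization. The sketch below is only my guess at such a step, not the repository's actual `preprocess.py`. The function name `flatten_squad`, the assumption that ViQuAD follows the SQuAD JSON layout, and the tiny sample are all hypothetical.

```python
from typing import Any, Dict, Iterator


def flatten_squad(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per question from SQuAD-style nested JSON.

    Assumes the layout data -> paragraphs -> qas (hypothetical for ViQuAD).
    """
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Unanswerable questions (SQuAD 2.0 style) may have no answers.
                answers = qa.get("answers", [])
                yield {
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "context": context,
                    "question": qa["question"],
                    "answers": {
                        "text": [a["text"] for a in answers],
                        "answer_start": [a["answer_start"] for a in answers],
                    },
                }


# Made-up sample, for illustration only (not taken from ViQuAD).
sample = {
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "Hanoi is the capital of Vietnam.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Vietnam?",
                            "answers": [{"text": "Hanoi", "answer_start": 0}],
                        },
                        {
                            "id": "q2",
                            "question": "Who founded the city?",
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ]
}

records = list(flatten_squad(sample))
```

Each record then has the flat `id` / `context` / `question` / `answers` fields that a tokenization function for extractive QA usually consumes. If the real file does something different, please let me know.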


As a workaround, I loaded the data with the `load_dataset` function from the `datasets` library, but got the error shown in the second screenshot below when training the model with this code:

```python
from transformers import (
    RobertaForQuestionAnswering,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

# Note: batch_size, tokenizer, tokenized_train, and tokenized_valid
# are defined earlier in my code (not shown here).

model_checkpoint = "Fsoft-AIC/videberta-base"
model = RobertaForQuestionAnswering.from_pretrained(model_checkpoint)

model_name = model_checkpoint.split("/")[-1]
args = TrainingArguments(
    f"{model_name}-finetuned-quad2.0",
    # Change the number of training epochs to get a better result.
    num_train_epochs=2.0,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    warmup_ratio=0.05,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    load_best_model_at_end=True,
    save_strategy="epoch",
    save_total_limit=5,
    # do_train=True,
    # do_eval=False,
    # push_to_hub=True,
)

data_collator = default_data_collator

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
```



![Screenshot from 2023-04-24 17-57-53](https://user-images.githubusercontent.com/59108875/233977371-7c6d3ef8-13e2-41d6-b10e-65128d0af6af.png)


I would appreciate any help in resolving these two problems.

