
RAGAS with huggingface models #1090

@SalvatoreRa

Description


Describe the bug

I tried using RAGAS with a model that is not from OpenAI. Regardless of which model I use, I get this error back:

File /opt/conda/lib/python3.10/site-packages/ragas/evaluation.py:237, in evaluate(dataset, metrics, llm, embeddings, callbacks, in_ci, is_async, run_config, raise_exceptions, column_map)
    235 results = executor.results()
    236 if results == []:
--> 237     raise ExceptionInRunner()
    239 # convert results to dataset_like
    240 for i, _ in enumerate(dataset):

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

/opt/conda/lib/python3.10/site-packages/ipykernel/iostream.py:123: RuntimeWarning: coroutine 'as_completed.<locals>.sema_coro' was never awaited
  await self._event_pipe_gc()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

I worked around this with:

import nest_asyncio
nest_asyncio.apply()

However, while it no longer raises an error, it now returns:
{'faithfulness': nan, 'answer_relevancy': nan, 'context_utilization': nan}
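
My guess is that the embeddings object may be part of the problem: evaluate() seems to expect a LangChain-compatible Embeddings object (something exposing embed_documents()/embed_query()) rather than a raw SentenceTransformer. A minimal sketch of what I mean, assuming langchain's HuggingFaceEmbeddings wrapper is an acceptable substitute (this is an assumption on my part, not something I found in the docs):

from langchain.embeddings import HuggingFaceEmbeddings

# Hypothetical substitute for the raw SentenceTransformer used below:
# exposes the embed_documents()/embed_query() interface that LangChain
# (and, I assume, ragas) calls.
embedding_model = HuggingFaceEmbeddings(model_name="microsoft/mpnet-base")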

Code to Reproduce

import pandas as pd
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer
from langchain import HuggingFacePipeline
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization
)
from ragas import evaluate
from datasets import Dataset

import nest_asyncio
nest_asyncio.apply()

# embedding model
embedding_model = SentenceTransformer("microsoft/mpnet-base")

# evaluator
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = 0  # Use GPU (0 is typically the first GPU device)

pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    device=device,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    temperature=0.1,
    do_sample=True,
    max_new_tokens=200,
    repetition_penalty=1.1,  # without this the output begins repeating
)

evaluator = HuggingFacePipeline(pipeline=pipe)

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'], 
    ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
}
dataset = Dataset.from_dict(data_samples)

# ragas
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    raise_exceptions=False,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_utilization,
    ]
)

print(result)
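
For completeness, here is what I would try next: passing the LLM and embeddings through ragas' LangChain wrappers explicitly, and re-running with raise_exceptions=True so the underlying exception surfaces instead of being swallowed into NaN. This is only a sketch based on my reading of the API; LangchainLLMWrapper / LangchainEmbeddingsWrapper are my assumption about the intended entry points:

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain.embeddings import HuggingFaceEmbeddings

# Explicitly wrap the LangChain pipeline and a LangChain embeddings object
# so evaluate() receives the types it (presumably) expects.
ragas_llm = LangchainLLMWrapper(evaluator)
ragas_embeddings = LangchainEmbeddingsWrapper(
    HuggingFaceEmbeddings(model_name="microsoft/mpnet-base")
)

result = evaluate(
    dataset=dataset,
    llm=ragas_llm,
    embeddings=ragas_embeddings,
    raise_exceptions=True,  # surface the underlying exception instead of NaN
    metrics=[faithfulness, answer_relevancy, context_utilization],
)
print(result)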

Error trace
No error is raised; the run completes, but every metric comes back as NaN.
Expected behavior
It should return numeric values for the evaluation metrics.

Thank you very much for your help
