Skip to content

Failed to parse output.Β #1228

@g-hano

Description

@g-hano

Describe the bug
Local LLMs either raise Timeout error or Fails to parse output.

Ragas version: 0.1.15
Python version: 3.11.3

Code to Reproduce

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
import pandas as pd
df = pd.read_csv("output.csv", sep=";")

data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if isinstance(x, str) else x).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}

from datasets import Dataset 
dataset = Dataset.from_dict(data_samples)

from ragas import evaluate
from ragas.metrics import (faithfulness, 
                           answer_correctness,    
                           answer_relevancy,
                           context_recall,
                           context_precision)

from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
end = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.3", max_new_tokens=512)
huggingface_llm = ChatHuggingFace(llm=end, tokenizer=tokenizer)
huggingface_embeddings = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1.5")

metrics=[faithfulness, 
        answer_correctness,    
        answer_relevancy,
        context_recall,
        context_precision]

score = evaluate(dataset=dataset,
        metrics=metrics,
        llm=huggingface_llm,
        embeddings=huggingface_embeddings,
        raise_exceptions=False
)

Error trace

Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[304]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[444]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Exception raised in Job[169]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
Failed to parse output. Returning None.
Failed to parse output. Returning None..
Exception raised in Job[309]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[174]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[449]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[179]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[314]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[184]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[454]: ValidationError(2 validation errors for ContextPrecisionVerifications
__root__ -> 0 -> reason
  field required (type=value_error.missing)
__root__ -> 0 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[461]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))
Exception raised in Job[196]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))
Exception raised in Job[462]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))

Additional context
it only evaluates for answer_correctness, other values are all NaN

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions