Open
Description
I was running a test eval. After inspect eval finished, uploading the log to the server failed with a UnicodeDecodeError:
(bedrock) gyliu513@Guangyas-MacBook-Pro langtrace % inspect eval example_eval.py --model openai/gpt-3.5-turbo --log-dir langtracefs://cm4lrz7tq00075jmgkdtlq6w4
Fetching dataset with id: cm4lrz7tq00075jmgkdtlq6w4 from Langtrace
Successfully fetched dataset with id: cm4lrz7tq00075jmgkdtlq6w4 from Langtrace
╭─ example_eval (1 sample): openai/gpt-3.5-turbo ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ total time: 0:00:04 dataset: cm4lrz7tq00075jmgkdtlq6w4 │
│ openai/gpt-3.5-turbo 505 tokens [I: 415, O: 90] │
│ openai/gpt-4o 97 tokens [I: 90, O: 7] │
│ │
│ accuracy: 1.0 stderr: 0 │
│ │
│ Log: langtracefs://cm4lrz7tq00075jmgkdtlq6w4/2024-12-12T16-13-53-05-00_example-eval_VPDWd4DhQD4rhCFnAF7FWv.eval │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Traceback (most recent call last):
  File "/Users/gyliu513/bedrock/lib/python3.11/site-packages/inspect_ai/_util/file.py", line 82, in file
    f.close()
  File "/Users/gyliu513/bedrock/lib/python3.11/site-packages/langtrace_python_sdk/extensions/langtrace_filesystem.py", line 57, in close
    self.upload_to_server(file_data)
  File "/Users/gyliu513/bedrock/lib/python3.11/site-packages/langtrace_python_sdk/extensions/langtrace_filesystem.py", line 64, in upload_to_server
    log = file_data.decode("utf-8")
          ^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 10: invalid start byte
(bedrock) gyliu513@Guangyas-MacBook-Pro langtrace % cat example_eval.py
from inspect_ai import Task, task
from inspect_ai.dataset import csv_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import self_critique, generate

@task
def example_eval():
    return Task(
        dataset=csv_dataset("langtracefs://cm4lrz7tq00075jmgkdtlq6w4"),
        plan=[
            generate(),
            self_critique(model="openai/gpt-4o")
        ],
        scorer=model_graded_fact()
    )
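A possible explanation, offered as a guess rather than a confirmed diagnosis: the log name ends in .eval, and if that file is a zip-based binary archive rather than JSON text, then file_data.decode("utf-8") would fail on it. In a zip local file header, the last-modified time is stored at byte offset 10, which matches the "position 10" in the error. The sketch below reproduces that failure with a small in-memory zip as a stand-in for the real log, and shows one way an uploader could fall back to base64 for binary data. The helper names build_sample_eval_bytes and encode_log_payload are hypothetical and are not part of langtrace_python_sdk or inspect_ai; whether the Langtrace server would accept a base64 payload is also an assumption.

```python
import base64
import io
import zipfile


def build_sample_eval_bytes() -> bytes:
    """Build a tiny in-memory zip as a stand-in for a binary .eval log."""
    buffer = io.BytesIO()
    # Fixed timestamp (taken from the log file name) so the bytes are reproducible.
    entry = zipfile.ZipInfo("header.json", date_time=(2024, 12, 12, 16, 13, 53))
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(entry, '{"status": "success"}')
    return buffer.getvalue()


def encode_log_payload(file_data: bytes) -> tuple[str, str]:
    """Return (encoding, text): UTF-8 when the bytes are text, base64 otherwise."""
    try:
        return "utf-8", file_data.decode("utf-8")
    except UnicodeDecodeError:
        # Binary payload: base64 keeps it transportable as plain text.
        return "base64", base64.b64encode(file_data).decode("ascii")


if __name__ == "__main__":
    raw = build_sample_eval_bytes()
    try:
        # Same call as in upload_to_server from the traceback.
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print(f"decode failed: {exc}")

    encoding, text = encode_log_payload(raw)
    print(f"fallback encoding: {encoding}, payload length: {len(text)}")
```

If this guess is right, the upload path would need to treat the log as bytes (or the eval would need to write a text-based log) instead of assuming UTF-8.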