
# FIM Evaluation

## Overview

This repository contains scripts for evaluating the fill-in-the-middle (FIM) capabilities of various models, using a forked version of OpenAI's human-eval-infilling tool.

The goal is to compare the baseline performance of GPT, Claude, and Gemini models under various prompting methods, and to compare their performance against open-source and/or fine-tuned alternatives.

## Setup

Run `uv sync` to install the dependencies.

If you use a Llama model, you need Ollama installed and running.

## Usage

Modify the `main.py` file to set the model, benchmark, and other parameters.
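As a rough sketch, a configuration block in `main.py` might look like the following. Note that the class, field names, and model identifiers here are illustrative assumptions, not the actual contents of `main.py`:

```python
from dataclasses import dataclass

# Hypothetical configuration sketch -- the real main.py defines its own
# parameters; adapt the field names and values to what it actually exposes.
@dataclass
class EvalConfig:
    model: str = "gpt-4o"            # hypothetical model identifier
    benchmark: str = "single-line"   # e.g. a human-eval-infilling split
    num_samples: int = 1             # completions generated per problem
    temperature: float = 0.2

# Override the defaults for a particular run.
config = EvalConfig(model="claude-3-5-sonnet", benchmark="multi-line")
print(config.model, config.benchmark)
```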

Add the requisite API keys to a `.env` file.
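A minimal `.env` might look like the sketch below. The exact variable names depend on which providers `main.py` uses, so treat these as assumptions and check the code for the names it reads:

```shell
# Hypothetical .env contents -- only the keys for the providers you run are needed.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
```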

Make sure your Python path is set to the root of the repository: `export PYTHONPATH=.`

Run `uv run --env-file .env main.py` to generate the samples and evaluate the functional correctness for your configuration.

Run `uv run src/review/review.py` to review the failed results.
