This repository contains scripts for evaluating the fill-in-the-middle (FIM) capabilities of various models using a forked version of the human-eval-infilling tool from OpenAI.
The goal is to compare the baseline performance of GPT, Claude, and Gemini models under various prompting methods, and to compare them against open-source and/or fine-tuned alternatives.
Run uv sync to install the dependencies.
If you use the llama model, you need Ollama installed and running, with the model pulled locally (e.g., via ollama pull).
Modify the main.py file to set the model, benchmark, and other parameters.
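The exact settings live in main.py and their names may differ from this sketch; as an assumption, the configuration might look something like a few module-level constants:

```python
# Hypothetical configuration block -- the real variable names and accepted
# values are defined in main.py itself; adjust to match what the script expects.
MODEL = "gpt-4o"            # model identifier (OpenAI, Claude, Gemini, or an Ollama model)
BENCHMARK = "single-line"   # human-eval-infilling split, e.g. single-line or multi-line
SAMPLES_PER_TASK = 1        # number of completions generated per benchmark problem
```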
Add the requisite API keys to a .env file.
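The variable names the scripts read depend on which SDKs they use; the conventional defaults for the official OpenAI, Anthropic, and Google clients are shown below as an assumption:

```
# Assumed key names -- check the source for the exact variables the scripts read.
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GOOGLE_API_KEY=...
```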
Make sure PYTHONPATH points to the root of the repository: export PYTHONPATH=.
Run uv run --env-file .env main.py to generate samples and evaluate their functional correctness for your configuration.
Run uv run src/review/review.py to review the failed results.