ConfigurationLLMClassificator

This project enables experiments with large language models (LLMs) for classification tasks. It supports processing data using predefined configurations, handling multiple model setups, and generating evaluation reports.

Features

- Run Experiments with Configurable Inputs: Run classification tasks using CSV input files and configuration settings.
- Support for Investigator and Models Modes:
  - Investigator Mode: Execute experiments for a specific investigator using predefined configurations.
  - Models Mode: Execute experiments for multiple models with their respective configurations.
- Generative Model Integration: Uses LLMs for predictions with user-defined prompts.
- Partial Result Handling: Saves intermediate results to prevent data loss during lengthy executions.
- Evaluation Metrics: Includes evaluation functionality, such as edit distance analysis, for assessing classification performance.
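
To illustrate the edit distance analysis mentioned above, here is a minimal sketch of the Levenshtein distance between a predicted label and the true label. The function name `edit_distance` is hypothetical and is not taken from this repository; the project's own metric code may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    # previous[j] is the distance between the prefix of `a` processed
    # so far (before the current character) and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]
```

A distance of 0 means the prediction matches the true label exactly. A small nonzero distance may indicate that the model produced a near-miss, such as a misspelled or differently formatted label, rather than a different class.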

Installation

Clone the repository:

```
git clone https://github.com/diverso-lab/ConfigurationLLMClassificator
cd ConfigurationLLMClassificator
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Running the Experiment

Investigator Mode:
Execute experiments for a specific investigator using their configuration:
```
python main.py --mode i --investigator investigatorName
```
Models Mode:
Run experiments for multiple models, optionally filtering by specific model names:
```
python main.py --mode models --models model1 model2
```

Configuration

Investigator Configuration

A JSON file (e.g., configs/investigatorName_config.json) defines the settings for a single investigator:

```
{
  "csv_path": "path/to/data.csv",
  "model": "model_name",
  "system_prompt": "Define classification prompt",
  "max_tokens": 256,
  "temperature": 1,
  "true_column": "class"
}
```
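
Before a long run, it can help to check that a configuration like the one above is complete. The sketch below is one possible approach, not the project's own loader: the function names are hypothetical, and treating all six keys as required is an assumption based on the example.

```python
import json

# Keys taken from the example configuration; treating every one of them
# as mandatory is an assumption, not documented behavior.
REQUIRED_KEYS = {
    "csv_path", "model", "system_prompt",
    "max_tokens", "temperature", "true_column",
}


def validate_investigator_config(config: dict) -> dict:
    """Return the config unchanged, or raise ValueError if keys are missing."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"Missing configuration keys: {sorted(missing)}")
    return config


def load_investigator_config(path: str) -> dict:
    """Read a JSON configuration file and validate its keys."""
    with open(path, encoding="utf-8") as handle:
        return validate_investigator_config(json.load(handle))
```

Failing early on a missing key avoids discovering the problem only after the model has already been queried.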

Models Configuration

A JSON file (e.g., configs/models_config.json) contains settings for multiple models:

```
[
  {
    "csv_path": "path/to/data1.csv",
    "model": "model1",
    "system_prompt": "Define prompt",
    "max_tokens": 256,
    "temperature": 1,
    "true_column": "class"
  },
  {
    "csv_path": "path/to/data2.csv",
    "model": "model2",
    "system_prompt": "Define another prompt",
    "max_tokens": 512,
    "temperature": 1,
    "true_column": "label"
  }
]
```
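
The optional filtering by model name in Models Mode could work roughly as sketched below. This is an illustration under assumptions, not the repository's code: `select_models` is a hypothetical helper, and it assumes that omitting the names means "run every entry".

```python
from typing import Optional


def select_models(configs: list, names: Optional[list] = None) -> list:
    """Keep only the entries whose "model" value is in `names`.

    With no names given, every entry is returned (assumed behavior).
    """
    if not names:
        return list(configs)
    wanted = set(names)
    return [entry for entry in configs if entry["model"] in wanted]


# Example with two entries shaped like the configuration above.
configs = [
    {"model": "model1", "csv_path": "path/to/data1.csv"},
    {"model": "model2", "csv_path": "path/to/data2.csv"},
]
selected = select_models(configs, ["model2"])
```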

Output

Results Directory:
Each experiment's results are saved under the output/ directory, identified by a unique hash derived from the configuration.
Files:
- config.csv: The configuration used for the experiment.
- results.csv: Predicted labels for each instance.
- report.csv: Performance metrics and evaluation results.
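
One way to derive a stable, configuration-based hash for the results directory is sketched below. The README does not specify the hashing scheme, so the use of SHA-256, the 12-character truncation, and the helper names `config_hash` and `experiment_dir` are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config: dict, length: int = 12) -> str:
    """Hash a configuration so that equal settings give equal hashes."""
    # Sorting keys makes the hash independent of key order in the JSON file.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def experiment_dir(config: dict, root: str = "output") -> Path:
    """Build (without creating) the directory path for one experiment."""
    return Path(root) / config_hash(config)
```

Because the hash depends only on the configuration values, rerunning an experiment with identical settings maps to the same directory, while any changed setting yields a different one.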
