📊 Bat-Adapt

LLM Evaluation for the Bat-Adapt Project (Open Booster Challenge: City Risks & Resilience)

Bat-Adapt is a research and evaluation project that uses large language models (LLMs) to support resilience and risk-related decision processes for cities. It was presented at the Open Booster Challenge on city risks and resilience. This repository contains the evaluation code and associated materials used to benchmark LLM outputs for the project.


🧠 What This Project Does

The Bat-Adapt project is focused on:

  • Evaluating LLM performance for tasks associated with risk assessment and resilience planning in urban contexts.
  • Providing scripts and tools to run LLM evaluations programmatically.
  • Supporting repeatable and rigorous analysis of model outputs.

Although this repository currently has limited documentation, it includes at least:

📦 Bat-Adapt
├── README.md
└── llm-evaluation.py   # Python script for executing LLM evaluation logic

(Note: the layout above reflects the files currently visible in the GitHub repository listing.)


🚀 Getting Started

These instructions help you get set up to run and evaluate LLMs using the provided script.

🛠️ Prerequisites

Ensure you have the following installed:

  • Python 3.9 or later
  • pip (Python package manager)
  • Internet access for model APIs (if applicable)
  • Optional: Virtual environment tool such as venv or conda
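If you prefer to isolate dependencies, a typical setup with the standard venv module looks like this (adjust paths to taste):

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate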

💻 Install Dependencies

This repository does not yet include a requirements.txt, but common dependencies for LLM evaluation work typically include:

pip install openai transformers datasets numpy pandas

Adjust this list based on the actual imports in llm-evaluation.py.


🧪 Running the Evaluation

Assuming llm-evaluation.py drives the experiments, you can run:

python llm-evaluation.py

This script likely:

  • Loads a set of input prompts
  • Sends them to a configured LLM
  • Records outputs and compares them against references

👉 You’ll want to inspect or modify this script to configure:

  • Model endpoint or API keys
  • Evaluation metrics (accuracy, coherence, relevance)
  • Dataset or prompt files if any
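For example, one minimal way to keep such settings out of the code is to read them from environment variables. This is only a sketch: the variable names below are hypothetical and not taken from llm-evaluation.py.

import os

# Hypothetical settings; rename to whatever llm-evaluation.py actually expects.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")              # never hard-code keys
MODEL_NAME = os.environ.get("BAT_ADAPT_MODEL", "gpt-4o-mini")      # assumed default model
PROMPTS_FILE = os.environ.get("BAT_ADAPT_PROMPTS", "prompts.json") # assumed prompt file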

📈 How It Works (High-Level)

The evaluation workflow generally includes:

  1. Loading a dataset of tasks relevant to city resilience and risk (possibly local prompts or test cases).
  2. Sending these tasks to an LLM (OpenAI, Hugging Face models, etc.).
  3. Capturing responses from the model.
  4. Computing evaluation metrics like relevance, correctness, or alignment with expected output.
  5. Generating a report of results.

You can customize this workflow to measure fluency, coherence, factuality, or task-specific performance using established evaluation frameworks.
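As a concrete illustration of step 4, a deliberately simple relevance metric could count how many expected keywords appear in a model response. This is only a sketch, not the metric implemented in llm-evaluation.py:

def keyword_relevance(response: str, expected_keywords: list[str]) -> float:
    # Fraction of expected keywords that occur (case-insensitively) in the response.
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

# Example: 2 of the 3 keywords appear, so the score is ~0.67.
score = keyword_relevance(
    "The city should map flood zones and upgrade drainage infrastructure.",
    ["flood", "drainage", "evacuation"],
)
print(f"relevance: {score:.2f}")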


📂 Example Evaluation Script (Illustrative)

Below is a template of what such an evaluation script might look like internally. You should tailor it to your repository’s code.

import json
import os

from openai import OpenAI

# Read the API key from the environment rather than hard-coding it.
# Uses the openai>=1.0 client interface.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def evaluate(prompt):
    # Send a single prompt to the model and return the text of its reply.
    response = client.chat.completions.create(
        model="gpt-4o-2024-05-13",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return response.choices[0].message.content

def run_evaluation(prompts_file):
    # Load a JSON list of prompts, query the model for each one,
    # and store the raw outputs for later scoring.
    with open(prompts_file) as f:
        prompts = json.load(f)
    results = {}
    for idx, p in enumerate(prompts):
        results[f"case_{idx}"] = evaluate(p)
    with open("results.json", "w") as out:
        json.dump(results, out, indent=2)

if __name__ == "__main__":
    run_evaluation("prompts.json")

Replace model name, dataset, and prompt schema based on your actual setup.
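If you follow this template, prompts.json would simply be a plain JSON list of prompt strings; the entries below are purely illustrative:

[
  "Which early-warning measures help a coastal city prepare for storm surge?",
  "List three low-cost interventions that reduce the urban heat island effect."
]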


🧩 Contributions

Contributions are welcome! You can help by:

  • Adding a requirements.txt
  • Expanding evaluation metrics and analysis
  • Adding example datasets and prompts
  • Improving documentation and notebooks demonstrating usage
  • Creating visualizations of evaluation results
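For the first item, a minimal requirements.txt mirroring the dependencies suggested earlier could serve as a starting point (left unpinned until the script's actual imports are confirmed):

openai
transformers
datasets
numpy
pandas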

📄 License

This repository does not yet specify a license; consider adding one (e.g., the MIT License) to clarify the terms for open reuse.


📌 Notes

  • The repository is at an early stage, but llm-evaluation.py contains the core evaluation logic for resilience-focused LLM tasks.
  • This README assumes the script llm-evaluation.py is central to the project's purpose; adjust as needed if additional content is added.
