jbcs2025

Project Setup

This project uses uv as its package manager for efficient dependency management.

Prerequisites

Ensure you have uv installed. You can install it using:

curl -LsSf https://astral.sh/uv/install.sh | sh

Or using pip:

pip install uv

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
    cd jbcs2025
  2. Synchronize dependencies:

    uv sync
  3. Activate the virtual environment:

    • macOS/Linux:
      source .venv/bin/activate
    • Windows (PowerShell):
      .venv\Scripts\Activate.ps1
  4. Install the flash-attn package (only needed if you want to fine-tune or run inference with decoder LMs such as Llama 8B or Phi-4/Phi-3.5)

Due to an installation-order issue, flash-attn has to be installed separately. Run the following inside the environment:

uv pip install flash-attn --no-build-isolation

Managing API Keys and Secrets

This project uses environment variables to manage API keys (e.g., for OpenAI or Maritaca AI) and other sensitive information.

  1. Create a .env file in the project root:

    touch .env  # On Linux/macOS
    # Or manually create the file on Windows

  2. Add your API keys to the .env file:

    # API keys
    MARITACA_API_KEY=your_maritaca_api_key_here
    OPENAI_API_KEY=your_openai_api_key_here
    HUGGINGFACE_TOKEN=your_hf_token_here

  3. How it works:

  • The project uses python-dotenv to load environment variables from the .env file
  • API keys are referenced in YAML configuration files like this (see the sketch after this list):

    model:
      api_key: ${env:MARITACA_API_KEY,""}  # Falls back to empty string if not found

  • For additional security, you can also set environment variables directly in your system.

  4. Security best practices:

  • Never commit .env files to Git (it's already in .gitignore)
  • Don't share API keys in code, chat, or documentation
  • Rotate API keys periodically
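
For illustration, here is a minimal sketch of how the .env loading and ${env:...} lookup fit together, assuming python-dotenv plus an OmegaConf-style resolver (the exact wiring in this repo may differ):

import os

from dotenv import load_dotenv

# Read variables from the .env file in the project root into os.environ.
load_dotenv()

# A reference like ${env:MARITACA_API_KEY,""} boils down to an environment
# lookup with a default, equivalent to reading the variable directly:
api_key = os.environ.get("MARITACA_API_KEY", "")
if not api_key:
    print("MARITACA_API_KEY is not set; API-based models will be unavailable.")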

Dependency Management

  • To add a new dependency, use:
    uv add <package-name>

Deactivating the Environment

To deactivate the virtual environment, run:

deactivate

Usage

The project provides several scripts for different machine learning workflows:

1. Training and Fine-tuning Models

Single Experiment

Run a single experiment using the main configuration:

python scripts/run_experiment.py

You can override configuration parameters:

python scripts/run_experiment.py experiments.model.name=microsoft/phi-3.5-mini-instruct experiments.dataset.grade_index=0
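
The dotted key=value override syntax and the timestamped outputs/ layout suggest a Hydra-based entry point. A minimal sketch of what such a script might look like (the actual run_experiment.py will differ):

import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="../configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Command-line overrides such as experiments.model.name=... are merged
    # into cfg by Hydra before this function runs.
    print(OmegaConf.to_yaml(cfg))
    # ... training / fine-tuning logic goes here ...


if __name__ == "__main__":
    main()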

Inference Experiments

Run inference experiments on pre-trained models:

python scripts/run_inference_experiment.py

Sequential Experiments

Run multiple experiments in sequence using a configuration file:

python scripts/run_sequential_experiments.py configs/sequential_experiments.yaml --mode train

For inference mode:

python scripts/run_sequential_experiments.py configs/sequential_experiments.yaml --mode inference
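
The sequential driver presumably reads a list of experiment definitions from the YAML file and launches one run per entry. A rough sketch under that assumption (the "experiments" key and per-entry override lists are hypothetical; the real schema may differ):

import subprocess
import sys

import yaml  # PyYAML, assumed available

# Hypothetical schema: a top-level "experiments" list, one entry per run,
# each entry a list of Hydra-style overrides.
with open("configs/sequential_experiments.yaml") as f:
    experiments = yaml.safe_load(f).get("experiments", [])

for overrides in experiments:
    subprocess.run(
        [sys.executable, "scripts/run_experiment.py", *overrides],
        check=True,  # stop the sequence if one run fails
    )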

2. Model Management and Hugging Face Hub Integration

Push Single Model to Hub

Push a trained model to Hugging Face Hub:

python scripts/push_model_to_hub.py /path/to/model/directory
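
Internally this likely wraps the huggingface_hub client. A minimal sketch of uploading a trained model directory (the repo id is illustrative):

import os

from huggingface_hub import HfApi

# Pass the token loaded from .env (HUGGINGFACE_TOKEN) explicitly.
api = HfApi(token=os.environ.get("HUGGINGFACE_TOKEN"))

# Create the target repo if needed, then upload the whole directory.
api.create_repo("your-username/your-model-name", exist_ok=True)
api.upload_folder(
    folder_path="/path/to/model/directory",
    repo_id="your-username/your-model-name",
    repo_type="model",
)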

Batch Push Models

Push multiple models from a date folder to Hugging Face Hub:

python scripts/batch_push_models.py outputs/2025-07-06 --config configs/push_model_config.yaml

Options:

  • --delay: Delay between pushes in seconds (default: 2)
  • --dry-run: Show what would be pushed without actually pushing

Generate Hub Models List

Generate a list of model IDs for batch inference:

python scripts/generate_hub_models_list.py --output configs/hub_models_list.txt

Use --dry-run to preview the list without writing to file.

Batch Inference from Hub

Run inference on multiple models from Hugging Face Hub:

python scripts/batch_inference_from_hub.py --models model1 model2 model3

Or use a file containing model IDs:

python scripts/batch_inference_from_hub.py --models-file configs/hub_models_list.txt

Options:

  • --output-base: Base output directory (default: inference_output)
  • --delay: Delay between experiments in seconds (default: 10)
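
Under the hood this is presumably a loop over model IDs with a pause between runs (matching --delay). A minimal sketch, with the override key borrowed from the training example above (hypothetical for inference):

import subprocess
import sys
import time

model_ids = ["model1", "model2", "model3"]  # or read them from --models-file
delay_seconds = 10  # default --delay

for model_id in model_ids:
    # Hand each Hub model ID to the inference script as a config override.
    subprocess.run(
        [sys.executable, "scripts/run_inference_experiment.py",
         f"experiments.model.name={model_id}"],
        check=True,
    )
    time.sleep(delay_seconds)  # pause between experiments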

3. Configuration Management

Configuration files are located in the configs/ directory:

  • config.yaml: Main configuration file
  • sequential_experiments.yaml: Configuration for running multiple experiments
  • push_model_config.yaml: Configuration for model pushing to Hub
  • hub_models_list.txt: List of Hub model IDs for batch processing

Example experiment configurations are organized in:

  • configs/experiments/api_models_*/: API-based model configs (e.g., Sabia, ChatGPT, or DeepSeek)
  • configs/experiments/(base|large|slm_decoder)_models/: Fine-tuned model configs for different architectures

4. Output Structure

  • outputs/YYYY-MM-DD/HH-MM-SS/: Training outputs organized by date and time
  • inference_output/YYYY-MM-DD/HH-MM-SS/: Inference results with timestamps

---

Now you're ready to contribute to `jbcs2025`! 🎉
