
Commit 01c7830: V1 of working pipeline (#1)
Parent: a2e1d5d

15 files changed: +1520 / -1780 lines

.gitignore

Lines changed: 5 additions & 2 deletions

@@ -200,7 +200,10 @@ cython_debug/
 model/pretrained/pretrained_models/*
 
 # Prevent any .pth files being added as a saved model.
-model/pretrained/saved_models/*.pth*
+model/saved_models/*.pth*
 
 # No data in the data/train folder
-data/train/*
+data/train/*
+
+# No data in data_split folder
+data_split/*

README.md

Lines changed: 29 additions & 81 deletions
@@ -4,30 +4,6 @@ This project uses deep learning to analyze audio files and detect AI-generated content.
 
 ## Setup
 
-You can set up this project using either `uv` (recommended) or `pip`.
-
-### Option 1: Using uv (Recommended)
-
-1. Install `uv` if you haven't already:
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-```
-
-2. Create and activate a virtual environment:
-```bash
-uv venv
-source .venv/bin/activate  # On Unix/macOS
-# or
-.venv\Scripts\activate  # On Windows
-```
-
-3. Install the package in development mode:
-```bash
-uv pip install -e .
-```
-
-### Option 2: Using pip
-
 1. Create and activate a virtual environment:
 ```bash
 python -m venv .venv
@@ -50,39 +26,53 @@ For the training step, I used this file from here [Link text][https://github.com
 For the example here, I set up a data folder at the top level with /data/train/ai and /data/train/real
 and added the .mp3 and .wav files that I want to fine-tune against. I got the real data from
 FMA [Link Text](https://github.com/mdeff/fma) for testing, and the AI-generated data from
-Facebook's Music Gen.
+Facebook's MusicGen. The word "ai" must appear in the path of the AI folders, and "real" in the
+path to the real songs.
 
-**NOTE: In /model/pretrained/cnn14.py, I'm hardcoding the path to be /mode/pretrained/pretrained_models/Cnn14_16k_mAP=0.438.pth.gz. This would have to be changed in the future. Cnn14 only takes in gzip files
+**NOTE: In /model/pretrained/cnn14.py, I'm hardcoding the path to be /model/pretrained/pretrained_models/Cnn14_16k_mAP=0.438.pth.gz. This would have to be changed in the future. Cnn14 only takes in gzip files,
 so gzip your file beforehand**
 
 Steps:
-1. First place files in audio-processing-ai/data/train (if you are going to finetune data against your model)
+1. First, place files in audio-processing-ai/data/train (if you are going to fine-tune against your model).
+**All AI files should go in /data/train/ai and all real files go in /data/train/real. This is because we need labeled data for supervised training of the classifier that decides which file is AI music and which is real.**
 2. Figure out the model you are going to fine-tune against
 3. Update this line (PRETRAINED_MODEL_PATH = 'model/pretrained/pretrained_models/Cnn14_16k_mAP=0.438.pth.gz') at cnn14.py to the .pth.gz file location of your choice
 
 To train the model:
 ```bash
-cd audio-processing-ai
-python train.py --epoch 5 --dataFolder data/train/ --savePath model/saved_models/your_model.pth
+python train.py \
+    --num-epochs 5 \
+    --dataFolder data/train/ \
+    --savedPath model/saved_models/your_model.pth \
+    [--resume-from path/to/checkpoint.pth]  # Optional: resume from a checkpoint
 ```
 
-### Inference
-
-If you have an already trained/fine-tuned model and you just want to run the prediction,
-run it as such.
-
-Folder is the path to the audio files you want to test against.
+Required arguments:
+- `--savedPath`: Path where the model will be saved (must end in .pth)
+- `--dataFolder`: Directory containing training data (default: "data/train/")
+- `--num-epochs`: Number of training epochs (default: 5)
 
-Example lists the model path as model/saved_models/your_model.pth but that is changeable
-depending on where you saved it.
+Optional arguments:
+- `--resume-from`: Path to a checkpoint to resume training from
 
-The outputted file is predictions_timestamp.csv
+### Inference
 
 To run predictions on audio files:
 ```bash
-python predict.py --folder path/to/audio/files --model model/saved_models/your_model.pth
+python predict.py \
+    --folder path/to/audio/files \
+    --model model/saved_models/your_model.pth
 ```
 
+Required arguments:
+- `--folder`: Directory containing .mp3/.wav files to analyze
+- `--model`: Path to your trained model (.pth file)
+
+The script will:
+1. Process each audio file in the specified folder
+2. Generate predictions for AI-generated content and audio scene tags
+3. Save results to a CSV file named `predictions_YYYYMMDD_HHMM.csv`
+
 ## Project Structure
 
 - `inference/`: Inference scripts for prediction
@@ -101,45 +91,3 @@ python predict.py --folder path/to/audio/files --model model/saved_models/your_m
 - Training data should be organized in the `data/train/` directory
 - Model checkpoints are saved in `model/saved_models/`
 - The project is installed as a Python package for proper import handling
-
-## Code Quality
-
-This project uses Ruff for both linting and formatting Python code. Ruff is a fast Python linter and formatter written in Rust.
-
-### Using Ruff
-
-1. Install Ruff (it's already included in the dev dependencies):
-```bash
-# Using pip (recommended if you want to use your existing virtual environment)
-pip install -e ".[dev]"
-
-# OR using uv pip (if you want to use uv but keep your current virtual environment)
-uv pip install -e ".[dev]"
-
-# Note: Do NOT use 'uv venv' unless you want to create a new virtual environment
-# with pyenv. If you want to use uv while keeping your current environment,
-# use 'uv pip' instead.
-```
-
-2. Format your code:
-```bash
-ruff format .
-```
-
-3. Lint your code:
-```bash
-ruff check .
-```
-
-4. Fix linting issues automatically:
-```bash
-ruff check --fix .
-```
-
-The Ruff configuration is in `pyproject.toml`. Currently, it:
-- Uses a line length of 88 characters (same as Black)
-- Targets Python 3.9
-- Enables pycodestyle (`E`) and Pyflakes (`F`) rules by default
-- Ignores line length violations (`E501`)
-
-You can customize the Ruff configuration by modifying the `[tool.ruff]` section in `pyproject.toml`.
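Editor's note on the data layout the new README text requires: "ai" and "real" must appear in the paths of the respective training folders. A minimal sketch of that labeling convention, assuming the label is taken from a whole path component (the README only says the word must appear in the path); the helper name `label_from_path` is illustrative, not part of the repo:

```python
from pathlib import Path

def label_from_path(path: str) -> int:
    """Infer the training label from the folder layout the README describes."""
    parts = Path(path).parts
    if "ai" in parts:
        return 1   # AI-generated music
    if "real" in parts:
        return 0   # real recording
    raise ValueError(f"Cannot infer label from path: {path}")

print(label_from_path("data/train/ai/song_001.wav"))    # 1
print(label_from_path("data/train/real/fma_0042.mp3"))  # 0
```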

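The README's NOTE also says cnn14.py only reads gzipped checkpoints. A one-off compression step using only the standard library, assuming the checkpoint was downloaded to the working directory:

```python
import gzip
import shutil

# Compress a downloaded checkpoint so cnn14.py can read it (it expects .pth.gz).
with open("Cnn14_16k_mAP=0.438.pth", "rb") as src:
    with gzip.open("Cnn14_16k_mAP=0.438.pth.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
```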
check_bias.py

Lines changed: 88 additions & 0 deletions
import glob

import numpy as np
import torch
import torchaudio
from tqdm import tqdm

from model.pretrained.dual_head_cnn14 import DualHeadCnn14


def load_audio(path, sample_rate=16000, target_length=64000):
    waveform, sr = torchaudio.load(path)
    if sr != sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)

    waveform = waveform.mean(dim=0, keepdim=True)  # mono: [1, T]

    # Ensure the input has exactly target_length samples (e.g., 4 s at 16 kHz)
    if waveform.shape[1] < target_length:
        pad_len = target_length - waveform.shape[1]
        waveform = torch.nn.functional.pad(waveform, (0, pad_len))
    else:
        waveform = waveform[:, :target_length]  # truncate if too long

    return waveform  # [1, target_length]


def get_logits(model, files, device):
    logits = []
    for file in tqdm(files, desc="Evaluating"):
        try:
            waveform = load_audio(file, target_length=64000).to(device)  # [1, T]
            waveform = waveform.unsqueeze(0)  # [1, 1, T]
            waveform = waveform.unsqueeze(2)  # [1, 1, 1, T]

            if waveform.ndim != 4:
                print(f"❌ Invalid input shape {waveform.shape} → expected [1, 1, 1, T]")
                continue

            with torch.no_grad():
                print(f"{file} → waveform shape: {waveform.shape}")
                logit = model(waveform).squeeze().item()
                logits.append(logit)

        except Exception as e:
            print(f"❌ Error processing {file}: {e}")
    return logits


def safe_mean(x):
    return np.mean(x) if len(x) > 0 else float("nan")


# Gather files (wav + mp3) from data/train
real_files = glob.glob("data/train/real/**/*.wav", recursive=True) + \
             glob.glob("data/train/real/**/*.mp3", recursive=True)
ai_files = glob.glob("data/train/ai/**/*.wav", recursive=True) + \
           glob.glob("data/train/ai/**/*.mp3", recursive=True)

print(f"🟩 Found {len(real_files)} real audio files.")
print(f"🟥 Found {len(ai_files)} AI audio files.")

if not real_files:
    print("⚠️ Warning: No real files found.")
if not ai_files:
    print("⚠️ Warning: No AI files found.")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = DualHeadCnn14(pretrained=True)
model.load_state_dict(torch.load("/Users/sumerjoshi/upwork/audio-processing-ai/model/saved_models/Cnn14_16k_mAP_around2000_20250614_1223.pth", map_location=device))
model.eval()
model.to(device)

real_logits = get_logits(model, real_files, device)
ai_logits = get_logits(model, ai_files, device)

avg_real_logit = safe_mean(real_logits)
avg_ai_logit = safe_mean(ai_logits)

print("\n=== Bias Check Results ===")
if real_logits:
    print(f"Real Logit Avg: {avg_real_logit:.4f} | Sigmoid: {torch.sigmoid(torch.tensor(avg_real_logit)).item():.4f}")
else:
    print("⚠️ No real logits computed.")

if ai_logits:
    print(f"AI Logit Avg: {avg_ai_logit:.4f} | Sigmoid: {torch.sigmoid(torch.tensor(avg_ai_logit)).item():.4f}")
else:
    print("⚠️ No AI logits computed.")

check_bias_fixed.py

Lines changed: 83 additions & 0 deletions
import glob

import numpy as np
import torch
from tqdm import tqdm

from model.pretrained.dual_head_cnn14 import DualHeadCnn14Simple
from predict import preprocess_audio as load_audio  # reuse the same logic


def get_logits(model, files, device):
    logits = []
    for file_path in tqdm(files, desc="Evaluating"):
        try:
            waveform = load_audio(file_path=file_path).to(device)  # [1, T]
            print(f"Initial waveform shape: {waveform.shape}")
            if waveform.ndim == 2:
                waveform = waveform.unsqueeze(0)  # [1, 1, T]
            elif waveform.ndim == 3 and waveform.shape[0] == 1 and waveform.shape[1] == 1:
                pass  # already correct
            else:
                print(f"❌ Unexpected input shape: {waveform.shape}")
                continue

            with torch.no_grad():
                input_tensor = waveform  # shape expected by the model
                print(f"{file_path} → waveform shape: {input_tensor.shape}")
                binary_logit, _ = model(input_tensor)
                logit = binary_logit.squeeze().item()
                prob = torch.sigmoid(binary_logit).squeeze().item()

                print(f" → Logit: {logit:.4f}, Sigmoid Prob: {prob:.4f}")
                logits.append(logit)

        except Exception as e:
            print(f"❌ Error processing {file_path}: {e}")
    return logits


def safe_mean(x):
    return np.mean(x) if len(x) > 0 else float("nan")


# Gather files (wav + mp3) from data/train
real_files = glob.glob("data/train/real/**/*.wav", recursive=True) + \
             glob.glob("data/train/real/**/*.mp3", recursive=True)
ai_files = glob.glob("data/train/ai/**/*.wav", recursive=True) + \
           glob.glob("data/train/ai/**/*.mp3", recursive=True)

print(f"🟩 Found {len(real_files)} real audio files.")
print(f"🟥 Found {len(ai_files)} AI audio files.")

if not real_files:
    print("⚠️ Warning: No real files found.")
if not ai_files:
    print("⚠️ Warning: No AI files found.")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

state_dict = torch.load("/Users/sumerjoshi/upwork/audio-processing-ai/model/saved_models/Cnn14_16k_mAP_around2000_samplingAndRealTransformChanges_20250615_0746.pth", map_location=device)
model = DualHeadCnn14Simple(pretrained=False)  # False: the weights come from our own training run
model.load_state_dict(state_dict)
model.eval()
model.to(device)

real_logits = get_logits(model, real_files, device)
ai_logits = get_logits(model, ai_files, device)

avg_real_logit = safe_mean(real_logits)
avg_ai_logit = safe_mean(ai_logits)

print("\n=== Bias Check Results ===")
if real_logits:
    print(f"Real Logit Avg: {avg_real_logit:.4f} | Sigmoid: {torch.sigmoid(torch.tensor(avg_real_logit)).item():.4f}")
else:
    print("⚠️ No real logits computed.")

if ai_logits:
    print(f"AI Logit Avg: {avg_ai_logit:.4f} | Sigmoid: {torch.sigmoid(torch.tensor(avg_ai_logit)).item():.4f}")
else:
    print("⚠️ No AI logits computed.")
