missarah96/kaitlyn_catalyst

Gorongosa National Park

How New Images Are Processed

When you run training.py, the script always scans the image folder. If the cached file resnet_training/full_df_filtered.csv exists, the script will:

  • Load the cached filtered dataframe

  • Compare filenames against the current folder

  • Detect new images not previously processed

  • Run MegaDetector only on those new images

  • Append passing images to the cached CSV

  • Save the updated CSV

This avoids reprocessing the entire dataset.

First run: if the cached CSV does not exist, MegaDetector runs on all images and the CSV is created.

Subsequent runs: only new images are processed, so these runs are much faster.
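
The comparison step described above can be sketched as follows. This is a minimal illustration, not the code in training.py: it uses the standard-library csv module in place of a dataframe, and the column name "filename", the extension set, and both function names are assumptions made for the example.

```python
import csv
from pathlib import Path

# Assumed set of image suffixes; adjust to match the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def load_cached_filenames(csv_path, column="filename"):
    """Return the set of filenames already recorded in the cached CSV.

    A missing CSV is treated as a first run, so the result is an empty set.
    The column name is hypothetical; the real CSV may use a different one.
    """
    csv_path = Path(csv_path)
    if not csv_path.exists():
        return set()
    with csv_path.open(newline="") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def find_new_images(image_dir, cached_filenames):
    """Return sorted image filenames in image_dir that are not in the cache."""
    current = {
        path.name
        for path in Path(image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    }
    return sorted(current - set(cached_filenames))
```

Only the filenames returned by find_new_images would then need to go through MegaDetector. One consequence of comparing against the filtered CSV alone: images that failed the detector earlier are absent from it, so this sketch would pick them up as new again unless they are tracked separately.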

📁 Expected Directory Structure

kaitlyn_catalyst/
├── speciesnet/
│   ├── training.py
│   ├── inference.py
│   ├── splitting.py
│   ├── detector.py
│   ├── dataloader.py
│   └── utilities.py
│
├── images/
│   └── all_species_images/
│       ├── IMG_0001_{site}_{class}.jpg
│       ├── IMG_0002_{site}_{class}.jpg
│       └── ...
│
└── resnet_training/
    ├── full_df_filtered.csv
    ├── last_epoch_predictions_*.json
    └── last_model_state_resnet18_*.pkl
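
The image names in the tree above carry the site and class, so both can be recovered from the filename. The sketch below shows one way to do that. It assumes the site contains no underscore (the class may), and the function name and the example site and class values are hypothetical; the repository's own parsing may differ.

```python
import re
from pathlib import Path

# IMG_<digits>_<site>_<class>; the site is assumed to contain no underscore,
# so everything after it is treated as the class label.
FILENAME_PATTERN = re.compile(r"^IMG_(?P<index>\d+)_(?P<site>[^_]+)_(?P<label>.+)$")


def parse_image_name(filename):
    """Return (site, class_label) parsed from an image filename."""
    stem = Path(filename).stem  # drop the directory and the extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return match.group("site"), match.group("label")


if __name__ == "__main__":
    # Hypothetical site and class values, for illustration only.
    print(parse_image_name("IMG_0001_siteA_baboon.jpg"))
```

Files that do not follow the pattern raise a ValueError rather than being silently mislabelled.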

🧠 MegaDetector Threshold

Configured inside training.py: "megadetector_conf": 0.2

Typical values:

Threshold	Behavior
0.1–0.2	Permissive (keeps more animals, more false positives)
0.3–0.4	Balanced
0.5–0.6	Strict (fewer false positives, may miss small animals)

Change this value if:

  • too many empty images are kept → increase threshold
  • animals are being missed → decrease threshold
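
To make the effect of the threshold concrete, here is a small sketch of the keep-or-drop decision. It assumes detections in MegaDetector's standard output shape (a list of dicts with "category" and "conf", where category "1" means animal) and assumes the filter keeps an image when at least one animal detection meets the threshold. Whether training.py applies exactly this rule is not confirmed here, and the function name is hypothetical.

```python
ANIMAL_CATEGORY = "1"  # MegaDetector's category ID for animals


def has_animal(detections, threshold=0.2):
    """Return True if any animal detection has confidence >= threshold."""
    return any(
        det.get("category") == ANIMAL_CATEGORY and det.get("conf", 0.0) >= threshold
        for det in detections
    )


if __name__ == "__main__":
    # Made-up detections: one weak animal detection and one confident person.
    detections = [
        {"category": "1", "conf": 0.27},
        {"category": "2", "conf": 0.91},
    ]
    print(has_animal(detections, threshold=0.2))  # kept at the permissive setting
    print(has_animal(detections, threshold=0.5))  # dropped at the strict setting
```

Raising the threshold drops images whose best animal detection is weak, which removes more empty frames but can also discard small or distant animals.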

🚀 Running Training

Activate environment:

conda activate speciesnet

Run training:

python training.py

If new images were added, only those will be processed by MegaDetector.

🧪 Running Inference

Single image:

python inference.py --image path/to/image.jpg

Folder:

python inference.py --folder path/to/images

Folder + save CSV:

python inference.py --folder path/to/images --output preds.csv

The script automatically loads the latest saved checkpoint.
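
One way to pick the latest checkpoint is to rely on the timestamp in the filename: the YYYYMMDD_HHMMSS suffix sorts chronologically when sorted as text. The sketch below assumes that naming scheme and uses a hypothetical function name; inference.py may select the file differently (for example, by modification time).

```python
from pathlib import Path


def latest_checkpoint(checkpoint_dir="resnet_training",
                      prefix="last_model_state_resnet18_"):
    """Return the checkpoint path with the most recent timestamp in its name."""
    candidates = sorted(
        Path(checkpoint_dir).glob(f"{prefix}*.pkl"),
        key=lambda path: path.name,  # timestamp suffix sorts chronologically
    )
    if not candidates:
        raise FileNotFoundError(f"no checkpoints found in {checkpoint_dir}")
    return candidates[-1]
```

Sorting by name rather than by modification time keeps the choice stable if checkpoint files are copied between machines.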

💾 Model Checkpoints

Saved in:

resnet_training/last_model_state_resnet18_YYYYMMDD_HHMMSS.pkl

Each checkpoint contains:

  • model_state

  • optimizer_state

  • class_names

  • training config

🔁 Resume Training (Optional)

Resuming from a checkpoint would involve these steps:

  • Load the checkpoint

  • Restore model + optimizer state

  • Continue training loop

(The current script saves checkpoints but does not resume automatically; this could be added.)

📊 Outputs

After training completes:

resnet_training/
├── full_df_filtered.csv
├── last_epoch_predictions_train.json
├── last_epoch_predictions_valid.json
├── last_epoch_predictions_holdout.json
└── last_model_state_resnet18_*.pkl

🧩 Conceptual Flow

Raw Images
    ↓
build_df_from_folder()
    ↓
MegaDetector (incremental filtering)
    ↓
Train/Val/Holdout split
    ↓
ResNet18 training
    ↓
Checkpoint saved
    ↓
Inference

Important Notes

  • If an image fails MegaDetector, it will not appear in full_df_filtered.csv.

  • Adding new images does NOT overwrite previous results.

  • To completely rebuild the filtering, delete resnet_training/full_df_filtered.csv, then rerun training.
