When you run training.py, the script always scans the image folder. If a cached file (resnet_training/full_df_filtered.csv) exists, the script will:
- Load the cached filtered dataframe
- Compare filenames against the current folder
- Detect new images not previously processed
- Run MegaDetector only on those new images
- Append passing images to the cached CSV
- Save the updated CSV
This prevents reprocessing the entire dataset.
- First run: If the cached CSV does not exist, MegaDetector runs on all images and the CSV is created.
- Subsequent runs: Only new images are processed, so the filtering step is much faster.
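The incremental update described above can be sketched roughly as follows. This is a minimal illustration, not the code in training.py: the function name `update_filtered_cache`, the `filename` column, and the `passes_detector` callback (standing in for the MegaDetector call) are all assumptions, and it uses the standard-library `csv` module rather than a dataframe.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable


def update_filtered_cache(
    image_dir: Path,
    cache_csv: Path,
    passes_detector: Callable[[Path], bool],  # hypothetical stand-in for MegaDetector
    extensions: Iterable[str] = (".jpg", ".jpeg", ".png"),
) -> list[str]:
    """Run the detector only on images missing from the cache.

    Returns the filenames that were newly added to the cache.
    """
    # Load filenames already recorded in the cached CSV (empty on the first run).
    cached: list[str] = []
    if cache_csv.exists():
        with cache_csv.open(newline="") as f:
            cached = [row["filename"] for row in csv.DictReader(f)]
    seen = set(cached)

    # Scan the folder and keep only files that are not in the cache yet.
    exts = {e.lower() for e in extensions}
    current = sorted(p for p in image_dir.iterdir() if p.suffix.lower() in exts)
    new_images = [p for p in current if p.name not in seen]

    # Run the (expensive) detector on the new images only.
    newly_passing = [p.name for p in new_images if passes_detector(p)]

    # Write the old rows plus the new passing rows back to the CSV.
    cache_csv.parent.mkdir(parents=True, exist_ok=True)
    with cache_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["filename"])
        writer.writeheader()
        for name in cached + newly_passing:
            writer.writerow({"filename": name})
    return newly_passing
```

Because this sketch records only passing images, an image that failed detection on an earlier run is absent from the CSV and would be checked again on the next run. Whether the real script behaves this way or tracks rejected images separately is not stated here.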
📁 Expected Directory Structure
kaitlyn_catalyst/
├── speciesnet/
│   ├── training.py
│   ├── inference.py
│   ├── splitting.py
│   ├── detector.py
│   ├── dataloader.py
│   └── utilities.py
│
├── images/
│   └── all_species_images/
│       ├── IMG_0001_{site}_{class}.jpg
│       ├── IMG_0002_{site}_{class}.jpg
│       └── ...
│
└── resnet_training/
    ├── full_df_filtered.csv
    ├── last_epoch_predictions_*.json
    └── last_model_state_resnet18_*.pkl
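The filename pattern `IMG_0001_{site}_{class}.jpg` suggests that the site and class label are read from the filename. A minimal sketch of such a parser is shown below. The names `ImageRecord` and `parse_image_name` are hypothetical, and the sketch assumes that neither the site nor the class contains an underscore.

```python
from pathlib import Path
from typing import NamedTuple


class ImageRecord(NamedTuple):
    """Hypothetical record for one image; not the script's actual schema."""
    filename: str
    site: str
    label: str


def parse_image_name(path: str) -> ImageRecord:
    """Split '<prefix>_{site}_{class}.<ext>' into site and class."""
    p = Path(path)
    # Split from the right so underscores in the prefix (e.g. 'IMG_0001') stay intact.
    parts = p.stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename pattern: {p.name}")
    _prefix, site, label = parts
    return ImageRecord(p.name, site, label)
```

For example, `parse_image_name("IMG_0001_siteA_deer.jpg")` yields site `siteA` and label `deer`. Names that contain underscores in the site or class would need a different delimiter or an explicit lookup.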
🧠 MegaDetector Threshold
Configured inside training.py:
"megadetector_conf": 0.2
| Threshold | Behavior |
|-----------|----------|
| 0.1–0.2 | Permissive (keeps more animals, more false positives) |
| 0.3–0.4 | Balanced |
| 0.5–0.6 | Strict (fewer false positives, may miss small animals) |
Change this value if:
- too many empty images are kept → increase threshold
- animals are being missed → decrease threshold
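To make the effect of the threshold concrete, here is a small sketch of how an image could be kept or dropped. It assumes each detection is a dict with `category` and `conf` keys, as in MegaDetector's batch output format, where category "1" denotes an animal. The function name `has_animal` is hypothetical, and whether training.py filters on the animal category only is an assumption.

```python
from typing import Iterable, Mapping


def has_animal(
    detections: Iterable[Mapping],
    conf_threshold: float = 0.2,   # mirrors "megadetector_conf" in the config
    animal_category: str = "1",    # assumed: "1" is the animal class
) -> bool:
    """Return True if any animal detection meets the confidence threshold."""
    return any(
        d.get("category") == animal_category and d.get("conf", 0.0) >= conf_threshold
        for d in detections
    )
```

With a single animal detection at confidence 0.25, the image is kept at a threshold of 0.2 and dropped at 0.3, which is the trade-off the table describes.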
🚀 Running Training
Activate environment:
conda activate speciesnet
Run training:
python training.py
If new images were added, only those will be processed by MegaDetector.
🧪 Running Inference
Single image:
python inference.py --image path/to/image.jpg
Folder:
python inference.py --folder path/to/images
Folder + save CSV:
python inference.py --folder path/to/images --output preds.csv
The script automatically loads the latest saved checkpoint.
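One way to pick the latest checkpoint is sketched below. It relies on the `YYYYMMDD_HHMMSS` timestamp in the filename, which sorts chronologically when sorted as text. The function name `find_latest_checkpoint` is hypothetical, and inference.py may select the file differently (for example, by modification time).

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(
    checkpoint_dir: str,
    pattern: str = "last_model_state_resnet18_*.pkl",
) -> Optional[Path]:
    """Return the checkpoint with the newest timestamp in its name, or None."""
    # Zero-padded YYYYMMDD_HHMMSS timestamps sort chronologically as strings.
    candidates = sorted(Path(checkpoint_dir).glob(pattern), key=lambda p: p.name)
    return candidates[-1] if candidates else None
```

Sorting by name rather than by file modification time keeps the choice stable if checkpoints are copied between machines.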
💾 Model Checkpoints
Saved in:
resnet_training/last_model_state_resnet18_YYYYMMDD_HHMMSS.pkl
Checkpoint contains:
- model_state
- optimizer_state
- class_names
- training config
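A checkpoint with these fields could be written along the following lines. This is a sketch under assumptions: it uses the standard-library `pickle` module because of the `.pkl` extension, whereas PyTorch projects commonly use `torch.save`. The function name `save_checkpoint` and the `config` key are hypothetical. In a real run, `model_state` and `optimizer_state` would typically come from `model.state_dict()` and `optimizer.state_dict()`.

```python
import pickle
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def save_checkpoint(
    out_dir: str,
    model_state: Any,
    optimizer_state: Any,
    class_names: list,
    config: dict,
    now: Optional[datetime] = None,
) -> Path:
    """Write a timestamped checkpoint file and return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"last_model_state_resnet18_{stamp}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "model_state": model_state,
        "optimizer_state": optimizer_state,
        "class_names": class_names,
        "config": config,  # hypothetical key name for the training config
    }
    with path.open("wb") as f:
        pickle.dump(payload, f)
    return path
```

Storing `class_names` next to the weights lets inference map output indices back to species names without access to the training data.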
🔁 Resume Training (Optional)
- Load the checkpoint
- Restore model + optimizer state
- Continue training loop
(The current script saves checkpoints but does not resume automatically; this can be added.)
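The resume steps could look roughly like the sketch below. It is not part of the current script. It assumes the checkpoint was written with `pickle` and uses the keys listed under Model Checkpoints; the function name `resume_from_checkpoint` is hypothetical. The `model` and `optimizer` arguments only need a `load_state_dict` method, which PyTorch modules and optimizers provide.

```python
import pickle
from pathlib import Path
from typing import Any


def resume_from_checkpoint(checkpoint_path: str, model: Any, optimizer: Any) -> dict:
    """Restore model and optimizer state; return the full checkpoint dict."""
    # Step 1: load the checkpoint (assumed to be a pickled dict).
    with Path(checkpoint_path).open("rb") as f:
        checkpoint = pickle.load(f)

    # Step 2: restore model and optimizer state.
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])

    # Step 3: the caller continues the training loop with the restored objects.
    return checkpoint
```

The listed checkpoint contents do not include an epoch counter, so a resumed run would either restart its epoch count or need that value added to the checkpoint.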
📊 Outputs
After training completes:
resnet_training/
├── full_df_filtered.csv
├── last_epoch_predictions_train.json
├── last_epoch_predictions_valid.json
├── last_epoch_predictions_holdout.json
└── last_model_state_resnet18_*.pkl
🧩 Conceptual Flow
Raw Images
↓
build_df_from_folder()
↓
MegaDetector (incremental filtering)
↓
Train/Val/Holdout split
↓
ResNet18 training
↓
Checkpoint saved
↓
Inference
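The Train/Val/Holdout step in the flow above can be illustrated with a simple seeded random split. This is only a sketch: the function name `split_three_way`, the 15% fractions, and the purely random strategy are assumptions, and splitting.py may instead split by site or stratify by class.

```python
import random
from typing import Sequence


def split_three_way(
    items: Sequence,
    valid_frac: float = 0.15,    # assumed fraction, not taken from the script
    holdout_frac: float = 0.15,  # assumed fraction, not taken from the script
    seed: int = 0,
):
    """Shuffle items reproducibly and return (train, valid, holdout) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_valid = int(n * valid_frac)
    n_holdout = int(n * holdout_frac)
    n_train = n - n_valid - n_holdout

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    holdout = shuffled[n_train + n_valid:]
    return train, valid, holdout
```

A fixed seed keeps the three subsets stable across runs, which matters when new images are appended to the cached CSV and training is rerun.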
- If an image fails MegaDetector, it will not appear in full_df_filtered.csv.
- Adding new images does NOT overwrite previous results.
- To rebuild the filtering from scratch, delete resnet_training/full_df_filtered.csv, then rerun training.