Gorongosa National Park

How New Images Are Processed

When you run training.py, the script always scans the image folder. If a cached filtered dataframe exists (resnet_training/full_df_filtered.csv), the script will:

Load the cached filtered dataframe
Compare filenames against the current folder
Detect new images not previously processed
Run MegaDetector only on those new images
Append images that pass the MegaDetector threshold to the cached CSV
Save the updated CSV
This avoids rerunning MegaDetector on the entire dataset.

First run: if the cached CSV does not exist, MegaDetector runs on all images and the CSV is created.

Subsequent runs: only new images are processed, so these runs are much faster.
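The incremental steps above can be sketched as follows. This is a minimal, standard-library-only sketch, not the code in training.py: `incremental_filter` and `detect_fn` are hypothetical names, `detect_fn` stands in for a MegaDetector call that returns an image's highest animal-detection confidence, and the cache is assumed to have `filename` and `max_conf` columns.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Set


def incremental_filter(
    image_dir: Path,
    cache_csv: Path,
    detect_fn: Callable[[Path], float],  # hypothetical stand-in for MegaDetector
    conf_threshold: float = 0.2,
    extensions: Iterable[str] = (".jpg", ".jpeg", ".png"),
) -> List[str]:
    """Run detect_fn only on images not yet in cache_csv; append those that pass.

    Returns the filenames appended during this call.
    """
    exts = {e.lower() for e in extensions}

    # Load filenames already recorded in the cached CSV (empty on the first run).
    cached: Set[str] = set()
    if cache_csv.exists():
        with cache_csv.open(newline="") as f:
            cached = {row["filename"] for row in csv.DictReader(f)}

    # Compare against the current folder to find images not seen before.
    new_files = sorted(
        p.name
        for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in exts and p.name not in cached
    )

    # Run the detector only on the new images and keep those above the threshold.
    passed = []
    for name in new_files:
        conf = detect_fn(image_dir / name)
        if conf >= conf_threshold:
            passed.append((name, conf))

    # Append passing images; write the header only when creating the file.
    cache_csv.parent.mkdir(parents=True, exist_ok=True)
    write_header = not cache_csv.exists()
    with cache_csv.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["filename", "max_conf"])
        writer.writerows(passed)
    return [name for name, _ in passed]
```

Because only passing images are written to the cache, this sketch re-checks previously rejected images on every run; training.py may track rejected images differently.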

📁 Expected Directory Structure

kaitlyn_catalyst/
├── speciesnet/
│   ├── training.py
│   ├── inference.py
│   ├── splitting.py
│   ├── detector.py
│   ├── dataloader.py
│   └── utilities.py
│
├── images/
│   └── all_species_images/
│       ├── IMG_0001_{site}_{class}.jpg
│       ├── IMG_0002_{site}_{class}.jpg
│       └── ...
│
└── resnet_training/
    ├── full_df_filtered.csv
    ├── last_epoch_predictions_*.json
    └── last_model_state_resnet18_*.pkl
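The `{site}` and `{class}` placeholders in the image names suggest that labels are read from the filename. A hypothetical parser for that pattern is sketched below; it assumes neither the site nor the class contains an underscore (a class such as `vervet_monkey` would be split incorrectly), and `parse_image_name` / `ImageRecord` are illustrative names, not functions from this repository.

```python
from pathlib import Path
from typing import NamedTuple


class ImageRecord(NamedTuple):
    filename: str
    site: str
    label: str  # the {class} part of the filename


def parse_image_name(filename: str) -> ImageRecord:
    """Split 'IMG_<id>_<site>_<class>.jpg' into its site and class parts."""
    stem = Path(filename).stem
    # Split from the right so the 'IMG_<id>' prefix stays in one piece.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"filename does not match IMG_<id>_<site>_<class>: {filename}")
    _prefix, site, label = parts
    return ImageRecord(filename, site, label)
```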

🧠 MegaDetector Threshold

The threshold is set in training.py through the config entry "megadetector_conf": 0.2.

Typical values:

Threshold	Behavior
0.1–0.2	Permissive (keeps more animals, more false positives)
0.3–0.4	Balanced
0.5–0.6	Strict (fewer false positives, may miss small animals)

Change this value if:

too many empty images are kept → increase threshold
animals are being missed → decrease threshold
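To illustrate what the threshold compares against, the sketch below checks one image entry in MegaDetector's batch-output JSON format (a `detections` list whose items carry `category`, `conf`, and `bbox`). It assumes an image is kept when any animal detection reaches the threshold; the exact criterion training.py uses is not stated here, and `passes_threshold` is a hypothetical name.

```python
from typing import Any, Dict

# In MegaDetector's default detection_categories, "1" is animal,
# "2" is person, and "3" is vehicle.
ANIMAL_CATEGORY = "1"


def passes_threshold(image_result: Dict[str, Any], conf_threshold: float = 0.2) -> bool:
    """True if any animal detection meets or exceeds conf_threshold."""
    # Images that failed to load may have no "detections" entry (or None).
    detections = image_result.get("detections") or []
    return any(
        d.get("category") == ANIMAL_CATEGORY and d.get("conf", 0.0) >= conf_threshold
        for d in detections
    )
```

A detection with a confidence of 0.25 passes at 0.2 but is dropped at 0.3, which is how raising the threshold removes borderline (often empty) images.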

🚀 Running Training

Activate the environment:

conda activate speciesnet

Run training:

python training.py

If new images were added, only those will be processed by MegaDetector.

🧪 Running Inference

Single image:

python inference.py --image path/to/image.jpg

Folder:

python inference.py --folder path/to/images

Folder, saving predictions to a CSV:

python inference.py --folder path/to/images --output preds.csv
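For reference, a command-line interface that accepts the three invocations above could be declared with `argparse` roughly as follows. This is a sketch, not inference.py's actual parser; treating `--image` and `--folder` as mutually exclusive and required is an assumption.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Classify camera-trap images.")
    # Exactly one input source must be given (assumed behavior).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", help="path to a single image")
    source.add_argument("--folder", help="path to a folder of images")
    # Optional CSV destination for predictions.
    parser.add_argument("--output", help="write predictions to this CSV file")
    return parser.parse_args(argv)
```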

The script automatically loads the latest saved checkpoint.
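One way to pick the latest checkpoint, assuming "latest" means the newest timestamp in the `last_model_state_resnet18_YYYYMMDD_HHMMSS.pkl` filename (inference.py might instead use file modification times); `latest_checkpoint` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(ckpt_dir: Path, arch: str = "resnet18") -> Optional[Path]:
    """Return the newest last_model_state_<arch>_YYYYMMDD_HHMMSS.pkl, or None."""
    # The fixed-width, zero-padded timestamp sorts lexicographically in time order.
    candidates = sorted(ckpt_dir.glob(f"last_model_state_{arch}_*.pkl"))
    return candidates[-1] if candidates else None
```

For example, `latest_checkpoint(Path("resnet_training"))` returns the most recent checkpoint path, or `None` before the first training run.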

💾 Model Checkpoints

Saved in:

resnet_training/last_model_state_resnet18_YYYYMMDD_HHMMSS.pkl

Each checkpoint contains:

model_state
optimizer_state
class_names
training config

🔁 Resume Training (Optional)

Load the checkpoint
Restore the model and optimizer state
Continue the training loop

(The current script saves checkpoints but does not resume automatically; this would need to be added.)
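The three resume steps could look roughly like the PyTorch sketch below. It assumes the checkpoint is a dict written with `torch.save` using the keys listed under Model Checkpoints (`model_state`, `optimizer_state`, `class_names`); the `config` key name and the functions `save_checkpoint`, `load_checkpoint`, and `resume_training` are hypothetical. If the real .pkl was written with plain `pickle`, or stores arbitrary Python objects, loading a trusted file may need `pickle.load` or `torch.load(..., weights_only=False)` instead.

```python
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple

import torch
from torch import nn


def save_checkpoint(path: Path, model: nn.Module, optimizer: torch.optim.Optimizer,
                    class_names: Sequence[str], config: dict) -> None:
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "class_names": list(class_names),
            "config": dict(config),  # hypothetical key for the training config
        },
        path,
    )


def load_checkpoint(path: Path, model: nn.Module, optimizer: torch.optim.Optimizer,
                    device: str = "cpu") -> Tuple[List[str], dict]:
    # Move the model first so the optimizer state is cast to the right device.
    model.to(device)
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["class_names"], ckpt.get("config", {})


def resume_training(path: Path, model: nn.Module, optimizer: torch.optim.Optimizer,
                    train_loader: Iterable, loss_fn: nn.Module, epochs: int,
                    device: str = "cpu") -> Tuple[List[str], dict]:
    class_names, config = load_checkpoint(path, model, optimizer, device)
    model.train()
    for _ in range(epochs):  # epoch count restarts; no epoch is stored in the checkpoint
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return class_names, config
```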

📊 Outputs

After training completes:

resnet_training/
├── full_df_filtered.csv
├── last_epoch_predictions_train.json
├── last_epoch_predictions_valid.json
├── last_epoch_predictions_holdout.json
└── last_model_state_resnet18_*.pkl

🧩 Conceptual Flow

Raw Images
    ↓
build_df_from_folder()
    ↓
MegaDetector (incremental filtering)
    ↓
Train/Val/Holdout split
    ↓
ResNet18 training
    ↓
Checkpoint saved
    ↓
Inference
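The Train/Val/Holdout step can be implemented in several ways. One common option for camera-trap data, shown here only as an illustration, is to split by camera site (the `{site}` part of the filename) so that images from one site never appear in two splits. Whether splitting.py groups by site, and which fractions it uses, is not stated here; `split_by_site` and the 70/15/15 default are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List


def split_by_site(
    file_to_site: Dict[str, str],
    train_frac: float = 0.7,
    val_frac: float = 0.15,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole sites to train/valid/holdout; holdout gets the remaining sites."""
    sites = sorted(set(file_to_site.values()))
    random.Random(seed).shuffle(sites)  # seeded, so the split is reproducible
    n_train = round(train_frac * len(sites))
    n_val = round(val_frac * len(sites))

    site_split = {}
    for i, site in enumerate(sites):
        if i < n_train:
            site_split[site] = "train"
        elif i < n_train + n_val:
            site_split[site] = "valid"
        else:
            site_split[site] = "holdout"

    splits: Dict[str, List[str]] = defaultdict(list)
    for filename, site in sorted(file_to_site.items()):
        splits[site_split[site]].append(filename)
    return dict(splits)
```

Because whole sites are assigned, the fractions apply to sites rather than images, so split sizes in images can differ from 70/15/15.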

Important Notes

If an image fails MegaDetector, it will not appear in full_df_filtered.csv. Adding new images does NOT overwrite previous results. To rebuild the filtering from scratch, delete resnet_training/full_df_filtered.csv and rerun training.
