Box_Counter is a computer vision project that uses drone imagery and depth estimation to count boxes in storage bins. The workflow involves:
- Preprocessing drone images of bins
- Estimating depth maps using pre-trained models
- Calibrating depth maps using known barcode positions
- Constructing point clouds to count boxes
Final report can be found here
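
One way to picture the calibration step in the workflow above is as a linear fit that maps the model's relative depth values onto known distances at barcode pixels. The sketch below is only an illustration under that assumption: `fit_scale_shift`, the pixel coordinates, and the distances are hypothetical and are not taken from the project code. Some depth models predict inverse depth, in which case the same fit would be done in inverse-depth space.

```python
import numpy as np


def fit_scale_shift(relative, metric):
    """Least-squares fit of metric ~ scale * relative + shift (hypothetical helper)."""
    relative = np.asarray(relative, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Design matrix [relative, 1] gives a linear fit with an intercept.
    design = np.stack([relative, np.ones_like(relative)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, metric, rcond=None)
    return float(scale), float(shift)


# Invented relative depth map, used only for illustration.
depth_map = np.array([[0.2, 0.4], [0.6, 0.8]])
# Invented barcode pixels (row, col) with assumed known distances in metres.
barcode_pixels = [(0, 0), (0, 1), (1, 1)]
known_distances = [1.4, 1.8, 2.6]

samples = [depth_map[r, c] for r, c in barcode_pixels]
scale, shift = fit_scale_shift(samples, known_distances)
calibrated = scale * depth_map + shift  # depth map in metric units
```

With more barcode samples than unknowns, the least-squares fit averages out noise in individual depth readings.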
- Setup environment:
    python -m venv venv && source venv/bin/activate && pip install -r venv_requirements.txt
- Run complete pipeline:
    python src/main.py
- Run with specific depth model:
    python src/main.py --depth_model="depth_anything"
- Run individual components:
  - Image cropping:
      python src/preprocessing/crop_images.py
  - Image registration:
      python src/preprocessing/register_bin_images.py
  - Depth estimation:
      python src/depth_map_estimation/depth_estimations.py
  - Depth calibration:
      python src/depth_map_estimation/extract_pixel_depths.py
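
A flag such as `--depth_model` could be parsed along the following lines. This is a sketch only: the accepted choices, the default value, and the `parse_args` helper are assumptions for illustration and may not match what `src/main.py` actually does.

```python
import argparse


def parse_args(argv=None):
    """Parse pipeline options (hypothetical helper, not the project's code)."""
    parser = argparse.ArgumentParser(description="Box counting pipeline (sketch)")
    parser.add_argument(
        "--depth_model",
        choices=["midas", "depth_anything"],  # assumed option names
        default="midas",  # assumed default
        help="Pre-trained depth model to use",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--depth_model=depth_anything"])
print(args.depth_model)
```

Restricting the flag with `choices` makes a misspelled model name fail at startup rather than partway through the pipeline.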
- Create virtual environment
    mamba create -n vlm_env python=3.10
    mamba activate vlm_env
    pip install -r requirements_vlm.txt
- Count boxes with ChatGPT API
    python -m src.count_boxes --model gpt --cache true
- Count boxes with local VLM
    python -m src.count_boxes --model Qwen/Qwen2.5-VL-7B-Instruct
- Fine-tune VLM for box counting
    python -m src.fine_tune
# Standard library imports first
import os
import sys
# Third-party imports next
import numpy as np
import torch
import cv2
# Local module imports last
from src.depth_map_estimation.utils import process_image
- Use docstrings with parameter descriptions
- Document function purpose, parameters, and return values
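
As an illustration of the docstring guidelines above, here is a small function documented in Google style. The choice of Google style and the `crop_box` function itself are assumptions for the example; the project may prefer a different docstring convention.

```python
def crop_box(width: int, height: int, margin: int) -> tuple:
    """Compute the crop rectangle that removes a uniform border.

    Hypothetical helper used only to illustrate docstring layout.

    Args:
        width: Image width in pixels.
        height: Image height in pixels.
        margin: Border thickness to remove from each side, in pixels.

    Returns:
        A (left, top, right, bottom) tuple in pixel coordinates.

    Raises:
        ValueError: If the margin is negative or leaves no pixels.
    """
    if margin < 0 or 2 * margin >= min(width, height):
        raise ValueError(f"margin {margin} is invalid for a {width}x{height} image")
    return (margin, margin, width - margin, height - margin)
```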
- Functions/variables: snake_case
- Classes: PascalCase
- Constants: UPPER_CASE
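
A short snippet showing the three naming conventions side by side. All names and the capacity value are invented for illustration.

```python
MAX_BOXES_PER_BIN = 50  # constant: UPPER_CASE (illustrative value)


class BinImage:  # class: PascalCase
    def __init__(self, bin_id: str, box_count: int = 0):
        self.bin_id = bin_id  # attributes and variables: snake_case
        self.box_count = box_count


def is_over_capacity(bin_image: BinImage) -> bool:  # function: snake_case
    return bin_image.box_count > MAX_BOXES_PER_BIN
```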
- Use descriptive error messages
- Handle exceptions with try/except blocks
- Return success/failure indicators from functions
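
The error-handling guidelines above might look like this in practice. The sketch assumes a JSON calibration file; `load_calibration` and its `(success, params)` return shape are hypothetical, chosen only to show descriptive messages, `try`/`except`, and a success indicator together.

```python
import json


def load_calibration(path: str) -> tuple:
    """Load calibration parameters, returning (success, params)."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return True, json.load(handle)
    except FileNotFoundError:
        # Descriptive message: say what was missing and where.
        print(f"Calibration file not found: {path}")
    except json.JSONDecodeError as exc:
        print(f"Calibration file {path} is not valid JSON: {exc}")
    return False, {}


ok, params = load_calibration("data/depth/does_not_exist.json")
if not ok:
    print("Skipping calibration for this bin")
```

Catching only the expected exception types keeps unrelated bugs visible instead of hiding them behind a generic failure flag.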
- Add type hints to function signatures where possible:
    def process_image(image_path: str, output_path: str) -> bool:
        """Process an image and return success state"""
- Store data files in the data/ directory
- Use relative paths for data access
- Document data format in function docstrings
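
A minimal sketch of the data-handling guidelines above, using `pathlib` for relative paths and stating the data format in the docstring. The `depth_map_path` helper and the rule that a depth map is named after its image stem are assumptions for illustration.

```python
from pathlib import Path

# Relative to the project root, which is assumed to be the working directory.
DATA_DIR = Path("data")


def depth_map_path(image_name: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the expected location of the depth map for an image.

    Data format: depth maps are .pfm files stored under <data_dir>/depth/.
    Naming the file after the image stem is an assumption of this sketch.
    """
    return data_dir / "depth" / (Path(image_name).stem + ".pfm")


print(depth_map_path("bin_01.jpg").as_posix())
```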
- src/main.py - Main pipeline orchestration
- src/preprocessing/ - Image preprocessing modules
  - preprocess_raw_data.py - Creates train/test splits and groups images by bin
  - crop_images.py - Removes borders from wide-angle camera images
  - register_bin_images.py - Aligns multiple images of the same bin
- src/depth_map_estimation/ - Depth estimation and processing
  - depth_estimations.py - Generates depth maps using MiDaS or Depth Anything models
  - extract_pixel_depths.py - Calibrates depth maps using barcode positions
  - point_cloud_construction.py - Template for 3D point cloud creation (in development)
- data/ - Contains all data files
  - images/ - Various image directories (original, processed, cropped)
  - depth/ - Contains depth maps (.pfm files) and calibration parameters (json)
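
For readers unfamiliar with the .pfm files mentioned above, the sketch below reads and writes single-channel PFM data following the common PFM conventions (a `Pf` header, a negative scale for little-endian data, rows stored bottom to top). `read_pfm` and `write_pfm` are hypothetical helpers; the project may load depth maps differently.

```python
import numpy as np


def write_pfm(path, depth):
    """Write a 2D float array as a grayscale little-endian PFM file."""
    depth = np.asarray(depth, dtype="<f4")
    height, width = depth.shape
    with open(path, "wb") as handle:
        # "Pf" marks a single channel; a negative scale means little-endian.
        handle.write(f"Pf\n{width} {height}\n-1.0\n".encode("ascii"))
        # PFM stores rows bottom to top, so flip vertically before writing.
        handle.write(np.flipud(depth).tobytes())


def read_pfm(path):
    """Read a grayscale PFM file into a (height, width) float32 array."""
    with open(path, "rb") as handle:
        magic = handle.readline().decode("ascii").strip()
        if magic != "Pf":
            raise ValueError(f"Expected grayscale PFM header 'Pf', got {magic!r}")
        width, height = map(int, handle.readline().split())
        scale = float(handle.readline().decode("ascii").strip())
        dtype = "<f4" if scale < 0 else ">f4"  # sign of scale encodes byte order
        data = np.frombuffer(handle.read(), dtype=dtype, count=width * height)
    # Undo the bottom-to-top row order and return a native float32 copy.
    return np.flipud(data.reshape(height, width)).astype(np.float32)
```

The sketch ignores the magnitude of the scale value and does not handle three-channel (`PF`) files.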