GPU-accelerated microarchitecture simulator for digital vision chips with integrated power and area modeling.
Glimpse is a high-performance simulator designed for digital vision processing architectures. It provides GPU-accelerated simulation capabilities for vision chip microarchitectures, with comprehensive power and area modeling using CACTI and McPAT.
- High-Performance Simulation: GPU-accelerated execution of the simulated chip
- Real-time Processing: Live camera feed processing with OpenCV integration
- VLIW Architecture Support: Variable instruction-level parallelism (1-4 slots)
- Pipelining Support: 3-stage pipeline implementation
- Power & Area Modeling: Integrated CACTI-based power and area analysis
- Flexible Bit Depth: Support for 1-8 bit pixel processing
- Edge detection (1-bit)
- Prewitt edge detection (6-bit)
- Image smoothing (1-bit and 6-bit)
- Image thinning (1-bit)
- Binary belief propagation Ising model
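To make the 1-bit edge detection entry more concrete, here is a minimal host-side C++ sketch of a per-pixel neighbor comparison. It is not the simulator's `.vis` program: the edge rule (a set pixel with at least one clear up/down/left/right neighbor), the zero border, and the names `pixel_at` and `edge_detect_one_bit` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read a 0/1 pixel, treating out-of-range coordinates as 0.
static std::uint8_t pixel_at(const std::vector<std::uint8_t>& img, int dim, int x, int y) {
    if (x < 0 || y < 0 || x >= dim || y >= dim) return 0;
    return img[static_cast<std::size_t>(y) * dim + x];
}

// Assumed edge rule: a pixel is an edge when it is set and at least one of
// its four direct neighbors is clear. Input and output hold 0/1 values.
std::vector<std::uint8_t> edge_detect_one_bit(const std::vector<std::uint8_t>& img, int dim) {
    assert(dim > 0 && img.size() == static_cast<std::size_t>(dim) * dim);
    std::vector<std::uint8_t> out(img.size(), 0);
    for (int y = 0; y < dim; ++y) {
        for (int x = 0; x < dim; ++x) {
            const std::uint8_t center = pixel_at(img, dim, x, y);
            // 1 only if all four neighbors are set.
            const std::uint8_t all_set = static_cast<std::uint8_t>(
                pixel_at(img, dim, x, y - 1) & pixel_at(img, dim, x, y + 1) &
                pixel_at(img, dim, x - 1, y) & pixel_at(img, dim, x + 1, y));
            out[static_cast<std::size_t>(y) * dim + x] =
                static_cast<std::uint8_t>(center & (all_set ^ 1));
        }
    }
    return out;
}
```

On a fully set 3x3 image this marks every border pixel and leaves the center clear, because only the center has four set neighbors.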
- CUDA Toolkit (12.2.0 or later)
- OpenCV 4.x
- GCC 10.3.0 or later
- NVIDIA GPU with compute capability 7.5+
- Camera (for real-time processing)
- Clone the repository: `git clone <repository-url>`, then `cd glimpse`
- Install dependencies:
  - Install CUDA Toolkit
  - Install OpenCV 4.x development libraries
  - Ensure pkg-config can find OpenCV
  - If on Imperial College London lab machines, run `source setup_project.sh` instead
- Build the project: `make clean all`
Run all test programs on an image:

`./build/main`

`./build/main [OPTIONS]`

| Option | Description | Default |
|---|---|---|
| `-i, --image` | Input image file | `images/whitecat_600.jpg` |
| `-d, --dimension` | Image dimension (square) | 128 |
| `-p, --program` | Program file (.vis) | `programs/1_vliw_slot/edge_detection_one_bit.vis` |
| `-g, --use-gpu` | Enable GPU acceleration | false |
| `-w, --vliw-width` | VLIW width (1-4) | 1 |
| `-r, --real-time` | Real-time camera processing | false |
| `-l, --pipelining` | Enable pipelining | false |
| `-b, --bits` | Bits per pixel (1-8) | 1 |
| `-t, --twos-complement` | Two's complement output | false |
| `--display-dimension` | Display window size | 1000 |
| `-h, --help` | Show help message | - |
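The `--bits` and `--twos-complement` options are easier to reason about with a small example. The C++ sketch below is a guess at the behavior, not the simulator's code: it assumes quantization keeps the most significant bits of an 8-bit grayscale value and that two's complement output reinterprets the low `bits` bits as a signed number. The function names `quantize` and `as_twos_complement` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scheme: keep the `bits` most significant bits of an 8-bit value.
std::uint8_t quantize(std::uint8_t gray, int bits) {
    assert(bits >= 1 && bits <= 8);
    return static_cast<std::uint8_t>(gray >> (8 - bits));
}

// Interpret the low `bits` bits of `value` as a two's complement number.
int as_twos_complement(std::uint8_t value, int bits) {
    assert(bits >= 1 && bits <= 8);
    const int mask = (1 << bits) - 1;
    const int sign_bit = 1 << (bits - 1);
    const int v = value & mask;
    // If the sign bit is set, subtract 2^bits to obtain the negative value.
    return (v & sign_bit) ? v - (1 << bits) : v;
}
```

Under these assumptions, a gray level of 200 at 6 bits becomes 50, and the 6-bit pattern `111111` reads as -1 when treated as two's complement.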
GPU-accelerated real-time edge detection:

`./build/main --dimension 1024 --use-gpu --real-time --program programs/1_vliw_slot/edge_detection_one_bit.vis --bits 1`

High-resolution belief propagation:

`./build/main --dimension 512 --use-gpu --real-time --program programs/1_vliw_slot/binary_bp_ising_model.vis --bits 8`

Batch processing with custom image:

`./build/main --image images/custom.jpg --dimension 256 --use-gpu`

Vision programs are located in the `programs/` directory, organized by VLIW width:
- `1_vliw_slot/` - Single instruction slot programs
- `2_vliw_slot/` - Dual instruction slot programs
- `3_vliw_slot/` - Triple instruction slot programs
- `4_vliw_slot/` - Quad instruction slot programs
- `pipelining/` - Pipelined implementations

- `edge_detection_one_bit.vis` - Simple edge detection
- `prewitt_edge_detection_*.vis` - Prewitt edge detection
- `smoothing_*.vis` - Image smoothing filters
- `thinning_one_bit.vis` - Morphological thinning
- `binary_bp_ising_model.vis` - Belief propagation
The simulator generates:
- Performance Metrics:
  - Processing time per frame
  - Frame rate (FPS)
  - Average chip performance (μs)
  - Instruction count and utilization
- Power Analysis:
  - Dynamic power consumption
  - Leakage power (subthreshold and gate)
  - Total power (W)
- Area Analysis:
  - Compute area (μm²)
  - Memory area (μm²)
  - Total chip area
- Processed Images:
  - Output images saved to the `outputimages/` directory
  - Original quantized images for comparison
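The reported metrics relate to each other through simple arithmetic, sketched below in C++. This is not the simulator's reporting code. The struct and function names are hypothetical, and the utilization definition (issued operations divided by available VLIW slots) is an assumption about what "utilization" means here.

```cpp
#include <cassert>

// Hypothetical container for the three reported power components, in watts.
struct PowerBreakdown {
    double dynamic_w;
    double subthreshold_leakage_w;
    double gate_leakage_w;
};

// Total power is the sum of dynamic power and both leakage components.
double total_power_w(const PowerBreakdown& p) {
    return p.dynamic_w + p.subthreshold_leakage_w + p.gate_leakage_w;
}

// Frame rate from a per-frame processing time given in microseconds.
double frames_per_second(double frame_time_us) {
    assert(frame_time_us > 0.0);
    return 1.0e6 / frame_time_us;
}

// Assumed definition: fraction of VLIW slots that carried an operation.
double slot_utilization(long issued_ops, long bundles, int vliw_width) {
    assert(bundles > 0 && vliw_width >= 1 && vliw_width <= 4);
    return static_cast<double>(issued_ops) /
           (static_cast<double>(bundles) * vliw_width);
}
```

For example, a frame time of 1000 μs corresponds to 1000 FPS, and 6 operations issued across 4 two-slot bundles give a utilization of 0.75.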
- Configurable VLIW width (1-4 instruction slots)
- Local memory per processing element
- Carry register support
- Neighbor communication capabilities
- Local memory per PE (configurable size)
- Shared neighbor values
- External output storage
- Binary operations with carry support
- Memory addressing modes
- Neighbor data access (up, down, left, right)
- Photodiode (PD) input access
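One way to picture how binary operations with carry support combine with per-PE local memory is bit-serial addition: a multi-bit add is carried out one bit per step, with the carry register holding state between steps. The C++ sketch below models that idea only. The `ProcessingElement` struct, its 64-cell memory, the LSB-first layout, and all function names are assumptions and do not reflect the simulator's actual ISA or data structures.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model of one processing element (PE): a small local memory of
// 1-bit cells plus a single carry register. The 64-cell size is arbitrary.
struct ProcessingElement {
    std::bitset<64> memory;
    bool carry = false;

    // One full-adder step on single bits: memory[dst] receives the sum bit of
    // memory[a] + memory[b] + carry, and the carry register is updated.
    void add_with_carry(std::size_t dst, std::size_t a, std::size_t b) {
        assert(dst < memory.size() && a < memory.size() && b < memory.size());
        const bool x = memory[a];
        const bool y = memory[b];
        const bool sum = (x != y) != carry;
        carry = (x && y) || (carry && (x != y));
        memory[dst] = sum;
    }
};

// Store the low `bits` bits of `value`, LSB first, starting at cell `base`.
void store_word(ProcessingElement& pe, std::size_t base, unsigned value, std::size_t bits) {
    for (std::size_t i = 0; i < bits; ++i)
        pe.memory[base + i] = ((value >> i) & 1u) != 0;
}

// Read `bits` cells starting at `base` back into an unsigned value.
unsigned load_word(const ProcessingElement& pe, std::size_t base, std::size_t bits) {
    unsigned value = 0;
    for (std::size_t i = 0; i < bits; ++i)
        if (pe.memory[base + i]) value |= 1u << i;
    return value;
}

// Bit-serial addition: clear the carry, then add one bit position per step.
void add_words(ProcessingElement& pe, std::size_t dst, std::size_t a,
               std::size_t b, std::size_t bits) {
    pe.carry = false;
    for (std::size_t i = 0; i < bits; ++i)
        pe.add_with_carry(dst + i, a + i, b + i);
}
```

With 4-bit operands, adding 5 and 6 leaves 11 in the destination cells with the carry clear, while adding 9 and 8 leaves 1 with the carry set, which signals the overflow into a fifth bit.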
glimpse/
├── src/ # Source code
│ ├── main.cu # Main application
│ ├── isa.cu/h # Instruction set architecture
│ ├── pe.cu/h # Processing element implementation
│ ├── powerandarea.cu/h # Power and area modeling
│ └── utils/ # Utility functions
├── programs/ # Vision algorithm programs
├── images/ # Test images
├── cacti/ # CACTI power/area modeling tool
├── notes/ # Documentation and BNF grammars
└── outputimages/ # Generated output images
For debugging builds, modify the Makefile to use:

`CFLAGS = -arch=sm_75 -Xptxas -O1 -Xcompiler -O1 -use_fast_math -rdc=true`

CUDA Errors: Ensure the CUDA toolkit is properly installed and the GPU's compute capability is supported.
OpenCV Issues: Verify OpenCV 4.x is installed with development headers and pkg-config can locate it.
Camera Access: For real-time mode, ensure camera permissions and V4L2 support on Linux.
Memory Issues: Large images or high VLIW widths may require significant GPU memory.
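A rough way to anticipate memory pressure is to multiply the number of processing elements by the state each one holds. The C++ sketch below assumes one PE per pixel of the square image and counts only local memory. Both the function name and the per-PE memory size parameter are hypothetical, and the simulator's real footprint (program storage, neighbor values, output buffers) is not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Rough estimate of the bytes needed for PE local memory alone, assuming
// one PE per pixel of a dimension x dimension image.
std::size_t estimate_local_memory_bytes(std::size_t dimension,
                                        std::size_t local_memory_bits_per_pe) {
    assert(dimension > 0);
    const std::size_t pe_count = dimension * dimension;
    // Round each PE's bit count up to whole bytes.
    const std::size_t bytes_per_pe = (local_memory_bits_per_pe + 7) / 8;
    return pe_count * bytes_per_pe;
}
```

Under these assumptions, a 1024x1024 array with 64 bits of local memory per PE needs 8 MiB for local memory, and the figure grows quadratically with `--dimension`.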
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions including bug fixes, performance improvements, and documentation updates. Please submit a pull request to contribute!