Description
Accelerating AI Inferencing on an Embedded Device
Problem Statement
Real-time object detection is critical in systems such as autonomous vehicles, robotics, and surveillance. Traditionally, AI inference is performed on cloud servers, but real-world applications need low-latency, high-throughput results on edge devices. This project explores accelerating AI inference on an embedded device, the NVIDIA Jetson Orin Nano, using parallel programming and GPU-optimized frameworks.
Discussion
AI inferencing is a parallel computing problem because:
- It involves applying deep neural networks, especially convolutional layers, across large amounts of image data.
- Operations like convolutions, matrix multiplications, and activations are naturally parallelizable.
- A GPU drastically increases throughput compared to a CPU; the timing sketch after this list illustrates the gap.
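As a rough illustration of that gap, the sketch below times a single convolutional layer on the CPU and on the GPU with PyTorch (which NVIDIA's JetPack images ship with CUDA support). The layer shape, batch size, and iteration count are illustrative assumptions, not measurements from this project.

```python
import time

import torch


def bench(device: str, iters: int = 10) -> float:
    """Average seconds per forward pass of one conv layer on `device`."""
    conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device).eval()
    x = torch.randn(8, 3, 640, 640, device=device)  # a batch of 8 frames
    with torch.no_grad():
        conv(x)  # warm-up so one-time setup cost is not timed
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


cpu_time = bench("cpu")
print(f"CPU: {cpu_time * 1000:.1f} ms per batch")
if torch.cuda.is_available():
    gpu_time = bench("cuda")
    print(f"GPU: {gpu_time * 1000:.1f} ms per batch")
```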
Accelerators like the Jetson Orin Nano enable real-time performance without cloud connectivity, and frameworks such as TensorRT and CUDA let us exploit this data parallelism for faster inference; a sketch of the inference path appears below.
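As a hedged sketch of what that path could look like, the snippet below runs a pretrained YOLO detector on the GPU through the Ultralytics API and optionally exports it to a TensorRT engine. The model name and image path are placeholder assumptions, not the project's final pipeline.

```python
from ultralytics import YOLO

# Load a small pretrained detector; "yolov8n.pt" is a placeholder choice.
model = YOLO("yolov8n.pt")

# Optional: export to a TensorRT engine so inference runs through NVIDIA's
# optimized kernels, then reload the engine for prediction.
# engine_path = model.export(format="engine")
# model = YOLO(engine_path)

# Run detection on the GPU (device=0); "image.jpg" is a placeholder path.
results = model.predict("image.jpg", device=0, verbose=False)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)
```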
Reflection on Team's Prior Knowledge
- Familiarity with Python and basic OpenCV.
- Exposure to CUDA and GPU concepts in class.
- Basic understanding of neural networks and inference flow.
- Limited experience with embedded GPU platforms like Jetson.
References
- NVIDIA Jetson Orin Nano Developer Kit documentation
- NVIDIA DeepStream SDK documentation
- NVIDIA TensorRT documentation
- YOLOv5 and YOLOv8 GitHub repositories (Ultralytics)
- "Programming Massively Parallel Processors" textbook