
Accelerating AI Inferencing on an Embedded Device

Problem Statement

Real-time object detection is critical in systems such as autonomous vehicles, robotics, and surveillance. Traditionally, AI inference runs on cloud servers, but real-world applications need low-latency, high-throughput results at the edge. This project explores accelerating AI inference on an embedded device, the NVIDIA Jetson Orin Nano, using parallel programming and GPU-optimized frameworks.

Discussion

AI inferencing is a parallel computing problem because:

  • It involves applying deep neural networks, especially convolutional layers, across large volumes of image data.

  • Operations like convolutions, matrix multiplications, and activations are naturally parallelizable.

  • Using a GPU drastically increases throughput compared to a CPU (see the timing sketch after this list).
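
To make the CPU-versus-GPU point concrete, here is a minimal timing sketch, assuming a CUDA-enabled PyTorch build (for example, the JetPack wheels NVIDIA publishes for Jetson). The matrix size and the use of PyTorch here are illustrative choices, not project requirements.

```python
# Minimal sketch: time one large matrix multiplication on the CPU and on the GPU.
# Assumes a CUDA-enabled PyTorch build; falls back gracefully if no GPU is found.
import time

import torch

N = 2048
a = torch.randn(N, N)
b = torch.randn(N, N)

# CPU baseline
t0 = time.perf_counter()
a @ b
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()      # finish the host-to-device copies first
    t0 = time.perf_counter()
    a_gpu @ b_gpu
    torch.cuda.synchronize()      # wait for the kernel before stopping the clock
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f} s   GPU: {gpu_s:.3f} s   speedup: {cpu_s / gpu_s:.1f}x")
else:
    print(f"CPU: {cpu_s:.3f} s   (no CUDA device available)")
```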

Accelerators like the Jetson Orin Nano enable real-time performance without requiring cloud connectivity, and frameworks such as TensorRT and CUDA let us exploit this data parallelism for faster inference.
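
A rough sketch of the inference path we have in mind is shown below, assuming an exported ONNX detection model and the onnxruntime-gpu build with TensorRT support available for Jetson. The file name yolov5s.onnx, the 640x640 input shape, and the random input are placeholders used only to show how the TensorRT and CUDA execution providers are requested.

```python
# Minimal sketch: run one ONNX model through ONNX Runtime, preferring TensorRT,
# then CUDA, then CPU. The model path and input shape below are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov5s.onnx",                    # placeholder: an exported detection model
    providers=[
        "TensorrtExecutionProvider",   # TensorRT-optimized kernels, if available
        "CUDAExecutionProvider",       # plain CUDA fallback
        "CPUExecutionProvider",        # last-resort CPU fallback
    ],
)

# Dummy preprocessed frame: NCHW float32; 640x640 is YOLOv5's default input size.
name = session.get_inputs()[0].name
frame = np.random.rand(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {name: frame})
print([o.shape for o in outputs])
```

In the actual pipeline, camera preprocessing (resizing, letterboxing) and post-processing (non-maximum suppression) would replace the dummy input and the raw output print.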

Reflection on Team's Prior Knowledge

  • Familiarity with Python and basic OpenCV.

  • Exposure to CUDA and GPU concepts in class.

  • Basic understanding of neural networks and inference flow.

  • Limited experience with embedded GPU platforms like Jetson.

References

  • NVIDIA Jetson Orin Nano Developer Kit documentation

  • NVIDIA DeepStream SDK documentation

  • NVIDIA TensorRT documentation

  • Ultralytics YOLOv5 and YOLOv8 GitHub repositories

  • Kirk, D. and Hwu, W., “Programming Massively Parallel Processors” (textbook)
