IITISoC-23-IVR1-LaneDetection-using-LimitedComputationPower

Goal :

To develop a robust lane detection pipeline that consumes meager computational resources (no GPU allowed, limited CPU and RAM usage) and can be deployed on an NVIDIA Jetson Nano board or even a Raspberry Pi board.

People Involved :

Mentors:

Members:

Outline :

This repository contains the implementation of a lane detection system using two different approaches. The main goal of this project was to understand the core fundamentals of lane detection in images. The two approaches utilized are as follows:

  • Approach-1 - Foundation approach (lane detection using Canny edge detection and the Hough transform)
  • Approach-2 - Advanced approach using deep learning

Approach-1 :

In this approach, traditional computer vision techniques were employed to detect lanes in an image. The input pipeline for Approach 1 consists of a sequence of techniques applied to the input image to detect lane lines. Each step in the pipeline is essential for accurate and reliable lane detection. The sequence is as follows (a code sketch follows the list):

  • Preprocessing : The input image was preprocessed to enhance lane features and reduce noise.

  • Canny Edge Detection : The Canny edge detection algorithm was applied to extract edges in the image.

  • Region of Interest selection : When finding lane lines, we do not need to examine the clouds and mountains in the image. The objective of this step is therefore to concentrate on the region that matters to us: the road and the lanes on it.

  • Hough Transform : The Probabilistic Hough transform algorithm was used to detect lines in the edge-detected image, which represent potential lane markings.

  • Post-processing : The detected lines were further processed to combine and extend them to form complete lane boundaries.
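
A minimal sketch of this pipeline using OpenCV is shown below; the parameter values (blur kernel, Canny thresholds, ROI polygon, Hough settings) are illustrative assumptions rather than the exact values tuned in this repository:

```python
import cv2
import numpy as np

def detect_lanes(image):
    h, w = image.shape[:2]

    # Preprocessing: grayscale plus Gaussian blur to suppress noise.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Canny edge detection (thresholds are illustrative).
    edges = cv2.Canny(blurred, 50, 150)

    # Region of interest: a trapezoid covering the road ahead.
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2 - 50, int(h * 0.6)),
                     (w // 2 + 50, int(h * 0.6)), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    masked = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform to find candidate lane segments.
    lines = cv2.HoughLinesP(masked, rho=2, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=100)

    # Post-processing: draw the detected segments onto a copy of the input.
    overlay = image.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(overlay, (x1, y1), (x2, y2), (0, 255, 0), 3)
    return overlay

# Usage: result = detect_lanes(cv2.imread("road.jpg"))
```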

Predicted Results :

Disadvantages :

  • Fails on complex curved roads.
  • Does not give satisfactory results in rainy and foggy environments.

Approach-2 :

In this advanced approach, we have explored the effectiveness of deep learning models, including Convolutional Neural Networks (CNNs) and Transformer models, for accurate and efficient lane detection. While CNNs are widely known for their image analysis capabilities, we also investigated the potential of Transformer-based models, such as LSTR (Lane Shape Transformer), which are specifically designed for sequence-to-sequence tasks like lane detection.

Model Selection :

We carefully curated and tested several state-of-the-art deep learning models for lane detection. The following models were among those evaluated :

  • 3 CNN Models : We explored three different CNN architectures that we found through YouTube and GitHub. These models were chosen for their effectiveness in image analysis tasks and had demonstrated promising results in lane detection scenarios.

  • YOLOP and YOLOPv2 : We experimented with YOLOP and its upgraded version, YOLOPv2, which are well known for their real-time object detection capabilities. We adapted these models for lane detection and evaluated their performance (a loading sketch follows this list). A summarized architecture description for the YOLO-based network :
    • Encoder
      The network has a shared encoder consisting of:
      • Backbone: CSPDarknet, a classic image classification network known for its excellent performance in object detection. It ensures real-time performance and feature propagation.
      • Neck: Composed of Spatial Pyramid Pooling (SPP) and Feature Pyramid Network (FPN) modules, which fuse features from different scales and semantic levels using concatenation.
    • Decoder
      The network has three specific decoders for different tasks:
      • Detect Head: Utilizes a multi-scale detection scheme with PAN (Path Aggregation Network) for better feature fusion. It predicts object positions, scales, category probabilities, and confidence.
      • Drivable Area Segment Head & Lane Line Segment Head: Both use the same network structure and rely on the bottom layer of FPN for segmentation. The output feature map is restored to the input size using upsampling, with no extra SPP module. Nearest Interpolation is used for upsampling, resulting in fast and precise output.

  • HybridNets : HybridNets is a popular deep learning architecture specifically designed for lane detection. We examined its performance and capabilities for detecting complex lane geometries (a feature-fusion sketch follows this list). A summarized model architecture for HybridNets :
    • Encoder
      • Backbone: EfficientNet-B3, which efficiently extracts features and reduces computational cost through neural architecture search.
      • Neck: BiFPN module based on EfficientDet, enabling bidirectional feature fusion at different resolutions through cross-scale connections.
    • Decoder
      • Detection Head:
        • Prior anchors: Each grid in the multi-scale fusion feature maps from the Neck network is assigned nine prior anchors with different aspect ratios, determined using k-means clustering.
        • Prediction: The detection head predicts the bounding box offsets, class probabilities, and confidence scores.
      • Segmentation Head:
        • Classes: There are 3 classes for output - background, drivable area, and lane line.
        • Feature levels: Utilizes 5 feature levels {P3, ..., P7} from the Neck network in the segmentation branch.
        • Upsampling: Each level is upsampled to have the same output feature map size (W/4, H/4, 64).
        • Feature fusion: The upsampled levels are combined using summation for better feature fusion.
        • Output: The final output feature map is of size (W, H, 3) representing the probability of each pixel's class.

  • LSTR (Lane Shape Transformer) : While the Transformer-based model LSTR showed impressive frames-per-second (FPS) performance, we found that its detection results did not meet our expectations (an inference sketch follows this list). In this Transformer-based lane detection architecture, the model consists of several key components:
    • Backbone: The backbone extracts low-resolution features from the input image I and converts them into a sequence S by collapsing the spatial dimensions.
    • Reduced Transformer Network: The sequence S, along with positional embeddings Ep, is fed into the transformer encoder to produce a representation sequence Se. The transformer is responsible for capturing long-range dependencies and interactions within the sequence.
    • Decoder: The decoder generates an output sequence Sd by attending to an initial query sequence Sq and a learned positional embedding ELL, which implicitly learns positional differences. The decoder computes interactions with Se and Ep to attend to related features.
    • Feed-forward Networks (FFNs): Several feed-forward networks directly predict the parameters of the proposed lane outputs.
    • Hungarian Loss: The model uses the Hungarian loss, a loss function tailored for lane detection tasks, to optimize the parameters and ensure accurate lane predictions.
    • The architecture leverages the power of the transformer model for sequence-to-sequence tasks, allowing for more effective lane detection, especially in scenarios involving curved lanes and complex lane geometries.
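
For reference, the sketch below shows one way to load and run a pretrained YOLOP model via torch.hub, following the upstream hustvl/YOLOP repository; the hub entry-point name and the dummy preprocessing are assumptions to verify against that repository:

```python
import torch

# Load pretrained YOLOP through torch.hub (entry-point name as published
# in the hustvl/YOLOP README; verify against the upstream repository).
model = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)
model.eval()

# Stand-in for a preprocessed frame: normalized 640x640 RGB tensor (NCHW).
img = torch.randn(1, 3, 640, 640)

with torch.no_grad():
    # The shared encoder feeds three decoder heads: object detection,
    # drivable-area segmentation, and lane-line segmentation.
    det_out, da_seg_out, ll_seg_out = model(img)

# Per-pixel argmax over the lane-line logits yields a binary lane mask.
lane_mask = ll_seg_out.argmax(dim=1)
```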
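
The segmentation-head fusion described for HybridNets (upsample the five neck levels to a common (W/4, H/4) resolution, then combine by summation) can be illustrated with a small, hypothetical PyTorch helper; the shapes and channel count below are toy values, not the real network's:

```python
import torch
import torch.nn.functional as F

def fuse_segmentation_features(features, out_hw):
    # Upsample every feature level to the shared (W/4, H/4) grid and sum
    # them, mirroring the summation fusion described above.
    upsampled = [F.interpolate(f, size=out_hw, mode='nearest') for f in features]
    return torch.stack(upsampled).sum(dim=0)

# Toy stand-ins for the five neck levels P3..P7 (64 channels each,
# resolution halving at every level).
levels = [torch.randn(1, 64, 80 // 2 ** i, 80 // 2 ** i) for i in range(5)]
fused = fuse_segmentation_features(levels, out_hw=(80, 80))
print(fused.shape)  # torch.Size([1, 64, 80, 80])
```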
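
We ran LSTR through its ONNX export, in the style of reference [7]; the minimal sketch below assumes a single image input, and the model filename and input resolution are illustrative, to be checked against the export actually used:

```python
import numpy as np
import onnxruntime as ort

# Open an exported LSTR model (filename is an assumption; see the
# ibaiGorordo/ONNX-LSTR-Lane-Detection repository for actual exports).
session = ort.InferenceSession("lstr_360x640.onnx",
                               providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
# Stand-in for a preprocessed frame: normalized NCHW float32 tensor.
frame = np.random.rand(1, 3, 360, 640).astype(np.float32)

outputs = session.run(None, {input_name: frame})
# LSTR regresses lane-shape parameters rather than a pixel mask, so each
# proposed lane is decoded from its predicted curve coefficients.
```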

Predicted Results :

The results for the three CNN models, YOLOP, YOLOPv2, LSTR, and HybridNets can be found here.

System Specifications :

All the work was done on three devices (two of which are identical), namely the ASUS Vivobook 15 Pro and the HP Pavilion Gaming 15 ec2008AX. The HP Pavilion Gaming 15 ec2008AX has an AMD hexa-core Ryzen 5 5600H processor and 8 GB of DDR4 RAM, while the Vivobook 15 Pro has a 12th Gen Intel Core H-series processor, 16 GB of LPDDR5 RAM, and an NVIDIA GeForce RTX 3050 Ti GPU.

Model Evaluation :

During our comprehensive testing, we considered multiple deep learning architectures: CNNs, HybridNets, YOLOP, YOLOPv2, and LSTR. Each model underwent rigorous evaluation using performance metrics such as Mean Average Precision (mAP), Intersection over Union (IoU), inference speed (FPS), precision, recall, and F1 score.
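
On binary lane masks, these segmentation metrics reduce to counts of true/false positive and false negative pixels; a small illustrative helper (not our exact evaluation script) looks like this:

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-9):
    # pred, target: boolean arrays of the same shape (lane vs. background).
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
    }
```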
Comparison of 3 CNN Models

| Model | Parameters | Size (KB) | Precision | Recall | F1 Score | FPS | Dice coefficient | IoU |
|-------|------------|-----------|-----------|--------|----------|-----|------------------|-----|
| CNN 2 | 129,498 | 580.15 | 0.939 | 0.747 | 0.8327 | 12 | 0.8327 | 0.72 |
| CNN 3 | 181,693 | 150.02 | 0.980 | 0.731 | 0.837 | 15 | 0.837 | 0.72 |
| CNN 1 | 125,947 | 55 | 0.97 | 0.984 | 0.99 | 5 | 0.987 | 0.976 |

Graphical representation of the comparison of models

Comparison of YOLOP, YOLOPv2, HybridNets :

| Model | Parameters (million) | Size (KB) | Accuracy | IoU (lane line) | IoU (drivable area) | FPS |
|-------|----------------------|-----------|----------|-----------------|---------------------|-----|
| YOLOP | 7.9 | 31,763 | 0.70 | 0.262 | 0.91 | 10 |
| YOLOPv2 | 38.64 | 154,660 | 0.87 | 0.27 | 0.93 | 41 |
| HybridNets | 13 | 54,482 | 0.85 | 0.31 | 0.95 | 12 |

Graphical representation of the comparison of models

Visualization

The models used in all three cases were trained on the BDD100k dataset. Comparison.

A glimpse of the inference we obtained on our campus videos:

Model Quantization :

In pursuit of a balance between accuracy and computational efficiency, we explored post-training quantization for one of the satisfactory models, YOLOP, which also has a simpler architecture than YOLOPv2. We chose YOLOP for quantization because its smaller parameter count and model size make it more amenable to this process. In conclusion, post-training quantization of YOLOP proved to be a viable, optimized solution for lane detection with limited computation power: it achieves near-comparable accuracy to the original model while reducing parameters and model size, making it well suited for deployment in resource-constrained environments.
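
A minimal sketch of post-training static quantization with PyTorch's eager-mode API is shown below; the calibration loader is a placeholder, and a real model additionally needs QuantStub/DeQuantStub wrappers plus module fusion, which are omitted here for brevity:

```python
import torch

def quantize_model(model_fp32, calibration_loader):
    model_fp32.eval()

    # 'fbgemm' targets x86 CPUs; 'qnnpack' suits ARM boards such as
    # the Jetson or Raspberry Pi.
    model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')

    # Insert observers that record activation ranges.
    prepared = torch.quantization.prepare(model_fp32)

    # Calibrate on a handful of representative frames.
    with torch.no_grad():
        for images in calibration_loader:
            prepared(images)

    # Replace observed modules with int8 kernels.
    return torch.quantization.convert(prepared)
```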

Deployment and Future Improvements

After post-training static quantization, we reduced the model size and struck a balance between accuracy and computational efficiency, so the model is ready to deploy on an edge computing device like the NVIDIA Jetson Xavier. In the future, we plan to deploy our lane detection pipeline on the NVIDIA Xavier platform, a powerful and energy-efficient system-on-a-chip (SoC) designed for edge computing and AI applications. The Xavier's advanced architecture and computational capabilities make it an ideal candidate for running deep learning models, even in real-time scenarios. A successful deployment on Xavier will pave the way for scalable and practical integration of our lane detection solution in various real-time applications.

References :

[1] MLND Capstone project for Udacity's Machine Learning Nanodegree (2017), GitHub repository, https://github.com/mvirgo/MLND-Capstone

[2] PyTorch Profiler, PyTorch Recipes, https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html

[3] Vision Transformers for Computer Vision, https://towardsdatascience.com/using-transformers-for-computer-vision-6f764c5a078b

[4] Chuan-en Lin (2018, Dec. 17), "Tutorial: Build a lane detector", https://towardsdatascience.com/tutorial-build-a-lane-detector-679fd8953132 [accessed Apr 06, 2019]

[5] Article: https://link.springer.com/article/10.1007/s11633-022-1339-y

[6] HybridNets model, original paper: https://arxiv.org/abs/2203.09035

[7] ibaiGorordo/ONNX-LSTR-Lane-Detection (2021), GitHub repository, https://github.com/ibaiGorordo/ONNX-LSTR-Lane-Detection

[8] CAIC-AD/YOLOPv2 (2022), GitHub repository, https://github.com/CAIC-AD/YOLOPv2

[9] For quantization: Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception, https://www.researchgate.net/publication/372248473_Q-YOLOP_Quantization-aware_You_Only_Look_Once_for_Panoptic_Driving_Perception
