
TF-IDF agent crashes on real-world projects due to unhandled comment extraction errors #114

@AnujRewar

Description


When running Atarashi's TF-IDF agent on real-world source trees (e.g. BusyBox), the scan crashed with an unhandled exception raised by the Nirjas comment extractor.
This makes Atarashi abort the scan midway, which hurts its reliability, especially when it is used as part of a FOSSology scan.


How to reproduce:
1. Download and extract BusyBox:
   wget https://busybox.net/downloads/busybox-1.36.1.tar.bz2
   tar -xf busybox-1.36.1.tar.bz2
2. Change into the cloned atarashi directory.
3. Run: atarashi -a tfidf -s CosineSim path-of-file
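To pin down which files in the tree trigger the crash, here is a rough Python sketch that runs the command from step 3 on every file and collects the ones where the process exits non-zero (an uncaught Python exception makes the interpreter exit with status 1). The function name `find_crashing_files` is hypothetical, and the sketch assumes `atarashi` is installed on `PATH`. It launches one process per file, so expect it to be slow on the full BusyBox tree.

```python
import os
import subprocess
from typing import List, Sequence

# Command from step 3 of the reproduction; assumes `atarashi` is on PATH.
DEFAULT_CMD = ("atarashi", "-a", "tfidf", "-s", "CosineSim")


def find_crashing_files(root: str, cmd: Sequence[str] = DEFAULT_CMD) -> List[str]:
    """Run `cmd <file>` on every file under `root`; return files where it exits non-zero."""
    crashing: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # An uncaught Python exception makes the interpreter exit with status 1.
            proc = subprocess.run([*cmd, path], capture_output=True)
            if proc.returncode != 0:
                crashing.append(path)
    return crashing
```

For example, `find_crashing_files("busybox-1.36.1")` returns the paths worth rerunning by hand to see the Nirjas traceback.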

Previously, the code assumed that comment extraction would always succeed for supported file extensions and only guarded it with an if-else check on the extension. I have now wrapped the comment extraction in a try-except block, so that any failure raised by Nirjas while extracting comments is handled and the scan runs to completion.
Should I open a PR for this?
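A minimal sketch of the kind of guard described above, written as a generic scan loop rather than Atarashi's actual code: `scan_tree`, `extract_comments` and `classify` are hypothetical stand-ins for the Nirjas extraction call and the TF-IDF/CosineSim step. A file whose extraction raises is logged and recorded, then skipped, so the remaining files are still scanned.

```python
import logging
import os
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)


def scan_tree(
    root: str,
    extract_comments: Callable[[str], str],
    classify: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Scan every file under `root`, skipping files whose comment extraction fails.

    Returns (results, failures): results maps path -> classification,
    failures lists (path, error description) for the skipped files.
    """
    results: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                comments = extract_comments(path)
            except Exception as exc:  # any extractor error: record it and move on
                logger.warning("Comment extraction failed for %s: %r", path, exc)
                failures.append((path, repr(exc)))
                continue
            # Only extraction is guarded; errors in classify still propagate.
            results[path] = classify(comments)
    return results, failures
```

Catching `Exception` broadly is deliberate here, since the issue does not pin down which error Nirjas raises; logging each failure keeps skipped files visible instead of silently dropping them.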
@GMishx @shaheemazmalmmd
