When running the TF-IDF agent of Atarashi on real-world source trees (e.g. BusyBox), the scan crashed due to an unhandled exception originating from the Nirjas comment extractor.
This caused Atarashi to terminate its scan midway, which hurts the reliability of Atarashi, especially when it is used as part of a FOSSology scan.
How to reproduce:
1. Download and unpack BusyBox:
   wget https://busybox.net/downloads/busybox-1.36.1.tar.bz2
   tar -xf busybox-1.36.1.tar.bz2
2. Change into the cloned atarashi directory.
3. Run: atarashi -a tfidf -s CosineSim path-of-file
Previously, the code assumed that comment extraction would always succeed for files with a supported extension and guarded it only with an if-else check. I have now wrapped the comment extraction in a try/except, so a failure inside Nirjas no longer aborts the run and the scan can complete.
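
To illustrate the idea, here is a minimal sketch of the try/except pattern, not the actual patch. The names `extract_comments_safely`, `read_raw_text`, `extractor` and `fallback` are hypothetical and are not part of Atarashi or Nirjas. Falling back to the raw file text when extraction fails is my assumption about a reasonable behaviour.

```python
import sys
from typing import Callable


def read_raw_text(path: str) -> str:
    """Hypothetical fallback: return the file content as-is, ignoring decode errors."""
    with open(path, "r", errors="ignore") as handle:
        return handle.read()


def extract_comments_safely(
    path: str,
    extractor: Callable[[str], str],
    fallback: Callable[[str], str] = read_raw_text,
) -> str:
    """Run the comment extractor, but never let its failure abort the scan.

    `extractor` stands in for the Nirjas-based extraction step (assumption);
    `fallback` is used when the extractor raises any exception.
    """
    try:
        return extractor(path)
    except Exception as exc:
        # Report the problem and continue with the next best input
        # instead of letting the exception terminate the whole scan.
        print(f"warning: comment extraction failed for {path}: {exc}", file=sys.stderr)
        return fallback(path)
```

With this shape, one file that the extractor cannot handle only produces a warning on stderr, and the remaining files in the tree are still scanned.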
Should I open a PR for this?
@GMishx @shaheemazmalmmd