gsoc final report
authored by @m1kit
Detecting the contents of license documents is a major task for software that automatically analyzes software metadata, such as package managers. OSS licenses can be detected automatically by comparing the license document with templates of major OSS licenses managed by SPDX using a string algorithm. However, typical string comparison algorithms are vulnerable to small additions of malicious wording.
For this reason, SPDX provides guidelines for matching license documents. In addition, several license detection algorithms have been implemented. Among them, spdx_python_licensematching aims to reproduce the guidelines faithfully.
I have been working on two major tasks: to improve the existing implementation, and to release it as a library YALM to the world.
First of all, I would like to thank @anshuldutt21 san for creating the first code base. The idea of transpiling the templates into regular expressions to follow the guidelines was his, and his implementation was very well thought out. This project would not be what it is without his efforts.
Also, this project would not have been possible without the mentoring of @goneall san. The SPDX community has been working on the problem of license matching for many years, and they have knowledge about issues and solutions that I did not know. They were kind enough to provide me with relevant knowledge as the situation required.
Finally, I would like to thank members of the community for discussing with me and the GSoC staff for giving me this great opportunity.
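The weakness of similarity-based matching described in the introduction, next to the template-to-regular-expression idea credited above, as an added example that is not code from YALM or spdx_python_licensematching. It covers only four matching rules (case, whitespace, replaceable text, optional text). The toy template, the 0.9 threshold and all function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Constructed toy template in SPDX template markup; not a real SPDX license text.
TEMPLATE = (
    'Copyright (c) <<var;name="holder";original="HOLDER";match=".+">>\n'
    "Permission is granted to use, copy and modify this software. "
    "<<beginOptional>>All rights not granted are reserved. <<endOptional>>"
    'This software is provided "as is", without warranty of any kind.'
)
TAG = re.compile(r'<<var;name="[^"]*";original="([^"]*)";match="([^"]*)">>'
                 r"|<<beginOptional>>|<<endOptional>>")

def naive_match(doc, threshold=0.9):  # threshold is an assumed value
    reference = TAG.sub(lambda m: m.group(1) or "", TEMPLATE)
    a, b = (" ".join(t.lower().split()) for t in (doc, reference))
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold

def _literal(text):
    # Whitespace runs are interchangeable; every other character must match.
    parts = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in parts if p)

def transpile(template):  # hypothetical helper, simplified rule subset
    out, pos = [], 0
    for m in TAG.finditer(template):
        out.append(_literal(template[pos:m.start()]))
        if m.group(0) == "<<beginOptional>>":
            out.append("(?:")
        elif m.group(0) == "<<endOptional>>":
            out.append(")?")
        else:
            out.append("(?:" + m.group(2) + ")")  # replaceable text
        pos = m.end()
    out.append(_literal(template[pos:]))
    return re.compile("".join(out), re.IGNORECASE)

def guideline_match(doc):
    return transpile(TEMPLATE).fullmatch(doc.strip()) is not None

# Constructed sample documents.
CLEAN = ('Copyright (c) Example Corp\nPermission is granted to use, copy and\n'
         'modify this software.  All rights not granted are reserved.\n'
         'This software is provided "as is", without warranty of any kind.\n')
TAMPERED = CLEAN.replace("is granted", "is not granted")

if __name__ == "__main__":
    for name, doc in (("clean", CLEAN), ("tampered", TAMPERED)):
        print(name, "naive:", naive_match(doc), "regex:", guideline_match(doc))
```

The similarity matcher accepts both documents because one inserted word barely moves the ratio, while the transpiled regular expression rejects the tampered text and still accepts reflowed whitespace, a different copyright holder and an omitted optional sentence.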