This repository contains the code for the TF-IDF algorithm.
TF-IDF (short for term frequency–inverse document frequency) is a measure of how important a word is to a document in a collection or corpus.
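The importance of a word is measured as the product of two quantities: its term frequency (TF), which reflects how often the word occurs in a given document, and its inverse document frequency (IDF), which reflects how rare the word is across the whole collection. A word therefore scores highly when it is frequent in one document but uncommon in the rest of the corpus.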
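For reference, one common textbook formulation of TF-IDF is written out below. It is an assumption that tfidf.py uses exactly this variant; implementations differ in how they normalize the term frequency, which logarithm base they use, and whether they smooth the IDF. The symbols t, d, D, f and N are introduced here for illustration.

```latex
\begin{align}
  \mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) \\
  \mathrm{tf}(t, d) &= \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \\
  \mathrm{idf}(t, D) &= \log \frac{N}{\left|\{\, d \in D : t \in d \,\}\right|}
\end{align}
```

Here f_{t,d} is the number of times term t occurs in document d, D is the collection of documents, and N = |D| is the number of documents in it. The IDF is undefined for a term that appears in no document, so implementations usually special-case or smooth that situation.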
The algorithm is implemented in the tfidf.py file. It can be used like this:
python3 tfidf.py
To run the program on your own text file, pass the path to the file as an argument:
python3 tfidf.py [PATH]/[Name of the file]
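To see the computation spelled out, here is a minimal sketch of TF-IDF in plain Python. It is not the contents of tfidf.py. The function names, the whitespace tokenizer, and the sample sentences are all hypothetical. The sketch assumes term frequency is the word count divided by the document length, and IDF is the natural logarithm of the number of documents divided by the number of documents containing the word, with no smoothing.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and split on whitespace.
    return text.lower().split()


def tf(term, tokens):
    # Relative frequency of `term` within a single tokenized document.
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def idf(term, corpus):
    # log(N / number of documents containing `term`).
    # Returns 0.0 when no document contains the term (an assumption made
    # here to avoid division by zero; other conventions exist).
    containing = sum(1 for tokens in corpus if term in tokens)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tfidf(term, tokens, corpus):
    # Product of the two quantities above.
    return tf(term, tokens) * idf(term, corpus)


# Made-up sample corpus, for illustration only.
docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
corpus = [tokenize(d) for d in docs]

for term in ("the", "cat"):
    print(term, round(tfidf(term, corpus[0], corpus), 4))
```

In this sample, "cat" scores higher than "the" for the first sentence even though "the" occurs more often there, because "the" also appears in another sentence and is therefore less distinctive.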