
Commit 84b18d1

Merge pull request #339 from dishant26/Plagiarism-Checker
Added Plagiarism Checker
2 parents b138697 + 3e6181c commit 84b18d1

6 files changed: 71 additions and 1 deletion


README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -94,5 +94,5 @@ Once you are done working on your script edit this `README.md` file and add the
 67| Music Visualizer | Creates a visualization of any music file | [Find me here](https://github.com/GDSC-RCCIIT/General-Purpose-Scripts/tree/janinirami/scripts/Music-Visualizer) |
 68| Mood Sentiment Analyzer| Evaluate given texts/sentences general mood. | [Find me Here](https://github.com/GDSC-RCCIIT/General-Purpose-Scripts/tree/main/scripts/Mood%20Sentiment%20Analyzer)
 69| Morse Code Translation | Convert morse code to english language and vice versa | [Find me here](https://github.com/mahalrupi/General-Purpose-Scripts/tree/main/scripts/Morse-code-translation)
-
+70| Plagiarism Checker| Find similarity/plagiarism score between 2 texts | [Find me Here](https://github.com/GDSC-RCCIIT/General-Purpose-Scripts/tree/main/scripts/Plagiarism-Checker)
 ### Good Luck and don't forget to have fun with Open Source 🚀
```

scripts/Plagiarism-Checker/README.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@

````markdown
## Check Plagiarism between Two Texts

[![](https://img.shields.io/badge/Made_with-Python-red?style=for-the-badge&logo=python)](https://www.python.org/)
[![](https://img.shields.io/badge/Made_with-Scikit_Learn-blue?style=for-the-badge&logo=scikit-learn)](https://scikit-learn.org/)

### About
A Python script that scores plagiarism between two texts as the cosine similarity of their TF-IDF vectors.

### Setup

* Install Python 3 from [here](https://www.python.org/).
* Open a terminal (for example, Windows Command Prompt).
* Clone the repository:
```bash
git clone https://github.com/GDSC-RCCIIT/General-Purpose-Scripts.git
```
* Navigate to the `scripts/Plagiarism-Checker` directory of the cloned repository.
* Install the dependencies:
```bash
pip install -r requirements.txt
```
* Open the `data.csv` file.
* In the `text1` and `text2` columns, add the two texts whose plagiarism (similarity) score you want, one pair per row.
* You are now ready to run the script.

Run using Python:
```bash
python check_plagiarism.py
```

* An `output.csv` file will be generated with an additional `similarity` column holding the plagiarism (similarity) score for the two texts in each row.
````
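
The README stops once `output.csv` exists and leaves interpretation to the reader. A minimal standard-library sketch for listing the high-scoring pairs could look like this; the `flag_pairs` helper and the 0.7 cutoff are illustrative assumptions, not part of the commit, and a sensible threshold depends on your texts.

```python
import csv


def flag_pairs(path: str = "output.csv", threshold: float = 0.7) -> list[tuple[str, str, float]]:
    """Return (text1, text2, similarity) for every row scoring at or above threshold.

    Hypothetical helper: the 0.7 default is an assumed cutoff, not from check_plagiarism.py.
    """
    flagged = []
    with open(path, newline="", encoding="utf-8") as f:
        # output.csv has the columns text1, text2, similarity.
        for row in csv.DictReader(f):
            score = float(row["similarity"])
            if score >= threshold:
                flagged.append((row["text1"], row["text2"], score))
    return flagged


if __name__ == "__main__":
    for text1, text2, score in flag_pairs():
        print(f"{score:.2f}  {text1!r} vs {text2!r}")
```

Reading the file with `csv.DictReader` keeps the sketch free of extra dependencies; `pandas.read_csv` would work equally well since `pandas` is already in `requirements.txt`.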

scripts/Plagiarism-Checker/check_plagiarism.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

data = pd.read_csv("data.csv")


def check_plagiarism(data):
    similarity_list = []

    for i in range(len(data)):
        corpus = [data["text1"][i], data["text2"][i]]
        vectors = TfidfVectorizer().fit_transform(corpus).toarray()

        text1_vector = [vectors[0]]
        text2_vector = [vectors[1]]

        similarity = cosine_similarity(text1_vector, text2_vector)[0][0]
        similarity_list.append(similarity)

    return similarity_list


similarity_list = check_plagiarism(data)
data["similarity"] = similarity_list

data.to_csv("output.csv", index=False)
```
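
`check_plagiarism` builds a new `TfidfVectorizer()` for every row, so the IDF weights come from just the two texts in that row. Assuming scikit-learn's documented defaults (lowercasing, tokens of two or more word characters, raw term counts as tf, `smooth_idf=True`, `norm="l2"`), each score works out as below, where $n = 2$ is the number of documents and $\operatorname{df}(t)$ is how many of the two texts contain term $t$:

$$
\operatorname{idf}(t) = \ln\frac{1 + n}{1 + \operatorname{df}(t)} + 1,
\qquad
w_d(t) = \operatorname{tf}(t, d)\,\operatorname{idf}(t),
\qquad
\mathbf{v}_d = \frac{\mathbf{w}_d}{\lVert \mathbf{w}_d \rVert_2}
$$

$$
\operatorname{similarity}(d_1, d_2)
= \frac{\mathbf{v}_{d_1} \cdot \mathbf{v}_{d_2}}{\lVert \mathbf{v}_{d_1} \rVert_2 \,\lVert \mathbf{v}_{d_2} \rVert_2}
= \mathbf{v}_{d_1} \cdot \mathbf{v}_{d_2}
$$

With $n = 2$, a term that appears in both texts gets $\operatorname{idf} = 1$ and a term that appears in only one gets $1 + \ln 1.5 \approx 1.405$; because every weight is non-negative, the score lies between 0 and 1. For the "Walter White" row, the shared terms *my*, *name*, *is* have weight 1 in both vectors and each text has two unshared terms, so

$$
\operatorname{similarity} = \frac{3}{3 + 2\,(1 + \ln 1.5)^2} \approx \frac{3}{6.9507} \approx 0.4316,
$$

which agrees with the value recorded in `output.csv`.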

scripts/Plagiarism-Checker/data.csv

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@

```csv
text1,text2
I wish you a very good luck for all you life. May you get all your desires and may you get success on every single step. I wish you the best of luck.,I wish you a very good luck for all you life. May you always get what you want and may you be at the right place. I wish you the best of luck.
Hard work is the most important key to success.,Achievements without hard work are impossible.
My name is Walter White.,My name is Jesse Pinkman.
I was born in United States of America.,I was born in United States of America.
```
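
`data.csv` must keep the `text1,text2` header because `check_plagiarism.py` reads those column names. If you generate the file instead of editing it by hand, Python's `csv` module quotes any text containing a comma, double quote, or line break, and `pandas.read_csv` (which the script uses) parses such quoted fields back into single values. `write_pairs` below is a hypothetical helper, offered as a minimal sketch:

```python
import csv


def write_pairs(pairs, path="data.csv"):
    """Write (text1, text2) pairs in the two-column layout check_plagiarism.py expects.

    Hypothetical helper, not part of the commit.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Default QUOTE_MINIMAL quotes only fields containing commas, quotes or newlines.
        writer = csv.writer(f)
        writer.writerow(["text1", "text2"])
        writer.writerows(pairs)


if __name__ == "__main__":
    # Note: this overwrites data.csv in the current directory.
    write_pairs([
        ("Hard work is the most important key to success.",
         "Achievements without hard work are impossible."),
        # A comma inside a text is safe here: csv.writer wraps the field in quotes.
        ("Yes, I was born there.", "Yes, I was born there."),
    ])
```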

scripts/Plagiarism-Checker/output.csv

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@

```csv
text1,text2,similarity
I wish you a very good luck for all you life. May you get all your desires and may you get success on every single step. I wish you the best of luck.,I wish you a very good luck for all you life. May you always get what you want and may you be at the right place. I wish you the best of luck.,0.7777777941407904
Hard work is the most important key to success.,Achievements without hard work are impossible.,0.15976420924144444
My name is Walter White.,My name is Jesse Pinkman.,0.43161341897075156
I was born in United States of America.,I was born in United States of America.,1.0000000000000004
```
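
To see where these numbers come from, here is a dependency-free sketch that re-derives them. It is an assumption-labelled reimplementation of scikit-learn's default `TfidfVectorizer` behaviour (same token regex, lowercasing, smoothed IDF) followed by cosine similarity, not the library's own code. The `1.0000000000000004` in the last row is ordinary floating-point rounding on identical texts. One known divergence: this sketch returns 0.0 when a text has no tokens, whereas scikit-learn raises an "empty vocabulary" error if neither text has any.

```python
import math
import re
from collections import Counter

# Same pattern as scikit-learn's default token_pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_cosine(text1: str, text2: str) -> float:
    """Cosine similarity of two texts under TfidfVectorizer-style defaults fitted on this pair."""
    counts = [Counter(TOKEN_RE.findall(t.lower())) for t in (text1, text2)]
    n_docs = len(counts)
    vocab = set(counts[0]) | set(counts[1])
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, where df counts the texts containing the term.
    idf = {
        term: math.log((1 + n_docs) / (1 + sum(term in c for c in counts))) + 1
        for term in vocab
    }
    # Weight = raw term count * idf.
    weights = [{term: c[term] * idf[term] for term in c} for c in counts]
    dot = sum(w * weights[1].get(term, 0.0) for term, w in weights[0].items())
    norms = [math.sqrt(sum(w * w for w in ws.values())) for ws in weights]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0  # one text has no usable tokens
    return dot / (norms[0] * norms[1])


if __name__ == "__main__":
    print(tfidf_cosine("My name is Walter White.", "My name is Jesse Pinkman."))
```

Because the vectorizer is refitted per row, a score is only comparable with other scores computed the same way; it is not a fixed property of the two sentences across different corpora.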

scripts/Plagiarism-Checker/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@

```text
pandas
scikit-learn
```
