🤖 AI vs Human Content Detector 2025

A lightweight ML pipeline that classifies text as AI-generated or human-written.

Ideal for content moderation, academic integrity checks, and blog/article verification.

📌 Table of Contents

Overview
Demo
Tech Stack
Dataset
Project Structure
How It Works
Getting Started
Model Performance
Usage
Author

🧠 Overview

With the rapid rise of AI-generated content, distinguishing between human and machine-written text has become increasingly important. This project builds a binary text classifier using classical NLP and machine learning techniques to tackle this challenge.

Key highlights:

Cleans and preprocesses raw text data
Converts text to numerical features using TF-IDF Vectorization
Trains a Logistic Regression classifier and evaluates it on a held-out test split
Saves the trained model and vectorizer as .pkl files for production reuse
Includes an app.py script for real-time inference — no retraining needed
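
The highlights above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the `train_and_save` helper and the hyperparameters (`max_iter=1000`, default TF-IDF settings) are hypothetical. Only the artifact names `logreg_model.pkl` and `tfidf_vectorizer.pkl` come from this repository.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def train_and_save(texts, labels,
                   model_path="logreg_model.pkl",
                   vectorizer_path="tfidf_vectorizer.pkl"):
    """Fit TF-IDF + Logistic Regression and persist both artifacts.

    Hypothetical helper: the notebook may use different settings.
    """
    # Learn the vocabulary and IDF weights, then vectorize the training texts
    vectorizer = TfidfVectorizer(lowercase=True)
    features = vectorizer.fit_transform(texts)

    # Fit the binary classifier on the TF-IDF features
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)

    # Save both objects: inference needs the same fitted vectorizer
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    return model, vectorizer
```

Saving the vectorizer alongside the model matters because new text must be mapped to the same vocabulary and IDF weights that the classifier was trained on.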

🎬 Demo

Enter text: The mitochondria is the powerhouse of the cell...

🔍 Prediction: HUMAN ✅
📊 Confidence: 91.4%

🛠 Tech Stack

Category	Tools
Language	Python 3.8+
ML & NLP	scikit-learn, TF-IDF, Logistic Regression
Data Handling	pandas, numpy
Visualization	matplotlib, seaborn
Model Persistence	joblib / pickle
Environment	Jupyter Notebook, VS Code

📂 Dataset

Field	Description
File	`ai_human_content_detection_dataset.csv`
`text`	Input text sample
`label`	Target class — `AI` or `Human`
Split	80% Train / 20% Test
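
As a rough sketch of how the 80/20 split might be produced: the file name and the `text` and `label` columns are taken from the table above, while the `load_dataset` and `split_dataset` helpers, the fixed seed, and the stratified sampling are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_dataset(path="ai_human_content_detection_dataset.csv"):
    """Read the labeled CSV and keep only the two columns used here."""
    df = pd.read_csv(path)
    return df[["text", "label"]].dropna()


def split_dataset(df, test_size=0.2, seed=42):
    """Return X_train, X_test, y_train, y_test (80% train / 20% test).

    Stratifying on the label keeps the AI/Human ratio the same in both
    parts; this is an assumption, the notebook may split differently.
    """
    return train_test_split(
        df["text"],
        df["label"],
        test_size=test_size,
        random_state=seed,
        stratify=df["label"],
    )
```

A typical call would be `X_train, X_test, y_train, y_test = split_dataset(load_dataset())`.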

🗂 Project Structure

ai-vs-human-content-detector-2025/
│
├── 📓 AI_vs_Human_Content_Detection.IPYNB   # Main notebook: EDA, preprocessing, training & evaluation
├── 📊 ai_human_content_detection_dataset.csv # Labeled dataset (AI & Human samples)
├── 🤖 logreg_model.pkl                       # Saved Logistic Regression model
├── 🔤 tfidf_vectorizer.pkl                   # Saved TF-IDF vectorizer
├── 🚀 app.py                                 # Inference script — load model & predict on new text
├── 📋 requirements.txt                       # Python dependencies
└── 📄 README.md                              # Project documentation

⚙️ How It Works

Raw Text
   │
   ▼
┌─────────────────────────────┐
│     Text Preprocessing      │
│  • Lowercasing              │
│  • Remove punctuation       │
│  • Strip extra whitespace   │
│  • (Optional) Stopwords     │
└─────────────────────────────┘
   │
   ▼
┌─────────────────────────────┐
│    TF-IDF Vectorization     │
│  Converts text → numbers    │
└─────────────────────────────┘
   │
   ▼
┌─────────────────────────────┐
│   Logistic Regression       │
│   Binary Classifier         │
│   AI  vs  Human             │
└─────────────────────────────┘
   │
   ▼
  Prediction + Confidence Score
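
The preprocessing box in the diagram could look roughly like the function below. It is a sketch using only the standard library; the `preprocess` name and the tiny `STOPWORDS` set are hypothetical stand-ins, and the notebook may implement these steps differently.

```python
import re
import string

# Small illustrative stopword set (an assumption, not the notebook's list)
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, remove_stopwords=False):
    """Lowercase, drop punctuation, collapse whitespace, optionally drop stopwords."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    text = re.sub(r"\s+", " ", text).strip()
    if remove_stopwords:
        text = " ".join(word for word in text.split() if word not in STOPWORDS)
    return text
```

For example, `preprocess("  The Mitochondria,  is the POWERHOUSE of the cell!! ")` returns `"the mitochondria is the powerhouse of the cell"`, and with `remove_stopwords=True` it returns `"mitochondria powerhouse cell"`.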

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/Musawir456/ai-vs-human-content-detector-2025.git
cd ai-vs-human-content-detector-2025

2. (Optional) Create a Virtual Environment

python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Run the Notebook (Training & Analysis)

jupyter notebook "AI_vs_Human_Content_Detection.IPYNB"

5. Run the Inference App

python app.py

📈 Model Performance

Metric	Score
Accuracy	~XX%
Precision	~XX%
Recall	~XX%
F1-Score	~XX%

📝 Update this table with your actual evaluation results after training.
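
One way to obtain the four numbers for the table is sketched below. The `evaluate` helper is hypothetical, and treating `AI` as the positive class is an assumption; swap `positive_label` if the notebook reports metrics for `Human` instead.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def evaluate(y_true, y_pred, positive_label="AI"):
    """Return accuracy, precision, recall and F1 for a binary AI/Human task."""
    accuracy = accuracy_score(y_true, y_pred)
    # average="binary" reports the scores for the positive class only
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true,
        y_pred,
        average="binary",
        pos_label=positive_label,
        zero_division=0,
    )
    return {
        "accuracy": float(accuracy),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }
```

Pass the held-out test labels as `y_true` and the output of `model.predict` on the test features as `y_pred`.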

💡 Usage

Once the model is trained, run app.py to predict on new text, or load the saved model and vectorizer yourself:

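import joblib

# Load the artifacts saved by the training notebook
model = joblib.load("logreg_model.pkl")
vectorizer = joblib.load("tfidf_vectorizer.pkl")

# transform() expects an iterable of strings, so wrap the text in a list
text = ["Your sample text goes here..."]
features = vectorizer.transform(text)
prediction = model.predict(features)

# predict() returns an array with one label per input text
print(f"Prediction: {prediction[0]}")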
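
The Demo section also shows a confidence score, which the snippet above does not compute. A minimal sketch of one way to get it is to take the highest class probability from `predict_proba`. The `predict_with_confidence` helper is hypothetical, and app.py may compute its score differently.

```python
def predict_with_confidence(model, vectorizer, text):
    """Return (label, confidence), where confidence is the top class probability.

    Assumes `model` exposes predict_proba and classes_, as scikit-learn's
    LogisticRegression does, and that `vectorizer` is already fitted.
    """
    features = vectorizer.transform([text])
    # One row per input text, one column per class in model.classes_
    probabilities = model.predict_proba(features)[0]
    best_index = probabilities.argmax()
    return model.classes_[best_index], float(probabilities[best_index])
```

With the loaded artifacts, usage would look like `label, confidence = predict_with_confidence(model, vectorizer, "Your sample text goes here...")`, followed by `print(f"Prediction: {label} ({confidence:.1%})")`.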

👨‍💻 Author

Abdul Musawir
AI/ML Engineer & Data Scientist
📍 Lahore, Pakistan

⭐ If you found this project useful, please give it a star! ⭐

Made with ❤️ by Abdul Musawir
