Fake News Classifier - NLP Project

This repository contains a full Machine Learning pipeline to classify news as real (1) or fake (0) using Natural Language Processing (NLP) techniques, built entirely in Python.

The project includes preprocessing, feature engineering, model training with cross-validation, and prediction on unseen validation data.

Final Deliverables:

- predictions_validation.csv
- fake_news_nlp.ipynb

Project Structure

dataset:
- data.csv # Labeled dataset for training and validation
- validation_data.csv # Unlabeled dataset to predict
results:
- predictions_validation.csv # Final predictions on validation data
- predictions_validation_onlytext.csv # Final predictions on validation data from the text-only model
models:
- random_forest_model.pkl # Saved trained model
- tfidf_vectorizer.pkl # Saved TF-IDF vectorizer
- random_forest_model_onlytext.pkl # Saved trained model with only text as a feature
- tfidf_vectorizer_onlytext.pkl # Saved TF-IDF vectorizer with only text as a feature
fake_news_script.py # Main Python script
fake_news_nlp.ipynb # Jupyter Notebook walking through model training step by step
only_text_nlp_model.ipynb # Jupyter Notebook training a new model using only text as a feature
README.md # Project description
requirements.txt
first_steps_data_exploration.ipynb # Jupyter Notebook with the first steps we took exploring the data
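
The saved artifacts under models can be reloaded later to score new text without retraining. The sketch below is an illustration only: it assumes the .pkl files were written with Python's pickle module (joblib is another common choice), trains a tiny throwaway model on made-up sentences so the example is self-contained, and writes to a temporary directory rather than to models.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training rows; 1 = real, 0 = fake, as in the project description.
texts = [
    "senate passes annual budget bill",
    "court upholds ruling on trade tariffs",
    "aliens secretly control the senate",
    "miracle pill cures every disease overnight",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    # File names mirror the text-only artifacts listed above.
    model_path = Path(tmp) / "random_forest_model_onlytext.pkl"
    vectorizer_path = Path(tmp) / "tfidf_vectorizer_onlytext.pkl"
    model_path.write_bytes(pickle.dumps(model))
    vectorizer_path.write_bytes(pickle.dumps(vectorizer))

    # Reload both objects; the vectorizer must be the one fitted at training time.
    loaded_model = pickle.loads(model_path.read_bytes())
    loaded_vectorizer = pickle.loads(vectorizer_path.read_bytes())

new_texts = ["senate votes on trade bill"]
predictions = loaded_model.predict(loaded_vectorizer.transform(new_texts))
print(predictions)
```

The model and the vectorizer are saved as a pair because the model only understands the feature columns produced by the vectorizer it was trained with.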

How to Run

Install the dependencies, then run the main script:

pip install -r requirements.txt
python fake_news_script.py

The script will:

- Preprocess the data (tokenize, remove stopwords, apply stemming)
- Combine title + text into a single feature
- Apply TF-IDF vectorization + One-Hot Encoding for subject
- Train a RandomForestClassifier
- Perform 5-fold cross-validation with visual output
- Save the model and vectorizer
- Predict on validation_data.csv and save results in predictions_validation.csv
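
The first two steps in the list above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the stopword set is a tiny hypothetical sample, `crude_stem` is a naive suffix stripper standing in for a real stemmer (for example NLTK's PorterStemmer), and the function names are assumptions.

```python
import re

# Tiny illustrative stopword list; a real run would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}


def crude_stem(token: str) -> str:
    """Strip one common suffix; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(title: str, text: str) -> str:
    """Combine title and text, then tokenize, drop stopwords, and stem."""
    combined = f"{title} {text}".lower()
    tokens = re.findall(r"[a-z]+", combined)  # alphabetic tokens only
    kept = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    return " ".join(kept)


example = preprocess("Senators are voting", "The bill was passed in the Senate.")
print(example)  # senator vot bill pass senate
```

The output is a single cleaned string per article, which is the shape a TF-IDF vectorizer expects as input.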

Evaluation: Cross-Validation

Results:

We use 5-fold cross-validation to evaluate the model’s ability to generalize.

- Mean accuracy: e.g. 99.8%
- Standard deviation: e.g. ± 0.0013
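
A 5-fold evaluation of this kind could be set up roughly as follows with scikit-learn. This is a sketch under assumptions, not the script itself: the column names `content`, `subject`, and `label` are hypothetical, the ten rows are invented toy data, and wrapping the feature steps in a `Pipeline` (so that TF-IDF is refit inside each fold) is a design choice that may differ from what fake_news_script.py does. Scores on toy data say nothing about the real model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented toy rows standing in for data.csv; 1 = real, 0 = fake.
real = [
    "senate passes annual budget bill",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "court upholds ruling on trade tariffs",
    "health agency publishes vaccine guidance",
]
fake = [
    "aliens secretly control the senate",
    "miracle pill cures every disease overnight",
    "celebrity clone replaces president in secret",
    "moon landing staged inside shopping mall",
    "secret tunnel links bank to mars colony",
]
df = pd.DataFrame({
    "content": real + fake,
    "subject": ["politics", "economy", "local", "economy", "health",
                "politics", "health", "politics", "science", "science"],
    "label": [1] * 5 + [0] * 5,
})

# TF-IDF on the text column, one-hot encoding on the subject column.
features = ColumnTransformer([
    ("tfidf", TfidfVectorizer(), "content"),
    ("subject", OneHotEncoder(handle_unknown="ignore"), ["subject"]),
])
model = Pipeline([
    ("features", features),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# cv=5 uses stratified 5-fold splitting for classifiers.
scores = cross_val_score(
    model, df[["content", "subject"]], df["label"], cv=5, scoring="accuracy"
)
print(f"Mean accuracy: {scores.mean():.3f}")
print(f"Standard deviation: {scores.std():.4f}")
```

Fitting the vectorizer inside each fold keeps vocabulary and IDF weights from the held-out fold out of training, which gives a more honest estimate of generalization.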

Developed as part of the Ironhack Data Science program — NLP Challenge.

Authors: Affiong Akpanisong & Mo Benet
