Text Analysis Toolkit

Overview

The Text Analysis Toolkit is a Python tool for analyzing text extracted from online articles. It performs sentiment analysis, calculates readability metrics, and extracts linguistic features that summarize the content of each article.

Features

Data Extraction: Extracts article text from the provided URLs and saves each article to a separate text file.
Sentiment Analysis: Determines the sentiment of articles (positive, negative, or neutral) and calculates sentiment scores.
Readability Metrics: Computes readability metrics such as average sentence length, percentage of complex words, and the Fog Index.
Output Data: Writes the calculated metrics to an output CSV file for further analysis.
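
The readability metrics named above can be sketched as follows. This is a minimal illustration and not necessarily how the toolkit computes them: it assumes the standard Gunning Fog formula, 0.4 * (average sentence length + percentage of complex words), treats a word with three or more syllables as complex, and estimates syllables with a rough vowel-group heuristic. The function names count_syllables and readability_metrics are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    # Heuristic assumption: a trailing "es" or "ed" usually adds no syllable.
    if word.endswith(("es", "ed")):
        word = word[:-2]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))


def readability_metrics(text: str) -> dict:
    """Return average sentence length, percent complex words, and Fog Index."""
    # Sentences are split on runs of ., ! or ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0,
                "pct_complex_words": 0.0,
                "fog_index": 0.0}

    # Assumption: a "complex" word has three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex_words = 100 * len(complex_words) / len(words)
    fog_index = 0.4 * (avg_sentence_length + pct_complex_words)
    return {"avg_sentence_length": avg_sentence_length,
            "pct_complex_words": pct_complex_words,
            "fog_index": fog_index}
```

Calling readability_metrics on the text of one article returns a dictionary whose keys could map directly onto columns of the output CSV. The syllable heuristic will miscount some words, so the resulting scores are approximate.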

Usage

Setup: Install the required Python packages with pip install -r requirements.txt.
Data Extraction: List the article URLs in the input Excel file (Input.xlsx), then run data_extraction.py to extract the text.
Sentiment Analysis: Run sentiment_analysis.py to perform sentiment analysis on the extracted text.
Readability Metrics: Run readability_metrics.py to calculate readability metrics.
Output: The calculated metrics are saved to Output_Data.csv for further analysis.
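
The sentiment analysis step can be illustrated with a small dictionary-based scorer. This README does not state the scoring formulas, so the sketch below assumes a common approach: count positive and negative words after removing stop words, then derive a polarity score in the range -1 to 1 and a subjectivity score. The function name sentiment_scores and the returned keys are hypothetical.

```python
import re


def sentiment_scores(text, positive_words, negative_words,
                     stop_words=frozenset()):
    """Score text against positive and negative word sets."""
    # Lowercase, tokenize, and drop stop words before counting.
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in stop_words]
    positive = sum(1 for t in tokens if t in positive_words)
    negative = sum(1 for t in tokens if t in negative_words)

    # Assumed formulas; the small constant avoids division by zero.
    polarity = (positive - negative) / (positive + negative + 1e-6)
    subjectivity = (positive + negative) / (len(tokens) + 1e-6)

    if positive > negative:
        label = "positive"
    elif negative > positive:
        label = "negative"
    else:
        label = "neutral"

    return {"positive_score": positive,
            "negative_score": negative,
            "polarity_score": polarity,
            "subjectivity_score": subjectivity,
            "sentiment": label}
```

For example, with positive words {"good", "great"}, negative words {"bad"}, and the usual function words as stop words, the sentence "The service was good and great but the delay was bad" yields two positive matches, one negative match, and the label "positive".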

Repository Structure

Articles: Contains the text files extracted from the articles.
StopWords: Contains stop-word lists used to filter out common words.
MasterDictionary: Contains dictionaries of positive and negative words.
Input.xlsx: Input file with URLs of articles.
Output_Data.csv: Output file with calculated metrics.
data_extraction.py: Script for extracting text from URLs.
sentiment_analysis.py: Script for performing sentiment analysis.
requirements.txt: List of required Python packages.
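
The word lists in StopWords and MasterDictionary need to be loaded into memory before analysis. The sketch below shows one way to do that. It assumes plain-text files with a .txt extension and one word per line, and it reads them as latin-1 so that unexpected bytes do not raise decoding errors; none of these details are confirmed by this README. The function names load_word_file and load_word_folder are hypothetical.

```python
from pathlib import Path


def load_word_file(path, encoding="latin-1"):
    """Load one word list file into a set of lowercase words."""
    words = set()
    for line in Path(path).read_text(encoding=encoding).splitlines():
        word = line.strip().lower()
        if word:  # skip blank lines
            words.add(word)
    return words


def load_word_folder(folder, encoding="latin-1"):
    """Merge every .txt word list in a folder into a single set."""
    words = set()
    for path in sorted(Path(folder).glob("*.txt")):
        words |= load_word_file(path, encoding=encoding)
    return words
```

A whole folder such as StopWords can be merged with load_word_folder("StopWords"), while the positive and negative dictionaries would be loaded one file at a time with load_word_file so the two sets stay separate. Sets are used because membership checks are fast when every token of every article has to be looked up.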

Requirements

Python
BeautifulSoup
NLTK

saish05/News-Article-analysis-with-NLP
