The Text Analysis Toolkit is a Python-based tool for analyzing textual data extracted from articles. It performs sentiment analysis, calculates readability metrics, and extracts key linguistic features to provide valuable insights into the content of the articles.

saish05/News-Article-analysis-with-NLP

Text Analysis Toolkit

Overview

The Text Analysis Toolkit is a Python tool for analyzing text extracted from online articles. For each article it scores sentiment, computes readability metrics, and extracts key linguistic features, then collects the results in a single CSV file.

Features

  • Data Extraction: Extracts article text from the provided URLs and saves each article to its own text file.
  • Sentiment Analysis: Classifies each article as positive, negative, or neutral and calculates sentiment scores.
  • Readability Metrics: Computes readability metrics such as average sentence length, percentage of complex words, and Fog Index.
  • Output Data: Prepares an output CSV file containing calculated metrics for further analysis.
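As an illustration of the readability metrics listed above, here is a minimal sketch using the standard Gunning Fog formula, 0.4 × (average sentence length + percentage of complex words). The function names, the vowel-group syllable heuristic, and the "three or more syllables" threshold for complex words are assumptions made for this example; the repository's readability_metrics.py may define them differently.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (a heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_metrics(text: str) -> dict:
    """Compute average sentence length, percent complex words, and Fog Index."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "pct_complex_words": 0.0, "fog_index": 0.0}

    # Assumption: a "complex" word has three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]

    avg_sentence_length = len(words) / len(sentences)
    pct_complex_words = 100 * len(complex_words) / len(words)
    fog_index = 0.4 * (avg_sentence_length + pct_complex_words)
    return {
        "avg_sentence_length": avg_sentence_length,
        "pct_complex_words": pct_complex_words,
        "fog_index": fog_index,
    }


metrics = readability_metrics("The cat sat. Readability is important.")
print(metrics)
```

The vowel-group heuristic miscounts some words (silent "e", for instance), so the numbers should be treated as approximate.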

Usage

  1. Setup: Install the required Python packages with pip install -r requirements.txt.
  2. Data Extraction: List the article URLs in the input Excel file (Input.xlsx), then run data_extraction.py to extract the text.
  3. Sentiment Analysis: Run sentiment_analysis.py to perform sentiment analysis on the extracted text.
  4. Readability Metrics: Run readability_metrics.py to calculate readability metrics.
  5. Output: The calculated metrics are saved to Output_Data.csv for further analysis.
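The final step above, saving the calculated metrics to a CSV file, could be sketched as follows with the standard library csv module. The column names in FIELDS and the sample row are hypothetical; the actual columns of Output_Data.csv are not documented here.

```python
import csv
import io

# Hypothetical column names; the real Output_Data.csv may use different ones.
FIELDS = ["url", "positive_score", "negative_score", "fog_index"]


def write_metrics(rows, handle) -> None:
    """Write one CSV row per article to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# In-memory demonstration with a made-up row.
sample_rows = [
    {"url": "https://example.com/a", "positive_score": 3, "negative_score": 1, "fog_index": 12.5},
]
buffer = io.StringIO()
write_metrics(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("Output_Data.csv", "w", newline="", encoding="utf-8") as the handle.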

Repository Structure

  • Articles: Contains extracted text files from articles.
  • StopWords: Includes stop words lists for filtering out common words.
  • MasterDictionary: Contains dictionaries of positive and negative words.
  • Input.xlsx: Input file with URLs of articles.
  • Output_Data.csv: Output file with calculated metrics.
  • data_extraction.py: Script for extracting text from URLs.
  • sentiment_analysis.py: Script for performing sentiment analysis.
  • readability_metrics.py: Script for calculating readability metrics.
  • requirements.txt: List of required Python packages.
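To show how the StopWords and MasterDictionary folders might fit together, here is a minimal sketch of dictionary-based sentiment scoring. The word sets are small in-memory stand-ins rather than the repository's files, and the function name, polarity formula, and labeling thresholds are assumptions for illustration, not necessarily what sentiment_analysis.py does.

```python
import re


def sentiment_scores(text, positive_words, negative_words, stop_words) -> dict:
    """Count dictionary hits after removing stop words and derive a label."""
    # Lowercase, tokenize on letters, and filter out stop words.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]

    positive = sum(1 for t in tokens if t in positive_words)
    negative = sum(1 for t in tokens if t in negative_words)
    total = positive + negative

    # Assumed polarity: ranges from -1 (all negative) to +1 (all positive).
    polarity = (positive - negative) / total if total else 0.0
    if polarity > 0:
        label = "positive"
    elif polarity < 0:
        label = "negative"
    else:
        label = "neutral"

    return {
        "positive_score": positive,
        "negative_score": negative,
        "polarity": polarity,
        "label": label,
    }


# Hypothetical stand-ins for the StopWords and MasterDictionary contents.
stop_words = {"the", "was", "and", "but"}
positive_words = {"good", "great"}
negative_words = {"bad"}

result = sentiment_scores(
    "The plot was great and the acting was good, but the ending was bad.",
    positive_words,
    negative_words,
    stop_words,
)
print(result)
```

Keeping the word lists as sets makes each membership check constant time, which matters once the dictionaries contain thousands of entries.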

Requirements

  • Python
  • BeautifulSoup
  • NLTK
