Sentiment Analysis API

CI License: MIT Python 3.8+

A robust sentiment analysis API powered by state-of-the-art transformer models. This API can analyze text sentiment with high accuracy, handling both short and long texts through intelligent chunking.

Features

  • πŸš€ Fast and Accurate: Powered by pre-trained transformer models
  • πŸ“ Long Text Support: Automatically chunks and analyzes texts longer than model capacity
  • 🎯 Detailed Analysis: Returns confidence scores, sentiment distribution, and chunk-level analysis
  • πŸ”§ RESTful API: Easy-to-use endpoints with comprehensive documentation
  • πŸ“Š Token Counting: Check text length before analysis
  • 🐳 Docker Support: Ready-to-deploy Docker configuration
  • βœ… Well-Tested: Comprehensive test suite with unit and integration tests

Project Structure

sentiment-analysis/
β”œβ”€β”€ sentiment_analysis/         # Main package
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ app.py                 # FastAPI application setup
β”‚   β”œβ”€β”€ api/                   # API layer
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── routes.py          # Route definitions
β”‚   β”œβ”€β”€ core/                  # Core business logic
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── analyzer.py        # Sentiment analysis logic
β”‚   β”œβ”€β”€ models/                # Data models
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── schemas.py         # Pydantic schemas
β”‚   └── tests/                 # Test suite
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ test_analyzer.py   # Unit tests
β”‚       └── test_integration.py # Integration tests
β”œβ”€β”€ .github/                   # GitHub Actions workflows
β”‚   └── workflows/
β”‚       └── ci.yml
β”œβ”€β”€ main.py                    # Application entry point
β”œβ”€β”€ requirements.txt           # Production dependencies
β”œβ”€β”€ requirements-dev.txt       # Development dependencies
β”œβ”€β”€ setup.py                   # Package setup
β”œβ”€β”€ Dockerfile                 # Docker configuration
β”œβ”€β”€ LICENSE                    # MIT License
β”œβ”€β”€ CONTRIBUTING.md            # Contribution guidelines
└── README.md                 # This file

Installation

Using pip

# Clone the repository
git clone https://github.com/yourusername/sentiment-analysis.git
cd sentiment-analysis

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .

# For development
pip install -e ".[dev]"

Using Docker

# Build the image
docker build -t sentiment-analysis .

# Run the container
docker run -p 8000:8000 sentiment-analysis

Quick Start

  1. Start the API server:

    python main.py
  2. Access the interactive API documentation at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc), both served automatically by FastAPI.

  3. Make your first request:

    curl -X POST "http://localhost:8000/api/v1/analyze" \
         -H "Content-Type: application/json" \
         -d '{"text": "I love this API! It works great."}'

API Endpoints

Sentiment Analysis

Endpoint: POST /api/v1/analyze

Analyzes the sentiment of provided text.

Request:

{
  "text": "Your text to analyze"
}

Response:

{
  "label": "POSITIVE",
  "score": 0.9876,
  "raw_scores": [0.001, 0.002, 0.009, 0.9876, 0.0004],
  "numerical_sentiment": 1.0,
  "sentiment_distribution": {"POSITIVE": 100.0},
  "confidence_level": "high",
  "num_chunks": 1,
  "chunk_votes": {"POSITIVE": 1}
}

Token Counting

Endpoint: POST /api/v1/count-tokens

Counts tokens in the provided text.

Request:

{
  "text": "Your text to count tokens"
}

Response:

{
  "token_count": 8,
  "max_tokens": 512,
  "percentage": 1.56,
  "truncated": false
}

Model Information

Endpoint: GET /api/v1/model-info/capacity

Returns the maximum token capacity of the model.

Response:

{
  "max_tokens": 512
}

Health Check

Endpoint: GET /api/v1/health

Check service health status.

Response:

{
  "status": "healthy",
  "service": "sentiment-analysis-api"
}

Long Text Analysis

The API automatically handles long texts that exceed the model's token limit (512 tokens) by:

  1. Chunking: Dividing text into overlapping segments
  2. Individual Analysis: Analyzing each chunk separately
  3. Aggregation: Combining results using:
    • Voting: Most common sentiment wins
    • Weighted Scoring: Confidence-weighted averaging
    • Distribution Analysis: Percentage breakdown by sentiment
    • Confidence Assessment: Based on agreement between chunks
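The aggregation steps above can be sketched as follows. This is illustrative, not the project's actual implementation: `aggregate_chunks`, its input shape (one `(label, score)` pair per chunk), and the confidence thresholds are all assumptions:

```python
# Illustrative sketch of chunk-result aggregation: majority vote, a
# confidence-weighted score for the winning label, a percentage
# distribution, and a confidence level based on chunk agreement.
from collections import Counter


def aggregate_chunks(chunk_results):
    """chunk_results: list of (label, score) pairs, one per chunk."""
    votes = Counter(label for label, _ in chunk_results)
    winner, win_count = votes.most_common(1)[0]
    total = len(chunk_results)
    # Average confidence of the chunks that voted for the winner.
    win_scores = [s for label, s in chunk_results if label == winner]
    distribution = {
        label: round(100.0 * n / total, 2) for label, n in votes.items()
    }
    # Confidence reflects how strongly the chunks agree (thresholds assumed).
    agreement = win_count / total
    confidence = "high" if agreement >= 0.8 else "medium" if agreement >= 0.5 else "low"
    return {
        "label": winner,
        "score": sum(win_scores) / len(win_scores),
        "sentiment_distribution": distribution,
        "confidence_level": confidence,
        "num_chunks": total,
        "chunk_votes": dict(votes),
    }
```

Note that a single-chunk text degenerates to the response shown earlier: one vote, a 100% distribution, and the chunk's own score.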

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=sentiment_analysis --cov-report=html

# Run specific test file
pytest sentiment_analysis/tests/test_analyzer.py -v

Code Quality

# Format code
black sentiment_analysis/

# Lint code
flake8 sentiment_analysis/

# Type checking
mypy sentiment_analysis/

Pre-commit Hooks

# Install pre-commit
pip install pre-commit

# Set up hooks
pre-commit install
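A `.pre-commit-config.yaml` wiring up the tools above might look like this; the `rev` pins are placeholders, so match them to the versions your project actually uses:

```yaml
# Illustrative .pre-commit-config.yaml for black, flake8, and mypy.
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0  # pin to your project's version
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
```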

Configuration

The API uses the following model by default:

  • Model: tabularisai/robust-sentiment-analysis
  • Max tokens: 512
  • Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive)

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❀️ by the Sentiment Analysis API Team

About

🎯 Sentiment analysis without the computational overhead. A practical example of frugal AI - using the right tool for the job instead of throwing LLMs at everything. Open-source toolbox project.
