feat: Add xgb detectors #42

Draft
wants to merge 2 commits into
base: main
from


@christinaexyou christinaexyou commented Jul 22, 2025

Summary by Sourcery

Add XGBoost-based SMS spam detector with end-to-end training, inference API, and containerization

New Features:

  • Introduce training script to load and preprocess SMS spam dataset, perform TF-IDF vectorization, and tune XGBoost hyperparameters
  • Implement Detector class and FastAPI service for spam detection inference
  • Provide Makefile and Dockerfile targets to run training and build the detector image

Build:

  • Add requirements.txt for detector dependencies

Documentation:

  • Add README with setup instructions, API usage, and local testing guidelines

Tests:

  • Add integration tests for the XGBoost detector endpoint
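
To make the TF-IDF step named in the summary concrete, here is a minimal pure-Python sketch of the weighting that scikit-learn documents as the `TfidfVectorizer` default (smoothed IDF, rows scaled to unit L2 norm). The PR text does not say which vectorizer settings `train.py` uses, so the formula choice, the whitespace tokenizer, and the sample messages are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: idf(t) = ln((1 + n) / (1 + df(t))) + 1, after which
    each row is scaled to unit L2 norm.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({t: w / norm for t, w in raw.items()} if norm else {})
    return rows


# Made-up sample messages, not taken from the SMS spam dataset.
weights = tfidf(["win a free prize now", "are we meeting for lunch", "free lunch now"])
```

Terms that occur in fewer documents receive a larger weight: in the first message, `win` (one document) outweighs `free` (two documents).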

sourcery-ai bot commented Jul 22, 2025

Reviewer's Guide

This PR adds an XGBoost-based spam detector. It comprises a training pipeline (data loading, text preprocessing, TF-IDF feature extraction, hyperparameter search, and model artifact serialization), a runtime detector class with batching and CUDA support, a FastAPI service instrumented with Prometheus, end-to-end integration tests, and supporting documentation and Docker setup.
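
The artifact serialization step can be sketched as a pickle round trip. The file names, the directory layout, and the stand-in objects below are hypothetical; the PR text says only that the vectorizer and model are serialized with pickle and later loaded from the filesystem.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical artifact file names.
ARTIFACT_NAMES = ("vectorizer.pkl", "model.pkl")


def save_artifacts(directory, vectorizer, model):
    """Write both artifacts so inference can reuse the exact fitted objects."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for name, obj in zip(ARTIFACT_NAMES, (vectorizer, model)):
        with open(directory / name, "wb") as fh:
            pickle.dump(obj, fh)


def load_artifacts(directory):
    """Read the artifacts back in the same order they were saved."""
    directory = Path(directory)
    loaded = []
    for name in ARTIFACT_NAMES:
        with open(directory / name, "rb") as fh:
            loaded.append(pickle.load(fh))  # only unpickle files you trust
    return tuple(loaded)


with tempfile.TemporaryDirectory() as tmp:
    # Plain dicts stand in for the fitted vectorizer and classifier.
    save_artifacts(tmp, {"vocabulary": ["free", "win"]}, {"max_depth": 3})
    vectorizer, model = load_artifacts(tmp)
```

Saving the vectorizer alongside the model matters because inference has to map text to the same feature columns the classifier was trained on.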

Sequence diagram for FastAPI XGB detector request handling

sequenceDiagram
    actor User
    participant FastAPI as FastAPI app
    participant Detector as Detector
    participant Model as XGB Model
    User->>FastAPI: POST /api/v1/text/contents
    FastAPI->>Detector: run(request)
    Detector->>Model: vectorizer.transform(text)
    Detector->>Model: model.predict(vectorized_text)
    Detector-->>FastAPI: ContentAnalysisResponse
    FastAPI-->>User: ContentsAnalysisResponse
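
As a rough client-side sketch of the first arrow in the diagram, the snippet below builds (but does not send) a POST to `/api/v1/text/contents` using only the standard library. The host, the port, and the `contents` field in the JSON body are assumptions; the PR text does not show the `ContentAnalysisHttpRequest` schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical address of a locally running detector


def build_request(texts):
    """Build the POST request for the detection endpoint without sending it."""
    body = json.dumps({"contents": list(texts)}).encode("utf-8")  # field name assumed
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/text/contents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(["Congratulations, you won a free prize!"])
# To send it against a running service: urllib.request.urlopen(req)
```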

Class diagram for new XGB detector components

classDiagram
    class Detector {
        - model
        - vectorizer
        - cuda_device
        - batch_size
        + __init__()
        + run(request: ContentAnalysisHttpRequest) ContentAnalysisResponse
    }
    Detector --> ContentAnalysisHttpRequest
    Detector --> ContentAnalysisResponse
    class ContentAnalysisHttpRequest
    class ContentAnalysisResponse
    class DetectorBaseAPI
    class FastAPI
    FastAPI <|-- DetectorBaseAPI
    class DetectorRegistry
    Detector ..> DetectorRegistry : uses
    Detector ..> logger : logs
    Detector ..> torch : uses
    Detector ..> xgb : uses
    Detector ..> pickle : loads model
    Detector ..> TfidfVectorizer : uses
    Detector ..> GridSearchCV : uses
    Detector ..> PorterStemmer : uses
    Detector ..> stopwords : uses
    Detector ..> pd : uses
    Detector ..> load_dataset : uses
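
The `Detector` box in the class diagram can be read as the skeleton below. This is a sketch, not the PR's `detector.py`: the stub vectorizer and model, the default batch size, and the plain-dict results are hypothetical stand-ins, and CUDA device selection is omitted.

```python
from dataclasses import dataclass


class StubVectorizer:
    """Stand-in for a fitted TF-IDF vectorizer."""

    def transform(self, texts):
        # One feature per text: 1.0 if the word "free" appears, else 0.0.
        return [[float("free" in t.lower())] for t in texts]


class StubModel:
    """Stand-in for a trained classifier; label 1 means spam."""

    def predict(self, features):
        return [int(row[0] > 0) for row in features]


@dataclass
class Detector:
    model: StubModel
    vectorizer: StubVectorizer
    batch_size: int = 2  # hypothetical default

    def run(self, contents):
        """Vectorize and classify the texts one batch at a time."""
        results = []
        for start in range(0, len(contents), self.batch_size):
            batch = contents[start:start + self.batch_size]
            labels = self.model.predict(self.vectorizer.transform(batch))
            results.extend(
                {"text": text, "spam": bool(label)}
                for text, label in zip(batch, labels)
            )
        return results


detector = Detector(model=StubModel(), vectorizer=StubVectorizer())
results = detector.run(["Free prize inside", "See you at noon", "free entry today"])
```

Slicing the input by `batch_size` keeps each `transform` and `predict` call bounded in size while preserving the input order in the output.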

File-Level Changes

Change: Added training pipeline for XGBoost detector
Details:
  • Load SMS spam dataset from Hugging Face
  • Preprocess text with NLTK stopwords removal and stemming
  • Vectorize text using TF-IDF
  • Perform GridSearchCV for hyperparameter tuning and train final XGBClassifier
  • Serialize vectorizer and model with pickle
Files:
  • detectors/xgb/build/train.py
  • detectors/xgb/build/Makefile

Change: Implemented runtime detector class
Details:
  • Load model and vectorizer artifacts from filesystem
  • Support CUDA device selection and batching logic
  • Predict on incoming text batches and aggregate spam detection
  • Wrap predictions into ContentAnalysisResponse
Files:
  • detectors/xgb/build/detector.py

Change: Created FastAPI application for serving detector
Details:
  • Define lifespan context manager to initialize and cleanup detector
  • Instrument API with Prometheus metrics
  • Define POST endpoint /api/v1/text/contents for unary detection requests
Files:
  • detectors/xgb/build/app.py

Change: Added integration tests for XGB detector API
Details:
  • Set up FastAPI TestClient with detector fixture
  • Parametrized tests for spam and non-spam content
  • Validate HTTP response code and detection results
Files:
  • detectors/xgb/test_xgb.py
  • tests/detectors/xgb/test_xgb.py

Change: Provided documentation and environment setup
Details:
  • Add README with build, run, and API instructions
  • Specify XGB detector dependencies in requirements.txt
  • Include Dockerfile for container image and Makefile targets
Files:
  • detectors/xgb/README.md
  • detectors/xgb/requirements.txt
  • detectors/Dockerfile.xgb
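
The lifespan pattern listed for `app.py` can be sketched with the standard library alone. FastAPI accepts an async context manager through its `lifespan` argument; everything else here (the `registry` dict, the stand-in detector object, and the `FakeApp` class used so the snippet runs without FastAPI installed) is hypothetical.

```python
import asyncio
from contextlib import asynccontextmanager

registry = {}  # hypothetical holder for the loaded detector


@asynccontextmanager
async def lifespan(app):
    # Startup: load the detector once, before any request is served.
    registry["xgb"] = object()  # stand-in for Detector()
    try:
        yield
    finally:
        # Shutdown: drop the reference so resources can be released.
        registry.clear()


class FakeApp:
    """Placeholder so the sketch runs without FastAPI installed."""


async def demo():
    async with lifespan(FakeApp()):
        loaded_during = "xgb" in registry
    return loaded_during, "xgb" in registry


during, after = asyncio.run(demo())
# With FastAPI installed, the same function would be passed as FastAPI(lifespan=lifespan).
```

Loading the model in the startup half of the context manager means the unpickling cost is paid once per process rather than once per request.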
