Feature and its Use Cases
Overview
EduAid currently performs AI question generation (MCQ, Short Questions, Boolean Questions) synchronously inside Flask API endpoints. These endpoints directly run transformer-based models (such as T5-large and T5-base) during the request-response cycle.
Because transformer inference is computationally expensive, this design causes several limitations:
- API requests remain blocked during model inference
- Response times can become very slow for long documents
- Concurrent user requests cannot be handled efficiently
- The system does not scale well when multiple users generate quizzes simultaneously
To address these limitations, this feature proposes introducing a Distributed Asynchronous Inference Pipeline using a task queue and background worker architecture.
Problem Statement
Currently the request flow follows a synchronous architecture:
Client Request
↓
Flask API Endpoint
↓
AI Model Inference (T5)
↓
Response Returned
Issues with the current architecture:
- Long-running model inference blocks the API thread
- Multiple simultaneous requests may cause server slowdowns or timeouts
- No ability to queue tasks or distribute workloads
- Poor scalability for larger deployments or classroom-scale usage
Since EduAid uses transformer models, which are computationally intensive, a distributed inference architecture would significantly improve system performance and reliability.
Proposed Solution
Introduce an asynchronous task processing pipeline that decouples API requests from model inference.
Proposed architecture:
Client Request
↓
Flask API Server
↓
Task Queue (Redis / RabbitMQ)
↓
Background Worker (Celery)
↓
AI Model Inference
↓
Store Result
↓
Client retrieves result via Task ID
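The flow above can be sketched in-process with Python's standard library, with a `queue.Queue` standing in for Redis/RabbitMQ and a background thread standing in for a Celery worker. All names here are illustrative, and the "inference" step is a placeholder for the real T5 call:

```python
import queue
import threading
import uuid

task_queue = queue.Queue()  # stands in for Redis / RabbitMQ
results = {}                # stands in for the result backend

def worker():
    """Background worker: pulls tasks and runs (mock) model inference."""
    while True:
        task_id, text = task_queue.get()
        # Placeholder for T5 inference; a real worker would call the
        # existing generator classes here.
        results[task_id] = {"status": "done", "questions": [f"Q about: {text}"]}
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(text):
    """API layer: enqueue a task and return its ID immediately."""
    task_id = str(uuid.uuid4())
    task_queue.put((task_id, text))
    return task_id

tid = submit("photosynthesis")
task_queue.join()  # demo only; real clients would poll by task_id
print(results[tid]["status"])  # → done
```

The key property this demonstrates is that `submit` returns immediately with a task ID, while inference happens off the request path.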
Core Components
1. Flask API (Gateway Layer)
The API server will receive requests and enqueue them as asynchronous tasks.
2. Task Queue System
A message broker such as Redis or RabbitMQ will handle task distribution.
3. Worker Processes
Background worker processes will perform AI model inference using the existing generator classes:
- MCQGenerator
- ShortQGenerator
- BoolQGenerator
4. Result Backend
Generated quiz results will be temporarily stored and retrieved using task IDs.
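Assuming Celery is adopted, the worker-pool component maps naturally onto a pool of processes or threads; a minimal stdlib approximation with `concurrent.futures` (the generator call is a placeholder for the existing `MCQGenerator` class, everything else is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Worker pool standing in for Celery workers; max_workers would be
# tuned to available CPU/GPU resources.
pool = ThreadPoolExecutor(max_workers=4)

def run_mcq_generation(text):
    # Placeholder for a call into the existing MCQGenerator class.
    return {"type": "mcq", "source": text, "questions": ["..."]}

# In this sketch the Future object doubles as a tiny result backend:
# submit() returns immediately, result() blocks until inference is done.
future = pool.submit(run_mcq_generation, "water cycle")
result = future.result()
print(result["type"])  # → mcq
```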
Expected API Workflow
- Client sends a quiz generation request.
- API creates a background task and returns a task_id immediately.
- Worker processes perform the AI inference asynchronously.
- Client retrieves the result using the task ID.
Example flow:
POST /generate_mcq_async
↓
Returns: task_id
GET /task_status/<task_id>
GET /task_result/<task_id>
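The three-endpoint workflow above can be mocked as plain functions over a shared task store (route names are taken from this proposal; the inline completion is a simplification — in the real system a worker would move the task from `pending` to `done`):

```python
import uuid

tasks = {}  # task_id -> {"status": ..., "result": ...}

def post_generate_mcq_async(text):
    """POST /generate_mcq_async: enqueue and return a task_id immediately."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "pending", "result": None}
    # A Celery worker would perform this transition asynchronously;
    # done inline here only to illustrate the state machine.
    tasks[task_id] = {"status": "done", "result": {"questions": [text]}}
    return task_id

def get_task_status(task_id):
    """GET /task_status/<task_id>"""
    return tasks[task_id]["status"]

def get_task_result(task_id):
    """GET /task_result/<task_id>"""
    return tasks[task_id]["result"]

tid = post_generate_mcq_async("mitosis")
print(get_task_status(tid))  # → done
```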
Benefits of this Enhancement
1. Improved Scalability
Multiple worker processes can run inference tasks in parallel, allowing the system to serve many users at once.
2. Faster API Response Times
The API returns immediately with a task ID instead of waiting for model inference.
3. Better Resource Utilization
Workers can be scaled depending on available CPU/GPU resources.
4. Improved Reliability
Task queues enable:
- retry mechanisms
- failure recovery
- workload balancing
5. Production-Ready Architecture
This architecture follows industry best practices used in machine learning inference systems.
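The retry mechanism mentioned under reliability can be sketched as an exponential-backoff loop (stdlib only and purely illustrative; Celery provides this natively through its task retry options):

```python
import time

def run_with_retries(task, max_retries=3, base_delay=0.01):
    """Retry a failing task with exponential backoff (illustrative)."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky():
    """Simulates an inference task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient inference failure")
    return "ok"

print(run_with_retries(flaky))  # → ok (after two retries)
```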
Implementation Considerations
Possible technologies:
- Celery for distributed task processing
- Redis as message broker and result backend
- Worker pools to manage model inference tasks
- Task status tracking endpoints
To ensure backward compatibility, existing synchronous endpoints can remain unchanged while asynchronous endpoints are introduced.
Potential Future Extensions
- Real-time task progress updates
- WebSocket-based notifications
- GPU-aware worker scheduling
- Task prioritization for large workloads
Additional Context
EduAid currently performs AI inference synchronously within Flask endpoints. As the system evolves and usage grows, introducing a distributed asynchronous processing architecture will help ensure scalability and maintain consistent performance for AI-based quiz generation.
This enhancement aligns with modern ML deployment patterns and would significantly improve the backend infrastructure of EduAid.
Code of Conduct
- I have joined the Discord server and will post updates there
- I have searched existing issues to avoid duplicates