Feature and its Use Cases
Overview
EduAid currently performs AI question generation (MCQ, Short Questions, Boolean Questions) synchronously inside Flask API endpoints. These endpoints directly run transformer-based models (such as T5-large and T5-base) during the request-response cycle.
Because transformer inference is computationally expensive, this design causes several limitations:
- API requests remain blocked during model inference
- Response times can become very slow for long documents
- Concurrent user requests cannot be handled efficiently
- The system does not scale well when multiple users generate quizzes simultaneously
To address these limitations, this feature proposes introducing a Distributed Asynchronous Inference Pipeline using a task queue and background worker architecture.
Problem Statement
Currently the request flow follows a synchronous architecture:
Client Request
↓
Flask API Endpoint
↓
AI Model Inference (T5)
↓
Response Returned
Issues with the current architecture:
- Long-running model inference blocks the API thread
- Multiple simultaneous requests may cause server slowdowns or timeouts
- No ability to queue tasks or distribute workloads
- Poor scalability for larger deployments or classroom-scale usage
Since EduAid uses transformer models, which are computationally intensive, a distributed inference architecture would significantly improve system performance and reliability.
Proposed Solution
Introduce an asynchronous task processing pipeline that decouples API requests from model inference.
Proposed architecture:
Client Request
↓
Flask API Server
↓
Task Queue (Redis / RabbitMQ)
↓
Background Worker (Celery)
↓
AI Model Inference
↓
Store Result
↓
Client retrieves result via Task ID
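The flow above can be sketched in-process with Python's standard library, with a `queue.Queue` standing in for Redis/RabbitMQ and a background thread standing in for a Celery worker. All names here are illustrative, and the "inference" step is a placeholder for the real T5 call:

```python
import queue
import threading
import uuid

task_queue = queue.Queue()  # stands in for Redis / RabbitMQ
results = {}                # stands in for the result backend

def worker():
    """Background worker: pulls tasks and runs (mock) model inference."""
    while True:
        task_id, text = task_queue.get()
        # Placeholder for T5 inference; a real worker would call the
        # existing generator classes here.
        results[task_id] = {"status": "done", "questions": [f"Q about: {text}"]}
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(text):
    """API layer: enqueue a task and return its ID immediately."""
    task_id = str(uuid.uuid4())
    task_queue.put((task_id, text))
    return task_id

tid = submit("photosynthesis")
task_queue.join()  # demo only; real clients would poll by task_id
print(results[tid]["status"])  # → done
```

The key property this demonstrates is that `submit` returns immediately with a task ID, while inference happens off the request path.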
Core Components
1. Flask API (Gateway Layer)
The API server will receive requests and enqueue them as asynchronous tasks.
2. Task Queue System
A message broker such as Redis or RabbitMQ will handle task distribution.
3. Worker Processes
Background worker processes will perform AI model inference using the existing generator classes:
- MCQGenerator
- ShortQGenerator
- BoolQGenerator
4. Result Backend
Generated quiz results will be temporarily stored and retrieved using task IDs.
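Assuming Celery is adopted, the worker-pool component maps naturally onto a pool of processes or threads; a minimal stdlib approximation with `concurrent.futures` (the generator call is a placeholder for the existing `MCQGenerator` class, everything else is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Worker pool standing in for Celery workers; max_workers would be
# tuned to available CPU/GPU resources.
pool = ThreadPoolExecutor(max_workers=4)

def run_mcq_generation(text):
    # Placeholder for a call into the existing MCQGenerator class.
    return {"type": "mcq", "source": text, "questions": ["..."]}

# In this sketch the Future object doubles as a tiny result backend:
# submit() returns immediately, result() blocks until inference is done.
future = pool.submit(run_mcq_generation, "water cycle")
result = future.result()
print(result["type"])  # → mcq
```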
Expected API Workflow
- Client sends a quiz generation request.
- API creates a background task and returns a task_id immediately.
- Worker processes perform the AI inference asynchronously.
- Client retrieves the result using the task ID.
Example flow:
POST /generate_mcq_async
↓
Returns: task_id
GET /task_status/<task_id>
GET /task_result/<task_id>
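The three-endpoint workflow above can be mocked as plain functions over a shared task store (route names are taken from this proposal; the inline completion is a simplification — in the real system a worker would move the task from `pending` to `done`):

```python
import uuid

tasks = {}  # task_id -> {"status": ..., "result": ...}

def post_generate_mcq_async(text):
    """POST /generate_mcq_async: enqueue and return a task_id immediately."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "pending", "result": None}
    # A Celery worker would perform this transition asynchronously;
    # done inline here only to illustrate the state machine.
    tasks[task_id] = {"status": "done", "result": {"questions": [text]}}
    return task_id

def get_task_status(task_id):
    """GET /task_status/<task_id>"""
    return tasks[task_id]["status"]

def get_task_result(task_id):
    """GET /task_result/<task_id>"""
    return tasks[task_id]["result"]

tid = post_generate_mcq_async("mitosis")
print(get_task_status(tid))  # → done
```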
Benefits of this Enhancement
1. Improved Scalability
Multiple worker processes can run inference tasks in parallel, allowing the system to serve many users at once.
2. Faster API Response Times
The API returns immediately with a task ID instead of waiting for model inference.
3. Better Resource Utilization
Workers can be scaled depending on available CPU/GPU resources.
4. Improved Reliability
Task queues enable:
- retry mechanisms
- failure recovery
- workload balancing
5. Production-Ready Architecture
This architecture follows industry best practices used in machine learning inference systems.
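The retry mechanism mentioned under reliability can be sketched as an exponential-backoff loop (stdlib only and purely illustrative; Celery provides this natively through its task retry options):

```python
import time

def run_with_retries(task, max_retries=3, base_delay=0.01):
    """Retry a failing task with exponential backoff (illustrative)."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky():
    """Simulates an inference task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient inference failure")
    return "ok"

print(run_with_retries(flaky))  # → ok (after two retries)
```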
Implementation Considerations
Possible technologies:
- Celery for distributed task processing
- Redis as message broker and result backend
- Worker pools to manage model inference tasks
- Task status tracking endpoints
To ensure backward compatibility, existing synchronous endpoints can remain unchanged while asynchronous endpoints are introduced.
Potential Future Extensions
- Real-time task progress updates
- WebSocket-based notifications
- GPU-aware worker scheduling
- Task prioritization for large workloads
Additional Context
EduAid currently performs AI inference synchronously within Flask endpoints. As the system evolves and usage grows, introducing a distributed asynchronous processing architecture will help ensure scalability and maintain consistent performance for AI-based quiz generation.
This enhancement aligns with modern ML deployment patterns and would significantly improve the backend infrastructure of EduAid.
Code of Conduct
- I have joined the Discord server and will post updates there
- I have searched existing issues to avoid duplicates