
[Proposal]: AI Discussion Seeding for Unit-Level Discussions in Open edX #482

@IvanStepanok

Description

Type of Request

Fast Track Change (small, low-risk improvements)

Feature Description

TL;DR

  • Introduce AI-generated questions to seed unit-level discussions directly under course content (web + mobile).
  • Course authors review and approve AI-generated questions via a wizard before they are gradually published.
  • Feature is opt-in at instance level, disabled by default, with clear requirements for transparency to learners.
  • Fully automatic mode is optional and restricted; recommended default is manual review.

1. Overview

This proposal introduces AI-powered seeding of unit-level discussions in Open edX (web and mobile).
A subset of early questions under each unit is generated by AI, reviewed by course authors, and then published over time to make courses feel more active and reduce the “empty room” effect.


2. Summary

This proposal introduces AI Discussion Seeding for unit-level discussions in Open edX, available both on the web and in the official mobile applications.

For each course unit (video, text, PDF, interactive XBlock), the platform provides a dedicated discussion thread directly under the learning content. A subset of early questions in this thread can be generated by AI based on the unit content and previous units. These questions are designed to look like natural, meaningful questions that real learners might ask.

Course authors are guided through a wizard for the whole course: they review AI-generated candidate questions per unit, select the useful ones, optionally edit them, and then approve. Once approved, the system gradually publishes these questions in the unit-level discussions with delays of 1–7 days, simulating a natural, non-robotic flow of discussion.

Optionally, an instance may enable a fully automatic mode where the system selects and publishes questions without manual review by the author.

The feature is disabled by default. It can only be enabled at the LMS instance level by an owner/administrator who accepts all related risks and commits to informing learners about the use of AI-generated questions.

Live demo: https://stepanok.com/discussions-booster/


3. Problem

Asynchronous courses often feel “empty” and lonely: most units have no visible discussion, and learners are reluctant to be the first to post. This reduces perceived course quality and motivation and makes existing discussions underused.


4. Background

Asynchronous online learning often feels lonely:

  • Learners study alone and do not see visible activity from other learners.
  • Under most units there are no questions or comments.
  • Even when learners have questions, they are reluctant to be “the first one to post”.

As a result:

  • Discussion forums and comments are underused.
  • Courses appear “dead” or inactive, which damages perceived quality and motivation.
  • Course authors spend time curating FAQs and typical questions, but these rarely appear as part of a live, ongoing discussion under specific units.

From marketing and online events we know techniques such as “auto-webinars” and lobby-filling: first gather real live activity, then reuse it so new attendees never land in an empty room. People are much more likely to participate when they see that “someone has already asked something”.

In Open edX today:

  • Discussions are often separate from specific units (at course/topic level).
  • Learners must navigate to a forum and find the right thread, which significantly reduces conversion from “learner with a question” to “learner who actually posts”.
  • When a course has been running for some time, new learners still see many units with zero discussion activity underneath.

There is a clear opportunity to:

  • Bring discussions closer to the learning content at the unit level.
  • Seed the first questions so learners do not face a completely empty discussion space.
  • Help instructors scale their ability to anticipate and surface meaningful questions.

5. Use Cases

  • As a learner, I need to see active, meaningful questions directly under each unit so that I feel less alone and more comfortable asking my own questions.
  • As a course author, I need help seeding good questions for dozens of units so that I can spark discussions without manually writing every prompt myself.
  • As an instructor or tutor, I need AI-generated questions to appear gradually over time so that I can respond at a sustainable pace and avoid sudden spikes of activity.
  • As an instance administrator, I need to control where and how AI-generated questions are used, so that we comply with institutional policies and transparency requirements.
  • As a mobile learner, I need the same discussion experience on my phone as on the web so that I can participate in discussions while learning on the go.

6. Goals & Non-Goals

Goals

  • Reduce the feeling of loneliness in asynchronous courses and remove the “empty room” effect in discussions.
  • Increase the percentage of learners who participate in discussions at least once.
  • Provide learners with ready-made entry points into discussions via meaningful questions under each unit.
  • Help course authors quickly obtain a set of good questions per unit without manually writing all of them.
  • Ensure that questions appear over time in a natural manner (not as a sudden block of messages).

Non-Goals

  • This feature does not replace real instructors, tutors, or human support. Answers should be written by real humans.
  • The feature is not intended to manipulate perceived course quality via fake praise or artificial reviews.
  • The feature is not meant to turn discussions into a chatbot or fully synthetic conversation. It is about seeding, not replacing people.
  • This proposal does not aim to redesign the entire Discussions UX. It only introduces the functionality required to seed and surface questions under units.

7. Proposed Solution

7.1. Unit-Level Discussions Under Each Unit

For each course unit (XBlock) there is a dedicated discussion thread.

This thread is displayed:

  • Directly in the unit view on the web LMS.
  • As a discussion/comments block under the unit content in the official Open edX mobile applications.

The discussion block is positioned before the primary navigation controls (e.g., Next/Previous unit), making it more likely that learners will see existing questions and comments as part of the normal reading flow.

Learners do not need to navigate to a separate forum. The discussion space is co-located with the learning content.

7.2. AI-Generated Candidate Questions (Discussion Seeding)

For each unit, the system can generate a list of candidate questions.

Input for AI:

  • The content of the current unit (text, video transcript, PDF text, assignment description, etc.).
  • Short summaries of previous units, so the model knows what has already been explained.

Output:

  • A list of questions that real learners might plausibly ask after engaging with this unit.
  • Optionally, suggested draft answers or hints for instructors only (not for direct publication).

Behavioral constraints for AI:

  • No compliments or praise toward the course, the instructor, or the platform.
  • No evaluation statements like “this is the best course”, “amazing lesson”, etc.
  • Only substantive, content-related questions, clarifications, examples of application, and common misunderstandings.

Additional requirements:

  • Questions should resemble real learner language: sometimes informal, with simple phrasing, not overly polished academic style.
  • Questions may vary in complexity:
    • Basic: “I did not understand the difference between X and Y.”
    • Advanced: “In which scenarios would you prefer approach A over B?”
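The behavioral constraints above could be enforced with a post-generation filter in addition to prompt instructions. The sketch below is a hypothetical helper (not part of any existing Open edX code), and the praise-phrase list is illustrative, not exhaustive:

```python
import re

# Illustrative patterns for praise/evaluation language that the constraints
# above forbid in seeded questions. A real deployment would maintain a
# broader, localized list.
PRAISE_PATTERNS = [
    r"\bbest course\b",
    r"\bamazing\b",
    r"\bgreat (course|lesson|instructor)\b",
]

def filter_candidates(questions: list[str]) -> list[str]:
    """Keep only substantive, content-related candidate questions."""
    compiled = [re.compile(p, re.IGNORECASE) for p in PRAISE_PATTERNS]
    return [q for q in questions if not any(p.search(q) for p in compiled)]

candidates = [
    "I did not understand the difference between X and Y.",
    "This is the best course, amazing lesson!",
    "In which scenarios would you prefer approach A over B?",
]
substantive = filter_candidates(candidates)
```

A filter like this acts as a safety net: even if the model ignores its instructions, evaluative statements never reach the review wizard.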

7.3. Modes: Wizard / Manual Review vs Fully Automatic

Mode 1: Recommended – Wizard-Based Manual Review

For the course author, enabling AI Discussion Seeding appears as a course-level wizard:

  1. The author enables AI Discussion Seeding in course settings.
  2. The system generates candidate questions for all units in the course.
  3. The author goes through a wizard covering the course:
    • For each unit, the author sees a list of candidate questions.
    • The author selects questions to use by ticking checkboxes.
    • The author can edit question text where needed.
    • The author can delete any question that feels incorrect, low-quality, or inappropriate.
  4. Until the author finishes reviewing (for the whole course or at least for a given unit), no seeded questions are published.

Advantages of this mode:

  • Full human control over the content that will appear in discussions.
  • Reduced risk of hallucinations and problematic content.
  • Ability to adapt tone, style, and difficulty to a specific audience.

This mode is the recommended default.

Mode 2: Optional – Fully Automatic

For some instances/courses, a fully automatic scenario may be acceptable:

  • The instance owner or course author explicitly enables a Fully Automatic mode.
  • The system:
    • Generates questions for units.
    • Automatically selects and publishes them without manual review.

This mode can be useful for:

  • Small internal training programs.
  • Corporate environments where instructors are comfortable with AI automation.
  • Experimental or pilot deployments.

Fully Automatic mode:

  • Should be clearly separated from the recommended manual-review mode.
  • May require additional confirmation or restrictions.
  • Should likely be disabled by default at the instance level.

7.4. Publishing Schedule and Natural Flow

Once questions are approved (or Fully Automatic is enabled), the system:

  • Places questions in a per-unit publishing queue.
  • Publishes them gradually over time, with delays of 1–7 days between questions.
  • Optionally considers activity, for example:
    • Do not publish new seeded questions if several previous ones remain unanswered.

The system should avoid:

  • Publishing many questions at once under a unit.
  • Creating a perfectly regular, robotic pattern (e.g., exactly every 24 hours).

The goal is to approximate a human-like, organic flow of new questions.

7.5. Decreasing Intensity Across the Course

To reflect realistic learner behavior:

  • Early units may receive more seeded questions.
  • Later units receive fewer. Example pattern:
    • 8–10 questions in early units.
    • 2–3 in the middle.
    • 0–1 in the final units.
  • For simple or trivial units, the author or system may skip seeding entirely.

This matches the fact that fewer learners typically reach later sections, and that not every unit warrants heavy discussion.
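The decreasing-intensity pattern could be expressed as a simple mapping from a unit's relative position to a target question count. The band thresholds below are assumptions; only the example counts (8–10, 2–3, 0–1) come from this proposal:

```python
def target_seed_count(unit_index: int, total_units: int) -> range:
    """Suggested range of seeded questions for a unit, by course position."""
    # 0.0 = first unit, 1.0 = last unit.
    position = unit_index / max(total_units - 1, 1)
    if position < 1 / 3:
        return range(8, 11)   # early units: 8-10 questions
    if position < 2 / 3:
        return range(2, 4)    # middle units: 2-3 questions
    return range(0, 2)        # final units: 0-1 questions
```

An author or an automated policy could still override this per unit, including skipping trivial units entirely.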

7.6. Role of Instructor / Tutor

A key principle of this proposal:

Seeded questions are prompts for real humans; they are not meant to be answered by bots.

In this feature:

  • AI does not post answers to learners on its own.
  • AI may provide draft answers or hints to instructors as optional support, but human instructors decide:
    • Whether to use these drafts.
    • How to phrase the final answer.

Benefits:

  • Instructors reinforce their expertise and presence.
  • Learners build a sense of connection with a real instructor, not just a system.
  • The risk of incorrect or unethical AI-generated answers appearing in the thread is significantly reduced.

8. User Experience (Web & Mobile)

8.1. Learner Experience

Target UX: as easy and natural as posting a comment on Reddit or YouTube.

Key characteristics:

  • The discussion is visible directly under the unit content.
  • To post a comment, a learner simply:
    • Scrolls down to the discussion block.
    • Types in a single comment field.
    • Clicks “Post” with minimal friction.

Core capabilities:

  • Nested replies to support conversation threads.
  • Basic formatting (lists, links, code/formula where appropriate).
  • Sorting by “new” or “top” comments.

In mobile applications:

  • The comments block is integrated under the unit view.
  • Interactions should follow established mobile patterns for comments/conversations.

8.2. Instructor / Author Experience

The course author gets:

  • A course-level wizard for AI Discussion Seeding:
    • An overview of all units.
    • For each unit, a list of AI-generated candidate questions.
    • A simple review flow: select, edit, or discard questions.
    • Status indicators per unit, e.g.:
      • AI questions: generated / reviewed / scheduled / posted.
  • The ability to:
    • Turn off seeding for specific units.
    • Stop publishing new seeded questions at any time.

The goal is to minimize instructor workload: ideally, the author just reviews and checks boxes, rather than crafting all questions from scratch.

8.3. Instance Owner / Administrator Experience

At the instance level:

  • The feature is disabled by default.

Enabling AI Discussion Seeding requires:

  • An explicit configuration change.
  • An acknowledgement that the instance will inform learners about the use of AI-generated questions.

The administrator can:

  • Allow or disallow Fully Automatic mode.
  • Configure global limits (for example, maximum number of seeded questions per week per course).
  • Define default behavior for intervals between seeded questions.

9. Ethics, Transparency & Governance

To maintain trust and avoid manipulation:

  • The feature is strictly opt-in at the instance level:
    • It ships as disabled by default.
    • It becomes active only when an instance owner consciously enables it and accepts all ethical and legal responsibilities.
  • Institutions are required to inform learners that some questions in discussions may be AI-generated:
    • This can be done via course policies, a global FAQ, onboarding messages, or similar mechanisms.

The level of explicit labeling of AI content should be configurable:

  • Minimal option: learners are informed in policy/FAQ, but questions are not individually marked.
  • Stronger option: seeded questions are posted from a clearly marked system account whose profile states that it is an AI/system account used to improve the learner experience.
  • Optional: a visual indicator next to seeded questions, if required by institutional policies.

In the recommended manual-review mode:

  • Every seeded question has been approved by a human before publication.
  • AI does not post anything directly without instructor oversight.

The feature should be designed to respect:

  • Applicable local laws and regulations regarding AI.
  • Institutional policies on transparency and data usage.

10. Technical Considerations (High-Level)

This section outlines high-level technical aspects, leaving implementation details to follow-up design documents.

Data and Modeling

Store AI-generated questions as a separate entity associated with:

  • course_id
  • unit_id (XBlock)
  • status: generated / approved / rejected / scheduled / posted

Maintain timestamps and metadata for scheduling and analytics.
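The status lifecycle listed above could be modeled with an explicit transition table. The transitions below are an assumption about a reasonable workflow (generated questions are approved or rejected; approved ones are scheduled, then posted), not a finalized design:

```python
from enum import Enum

class SeedStatus(str, Enum):
    GENERATED = "generated"
    APPROVED = "approved"
    REJECTED = "rejected"
    SCHEDULED = "scheduled"
    POSTED = "posted"

# Assumed legal transitions; REJECTED and POSTED are terminal states.
ALLOWED_TRANSITIONS = {
    SeedStatus.GENERATED: {SeedStatus.APPROVED, SeedStatus.REJECTED},
    SeedStatus.APPROVED: {SeedStatus.SCHEDULED},
    SeedStatus.SCHEDULED: {SeedStatus.POSTED},
    SeedStatus.REJECTED: set(),
    SeedStatus.POSTED: set(),
}

def transition(current: SeedStatus, new: SeedStatus) -> SeedStatus:
    """Validate and apply a status change for a seeded question."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Keeping the table explicit makes it easy to audit the workflow and to reject, say, a direct generated-to-posted jump in manual-review mode.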

AI Integration

A service that:

  • Receives structured content from the current unit.
  • Receives summaries or key concepts from previous units.
  • Produces a list of candidate questions (plus optional draft answers/hints).

Scheduling Component

Responsible for:

  • Managing queues of approved questions for each unit.
  • Publishing questions according to configured intervals and constraints.
  • Optionally, checking instructor load (for example, limiting the number of unanswered seeded questions at any given time).
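The instructor-load check could be as simple as a gating predicate the scheduler consults before publishing. Both the function and the default threshold are hypothetical configuration choices:

```python
def may_publish_next(unanswered_seeded: int, max_unanswered: int = 3) -> bool:
    """Hold back the next seeded question while too many earlier seeded
    questions in this unit remain unanswered."""
    return unanswered_seeded < max_unanswered
```

This keeps the pace of seeded questions coupled to the instructor's actual response capacity.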

Web and Mobile Integration

Use or extend existing Discussions APIs so that:

  • Seeded and normal learner questions are handled through the same interfaces.
  • Mobile clients can display and interact with seeded questions just like any other discussion post (unless explicit UI labeling is required).

Configuration

Support settings at:

  • Instance level (feature flags, default behaviors, fully automatic mode availability).
  • Course level (enable/disable AI seeding, choose mode).
  • Unit level (enable/disable AI seeding for specific units if needed).
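The three configuration layers could be resolved by simple precedence: unit overrides course, which overrides instance defaults. The keys below are illustrative, not actual Open edX flag names:

```python
# Assumed instance-level defaults; the feature ships disabled.
INSTANCE_DEFAULTS = {
    "seeding_enabled": False,
    "fully_automatic_allowed": False,
    "min_delay_days": 1,
    "max_delay_days": 7,
}

def resolve_config(instance: dict, course: dict, unit: dict) -> dict:
    """Merge settings, letting more specific layers override broader ones.
    A value of None means "inherit from the layer below"."""
    merged = dict(instance)
    merged.update({k: v for k, v in course.items() if v is not None})
    merged.update({k: v for k, v in unit.items() if v is not None})
    return merged

cfg = resolve_config(
    INSTANCE_DEFAULTS,
    {"seeding_enabled": True},    # course opts in
    {"seeding_enabled": False},   # but this specific unit opts out
)
```

Using None as "inherit" lets a course opt in globally while individual units opt out, matching the unit-level toggle described above.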

11. Risks & Open Questions

Potential Risks

  • Some learners may react negatively if seeded questions are presented as human-written yet show clear signs of being machine-generated; maintaining high quality and natural phrasing is therefore essential.
  • If institutions do not implement transparent communication, trust in the platform may be eroded.
  • Instructors may be overwhelmed by too many candidate questions to review.
  • Seeding may fail to significantly increase genuine learner activity if the underlying UX for discussions remains inconvenient or unfamiliar.

Open Questions for the Community

  • What is an acceptable default level of labeling for AI-generated questions?
  • Should there be a global cap on the number of seeded questions per course per week or per unit?
  • Are the 1–7 day publishing intervals a good default, or should we support more dynamic rules (e.g., based on course pacing or enrollment spikes)?
  • Should Fully Automatic mode be available to all instances by default, or only to those that explicitly request it (e.g., via configuration or feature toggles)?
  • How do we best support multilingual courses, where questions may need to be generated in multiple languages?

12. Success Metrics

Suggested metrics to evaluate the impact of AI Discussion Seeding:

  • Increase in the percentage of learners who post at least one comment in a course.
  • Increase in the average number of questions/comments per unit.
  • Reduction in the average time from course launch to the first question under each unit.
  • Increase in the number of courses where unit-level discussions show meaningful activity.

Qualitative feedback:

  • Learners report that the course feels less lonely and more “alive”.
  • Instructors report that it has become easier to start and maintain discussions and to address common questions.

Long-Term Ownership and Maintenance

We propose to treat AI Discussion Seeding as an official extension to the Core Product Discussions experience.

Primary technical ownership: Raccoon Gang (or another designated maintainer), in collaboration with the Open edX Core Product and Mobile working groups.

Code location: under the openedx GitHub organization, following standard contribution and review practices.

Maintenance scope:

  • Keep compatibility with supported Open edX releases.
  • Maintain integrations with the official mobile applications.
  • Monitor and address issues related to AI quality, scheduling, and transparency configuration.

If, in the future, the feature becomes part of the Core Product, ownership can transition to the appropriate maintainers as defined by OEP-57 (Core Product).


Contact Person

Proposal coordinator: Ivan Stepanok, Raccoon Gang
📧 [email protected]


13. Implementation Plan

Phase 0 – Discovery & Design (4–6 weeks)

  • Validate detailed requirements with the Core Product Working Group and Mobile subgroup.
  • Finalize UX flows for authors, learners, and admins (including labeling options and configuration screens).
  • Align with existing Discussions architecture and APIs on web and mobile.

Phase 1 – Backend and Web MVP (8–10 weeks)

  • Implement backend services for AI question generation, storage, and scheduling.
  • Extend Discussions APIs to support seeded questions and their lifecycle statuses.
  • Implement the course-level wizard for manual review on the web LMS.
  • Ship an MVP behind a feature flag to a small number of pilot instances.

Phase 2 – Mobile Integration (6–8 weeks)

  • Integrate seeded questions into official iOS and Android apps using extended Discussions APIs.
  • Ensure UX parity with web, including sorting, replies, and labeling where needed.
  • Optimize performance and payload size for mobile clients.

Phase 3 – Metrics, Governance & Rollout (4–6 weeks)

  • Implement metrics instrumentation (discussion activity, time to first post, etc.).
  • Finalize configuration options for instance admins (frequency caps, automatic mode gating).
  • Document deployment and governance guidelines for operators.

14. Resources & Funding Status

Raccoon Gang is currently looking for funding to implement this proposal and is prepared to lead the technical implementation in collaboration with the Open edX product and mobile working groups.

Link to Product Proposal

https://openedx.atlassian.net/wiki/x/BoA9PgE

Status

New

Proposed By

Ivan Stepanok (Raccoon Gang)
