---
title: Monday – Voice-First AI Learning Assistant
description: An accessible, multimodal AI learning companion that delivers contextual reasoning, 3D visualizations, and curated educational content via natural voice interaction.
sidebar_position: 9
keywords: [monday, AI, VR, education, accessibility, voice-assistant, 3D-visualization, multimodal-learning, perplexity, elevenlabs]
---
# Monday – Voice-First AI Learning Assistant

**Monday** is a voice-enabled AI learning companion designed to bridge the gap between natural language queries and high-quality educational content. Inspired by Marvel’s JARVIS and FRIDAY, and educational platforms like Khan Academy and 3Blue1Brown, Monday delivers tailored responses in three modes—Basic, Reasoning, and Deep Research—while integrating immersive visualizations, curated video content, and accessibility-first design.

Our mission: make learning adaptive, inclusive, and hands-free—whether you’re seeking quick facts, step-by-step reasoning, or deep academic research.
## Features

- **Three Learning Modes**:
  - **Basic Mode** – Quick factual answers with citations.
  - **Reasoning Mode** – Step-by-step logical explanations (triggered by the phrase "think about").
  - **Deep Research Mode** – Multi-source investigations visualized as connected knowledge webs (triggered by the phrase "research into").
- **Voice-first interaction** for hands-free learning.
- **Real-time 3D visualizations** of concepts using Three.js & WebXR.
- **Curated educational YouTube video integration** from trusted sources.
- **Smart search algorithm** that extracts keywords from AI response content using NLP and filters results for educational, embeddable content.
- **Multi-modal feedback** combining text, speech (via ElevenLabs), and spatial panels.
- **VR-optional** design for immersive experiences without requiring a headset.
- **Accessibility-focused interface** for mobility- and vision-impaired users.

## Example Flow

- **User:** "Hey Monday, think about photosynthesis"
- **AI Response:** "Photosynthesis involves chlorophyll, sunlight, and carbon dioxide..."
- **Keywords Extracted:** `["photosynthesis", "chlorophyll", "sunlight"]`
- **YouTube Query:** "photosynthesis chlorophyll sunlight explained tutorial analysis"
- **Result:** 3 relevant educational videos about photosynthesis
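
The keyword-to-query step in this flow can be approximated in a few lines. The sketch below is illustrative only, not Monday's actual smart search code: the stop-word list, the `extractKeywords` helper, and the query suffix are assumptions chosen to reproduce the example above.

```ts
// Hypothetical sketch of the smart-search step: pull salient keywords out of an
// AI response and build an education-oriented YouTube search query.
// The stop-word list and the "explained tutorial analysis" suffix are assumptions.
const STOP_WORDS = new Set(['the', 'and', 'of', 'in', 'to', 'a', 'is', 'that', 'involves'])

function extractKeywords(responseText: string, limit = 3): string[] {
  const counts = new Map<string, number>()
  for (const word of responseText.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (word.length > 3 && !STOP_WORDS.has(word)) {
      counts.set(word, (counts.get(word) ?? 0) + 1)
    }
  }
  // Rank by frequency and keep the top `limit` terms
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word)
}

function buildYouTubeQuery(responseText: string): string {
  return `${extractKeywords(responseText).join(' ')} explained tutorial analysis`
}

// Logs: "photosynthesis chlorophyll sunlight explained tutorial analysis"
console.log(buildYouTubeQuery('Photosynthesis involves chlorophyll, sunlight, and carbon dioxide...'))
```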

## Prerequisites

Before using Monday, ensure you have:

- A device with a microphone.
- A modern web browser (Chrome, Edge, or Firefox recommended).
- Optional: a WebXR-compatible VR headset for immersive mode.
- An internet connection for API-driven responses and 3D assets.

## Installation

```bash
# Clone the repository
git clone https://github.com/srivastavanik/monday.git
cd monday
git checkout final
cd nidsmonday

# Install dependencies
npm install
```

Create a `.env` file and set your API keys:

```bash
PERPLEXITY_API_KEY=your_api_key
ELEVENLABS_API_KEY=your_api_key
YOUTUBE_API_KEY=your_api_key
```

Start the backend and frontend in separate terminals:

```bash
# Start the backend server (Terminal 1)
node backend-server.js

# Start the frontend (Terminal 2)
npm run dev
```

## Usage

1. Launch the app in your browser.
2. Say **"Hey Monday"** to activate the assistant.
3. Ask a question in one of three modes:
   - **Basic Mode** – “What is photosynthesis?”
   - **Reasoning Mode** – “Think about how blockchain works.”
   - **Deep Research Mode** – “Research into the history of quantum mechanics.”
4. View answers as:
   - Floating text panels.
   - Voice responses.
   - Interactive 3D models (when relevant).

## Code Explanation

### Voice Capture (Frontend)

```ts
// Captures finalized speech recognition results and forwards them to the command processor.
this.recognition.onresult = (event: SpeechRecognitionEvent) => {
  let finalTranscript = ''
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      finalTranscript += event.results[i][0].transcript
    }
  }
  if (finalTranscript) {
    console.log('VoiceController: 🎤 Final transcript:', finalTranscript)
    this.currentTranscript = finalTranscript
    this.onTranscriptChange?.(finalTranscript)
    // Send directly to command processor—no filtering here
    this.commandProcessor.queueCommand(finalTranscript, Date.now())
  }
}
```

**Description.** The client’s `VoiceSystemController` uses the Web Speech API to continuously listen for speech. In the `onresult` handler above, any finalized recognition result is captured as `finalTranscript` and immediately forwarded to the command-processing system via `queueCommand`. This converts spoken input into text and injects it into the pipeline without local filtering, delegating interpretation to the command processor.
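
For context, the recognizer that feeds this handler comes from the browser's Web Speech API. The snippet below is a minimal setup sketch, not Monday's actual controller code; choices such as `continuous`, `interimResults`, and `lang` are assumptions about a typical configuration.

```ts
// Minimal sketch of configuring a Web Speech API recognizer for continuous listening.
// Illustrative only; property choices are assumptions about a typical setup.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition

const recognition = new SpeechRecognitionImpl()
recognition.continuous = true       // keep listening after each finalized result
recognition.interimResults = true   // surface partial transcripts while the user speaks
recognition.lang = 'en-US'          // assumed language setting

recognition.onerror = (event: any) => {
  console.warn('Speech recognition error:', event.error)
}

recognition.start()
```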

### Voice Command Processing & Activation (Frontend)

```ts
private async processCommand(event: CommandEvent): Promise<void> {
  const normalizedTranscript = event.transcript.toLowerCase().trim()
  const isActivation = normalizedTranscript.includes('hey monday')
  const isWithinConversation = this.isConversationActive()

  console.log(`🔍 CommandProcessor: Evaluating command: "${event.transcript}"`, {
    isActivation,
    isWithinConversation,
    conversationActive: this.conversationContext.active,
    timeSinceLastCommand: this.conversationContext.lastCommandTime ?
      Date.now() - this.conversationContext.lastCommandTime : 'N/A'
  })

  if (isActivation || isWithinConversation) {
    console.log(`✅ CommandProcessor: Processing command: "${event.transcript}"`)

    // Update context
    if (isActivation && !this.conversationContext.active) {
      this.startConversation()
    }
    this.conversationContext.lastCommandTime = event.timestamp
    this.conversationContext.commandCount++

    // Send to backend
    await this.sendToBackend(event.transcript, isActivation)

    // Notify UI listeners
    this.notifyListeners()
  } else {
    console.log(`🚫 CommandProcessor: Ignoring non-conversation command: "${event.transcript}"`)
  }

  event.processed = true
}
```

**Description.** The `CommandProcessor` manages voice-command routing and conversation context on the client. It checks whether the transcript contains the wake phrase (“hey monday”) or whether an ongoing conversation is active; only then is the user’s command treated as actionable. On activation, it may start a new conversation session, timestamp the interaction, and dispatch the raw transcript to the backend via `sendToBackend`. Inputs that arrive outside an active session and without the trigger phrase are ignored.
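
The `isConversationActive()` check referenced above is not shown in this excerpt. One plausible shape is a simple inactivity window: the conversation stays active only for a short period after the last command. The sketch below assumes a 30-second timeout; the constant and exact logic are illustrative, not Monday's documented behavior.

```ts
// Hypothetical sketch of the conversation-window check used by processCommand.
// The 30-second timeout is an assumption.
private static readonly CONVERSATION_TIMEOUT_MS = 30_000

private isConversationActive(): boolean {
  if (!this.conversationContext.active) return false
  const last = this.conversationContext.lastCommandTime
  // Stay active only if the last command arrived within the timeout window
  return !!last && Date.now() - last < CommandProcessor.CONVERSATION_TIMEOUT_MS
}
```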

### Backend Voice Command Handler (Socket.IO Server)

```ts
socket.on('voice_command', async (data: any) => {
  logger.info('Voice command received', { socketId: socket.id, command: data.command?.substring(0, 50) })

  const command = parseCommand(data.command || '')
  if (!command) {
    socket.emit('monday_response', {
      type: 'error',
      content: 'Please start your command with "Monday"',
      timestamp: Date.now()
    })
    return
  }

  // Handle different command types
  switch (command.type) {
    case 'greeting':
      socket.emit('monday_response', {
        type: 'greeting',
        content: "Hello! I'm Monday, your AI learning companion. ... What would you like to learn about today?",
        timestamp: Date.now()
      })
      break

    case 'basic':
      if (command.content) {
        const response = await perplexityService.processQuery({
          query: command.content,
          mode: 'basic',
          sessionId: data.sessionId
        })
        socket.emit('monday_response', {
          type: 'basic_response',
          content: response.content,
          citations: response.citations,
          metadata: response.metadata,
          timestamp: Date.now()
        })
      }
      break

    case 'reasoning':
      if (command.content) {
        const response = await perplexityService.processQuery({
          query: command.content,
          mode: 'reasoning',
          sessionId: data.sessionId
        })
        socket.emit('monday_response', {
          type: 'reasoning_response',
          content: response.content,
          reasoning: response.reasoning,
          citations: response.citations,
          metadata: response.metadata,
          timestamp: Date.now()
        })
      }
      break

    case 'deepResearch':
      if (command.content) {
        const response = await perplexityService.processQuery({
          query: command.content,
          mode: 'research',
          sessionId: data.sessionId
        })
        socket.emit('monday_response', {
          type: 'research_response',
          content: response.content,
          sources: response.sources,
          citations: response.citations,
          metadata: response.metadata,
          timestamp: Date.now()
        })
      }
      break

    // ... (spatial and focus commands omitted for brevity)
  }
})
```

**Description.** The server receives `voice_command` events and parses them to infer intent (e.g., greeting, basic Q&A, reasoning, deep research). For each type, it invokes the Perplexity service with the corresponding mode and the user’s query. The resulting answer—including content, citations, and, where applicable, a reasoning chain or research sources—is emitted back to the client as a `monday_response` with a `type` aligned to the mode.
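
The `parseCommand` helper is referenced but not included in this excerpt. The sketch below shows one way such a parser could map trigger phrases to the command types handled by the switch statement; the phrase matching and return shape are assumptions.

```ts
// Illustrative sketch of a parseCommand-style helper. The trigger phrases mirror the
// modes described earlier ("think about", "research into"); the return shape is assumed.
interface ParsedCommand {
  type: 'greeting' | 'basic' | 'reasoning' | 'deepResearch'
  content?: string
}

function parseCommand(raw: string): ParsedCommand | null {
  const text = raw.toLowerCase().trim()
  if (!text.includes('monday')) return null // require the wake word

  // Take whatever follows the wake word and strip leading punctuation
  const afterWake = (text.split('monday').pop() ?? '').replace(/^[\s,.!?]+/, '').trim()
  if (afterWake.length === 0) return { type: 'greeting' }

  if (afterWake.startsWith('think about')) {
    return { type: 'reasoning', content: afterWake.replace('think about', '').trim() }
  }
  if (afterWake.startsWith('research into')) {
    return { type: 'deepResearch', content: afterWake.replace('research into', '').trim() }
  }
  return { type: 'basic', content: afterWake }
}
```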

### AI Query Processing (Perplexity Service Integration)

```ts
const result = await this.makeRequest('/chat/completions', requestData)
return {
  id: result.id || 'reasoning_query',
  model: result.model || 'sonar-reasoning',
  content: result.choices?.[0]?.message?.content || 'No response generated',
  citations: this.extractCitations(result),
  reasoning: this.extractReasoningSteps(result.choices?.[0]?.message?.content || ''),
  metadata: {
    tokensUsed: result.usage?.total_tokens || 0,
    responseTime: 0
  }
}
```

**Description.** `PerplexityService` prepares a mode-specific request and calls the external Sonar API. It returns a structured result containing the main answer (`content`), any citations, and—when in reasoning mode—a parsed list of reasoning steps, along with metadata such as token usage and the model identifier.
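
How `requestData` is assembled is not shown above. The sketch below illustrates one plausible shape for a mode-specific Chat Completions payload. Apart from `sonar-reasoning`, which appears in the snippet above, the model names and prompt wording are assumptions.

```ts
// Hypothetical sketch of building a mode-specific request body for the
// /chat/completions endpoint. Model names other than 'sonar-reasoning' are assumptions.
type Mode = 'basic' | 'reasoning' | 'research'

const MODEL_BY_MODE: Record<Mode, string> = {
  basic: 'sonar',                  // assumed
  reasoning: 'sonar-reasoning',    // default seen in the snippet above
  research: 'sonar-deep-research'  // assumed
}

function buildRequestData(query: string, mode: Mode) {
  return {
    model: MODEL_BY_MODE[mode],
    messages: [
      {
        role: 'system',
        content: mode === 'reasoning'
          ? 'Answer with numbered, step-by-step reasoning.' // assumed prompt
          : 'Answer concisely and cite your sources.'       // assumed prompt
      },
      { role: 'user', content: query }
    ]
  }
}
```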

### Reasoning Workflow — Extracting Step-by-Step Logic

```ts
private extractReasoningSteps(content: string): ReasoningStep[] {
  const steps: ReasoningStep[] = []
  const lines = content.split('\n')
  let stepCount = 0

  for (const line of lines) {
    // Look for step indicators like "Step 1:" or "1."
    const stepMatch = line.match(/^(?:Step\s+)?(\d+)[:.]?\s*(.+)$/i)
    if (stepMatch) {
      stepCount++
      steps.push({
        step: stepCount,
        content: stepMatch[2].trim(),
        confidence: 0.8,
        sources: []
      })
    }
  }
  return steps
}
```

**Description.** In reasoning mode, answers are expected to include an ordered thought process. This utility scans the text for step indicators (e.g., “Step 1:” or “1.”), producing a structured array of steps with content and an initial confidence score. This enables the client to render reasoning as a clear, enumerated sequence.
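
As a quick illustration of the parser's behavior inside `PerplexityService`, a numbered answer like the hypothetical one below is converted into three structured steps (field values follow the parser above).

```ts
// Illustrative input/output for extractReasoningSteps.
const sample = [
  'Step 1: Chlorophyll absorbs sunlight.',
  'Step 2: Water molecules are split, releasing oxygen.',
  'Step 3: Carbon dioxide is fixed into glucose.'
].join('\n')

const steps = this.extractReasoningSteps(sample)
// steps[0] => { step: 1, content: 'Chlorophyll absorbs sunlight.', confidence: 0.8, sources: [] }
```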

### VR Spatial Response Visualization

```ts
function createSpatialPanels(response: any, mode: string, query: string): any[] {
  const panels: any[] = []

  // Main content panel
  panels.push({
    id: `panel_${Date.now()}_main`,
    type: 'content',
    position: [0, 1.5, -2],
    rotation: [0, 0, 0],
    title: mode === 'greeting' ? 'Welcome to Monday' : `Learning: ${query}`,
    content: response.content,
    isActive: true,
    opacity: 1,
    createdAt: Date.now()
  })

  // Citations panel if available
  if (response.citations && response.citations.length > 0) {
    panels.push({
      id: `panel_${Date.now()}_citations`,
      type: 'content',
      position: [2, 1.2, -1.5],
      rotation: [0, -30, 0],
      title: 'Sources & Citations',
      content: response.citations.map((c, i) =>
        `${i + 1}. ${c.title}\n${c.snippet}`
      ).join('\n\n'),
      citations: response.citations,
      isActive: false,
      opacity: 0.8,
      createdAt: Date.now()
    })
  }

  // Reasoning panel for complex queries
  if (response.reasoning && response.reasoning.length > 0) {
    panels.push({
      id: `panel_${Date.now()}_reasoning`,
      type: 'reasoning',
      position: [-2, 1.2, -1.5],
      rotation: [0, 30, 0],
      title: 'Reasoning Steps',
      content: response.reasoning.map((r) =>
        `Step ${r.step}: ${r.content}`
      ).join('\n\n'),
      reasoning: response.reasoning,
      isActive: false,
      opacity: 0.8,
      createdAt: Date.now()
    })
  }

  return panels
}
```

**Description.** To bridge AI output into a 3D presentation, the backend constructs spatial panel objects. A main content panel is centered; optional citations and reasoning panels are positioned to the sides. Each panel has an ID, type, position/rotation, title, content, and opacity. These definitions are sent with the response so the client can render floating informational boards in VR.
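
As one possible integration point, a backend handler could attach these panels to the outgoing socket message, as sketched below. The `panels` field name and payload shape are assumptions based on the handler shown earlier, not a confirmed API.

```ts
// Hypothetical sketch: attach spatial panels to the response emitted to the client.
// The 'panels' field name is an assumption.
const panels = createSpatialPanels(response, 'reasoning', command.content)

socket.emit('monday_response', {
  type: 'reasoning_response',
  content: response.content,
  reasoning: response.reasoning,
  citations: response.citations,
  panels, // assumed field carrying the spatial layout
  timestamp: Date.now()
})
```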

### Spatial Orchestration & Layout (Frontend VR)

```ts
useFrame(() => {
  // Continuously rotate the entire group of panels slowly
  if (groupRef.current) {
    groupRef.current.rotation.y += 0.001
  }

  // Dynamic layout based on mode
  panels.forEach((panel, index) => {
    if (spatialLayout === 'focus' && panel.id !== activePanel) {
      // In focus mode, push non-active panels far outward
      const distance = 5
      const angle = (index / panels.length) * Math.PI * 2
      panel.position[0] = Math.cos(angle) * distance
      panel.position[2] = Math.sin(angle) * distance

    } else if (spatialLayout === 'research') {
      // In research mode, distribute panels in a layered circle (knowledge constellation)
      const radius = 3
      const layer = Math.floor(index / 6)
      const angleStep = (Math.PI * 2) / Math.min(6, panels.length - layer * 6)
      const angle = (index % 6) * angleStep
      panel.position[0] = Math.cos(angle) * (radius + layer * 1.5)
      panel.position[1] = 1.6 + layer * 0.5
      panel.position[2] = Math.sin(angle) * (radius + layer * 1.5)

    } else {
      // Default layout: semi-circle in front of the user
      const angle = (index / Math.max(panels.length - 1, 1)) * Math.PI - Math.PI / 2
      const radius = 2.5
      panel.position[0] = Math.cos(angle) * radius
      panel.position[1] = 1.6 + Math.sin(index * 0.5) * 0.3
      panel.position[2] = Math.sin(angle) * radius * 0.5 - 1
    }
  })
})
```

**Description.** `SpatialOrchestrator` renders panels in VR and animates placement per the current layout. Default mode arranges panels in a semi-circle ahead of the user. Focus mode pushes non-active panels outward to minimize distraction. Research mode distributes panels into layered circular constellations to accommodate more nodes. The layout logic runs every frame for smooth transitions.
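
To make the default layout concrete, the same semi-circle arithmetic can be evaluated on its own. The loop below simply lifts the default-branch formula out of the frame loop for three panels; it is a sanity-check sketch, not part of the component.

```ts
// Worked example: evaluate the default semi-circle formula above for 3 panels.
const panelCount = 3
const radius = 2.5

for (let index = 0; index < panelCount; index++) {
  const angle = (index / Math.max(panelCount - 1, 1)) * Math.PI - Math.PI / 2
  const x = Math.cos(angle) * radius
  const y = 1.6 + Math.sin(index * 0.5) * 0.3
  const z = Math.sin(angle) * radius * 0.5 - 1
  console.log(`panel ${index}: x=${x.toFixed(2)}, y=${y.toFixed(2)}, z=${z.toFixed(2)}`)
}
```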

## Links

- [GitHub Repository](https://github.com/srivastavanik/monday/tree/final)
- [Live Demo](https://www.youtube.com/watch?v=BSN3Wp4uE-U)