RokuNana's Project

Allowing LLMs to naturally participate in Multi-Party Chat environments.

RokuNana isn't just another chatbot that mindlessly replies to every single prompt in a channel. It is a fully integrated team assistant designed specifically for multi-party group chats. It reads the room, maintains an internal monologue, processes multimedia in the background, and only chimes in when it actually has something valuable to add.

Presentation video:

https://www.youtube.com/watch?v=hPal1W1e6o4

Pipeline chart:

The Architecture: Single-Pass Generation

Most AI agents rely on clunky, multi-step loops to function in a group chat. They usually run one prompt to decide if they should reply, another to pick a tool, and a third to generate the text. RokuNana strips all that overhead away.

We built a single-pass generation architecture. By leveraging dynamic Pydantic schemas (as seen in core.py), the model generates its entire state in one continuous JSON stream. In a single execution, RokuNana evaluates the social context, calculates its own "compliance willingness," updates its internal thoughts, proposes tools, and decides whether to output a reply. There is no looped agent constantly polling itself—just one clean, efficient generation that dictates the bot's entire behavior for that interaction.

Real-Time Stream Parsing for Authentic UX

To make RokuNana feel truly alive, the Discord integration in main.py relies on a real-time JSON stream parser.

When a conversation updates, RokuNana quietly starts processing. As the LLM streams its response, our parser hunts for specific fields on the fly. RokuNana takes time to "think" and evaluate the context in the background, but the moment the stream hits the target_user and reply fields, the Discord typing indicator is instantly triggered. If the model decides to stay silent and ignore the conversation, the typing animation never fires. From the user's perspective, this eliminates the robotic instant-reply feel; it acts exactly like a human reading the chat, deciding to weigh in, and typing out their thoughts.

Seamless Multimedia & Tool Chaining

RokuNana handles context richly and natively.

When someone drops a YouTube link or uploads a video, the system doesn't just read the URL or file name. It actively downloads the media using yt_dlp, extracts evenly spaced visual frames using OpenCV, and transcribes the audio track using Mistral's transcription API. This combined audiovisual data is injected straight into the conversation context.

The assistant is also fully equipped to interact with the real world. It can run Python scripts (not in an isolated environment yet, but it can be solved using docker) to solve math equations or analyze data, browse the web for up-to-date context, generate voice messages via ElevenLabs, and integrate with the Google Calendar API to manage your team's schedule. Everything RokuNana learns is indexed into a custom RAG (Retrieval-Augmented Generation) memory pipeline, allowing it to naturally recall facts about users and past conversations over time (see a bit lower).

Autonomous Memory & Prefill Injection (RAG)

Most conversational bots handle memory by simply stuffing past chat logs into the system prompt until they run out of tokens, leading to high latency and context dilution. RokuNana takes a much more deliberate approach by building an internal, semantic knowledge base that it governs itself.

Because RokuNana relies on a dynamic single-pass JSON schema, the model is continuously evaluating the conversation for new information. We built an unknown_fact field directly into its thought process. While the bot is generating its response, if it realizes a user just shared a new preference, trait, or context it didn't previously know, it organically extracts that detail into the JSON stream. The parser catches this in real-time and silently commits it to a ChromaDB vector store using Mistral's embedding models. There is no separate background agent summarizing logs—RokuNana decides what is worth remembering exactly when it learns it.

The way this memory is retrieved and applied is equally seamless. When new messages arrive in the chat, main.py queries the vector database to pull up to four highly relevant past facts. But rather than pasting these facts into the system prompt where the model might ignore them, we use a technique called Assistant Prefilling.

Inside core.py, the retrieved memories are injected as the assistant's own starting output (e.g., forcing the generation to begin with (Temporal memory: User x needs their code in Python 3.10)). By prefilling the beginning of the model's response sequence with this data, we force the LLM to structurally acknowledge the memory right before it constructs its MessageSchema. This guarantees the model factors its past learnings into its current social context and tool selection, resulting in a persistent, hallucination-free memory that scales indefinitely.

Setup Instructions

Clone the repository and navigate to the project directory.

git clone https://github.com/neoluigi4123/RokuNana-s-Project.git
cd RokuNana-s-Project

Create a virtual environment and activate it:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install the required dependencies:
```
 pip install -r requirements.txt
```
Set up Google Calendar API credentials:
- Go to theGoogle Cloud Console.
- Create a new project.

-When your project is created, configure Google Auth Platform.

-Then go to Credentials under APIs & Services.

-Create an OAuth cliend ID.

-Download the JSON file.

-Rename the file to credentials.json.

-Put the credentials.json file in the local_data/ folder in the root of RokuNana-s-Project or create it if not existing.

-Return to the Google Cloud Console.

-Go to View all products and search for google calendar API in the top search bar.

-Enable the Google Calendar API.

-Finally, go to OAuth consent screen under APIs & Services.

-And add a test user with your e-mail adress.

Discord:

You also require to setup a discord bot in the discord dev portal and get its token.

Once you got it, you can head over the config.py and specify the name of your discord bot in the SYSTEM_PROMPT.

Run the main application:
```
python main.py
```

Keys and configuration

Create a .env file in the root of the project and add the following variables with your own values:

MISTRAL_API_KEY=PUT_YOUR_ACTUAL_MISTRAL_API_KEY_HERE
DISCORD_BOT_TOKEN=PUT_YOUR_ACTUAL_DISCORD_BOT_TOKEN_HERE
ELEVENLABS_API_KEY=PUT_YOUR_ACTUAL_ELEVENLABS_API_KEY_HERE

Since this project was part of an hackaton, it will not be updated anymore. If you're having any issue with the setup or the script itself, you can contact the devs on discord (neo_luigi). note that this script works best with python 1.12+ and may fails to run at 3.11 or less.

Made with Love, Passion and LOT of Fun.

Name		Name	Last commit message	Last commit date
Latest commit History 187 Commits
.gitignore		.gitignore
README.md		README.md
config.py		config.py
core.py		core.py
data.csv		data.csv
elevenlabs_module.py		elevenlabs_module.py
financial_data.csv		financial_data.csv
google_calendar_tools.py		google_calendar_tools.py
load_file.py		load_file.py
main.py		main.py
rag_embedding.py		rag_embedding.py
requirements.txt		requirements.txt
scripting.py		scripting.py
tools.py		tools.py
voice_utils.py		voice_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RokuNana's Project

Table of Contents

Project Presentation

The Architecture: Single-Pass Generation

Real-Time Stream Parsing for Authentic UX

Seamless Multimedia & Tool Chaining

Autonomous Memory & Prefill Injection (RAG)

Setup Instructions

Keys and configuration

Made with Love, Passion and LOT of Fun.

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RokuNana's Project

Table of Contents

Project Presentation

The Architecture: Single-Pass Generation

Real-Time Stream Parsing for Authentic UX

Seamless Multimedia & Tool Chaining

Autonomous Memory & Prefill Injection (RAG)

Setup Instructions

Keys and configuration

Made with Love, Passion and LOT of Fun.

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages