Description
What specific problem does this solve?
User Story
As an engineer using Roo Code, I want the tool to learn from my decisions in response to its questions, so that it creates a personalized experience that resolves ambiguities based on my established preferences and significantly accelerates problem-solving, and so that over time this builds the trust required for me to confidently enable auto-approval, knowing the tool's actions will align with my own.
User Impact
This feature directly addresses a primary barrier to full automation: lack of trust. Many users who would benefit from auto-approval hesitate to enable it because they can't predict the tool's decisions.
By learning from user feedback, Roo Code will evolve from a generic tool into a personalized assistant. This builds the necessary confidence for users to enable auto-approval for a wider range of actions. The result is a more autonomous and reliable workflow, where Roo Code can handle more complex tasks with less supervision, ensuring the final output more closely matches the user's intent from the start.
Additional context (optional)
No response
Roo Code Task Links (Optional)
https://app.roocode.com/share/12c52b25-106a-4ceb-a868-43b715e2aa8f
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear impact and context
Interested in implementing this?
- Yes, I'd like to help implement this feature
Implementation requirements
- I understand this needs approval before implementation begins
How should this be solved? (REQUIRED if contributing, optional otherwise)
Given we already have an integration with a vector database (Qdrant), a lot of the heavy lifting has been done.
High Level Plan
To keep this focused, I have detailed a high-level plan that I intend to execute for implementing this feature.
Each step will be a single pull request where applicable.
- Update the Qdrant client to dynamically accept different collection types (as an enum or equivalent). Dependencies will be updated too. My plan is to scope this to repo-based collections only, keeping the constructor the same while requiring a mandatory `collectionType` param for the other client functions.
- Store the memories via `askFollowupQuestionTool.ts`, including a setting (disabled by default) that determines whether the code path to store a memory is taken. Simple approach: add a checkbox under experimental features. Medium complexity: reuse the codebase indexing UI element with tabs for the settings. Most complexity: add a new settings section for vector database integration to set up the DB, and hold the settings for both codebase indexing and memory storage there; the existing codebase index icon can stay for starting/clearing indexes, etc.
- Create two new tools, `create-followup-question` and `generate-followup-question-suggestions`, and update everything required; this should work as before. `generate-followup-question-suggestions` will still handle structuring the output for the React component and storing the decision in the Qdrant store. These tools will only be registered if the setting is enabled.
- Update `create-followup-question` to retrieve similar questions and their related answers from the vector database, and feed them to the `generate-followup-question-suggestions` tool so that the LLM can know what the user might prefer as potential solutions when generating suggestions.
- Beta release
- General release (open to moving this if we want to address memory degradation issues first)
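The mandatory `collectionType` parameter in the first step could look roughly like this sketch; `CollectionType`, `collectionNameFor`, `QdrantClientSketch`, and the `search` signature are all hypothetical names for illustration, not the existing Roo Code Qdrant client API:

```typescript
// Hypothetical sketch: a collection-type discriminator for the Qdrant client.
enum CollectionType {
  CodebaseIndex = "codebase-index",
  DecisionMemory = "decision-memory",
}

// Derive a per-repo collection name from the repo hash and the type, so both
// features can share one Qdrant instance without collisions.
function collectionNameFor(repoHash: string, type: CollectionType): string {
  return `${type}-${repoHash}`;
}

interface SearchOptions {
  collectionType: CollectionType; // mandatory on every call except the constructor
  vector: number[];
  limit?: number;
}

class QdrantClientSketch {
  // The constructor stays as-is (repo-scoped), per the plan above.
  constructor(private readonly repoHash: string) {}

  // Every non-constructor method takes the mandatory collectionType param.
  // Here it just resolves the target collection name for demonstration.
  search(opts: SearchOptions): string {
    return collectionNameFor(this.repoHash, opts.collectionType);
  }
}
```

Keeping the constructor unchanged while adding the parameter to each method means existing codebase-indexing call sites only need a one-token change.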
Future Plan
I would like to keep developing this as concerns of memory degradation have been brought up in a similar feature request. Also I would like to expand the functionalities of this feature so that it can work better over time.
Non-Prioritized List for Future Implementation
- implement metrics on usefulness of the feature to understand how to fine tune and modify the retrieval, prompt, and other mechanisms
- enhanced memory retrieval & storage: utility scoring, preventing memory degradation, etc.
- global decision memory; this can be nuanced, but if we do this, we'd need to store additional metadata about the repo (language, framework, etc.) in the embedded data, so only contextual data
- "learning" mode which asks the users much more frequently (maybe all the time?) to learn their behaviour. This can also be used to reset the stored memory
- modify the suggestions prompt so it gives many more options when in learning mode
- when executing a task that has a similar question stored, look at what the user's preferred behaviour is
- store memories based on tool output for more efficient tool usage
- support different DISTANCE_METRICs for retrieval
- support a shared db across developers
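To illustrate the global-decision-memory bullet above, here is a hypothetical payload shape for an embedded decision; every field name is an assumption for illustration, not an actual schema:

```typescript
// Hypothetical payload stored alongside each embedded decision.
interface DecisionMemoryPayload {
  question: string;      // the follow-up question that was asked
  chosenAnswer: string;  // the suggestion the user picked (or typed)
  repoLanguage?: string; // contextual metadata needed for global memory
  repoFramework?: string;
  createdAt: number;     // epoch ms, enables freshness scoring later
}

// For global (cross-repo) retrieval, only reuse a memory when its repo
// context matches the current one; framework matching is optional.
function matchesContext(
  p: DecisionMemoryPayload,
  language: string,
  framework?: string,
): boolean {
  return (
    p.repoLanguage === language &&
    (framework === undefined || p.repoFramework === framework)
  );
}
```

In practice this filtering could also be pushed down into a Qdrant payload filter rather than done client-side.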
Below is a plan with more detail and specifics, as generated by Roo:
decision_memory_implementation_plan.md
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
Given a question needs to be asked to the user
And this feature is enabled
When a question is asked to the user
Then the proposed answers are closer to what the user would have answered, based on their stored decisions
Given I want to enable question suggestion memory
When I go to the Roo Code UI
Then I can enable this feature
Given a question needs to be asked to the user
And this feature is disabled
When a question is asked to the user
Then suggested answers are proposed in the same way as before
Technical considerations (REQUIRED if contributing, optional otherwise)
see document
decision_memory_technical_considerations.md
Trade-offs and risks (REQUIRED if contributing, optional otherwise)
Risk 1: Question tool regression
I believe there is a risk that the question tool won't work as well, since my proposal makes it a two-step process whereas before it was a one-step process. This can be mitigated with careful prompts and testing.
Alternative Approach 1: Tool calls LLM
As an alternative, we could have the ask_follow_up_question_tool call an LLM to generate the suggestions, with the related prompt changed to only generate a question. This would reduce the cost of generating suggestions by, for example, only sending the last 10 messages.
I am unsure about this alternative because of how tools should behave (I imagine they shouldn't call LLMs), but if this is acceptable, it would be my preferred approach.
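The cost reduction in Alternative Approach 1 is essentially a sliding window over the conversation history; a minimal sketch, where the `ChatMessage` shape and `recentWindow` name are assumptions:

```typescript
// Hypothetical message shape; the real conversation type will differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Keep only the most recent `limit` messages when building the
// suggestion-generation prompt, to cap token cost (limit = 10 per the
// proposal above).
function recentWindow(history: ChatMessage[], limit = 10): ChatMessage[] {
  return history.slice(-limit);
}
```

A fixed window is the simplest option; a token-budget-based cutoff would be a natural refinement if message lengths vary widely.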
Alternative Approach 2: Context-Aware Prompting Pattern
This solution would also have two tool calls: one to use the search tool, and the other to call an ask_and_suggest tool that generates the question and suggestions based on the prompt. The downside is that, since we don't know the question yet, it will be harder to find related answers from the user.
Trade-Off: Questions now require two tools being used
Since a question will now involve two tool calls, it will take longer for a response to be returned.
Mitigation Strategy
We need to do the following:
- continue developing the feature to make it more reliable
- create a setting to disable this feature and enable backwards compatibility; I've added an md file on the alternatives for this that Roo came up with:
feature_toggle_architectures.md
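The backwards-compatibility setting could gate tool registration along these lines; the tool names and `registeredTools` function are illustrative, not the actual registration code:

```typescript
// Hypothetical sketch of setting-gated tool registration.
type ToolName =
  | "ask_followup_question"
  | "create-followup-question"
  | "generate-followup-question-suggestions";

// When the setting is off, behave exactly as before: the single one-step
// tool. When it is on, register the two-step pair instead.
function registeredTools(questionMemoryEnabled: boolean): ToolName[] {
  return questionMemoryEnabled
    ? ["create-followup-question", "generate-followup-question-suggestions"]
    : ["ask_followup_question"];
}
```

Gating at registration time (rather than branching inside one tool) keeps the disabled path byte-for-byte identical to current behavior.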
Risk 2: Qdrant Client Refactor
There's a risk of regression in code base indexing.
Mitigation Strategy
To mitigate this, we can ensure that all tests run successfully, and ensure live testing works as well.
Risk 3: Long-term memory systems risk accumulating outdated or incorrect assumptions, leading to degraded accuracy over time
Given that this risk only materializes over time, I think we can take a phased approach: enable memory first, then solve this problem. I am open to this being a blocker for general release, though.
Options
- implement a time decay by storing a "freshness score"
- periodic revalidation when a memory is older than some time period, so the user can confirm the assumption is still correct; this could be a setting
- create a memory management UI interface so users can modify, or delete memories
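The freshness-score option could blend vector similarity with exponential time decay; a minimal sketch, where the 30-day half-life and the multiplicative blending are assumptions to be tuned:

```typescript
// Hypothetical time-decay scoring: blend Qdrant similarity with an
// exponential freshness factor. A memory's influence halves every
// HALF_LIFE_MS (30 days here, purely an assumed starting point).
const HALF_LIFE_MS = 30 * 24 * 60 * 60 * 1000;

function freshness(createdAtMs: number, nowMs: number): number {
  const age = Math.max(0, nowMs - createdAtMs);
  return Math.pow(0.5, age / HALF_LIFE_MS); // 1.0 when new, 0.5 after one half-life
}

function decayedScore(
  similarity: number, // cosine/dot similarity from the vector search
  createdAtMs: number,
  nowMs: number,
): number {
  return similarity * freshness(createdAtMs, nowMs);
}
```

Re-ranking retrieved memories by `decayedScore` means stale assumptions fade out gradually instead of being hard-deleted, which composes well with the periodic-revalidation option.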
Risk 4: Performance Issues and Storage Bloat
I am unfamiliar with this problem space, but if this turns out to be an issue, I can look into the options in more detail.
Options
- Implement Asynchronous and Batch Indexing
- Tune Approximate Nearest Neighbor (ANN) Parameters
- Use Dimensionality Reduction and Vector Quantization
Problem 1: UI Design
I am unsure how to implement this, so I see two options:
- add tabs to the codebase index window
- create a settings tab in the settings section for configuring codebase indexing, question memory, and vector DB setup, while keeping the existing codebase index icon/window for index status and starting/clearing the index
Feedback on this would be appreciated