This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Commit 92a9611

FEATURE: add context and llm controls to researcher, fix username filter
Add context length controls to researcher (max tokens per post and per batch)
Allow picking the LLM for researcher
Fix bug where unicode usernames were not working
Fix documentation of OR logic
1 parent 4f980d5 commit 92a9611

5 files changed: +129 -30 lines changed

config/locales/server.en.yml

Lines changed: 9 additions & 0 deletions
@@ -344,6 +344,15 @@ en:
       searching: "Searching for: '%{query}'"
       tool_options:
         researcher:
+          researcher_llm:
+            name: "LLM"
+            description: "Language model to use for research (default to current persona's LLM)"
+          max_tokens_per_batch:
+            name: "Maximum tokens per batch"
+            description: "Maximum number of tokens to use for each batch in the research"
+          max_tokens_per_post:
+            name: "Maximum tokens per post"
+            description: "Maximum number of tokens to use for each post in the research"
           max_results:
             name: "Maximum number of results"
             description: "Maximum number of results to include in a filter"

lib/personas/tools/researcher.rb

Lines changed: 52 additions & 25 deletions
@@ -31,26 +31,28 @@ def signature

           def filter_description
             <<~TEXT
-              Filter string to target specific content.
-              - Supports user (@username)
-              - post_type:first - only includes first posts in topics
-              - post_type:reply - only replies in topics
-              - date ranges (after:YYYY-MM-DD, before:YYYY-MM-DD for posts; topic_after:YYYY-MM-DD, topic_before:YYYY-MM-DD for topics)
-              - categories (category:category1,category2 or categories:category1,category2)
-              - tags (tag:tag1,tag2 or tags:tag1,tag2)
-              - groups (group:group1,group2 or groups:group1,group2)
-              - status (status:open, status:closed, status:archived, status:noreplies, status:single_user)
-              - keywords (keywords:keyword1,keyword2) - searches for specific words within post content using full-text search
-              - topic_keywords (topic_keywords:keyword1,keyword2) - searches for keywords within topics, returns all posts from matching topics
-              - topics (topic:topic_id1,topic_id2 or topics:topic_id1,topic_id2) - target specific topics by ID
-              - max_results (max_results:10) - limits the maximum number of results returned (optional)
-              - order (order:latest, order:oldest, order:latest_topic, order:oldest_topic, order:likes) - controls result ordering (optional, defaults to latest posts)
-
-              Multiple filters can be combined with spaces for AND logic. Example: '@sam after:2023-01-01 tag:feature'
-
-              Use OR to combine filter segments for inclusive logic.
-              Example: 'category:feature,bug OR tag:feature-tag' - includes posts in feature OR bug categories, OR posts with feature-tag tag
-              Example: '@sam category:bug' - includes posts by @sam AND in bug category
+              Filter string to target specific content. Space-separated filters use AND logic, OR creates separate filter groups.
+
+              **Filters:**
+              - username:user1 or usernames:user1,user2 - posts by specific users
+              - group:group1 or groups:group1,group2 - posts by users in specific groups
+              - post_type:first|reply - first posts only or replies only
+              - keywords:word1,word2 - full-text search in post content
+              - topic_keywords:word1,word2 - full-text search in topics (returns all posts from matching topics)
+              - topic:123 or topics:123,456 - specific topics by ID
+              - category:name1 or categories:name1,name2 - posts in categories (by name/slug)
+              - tag:tag1 or tags:tag1,tag2 - posts in topics with tags
+              - after:YYYY-MM-DD, before:YYYY-MM-DD - filter by post creation date
+              - topic_after:YYYY-MM-DD, topic_before:YYYY-MM-DD - filter by topic creation date
+              - status:open|closed|archived|noreplies|single_user - topic status filters
+              - max_results:N - limit results (per OR group)
+              - order:latest|oldest|latest_topic|oldest_topic|likes - sort order
+
+              **OR Logic:** Each OR group processes independently - filters don't cross boundaries.
+
+              Examples:
+              - 'username:sam after:2023-01-01' - sam's posts after date
+              - 'max_results:50 category:bugs OR tag:urgent' - (≤50 bug posts) OR (all urgent posts)
             TEXT
           end
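
The new description says that space-separated filters are ANDed and that each OR group is processed independently. The snippet below is only a rough illustration of that grouping, written as standalone Ruby. The helper name `split_filter_groups` is hypothetical, and the real parsing lives in `lib/utils/research/filter.rb`, which this commit shows only in part, so the plugin's splitting rules may differ.

```ruby
# Hypothetical sketch, not the plugin's parser: split a filter string into
# OR groups, then split each group into its whitespace-separated tokens.
def split_filter_groups(filter)
  filter
    .split(/\s+OR\s+/)           # "OR" separates independent groups
    .map { |group| group.split } # tokens inside one group are ANDed together
end

groups = split_filter_groups("max_results:50 category:bugs OR tag:urgent")

groups.each_with_index do |tokens, index|
  puts "group #{index + 1}: #{tokens.join(' AND ')}"
end
```

Under this reading, `max_results:50` belongs only to the first group, which matches the documented example "(≤50 bug posts) OR (all urgent posts)".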

@@ -60,9 +62,11 @@ def name

           def accepted_options
             [
+              option(:researcher_llm, type: :llm),
               option(:max_results, type: :integer),
               option(:include_private, type: :boolean),
               option(:max_tokens_per_post, type: :integer),
+              option(:max_tokens_per_batch, type: :integer),
             ]
           end
         end
@@ -134,17 +138,32 @@ def description_args
         protected

         MIN_TOKENS_FOR_RESEARCH = 8000
+        MIN_TOKENS_FOR_POST = 50
+
         def process_filter(filter, goals, post, &blk)
-          if llm.max_prompt_tokens < MIN_TOKENS_FOR_RESEARCH
+          if researcher_llm.max_prompt_tokens < MIN_TOKENS_FOR_RESEARCH
             raise ArgumentError,
                   "LLM max tokens too low for research. Minimum is #{MIN_TOKENS_FOR_RESEARCH}."
           end
+
+          max_tokens_per_batch = options[:max_tokens_per_batch].to_i
+          if max_tokens_per_batch <= MIN_TOKENS_FOR_RESEARCH
+            max_tokens_per_batch = researcher_llm.max_prompt_tokens - 2000
+          end
+
+          max_tokens_per_post = options[:max_tokens_per_post]
+          if max_tokens_per_post.nil?
+            max_tokens_per_post = 2000
+          elsif max_tokens_per_post < MIN_TOKENS_FOR_POST
+            max_tokens_per_post = MIN_TOKENS_FOR_POST
+          end
+
           formatter =
             DiscourseAi::Utils::Research::LlmFormatter.new(
               filter,
-              max_tokens_per_batch: llm.max_prompt_tokens - 2000,
-              tokenizer: llm.tokenizer,
-              max_tokens_per_post: options[:max_tokens_per_post] || 2000,
+              max_tokens_per_batch: max_tokens_per_batch,
+              tokenizer: researcher_llm.tokenizer,
+              max_tokens_per_post: max_tokens_per_post,
             )

           results = []
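
To make the new defaulting rules in `process_filter` easier to follow, here is a minimal standalone sketch of the same branches. The method name `resolve_token_limits` and the `llm_max_prompt_tokens` argument (standing in for `researcher_llm.max_prompt_tokens`) are assumptions made for illustration. The constants and comparisons are copied from the hunk above.

```ruby
MIN_TOKENS_FOR_RESEARCH = 8000
MIN_TOKENS_FOR_POST = 50

# Illustrative helper (not part of the commit): mirrors how process_filter
# derives the two limits from the tool options.
def resolve_token_limits(options, llm_max_prompt_tokens)
  per_batch = options[:max_tokens_per_batch].to_i # nil becomes 0
  # Unset, or at/below the research minimum: fall back to the model limit minus 2000.
  per_batch = llm_max_prompt_tokens - 2000 if per_batch <= MIN_TOKENS_FOR_RESEARCH

  per_post = options[:max_tokens_per_post]
  if per_post.nil?
    per_post = 2000                # default when the option is unset
  elsif per_post < MIN_TOKENS_FOR_POST
    per_post = MIN_TOKENS_FOR_POST # clamp very small values up to 50
  end

  { max_tokens_per_batch: per_batch, max_tokens_per_post: per_post }
end

p resolve_token_limits({}, 32_000)
p resolve_token_limits({ max_tokens_per_batch: 8000, max_tokens_per_post: 10 }, 32_000)
p resolve_token_limits({ max_tokens_per_batch: 12_000, max_tokens_per_post: 500 }, 32_000)
```

Note that the comparison is `<=`, so a batch limit of exactly 8000 (the value the new spec passes) also falls back to the model-derived default.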
@@ -164,6 +183,14 @@ def process_filter(filter, goals, post, &blk)
           end
         end

+        def researcher_llm
+          @researcher_llm ||=
+            (
+              options[:researcher_llm].present? &&
+                LlmModel.find_by(id: options[:researcher_llm].to_i)&.to_llm
+            ) || self.llm
+        end
+
         def run_inference(chunk_text, goals, post, &blk)
           return if context.cancel_manager&.cancelled?

@@ -179,7 +206,7 @@ def run_inference(chunk_text, goals, post, &blk)
             )

           results = []
-          llm.generate(
+          researcher_llm.generate(
             prompt,
             user: post.user,
             feature_name: context.feature_name,
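
The `researcher_llm` helper added in this file falls back to the persona's own LLM when the option is blank or when the configured id does not resolve to a model. A plain-Ruby approximation of that fallback chain is sketched below. `MODELS` and `pick_llm` are hypothetical stand-ins for `LlmModel.find_by(id: ...)&.to_llm` and the memoized method, and the blank check only approximates ActiveSupport's `present?`.

```ruby
# Hypothetical registry standing in for LlmModel records (assumption).
MODELS = { 1 => "persona-default-llm", 2 => "secondary-llm" }.freeze

# Returns the configured model when the option is set and resolvable,
# otherwise the default passed in (the persona's LLM in the real tool).
def pick_llm(option_value, default_llm)
  configured = option_value.to_s.strip
  chosen = MODELS[configured.to_i] unless configured.empty?
  chosen || default_llm
end

puts pick_llm(2, "persona-default-llm")   # option set and found
puts pick_llm(nil, "persona-default-llm") # option unset
puts pick_llm(99, "persona-default-llm")  # id does not resolve
```

Only the first call selects the secondary model; the other two return the default, which is the behaviour the `||` chain in the commit appears to aim for.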

lib/utils/research/filter.rb

Lines changed: 5 additions & 5 deletions
@@ -153,12 +153,12 @@ def self.word_to_date(str)
           end
         end

-        register_filter(/\A\@(\w+)\z/i) do |relation, username, filter|
-          user = User.find_by(username_lower: username.downcase)
-          if user
-            relation.where("posts.user_id = ?", user.id)
+        register_filter(/\Ausernames?:(.+)\z/i) do |relation, username, filter|
+          user_ids = User.where(username_lower: username.split(",").map(&:downcase)).pluck(:id)
+          if user_ids.empty?
+            relation.where("1 = 0")
           else
-            relation.where("1 = 0") # No results if user doesn't exist
+            relation.where("posts.user_id IN (?)", user_ids)
           end
         end
164164

spec/lib/personas/tools/researcher_spec.rb

Lines changed: 48 additions & 0 deletions
@@ -21,6 +21,54 @@

   before { SiteSetting.ai_bot_enabled = true }

+  it "uses custom researcher_llm and applies token limits correctly" do
+    # Create a second LLM model to test the researcher_llm option
+    secondary_llm_model = Fabricate(:llm_model, name: "secondary_model")
+
+    # Create test content with long text to test token truncation
+    topic = Fabricate(:topic, category: category, tags: [tag_research])
+    long_content = "zz " * 100 # This will exceed our token limit
+    _test_post =
+      Fabricate(:post, topic: topic, raw: long_content, user: user, skip_validation: true)
+
+    prompts = nil
+    responses = [["Research completed"]]
+    researcher = nil
+
+    DiscourseAi::Completions::Llm.with_prepared_responses(
+      responses,
+      llm: secondary_llm_model,
+    ) do |_, _, _prompts|
+      researcher =
+        described_class.new(
+          { filter: "category:research-category", goals: "analyze test content", dry_run: false },
+          persona_options: {
+            "researcher_llm" => secondary_llm_model.id,
+            "max_tokens_per_post" => 50, # Very small to force truncation
+            "max_tokens_per_batch" => 8000,
+          },
+          bot_user: bot_user,
+          llm: nil,
+          context: DiscourseAi::Personas::BotContext.new(user: user, post: post),
+        )
+
+      results = researcher.invoke(&progress_blk)
+
+      expect(results[:dry_run]).to eq(false)
+      expect(results[:results]).to be_present
+
+      prompts = _prompts
+    end
+
+    expect(prompts).to be_present
+
+    user_message = prompts.first.messages.find { |m| m[:type] == :user }
+    expect(user_message[:content]).to be_present
+
+    # count how many times the the "zz " appears in the content (a bit of token magic, we lose a couple cause we redact)
+    expect(user_message[:content].scan("zz ").count).to eq(48)
+  end
+
   describe "#invoke" do
     it "can correctly filter to a topic id" do
       researcher =

spec/lib/utils/research/filter_spec.rb

Lines changed: 15 additions & 0 deletions
@@ -144,6 +144,21 @@
     end
   end

+  describe "can find posts by users even with unicode usernames" do
+    before { SiteSetting.unicode_usernames = true }
+    let!(:unicode_user) { Fabricate(:user, username: "aאb") }
+
+    it "can filter by unicode usernames" do
+      post = Fabricate(:post, user: unicode_user, topic: feature_topic)
+      filter = described_class.new("username:aאb")
+      expect(filter.search.pluck(:id)).to contain_exactly(post.id)
+
+      filter = described_class.new("usernames:aאb,#{user.username}")
+      posts_ids = Post.where(user_id: [unicode_user.id, user.id]).pluck(:id)
+      expect(filter.search.pluck(:id)).to contain_exactly(*posts_ids)
+    end
+  end
+
   describe "category filtering" do
     it "correctly filters posts by categories" do
       filter = described_class.new("category:Announcements")
