This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Commit 4dffd0b

DEV: improve tool infra, improve forum researcher prompts, improve logging (#1391)
- Add sleep function for tool polling with rate limits
- Support base64 encoding for HTTP requests and uploads
- Enhance forum researcher with cost warnings and comprehensive planning
- Add cancellation support for research operations
- Include feature_name parameter for bot analytics
- Richer research support (OR queries)
1 parent 4c0660d commit 4dffd0b


8 files changed, +460 -102 lines changed


app/jobs/regular/create_ai_reply.rb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ def execute(args)
 
       bot = DiscourseAi::Personas::Bot.as(bot_user, persona: persona.new)
 
-      DiscourseAi::AiBot::Playground.new(bot).reply_to(post)
+      DiscourseAi::AiBot::Playground.new(bot).reply_to(post, feature_name: "bot")
     rescue DiscourseAi::Personas::Bot::BOT_NOT_FOUND
       Rails.logger.warn(
         "Bot not found for post #{post.id} - perhaps persona was deleted or bot was disabled",

lib/personas/forum_researcher.rb

Lines changed: 39 additions & 37 deletions
@@ -13,43 +13,45 @@ def tools
 
       def system_prompt
         <<~PROMPT
-          You are a helpful Discourse assistant specializing in forum research.
-          You _understand_ and **generate** Discourse Markdown.
-
-          You live in the forum with the URL: {site_url}
-          The title of your site: {site_title}
-          The description is: {site_description}
-          The participants in this conversation are: {participants}
-          The date now is: {time}, much has changed since you were trained.
-          Topic URLs are formatted as: /t/-/TOPIC_ID
-          Post URLs are formatted as: /t/-/TOPIC_ID/POST_NUMBER
-
-          As a forum researcher, guide users through a structured research process:
-          1. UNDERSTAND: First clarify the user's research goal - what insights are they seeking?
-          2. PLAN: Design an appropriate research approach with specific filters
-          3. TEST: Always begin with dry_run:true to gauge the scope of results
-          4. REFINE: If results are too broad/narrow, suggest filter adjustments
-          5. EXECUTE: Run the final analysis only when filters are well-tuned
-          6. SUMMARIZE: Present findings with links to supporting evidence
-
-          BE MINDFUL: specify all research goals in one request to avoid multiple processing runs.
-
-          REMEMBER: Different filters serve different purposes:
-          - Use post date filters (after/before) for analyzing specific posts
-          - Use topic date filters (topic_after/topic_before) for analyzing entire topics
-          - Combine user/group filters with categories/tags to find specialized contributions
-
-          Always ground your analysis with links to original posts on the forum.
-
-          Research workflow best practices:
-          1. Start with a dry_run to gauge the scope (set dry_run:true)
-          2. For temporal analysis, specify explicit date ranges
-          3. For user behavior analysis, combine @username with categories or tags
-
-          - When formatting research results, format backing links clearly:
-          - When it is a good fit, link to the topic with descriptive text.
-          - When it is a good fit, link using markdown footnotes.
-        PROMPT
+          You are a helpful Discourse assistant specializing in forum research.
+          You _understand_ and **generate** Discourse Markdown.
+
+          You live in the forum with the URL: {site_url}
+          The title of your site: {site_title}
+          The description is: {site_description}
+          The participants in this conversation are: {participants}
+          The date now is: {time}, much has changed since you were trained.
+          Topic URLs are formatted as: /t/-/TOPIC_ID
+          Post URLs are formatted as: /t/-/TOPIC_ID/POST_NUMBER
+
+          CRITICAL: Research is extremely expensive. You MUST gather ALL research goals upfront and execute them in a SINGLE request. Never run multiple research operations.
+
+          As a forum researcher, follow this structured process:
+          1. UNDERSTAND: Clarify ALL research goals - what insights are they seeking?
+          2. PLAN: Design ONE comprehensive research approach covering all objectives
+          3. TEST: Always begin with dry_run:true to gauge the scope of results
+          4. REFINE: If results are too broad/narrow, suggest filter adjustments (but don't re-run yet)
+          5. EXECUTE: Run the final analysis ONCE when filters are well-tuned for all goals
+          6. SUMMARIZE: Present findings with links to supporting evidence
+
+          Before any research, ask users to specify:
+          - ALL research questions they want answered
+          - Time periods of interest
+          - Specific users, categories, or tags to focus on
+          - Expected scope (broad overview vs. deep dive)
+
+          Research filter guidelines:
+          - Use post date filters (after/before) for analyzing specific posts
+          - Use topic date filters (topic_after/topic_before) for analyzing entire topics
+          - Combine user/group filters with categories/tags to find specialized contributions
+
+          When formatting results:
+          - Link to topics with descriptive text when relevant
+          - Use markdown footnotes for supporting evidence
+          - Always ground analysis with links to original forum posts
+
+          Remember: ONE research request should answer ALL questions. Plan comprehensively before executing.
+        PROMPT
       end
     end
   end

lib/personas/tool_runner.rb

Lines changed: 83 additions & 2 deletions
@@ -13,6 +13,9 @@ class ToolRunner
       MARSHAL_STACK_DEPTH = 20
       MAX_HTTP_REQUESTS = 20
 
+      MAX_SLEEP_CALLS = 30
+      MAX_SLEEP_DURATION_MS = 60_000
+
       def initialize(parameters:, llm:, bot_user:, context: nil, tool:, timeout: nil)
         if context && !context.is_a?(DiscourseAi::Personas::BotContext)
           raise ArgumentError, "context must be a BotContext object"
@@ -28,6 +31,7 @@ def initialize(parameters:, llm:, bot_user:, context: nil, tool:, timeout: nil)
         @timeout = timeout || DEFAULT_TIMEOUT
         @running_attached_function = false
 
+        @sleep_calls_made = 0
         @http_requests_made = 0
       end
 
@@ -44,6 +48,7 @@ def mini_racer_context
         attach_index(ctx)
         attach_upload(ctx)
         attach_chain(ctx)
+        attach_sleep(ctx)
         attach_discourse(ctx)
         ctx.eval(framework_script)
         ctx
@@ -73,6 +78,9 @@ def framework_script
         const upload = {
           create: _upload_create,
           getUrl: _upload_get_url,
+          getBase64: function(id, maxPixels) {
+            return _upload_get_base64(id, maxPixels);
+          }
         }
 
         const chain = {
@@ -310,6 +318,33 @@ def attach_chain(mini_racer_context)
         mini_racer_context.attach("_chain_set_custom_raw", ->(raw) { self.custom_raw = raw })
       end
 
+      # this is useful for polling apis
+      def attach_sleep(mini_racer_context)
+        mini_racer_context.attach(
+          "sleep",
+          ->(duration_ms) do
+            @sleep_calls_made += 1
+            if @sleep_calls_made > MAX_SLEEP_CALLS
+              raise TooManyRequestsError.new("Tool made too many sleep calls")
+            end
+
+            duration_ms = duration_ms.to_i
+            if duration_ms > MAX_SLEEP_DURATION_MS
+              raise ArgumentError.new(
+                "Sleep duration cannot exceed #{MAX_SLEEP_DURATION_MS}ms (1 minute)",
+              )
+            end
+
+            raise ArgumentError.new("Sleep duration must be positive") if duration_ms <= 0
+
+            in_attached_function do
+              sleep(duration_ms / 1000.0)
+              { slept: duration_ms }
+            end
+          end,
+        )
+      end
+
       def attach_discourse(mini_racer_context)
         mini_racer_context.attach(
           "_discourse_get_post",
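
The limits in `attach_sleep` can be exercised outside MiniRacer. The sketch below is a standalone Ruby re-creation, not plugin code: `SleepGuard`, `TooManyCallsError` and `poll_until` are hypothetical names, while the limits (30 calls, 60,000 ms) and the error messages are copied from the diff. The actual sleep is injected as a lambda so the checks can be tested without blocking.

```ruby
# Hypothetical standalone re-creation of the attach_sleep checks (not plugin code).
class SleepGuard
  # Stand-in for ToolRunner's TooManyRequestsError, which this sketch does not load.
  class TooManyCallsError < StandardError; end

  MAX_CALLS = 30           # mirrors MAX_SLEEP_CALLS
  MAX_DURATION_MS = 60_000 # mirrors MAX_SLEEP_DURATION_MS

  # The sleeper is injectable so tests can record durations instead of blocking.
  def initialize(sleeper: ->(seconds) { sleep(seconds) })
    @calls = 0
    @sleeper = sleeper
  end

  def call(duration_ms)
    # Counted before validation, as in the diff, so rejected calls use up budget too.
    @calls += 1
    raise TooManyCallsError, "Tool made too many sleep calls" if @calls > MAX_CALLS

    duration_ms = duration_ms.to_i
    if duration_ms > MAX_DURATION_MS
      raise ArgumentError, "Sleep duration cannot exceed #{MAX_DURATION_MS}ms (1 minute)"
    end
    raise ArgumentError, "Sleep duration must be positive" if duration_ms <= 0

    @sleeper.call(duration_ms / 1000.0) # Kernel#sleep takes seconds
    { slept: duration_ms }
  end
end

# Hypothetical polling helper: retry the block, sleeping between attempts only.
def poll_until(guard, interval_ms:, max_attempts:)
  max_attempts.times do |i|
    result = yield
    return result if result
    guard.call(interval_ms) if i < max_attempts - 1
  end
  nil
end

slept = []
guard = SleepGuard.new(sleeper: ->(seconds) { slept << seconds })
attempts = 0
status = poll_until(guard, interval_ms: 500, max_attempts: 5) do
  attempts += 1
  attempts >= 3 ? :done : nil # pretend the remote job finishes on the third check
end
p status # => :done
p slept  # => [0.5, 0.5]
```

Because the counter is incremented before the duration is validated, rejected calls also consume the budget, exactly as in the diff. Taken together, the two limits cap a single `ToolRunner` instance at 30 × 60,000 ms, i.e. 30 minutes of sleeping.
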
@@ -571,6 +606,42 @@ def attach_discourse(mini_racer_context)
       end
 
       def attach_upload(mini_racer_context)
+        mini_racer_context.attach(
+          "_upload_get_base64",
+          ->(upload_id_or_url, max_pixels) do
+            in_attached_function do
+              return nil if upload_id_or_url.blank?
+
+              upload = nil
+
+              # Handle both upload ID and short URL
+              if upload_id_or_url.to_s.start_with?("upload://")
+                # Handle short URL format
+                sha1 = Upload.sha1_from_short_url(upload_id_or_url)
+                return nil if sha1.blank?
+                upload = Upload.find_by(sha1: sha1)
+              else
+                # Handle numeric ID
+                upload_id = upload_id_or_url.to_i
+                return nil if upload_id <= 0
+                upload = Upload.find_by(id: upload_id)
+              end
+
+              return nil if upload.nil?
+
+              max_pixels = max_pixels&.to_i
+              max_pixels = nil if max_pixels && max_pixels <= 0
+
+              encoded_uploads =
+                DiscourseAi::Completions::UploadEncoder.encode(
+                  upload_ids: [upload.id],
+                  max_pixels: max_pixels || 10_000_000, # Default to 10M pixels if not specified
+                )
+
+              encoded_uploads.first&.dig(:base64)
+            end
+          end,
+        )
         mini_racer_context.attach(
           "_upload_get_url",
           ->(short_url) do
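
`_upload_get_base64` accepts either a numeric upload ID or an `upload://` short URL and normalises `max_pixels` before calling `UploadEncoder`. The input handling can be sketched without a database as below. `classify_upload_ref` and `effective_max_pixels` are hypothetical helpers, and ActiveSupport's `blank?` is approximated with plain Ruby so the snippet runs on its own; the short-URL decoding and the `Upload` lookups are deliberately left out.

```ruby
# Hypothetical helpers mirroring the input handling in _upload_get_base64.
DEFAULT_MAX_PIXELS = 10_000_000 # same default the diff passes to UploadEncoder

# Returns [:short_url, url], [:id, integer], or nil for inputs rejected before any lookup.
def classify_upload_ref(ref)
  # Approximates ActiveSupport's blank? for nil and whitespace-only strings.
  return nil if ref.nil? || ref.to_s.strip.empty?

  str = ref.to_s
  return [:short_url, str] if str.start_with?("upload://")

  id = str.to_i # String#to_i is lenient: "12abc" becomes 12, "abc" becomes 0
  id > 0 ? [:id, id] : nil
end

# nil, zero and negative values all fall back to the default pixel budget.
def effective_max_pixels(max_pixels)
  value = max_pixels&.to_i
  value = nil if value && value <= 0
  value || DEFAULT_MAX_PIXELS
end

p classify_upload_ref(42)                 # => [:id, 42]
p classify_upload_ref("upload://abc.png") # => [:short_url, "upload://abc.png"]
p classify_upload_ref("-3")               # => nil
p effective_max_pixels(nil)               # => 10000000
p effective_max_pixels(1_000_000)         # => 1000000
```

The lenient `to_i` conversion is inherited from the diff: a string such as `"12abc"` resolves to upload 12 rather than being rejected.
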
@@ -629,13 +700,18 @@ def attach_http(mini_racer_context)
 
             in_attached_function do
               headers = (options && options["headers"]) || {}
+              base64_encode = options && options["base64Encode"]
 
               result = {}
               DiscourseAi::Personas::Tools::Tool.send_http_request(
                 url,
                 headers: headers,
               ) do |response|
-                result[:body] = response.body
+                if base64_encode
+                  result[:body] = Base64.strict_encode64(response.body)
+                else
+                  result[:body] = response.body
+                end
                 result[:status] = response.code.to_i
               end
 
@@ -658,6 +734,7 @@ def attach_http(mini_racer_context)
             in_attached_function do
               headers = (options && options["headers"]) || {}
               body = options && options["body"]
+              base64_encode = options && options["base64Encode"]
 
               result = {}
               DiscourseAi::Personas::Tools::Tool.send_http_request(
@@ -666,7 +743,11 @@ def attach_http(mini_racer_context)
                 headers: headers,
                 body: body,
               ) do |response|
-                result[:body] = response.body
+                if base64_encode
+                  result[:body] = Base64.strict_encode64(response.body)
+                else
+                  result[:body] = response.body
+                end
                 result[:status] = response.code.to_i
               end
 
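
When `base64Encode` is set in the request options, both HTTP bindings now pass the response body through `Base64.strict_encode64`, presumably so that binary payloads such as images survive the hand-off into the JavaScript context. A minimal sketch of that branch follows; `build_http_result` is a hypothetical helper standing in for the block passed to `send_http_request`, and the PNG signature is used only as a small binary sample.

```ruby
require "base64"

# Hypothetical helper mirroring the response handling added to the http bindings.
def build_http_result(body, status, options)
  base64_encode = options && options["base64Encode"]
  {
    # strict_encode64 never inserts newlines (Base64.encode64 wraps every 60 chars).
    body: base64_encode ? Base64.strict_encode64(body) : body,
    # The diff calls response.code.to_i; Net::HTTPResponse#code, for example, is a String.
    status: status.to_i,
  }
end

png_signature = "\x89PNG\r\n\x1A\n".b # first 8 bytes of every PNG file
encoded = build_http_result(png_signature, "200", { "base64Encode" => true })
plain = build_http_result("hello", "200", nil)

p encoded[:body]                                          # => "iVBORw0KGgo="
p Base64.strict_decode64(encoded[:body]) == png_signature # => true
p plain[:body]                                            # => "hello"
```

Using the strict variant matters for consumers that decode with RFC 4648 semantics, since the line breaks emitted by `encode64` would otherwise end up inside the string handed to the tool.
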
lib/personas/tools/researcher.rb

Lines changed: 28 additions & 10 deletions
@@ -33,19 +33,24 @@ def filter_description
           <<~TEXT
             Filter string to target specific content.
             - Supports user (@username)
+            - post_type:first - only includes first posts in topics
+            - post_type:reply - only replies in topics
             - date ranges (after:YYYY-MM-DD, before:YYYY-MM-DD for posts; topic_after:YYYY-MM-DD, topic_before:YYYY-MM-DD for topics)
-            - categories (category:category1,category2)
-            - tags (tag:tag1,tag2)
-            - groups (group:group1,group2).
+            - categories (category:category1,category2 or categories:category1,category2)
+            - tags (tag:tag1,tag2 or tags:tag1,tag2)
+            - groups (group:group1,group2 or groups:group1,group2)
             - status (status:open, status:closed, status:archived, status:noreplies, status:single_user)
-            - keywords (keywords:keyword1,keyword2) - specific words to search for in posts
-            - max_results (max_results:10) the maximum number of results to return (optional)
-            - order (order:latest, order:oldest, order:latest_topic, order:oldest_topic) - the order of the results (optional)
-            - topic (topic:topic_id1,topic_id2) - add specific topics to the filter, topics will unconditionally be included
+            - keywords (keywords:keyword1,keyword2) - searches for specific words within post content using full-text search
+            - topic_keywords (topic_keywords:keyword1,keyword2) - searches for keywords within topics, returns all posts from matching topics
+            - topics (topic:topic_id1,topic_id2 or topics:topic_id1,topic_id2) - target specific topics by ID
+            - max_results (max_results:10) - limits the maximum number of results returned (optional)
+            - order (order:latest, order:oldest, order:latest_topic, order:oldest_topic, order:likes) - controls result ordering (optional, defaults to latest posts)
 
-            If multiple tags or categories are specified, they are treated as OR conditions.
+            Multiple filters can be combined with spaces for AND logic. Example: '@sam after:2023-01-01 tag:feature'
 
-            Multiple filters can be combined with spaces. Example: '@sam after:2023-01-01 tag:feature'
+            Use OR to combine filter segments for inclusive logic.
+            Example: 'category:feature,bug OR tag:feature-tag' - includes posts in feature OR bug categories, OR posts with feature-tag tag
+            Example: '@sam category:bug' - includes posts by @sam AND in bug category
           TEXT
         end
 
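
The updated `filter_description` defines the grammar informally: spaces AND terms together, `OR` separates segments, a comma list inside a term matches any listed value, and plural keys (`categories`, `tags`, `groups`, `topics`) are aliases. The hypothetical parser and matcher below only illustrate those semantics; they are not the plugin's filter implementation. Treating `OR` as case-sensitive, handling only user, category and tag terms, and ignoring every other key are assumptions of this sketch.

```ruby
# Hypothetical illustration of the semantics in filter_description (not plugin code).
ALIASES = { "categories" => "category", "tags" => "tag", "groups" => "group", "topics" => "topic" }.freeze

# Parses a filter string into OR-segments, each a list of AND-ed terms.
def parse_filter(filter)
  filter.strip.split(/\s+OR\s+/).map do |segment|
    segment.split.map do |term|
      if term.start_with?("@")
        { key: "user", values: [term.delete_prefix("@")] }
      elsif term.include?(":")
        key, value = term.split(":", 2)
        { key: ALIASES.fetch(key, key), values: value.split(",") }
      else
        { key: "text", values: [term] } # bare words; not part of the documented syntax
      end
    end
  end
end

# A post matches if ANY segment matches; a segment matches if ALL of its terms match.
def matches?(post, segments)
  segments.any? do |terms|
    terms.all? do |term|
      case term[:key]
      when "user" then term[:values].include?(post[:username])
      when "category" then term[:values].include?(post[:category]) # comma list = any of
      when "tag" then (term[:values] & Array(post[:tags])).any?
      else true # dates, status, keywords, etc. are ignored in this sketch
      end
    end
  end
end

post = { username: "alice", category: "bug", tags: ["ui"] }
p matches?(post, parse_filter("category:feature,bug OR tag:feature-tag")) # => true
p matches?(post, parse_filter("@sam category:bug"))                        # => false
```
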
@@ -145,10 +150,23 @@ def process_filter(filter, goals, post, &blk)
           results = []
 
           formatter.each_chunk { |chunk| results << run_inference(chunk[:text], goals, post, &blk) }
-          { dry_run: false, goals: goals, filter: @filter, results: results }
+
+          if context.cancel_manager&.cancelled?
+            {
+              dry_run: false,
+              goals: goals,
+              filter: @filter,
+              results: "Cancelled by user",
+              cancelled_by_user: true,
+            }
+          else
+            { dry_run: false, goals: goals, filter: @filter, results: results }
+          end
         end
 
         def run_inference(chunk_text, goals, post, &blk)
+          return if context.cancel_manager&.cancelled?
+
           system_prompt = goal_system_prompt(goals)
           user_prompt = goal_user_prompt(goals, chunk_text)
 
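
The cancellation change has two parts: `run_inference` returns early once `context.cancel_manager&.cancelled?` is true, and `process_filter` then replaces the collected results with a `cancelled_by_user` marker. The sketch below reproduces that flow with a hypothetical `SimpleCancelManager` (the real cancel manager's class is not shown in this diff) and a block standing in for the LLM call; `run_chunk` and `process_chunks` are likewise illustrative names.

```ruby
# Hypothetical stand-in for context.cancel_manager; only cancelled? is relied on.
class SimpleCancelManager
  def initialize
    @cancelled = false
  end

  def cancel!
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end
end

# Mirrors run_inference: skip the expensive call once cancelled.
def run_chunk(chunk, cancel_manager)
  return if cancel_manager&.cancelled?
  yield chunk
end

# Mirrors process_filter: visit every chunk, then report cancellation instead of partial results.
def process_chunks(chunks, cancel_manager, &infer)
  results = chunks.map { |chunk| run_chunk(chunk, cancel_manager, &infer) }
  if cancel_manager&.cancelled?
    { dry_run: false, results: "Cancelled by user", cancelled_by_user: true }
  else
    { dry_run: false, results: results }
  end
end

manager = SimpleCancelManager.new
calls = 0
outcome = process_chunks(%w[a b c], manager) do |chunk|
  calls += 1
  manager.cancel! if chunk == "a" # simulate the user stopping during the first chunk
  "summary of #{chunk}"
end
p calls             # => 1
p outcome[:results] # => "Cancelled by user"
```

As in the diff, the chunk loop itself is not interrupted: once cancelled, each remaining chunk is still produced by the formatter but skips inference, so what is saved is the LLM calls rather than the chunking.
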
0 commit comments
