This repository was archived by the owner on Jul 22, 2025. It is now read-only.
Merged
Changes from 27 commits
Commits
36 commits
2d9c72c
FEATURE: PDF support for rag pipeline
SamSaffron Feb 7, 2025
3c7dd74
OK this now sort of works, need to extract llm selector
SamSaffron Feb 7, 2025
b64511e
work in progress, eval
SamSaffron Feb 8, 2025
34b9521
lets add a case that attempts to jailbreak proofread
SamSaffron Feb 8, 2025
2032f5f
better output
SamSaffron Feb 8, 2025
d4695ec
introduce a log
SamSaffron Feb 8, 2025
4d231c3
allow regex
SamSaffron Feb 8, 2025
0875406
this is a jailbreak that intentionally breaks our prompt
SamSaffron Feb 8, 2025
18c6a80
moving evals to own repo, then we can have huge ones
SamSaffron Feb 9, 2025
ace9f94
infra for pdf evals
SamSaffron Feb 9, 2025
4d1798c
add new rag_llm_model_id which is used for ocr
SamSaffron Feb 10, 2025
fdd4a9b
move llm to id column - work in progress
SamSaffron Feb 10, 2025
2181e2a
fix various specs... a bunch left
SamSaffron Feb 10, 2025
72e9576
fix more specs
SamSaffron Feb 10, 2025
4ba0d5c
more experimental columns removed
SamSaffron Feb 10, 2025
e2f71f1
another spec fixed
SamSaffron Feb 10, 2025
938a445
fix more cases where default_llm was used
SamSaffron Feb 10, 2025
848692c
fix more specs
SamSaffron Feb 11, 2025
2f4276a
reduce mocking to make test more stable
SamSaffron Feb 11, 2025
5e9cb80
specs passing, system specs next
SamSaffron Feb 11, 2025
5736da6
tests are passing, but stuff is not working yet..
SamSaffron Feb 11, 2025
db6e28a
mostly working now, need better progress story and error handling
SamSaffron Feb 11, 2025
f5ce2db
we need more time
SamSaffron Feb 11, 2025
25c97ca
Move allowing or disallowing pdf/images to a site setting
SamSaffron Feb 12, 2025
b0a549b
fix tests
SamSaffron Feb 12, 2025
10ea742
refactor eval framework into a simpler structure
SamSaffron Feb 12, 2025
bcb7cdf
image to text support
SamSaffron Feb 12, 2025
8393dd7
PR comments
SamSaffron Feb 12, 2025
3c054e2
Update lib/utils/pdf_to_images.rb
SamSaffron Feb 12, 2025
bfa6a40
Update assets/javascripts/discourse/components/rag-options.gjs
SamSaffron Feb 12, 2025
1de56d8
Update assets/javascripts/discourse/components/ai-persona-editor.gjs
SamSaffron Feb 12, 2025
888237a
Update assets/javascripts/discourse/components/ai-persona-editor.gjs
SamSaffron Feb 12, 2025
be97405
Update config/locales/client.en.yml
SamSaffron Feb 12, 2025
c0f181f
Update config/locales/client.en.yml
SamSaffron Feb 12, 2025
b3fcf3f
address PR comments
SamSaffron Feb 12, 2025
7064d4b
structured logging with simple log viewer
SamSaffron Feb 13, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,3 +2,5 @@ node_modules
/gems
/auto_generated
.env
+ evals/log
+ evals/cases
@@ -14,5 +14,7 @@ export default class DiscourseAiToolsEditRoute extends DiscourseRoute {

controller.set("allTools", toolsModel);
controller.set("presets", toolsModel.resultSetMeta.presets);
+ controller.set("llms", toolsModel.resultSetMeta.llms);
+ controller.set("settings", toolsModel.resultSetMeta.settings);
}
}
@@ -11,5 +11,7 @@ export default class DiscourseAiToolsNewRoute extends DiscourseRoute {

controller.set("allTools", toolsModel);
controller.set("presets", toolsModel.resultSetMeta.presets);
+ controller.set("llms", toolsModel.resultSetMeta.llms);
+ controller.set("settings", toolsModel.resultSetMeta.settings);
}
}
@@ -3,5 +3,7 @@
@tools={{this.allTools}}
@model={{this.model}}
@presets={{this.presets}}
+ @llms={{this.llms}}
+ @settings={{this.settings}}
/>
</section>
@@ -3,5 +3,7 @@
@tools={{this.allTools}}
@model={{this.model}}
@presets={{this.presets}}
+ @llms={{this.llms}}
+ @settings={{this.settings}}
/>
</section>
22 changes: 16 additions & 6 deletions app/controllers/discourse_ai/admin/ai_personas_controller.rb
@@ -32,10 +32,19 @@ def index
}
end
llms =
- DiscourseAi::Configuration::LlmEnumerator
- .values(allowed_seeded_llms: SiteSetting.ai_bot_allowed_seeded_models)
- .map { |hash| { id: hash[:value], name: hash[:name] } }
- render json: { ai_personas: ai_personas, meta: { tools: tools, llms: llms } }
+ DiscourseAi::Configuration::LlmEnumerator.values_for_serialization(
+ allowed_seeded_llm_ids: SiteSetting.ai_bot_allowed_seeded_models_map,
+ )
+ render json: {
+ ai_personas: ai_personas,
+ meta: {
+ tools: tools,
+ llms: llms,
+ settings: {
+ rag_pdf_images_enabled: SiteSetting.ai_rag_pdf_images_enabled,
+ },
+ },
+ }
end

def new
@@ -187,15 +196,16 @@ def ai_persona_params
:priority,
:top_p,
:temperature,
- :default_llm,
+ :default_llm_id,
:user_id,
:max_context_posts,
:vision_enabled,
:vision_max_pixels,
:rag_chunk_tokens,
:rag_chunk_overlap_tokens,
:rag_conversation_chunks,
- :question_consolidator_llm,
+ :rag_llm_model_id,
+ :question_consolidator_llm_id,
:allow_chat_channel_mentions,
:allow_chat_direct_messages,
:allow_topic_mentions,
1 change: 1 addition & 0 deletions app/controllers/discourse_ai/admin/ai_tools_controller.rb
@@ -90,6 +90,7 @@ def ai_tool_params
:summary,
:rag_chunk_tokens,
:rag_chunk_overlap_tokens,
+ :rag_llm_model_id,
rag_uploads: [:id],
parameters: [:name, :type, :description, :required, enum: []],
)
@@ -49,6 +49,7 @@ def upload_file
def validate_extension!(filename)
extension = File.extname(filename)[1..-1] || ""
authorized_extensions = %w[txt md]
+ authorized_extensions.concat(%w[pdf png jpg jpeg]) if SiteSetting.ai_rag_pdf_images_enabled
if !authorized_extensions.include?(extension)
raise Discourse::InvalidParameters.new(
I18n.t(
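The controller now widens the accepted extensions only when `ai_rag_pdf_images_enabled` is on, and a reviewer later suggests moving such extension lists into a shared helper. Below is a minimal, framework-free sketch of what that could look like. The `RagExtensions` module and its methods are hypothetical (not part of the PR or of Discourse), and the site setting is passed in as a plain boolean instead of being read from `SiteSetting`.

```ruby
# Hypothetical helper: not part of the PR or of Discourse's FileHelper.
module RagExtensions
  TEXT = %w[txt md].freeze
  PDF = %w[pdf].freeze
  IMAGES = %w[png jpg jpeg].freeze

  # Mirrors `File.extname(filename)[1..-1] || ""` from validate_extension!.
  # File.extname returns "" for names without an extension, and ""[1..-1]
  # is nil, hence the fallback to "".
  def self.extension_of(filename)
    File.extname(filename)[1..-1] || ""
  end

  # Extensions accepted for RAG uploads; pdf and images only when enabled.
  def self.allowed(pdf_images_enabled:)
    pdf_images_enabled ? TEXT + PDF + IMAGES : TEXT
  end

  def self.allowed?(filename, pdf_images_enabled:)
    allowed(pdf_images_enabled: pdf_images_enabled).include?(extension_of(filename))
  end
end

RagExtensions.allowed?("notes.md", pdf_images_enabled: false) # => true
RagExtensions.allowed?("scan.pdf", pdf_images_enabled: false) # => false
RagExtensions.allowed?("scan.pdf", pdf_images_enabled: true)  # => true
RagExtensions.allowed?("SCAN.PDF", pdf_images_enabled: true)  # => false
```

Like the controller code it mirrors, the comparison is case-sensitive, so `SCAN.PDF` is rejected; normalizing case would be a separate design decision.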
35 changes: 33 additions & 2 deletions app/jobs/regular/digest_rag_upload.rb
@@ -28,7 +28,7 @@ def execute(args)

# Check if this is the first time we process this upload.
if fragment_ids.empty?
- document = get_uploaded_file(upload)
+ document = get_uploaded_file(upload: upload, target: target)
return if document.nil?

RagDocumentFragment.publish_status(upload, { total: 0, indexed: 0, left: 0 })
@@ -163,7 +163,38 @@ def first_chunk(text, chunk_tokens:, tokenizer:, splitters: ["\n\n", "\n", ".",
[buffer, split_char]
end

- def get_uploaded_file(upload)
+ def get_uploaded_file(upload:, target:)
+ if %w[pdf png jpg jpeg].include?(upload.extension) && !SiteSetting.ai_rag_pdf_images_enabled
+ raise Discourse::InvalidAccess.new(
+ "The setting ai_rag_pdf_images_enabled is false, can not index images and pdfs.",
+ )
+ end
+ if upload.extension == "pdf"
+ pages =
+ DiscourseAi::Utils::PdfToImages.new(
+ upload: upload,
+ user: Discourse.system_user,
+ ).uploaded_pages
+
+ return(
+ DiscourseAi::Utils::ImageToText.as_fake_file(
+ uploads: pages,
+ llm_model: target.rag_llm_model,
+ user: Discourse.system_user,
+ )
+ )
+ end
+
+ if %w[png jpg jpeg].include?(upload.extension)
Review comment (Member):

As part of my comment above: we could call the extension checks easily from one place if we created a helper somewhere.

Suggested change:
- if %w[png jpg jpeg].include?(upload.extension)
+ if FileHelper.ai_supported_images.include?(upload.extension)

+ return(
+ DiscourseAi::Utils::ImageToText.as_fake_file(
+ uploads: [upload],
+ llm_model: target.rag_llm_model,
+ user: Discourse.system_user,
+ )
+ )
+ end
+
store = Discourse.store
@file ||=
if store.external?
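In the job, `get_uploaded_file` now chooses one of three paths by extension: a PDF is split into page images with `DiscourseAi::Utils::PdfToImages` and those pages are transcribed with `ImageToText` using the target's `rag_llm_model`; a PNG or JPEG goes straight to `ImageToText`; anything else is read as a stored file. The sketch below isolates only that routing decision so it can run without Discourse. `route_rag_upload`, the returned symbols, and `PdfImagesDisabled` (standing in for `Discourse::InvalidAccess`) are hypothetical names.

```ruby
# Hypothetical error class standing in for Discourse::InvalidAccess.
class PdfImagesDisabled < StandardError; end

PDF_IMAGE_EXTENSIONS = %w[pdf png jpg jpeg].freeze

# Returns a label for the path the job would take; does no real work.
def route_rag_upload(extension, pdf_images_enabled:)
  # Same guard as the job: refuse pdf/images when the site setting is off.
  if PDF_IMAGE_EXTENSIONS.include?(extension) && !pdf_images_enabled
    raise PdfImagesDisabled,
          "The setting ai_rag_pdf_images_enabled is false, can not index images and pdfs."
  end

  case extension
  when "pdf" then :pdf_pages_to_text # PdfToImages, then ImageToText over the pages
  when "png", "jpg", "jpeg" then :image_to_text # ImageToText on the single upload
  else :stored_file # fall through to reading the stored file
  end
end

route_rag_upload("pdf", pdf_images_enabled: true) # => :pdf_pages_to_text
route_rag_upload("md", pdf_images_enabled: false) # => :stored_file
```

Checking the setting before dispatching means a disabled site fails fast, even for uploads that were accepted before the setting was turned off.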
88 changes: 46 additions & 42 deletions app/models/ai_persona.rb
@@ -1,8 +1,8 @@
# frozen_string_literal: true

class AiPersona < ActiveRecord::Base
- # TODO remove this line 01-1-2025
- self.ignored_columns = %i[commands allow_chat mentionable]
+ # TODO remove this line 01-10-2025
Review comment by @keegangeorge (Member), Feb 12, 2025:

Did you mean 02-10-2025? Although this date has also passed now, should we update it to something further in the future?

Reply from the PR author (Member):

Oh, dates... this is D-M-Y; generally I try to stick with that for code comments. Dates are hard, so for comments like this maybe we should go with October-2025, which is a lot less ambiguous.

+ self.ignored_columns = %i[default_llm question_consolidator_llm]

# places a hard limit, so per site we cache a maximum of 500 classes
MAX_PERSONAS_PER_SITE = 500
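The review thread on the TODO date hinges on how `01-10-2025` is read. Ruby's standard `date` library makes the two readings concrete: as D-M-Y (the author's convention) it is 1 October 2025, and as M-D-Y (the reviewer's reading) it is 10 January 2025. A short illustration:

```ruby
require "date"

todo_date = "01-10-2025"

dmy = Date.strptime(todo_date, "%d-%m-%Y") # author's day-month-year reading
mdy = Date.strptime(todo_date, "%m-%d-%Y") # reviewer's month-day-year reading

puts dmy.iso8601           # prints 2025-10-01
puts mdy.iso8601           # prints 2025-01-10
puts dmy.strftime("%B-%Y") # prints October-2025, the form suggested in the reply
```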
@@ -12,7 +12,7 @@ class AiPersona < ActiveRecord::Base
validates :system_prompt, presence: true, length: { maximum: 10_000_000 }
validate :system_persona_unchangeable, on: :update, if: :system
validate :chat_preconditions
- validate :allowed_seeded_model, if: :default_llm
+ validate :allowed_seeded_model, if: :default_llm_id
validates :max_context_posts, numericality: { greater_than: 0 }, allow_nil: true
# leaves some room for growth but sets a maximum to avoid memory issues
# we may want to revisit this in the future
@@ -30,6 +30,10 @@ class AiPersona < ActiveRecord::Base
belongs_to :created_by, class_name: "User"
belongs_to :user

+ belongs_to :default_llm, class_name: "LlmModel"
+ belongs_to :question_consolidator_llm, class_name: "LlmModel"
+ belongs_to :rag_llm_model, class_name: "LlmModel"
+
has_many :upload_references, as: :target, dependent: :destroy
has_many :uploads, through: :upload_references

@@ -62,7 +66,7 @@ def self.persona_users(user: nil)
user_id: persona.user_id,
username: persona.user.username_lower,
allowed_group_ids: persona.allowed_group_ids,
- default_llm: persona.default_llm,
+ default_llm_id: persona.default_llm_id,
force_default_llm: persona.force_default_llm,
allow_chat_channel_mentions: persona.allow_chat_channel_mentions,
allow_chat_direct_messages: persona.allow_chat_direct_messages,
@@ -157,12 +161,12 @@ def class_instance
user_id
system
mentionable
- default_llm
+ default_llm_id
max_context_posts
vision_enabled
vision_max_pixels
rag_conversation_chunks
- question_consolidator_llm
+ question_consolidator_llm_id
allow_chat_channel_mentions
allow_chat_direct_messages
allow_topic_mentions
@@ -302,7 +306,7 @@ def chat_preconditions
if (
allow_chat_channel_mentions || allow_chat_direct_messages || allow_topic_mentions ||
force_default_llm
- ) && !default_llm
+ ) && !default_llm_id
errors.add(:default_llm, I18n.t("discourse_ai.ai_bot.personas.default_llm_required"))
end
end
@@ -332,13 +336,12 @@ def ensure_not_system
end

def allowed_seeded_model
- return if default_llm.blank?
+ return if default_llm_id.blank?

- llm = LlmModel.find_by(id: default_llm.split(":").last.to_i)
- return if llm.nil?
- return if !llm.seeded?
+ return if default_llm.nil?
+ return if !default_llm.seeded?

- return if SiteSetting.ai_bot_allowed_seeded_models.include?(llm.id.to_s)
+ return if SiteSetting.ai_bot_allowed_seeded_models_map.include?(default_llm.id.to_s)

errors.add(:default_llm, I18n.t("discourse_ai.llm.configuration.invalid_seeded_model"))
end
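Moving from the text column `default_llm` to `default_llm_id` means the validation no longer parses an identifier string (the old code took the part after the last `:` and called `to_i`) and can use the `belongs_to :default_llm` association directly. The plain-Ruby sketch below mirrors the new `allowed_seeded_model` checks. `Model` is a hypothetical stand-in for `LlmModel`, `seeded_model_error` and `legacy_llm_id` are illustrative names, `nil?` replaces ActiveSupport's `blank?`, and `custom:7` is a made-up identifier.

```ruby
# Hypothetical stand-in for LlmModel: only the fields the validation reads.
Model = Struct.new(:id, :seeded, keyword_init: true) do
  def seeded?
    seeded
  end
end

# Old approach (removed in this PR): recover the id from a string identifier.
def legacy_llm_id(identifier)
  identifier.split(":").last.to_i
end

# New approach, mirroring allowed_seeded_model. Returns nil when valid, or an
# error symbol when a seeded model is not in the allowed list. allowed_ids
# holds strings, matching the `default_llm.id.to_s` comparison.
def seeded_model_error(default_llm_id, default_llm, allowed_ids)
  return nil if default_llm_id.nil? # `blank?` in the real model
  return nil if default_llm.nil?
  return nil unless default_llm.seeded?
  return nil if allowed_ids.include?(default_llm.id.to_s)

  :invalid_seeded_model
end

seeded = Model.new(id: 7, seeded: true)
seeded_model_error(7, seeded, ["7"]) # => nil (seeded but explicitly allowed)
seeded_model_error(7, seeded, [])    # => :invalid_seeded_model
legacy_llm_id("custom:7")            # => 7
```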
@@ -348,36 +351,37 @@ def allowed_seeded_model
#
# Table name: ai_personas
#
- # id :bigint not null, primary key
- # name :string(100) not null
- # description :string(2000) not null
- # system_prompt :string(10000000) not null
- # allowed_group_ids :integer default([]), not null, is an Array
- # created_by_id :integer
- # enabled :boolean default(TRUE), not null
- # created_at :datetime not null
- # updated_at :datetime not null
- # system :boolean default(FALSE), not null
- # priority :boolean default(FALSE), not null
- # temperature :float
- # top_p :float
- # user_id :integer
- # default_llm :text
- # max_context_posts :integer
- # vision_enabled :boolean default(FALSE), not null
- # vision_max_pixels :integer default(1048576), not null
- # rag_chunk_tokens :integer default(374), not null
- # rag_chunk_overlap_tokens :integer default(10), not null
- # rag_conversation_chunks :integer default(10), not null
- # question_consolidator_llm :text
- # tool_details :boolean default(TRUE), not null
- # tools :json not null
- # forced_tool_count :integer default(-1), not null
- # allow_chat_channel_mentions :boolean default(FALSE), not null
- # allow_chat_direct_messages :boolean default(FALSE), not null
- # allow_topic_mentions :boolean default(FALSE), not null
- # allow_personal_messages :boolean default(TRUE), not null
- # force_default_llm :boolean default(FALSE), not null
+ # id :bigint not null, primary key
+ # name :string(100) not null
+ # description :string(2000) not null
+ # system_prompt :string(10000000) not null
+ # allowed_group_ids :integer default([]), not null, is an Array
+ # created_by_id :integer
+ # enabled :boolean default(TRUE), not null
+ # created_at :datetime not null
+ # updated_at :datetime not null
+ # system :boolean default(FALSE), not null
+ # priority :boolean default(FALSE), not null
+ # temperature :float
+ # top_p :float
+ # user_id :integer
+ # max_context_posts :integer
+ # vision_enabled :boolean default(FALSE), not null
+ # vision_max_pixels :integer default(1048576), not null
+ # rag_chunk_tokens :integer default(374), not null
+ # rag_chunk_overlap_tokens :integer default(10), not null
+ # rag_conversation_chunks :integer default(10), not null
+ # tool_details :boolean default(TRUE), not null
+ # tools :json not null
+ # forced_tool_count :integer default(-1), not null
+ # allow_chat_channel_mentions :boolean default(FALSE), not null
+ # allow_chat_direct_messages :boolean default(FALSE), not null
+ # allow_topic_mentions :boolean default(FALSE), not null
+ # allow_personal_messages :boolean default(TRUE), not null
+ # force_default_llm :boolean default(FALSE), not null
+ # rag_llm_model_id :bigint
+ # default_llm_id :bigint
+ # question_consolidator_llm_id :bigint
#
# Indexes
#
3 changes: 2 additions & 1 deletion app/models/ai_tool.rb
@@ -8,6 +8,7 @@ class AiTool < ActiveRecord::Base
validates :script, presence: true, length: { maximum: 100_000 }
validates :created_by_id, presence: true
belongs_to :created_by, class_name: "User"
+ belongs_to :rag_llm_model, class_name: "LlmModel"
has_many :rag_document_fragments, dependent: :destroy, as: :target
has_many :upload_references, as: :target, dependent: :destroy
has_many :uploads, through: :upload_references
@@ -371,4 +372,4 @@ def self.presets
# rag_chunk_tokens :integer default(374), not null
# rag_chunk_overlap_tokens :integer default(10), not null
# tool_name :string(100) default(""), not null
- #
+ # rag_llm_model_id :bigint
2 changes: 1 addition & 1 deletion app/models/llm_model.rb
@@ -70,7 +70,7 @@ def self.provider_params
end

def to_llm
- DiscourseAi::Completions::Llm.proxy(identifier)
+ DiscourseAi::Completions::Llm.proxy(self)
end

def identifier
8 changes: 7 additions & 1 deletion app/serializers/ai_custom_tool_list_serializer.rb
@@ -6,7 +6,13 @@ class AiCustomToolListSerializer < ApplicationSerializer
has_many :ai_tools, serializer: AiCustomToolSerializer, embed: :objects

def meta
- { presets: AiTool.presets }
+ {
+ presets: AiTool.presets,
+ llms: DiscourseAi::Configuration::LlmEnumerator.values_for_serialization,
+ settings: {
+ rag_pdf_images_enabled: SiteSetting.ai_rag_pdf_images_enabled,
+ },
+ }
end

def ai_tools
1 change: 1 addition & 0 deletions app/serializers/ai_custom_tool_serializer.rb
@@ -10,6 +10,7 @@ class AiCustomToolSerializer < ApplicationSerializer
:script,
:rag_chunk_tokens,
:rag_chunk_overlap_tokens,
+ :rag_llm_model_id,
:created_by_id,
:created_at,
:updated_at
5 changes: 3 additions & 2 deletions app/serializers/localized_ai_persona_serializer.rb
@@ -14,15 +14,16 @@ class LocalizedAiPersonaSerializer < ApplicationSerializer
:allowed_group_ids,
:temperature,
:top_p,
- :default_llm,
+ :default_llm_id,
:user_id,
:max_context_posts,
:vision_enabled,
:vision_max_pixels,
:rag_chunk_tokens,
:rag_chunk_overlap_tokens,
:rag_conversation_chunks,
- :question_consolidator_llm,
+ :rag_llm_model_id,
+ :question_consolidator_llm_id,
:tool_details,
:forced_tool_count,
:allow_chat_channel_mentions,