@everettbu
Contributor

Test 4

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements a comprehensive embedding system for Discourse that allows external websites to embed forum discussions and automatically import content from RSS/ATOM feeds. The changes introduce several key components:

Core Embedding Infrastructure:

  • New TopicEmbed model handles importing external content via URLs using the ruby-readability gem for content parsing and Nokogiri for URL absolutization
  • EmbedController provides the main endpoint (/embed/best) for external sites to retrieve embedded content with host validation and iframe support
  • TopicRetriever class manages URL validation, throttling (60-second cooldown), and delegates to TopicEmbed for content fetching
  • New database tables (topic_embeds) track relationships between external URLs and Discourse topics

Content Processing Enhancements:

  • Extended Post model with a cook_methods enum supporting both regular markdown and raw HTML processing
  • Modified PostCreator and PostRevisor to handle custom cook methods and optional validation bypassing
  • New scheduled job PollFeed automatically imports RSS/ATOM feeds hourly using the simple-rss gem

Frontend Integration:

  • JavaScript embed widget (embed.js) for third-party sites to load Discourse content in iframes
  • Responsive embed layout with postMessage communication for dynamic resizing
  • Loading states and "best posts" view for embedded content display
  • Dedicated CSS styling optimized for embedded contexts

Configuration & Localization:

  • New site settings section for embedding configuration (embeddable hosts, feed polling, embed user)
  • Comprehensive localization strings for both client and server-side embed functionality
  • Updated Disqus import task to leverage the new embed system

The system follows a security model where only configured hosts can embed content, with referer validation and throttling to prevent abuse. Background jobs handle potentially slow content retrieval operations asynchronously.

Confidence score: 1/5

  • This PR contains multiple critical security vulnerabilities that make it unsafe to merge in its current state
  • The implementation uses deprecated and unsafe methods like open() for HTTP requests, has weak origin validation in JavaScript, potential XSS vulnerabilities from unescaped content, and SSRF attack vectors
  • Files requiring immediate attention: app/models/topic_embed.rb, app/assets/javascripts/embed.js, app/controllers/embed_controller.rb, app/jobs/scheduled/poll_feed.rb, app/views/layouts/embed.html.erb

27 files reviewed, 28 comments

class CreateTopTopics < ActiveRecord::Migration
def change
create_table :top_topics do |t|
create_table :top_topics, force: true do |t|

logic: Adding force: true makes this migration destructive - it will drop and recreate the table if it exists, potentially causing data loss in production

Suggested change
create_table :top_topics, force: true do |t|
create_table :top_topics do |t|

<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>

syntax: Unusual end if syntax - should be just end

Suggested change
<%- end if %>
<%- end %>

require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,

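logic: Calling open() on a caller-supplied URL is unsafe: Kernel#open runs a shell command when the string starts with |, reads local files when given a path, and fetches arbitrary hosts (SSRF) when open-uri is loaded. Opening URLs through Kernel#open is also deprecated as of Ruby 2.7. Use Net::HTTP, HTTParty, or a similar HTTP library instead.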

Suggested change
doc = Readability::Document.new(open(url).read,
doc = Readability::Document.new(Net::HTTP.get_response(URI(url)).body,

@@ -0,0 +1,40 @@
require 'spec_helper'
require_dependency 'jobs/regular/process_post'

logic: Incorrect require_dependency path - should be 'jobs/scheduled/poll_feed' not 'jobs/regular/process_post'


# Import an article from a source (RSS/Atom/Other)
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

logic: URL validation is insufficient - allows access to internal services (localhost, 192.168.x.x, etc). Add proper SSRF protection.
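
One way to approach the SSRF protection mentioned above is to resolve the host before fetching and refuse loopback, private, and link-local addresses. The following is a minimal sketch, not the PR's code: the helper names internal_address? and fetchable_url? and the injectable resolver parameter are assumptions made for illustration. It is not exhaustive (it does not cover 0.0.0.0, redirects, or DNS rebinding between the check and the fetch).

```ruby
require 'ipaddr'
require 'resolv'
require 'uri'

# Hypothetical helper (not part of the PR): true when the address is one
# an embed fetcher should refuse to contact.
def internal_address?(ip_string)
  ip = IPAddr.new(ip_string)
  ip.loopback? || ip.private? || ip.link_local?
rescue IPAddr::InvalidAddressError
  true # treat unparseable addresses as unsafe
end

# Hypothetical helper: accept only http(s) URLs whose host resolves
# exclusively to non-internal addresses. `resolver` is injectable so the
# check can be exercised without real DNS lookups.
def fetchable_url?(url, resolver: ->(host) { Resolv.getaddresses(host) })
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this admits both schemes.
  return false unless uri.is_a?(URI::HTTP) && uri.hostname

  addresses = resolver.call(uri.hostname)
  !addresses.empty? && addresses.none? { |addr| internal_address?(addr) }
rescue URI::InvalidURIError
  false
end
```

Parsing with URI rather than matching a regular expression also avoids relying on ^, which in Ruby matches at the start of any line rather than the start of the string.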


def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

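logic: Host validation depends entirely on the Referer header. When request.referer is nil, the || '' fallback makes URI('').host return nil; that case is rejected only because the preceding blank? check guarantees SiteSetting.embeddable_host is not nil. A malformed referer makes URI() raise URI::InvalidURIError instead of Discourse::InvalidAccess. Handle the missing and malformed cases explicitly.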
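
A sketch of a stricter referer check, under the assumption that the comparison is factored into a standalone predicate; embeddable_referer? is a hypothetical name, not a method in the PR. It treats a missing setting, a missing referer, and an unparseable referer as failures rather than relying on nil comparisons.

```ruby
require 'uri'

# Hypothetical helper: true only when `referer` parses to a URL whose host
# matches a non-blank `allowed_host` (case-insensitively).
def embeddable_referer?(referer, allowed_host)
  return false if allowed_host.nil? || allowed_host.strip.empty?
  return false if referer.nil? || referer.strip.empty?

  host = URI.parse(referer).host
  !host.nil? && host.casecmp?(allowed_host)
rescue URI::InvalidURIError
  false # a malformed Referer header is rejected, not raised
end
```

The Referer header is supplied by the client, so a check like this limits which pages browsers will embed from; it does not authenticate non-browser callers.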


function postMessageReceived(e) {
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }

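logic: Security flaw - indexOf accepts any origin that is a substring of discourseUrl (for example, https://forum.example.co when discourseUrl is https://forum.example.com/). Compare e.origin for exact equality with the origin portion (scheme and host) of discourseUrl instead.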

Suggested change
if (discourseUrl.indexOf(e.origin) === -1) { return; }
var expectedOrigin = discourseUrl.replace(/\/$/, '').replace(/^(https?:\/\/[^/]+).*/, '$1');
if (e.origin !== expectedOrigin) { return; }

raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

response.headers['X-Frame-Options'] = "ALLOWALL"

logic: Setting X-Frame-Options to 'ALLOWALL' permits embedding from any domain, potentially enabling clickjacking attacks from malicious sites
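
One possible alternative is to restrict framing to the configured host with the Content-Security-Policy frame-ancestors directive. This is a sketch under assumptions: embed_frame_headers is a hypothetical helper, and allowing both http and https for the host is an illustrative choice, not something the PR specifies.

```ruby
# Hypothetical helper: build response headers that allow framing only by
# the configured embeddable host.
def embed_frame_headers(embeddable_host)
  host = embeddable_host.to_s.strip
  raise ArgumentError, 'embeddable host not set' if host.empty?
  # Reject anything that is not a plain host[:port], so the setting cannot
  # smuggle extra directives into the header value.
  raise ArgumentError, 'invalid host' unless host.match?(/\A[a-z0-9.-]+(:\d+)?\z/i)

  { 'Content-Security-Policy' => "frame-ancestors http://#{host} https://#{host}" }
end
```

In the controller, the returned pairs could be merged into response.headers in place of the X-Frame-Options assignment.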

return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

logic: Security vulnerability: using open without URL validation allows potential SSRF attacks and local file access

Suggested change
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
require 'net/http'
require 'uri'
uri = URI.parse(SiteSetting.feed_polling_url)
raise ArgumentError, "Invalid URL scheme" unless %w[http https].include?(uri.scheme)
response = Net::HTTP.get_response(uri)
raise "HTTP error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
rss = SimpleRSS.parse response.body


user = nil
if args[:user_id]
user = User.where(id: args[:user_id]).first

style: use find_by(id: args[:user_id]) instead of where().first for cleaner code
