Skip to content

Commit e88cf47

Browse files
committed
Random via cached ID
The easiest way to work with random values is to call `order("RANDOM()")` but this is very bad for database performance. There's hacks to get faster lookups but they all come with caveats. Mostly in terms of guarantees, either they'll return no data, or less than the data you want. You've got to retry queries, but if you do that in an unbounded way then you might end up an an infinite loop. This approach attempts to give the maximum flexibility of full order("random()") but with non-awful performance. The cost here is that the values persist for the given time frame. Also since we're still using `order("RANDOM()")` the first query is still expensive. It works like this: The SQL of the query is used to generate a cache key. The query is then made and the resulting IDs are returned. Then those IDs are used to query the original model. I'm still not in love with it, but it's okay for now. I think for more complex logic there's got to be a better way to handle this kind of processing. But 🤷‍♂️
1 parent ce488d2 commit e88cf47

File tree

3 files changed

+42
-16
lines changed

3 files changed

+42
-16
lines changed

app/mailers/user_mailer.rb

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -77,17 +77,20 @@ def send_triage(user:, assignment:, repo:, create: false)
7777

7878
def poke_inactive(user:, min_issue_count:, min_subscriber_count:)
7979
return unless set_and_check_user(user)
80+
languages = @user.favorite_languages&.sort || []
8081

81-
if @user.favorite_languages.present?
82-
@repos = Repo.where(language: favorite_languages)
83-
else
84-
@repos = Repo
85-
end
82+
query = Repo
83+
query = repo.where(language: languages) if !languages.empty?
84+
query = query
85+
.where("issues_count >= ?", min_issue_count)
86+
.where("subscribers_count >= ?", min_subscriber_count)
87+
88+
@repos = Random::CachedIdQuery.new(
89+
query: query,
90+
limit: 3,
91+
expires_in: 24.hours
92+
).call
8693

87-
@repos = @repos
88-
.rand
89-
.where("issues_count >= ?", min_issue_count)
90-
.where("subscribers_count >= ?", min_subscriber_count).first(3)
9194
mail(to: @user.email, reply_to: "[email protected]", subject: "CodeTriage misses you")
9295
end
9396

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Provides an actual random result exactly as expected
2+
# but it's cached for a period of time.
3+
#
4+
# Cache keys expire after the set time but
5+
# are unbounded. Take care to not pass in arbitrary data
6+
# from users into the query.
7+
#
8+
# The raw SQL is used for a cache key so take care
9+
# to pass in values in sorted order etc.
10+
#
11+
# The first run of this is VERY expensive on a large
12+
# model and can still cause significant database load
13+
class Random::CachedIdQuery
14+
private attr_reader :limit, :query, :key, :expires_in
15+
16+
def initialize(query:, limit:, expires_in:)
17+
@query = query.order("RANDOM()")
18+
@key = "random_cached_query:#{@query.to_sql.hash}"
19+
@expires_in = expires_in
20+
@limit = limit
21+
end
22+
23+
def call
24+
ids = Rails.cache.fetch(key, expires_in: expires_in) do
25+
query.select(:id).first(limit).map(&:id)
26+
end
27+
28+
query.klass.where(id: ids)
29+
end
30+
end

app/models/repo.rb

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -131,13 +131,6 @@ def self.not_in(*ids)
131131
where("repos.id not in (?)", ids)
132132
end
133133

134-
@@max_id = nil
135-
def self.rand
136-
@@max_id = self.maximum(:id) if @@max_id.nil?
137-
138-
where("id >= ?", Random.new.rand(1..@@max_id))
139-
end
140-
141134
def self.all_languages
142135
self.select("language").group("language").order("language ASC").map(&:language).reject(&:blank?)
143136
end

0 commit comments

Comments
 (0)