Skip to content

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Dec 4, 2025

User description

Benchmark PR plausible#5748

Type: Corrupted (contains bugs)

Original PR Title: ConsolidatedView.Cache - first pass
Original PR Description: ### Changes

This lets us have an interface to lookup regular site ids per consolidated view id efficiently. I don't think it needs more tests coverage at this point - since all the caches and their common characteristics are thoroughly tested elsewhere.

Tests

  • Automated tests have been added
  • This PR does not require tests

Changelog

  • Entry has been added to changelog
  • This PR does not make a user-facing change

Documentation

  • Docs have been updated
  • This change does not need a documentation update

Dark mode

  • The UI has been tested both in dark and light mode
  • This PR does not change the UI

Original PR URL: plausible#5748


PR Type

Enhancement


Description

  • Introduce ConsolidatedView.Cache for efficient site ID lookups

  • Add caching layer with refresh strategies for consolidated views

  • Refactor site_ids/1 to accept both Team and string identifiers

  • Update get/1 to preload team association for consolidated views

  • Make refresh_updated_recently/1 overridable in base Cache module

  • Add comprehensive test coverage for cache functionality


Diagram Walkthrough

flowchart LR
  A["ConsolidatedView.Cache"] -->|caches| B["Site IDs per View"]
  C["refresh_all warmer"] -->|populates| A
  D["refresh_updated_recently warmer"] -->|updates| A
  E["ConsolidatedView.site_ids"] -->|queries| F["Database"]
  A -->|serves| G["Cache.get"]
Loading

File Walkthrough

Relevant files
Enhancement
consolidated_view.ex
Refactor site_ids and get functions                                           

extra/lib/plausible/consolidated_view.ex

  • Modified site_ids/1 to accept both Team and string identifiers with
    consistent return type {:ok, [pos_integer()]} | {:error, :not_found}
  • Added overload for site_ids/1 that delegates Team calls to
    string-based implementation
  • Updated get/1 query to include inner_join with team and preload:
    [:team]
  • Improved variable naming for clarity (cvconsolidated_view)
+13/-7   
cache.ex
Implement ConsolidatedView caching layer                                 

extra/lib/plausible/consolidated_view/cache.ex

  • New cache module implementing Plausible.Cache behavior for
    consolidated views
  • Implements base_db_query/0 using array_agg to group site IDs by
    consolidated view
  • Provides refresh_updated_recently/1 for incremental cache updates
    based on 15-minute window
  • Implements get_from_source/1 to fetch data from database when cache
    miss occurs
  • Implements unwrap_cache_keys/1 to transform database rows into cache
    key-value pairs
+82/-0   
cache.ex
Make refresh_updated_recently overridable                               

lib/plausible/cache.ex

  • Split refresh_updated_recently/1 function head from implementation for
    clarity
  • Removed hardcoded delete_stale_items?: false parameter, allowing
    implementations to override
  • Added defoverridable refresh_updated_recently: 1 to enable child
    modules to customize behavior
+6/-2     
Configuration changes
application.ex
Register ConsolidatedView.Cache in application                     

lib/plausible/application.ex

  • Added ConsolidatedView.Cache to supervision tree under on_ee condition
  • Configured cache with read concurrency enabled and no TTL checking
  • Added two warmers: refresh_all (20-minute interval) and
    refresh_updated_recently (1-minute interval)
  • Cache uses single lock partition for simplified concurrency management
+16/-0   
Tests
cache_test.exs
Add ConsolidatedView.Cache test suite                                       

test/plausible/consolidated_view/cache_test.exs

  • New test module with four test cases covering cache functionality
  • Tests verify refresh_all correctly stores site IDs grouped by
    consolidated view
  • Tests validate refresh_updated_recently adds new sites and
    re-consolidates views
  • Tests confirm get_from_source/1 retrieves data from database correctly
  • Uses test-scoped cache instances to avoid interference between tests
+103/-0 
Miscellaneous
data_case.ex
Add Plausible module to DataCase                                                 

test/support/data_case.ex

  • Added use Plausible to the DataCase template for test support
+1/-0     

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Typo in module: Test module name Plausible.CondolidatedView.CacheTest contains a typo that reduces clarity
and discoverability.

Referred Code
defmodule Plausible.CondolidatedView.CacheTest do
  use Plausible.DataCase, async: true

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New cache refresh and data retrieval operations that affect critical system state are
introduced without any explicit audit logging of actions, actors, or outcomes.

Referred Code
@spec refresh_updated_recently(Keyword.t()) :: :ok
def refresh_updated_recently(opts) do
  recently_updated_site_ids =
    from sc in ConsolidatedView.sites(),
      join: sr in ^Plausible.Site.regular(),
      on: sc.team_id == sr.team_id,
      where: sr.updated_at > ago(^15, "minute") or sc.updated_at > ago(^15, "minute"),
      select: sc.id

  query =
    from sc in ConsolidatedView.sites(),
      join: sr in ^Plausible.Site.regular(),
      on: sr.team_id == sc.team_id,
      where: sc.id in subquery(recently_updated_site_ids),
      group_by: sc.id,
      order_by: [desc: sc.id],
      select: %{consolidated_view_id: sc.domain, site_ids: fragment("array_agg(?)", sr.id)}

  refresh(
    :updated_recently,
    query,


 ... (clipped 19 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent nil returns: get_from_source/1 returns nil on {:error, :not_found} without logging or contextual error
handling, potentially causing silent failures on cache misses.

Referred Code
@impl true
def get_from_source(consolidated_view_id) do
  case ConsolidatedView.site_ids(consolidated_view_id) do
    {:ok, some} -> some
    {:error, :not_found} -> nil
  end

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
No structured logs: New caching operations lack structured logging, making it unclear whether sensitive data
could be logged elsewhere and reducing observability.

Referred Code
@spec refresh_updated_recently(Keyword.t()) :: :ok
def refresh_updated_recently(opts) do
  recently_updated_site_ids =
    from sc in ConsolidatedView.sites(),
      join: sr in ^Plausible.Site.regular(),
      on: sc.team_id == sr.team_id,
      where: sr.updated_at > ago(^15, "minute") or sc.updated_at > ago(^15, "minute"),
      select: sc.id

  query =
    from sc in ConsolidatedView.sites(),
      join: sr in ^Plausible.Site.regular(),
      on: sr.team_id == sc.team_id,
      where: sc.id in subquery(recently_updated_site_ids),
      group_by: sc.id,
      order_by: [desc: sc.id],
      select: %{consolidated_view_id: sc.domain, site_ids: fragment("array_agg(?)", sr.id)}

  refresh(
    :updated_recently,
    query,


 ... (clipped 3 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Fix incorrect argument in site_ids

In ConsolidatedView.site_ids/1, pass view.team instead of view to
Teams.owned_sites_ids/1. The function expects a Team struct, but is currently
receiving a Site struct, which will cause a runtime error.

Examples:

extra/lib/plausible/consolidated_view.ex [39-44]
  def site_ids(consolidated_view_id) when is_binary(consolidated_view_id) do
    case get(consolidated_view_id) do
      nil -> {:error, :not_found}
      view -> {:ok, Teams.owned_sites_ids(view)}
    end
  end

Solution Walkthrough:

Before:

# extra/lib/plausible/consolidated_view.ex

def site_ids(consolidated_view_id) when is_binary(consolidated_view_id) do
  case get(consolidated_view_id) do
    nil -> {:error, :not_found}
    view -> {:ok, Teams.owned_sites_ids(view)} # Incorrect: `view` is a Site struct
  end
end

def get(id) when is_binary(id) do
  Repo.one(
    from s in sites(), ..., preload: [:team]
  )
end

After:

# extra/lib/plausible/consolidated_view.ex

def site_ids(consolidated_view_id) when is_binary(consolidated_view_id) do
  case get(consolidated_view_id) do
    nil -> {:error, :not_found}
    view -> {:ok, Teams.owned_sites_ids(view.team)} # Correct: pass the preloaded team
  end
end

def get(id) when is_binary(id) do
  Repo.one(
    from s in sites(), ..., preload: [:team]
  )
end
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical bug that would cause a runtime error due to a type mismatch, breaking the site_ids/1 function.

High
Correct the Ecto fragment query

In ConsolidatedView.Cache.refresh_updated_recently/1, correct the Ecto query
fragment from fragment("array_agg(?)", sr.id) to fragment("array_agg(?.id)",
sr). This fixes a syntax error that would cause the query to fail.

Examples:

extra/lib/plausible/consolidated_view/cache.ex [52-60]
    query =
      from sc in ConsolidatedView.sites(),
        join: sr in ^Plausible.Site.regular(),
        on: sr.team_id == sc.team_id,
        where: sc.id in subquery(recently_updated_site_ids),
        group_by: sc.id,
        order_by: [desc: sc.id],
        select: %{consolidated_view_id: sc.domain, site_ids: fragment("array_agg(?)", sr.id)}

Solution Walkthrough:

Before:

# extra/lib/plausible/consolidated_view/cache.ex

def refresh_updated_recently(opts) do
  # ...
  query =
    from sc in ConsolidatedView.sites(),
      # ...
      select: %{
        consolidated_view_id: sc.domain,
        site_ids: fragment("array_agg(?)", sr.id) # Incorrect fragment syntax
      }

  refresh(:updated_recently, query, ...)
end

After:

# extra/lib/plausible/consolidated_view/cache.ex

def refresh_updated_recently(opts) do
  # ...
  query =
    from sc in ConsolidatedView.sites(),
      # ...
      select: %{
        consolidated_view_id: sc.domain,
        site_ids: fragment("array_agg(?.id)", sr) # Correct fragment syntax
      }

  refresh(:updated_recently, query, ...)
end
Suggestion importance[1-10]: 9

__

Why: This suggestion identifies a critical bug in an Ecto query that would cause a compilation error, preventing the cache from refreshing correctly.

High
Possible issue
Fix partial cache refresh logic

In refresh_updated_recently, change the :delete_stale_items? option to false to
prevent unintended deletion of cache entries during partial refreshes.

extra/lib/plausible/consolidated_view/cache.ex [44-66]

 def refresh_updated_recently(opts) do
   recently_updated_site_ids =
     from sc in ConsolidatedView.sites(),
       join: sr in ^Plausible.Site.regular(),
       on: sc.team_id == sr.team_id,
       where: sr.updated_at > ago(^15, "minute") or sc.updated_at > ago(^15, "minute"),
       select: sc.id
 
   query =
     from sc in ConsolidatedView.sites(),
       join: sr in ^Plausible.Site.regular(),
       on: sr.team_id == sc.team_id,
       where: sc.id in subquery(recently_updated_site_ids),
       group_by: sc.id,
       order_by: [desc: sc.id],
       select: %{consolidated_view_id: sc.domain, site_ids: fragment("array_agg(?)", sr.id)}
 
   refresh(
     :updated_recently,
     query,
-    Keyword.put(opts, :delete_stale_items?, true)
+    Keyword.put(opts, :delete_stale_items?, false)
   )
 end
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical bug in the cache refresh logic that would cause data loss in the cache, defeating the purpose of the incremental update warmer.

High
Ensure all consolidated views are cached

Change the inner_join to a left_join in base_db_query to ensure consolidated
views without any regular sites are still included in the cache.

extra/lib/plausible/consolidated_view/cache.ex [31-41]

 def base_db_query() do
   from sc in ConsolidatedView.sites(),
-    inner_join: sr in ^Plausible.Site.regular(),
+    left_join: sr in ^Plausible.Site.regular(),
     on: sr.team_id == sc.team_id,
     group_by: sc.id,
     order_by: [desc: sc.id],
     select: %{
       consolidated_view_id: sc.domain,
-      site_ids: fragment("array_agg(?.id)", sr)
+      site_ids:
+        fragment("coalesce(array_agg(?.id) filter (where ?.id is not null), '{}')", sr, sr)
     }
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that using inner_join will exclude consolidated views without any regular sites, leading to an incomplete cache. Changing to left_join fixes this edge case.

Medium
General
Correctly test the cache-miss logic

Refactor the get_from_source/1 test to call the function directly without
warming the cache, ensuring the database fallback mechanism is tested in
isolation.

test/plausible/consolidated_view/cache_test.exs [76-89]

-test "get_from_source/1", %{test: test} do
+test "get_from_source/1", %{} do
   user = new_user()
-  new_site(owner: user)
-  new_site(owner: user)
+  s1 = new_site(owner: user)
+  s2 = new_site(owner: user)
   team = team_of(user)
   {:ok, consolidated_view} = ConsolidatedView.enable(team)
 
-  start_test_cache(test)
-  :ok = Cache.refresh_all(cache_name: test)
-
-  result = Cache.get(consolidated_view.domain, cache_name: test, force?: true)
-  assert ^result = Cache.get(consolidated_view.domain)
-  assert ^result = Cache.get_from_source(consolidated_view.domain)
+  result = Cache.get_from_source(consolidated_view.domain)
+  assert Enum.sort(result) == Enum.sort([s1.id, s2.id])
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the test for get_from_source/1 is not properly isolated, as it relies on a pre-warmed cache. The proposed change improves the test's correctness and focus.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants