Skip to content

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Dec 4, 2025

User description

Benchmark PR plausible#5893

Type: Clean (correct implementation)

Original PR Title: Refactor building the Query struct
Original PR Description: ### Changes

This PR is a refactor and doesn't introduce any new behaviour.

In the upcoming dashboard migration from React to LiveView, we want to be able to create Query structs from native data structures. E.g.:

  • pass the site struct itself into the Query.build function, not the site_id
  • pass DateTime structs instead of iso8601 strings
  • pass metrics as atoms instead of strings
  • etc...

The biggest refactor in this PR is changing the current Query.build function into Query.parse_and_build which calls two separate modules (QueryParser and QueryBuilder (new) for the respective actions).

It also introduces an intermediate data structure that represents the parsed state between those two phases - ParsedQueryParams.

There was a lot of query "building" logic and validations incorporated into QueryParser. The test file (with more than 3k lines of code) got turned into QueryParseAndBuildTest, now asserting on a Query struct outcome rather than an arbitrary map.

This PR doesn't tackle the date_range input aspect yet. For now, QueryBuilder will expect a clean utc_time_range input (via the ParsedQueryParams struct). It shouldn't block this PR though. I can tackle it later. E.g.: we'll likely want to build queries like:

query1 = QueryBuilder.build(site, %{metrics: [:visitors], date_range: :month, date: ~D[2025-01-01]})
query2 = QueryBuilder.build(site, %{metrics: [:visitors], date_range: [~D[2025-01-01], ~D[2025-01-31]]})

With that, we can also migrate any existing ad-hoc Query construction (e.g. in email reports or query_24h_stats) to use this new version of Query.build.

Tests

  • Automated tests have been added

Changelog

  • This PR does not make a user-facing change

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

Original PR URL: plausible#5893


PR Type

Enhancement, Tests


Description

  • Refactored Query.build to Query.parse_and_build implementing a two-phase architecture (parsing and building) for better separation of concerns

  • Created new QueryBuilder module to handle Query struct construction with comprehensive validation logic for metrics, dimensions, filters, and special constraints

  • Created new ParsedQueryParams struct as an intermediate data structure between parsing and building phases

  • Refactored QueryParser module (relocated from Plausible.Stats.Filters) to focus solely on parsing, removing validation and query building logic

  • Updated 20+ files across the codebase to use the new Query.parse_and_build API and relocated QueryParser

  • Added comprehensive test suite (3000+ lines) for Query.parse_and_build functionality covering metrics, filters, dimensions, date ranges, pagination, and special cases

  • Reorganized test modules under Plausible.Stats.Query namespace for better structure

  • Renamed StatsAPIFilterParser to LegacyStatsAPIFilterParser to clarify legacy API support


Diagram Walkthrough

flowchart LR
  Input["Input Parameters"]
  Parser["QueryParser.parse"]
  ParsedParams["ParsedQueryParams"]
  Builder["QueryBuilder.build"]
  Query["Query Struct"]
  
  Input -- "parse" --> Parser
  Parser -- "returns" --> ParsedParams
  ParsedParams -- "build" --> Builder
  Builder -- "returns" --> Query
Loading

File Walkthrough

Relevant files
Tests
6 files
query_parse_and_build_test.exs
Comprehensive test suite for Query parse and build             

test/plausible/stats/query/query_parse_and_build_test.exs

  • Added comprehensive test suite with 3000+ lines covering
    Query.parse_and_build functionality
  • Tests validate metrics, filters, dimensions, date ranges, pagination,
    and include parameters
  • Tests cover goal preloading, revenue metrics, behavioral filters,
    segments, and special metric validations
  • Tests verify both public and internal API schemas with appropriate
    error handling
+3056/-0
query_optimizer_test.exs
Update QueryOptimizerTest module path and references         

test/plausible/stats/query/query_optimizer_test.exs

  • Renamed module from Plausible.Stats.QueryOptimizerTest to
    Plausible.Stats.Query.QueryOptimizerTest
  • Updated imports to include ParsedQueryParams alias
  • Replaced all QueryParser.default_include() calls with
    ParsedQueryParams.default_include()
  • Removed unused QueryParser alias from test describe block
+14/-16 
query_result_test.exs
Update QueryResultTest module path and function names       

test/plausible/stats/query/query_result_test.exs

  • Renamed module from Plausible.Stats.QueryResultTest to
    Plausible.Stats.Query.QueryResultTest
  • Updated all Query.build! calls to Query.parse_and_build!
+4/-4     
query_parser_test.exs
New QueryParserTest module with basic tests                           

test/plausible/stats/query/query_parser_test.exs

  • New test module for QueryParser functionality
  • Tests basic parsing validation including empty map rejection and
    invalid metric detection
+22/-0   
query_from_test.exs
Update QueryFromTest module path and cleanup                         

test/plausible/stats/query/query_from_test.exs

  • Renamed module from Plausible.Stats.QueryTest to
    Plausible.Stats.Query.QueryFromTest
  • Removed unused QueryParser alias import
+1/-2     
comparisons_test.exs
Update comparisons test function call                                       

test/plausible/stats/comparisons_test.exs

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
Refactoring
1 files
query_parser.ex
Refactor QueryParser to focus on parsing only                       

lib/plausible/stats/query_parser.ex

  • Refactored module name from Plausible.Stats.Filters.QueryParser to
    Plausible.Stats.QueryParser
  • Simplified parse function to return ParsedQueryParams struct instead
    of full query map
  • Removed query validation logic (moved to QueryBuilder)
  • Removed goal preloading and consolidated site ID logic (moved to
    builder phase)
  • Updated to use ParsedQueryParams defaults instead of local constants
+33/-339
Enhancement
17 files
send_email_report.ex
Update email report worker to use new Query API                   

lib/workers/send_email_report.ex

  • Updated four function calls from Query.build! to
    Query.parse_and_build!
  • Changes affect stats_aggregates, pages, sources, and goals functions
  • Maintains same parameter structure and internal API usage
+4/-4     
query_builder.ex
New QueryBuilder module for Query struct construction       

lib/plausible/stats/query_builder.ex

  • New module created to handle building Query structs from
    already-parsed parameters
  • Implements comprehensive validation logic for metrics, dimensions,
    filters, and special query constraints
  • Handles segment resolution, goal preloading, revenue metrics, and
    comparison time range setup
  • Provides helper functions for time-on-page data and consolidated site
    ID retrieval
+363/-0 
parsed_query_params.ex
New ParsedQueryParams struct for parsed query state           

lib/plausible/stats/parsed_query_params.ex

  • New module defining ParsedQueryParams struct as intermediate data
    structure between parsing and building phases
  • Contains default configurations for include and pagination options
  • Provides new!/1 constructor for creating validated instances with
    required fields
+55/-0   
query.ex
Refactor Query.build to parse_and_build with delegation   

lib/plausible/stats/query.ex

  • Renamed build/4 function to parse_and_build/4 to reflect two-phase
    architecture
  • Renamed build!/4 function to parse_and_build!/4 for consistency
  • Refactored to delegate to QueryParser.parse and QueryBuilder.build
    instead of inline logic
  • Updated default include reference from QueryParser.default_include()
    to ParsedQueryParams.default_include()
  • Updated documentation to reference new function names
+18/-39 
legacy_query_builder.ex
Update legacy builder to use new QueryBuilder and QueryParser

lib/plausible/stats/legacy/legacy_query_builder.ex

  • Updated function calls from Query.put_comparison_utc_time_range() to
    QueryBuilder.put_comparison_utc_time_range()
  • Updated function calls from Query.set_time_on_page_data() to
    QueryBuilder.set_time_on_page_data()
  • Updated preload_goals_and_revenue call to use QueryBuilder instead of
    QueryParser
  • Refactored parse_order_by and parse_include to use new QueryParser
    module with improved error handling
  • Simplified helper functions by removing json_decode and unwrap
    utilities
+24/-23 
filters.ex
Update filter module aliases and legacy parser naming       

lib/plausible/stats/filters/filters.ex

  • Updated alias from Plausible.Stats.Filters.QueryParser to
    Plausible.Stats.QueryParser
  • Renamed StatsAPIFilterParser to LegacyStatsAPIFilterParser in alias
    and usage
+3/-3     
legacy_stats_api_filter_parser.ex
Rename and document legacy stats API filter parser             

lib/plausible/stats/filters/legacy_stats_api_filter_parser.ex

  • Renamed module from Plausible.Stats.Filters.StatsAPIFilterParser to
    Plausible.Stats.Filters.LegacyStatsAPIFilterParser
  • Enhanced moduledoc to clarify this is for legacy Stats API v1 filter
    format
+4/-2     
filters.ex
Update segment filters to use relocated QueryParser           

lib/plausible/segments/filters.ex

  • Updated alias to import QueryParser from Plausible.Stats instead of
    nested under Filters
  • Updated function call from Filters.QueryParser.parse_filters to
    QueryParser.parse_filters
+2/-2     
external_query_api_controller.ex
Update external query API controller function call             

lib/plausible_web/controllers/api/external_query_api_controller.ex

  • Updated Query.build call to Query.parse_and_build to match renamed
    function
+1/-1     
segment.ex
Update segment module function call                                           

lib/plausible/segments/segment.ex

  • Updated Query.build call to Query.parse_and_build to match renamed
    function
+1/-1     
page_rules.ex
Update page rules shield function call                                     

lib/plausible_web/live/shields/page_rules.ex

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
hostname_rules.ex
Update hostname rules shield function call                             

lib/plausible_web/live/shields/hostname_rules.ex

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
form.ex
Update goal settings form function call                                   

lib/plausible_web/live/goal_settings/form.ex

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
goal_suggestions.ex
Update goal suggestions function call                                       

lib/plausible/stats/goal_suggestions.ex

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
consolidated_view.ex
Update consolidated view function calls                                   

extra/lib/plausible/stats/consolidated_view.ex

  • Updated two Query.build! calls to Query.parse_and_build! to match
    renamed function
+2/-2     
form.ex
Update funnel settings form function call                               

extra/lib/plausible_web/live/funnel_settings/form.ex

  • Updated Query.build! call to Query.parse_and_build! to match renamed
    function
+1/-1     
traffic_change_notifier.ex
Update traffic change notifier function calls                       

lib/workers/traffic_change_notifier.ex

  • Updated two Query.build! calls to Query.parse_and_build! to match
    renamed function
+2/-2     
Additional files
1 files
query_parser_test.exs +0/-2740

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logs: The new parsing/building flow adds validation and building logic without any audit logging
of critical actions or failures, and the diff does not show existing logging that would
satisfy audit trail requirements.

Referred Code
def build(site, parsed_query_params, params, debug_metadata \\ %{}) do
  with {:ok, parsed_query_params} <- resolve_segments_in_filters(parsed_query_params, site),
       query = do_build(parsed_query_params, site, params, debug_metadata),
       :ok <- validate_order_by(query),
       :ok <- validate_custom_props_access(site, query),
       :ok <- validate_toplevel_only_filter_dimension(query),
       :ok <- validate_special_metrics_filters(query),
       :ok <- validate_behavioral_filters(query),
       :ok <- validate_filtered_goals_exist(query),
       :ok <- validate_revenue_metrics_access(site, query),
       :ok <- validate_metrics(query),
       :ok <- validate_include(query) do
    query =
      query
      |> set_time_on_page_data(site)
      |> put_comparison_utc_time_range()
      |> Query.put_imported_opts(site)

    on_ee do
      # NOTE: The Query API schema does not allow the sample_threshold param
      # and it looks like it's not used as a parameter anymore. We might want


 ... (clipped 6 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error detail surface: The builder and parser return detailed validation error strings (e.g., listing invalid
dimensions or metrics) which might propagate to user-facing contexts; it is unclear from
the diff whether these are sanitized at the API boundary.

Referred Code
defp validate_order_by(query) do
  if query.order_by do
    valid_values = query.metrics ++ query.dimensions

    invalid_entry =
      Enum.find(query.order_by, fn {value, _direction} ->
        not Enum.member?(valid_values, value)
      end)

    case invalid_entry do
      nil ->
        :ok

      _ ->
        {:error,
         "Invalid order_by entry '#{i(invalid_entry)}'. Entry is not a queried metric or dimension."}
    end
  else
    :ok
  end
end


 ... (clipped 196 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
No log controls: The new code introduces parsing and building paths without adding or showing structured
logging controls or scrubbing for potentially sensitive filter values, and the diff does
not demonstrate existing logging safeguards.

Referred Code
def build(site, parsed_query_params, params, debug_metadata \\ %{}) do
  with {:ok, parsed_query_params} <- resolve_segments_in_filters(parsed_query_params, site),
       query = do_build(parsed_query_params, site, params, debug_metadata),
       :ok <- validate_order_by(query),
       :ok <- validate_custom_props_access(site, query),
       :ok <- validate_toplevel_only_filter_dimension(query),
       :ok <- validate_special_metrics_filters(query),
       :ok <- validate_behavioral_filters(query),
       :ok <- validate_filtered_goals_exist(query),
       :ok <- validate_revenue_metrics_access(site, query),
       :ok <- validate_metrics(query),
       :ok <- validate_include(query) do
    query =
      query
      |> set_time_on_page_data(site)
      |> put_comparison_utc_time_range()
      |> Query.put_imported_opts(site)

    on_ee do
      # NOTE: The Query API schema does not allow the sample_threshold param
      # and it looks like it's not used as a parameter anymore. We might want


 ... (clipped 6 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Refine the new query building API

Refine the QueryBuilder.build function to remove its dependency on raw, untyped
parameter maps. Instead, it should rely solely on the new typed
ParsedQueryParams struct for its inputs.

Examples:

lib/plausible/stats/query_builder.ex [10-36]
  def build(site, parsed_query_params, params, debug_metadata \\ %{}) do
    with {:ok, parsed_query_params} <- resolve_segments_in_filters(parsed_query_params, site),
         query = do_build(parsed_query_params, site, params, debug_metadata),
         :ok <- validate_order_by(query),
         :ok <- validate_custom_props_access(site, query),
         :ok <- validate_toplevel_only_filter_dimension(query),
         :ok <- validate_special_metrics_filters(query),
         :ok <- validate_behavioral_filters(query),
         :ok <- validate_filtered_goals_exist(query),
         :ok <- validate_revenue_metrics_access(site, query),

 ... (clipped 17 lines)
lib/plausible/stats/query.ex [50-59]
  def parse_and_build(
        %Plausible.Site{domain: domain} = site,
        schema_type,
        %{"site_id" => domain} = params,
        debug_metadata \\ %{}
      ) do
    with {:ok, %ParsedQueryParams{} = parsed_query_params} <-
           QueryParser.parse(site, schema_type, params) do
      QueryBuilder.build(site, parsed_query_params, params, debug_metadata)
    end

Solution Walkthrough:

Before:

# lib/plausible/stats/query.ex
def parse_and_build(site, schema_type, params, ...) do
  with {:ok, parsed_query_params} <- QueryParser.parse(site, schema_type, params) do
    # The raw `params` map is passed to the builder
    QueryBuilder.build(site, parsed_query_params, params, ...)
  end
end

# lib/plausible/stats/query_builder.ex
def build(site, parsed_query_params, params, ...) do
  # ...
  # The builder uses the raw `params` map
  query = do_build(parsed_query_params, site, params, ...)
  # ...
end

After:

# lib/plausible/stats/parsed_query_params.ex
defstruct [
  # ...
  :input_date_range
]

# lib/plausible/stats/query_parser.ex
def parse(site, schema_type, params, ...) do
  # ...
  # The parser now populates `input_date_range`
  Plausible.Stats.ParsedQueryParams.new!(%{
    # ...
    input_date_range: Map.get(params, "date_range")
  })
end

# lib/plausible/stats/query.ex
def parse_and_build(site, schema_type, params, ...) do
  with {:ok, parsed_query_params} <- QueryParser.parse(site, schema_type, params) do
    # The raw `params` map is no longer needed
    QueryBuilder.build(site, parsed_query_params, ...)
  end
end
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a design flaw where QueryBuilder.build still depends on the raw params map, which contradicts the PR's goal of separating parsing and building logic.

Medium
Possible issue
Return empty list for nil filters

Modify parse_filters/1 to return {:ok, []} instead of {:ok, nil} when the input
is nil.

lib/plausible/stats/query_parser.ex [60]

-defp parse_filters(nil), do: {:ok, nil}
+defp parse_filters(nil), do: {:ok, []}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that returning [] instead of nil for missing filters aligns better with the expected data structure and test assertions, which expect filters: []. This improves consistency and robustness by ensuring the filters field is always a list.

Medium
Return empty list for nil dimensions

Modify parse_dimensions/1 to return {:ok, []} instead of {:ok, nil} when the
input is nil.

lib/plausible/stats/query_parser.ex [258]

-defp parse_dimensions(nil), do: {:ok, nil}
+defp parse_dimensions(nil), do: {:ok, []}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that returning [] instead of nil for missing dimensions is more consistent with the expected data type and test cases, which assert dimensions: []. This change improves code robustness by ensuring the dimensions field is always a list.

Medium
Return default pagination for nil input

Modify parse_pagination/1 to return the default pagination map,
Plausible.Stats.ParsedQueryParams.default_pagination(), instead of nil when the
input is nil.

lib/plausible/stats/query_parser.ex [345]

-defp parse_pagination(nil), do: {:ok, nil}
+defp parse_pagination(nil), do: {:ok, Plausible.Stats.ParsedQueryParams.default_pagination()}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly observes that returning a default pagination map instead of nil aligns with the test expectations and ensures the pagination field is always a map. This improves consistency and prevents potential nil value issues downstream.

Medium
General
Improve error message for invalid filters

Improve the error message for invalid filter dimensions to list all invalid
dimensions at once, rather than just the first one found.

lib/plausible/stats/query_builder.ex [164-169]

-if Enum.count(not_toplevel) > 0 do
+if Enum.any?(not_toplevel) do
+  invalid_dims = Enum.join(not_toplevel, ", ")
+
   {:error,
-   "Invalid filters. Dimension `#{List.first(not_toplevel)}` can only be filtered at the top level."}
+   "Invalid filters. Dimensions `#{invalid_dims}` can only be filtered at the top level."}
 else
   :ok
 end
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why: The suggestion improves the user experience by providing a more comprehensive error message, which is a valuable quality-of-life enhancement.

Low
Improve validation for required parameters

Refactor the metrics validation in new!/1 to use an if condition and an explicit
raise for an empty list, instead of pattern matching.

lib/plausible/stats/parsed_query_params.ex [40-54]

 def new!(params) when is_map(params) do
   %DateTimeRange{} = utc_time_range = Map.fetch!(params, :utc_time_range)
-  [_ | _] = metrics = Map.fetch!(params, :metrics)
+  metrics = Map.fetch!(params, :metrics)
+
+  if metrics == [] do
+    raise "ParsedQueryParams `metrics` cannot be empty"
+  end
 
   %__MODULE__{
     now: params[:now],
     utc_time_range: utc_time_range,
     metrics: metrics,
     filters: params[:filters] || [],
     dimensions: params[:dimensions] || [],
     order_by: params[:order_by],
     pagination: Map.merge(@default_pagination, params[:pagination] || %{}),
     include: Map.merge(@default_include, params[:include] || %{})
   }
 end
  • Apply / Chat
Suggestion importance[1-10]: 2

__

Why: The suggestion proposes a stylistic change that is subjective; the existing code is a common and valid Elixir idiom for asserting a non-empty list.

Low
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants