
Commit aea6589

amokan and Ziinc authored
adds sandbox query/cte support for clickhouse (#2834)
Co-authored-by: Ziinc <Ziinc@users.noreply.github.com>
1 parent c98f89f commit aea6589

File tree

6 files changed: +301 −27 lines changed

docs/docs.logflare.com/docs/backends/clickhouse.mdx

Lines changed: 36 additions & 1 deletion
@@ -3,7 +3,7 @@ sidebar_position: 7
 ---
 
 # ClickHouse
 
-The ClickHouse backend is **ingest-only** to a ClickHouse [HTTP](https://clickhouse.com/docs/interfaces/http) endpoint.
+The ClickHouse backend supports both **ingestion** and **querying** via a ClickHouse [HTTP](https://clickhouse.com/docs/interfaces/http) endpoint.
 
 ## Behavior and Configuration
 

@@ -50,3 +50,38 @@ The ingest table schema is as follows:
 By default, the ClickHouse backends will utilize the [`MergeTree` engine](https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree).
 
 Note that when using ClickHouse Cloud, replication is handled automatically, as mentioned in the [data replication documentation](https://clickhouse.com/docs/engines/table-engines/mergetree-family/replication#creating-replicated-tables).
+
+## Querying
+
+ClickHouse backends support SQL querying through Logflare Endpoints and Alerts. The backend uses the ClickHouse SQL dialect, which supports standard SQL features including:
+
+- Common Table Expressions (CTEs) with `WITH` clauses
+- Complex aggregations and window functions
+- Array and nested data type operations
+- ClickHouse-specific functions (e.g., `tuple()`, `arraySlice()`, `JSONExtractString()`)
+
+### Sandboxed Queries
+
+ClickHouse backends fully support [sandboxed queries](/concepts/endpoints#query-sandboxing) within Endpoints, allowing you to create secure, parameterized API endpoints where consumers can provide custom SQL while being restricted to pre-defined data subsets.
+
+Example sandboxed ClickHouse endpoint:
+
+```sql
+WITH filtered_logs AS (
+  SELECT id, event_message, timestamp
+  FROM my_clickhouse_source
+  WHERE timestamp > now() - interval 1 day
+)
+SELECT * FROM filtered_logs
+```
+
+Consumers can then query within the sandbox via the `sql=` parameter:
+
+```sql
+SELECT event_message, count(*) as count
+FROM filtered_logs
+GROUP BY event_message
+ORDER BY count DESC
+```
+
+See the [Endpoints documentation](/concepts/endpoints) for more details on sandboxed queries and security features.
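As a rough sketch of how a consumer might submit the second query above to a sandboxed Endpoint: the `sql=` parameter name comes from the docs in this diff, but the endpoint URL and token below are placeholders, not part of the commit.

```python
from urllib.parse import urlencode

# Placeholder endpoint URL and token -- adjust to your Logflare deployment.
ENDPOINT_URL = "https://api.logflare.app/api/endpoints/query/YOUR_ENDPOINT_TOKEN"

consumer_sql = (
    "SELECT event_message, count(*) as count "
    "FROM filtered_logs "
    "GROUP BY event_message "
    "ORDER BY count DESC"
)

# URL-encode the consumer query into the `sql=` query parameter.
query_string = urlencode({"sql": consumer_sql})
request_url = f"{ENDPOINT_URL}?{query_string}"
print(request_url)
```

The consumer query can only reference names exposed by the endpoint's CTE (here, `filtered_logs`), so the data subset stays fixed on the server side.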

docs/docs.logflare.com/docs/concepts/endpoints.md

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@ Parameters that do not match any that are declared in the SQL template will be i
 
 You can create sandboxed queries by using a CTE within the query. It allows the Endpoint consumer to provide a custom SQL query through the `sql=` query parameter.
 
+:::note
+Sandboxed queries are supported for BigQuery and ClickHouse backends. PostgreSQL backends do not currently support this feature.
+:::
+
 For example, this sandboxed query creates a temporary result called `errors`, which limits the results to containing the `"ERROR"` string as well as being before the year `2020`.
 
 ```sql

lib/logflare/sql.ex

Lines changed: 123 additions & 26 deletions
@@ -23,6 +23,9 @@ defmodule Logflare.Sql do
   @typep query_language :: :bq_sql | :ch_sql | :pg_sql
 
+  @bq_restricted_functions ~w(external_query session_user)
+  @ch_restricted_functions ~w(azureblobstorage cluster currentuser deltalake file gcs hdfs hudi iceberg jdbc mongodb mysql odbc postgresql redis remote remotesecure s3 sqlite url)
+
   @doc """
   Converts a language atom to its corresponding dialect.
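The two new module attributes above are per-dialect blocklists: BigQuery restricts `external_query` and `session_user`, while ClickHouse restricts its table functions (`url`, `s3`, `remote`, ...) that could read arbitrary external data. A minimal Python stand-in for the lookup-and-check idea (the names mirror the attributes; the helper itself is illustrative, not Logflare's API):

```python
# Per-dialect blocklists, mirroring @bq_restricted_functions and
# @ch_restricted_functions from the diff above.
BQ_RESTRICTED = {"external_query", "session_user"}
CH_RESTRICTED = {
    "azureblobstorage", "cluster", "currentuser", "deltalake", "file", "gcs",
    "hdfs", "hudi", "iceberg", "jdbc", "mongodb", "mysql", "odbc",
    "postgresql", "redis", "remote", "remotesecure", "s3", "sqlite", "url",
}


def restricted_functions(dialect: str) -> set:
    # Unknown dialects (e.g. postgres) get an empty blocklist.
    return {"bigquery": BQ_RESTRICTED, "clickhouse": CH_RESTRICTED}.get(dialect, set())


def is_restricted(dialect: str, func_name: str) -> bool:
    # Function names are compared case-insensitively, as in the Elixir code.
    return func_name.lower() in restricted_functions(dialect)
```

Blocking these names matters most for sandboxed endpoints, where untrusted consumer SQL must not be able to reach out to external storage or leak session metadata.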
@@ -66,22 +69,49 @@ defmodule Logflare.Sql do
   end
 
   @doc """
-  Transforms and validates an SQL query for querying with bigquery.any()
-  The resultant SQL is BigQuery compatible.
+  Transforms and validates a SQL query for the specified dialect,
+  which can be BigQuery (`:bq_sql`), ClickHouse (`:ch_sql`), or PostgreSQL (`:pg_sql`).
 
+  The query is parsed, validated, and transformed to include fully-qualified table names
+  appropriate for the target backend.
 
-  DML is blocked
-  - https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax
+  ## Validation Rules
 
-  ### Example
+  All queries are validated to ensure:
+  - Only SELECT statements are allowed (DML is blocked)
+  - Single query only (no multiple statements)
+  - No wildcard selects (`SELECT *`)
+  - No restricted functions (`SESSION_USER`, `EXTERNAL_QUERY`)
+  - All referenced tables/sources exist
+
+  ## Sandboxed Queries
+
+  BigQuery and ClickHouse support sandboxed queries via tuple input `{cte_query, consumer_query}`.
+  This allows secure, parameterized endpoints where consumers can provide custom SQL while
+  being restricted to pre-defined data subsets via CTEs.
+
+  PostgreSQL does not currently support sandboxed queries.
+
+  ## Examples
 
-      iex> transform("select a from my_table", %User{...})
-      {:ok, "select a from `my_project.my_dataset.source_token`"}
+  Basic query transformation:
 
-      With a sandboxed query
-      iex> cte = "..."
-      iex> transform({cte, "select a from my_alias"}, %User{...})
-      {:ok, "..."}
+      transform(:bq_sql, "select a from my_table", user)
+      # => {:ok, "select a from `my_project.my_dataset.source_token`"}
+
+      transform(:ch_sql, "select a from my_table", user)
+      # => {:ok, "select a from my_clickhouse_table"}
+
+  Sandboxed query (BigQuery and ClickHouse only):
+
+      cte = "with filtered as (select a from my_table where a > 0) select a from filtered"
+      consumer_query = "select a from filtered where a < 100"
+
+      transform(:bq_sql, {cte, consumer_query}, user)
+      # => {:ok, "with filtered as (select a from `project.dataset.token` where a > 0) select a from filtered where a < 100"}
+
+      transform(:ch_sql, {cte, consumer_query}, user)
+      # => {:ok, "with filtered as (select a from my_clickhouse_table where a > 0) select a from filtered where a < 100"}
   """
   @typep input :: String.t() | {String.t(), String.t()}
   @spec transform(
@@ -95,8 +125,44 @@ defmodule Logflare.Sql do
     transform(lang, input, user)
   end
 
-  # clickhouse and postgres
-  def transform(language, query, %User{} = user) when language in ~w(ch_sql pg_sql)a do
+  # clickhouse with sandboxed query support
+  def transform(:ch_sql = language, input, %User{} = user) do
+    {query, sandboxed_query} =
+      case input do
+        q when is_non_empty_binary(q) -> {q, nil}
+        other when is_tuple(other) -> other
+      end
+
+    sql_dialect = to_dialect(language)
+    sources = Sources.list_sources_by_user(user)
+    source_mapping = source_mapping(sources)
+
+    Logger.metadata(query_string: query)
+
+    with {:ok, statements} <- Parser.parse(sql_dialect, query),
+         {:ok, sandboxed_query_ast} <- sandboxed_ast(sandboxed_query, sql_dialect),
+         base_data = %{
+           sources: sources,
+           source_mapping: source_mapping,
+           source_names: Map.keys(source_mapping),
+           sandboxed_query: sandboxed_query,
+           sandboxed_query_ast: sandboxed_query_ast,
+           ast: statements,
+           dialect: sql_dialect
+         },
+         data = DialectTransformer.Clickhouse.build_transformation_data(user, base_data),
+         :ok <- validate_query(statements, data),
+         :ok <- maybe_validate_sandboxed_query_ast({statements, sandboxed_query_ast}, data) do
+      data = %{data | sandboxed_query_ast: sandboxed_query_ast}
+
+      statements
+      |> do_transform(data)
+      |> Parser.to_string()
+    end
+  end
+
+  # postgres (no sandboxed query support)
+  def transform(:pg_sql = language, query, %User{} = user) do
     sql_dialect = to_dialect(language)
     sources = Sources.list_sources_by_user(user)
     source_mapping = source_mapping(sources)
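The new `:ch_sql` clause first normalizes its input: a bare string is a plain query with no sandboxed consumer query, while a 2-tuple carries `{cte_query, consumer_query}`. A small Python stand-in for that `case` expression (illustrative only, not Logflare's API):

```python
def normalize_input(value):
    """Mirror of the `case input do` clause in the ClickHouse transform:
    a non-empty string means no sandboxed consumer query; a 2-tuple is
    (cte_query, consumer_query)."""
    if isinstance(value, str) and value != "":
        return (value, None)
    if isinstance(value, tuple) and len(value) == 2:
        return value
    raise ValueError("expected a non-empty string or a (cte, consumer) 2-tuple")
```

Keeping both shapes in one entry point lets the same validation and transformation pipeline serve plain and sandboxed queries.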
@@ -347,16 +413,22 @@ defmodule Logflare.Sql do
   defp sandboxed_ast(_, _), do: {:ok, nil}
 
   # applies to ctes, sandboxed queries, and non-ctes
-  defp validate_query(ast, data) when is_list(ast) do
+  defp validate_query(ast, %{dialect: dialect} = data) when is_list(ast) do
     with :ok <- check_select_statement_only(ast),
          :ok <- check_single_query_only(ast),
-         :ok <- has_restricted_functions(ast),
+         :ok <- maybe_check_restricted_functions(ast, dialect, data),
          :ok <- has_wildcard_in_select(ast),
          :ok <- check_all_sources_allowed(ast, data) do
       :ok
     end
   end
 
+  defp maybe_check_restricted_functions(ast, dialect, data)
+       when dialect in ~w(bigquery clickhouse),
+       do: has_restricted_functions(ast, data)
+
+  defp maybe_check_restricted_functions(_ast, _dialect, _data), do: :ok
+
   # applies only to the sandboxed query
   defp maybe_validate_sandboxed_query_ast({cte_ast, ast}, data) when is_list(ast) do
     with :ok <- validate_query(ast, data),
@@ -481,33 +553,58 @@ defmodule Logflare.Sql do
     end
   end
 
-  defp has_restricted_functions(ast) when is_list(ast), do: has_restricted_functions(ast, :ok)
+  defp has_restricted_functions(ast, data) when is_list(ast),
+    do: has_restricted_functions(ast, :ok, data)
 
-  defp has_restricted_functions({"Function", %{"name" => [%{"value" => _} | _] = names}}, :ok) do
-    restricted =
+  defp has_restricted_functions({"Function", %{"name" => [%{"value" => _} | _] = names}}, :ok, %{
+         dialect: dialect
+       }) do
+    restricted_list = get_restricted_functions_for_dialect(dialect)
+
+    found_restricted =
       for name <- names,
           normalized = String.downcase(name["value"]),
-          normalized in ["session_user", "external_query"] do
+          normalized in restricted_list do
        normalized
      end
 
-    if Enum.empty?(restricted) do
+    if Enum.empty?(found_restricted) do
       :ok
     else
-      {:error, "Restricted function #{Enum.join(restricted, ", ")}"}
+      {:error, "Restricted function #{Enum.join(found_restricted, ", ")}"}
     end
   end
 
-  defp has_restricted_functions(kv, :ok = acc) when is_list(kv) or is_map(kv) do
+  defp has_restricted_functions(
+         {"Table", %{"args" => [_ | _], "name" => [%{"value" => name} | _]}},
+         :ok,
+         %{dialect: dialect}
+       ) do
+    restricted_list = get_restricted_functions_for_dialect(dialect)
+    normalized = String.downcase(name)
+
+    if normalized in restricted_list do
+      {:error, "Restricted function #{normalized}"}
+    else
+      :ok
+    end
+  end
+
+  defp has_restricted_functions(kv, :ok = acc, data) when is_list(kv) or is_map(kv) do
     kv
-    |> Enum.reduce(acc, fn kv, nested_acc -> has_restricted_functions(kv, nested_acc) end)
+    |> Enum.reduce(acc, fn kv, nested_acc -> has_restricted_functions(kv, nested_acc, data) end)
   end
 
-  defp has_restricted_functions({_k, v}, :ok = acc) when is_list(v) or is_map(v) do
-    has_restricted_functions(v, acc)
+  defp has_restricted_functions({_k, v}, :ok = acc, data) when is_list(v) or is_map(v) do
+    has_restricted_functions(v, acc, data)
   end
 
-  defp has_restricted_functions(_kv, acc), do: acc
+  defp has_restricted_functions(_kv, acc, _data), do: acc
+
+  @spec get_restricted_functions_for_dialect(String.t() | nil) :: [String.t()]
+  defp get_restricted_functions_for_dialect("bigquery"), do: @bq_restricted_functions
+  defp get_restricted_functions_for_dialect("clickhouse"), do: @ch_restricted_functions
+  defp get_restricted_functions_for_dialect(_), do: []
 
   defp has_restricted_sources(cte_ast, ast) when is_list(ast) do
     aliases =
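The traversal above recurses through the parsed AST (nested maps and lists), matching `"Function"` and table-function nodes and checking each name against the dialect's blocklist. A condensed Python sketch of the recursive walk, assuming the same `{"Function" => %{"name" => [%{"value" => ...}]}}` node shape the parser produces (the function is a stand-in, not Logflare's code):

```python
def find_restricted(node, blocklist):
    """Recursively walk a parsed-SQL AST (nested dicts/lists) and collect
    any function names that appear in `blocklist`, lowercased."""
    found = []
    if isinstance(node, dict):
        if "Function" in node:
            # A Function node carries its name as a list of {"value": ...} parts.
            for part in node["Function"].get("name", []):
                value = str(part.get("value", "")).lower()
                if value in blocklist:
                    found.append(value)
        for child in node.values():
            found.extend(find_restricted(child, blocklist))
    elif isinstance(node, list):
        for child in node:
            found.extend(find_restricted(child, blocklist))
    return found
```

Because the walk visits every nested value, a restricted call is caught no matter how deeply it is buried in a CTE, subquery, or expression.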

lib/logflare/sql/dialect_transformer/clickhouse.ex

Lines changed: 9 additions & 0 deletions
@@ -6,6 +6,7 @@ defmodule Logflare.Sql.DialectTransformer.Clickhouse do
   @behaviour Logflare.Sql.DialectTransformer
 
   alias Logflare.Backends.Adaptor.ClickhouseAdaptor
+  alias Logflare.User
 
   @impl true
   def quote_style, do: nil
@@ -18,4 +19,12 @@ defmodule Logflare.Sql.DialectTransformer.Clickhouse do
     source = Enum.find(sources, fn s -> s.name == source_name end)
     ClickhouseAdaptor.clickhouse_ingest_table_name(source)
   end
+
+  @doc """
+  Builds transformation data for ClickHouse from a user and base data.
+
+  Since ClickHouse does not require project/dataset metadata, we can just pass through the base data.
+  """
+  @spec build_transformation_data(User.t(), map()) :: map()
+  def build_transformation_data(%User{}, base_data), do: base_data
 end

test/logflare/sql/dialect_transformer/clickhouse_test.exs

Lines changed: 17 additions & 0 deletions
@@ -63,4 +63,21 @@ defmodule Logflare.Sql.DialectTransformer.ClickhouseTest do
       assert result == expected
     end
   end
+
+  describe "build_transformation_data/2" do
+    test "passes through base data unchanged" do
+      user = build(:user)
+
+      base_data = %{
+        sources: [],
+        dialect: "clickhouse",
+        ast: [],
+        sandboxed_query: nil
+      }
+
+      result = Clickhouse.build_transformation_data(user, base_data)
+
+      assert result == base_data
+    end
+  end
 end

0 commit comments