
Implement $sqlquery-run operation #2561

@johngrimes

Description

The SQL on FHIR v2.1.0-pre specification introduces the $sqlquery-run operation, which executes SQL queries against the tables defined by ViewDefinitions. This enables ad-hoc analytics, interactive query development, and real-time data retrieval.

Operation summary

The operation takes a SQLQuery resource (a Library profile) that bundles:

  • SQL content (content.data): Base64-encoded SQL query with content type application/sql (optionally with a dialect parameter, e.g. application/sql;dialect=spark)
  • ViewDefinition dependencies (relatedArtifact): Each dependency has a label (the table alias used in the SQL) and a resource (canonical URL of the ViewDefinition)
  • Parameters (parameter): Named, typed input parameters referenced in SQL via colon-prefix notation (:parameter_name)
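
For concreteness, a minimal SQLQuery instance might look like the sketch below. Field names follow the base FHIR Library resource; the exact profile constraints (including the required library type coding) should be checked against the SQL on FHIR v2.1.0-pre specification, and all URLs and names here are hypothetical. The data field would carry the base64 encoding of the SQL shown in the placeholder.

```json
{
  "resourceType": "Library",
  "url": "http://example.org/Library/patient-count-by-gender",
  "name": "PatientCountByGender",
  "status": "active",
  "relatedArtifact": [{
    "type": "depends-on",
    "label": "patient_view",
    "resource": "http://example.org/ViewDefinition/patient-demographics"
  }],
  "parameter": [{
    "name": "gender",
    "use": "in",
    "type": "string"
  }],
  "content": [{
    "contentType": "application/sql",
    "data": "<base64 of: SELECT gender, COUNT(*) AS n FROM patient_view WHERE gender = :gender GROUP BY gender>"
  }]
}
```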

Endpoints

Level     Endpoint                                 Query source
System    POST [base]/$sqlquery-run                queryReference or queryResource
Type      POST [base]/Library/$sqlquery-run        queryReference or queryResource
Instance  POST [base]/Library/[id]/$sqlquery-run   Library instance

Parameters

Name            Cardinality  Type       Description
_format         1..1         code       Output format (json, ndjson, csv, parquet)
header          0..1         boolean    Include CSV headers (default true)
queryReference  0..1         Reference  Reference to a stored SQLQuery Library
queryResource   0..1         SQLQuery   Inline SQLQuery Library resource
parameter       0..*         complex    Query parameter values (name + value)
source          0..1         string     External data source reference
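
The query source parameters interact with the invocation level: at the system and type levels the query comes from queryReference or queryResource, while an instance-level invocation uses the addressed Library itself. A hypothetical validation helper (the "exactly one of queryReference or queryResource" rule is an assumption consistent with the endpoint table, not a quoted spec requirement):

```python
def resolve_query_source(level, query_reference=None, query_resource=None):
    """Determine where the SQLQuery comes from for a given invocation level."""
    has_ref = query_reference is not None
    has_res = query_resource is not None
    if level == "instance":
        # The Library instance addressed by the URL is the query source.
        if has_ref or has_res:
            raise ValueError("instance-level invocation uses the Library instance itself")
        return "instance"
    # System and type levels: assume exactly one source parameter is required.
    if has_ref == has_res:
        raise ValueError("exactly one of queryReference or queryResource is required")
    return "reference" if has_ref else "resource"
```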

Execution flow

  1. Resolve ViewDefinitions from relatedArtifact
  2. Execute each ViewDefinition and register the result as a named Spark view (using the label as the view name)
  3. Bind parameter values to SQL placeholders
  4. Execute the SQL query against the registered views
  5. Return results in the requested format
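
The flow can be sketched end-to-end. In the snippet below, sqlite3 (standard library) stands in for Spark purely to keep the sketch self-contained: a plain table plays the role of a ViewDefinition result, CREATE TEMP VIEW registers it under the relatedArtifact label, and sqlite3's native :name placeholders stand in for Spark 3.4's parameterised spark.sql.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 1-2: "execute" a ViewDefinition (here: a pre-built table) and
# register the result under the label used in the SQL.
conn.execute("CREATE TABLE _vd_result (id TEXT, gender TEXT)")
conn.executemany("INSERT INTO _vd_result VALUES (?, ?)",
                 [("p1", "female"), ("p2", "male"), ("p3", "female")])
conn.execute("CREATE TEMP VIEW patient_view AS SELECT * FROM _vd_result")

# Steps 3-4: bind :parameter_name values and execute the user's SQL
# against the registered view.
sql = "SELECT COUNT(*) AS n FROM patient_view WHERE gender = :gender"
cursor = conn.execute(sql, {"gender": "female"})

# Step 5: serialise the result in the requested format (CSV with header).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([col[0] for col in cursor.description])
writer.writerows(cursor.fetchall())
print(out.getvalue())
```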

Implementation approach within Pathling

Pathling already implements $viewdefinition-run using its FHIRPath-to-Spark projection engine. The $sqlquery-run operation would complement this by allowing users to write SQL queries that join across multiple views.

View registration strategy

ViewDefinitions should be registered as Spark temporary views eagerly, so that $sqlquery-run can execute SQL against them without per-request overhead:

  • On startup: Register all ViewDefinition resources stored in the server as Spark temporary views (using the ViewDefinition's name as the view name).
  • On write: When a ViewDefinition is created or updated via the FHIR API, register or replace the corresponding temporary view. When a ViewDefinition is deleted, drop the view.

Since these are Spark views (lazy query plans, not materialised tables), they always reflect the current state of the underlying FHIR data. No action is needed when data changes.

The relatedArtifact.label in the SQLQuery maps to the registered view name. If a label does not match a registered view, the operation should return a 404.
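
The lifecycle above can be sketched as a small registry. This is a hypothetical shape, not Pathling's actual API; a plain dict stands in for the Spark catalog, and the stored value stands in for the lazy query plan produced by executing the ViewDefinition.

```python
class ViewRegistry:
    """Sketch of the eager view-registration lifecycle."""

    def __init__(self):
        self.views = {}  # view name -> registered view (stand-in for a temp view)

    def on_startup(self, stored_view_definitions):
        # Register every stored ViewDefinition under its name.
        for vd in stored_view_definitions:
            self.views[vd["name"]] = vd

    def on_write(self, vd):
        # Create or update: register or replace the corresponding view.
        self.views[vd["name"]] = vd

    def on_delete(self, name):
        # Delete: drop the view if it exists.
        self.views.pop(name, None)

    def resolve(self, label):
        # relatedArtifact.label must match a registered view, else 404.
        if label not in self.views:
            raise LookupError(f"no registered view named {label!r} (404)")
        return self.views[label]
```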

Parameter binding

The spec requires that parameter values are safely bound and never interpolated via string concatenation. Spark SQL supports parameterised queries via spark.sql(sqlText, args) (Spark 3.4+), which should be used to bind :parameter_name placeholders safely.
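
Beyond binding itself, the server may want to check up front that every placeholder in the SQL has a supplied value. A hypothetical helper (not part of Pathling or Spark), assuming placeholders are bare identifiers and ignoring colons inside single-quoted string literals:

```python
import re

# Matches a single-quoted literal first (so a colon inside it never
# matches), otherwise captures the identifier after a bare colon.
_PLACEHOLDER = re.compile(r"'(?:[^']|'')*'|:(\w+)")

def placeholders(sql):
    """Return the set of :parameter_name placeholders in a SQL string."""
    return {m.group(1) for m in _PLACEHOLDER.finditer(sql) if m.group(1)}
```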

Security: read-only SQL execution

Since this operation accepts arbitrary SQL from users, the implementation must ensure that SQL execution is strictly read-only and scoped to only the registered views. The approach is to whitelist the plan nodes, expressions, and functions that are permitted, rejecting everything else.

After parsing the SQL with spark.sessionState().sqlParser().parsePlan(sql), walk the entire LogicalPlan tree and verify that every plan node and every expression belongs to an allowed set.
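
The walk itself is simple; the work is in curating the allowed sets. A minimal sketch in Python, with toy node classes standing in for Catalyst's LogicalPlan and Expression trees, and deliberately abridged allow sets (the full sets are enumerated below):

```python
# Abridged stand-ins for the full whitelists enumerated below.
ALLOWED_PLANS = {"Project", "Filter", "Join", "Aggregate", "UnresolvedRelation"}
ALLOWED_EXPRS = {"Literal", "AttributeReference", "EqualTo", "And", "Count"}

class Node:
    """Toy stand-in for a Catalyst tree node."""
    def __init__(self, kind, children=(), expressions=()):
        self.kind = kind
        self.children = list(children)
        self.expressions = list(expressions)

def validate(plan):
    """Reject the query unless every plan node and expression is allowed."""
    if plan.kind not in ALLOWED_PLANS:
        raise PermissionError(f"plan node not permitted: {plan.kind}")
    for expr in plan.expressions:
        _validate_expr(expr)
    for child in plan.children:
        validate(child)

def _validate_expr(expr):
    if expr.kind not in ALLOWED_EXPRS:
        raise PermissionError(f"expression not permitted: {expr.kind}")
    for child in expr.children:
        _validate_expr(child)
```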

1. Allowed plan nodes

Every LogicalPlan node in the parsed tree must be one of the following.

Basic operations:
Project, Filter, Sort, GlobalLimit, LocalLimit, Tail, Offset, ReturnAnswer

Set operations:
Union, Intersect, Except

Joins:
Join, LateralJoin, AsOfJoin

Aggregation and grouping:
Aggregate, Expand, Window, WithWindowDefinition

Table and relation access:
UnresolvedRelation, SubqueryAlias, LocalRelation, OneRowRelation, EmptyRelation, Range, View

Common table expressions:
WithCTE, CTERelationDef, CTERelationRef, UnresolvedWith

Deduplication:
Distinct, Deduplicate

Pivot and unpivot:
Pivot, Unpivot

Generate (LATERAL VIEW / EXPLODE):
Generate

Hints:
ResolvedHint, UnresolvedHint

Sampling:
Sample

Repartitioning (read-only):
Repartition, RebalancePartitions

Transposition:
Transpose

Subquery wrapper:
Subquery

Any plan node not in this list must cause the query to be rejected. In particular, the following are excluded:

  • All Command subclasses (DDL/DML: CreateTable, InsertInto, DropTable, AlterTable, etc.)
  • ScriptTransformation (forks external processes via TRANSFORM ... USING 'script')
  • All Dataset API plan nodes (MapPartitions, DeserializeToObject, MapElements, TypedFilter, AppendColumns, CoGroup, etc.)

2. Allowed expressions

Every Expression in the plan tree must belong to one of the following categories. Where a base class is listed, all its subclasses are permitted.

Literals and references:
Literal, AttributeReference, BoundReference, OuterReference, Alias, VariableReference, LambdaVariable, NamedLambdaVariable

Arithmetic (all BinaryArithmetic subclasses):
Add, Subtract, Multiply, Divide, IntegralDivide, Remainder, Pmod, UnaryMinus, UnaryPositive, Abs

Comparison (all BinaryComparison subclasses):
EqualTo, EqualNullSafe, LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual

Predicate expressions:
In, InSet, InSubquery, Between, Like, ILike, RLike, LikeAll, LikeAny, NotLikeAll, NotLikeAny, Contains, StartsWith, EndsWith, StringInstr, StringLocate, FindInSet

Logical operators:
And, Or, Not

Null handling:
IsNull, IsNotNull, Coalesce, NullIf, Nvl, Nvl2, IfNull, NaNvl, IsNaN

Conditional expressions:
If, CaseWhen, Greatest, Least

Type casting:
Cast, AnsiCast, ToBinary, TryToBinary, ToNumber, TryToNumber, ToCharacter

String functions:
Substring, SubstringIndex, Concat, ConcatWs, Upper, Lower, Length, OctetLength, BitLength, StringTrimBoth, StringTrimLeft, StringTrimRight, StringRepeat, StringReplace, StringTranslate, StringSpace, StringSplit, StringSplitSQL, Reverse, Ascii, Chr, InitCap, SoundEx, Levenshtein, Luhncheck, FormatString, FormatNumber, Encode, Decode, Base64, UnBase64, Hex, Unhex

Regular expressions:
RegExpReplace, RegExpExtract, RegExpExtractAll, RegExpCount, RegExpInStr, RegExpSubStr

Math functions (all UnaryMathExpression subclasses, plus):
Sqrt, Cbrt, Pow, Exp, Expm1, Log, Log10, Log2, Log1p, Logarithm, Floor, Ceil, Round, BRound, RoundCeil, RoundFloor, Rint, Signum, Bin, Conv, Pi, EulerNumber, WidthBucket

Trigonometric functions:
Sin, Cos, Tan, Cot, Sec, Csc, Asin, Acos, Atan, Atan2, Sinh, Cosh, Tanh, Asinh, Acosh, Atanh, Degrees, Radians, Hypot

Date and time functions:
CurrentDate, CurrentTimestamp, LocalTimestamp, CurrentTime, Now, DateAdd, DateSub, DateDiff, DateFromUnixDate, AddMonths, MonthsBetween, LastDay, NextDay, DayOfWeek, WeekDay, DayOfMonth, DayOfYear, WeekOfYear, Month, Quarter, Year, YearOfWeek, DayName, MonthName, Hour, Minute, Second, SecondWithFraction, TruncDate, TruncTimestamp, DateFormatClass, FromUnixTime, ToUnixTimestamp, UnixDate, UnixSeconds, UnixMillis, UnixMicros, MakeDate, MakeTime, MakeTimestamp, MakeTimestampNTZ, TryMakeTimestamp, MakeInterval, MakeDTInterval, MakeYMInterval, TryMakeInterval, ConvertTimezone, ToUTCTimestamp, FromUTCTimestamp, ParseToDate, ParseToTimestamp, GetTimestamp

Date and time arithmetic:
DateAddInterval, DateAddYMInterval, TimestampAdd, TimestampAddInterval, TimestampAddYMInterval, TimestampDiff, TimeDiff, SubtractDates, SubtractTimes, SubtractTimestamps, DatetimeSub, DivideDTInterval, DivideYMInterval, MultiplyDTInterval, MultiplyYMInterval

Interval extraction:
ExtractIntervalDays, ExtractIntervalHours, ExtractIntervalMinutes, ExtractIntervalSeconds, ExtractIntervalMonths, ExtractIntervalYears, Days, Hours, Minutes, Seconds, Months, Years

Array functions:
CreateArray, GetArrayItem, Size, ArraySize, ArrayContains, ArrayPosition, ArrayAppend, ArrayPrepend, ArrayInsert, ArrayRemove, ArrayRepeat, ArraySort, ArrayMax, ArrayMin, ArrayJoin, ArrayDistinct, ArrayUnion, ArrayIntersect, ArrayExcept, ArraysOverlap, ArraysZip, ArrayCompact, Flatten, Slice, Sequence

Map functions:
CreateMap, GetMapValue, MapSize, MapKeys, MapValues, MapEntries, MapContainsKey, MapConcat, MapFromArrays, MapFromEntries, MapSort, StringToMap

Struct functions:
CreateNamedStruct, GetStructField, GetArrayStructFields, UpdateFields, WithField, DropField

Higher-order functions:
LambdaFunction, ArrayTransform, ArrayFilter, ArrayExists, ArrayForAll, ArrayAggregate, MapTransformKeys, MapTransformValues, MapFilter, MapZipWith

Aggregate functions:
Count, CountIf, Sum, Average, Min, Max, First, Last, CollectList, CollectSet, Corr, CovPopulation, CovSample, Skewness, Kurtosis, StddevPop, StddevSamp, VariancePop, VarianceSamp, Mode, Median, Percentile, PercentileCont, PercentileDisc, ApproximatePercentile, ApproxCountDistinct, AnyValue, BoolAnd, BoolOr, BitAndAgg, BitOrAgg, BitXorAgg, HistogramNumeric, HyperLogLogPlusPlus

Regression functions:
RegrAvgX, RegrAvgY, RegrCount, RegrIntercept, RegrSlope, RegrR2, RegrSXX, RegrSXY, RegrSYY

Window functions:
RowNumber, Rank, DenseRank, PercentRank, CumeDist, Lead, Lag, NthValue, NTile, WindowExpression, WindowSpecDefinition, SpecifiedWindowFrame, UnspecifiedFrame

Grouping functions:
Grouping, GroupingID

Sorting:
SortOrder

Subquery expressions:
ScalarSubquery, InSubquery, Exists, ListQuery, LateralSubquery

Generator expressions:
Explode, PosExplode, Inline, Stack

JSON functions:
JsonTuple, GetJsonObject, JsonObjectKeys, JsonToStructs, StructsToJson, LengthOfJsonArray, SchemaOfJson

CSV functions:
CsvToStructs, StructsToCsv, SchemaOfCsv

Hash functions:
Md5, Sha1, Sha2, Crc32, Murmur3Hash, XxHash64

Bitwise operations:
BitwiseAnd, BitwiseOr, BitwiseXor, BitwiseNot, BitwiseGet, BitwiseCount, ShiftLeft, ShiftRight, ShiftRightUnsigned

URL functions:
ParseUrl, TryParseUrl, UrlDecode, UrlEncode, TryUrlDecode

Data masking:
Mask

Non-deterministic (safe):
Rand, Randn, Uuid, Shuffle, RandStr, Uniform

Error handling:
RaiseError, TryEval, TryAdd, TrySubtract, TryMultiply, TryDivide, TryMod, TryElementAt, AssertTrue

Safe try-conversions:
TryToBinary, TryToNumber, TryUrlDecode, TryParseUrl, TryMakeTimestamp, TryMakeInterval

Utility:
Coalesce, Empty2Null, TypeOf, IsValidUTF8, MakeValidUTF8

Any expression not in this list must cause the query to be rejected. In particular, the following are excluded:

  • CallMethodViaReflection / TryReflect (arbitrary static Java method invocation via reflect() / java_method() / try_reflect())
  • ScalaUDF (unless the function name is in the Pathling UDF allowlist — see below)
  • PythonUDF, HiveSimpleUDF, HiveGenericUDF, HiveUDAF
  • PrintToStderr
  • StaticInvoke, Invoke, NewInstance (internal reflection expressions)

3. Pathling UDF allowlist

ScalaUDF expressions are permitted only if their function name matches one of Pathling's registered UDFs, all of which are safe read-only operations.

Terminology UDFs (registered by TerminologyUdfRegistrar):
display, member_of, subsumes, designation, translate_coding, property_string, property_code, property_integer, property_boolean, property_decimal, property_dateTime, property_Coding

FHIRPath UDFs (registered by PathlingUdfRegistrar):
to_null, string_to_quantity, quantity_to_literal, decimal_to_literal, coding_to_literal, convert_quantity_to_unit, low_boundary_for_date, high_boundary_for_date, low_boundary_for_time, high_boundary_for_time
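
The allowlist check itself reduces to set membership on the UDF's function name. A sketch using the names listed above:

```python
# UDFs registered by TerminologyUdfRegistrar.
TERMINOLOGY_UDFS = {
    "display", "member_of", "subsumes", "designation", "translate_coding",
    "property_string", "property_code", "property_integer", "property_boolean",
    "property_decimal", "property_dateTime", "property_Coding",
}

# UDFs registered by PathlingUdfRegistrar.
FHIRPATH_UDFS = {
    "to_null", "string_to_quantity", "quantity_to_literal", "decimal_to_literal",
    "coding_to_literal", "convert_quantity_to_unit", "low_boundary_for_date",
    "high_boundary_for_date", "low_boundary_for_time", "high_boundary_for_time",
}

ALLOWED_UDFS = TERMINOLOGY_UDFS | FHIRPATH_UDFS  # 22 names in total

def check_scala_udf(function_name):
    """Reject any ScalaUDF whose name is not a registered Pathling UDF."""
    if function_name not in ALLOWED_UDFS:
        raise PermissionError(f"UDF not permitted: {function_name}")
```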

Summary of validation layers

  1. Plan node whitelist — only the listed read-only plan nodes are permitted
  2. Expression whitelist — only the listed expression types are permitted
  3. UDF allowlist — ScalaUDF permitted only for Pathling's 22 registered UDFs

Output format handling

Supported output formats:

Format   MIME type
NDJSON   application/x-ndjson
CSV      text/csv
JSON     application/json
Parquet  application/vnd.apache.parquet

Reuse the existing ViewExecutionHelper streaming infrastructure for NDJSON, CSV, and JSON output. Add Parquet support using Spark's built-in Parquet writer, consistent with the existing $viewdefinition-export implementation in ViewDefinitionExportExecutor.
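
Resolving the response Content-Type from the _format code is a simple lookup. The sketch below is hypothetical (not Pathling's actual API); carrying the header choice as a text/csv media type parameter follows the registered header=present/absent parameter from RFC 4180, which is an assumption about how the server might surface it, not something the operation spec mandates.

```python
MIME_TYPES = {
    "ndjson": "application/x-ndjson",
    "csv": "text/csv",
    "json": "application/json",
    "parquet": "application/vnd.apache.parquet",
}

def content_type(fmt, header=True):
    """Resolve the response Content-Type for a _format code."""
    try:
        mime = MIME_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unsupported _format: {fmt}")
    if fmt == "csv" and not header:
        # text/csv defines an optional header parameter (RFC 4180).
        mime += ";header=absent"
    return mime
```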

Metadata

Labels: new feature (New feature or request) · server (Issues relating to Pathling server) · sqlonfhir (Changes that relate to SQL on FHIR functionality)
Status: Planned