-
Notifications
You must be signed in to change notification settings - Fork 20
Description
The SQL on FHIR v2.1.0-pre specification introduces the $sqlquery-run operation, which executes SQL queries against materialised ViewDefinition tables. This enables ad-hoc analytics, interactive query development, and real-time data retrieval.
Operation summary
The operation takes a SQLQuery resource (a Library profile) that bundles:
- SQL content (
content.data): Base64-encoded SQL query with content typeapplication/sql(optionally with a dialect parameter, e.g.application/sql;dialect=spark) - ViewDefinition dependencies (
relatedArtifact): Each dependency has alabel(the table alias used in the SQL) and aresource(canonical URL of the ViewDefinition) - Parameters (
parameter): Named, typed input parameters referenced in SQL via colon-prefix notation (:parameter_name)
Endpoints
| Level | Endpoint | Query source |
|---|---|---|
| System | POST [base]/$sqlquery-run |
queryReference or queryResource |
| Type | POST [base]/Library/$sqlquery-run |
queryReference or queryResource |
| Instance | POST [base]/Library/[id]/$sqlquery-run |
Library instance |
Parameters
| Name | Cardinality | Type | Description |
|---|---|---|---|
_format |
1..1 | code | Output format (json, ndjson, csv, parquet) |
header |
0..1 | boolean | Include CSV headers (default true) |
queryReference |
0..1 | Reference | Reference to stored SQLQuery Library |
queryResource |
0..1 | SQLQuery | Inline SQLQuery Library resource |
parameter |
0..* | complex | Query parameter values (name + value) |
source |
0..1 | string | External data source reference |
Execution flow
- Resolve ViewDefinitions from
relatedArtifact - Execute each ViewDefinition and register the result as a named Spark view (using the
labelas the view name) - Bind parameter values to SQL placeholders
- Execute the SQL query against the registered views
- Return results in the requested format
Implementation approach within Pathling
Pathling already implements $viewdefinition-run using its FHIRPath-to-Spark projection engine. The $sqlquery-run operation would complement this by allowing users to write SQL queries that join across multiple views.
View registration strategy
ViewDefinitions should be registered as Spark temporary views eagerly, so that $sqlquery-run can execute SQL against them without per-request overhead:
- On startup: Register all ViewDefinition resources stored in the server as Spark temporary views (using the ViewDefinition's
nameas the view name). - On write: When a ViewDefinition is created or updated via the FHIR API, register or replace the corresponding temporary view. When a ViewDefinition is deleted, drop the view.
Since these are Spark views (lazy query plans, not materialised tables), they always reflect the current state of the underlying FHIR data. No action is needed when data changes.
The relatedArtifact.label in the SQLQuery maps to the registered view name. If a label does not match a registered view, the operation should return a 404.
Parameter binding
The spec requires that parameter values are safely bound and never interpolated via string concatenation. Spark SQL supports parameterised queries via spark.sql(sqlText, args) (Spark 3.4+), which should be used to bind :parameter_name placeholders safely.
Security: read-only SQL execution
Since this operation accepts arbitrary SQL from users, the implementation must ensure that SQL execution is strictly read-only and scoped to only the registered views. The approach is to whitelist the plan nodes, expressions, and functions that are permitted, rejecting everything else.
After parsing the SQL with spark.sessionState().sqlParser().parsePlan(sql), walk the entire LogicalPlan tree and verify that every plan node and every expression belongs to an allowed set.
1. Allowed plan nodes
Every LogicalPlan node in the parsed tree must be one of the following.
Basic operations:
Project, Filter, Sort, GlobalLimit, LocalLimit, Tail, Offset, ReturnAnswer
Set operations:
Union, Intersect, Except
Joins:
Join, LateralJoin, AsOfJoin
Aggregation and grouping:
Aggregate, Expand, Window, WithWindowDefinition
Table and relation access:
UnresolvedRelation, SubqueryAlias, LocalRelation, OneRowRelation, EmptyRelation, Range, View
Common table expressions:
WithCTE, CTERelationDef, CTERelationRef, UnresolvedWith
Deduplication:
Distinct, Deduplicate
Pivot and unpivot:
Pivot, Unpivot
Generate (LATERAL VIEW / EXPLODE):
Generate
Hints:
ResolvedHint, UnresolvedHint
Sampling:
Sample
Repartitioning (read-only):
Repartition, RebalancePartitions
Transposition:
Transpose
Subquery wrapper:
Subquery
Any plan node not in this list must cause the query to be rejected. In particular, the following are excluded:
- All
Commandsubclasses (DDL/DML:CreateTable,InsertInto,DropTable,AlterTable, etc.) ScriptTransformation(forks external processes viaTRANSFORM ... USING 'script')- All Dataset API plan nodes (
MapPartitions,DeserializeToObject,MapElements,TypedFilter,AppendColumns,CoGroup, etc.)
2. Allowed expressions
Every Expression in the plan tree must belong to one of the following categories. Where a base class is listed, all its subclasses are permitted.
Literals and references:
Literal, AttributeReference, BoundReference, OuterReference, Alias, VariableReference, LambdaVariable, NamedLambdaVariable
Arithmetic (all BinaryArithmetic subclasses):
Add, Subtract, Multiply, Divide, IntegralDivide, Remainder, Pmod, UnaryMinus, UnaryPositive, Abs
Comparison (all BinaryComparison subclasses):
EqualTo, EqualNullSafe, LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual
Predicate expressions:
In, InSet, InSubquery, Between, Like, ILike, RLike, LikeAll, LikeAny, NotLikeAll, NotLikeAny, Contains, StartsWith, EndsWith, StringInstr, StringLocate, FindInSet
Logical operators:
And, Or, Not
Null handling:
IsNull, IsNotNull, Coalesce, NullIf, Nvl, Nvl2, IfNull, NaNvl, IsNaN
Conditional expressions:
If, CaseWhen, Greatest, Least
Type casting:
Cast, AnsiCast, ToBinary, TryToBinary, ToNumber, TryToNumber, ToCharacter
String functions:
Substring, SubstringIndex, Concat, ConcatWs, Upper, Lower, Length, OctetLength, BitLength, StringTrimBoth, StringTrimLeft, StringTrimRight, StringRepeat, StringReplace, StringTranslate, StringSpace, StringSplit, StringSplitSQL, Reverse, Ascii, Chr, InitCap, SoundEx, Levenshtein, Luhncheck, FormatString, FormatNumber, Encode, Decode, Base64, UnBase64, Hex, Unhex
Regular expressions:
RegExpReplace, RegExpExtract, RegExpExtractAll, RegExpCount, RegExpInStr, RegExpSubStr
Math functions (all UnaryMathExpression subclasses, plus):
Sqrt, Cbrt, Pow, Exp, Expm1, Log, Log10, Log2, Log1p, Logarithm, Floor, Ceil, Round, BRound, RoundCeil, RoundFloor, Rint, Signum, Bin, Conv, Pi, EulerNumber, WidthBucket
Trigonometric functions:
Sin, Cos, Tan, Cot, Sec, Csc, Asin, Acos, Atan, Atan2, Sinh, Cosh, Tanh, Asinh, Acosh, Atanh, Degrees, Radians, Hypot
Date and time functions:
CurrentDate, CurrentTimestamp, LocalTimestamp, CurrentTime, Now, DateAdd, DateSub, DateDiff, DateFromUnixDate, AddMonths, MonthsBetween, LastDay, NextDay, DayOfWeek, WeekDay, DayOfMonth, DayOfYear, WeekOfYear, Month, Quarter, Year, YearOfWeek, DayName, MonthName, Hour, Minute, Second, SecondWithFraction, TruncDate, TruncTimestamp, DateFormatClass, FromUnixTime, ToUnixTimestamp, UnixDate, UnixSeconds, UnixMillis, UnixMicros, MakeDate, MakeTime, MakeTimestamp, MakeTimestampNTZ, TryMakeTimestamp, MakeInterval, MakeDTInterval, MakeYMInterval, TryMakeInterval, ConvertTimezone, ToUTCTimestamp, FromUTCTimestamp, ParseToDate, ParseToTimestamp, GetTimestamp
Date and time arithmetic:
DateAddInterval, DateAddYMInterval, TimestampAdd, TimestampAddInterval, TimestampAddYMInterval, TimestampDiff, TimeDiff, SubtractDates, SubtractTimes, SubtractTimestamps, DatetimeSub, DivideDTInterval, DivideYMInterval, MultiplyDTInterval, MultiplyYMInterval
Interval extraction:
ExtractIntervalDays, ExtractIntervalHours, ExtractIntervalMinutes, ExtractIntervalSeconds, ExtractIntervalMonths, ExtractIntervalYears, Days, Hours, Minutes, Seconds, Months, Years
Array functions:
CreateArray, GetArrayItem, Size, ArraySize, ArrayContains, ArrayPosition, ArrayAppend, ArrayPrepend, ArrayInsert, ArrayRemove, ArrayRepeat, ArraySort, ArrayMax, ArrayMin, ArrayJoin, ArrayDistinct, ArrayUnion, ArrayIntersect, ArrayExcept, ArraysOverlap, ArraysZip, ArrayCompact, Flatten, Slice, Sequence
Map functions:
CreateMap, GetMapValue, MapSize, MapKeys, MapValues, MapEntries, MapContainsKey, MapConcat, MapFromArrays, MapFromEntries, MapSort, StringToMap
Struct functions:
CreateNamedStruct, GetStructField, GetArrayStructFields, UpdateFields, WithField, DropField
Higher-order functions:
LambdaFunction, ArrayTransform, ArrayFilter, ArrayExists, ArrayForAll, ArrayAggregate, MapTransformKeys, MapTransformValues, MapFilter, MapZipWith
Aggregate functions:
Count, CountIf, Sum, Average, Min, Max, First, Last, CollectList, CollectSet, Corr, CovPopulation, CovSample, Skewness, Kurtosis, StddevPop, StddevSamp, VariancePop, VarianceSamp, Mode, Median, Percentile, PercentileCont, PercentileDisc, ApproximatePercentile, ApproxCountDistinct, AnyValue, BoolAnd, BoolOr, BitAndAgg, BitOrAgg, BitXorAgg, HistogramNumeric, HyperLogLogPlusPlus
Regression functions:
RegrAvgX, RegrAvgY, RegrCount, RegrIntercept, RegrSlope, RegrR2, RegrSXX, RegrSXY, RegrSYY
Window functions:
RowNumber, Rank, DenseRank, PercentRank, CumeDist, Lead, Lag, NthValue, NTile, WindowExpression, WindowSpecDefinition, SpecifiedWindowFrame, UnspecifiedFrame
Grouping functions:
Grouping, GroupingID
Sorting:
SortOrder
Subquery expressions:
ScalarSubquery, InSubquery, Exists, ListQuery, LateralSubquery
Generator expressions:
Explode, PosExplode, Inline, Stack
JSON functions:
JsonTuple, GetJsonObject, JsonObjectKeys, JsonToStructs, StructsToJson, LengthOfJsonArray, SchemaOfJson
CSV functions:
CsvToStructs, StructsToCsv, SchemaOfCsv
Hash functions:
Md5, Sha1, Sha2, Crc32, Murmur3Hash, XxHash64
Bitwise operations:
BitwiseAnd, BitwiseOr, BitwiseXor, BitwiseNot, BitwiseGet, BitwiseCount, ShiftLeft, ShiftRight, ShiftRightUnsigned
URL functions:
ParseUrl, TryParseUrl, UrlDecode, UrlEncode, TryUrlDecode
Data masking:
Mask
Non-deterministic (safe):
Rand, Randn, Uuid, Shuffle, RandStr, Uniform
Error handling:
RaiseError, TryEval, TryAdd, TrySubtract, TryMultiply, TryDivide, TryMod, TryElementAt, AssertTrue
Safe try-conversions:
TryToBinary, TryToNumber, TryUrlDecode, TryParseUrl, TryMakeTimestamp, TryMakeInterval
Utility:
Coalesce, Empty2Null, TypeOf, IsValidUTF8, MakeValidUTF8
Any expression not in this list must cause the query to be rejected. In particular, the following are excluded:
CallMethodViaReflection/TryReflect(arbitrary static Java method invocation viareflect()/java_method()/try_reflect())ScalaUDF(unless the function name is in the Pathling UDF allowlist — see below)PythonUDF,HiveSimpleUDF,HiveGenericUDF,HiveUDAFPrintToStderrStaticInvoke,Invoke,NewInstance(internal reflection expressions)
3. Pathling UDF allowlist
ScalaUDF expressions are permitted only if their function name matches one of Pathling's registered UDFs, all of which are safe read-only operations.
Terminology UDFs (registered by TerminologyUdfRegistrar):
display, member_of, subsumes, designation, translate_coding, property_string, property_code, property_integer, property_boolean, property_decimal, property_dateTime, property_Coding
FHIRPath UDFs (registered by PathlingUdfRegistrar):
to_null, string_to_quantity, quantity_to_literal, decimal_to_literal, coding_to_literal, convert_quantity_to_unit, low_boundary_for_date, high_boundary_for_date, low_boundary_for_time, high_boundary_for_time
Summary of validation layers
- Plan node whitelist — only the listed read-only plan nodes are permitted
- Expression whitelist — only the listed expression types are permitted
- UDF allowlist —
ScalaUDFpermitted only for Pathling's 22 registered UDFs
Output format handling
Supported output formats:
| Format | MIME type |
|---|---|
| NDJSON | application/x-ndjson |
| CSV | text/csv |
| JSON | application/json |
| Parquet | application/vnd.apache.parquet |
Reuse the existing ViewExecutionHelper streaming infrastructure for NDJSON, CSV, and JSON output. Add Parquet support using Spark's built-in Parquet writer, consistent with the existing $viewdefinition-export implementation in ViewDefinitionExportExecutor.
References
Metadata
Metadata
Assignees
Labels
Type
Projects
Status