Commit 46420de

Merge pull request #670 from jvalue/rfc17-error-values
[RFC] 0017 error values
2 parents 41caa26 + f53e3cf commit 46420de

1 file changed

+183
-0
lines changed

rfc/0017-error-values/README.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
<!--
SPDX-FileCopyrightText: 2025 Friedrich-Alexander-Universitat Erlangen-Nurnberg

SPDX-License-Identifier: AGPL-3.0-only
-->

# RFC 0017 : Error Values

| | |
|---|---|
| Feature Tag | `error-values` |
| Status | `ACCEPTED` | <!-- Possible values: DRAFT, DISCUSSION, ACCEPTED, REJECTED -->
| Responsible | `tungstnballon` |
<!--
  Status Overview:
  - DRAFT: The RFC is not ready for a review and currently under change. Feel free to already ask for feedback on the structure and contents at this stage.
  - DISCUSSION: The RFC is open for discussion. Usually, we open a PR to trigger discussions.
  - ACCEPTED: The RFC was accepted. Create issues to prepare implementation of the RFC.
  - REJECTED: The RFC was rejected. If another revision emerges, switch to status DRAFT.
-->

## Summary

This RFC introduces the concept of invalid or missing values to the Jayvee interpreter.
Specifically, it defines two new special values that the Jayvee interpreter must be able to handle.

## Motivation

Currently, there are two behaviors when an error occurs during pipeline
execution.
1. Terminate pipeline execution. This is the case for almost all errors that can
   happen outside table processing.
2. Discard. If an error occurs during table processing, the affected row is
   discarded and the user is notified via a log message.

These behaviors make the following impossible:
- Exporting tables that contain `NULL`
- Gracefully recovering from calculation errors occurring during transforms,
  instead of discarding the entire row.

## Explanation

This RFC introduces two error values:
- `invalid`: Represents an invalid value.
- `missing`: Represents a missing value.

This distinction gives both users and the interpreter more fine-grained
control.

For now, these values are valid for the valuetypes `text`, `boolean`, `integer` and `decimal` (see
[Possible Future Changes/Enhancements](#possible-future-changesenhancements)).

When a value becomes `invalid` or `missing` for the first time, the user is
notified with a log message. This message must contain the reason for the
error's occurrence.
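
The logging requirement can be illustrated with a small Python sketch. It is not the interpreter's implementation: `ErrorValue`, `make_error` and `negate` are hypothetical names, and the sketch assumes that "for the first time" means the moment an error value is created, so passing it through later operators produces no further log entries.

```python
import logging

logger = logging.getLogger("error_values_sketch")


class ErrorValue:
    """Hypothetical error value that remembers why it was created."""

    def __init__(self, kind: str, reason: str) -> None:
        self.kind = kind      # "invalid" or "missing"
        self.reason = reason  # human-readable cause, included in the log message


def make_error(kind: str, reason: str) -> ErrorValue:
    """Create an error value and log the reason once, at creation time."""
    logger.warning("value became %s: %s", kind, reason)
    return ErrorValue(kind, reason)


def negate(operand):
    """Unary minus: an existing error value passes through without a new log entry."""
    if isinstance(operand, ErrorValue):
        return operand
    return -operand


logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
result = negate(make_error("invalid", "division by zero"))
print(result.kind, "-", result.reason)
```

Storing the reason on the value itself is only one possible design; the RFC merely requires that the log message contains it.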

This RFC also introduces two keywords, `invalid` and `missing`, each of which
represents the corresponding error value.
For their usage, see [Operator interactions](#operator-interactions).

### invalid

This error's primary use case is to represent an erroneous calculation result
(e.g. division by zero). It is intended to be used by operator evaluators.

Additionally, parsers can represent a failed value parse using `invalid` (e.g.
when attempting to parse a number but encountering a letter).
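
As a rough illustration of the parser case, the following Python sketch returns a sentinel instead of raising when a cell cannot be parsed. `ErrorValue`, `INVALID` and `parse_integer` are hypothetical names chosen for this example and are not part of the Jayvee interpreter.

```python
class ErrorValue:
    """Hypothetical sentinel type standing in for Jayvee's error values."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


INVALID = ErrorValue("invalid")


def parse_integer(cell: str):
    """Parse a cell as an integer; return INVALID instead of raising."""
    try:
        return int(cell.strip())
    except ValueError:
        return INVALID


print(parse_integer("3"))   # 3
print(parse_integer("3a"))  # invalid
```

Returning a value rather than raising lets the surrounding row survive, which is the behavior the Motivation section asks for.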

### missing

This error can be emitted when something that should exist does not (e.g. a
file or a table column).
SQL exporters can now replace any table cell containing `missing` with `NULL`.
However, when parsing CSV, empty cells will be parsed as `""`, not `missing`.
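
The export rule can be sketched in Python with the standard `sqlite3` module, which stores `None` as `NULL`. The sentinel `MISSING`, the helpers `to_sql_cell` and `export_rows`, and the table and column names are all assumptions made for this illustration; they do not describe how Jayvee's SQL exporters are actually written.

```python
import sqlite3

MISSING = object()  # hypothetical sentinel standing in for Jayvee's `missing`


def to_sql_cell(value):
    """Map the `missing` sentinel to None, which sqlite3 stores as NULL."""
    return None if value is MISSING else value


def export_rows(rows):
    """Write rows to an in-memory SQLite table and read them back."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE example (col INTEGER, result REAL)")
    connection.executemany(
        "INSERT INTO example VALUES (?, ?)",
        [tuple(to_sql_cell(cell) for cell in row) for row in rows],
    )
    stored = connection.execute("SELECT col, result FROM example").fetchall()
    connection.close()
    return stored


print(export_rows([(3, MISSING), (4, 2.0)]))  # [(3, None), (4, 2.0)]
```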

### Operator interactions

#### `invalid` and `missing` as keywords

The new keywords, `invalid` and `missing`, may be used as a parameter for the `==` and `!=` operators to check for (in)equality.

#### `invalid` and `missing` as values

- Unary operators:
  - If the parameter is `invalid`, the result is `invalid`
  - If the parameter is `missing`, the result is `missing`
  - The following operations evaluate to `invalid`:
    - Square root of a negative number
    - Parsing failure for `asDecimal`, `asInteger` or `asBoolean`

- Binary operators:
  - The following operations always evaluate to `invalid`:
    - Division by zero
    - Root of a negative number
    - 0th root of a number
    - Number modulo zero
  - If at least one of the parameters is `invalid`, the result is `invalid`
  - If at least one of the parameters is `missing` and no parameter is `invalid`, the result is `missing`

- Ternary operators:
  - If at least one of the parameters is `invalid`, the result is `invalid`
  - If at least one of the parameters is `missing` and no parameter is `invalid`, the result is `missing`
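
The propagation rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the interpreter's evaluator: `ErrorValue`, `INVALID`, `MISSING`, `propagate`, `divide` and `square_root` are hypothetical names, and the sketch reads "always evaluate to `invalid`" literally, so a division by zero yields `invalid` even when the other parameter is `missing`.

```python
import math


class ErrorValue:
    """Hypothetical sentinel type standing in for Jayvee's error values."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


INVALID = ErrorValue("invalid")
MISSING = ErrorValue("missing")


def propagate(*operands):
    """Return the error value the operands force, or None if all are regular values."""
    if any(operand is INVALID for operand in operands):
        return INVALID  # `invalid` takes precedence over `missing`
    if any(operand is MISSING for operand in operands):
        return MISSING
    return None


def divide(left, right):
    """Binary operator: division by zero is checked first (assumption, see lead-in)."""
    if right == 0:
        return INVALID
    error = propagate(left, right)
    return error if error is not None else left / right


def square_root(operand):
    """Unary operator: error values pass through, negative input yields `invalid`."""
    error = propagate(operand)
    if error is not None:
        return error
    return INVALID if operand < 0 else math.sqrt(operand)


print(divide(6, 3))              # 2.0
print(divide(3, 0))              # invalid
print(divide(MISSING, 2))        # missing
print(divide(MISSING, INVALID))  # invalid
print(square_root(-4))           # invalid
```

The same `propagate` helper would cover ternary operators, since their rule is identical to the binary one.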

### Example

data.csv:
```csv
column
3
```

pipeline.jv:
```
pipeline CarsPipeline {
  Extractor
    -> ToTextFile
    -> ToCSV
    -> ToTable
    -> ProduceInvalid
    -> ToSQLite;

  block Extractor oftype LocalFileExtractor {
    filePath: "data.csv";
  }
  block ToTextFile oftype TextFileInterpreter {}
  block ToCSV oftype CSVInterpreter {}
  block ToTable oftype TableInterpreter {
    header: true;
    columns: [
      "column" oftype integer,
    ];
  }
  /*
  | column |
  |--------|
  | 3      |
  */
  transform divideByZero {
    from num oftype integer;
    to out oftype decimal;

    // This calculation results in `invalid`
    out: num / 0;
  }
  block ProduceInvalid oftype TableTransformer {
    inputColumns: ["column"];
    outputColumn: "result";
    uses: divideByZero;
  }
  /*
  | column | result  |
  |--------|---------|
  | 3      | invalid |
  */
  // `invalid` is replaced with `NULL`
  block ToSQLite oftype SQLiteLoader {
    table: "Table";
    file: "data.sqlite";
  }
}
```

data.sqlite:

| column | result |
|--------|--------|
| 3      | NULL   |

## Drawbacks

## Alternatives

- One single `ERROR`.
- More fine-grained errors, e.g. `ParsingError`, `DivisionByZeroError`,
  `EmptyCellError`

## Possible Future Changes/Enhancements

- Introduce a ternary `if then else` operator to allow users to handle `invalid` and `missing` values
- Type unions to express that a value can be `number | invalid | missing`
- Add context to errors (reason, location)
- Change the syntax to define which blocks can throw which errors. This may lead
  to more generic error handling (try/catch)
