Commit 13a5f0a

YAML source location handling (#340) (#352)
* YAML source location handling (#340)

  Example describing how to retain YAML source file location information in processed and validated JSON tree

  Fixes #340
* Fixed typos
* Woops - more typos!
1 parent 94dcb57 commit 13a5f0a

1 file changed

+303
-0
lines changed

doc/yaml-line-numbers.md

Lines changed: 303 additions & 0 deletions
# Obtaining YAML Line Numbers

## Scenario 1 - finding YAML line numbers from the JSON tree

A great feature of json-schema-validator is its ability to validate YAML documents against a JSON Schema. It does this by pre-processing the YAML into a tree of [JsonNode](https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/JsonNode.html) objects, which breaks the connection back to the original YAML source file. Very commonly, once the YAML has been validated against the schema, there is additional processing and checking for semantic or content errors or inconsistencies in the JSON tree. From an end user's point of view, the ideal is to report such errors using line and column references back to the original YAML, but this information is not readily available from the processed JSON tree.

### Scenario 1, solution part 1 - capturing line details during initial parsing

One solution is to use a custom [JsonNodeFactory](https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/node/JsonNodeFactory.html) that returns custom JsonNode objects. These are created during initial parsing and record the YAML location that was being parsed at the time they were created. The example below shows this:

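```
// Imports assumed by this snippet (Jackson databind 2.10 and jackson-dataformat-yaml)
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.BooleanNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.NullNode;
import com.fasterxml.jackson.databind.node.NumericNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.node.TextNode;
import com.fasterxml.jackson.dataformat.yaml.YAMLParser;

public static class MyNodeFactory extends JsonNodeFactory
{
    // The parser being read from; it knows the location of the token currently being parsed
    private final YAMLParser yp;

    public MyNodeFactory(YAMLParser yp)
    {
        super();
        this.yp = yp;
    }

    @Override
    public ArrayNode arrayNode()
    {
        return new MyArrayNode(this, yp.getTokenLocation(), yp.getCurrentLocation());
    }

    @Override
    public BooleanNode booleanNode(boolean v)
    {
        return new MyBooleanNode(v, yp.getTokenLocation(), yp.getCurrentLocation());
    }

    @Override
    public NumericNode numberNode(int v)
    {
        return new MyIntNode(v, yp.getTokenLocation(), yp.getCurrentLocation());
    }

    @Override
    public NullNode nullNode()
    {
        return new MyNullNode(yp.getTokenLocation(), yp.getCurrentLocation());
    }

    @Override
    public ObjectNode objectNode()
    {
        return new MyObjectNode(this, yp.getTokenLocation(), yp.getCurrentLocation());
    }

    @Override
    public TextNode textNode(String text)
    {
        // Return null (not a NullNode) for null text; see the notes that follow
        return (text != null) ? new MyTextNode(text, yp.getTokenLocation(), yp.getCurrentLocation()) : null;
    }
}
```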

The example above includes a basic but usable subset of all possible JsonNode types. If your YAML needs them, then you should also consider the others, i.e. `byte`, `byte[]`, `raw`, `short`, `long`, `float`, `double`, `BigInteger`, `BigDecimal`.

There are some other important things to note from the example:

* Even in a reduced set, `ObjectNode` and `NullNode` should be included
* The current return value for methods that receive a null parameter seems to be null rather than `NullNode` (based on inspecting the underlying `valueOf()` methods in the various `JsonNode` subclasses). Hence the implementation of the `textNode()` method above.

The actual work here is really being done by the YAMLParser: it holds the location of the token being parsed, and the current location in the file. The first of these gives us a line and column number we can use to flag where an error or problem was found, and the second (if needed) lets us calculate a span to the end of the error, e.g. if we wanted to highlight or underline the text in error.
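
As a rough illustration of the span idea, the sketch below builds a caret underline for text that starts and ends on the same line. It uses only the JDK. The `SpanMarker` class and its `underline` method are hypothetical names, and the code assumes 1-based column numbers, with the end column pointing at the first character after the token.

```java
final class SpanMarker
{
    private SpanMarker() { }

    /**
     * Builds a string of spaces followed by carets that underlines the
     * columns from startCol (inclusive) to endCol (exclusive).
     * Assumption: columns are 1-based and both lie on the same line.
     */
    static String underline(int startCol, int endCol)
    {
        int start = Math.max(startCol, 1);
        // Always mark at least one column, even for an empty or inverted span
        int width = Math.max(endCol - start, 1);

        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < start; i++) { sb.append(' '); }
        for (int i = 0; i < width; i++) { sb.append('^'); }
        return sb.toString();
    }
}
```

Printing the offending YAML line followed by the result of `underline(...)` puts the carets directly beneath the text in error. The start column would come from the token location and the end column from the current location.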

### Scenario 1, solution part 2 - augmented `JsonNode` subclasses

We can be as simple or fancy as we like in the `JsonNode` subclasses, but basically we need two things from them:

* An interface so when we are post processing the JSON tree, we can recognize nodes that retain line number information
* An interface that lets us extract the relevant location information

Those could be the same thing of course, but in our case we separated them as shown in the following example:

```
// Imports assumed by this snippet (Jackson core and databind 2.10)
import com.fasterxml.jackson.core.JsonLocation;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.BooleanNode;
import com.fasterxml.jackson.databind.node.IntNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.NullNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.node.TextNode;

// Lets post processing recognize nodes that retain location information
public interface LocationProvider
{
    LocationDetails getLocationDetails();
}

// Exposes the location information itself
public interface LocationDetails
{
    default int getLineNumber() { return 1; }
    default int getColumnNumber() { return 1; }
    default String getFilename() { return ""; }
}

public static class LocationDetailsImpl implements LocationDetails
{
    final JsonLocation currentLocation;
    final JsonLocation tokenLocation;

    public LocationDetailsImpl(JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        this.tokenLocation = tokenLocation;
        this.currentLocation = currentLocation;
    }

    @Override
    public int getLineNumber() { return (tokenLocation != null) ? tokenLocation.getLineNr() : 1; }

    @Override
    public int getColumnNumber() { return (tokenLocation != null) ? tokenLocation.getColumnNr() : 1; }

    @Override
    public String getFilename()
    {
        // getSourceRef() may return null, so guard before calling toString()
        Object src = (tokenLocation != null) ? tokenLocation.getSourceRef() : null;
        return (src != null) ? src.toString() : "";
    }
}

public static class MyNullNode extends NullNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyNullNode(JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super();
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}

public static class MyTextNode extends TextNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyTextNode(String v, JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super(v);
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}

public static class MyIntNode extends IntNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyIntNode(int v, JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super(v);
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}

public static class MyBooleanNode extends BooleanNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyBooleanNode(boolean v, JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super(v);
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}

public static class MyArrayNode extends ArrayNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyArrayNode(JsonNodeFactory nc, JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super(nc);
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}

public static class MyObjectNode extends ObjectNode implements LocationProvider
{
    final LocationDetails locDetails;

    public MyObjectNode(JsonNodeFactory nc, JsonLocation tokenLocation, JsonLocation currentLocation)
    {
        super(nc);
        locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
    }

    @Override
    public LocationDetails getLocationDetails() { return locDetails; }
}
```

### Scenario 1, solution part 3 - using the custom `JsonNodeFactory`

With the pieces we now have, we just need to tell the YAML library to make use of them, which involves a minor and simple modification to the normal sequence of processing.

```
// Assumptions: mapper is an ObjectMapper, f is the YAML input (e.g. a File),
// and mySchema is the already loaded schema
this.yamlFactory = new YAMLFactory();

try (YAMLParser yp = yamlFactory.createParser(f))
{
    // Hand the parser to our factory so every node it creates records its source location
    ObjectReader rdr = mapper.reader(new MyNodeFactory(yp));
    JsonNode jsonNode = rdr.readTree(yp);
    Set<ValidationMessage> msgs = mySchema.validate(jsonNode);

    if (msgs.isEmpty())
    {
        for (JsonNode item : jsonNode.get("someItem"))
        {
            processJsonItems(item);
        }
    }
    else
    {
        // ... we'll look at how to get line locations for ValidationMessage cases in Scenario 2
    }
}
// a JsonProcessingException seems to be the base exception for "gross" errors e.g.
// missing quotes at end of string etc.
catch (JsonProcessingException jpEx)
{
    JsonLocation loc = jpEx.getLocation();
    // ... do something with the loc details
}
// Other IOExceptions (e.g. an unreadable file) still need to be caught or declared
```

Some notes on what is happening here:

* We instantiate our custom JsonNodeFactory with the YAMLParser reference, and the line locations get recorded for us as the file is parsed.
* If any exceptions are thrown, they will already contain a JsonLocation object that we can use directly if needed
* If we get no validation messages, we know the JSON tree matches the schema and we can do any post processing we need on the tree. We'll see how to report any issues with this in the next part
* We'll look at how to get line locations for ValidationMessage errors in Scenario 2

### Scenario 1, solution part 4 - extracting the line details

Having got everything prepared, actually getting the line locations is rather easy:

```
// Imports assumed by this snippet
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;

void processJsonItems(JsonNode item)
{
    // Walk each field of the item and look up where its value came from in the YAML
    Iterator<Map.Entry<String, JsonNode>> iter = item.fields();

    while (iter.hasNext())
    {
        Map.Entry<String, JsonNode> node = iter.next();
        extractErrorLocation(node.getValue());
    }
}

void extractErrorLocation(JsonNode node)
{
    // instanceof is false for null, so this also covers a null node
    if (!(node instanceof LocationProvider)) { return; }

    // Note: we also know the "span" of the error section i.e. from token location to current location (first char after the token)
    // if we wanted at some stage we could use this to highlight/underline all of the text in error
    LocationDetails dets = ((LocationProvider) node).getLocationDetails();
    // ... do something with the details e.g. report an error/issue against the YAML line
}
```
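
One possible way to "do something with the details" is to format them in the common `file:line:column: message` style, which many editors and build tools can turn into a clickable link. The sketch below is JDK-only and takes the values that `getFilename()`, `getLineNumber()` and `getColumnNumber()` would return. The `IssueFormatter` class and the `<unknown>` placeholder are hypothetical choices, not part of any library.

```java
final class IssueFormatter
{
    private IssueFormatter() { }

    /**
     * Formats an issue as "file:line:column: message".
     * An empty or null filename is replaced by a placeholder so the
     * output keeps the same shape.
     */
    static String format(String filename, int line, int column, String message)
    {
        String source = (filename == null || filename.isEmpty()) ? "<unknown>" : filename;
        return source + ":" + line + ":" + column + ": " + message;
    }
}
```

Inside `extractErrorLocation()` this could be called as `IssueFormatter.format(dets.getFilename(), dets.getLineNumber(), dets.getColumnNumber(), "some problem")`.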

So that's pretty much it - as we are processing the JSON tree, if there is any point at which we want to report something about the contents, we can do so with a reference back to the original YAML line number.

There is still a problem though: what if the validation against the schema fails?

## Scenario 2 - ValidationMessage line locations

Any validation failures against the schema come back in the form of a set of `ValidationMessage` objects. But these also do not contain original YAML source line information, and there's no easy way to inject it as we did for Scenario 1. Luckily though, there is a trick we can use here!

Within the `ValidationMessage` object is something called the 'path' of the error, which we can access with the `getPath()` method. The syntax of this path is not exactly the same as a regular [JsonPointer](https://fasterxml.github.io/jackson-core/javadoc/2.10/com/fasterxml/jackson/core/JsonPointer.html) object, but it is sufficiently close as to be convertible. And, once converted, we can use that pointer for locating the appropriate `JsonNode`. The following couple of methods can be used to automate this process:

```
// StringUtils is assumed to be Apache Commons Lang's (org.apache.commons.lang3.StringUtils)
JsonNode findJsonNode(ValidationMessage msg, JsonNode rootNode)
{
    // munge the ValidationMessage path, e.g. "$.items[2].name" becomes "/items/2/name"
    String pathStr = StringUtils.replace(msg.getPath(), "$.", "/", 1);
    pathStr = StringUtils.replace(pathStr, ".", "/");
    pathStr = StringUtils.replace(pathStr, "[", "/");
    pathStr = StringUtils.replace(pathStr, "]", ""); // array closure superfluous
    JsonPointer pathPtr = JsonPointer.valueOf(pathStr);
    // Now see if we can find the node; at() returns a MissingNode (not null) if nothing matches
    JsonNode node = rootNode.at(pathPtr);
    return node;
}

LocationDetails getLocationDetails(ValidationMessage msg, JsonNode rootNode)
{
    LocationDetails retval = null;
    JsonNode node = findJsonNode(msg, rootNode);
    // instanceof is false for null and for MissingNode, so no separate null check is needed
    if (node instanceof LocationProvider)
    {
        retval = ((LocationProvider) node).getLocationDetails();
    }
    return retval;
}
```
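
The path conversion itself can also be sketched without a helper library. The version below uses only the JDK and assumes paths of the shape `$.items[2].name`, as handled above. The `PathConverter` class is a hypothetical name. It maps a bare `$` to the empty string, which RFC 6901 defines as the pointer to the whole document, and it does not attempt to handle property names that themselves contain `.`, `[`, `]` or `/`.

```java
final class PathConverter
{
    private PathConverter() { }

    /**
     * Converts a path such as "$.items[2].name" into JSON Pointer
     * syntax such as "/items/2/name".
     * Assumption: property names contain no '.', '[', ']' or '/'.
     */
    static String toJsonPointer(String path)
    {
        if (path == null || path.equals("$")) { return ""; }

        // Drop the leading root marker, if present
        String s = path.startsWith("$") ? path.substring(1) : path;

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c == '.' || c == '[') { sb.append('/'); }  // both start a new segment
            else if (c != ']') { sb.append(c); }           // array closure superfluous
        }

        // A non-empty JSON Pointer must start with '/'
        if (sb.length() > 0 && sb.charAt(0) != '/') { sb.insert(0, '/'); }
        return sb.toString();
    }
}
```

The resulting string can then be passed to `JsonPointer.valueOf()` in place of the `pathStr` built in `findJsonNode()`.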

## Summary

Although not trivial, the steps outlined here give us a way to track back to the original source YAML for a variety of possible reporting cases:

* JSON processing exceptions (mostly already done for us)
* Issues flagged during validation of the YAML against the schema
* Anything we need to report with source information during post processing of the validated JSON tree
