Commit 3669dfb

committed
Proposal to specify JSON dump in GHC
1 parent a797b0d commit 3669dfb

File tree

2 files changed

+264
-0
lines changed

proposals/0000-specify-dump-json.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
1+
# RFC - Specify GHC's JSON Diagnostic Dump
2+
3+
## Abstract
4+
5+
GHC is currently undergoing a large-scale project to [move to a more structured error representation](https://gitlab.haskell.org/ghc/ghc/-/issues/18516).
6+
In harmony with this motion, there is a [desire for a JSON dump of GHC's diagnostic messages](https://gitlab.haskell.org/ghc/ghc/-/issues/19278).
7+
Such an implementation would facilitate easier consumption for downstream consumers of GHC's diagnostic messages (e.g. IDE developers).
8+
To best enforce a standardized JSON output, a [JSON schema](https://json-schema.org/) can be used.
9+
10+
The purpose of this RFC is to open up the discussion to the broader Haskell community about:
11+
12+
- the desire for such a feature
13+
- the requested fields present in such a JSON output
14+
- opinions on the draft JSON schema (or the usage of JSON schemas at all)
15+
- any other general feedback
16+
17+
## Background
18+
19+
In the prior implementation of GHC, errors were passed around as `SDoc`s (for Structured Documents).
20+
These `SDoc`s are simply a text encoding of the errors with some formatting features, such as nice alignment.
21+
By transforming the errors into `SDoc`s at the moment of creation, a lot of type-level information is lost,
22+
causing a headache for GHC developers and GHC API users.
23+
To improve on this problem, the [journey to represent errors as structured values has begun](https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values).
24+
In summary, this means to pass errors around as a specific type which contains the relevant information separated into different fields,
25+
rather than combining it all into one single `SDoc`.
26+
27+
Another change that has been made is the introduction of [error codes](https://gitlab.haskell.org/ghc/ghc/-/issues/21684) to uniquely identify each possible error output from GHC.
28+
29+
These features together serve to make error consumption a breeze for GHC API users, but what about those that would like to consume errors without utilizing GHC's API?
30+
31+
## Problem Statement
32+
33+
The fundamental issue is that the structured errors are only presented to users who leverage GHC as a library in their application. If, instead, you want to consume error messages without tapping into the internals of GHC, your best bet is to parse the plain text errors printed to stderr by GHC. This poses two important problems.
34+
35+
The first is that this is an _unnecessary_ pain. The errors are now passed as structured values internally, but the users are given a fairly unstructured value to parse themselves, leading to the requirement of always having to write a parser whenever consumption is desired.
36+
37+
The second problem is that this plain text output is subject to change. Messages are often changed to make the content clearer, but each change bubbles into a new requirement for the hand-written plain-text parser of error messages. This problem is mitigated somewhat by the above-mentioned introduction of error codes, but even these error codes must be parsed out of non-standard, loosely structured output.
38+
39+
The requirement for solving this problem is to enable consumption of GHC's diagnostic messages in a structured representation that closely matches the internal representation, without requiring users to rely on the internals of GHC. The most reasonable solution is a compiler flag that dumps the structured messages as versioned JSON output, described by a JSON Schema that standardizes the expected output. The hope is that the community can reach a general consensus on the requirements imposed by such a JSON schema. While it will not be too costly to change the JSON schema as new requirements come up (simply increment the schema version), the ideal is to use this RFC as an opportunity to gather input and constructive criticism on the current schema. The conversation is also open on whether or not a JSON schema is really required.
40+
41+
## Prior Art and Related Efforts
42+
43+
There is a currently existing GHC flag, `-ddump-json`, which is under-specified and thus cannot be depended on. Here is an example (prettified) output of the current flag:
44+
```json
{
  "span": null,
  "doc": "[1 of 1] Compiling ShouldFail ( testsuite/tests/typecheck/should_fail/T2414.hs, testsuite/tests/typecheck/should_fail/T2414.o )",
  "messageClass": "MCOutput"
}
{
  "span": {
    "file": "testsuite/tests/typecheck/should_fail/T2414.hs",
    "startLine": 9,
    "startCol": 13,
    "endLine": 9,
    "endCol": 17
  },
  "doc": "• Couldn't match type ‘b0’ with ‘(Bool, b0)’\n Expected: b0 -> Maybe (Bool, b0)\n Actual: b0 -> Maybe b0\n• In the first argument of ‘unfoldr’, namely ‘Just’\n In the expression: unfoldr Just\n In an equation for ‘f’: f = unfoldr Just",
  "messageClass": "MCDiagnostic SevError ErrorWithoutFlag Just GHC-27958"
}
```
62+
63+
Even on the surface, there are problems: the first object is of a different kind than the second, the objects are not wrapped in an array, and crucial information such as the error code is only available buried inside the `messageClass` string. And what is `messageClass`? These fields can and should be standardized so that they express the diagnostics faithfully, in a way that is understandable and consumable.
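
To make the cost of the current format concrete, here is a minimal sketch of what a consumer has to do today to recover the error code. It assumes, based only on the example output above, that the code appears in `messageClass` as a whitespace-separated token of the form `GHC-<digits>`; that layout is not documented anywhere, and the function name `codeFromMessageClass` is hypothetical.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)
import Text.Read (readMaybe)

-- | Pull a diagnostic code out of the free-form "messageClass" string.
-- Assumption: the code shows up as a token like "GHC-27958", as in the
-- example output above. Nothing guarantees this layout.
codeFromMessageClass :: String -> Maybe Int
codeFromMessageClass = listToMaybe . mapMaybe parseToken . words
  where
    -- Keep only tokens that start with "GHC-" and continue with a number.
    parseToken tok = stripPrefix "GHC-" tok >>= readMaybe
```

Any change to how `messageClass` is rendered would silently break a helper like this, which is the fragility a specified `"code"` field is meant to remove.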
64+
65+
The extra `d` in `-ddump-json` indicates that this is a developer flag, but our aim is to make this a first class feature.
66+
67+
The aim is to create a JSON interface for GHC diagnostics, standardized by a JSON schema. The `-ddump-json` flag will then be replaced by a first-class `-dump-json` flag, and whatever implementation changes are necessary to make the new flag conform to the new specification will follow.
68+
69+
The past attempt did not succeed, simply due to a lack of effort and resources devoted to it, as well as a lack of structured error representations. Now that structured errors are pervasive within GHC, such a JSON dump is both feasible and practical.
70+
71+
## Technical Content
72+
73+
The most important component of the technical content is the JSON schema itself. The JSON schema is not reproduced here, as it is quite long. Instead, it can be found in the directory of the same name, in the file `schema.json`. To demonstrate the appropriateness of the proposed JSON schema, the internal error representation is reproduced below, with comments removed for brevity.
74+
```haskell
-- compiler/GHC/Types/Error.hs
data MsgEnvelope e = MsgEnvelope
  { errMsgSpan       :: SrcSpan
  , errMsgContext    :: NamePprCtx
  , errMsgDiagnostic :: e
  , errMsgSeverity   :: Severity
  } deriving (Functor, Foldable, Traversable)
-- compiler/GHC/Types/Error.hs
class (HasDefaultDiagnosticOpts (DiagnosticOpts a)) => Diagnostic a where
  type DiagnosticOpts a
  diagnosticMessage :: DiagnosticOpts a -> a -> DecoratedSDoc
  diagnosticReason  :: a -> DiagnosticReason

  -- | Extract any hints a user might use to repair their
  -- code to avoid this diagnostic.
  diagnosticHints   :: a -> [GhcHint]
  diagnosticCode    :: a -> Maybe DiagnosticCode
```
93+
All of the above is explained in the [wiki](https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values).
94+
95+
The output of the JSON dump would consist of a list of errors, as well as a top level version. Each error contains the following fields:
96+
- `"span"` corresponding to the above `SrcSpan`
97+
  - This contains sub-fields, specified in the schema, which contain line numbers, etc.
98+
- `"severity"` corresponding to the above `Severity`
99+
- `"code"` corresponding to the change in this [proposal](https://github.com/haskellfoundation/tech-proposals/blob/main/proposals/accepted/024-error-messages.md) and the above `DiagnosticCode` in typeclass `Diagnostic`
100+
- `"hints"` corresponding to the above `[GhcHint]`
101+
  - This is perhaps the one that remains a bit of a question mark. To what degree should it be specified? There are quite a few possible constructors, so it may be overly tedious to specify every single one as a possible output in the JSON schema. On the other hand, allowing arbitrary JSON may be insufficient. A reasonable compromise may be to give back the name of the constructor followed by a list of the constructor's arguments.
102+
- `"message"` corresponding to `DecoratedSDoc` of `diagnosticMessage`
103+
- `"warnReason"`, whose `"reason"` sub-field corresponds to `diagnosticReason`
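
The field list above, including the "constructor name plus arguments" compromise for hints, can be sketched as a small Haskell model with a hand-written renderer. This is only an illustration under stated assumptions: the record types, the function names and the `"constructor"`/`"arguments"` keys are hypothetical and do not come from GHC or from the draft schema (which leaves the items of `"hints"` unconstrained). The `"warnReason"` field is omitted for brevity, and only `base` is used.

```haskell
import Data.Char (ord)
import Data.List (intercalate)
import Numeric (showHex)

-- Hypothetical, simplified mirror of the proposed JSON fields.
data Span = Span
  { spanFile :: FilePath
  , startLine, startCol, endLine, endCol :: Int
  }

-- | A hint encoded as a constructor name followed by its arguments.
data Hint = Hint { hintConstructor :: String, hintArguments :: [String] }

data JsonDiagnostic = JsonDiagnostic
  { diagSpan     :: Span
  , diagSeverity :: String      -- "Warning" or "Error"
  , diagCode     :: Maybe Int   -- rendered as null when absent
  , diagHints    :: [Hint]
  , diagMessage  :: String
  }

-- | Quote a string and escape the characters JSON requires escaping.
jsonString :: String -> String
jsonString s = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"'  = "\\\""
    escape '\\' = "\\\\"
    escape '\n' = "\\n"
    escape c
      | ord c < 0x20 = "\\u" ++ replicate (4 - length hex) '0' ++ hex
      | otherwise    = [c]
      where hex = showHex (ord c) ""

-- | Render key/value pairs whose values are already JSON text.
jsonObject :: [(String, String)] -> String
jsonObject fields =
  "{" ++ intercalate "," [jsonString k ++ ":" ++ v | (k, v) <- fields] ++ "}"

jsonArray :: [String] -> String
jsonArray xs = "[" ++ intercalate "," xs ++ "]"

renderSpan :: Span -> String
renderSpan sp = jsonObject
  [ ("file", jsonString (spanFile sp))
  , ("startLine", show (startLine sp))
  , ("startCol", show (startCol sp))
  , ("endLine", show (endLine sp))
  , ("endCol", show (endCol sp))
  ]

renderHint :: Hint -> String
renderHint h = jsonObject
  [ ("constructor", jsonString (hintConstructor h))
  , ("arguments", jsonArray (map jsonString (hintArguments h)))
  ]

renderDiagnostic :: JsonDiagnostic -> String
renderDiagnostic d = jsonObject
  [ ("span", renderSpan (diagSpan d))
  , ("severity", jsonString (diagSeverity d))
  , ("code", maybe "null" show (diagCode d))
  , ("hints", jsonArray (map renderHint (diagHints d)))
  , ("message", jsonString (diagMessage d))
  ]
```

Encoding hint arguments as plain strings keeps the schema small, at the price of consumers having to interpret each argument themselves; a richer encoding would need per-constructor definitions in the schema.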
104+
105+
The top level version is crucial for indicating to downstream consumers which JSON schema they must comply with in the case that the schema is updated in the future (which is quite likely).
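
As an illustration of how a downstream consumer might use the top-level version, here is a minimal sketch that parses a version string of the form `major.minor.patch` and accepts a dump only when its major version matches the one the consumer was written against. Both the three-component format (taken from the `"1.0.0"` used in the example instance) and the "same major version means compatible" policy are assumptions; the proposal does not yet define a compatibility rule. All names are hypothetical.

```haskell
import Text.Read (readMaybe)

data SchemaVersion = SchemaVersion { major, minor, patch :: Int }
  deriving (Eq, Show)

-- | Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn sep str = case break (== sep) str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse "major.minor.patch"; anything else is rejected.
parseVersion :: String -> Maybe SchemaVersion
parseVersion s = case mapM readMaybe (splitOn '.' s) of
  Just [a, b, c] | all (>= 0) [a, b, c] -> Just (SchemaVersion a b c)
  _                                     -> Nothing

-- | Assumed policy: a consumer built against one schema version can read
-- any dump that shares its major version.
canConsume :: SchemaVersion -> SchemaVersion -> Bool
canConsume supported found = major supported == major found
```

A consumer would call `parseVersion` on the `"version"` field before touching `"diagnostics"` and fall back to a clear error message when `canConsume` returns `False`.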
106+
107+
For demonstrative purposes, here is an example valid instance of the schema.
108+
```json
{
  "version": "1.0.0",
  "diagnostics": [
    {
      "span": {
        "file": "typecheck/should_fail/T2414.hs",
        "startLine": 9,
        "startCol": 13,
        "endLine": 9,
        "endCol": 17
      },
      "severity": "Error",
      "code": 27958,
      "message": " • Couldn't match type ‘b0’ with ‘(Bool, b0)’ \n Expected: b0 -> Maybe (Bool, b0) \nActual: b0 -> Maybe b0 \n• In the first argument of ‘unfoldr’, namely ‘Just’ \nIn the expression: unfoldr Just \nIn an equation for ‘f’: f = unfoldr Just",
      "warnReason": {
        "reason": "ErrorWithoutFlag"
      }
    }
  ]
}
```
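
With an instance like the one above, a consumer no longer needs to parse free-form text. The sketch below assumes the JSON has already been decoded into a plain record (the decoding step and all names here are hypothetical) and builds a one-line summary from the structured fields. The output format is this example's own, and the zero-padding of the code to five digits is an assumption about how codes are conventionally displayed.

```haskell
import Data.Char (toLower)
import Text.Printf (printf)

-- Hypothetical decoded form of one entry of "diagnostics".
data Span = Span
  { file :: FilePath
  , startLine, startCol, endLine, endCol :: Int
  }

data Diagnostic = Diagnostic
  { diagSpan :: Span
  , severity :: String     -- "Warning" or "Error"
  , code     :: Maybe Int
  , message  :: String
  }

-- | One-line summary built purely from structured fields.
summaryLine :: Diagnostic -> String
summaryLine d =
  printf "%s:%d:%d: %s%s"
    (file sp) (startLine sp) (startCol sp)
    (map toLower (severity d)) codeTag
  where
    sp = diagSpan d
    -- Assumed display convention: "GHC-" followed by five digits.
    codeTag = maybe "" (printf " [GHC-%05d]") (code d)
```

For the example instance, `summaryLine` yields `typecheck/should_fail/T2414.hs:9:13: error [GHC-27958]`.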
130+
131+
The schema itself will be brought into version control of the GHC repo and tests will be added to ensure that the JSON dump complies with the JSON schema. This will ensure that changes to the diagnostics result in appropriate changes to the schema. In addition, a wiki page will be created to fill in the gaps for anyone needing to understand the project. Otherwise, I will be available to perform maintenance.
132+
133+
One of the major benefits of using a JSON schema is that the expected JSON payload is well-defined for consumers of the messages. Consumers can rely on the structure of the output without having to probe the contents for the presence or absence of particular pieces of data. One drawback may be over-specification of the output: there may be cases in which flexibility would be beneficial. However, the schema can be adapted as further feedback rolls in (with an incremented version), so the benefits should outweigh the drawbacks.
134+
135+
In addition to adding a `-dump-json` flag, it may also prove useful to provide a `-dump-json-schema` flag which simply produces the relevant JSON schema for that particular version of GHC. This I leave open for discussion. Provided that the schema is in an easy to find location, it may be overkill.
136+
137+
The schema evolution process is currently undetermined, though I imagine that, because the schema will need to change only infrequently, it can be handled on a case-by-case basis. Keeping a running list of all relevant stakeholders who may need to be informed could also be a good idea.
138+
139+
## Stakeholders
140+
141+
Stakeholders consist of anyone that desires to consume a JSON output of GHC diagnostics without leveraging the GHC API. Some possible stakeholders are listed explicitly [here](https://gitlab.haskell.org/ghc/ghc/-/issues/19278#note_503994). These are:
142+
- Chris Smith of https://code.world
143+
- Joseph Sumabat
144+
145+
Joseph Sumabat replied to my emails and his input has been incorporated into the included schema. I believe the Haskell community as a whole would benefit from improved tooling, and this effort will help in that regard by providing an easy-to-consume representation of GHC's diagnostics.
146+
147+
## Success
148+
149+
I expect the implementation to take no more than 2 or so months. The project will be considered a success when an appropriate JSON dump flag is shipped with GHC which serves the purpose of providing a JSON structured representation of error messages which complies with a JSON schema in version control.
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Dump JSON",
  "description": "JSON dump of the GHC compiler diagnostics output",
  "type": "object",
  "properties": {
    "version": {
      "description": "The current JSON schema version of the JSON dump",
      "type": "string"
    },
    "diagnostics": {
      "description": "The list of diagnostics produced by GHC",
      "type": "array",
      "items": {
        "properties": {
          "span": {
            "$ref": "#/$defs/span"
          },
          "severity": {
            "description": "The diagnostic severity",
            "type": "string",
            "enum": [
              "Warning",
              "Error"
            ]
          },
          "code": {
            "description": "The diagnostic code (if it exists)",
            "type": [
              "integer",
              "null"
            ]
          },
          "hints": {
            "description": "The hints suggested by GHC",
            "type": "array"
          },
          "message": {
            "description": "The string output of the diagnostic message by GHC",
            "type": "string"
          },
          "warnReason": {
            "description": "The flag, if it exists, which caused the warning",
            "type": "object",
            "properties": {
              "reason": {
                "description": "The reason why a diagnostic was emitted in the first place (e.g., flag, category)",
                "type": "string",
                "enum": [
                  "WarningWithoutFlag",
                  "WarningWithFlag",
                  "WarningWithCategory",
                  "ErrorWithoutFlag"
                ]
              },
              "flagOrCategory": {
                "description": "The flag or category which caused the warning",
                "type": "string"
              }
            },
            "required": [
              "reason"
            ]
          }
        },
        "required": [
          "span",
          "severity",
          "message"
        ]
      }
    }
  },
  "required": [
    "version",
    "diagnostics"
  ],
  "additionalProperties": false,
  "$defs": {
    "span": {
      "description": "The location of the diagnostic",
      "type": "object",
      "properties": {
        "file": {
          "description": "The file in which the diagnostic occurs",
          "type": "string"
        },
        "startLine": {
          "description": "The start line of the diagnostic",
          "type": "integer"
        },
        "startCol": {
          "description": "The start column of the diagnostic",
          "type": "integer"
        },
        "endLine": {
          "description": "The end line of the diagnostic",
          "type": "integer"
        },
        "endCol": {
          "description": "The end column of the diagnostic",
          "type": "integer"
        }
      },
      "required": [
        "file",
        "startLine",
        "startCol",
        "endLine",
        "endCol"
      ],
      "additionalProperties": false
    }
  }
}
