Commit 09b8710

ci: validate internal doc comments match markdown documentation (#32074)
### Motivation

This is a follow-up to my previous work. It ensures that the field documentation we publish in our docs is also reflected as comments in the product. Running the check as a CI step keeps the two in sync.

### Tips for reviewer

The last commit is pure code generation and is marked as such.

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design.
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)).
- [ ] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.

---------

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
1 parent a141899 commit 09b8710


7 files changed, +1379 -1335 lines changed

ci/test/lint-docs-catalog.py

Lines changed: 36 additions & 13 deletions
@@ -23,7 +23,14 @@ class ParserState(Enum):
 
 
 HEADER_SEPARATOR_RE = re.compile(r"\|?(\s*-+\s*)(\|\s*-+\s*){2}\|?")
-TABLE_RE = re.compile(r"(?:\|?[\s`\[\]]*([\w_ ]+)[\s`\[\]]*)")
+# Field names are enclosed in backticks
+FIELD_NAME_RE = re.compile(r"`(.*)`")
+# Field types are enclosed in backticks and optionally in square brackets
+FIELD_TYPE_RE = re.compile(r"\[?`(.*)`\]?")
+# Documentation links are not preserved in the SQL comments. We capture the
+# text of [..](..) and [..][..] type links and keep only the link text.
+DOC_LINK_TYPE1_RE = re.compile(r"\[([^\]]+)\]\([^)]+\)")
+DOC_LINK_TYPE2_RE = re.compile(r"\[([^\]]+)\]\[[^]]+\]")
 RELATION_MARKER_RE = re.compile(r"RELATION_SPEC (\w+)\.(\w+)")
 UNDOCUMENTED_RELATION_MARKER = re.compile(r"RELATION_SPEC_UNDOCUMENTED (\w+)\.(\w+)")

@@ -41,19 +48,25 @@ class ParserState(Enum):
 
 mode cockroach
 
+simple conn=mz_system,user=mz_system
+ALTER SYSTEM SET unsafe_enable_unstable_dependencies = true
+----
+COMPLETE 0
+
 statement ok
 CREATE VIEW objects AS
   SELECT
     schema.name AS schema,
     objects.name AS object,
     columns.position,
     columns.name,
-    columns.type
+    columns.type,
+    comments.comment
   FROM
-    mz_catalog.mz_columns AS columns,
-    mz_catalog.mz_objects AS objects,
-    mz_catalog.mz_schemas AS schema
-  WHERE columns.id = objects.id AND objects.schema_id = schema.id
+    mz_catalog.mz_columns AS columns
+    JOIN mz_catalog.mz_objects AS objects ON columns.id = objects.id
+    JOIN mz_catalog.mz_schemas AS schema ON objects.schema_id = schema.id
+    LEFT JOIN mz_internal.mz_comments AS comments ON columns.id = comments.id AND columns.position = comments.object_sub_id
 
 statement ok
 CREATE INDEX objects_idx ON objects(schema, object)
@@ -80,9 +93,9 @@ def main() -> None:
 if marker_match:
     schema = marker_match.group(1)
     object_name = marker_match.group(2)
-    print("query ITT")
+    print("query TTT")
     print(
-        f"SELECT position, name, type FROM objects WHERE schema = '{schema}' AND object = '{object_name}' ORDER BY position"
+        f"SELECT name, type, comment FROM objects WHERE schema = '{schema}' AND object = '{object_name}' ORDER BY position"
     )
     print("----")
     state = ParserState.HEADER
@@ -92,10 +105,19 @@ def main() -> None:
     if HEADER_SEPARATOR_RE.match(line):
         state = ParserState.FIELDS
 elif state == ParserState.FIELDS:
-    table_match = TABLE_RE.findall(line)
-    if table_match and len(table_match) >= 2:
-        field = table_match[0]
-        type_name = table_match[1]
+    line = line.strip()
+    if line.startswith("|"):
+        line = line[1:]
+    fields = [field.strip() for field in line.split("|")]
+    if len(fields) >= 3:
+        field_match = FIELD_NAME_RE.search(fields[0])
+        type_match = FIELD_TYPE_RE.search(fields[1])
+        if not field_match or not type_match:
+            raise ValueError(f"unexpected field format: {line}")
+        field = field_match.group(1)
+        type_name = type_match.group(1)
+        documentation = DOC_LINK_TYPE1_RE.sub(r"\1", fields[2])
+        documentation = DOC_LINK_TYPE2_RE.sub(r"\1", documentation)
         # We currently cannot determine the type of lists from the catalog.
         if type_name == "mz_aclitem array":
             type_name = "mz_aclitem[]"
@@ -106,7 +128,8 @@ def main() -> None:
         elif "array" in type_name:
             type_name = "array"
         type_name = type_name.replace(" ", "␠")
-        print(" ".join([str(position), field, type_name]))
+        documentation = documentation.replace(" ", "␠")
+        print(" ".join([field, type_name, documentation]))
         position += 1
     else:
         print()
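
To make the new row handling easier to follow, here is a minimal, self-contained sketch of how one documentation table row might be turned into an expected result line. It reuses the regular expressions from the diff, but the function name `expected_line` is hypothetical, and the sketch leaves out the array and list type normalization that the real script performs.

```python
import re

# Patterns copied from the diff above.
FIELD_NAME_RE = re.compile(r"`(.*)`")
FIELD_TYPE_RE = re.compile(r"\[?`(.*)`\]?")
DOC_LINK_TYPE1_RE = re.compile(r"\[([^\]]+)\]\([^)]+\)")
DOC_LINK_TYPE2_RE = re.compile(r"\[([^\]]+)\]\[[^]]+\]")


def expected_line(row: str) -> str:
    """Hypothetical helper: render one markdown table row as `name type comment`."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    cells = [cell.strip() for cell in row.split("|")]
    if len(cells) < 3:
        raise ValueError(f"not a field row: {row}")
    field_match = FIELD_NAME_RE.search(cells[0])
    type_match = FIELD_TYPE_RE.search(cells[1])
    if not field_match or not type_match:
        raise ValueError(f"unexpected field format: {row}")
    # Drop link targets and keep only the link text.
    documentation = DOC_LINK_TYPE1_RE.sub(r"\1", cells[2])
    documentation = DOC_LINK_TYPE2_RE.sub(r"\1", documentation)
    # Spaces inside a value are written as the visible-space character.
    type_name = type_match.group(1).replace(" ", "␠")
    documentation = documentation.replace(" ", "␠")
    return " ".join([field_match.group(1), type_name, documentation])


row = "| `sample_rate` | [`double precision`] | The actual rate at which the statement was sampled. |"
print(expected_line(row))
```

Under these assumptions, a row with a stray empty third cell (like the old `status` row in `mz_internal.md`) yields an empty documentation string, and a field name without backticks raises `ValueError`. That may be why the documentation hunks in this commit remove the extra `|` and add backticks.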

doc/user/content/sql/system-catalog/mz_internal.md

Lines changed: 4 additions & 4 deletions
@@ -61,7 +61,7 @@ granted the [`mz_monitor` role](/security/appendix/appendix-built-in-roles/#syst
 <!-- RELATION_SPEC mz_internal.mz_recent_activity_log -->
 | Field | Type | Meaning |
 |----------------------------|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `execution_id` | [`uuid`] | An ID that is unique for each executed statement. |
+| `execution_id` | [`uuid`] | An ID that is unique for each executed statement.
 | `sample_rate` | [`double precision`] | The actual rate at which the statement was sampled. |
 | `cluster_id` | [`text`] | The ID of the cluster the statement execution was directed to. Corresponds to [mz_clusters.id](/sql/system-catalog/mz_catalog/#mz_clusters). |
 | `application_name` | [`text`] | The value of the `application_name` configuration parameter at execution time. |
@@ -139,7 +139,7 @@ the most recent status for each AWS PrivateLink connection in the system.
 | `id` | [`text`] | The ID of the connection. Corresponds to [`mz_catalog.mz_connections.id`](../mz_catalog#mz_sinks). |
 | `name` | [`text`] | The name of the connection. |
 | `last_status_change_at` | [`timestamp with time zone`] | Wall-clock timestamp of the connection status change.|
-| `status` | [`text`] | | The status of the connection: one of `pending-service-discovery`, `creating-endpoint`, `recreating-endpoint`, `updating-endpoint`, `available`, `deleted`, `deleting`, `expired`, `failed`, `pending`, `pending-acceptance`, `rejected`, or `unknown`. |
+| `status` | [`text`] | The status of the connection: one of `pending-service-discovery`, `creating-endpoint`, `recreating-endpoint`, `updating-endpoint`, `available`, `deleted`, `deleting`, `expired`, `failed`, `pending`, `pending-acceptance`, `rejected`, or `unknown`. |
 
 ## `mz_cluster_deployment_lineage`

@@ -303,7 +303,7 @@ The `mz_internal_cluster_replicas` table lists the replicas that are created and
 <!-- RELATION_SPEC mz_internal.mz_internal_cluster_replicas -->
 | Field | Type | Meaning |
 |------------|----------|-------------------------------------------------------------------------------------------------------------|
-| id | [`text`] | The ID of a cluster replica. Corresponds to [`mz_cluster_replicas.id`](../mz_catalog/#mz_cluster_replicas). |
+| `id` | [`text`] | The ID of a cluster replica. Corresponds to [`mz_cluster_replicas.id`](../mz_catalog/#mz_cluster_replicas). |
 
 ## `mz_pending_cluster_replicas`

@@ -312,7 +312,7 @@ The `mz_pending_cluster_replicas` table lists the replicas that were created dur
 <!-- RELATION_SPEC mz_internal.mz_pending_cluster_replicas -->
 | Field | Type | Meaning |
 |------------|----------|-------------------------------------------------------------------------------------------------------------|
-| id | [`text`] | The ID of a cluster replica. Corresponds to [`mz_cluster_replicas.id`](../mz_catalog/#mz_cluster_replicas). |
+| `id` | [`text`] | The ID of a cluster replica. Corresponds to [`mz_cluster_replicas.id`](../mz_catalog/#mz_cluster_replicas). |
 
 ## `mz_comments`

doc/user/content/sql/system-catalog/mz_introspection.md

Lines changed: 9 additions & 9 deletions
@@ -318,15 +318,15 @@ We use the range `[operator_id_start, operator_id_end)` to record this informati
 If an LIR node was implemented without any dataflow operators, `operator_id_start` will be equal to `operator_id_end`.
 
 <!-- RELATION_SPEC mz_introspection.mz_lir_mapping -->
-| Field | Type | Meaning
-| --------- | -------- | -----------
-| global_id | [`text`] | The global ID.
-| lir_id | [`uint8`] | The LIR node ID.
-| operator | [`text`] | The LIR operator, in the format `OperatorName INPUTS [OPTIONS]`.
-| parent_lir_id | [`uint8`] | The parent of this LIR node. May be `NULL`.
-| nesting | [`uint2`] | The nesting level of this LIR node.
-| operator_id_start | [`uint8`] | The first dataflow operator ID implementing this LIR operator (inclusive).
-| operator_id_end | [`uint8`] | The first dataflow operator ID _after_ this LIR operator (exclusive).
+| Field | Type | Meaning
+| --------- | -------- | -----------
+| `global_id` | [`text`] | The global ID.
+| `lir_id` | [`uint8`] | The LIR node ID.
+| `operator` | [`text`] | The LIR operator, in the format `OperatorName INPUTS [OPTIONS]`.
+| `parent_lir_id` | [`uint8`] | The parent of this LIR node. May be `NULL`.
+| `nesting` | [`uint2`] | The nesting level of this LIR node.
+| `operator_id_start` | [`uint8`] | The first dataflow operator ID implementing this LIR operator (inclusive).
+| `operator_id_end` | [`uint8`] | The first dataflow operator ID _after_ this LIR operator (exclusive).
 
 <!-- RELATION_SPEC_UNDOCUMENTED mz_introspection.mz_compute_lir_mapping_per_worker -->
