
Commit 468b57b

Add anchors for web developers to dfns extracts (#1875)
For context, see the discussion starting at: mdn/browser-compat-data#23958 (comment)

Some specs, such as DOM, Encoding, and HTML, contain sections targeted at web developers. These sections re-define terms normatively defined elsewhere in a more developer-friendly way. Terms re-defined in these sections are good targets for documentation but did not appear in the definitions extracts.

This update makes Reffy parse "for web developers" sections and extract the links that complete the definitions they contain. This is a prerequisite to publishing a package with definitions that could be used to validate URLs in BCD and web-features, as envisioned in: w3c/webref#1198 (comment)

The links are recorded in a `links` property attached to the base definition that each link completes. The `links` property is an array of objects, each with `id`, `href`, `type`, `name` and `heading` properties:

- `type` is always set to `"dev"`.
- `name` contains the text content of the enclosing `<dt>`.
- `heading` contains the heading of the section where the anchor is defined (it may differ from the heading of the section where the underlying definition appears).

A definition may have more than one dev link. That's normal: it typically happens when the underlying definition is for a mixin included in multiple interfaces, as for `TextDecoderCommon` attributes in the Encoding spec.

Some links for developers target definitions in external specs. They are ignored for now.

Worth noting:

- Ideally, spec authoring tools would provide better support for this pattern, giving these links more stable IDs than `ref-for-[foo][number]` and possibly creating proper dfns themselves. If they do, processing may need to be adjusted. Updating tools and specs will take time though.
- The key marker for sections targeted at web developers is the use of a `domintro` class. A few specs, however, use `domintro` in normative definition lists (shape-detection-api, image-capture, mediastream-recording). That's probably unintentional; I'll look into fixing the specs. The code skips `domintro` sections that look suspicious (those whose `<dt>` contains a `dfn`).
- This would add **2815 links** to the dfns extracts (for ~50000 definitions).
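The `links` structure described above could be consumed along the lines of the following sketch, which collects the URLs a tool (for instance one validating BCD or web-features URLs) might accept for each definition. The sample entries and the `collectValidUrls` helper are hypothetical; they only assume the `href` and `links` properties described in this commit. The first entry carries two dev links, mimicking the mixin case.

```javascript
// Hypothetical dfns extract entries. All ids, URLs and titles are made up
// for illustration; real extracts contain more properties per definition.
const dfns = [
  {
    id: 'dom-mixin-attr',
    href: 'https://example.org/spec/#dom-mixin-attr',
    links: [
      {
        type: 'dev',
        id: 'ref-for-dom-mixin-attr',
        name: 'a . attr',
        href: 'https://example.org/spec/#ref-for-dom-mixin-attr',
        heading: { href: 'https://example.org/spec/#interface-a', title: 'Interface A' }
      },
      {
        type: 'dev',
        id: 'ref-for-dom-mixin-attr1',
        name: 'b . attr',
        href: 'https://example.org/spec/#ref-for-dom-mixin-attr1',
        heading: { href: 'https://example.org/spec/#interface-b', title: 'Interface B' }
      }
    ]
  },
  { id: 'other', href: 'https://example.org/spec/#other', links: [] }
];

// Gather every URL a documentation tool could treat as a valid target:
// each definition's own href plus the hrefs of its "dev" links.
function collectValidUrls(definitions) {
  const urls = new Set();
  for (const dfn of definitions) {
    urls.add(dfn.href);
    // `links` may be missing in extracts produced before this change.
    for (const link of dfn.links ?? []) {
      if (link.type === 'dev') {
        urls.add(link.href);
      }
    }
  }
  return urls;
}

const validUrls = collectValidUrls(dfns);
console.log(validUrls.size); // 4
console.log(validUrls.has('https://example.org/spec/#ref-for-dom-mixin-attr1')); // true
```

Using a `Set` keeps the lookup cheap when checking many documentation URLs against the ~50000 definitions.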
1 parent 0036e84 commit 468b57b


6 files changed, +175 -27 lines


schemas/browserlib/extract-dfns.json

Lines changed: 31 additions & 17 deletions
@@ -2,6 +2,21 @@
   "$schema": "http://json-schema.org/schema#",
   "$id": "https://github.com/w3c/reffy/blob/main/schemas/browserlib/extract-dfns.json",
 
+  "$defs": {
+    "heading": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["href", "title"],
+      "properties": {
+        "id": { "$ref": "../common.json#/$defs/id" },
+        "href": { "$ref": "../common.json#/$defs/url" },
+        "title": { "type": "string" },
+        "number": { "$ref": "../common.json#/$defs/headingNumber" },
+        "alternateIds": { "type": "array", "items": { "$ref": "../common.json#/$defs/id"} }
+      }
+    }
+  },
+
   "type": "array",
   "items": {
     "type": "object",
@@ -24,19 +39,14 @@
         "enum": [
           "property", "descriptor", "value", "type",
           "at-rule", "function", "selector",
-
           "namespace", "interface", "constructor", "method", "argument",
           "attribute", "callback", "dictionary", "dict-member", "enum",
           "enum-value", "exception", "const", "typedef", "stringifier",
           "serializer", "iterator", "maplike", "setlike", "extended-attribute",
           "event", "permission",
-
           "element", "element-state", "element-attr", "attr-value",
-
           "cddl-module", "cddl-type", "cddl-parameter", "cddl-key", "cddl-value",
-
           "scheme", "http-header",
-
           "grammar", "abstract-op", "dfn"
         ],
         "$comment": "Types taken from src/browserlib/extract-dfns.mjs"
@@ -52,21 +62,25 @@
       "informative": {
         "type": "boolean"
       },
-      "heading": {
-        "type": "object",
-        "additionalProperties": false,
-        "required": ["href", "title"],
-        "properties": {
-          "id": { "$ref": "../common.json#/$defs/id" },
-          "href": { "$ref": "../common.json#/$defs/url" },
-          "title": { "type": "string" },
-          "number": { "$ref": "../common.json#/$defs/headingNumber" },
-          "alternateIds": { "type": "array", "items": { "$ref": "../common.json#/$defs/id"} }
-        }
-      },
+      "heading": { "$ref": "#/$defs/heading" },
       "definedIn": {
         "type": "string"
       },
+      "links": {
+        "type": "array",
+        "items": {
+          "type": "object",
+          "additionalProperties": false,
+          "required": ["type", "id", "href", "name"],
+          "properties": {
+            "type": { "type": "string", "enum": ["dev"] },
+            "id": { "$ref": "../common.json#/$defs/id" },
+            "name": { "type": "string" },
+            "href": { "$ref": "../common.json#/$defs/url" },
+            "heading": { "$ref": "#/$defs/heading" }
+          }
+        }
+      },
       "htmlProse": {
         "type": "string",
         "minLength": 1

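To make the constraints of the new `links` item schema concrete, here is a hand-written check that mirrors them: the four required properties, `additionalProperties: false`, and `type` restricted to `"dev"`. Note that `heading` is allowed but not listed in `required`. This is a hypothetical illustration only (`checkDevLink` is not part of Reffy); an actual validation run would feed the schema file to a JSON Schema validator, and the sketch skips the id, URL and heading formats referenced through `$ref`.

```javascript
// Properties allowed and required on each `links` item, copied from the
// schema above. Format checks delegated to common.json ($defs/id, $defs/url)
// and to #/$defs/heading are intentionally not reproduced here.
const ALLOWED_KEYS = new Set(['type', 'id', 'name', 'href', 'heading']);
const REQUIRED_KEYS = ['type', 'id', 'href', 'name'];

// Returns a list of human-readable problems for one link (empty if valid).
function checkDevLink(link) {
  if (link === null || typeof link !== 'object' || Array.isArray(link)) {
    return ['link must be an object'];
  }
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in link)) {
      problems.push(`missing required property "${key}"`);
    }
  }
  // Mirrors "additionalProperties": false.
  for (const key of Object.keys(link)) {
    if (!ALLOWED_KEYS.has(key)) {
      problems.push(`unexpected property "${key}"`);
    }
  }
  // Mirrors "enum": ["dev"] on `type`.
  if ('type' in link && link.type !== 'dev') {
    problems.push(`type must be "dev", got ${JSON.stringify(link.type)}`);
  }
  if ('name' in link && typeof link.name !== 'string') {
    problems.push('name must be a string');
  }
  return problems;
}

const ok = { type: 'dev', id: 'foo-dev', name: 'Foo', href: 'about:blank#foo-dev' };
console.log(checkDevLink(ok).length); // 0

const bad = { type: 'spec', id: 'foo-dev', href: 'about:blank#foo-dev' };
console.log(checkDevLink(bad).join('; '));
// missing required property "name"; type must be "dev", got "spec"
```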
src/browserlib/extract-dfns.mjs

Lines changed: 48 additions & 4 deletions
@@ -1,5 +1,6 @@
 import extractWebIdl from './extract-webidl.mjs';
 import informativeSelector from './informative-selector.mjs';
+import getAbsoluteUrl from './get-absolute-url.mjs';
 import {parse} from "../../node_modules/webidl2/index.js";
 /**
  * Extract definitions in the spec that follow the "Definitions data model":
@@ -276,7 +277,11 @@ function definitionMapper(el, idToHeading, usesDfnDataModel) {
     // Enclosing element under which the definition appears. Value can be one of
     // "dt", "pre", "table", "heading", "note", "example", or "prose" (last one
     // indicates that definition appears in the main body of the specification)
-    definedIn
+    definedIn,
+
+    // Important links that complement the definition
+    // (typically: anchors in "for web developers" sections)
+    links: []
   };
 
   // Extract a prose definition in HTML for the term, if available
@@ -322,14 +327,14 @@ export default function (spec, idToHeading = {}) {
       break;
   }
 
-  const definitions = [...document.querySelectorAll(definitionsSelector)];
-  const usesDfnDataModel = definitions.some(dfn =>
+  const dfnEls = [...document.querySelectorAll(definitionsSelector)];
+  const usesDfnDataModel = dfnEls.some(dfn =>
     dfn.hasAttribute('data-dfn-type') ||
     dfn.hasAttribute('data-dfn-for') ||
     dfn.hasAttribute('data-export') ||
     dfn.hasAttribute('data-noexport'));
 
-  return definitions
+  const definitions = dfnEls
     .map(node => {
       // 2021-06-21: Temporary preprocessing of invalid "idl" dfn type (used for
       // internal slots) while fix for https://github.com/w3c/respec/issues/3644
@@ -365,6 +370,45 @@ export default function (spec, idToHeading = {}) {
     })
     .map(node => definitionMapper(node, idToHeading, usesDfnDataModel))
     .filter(isNotAlreadyExported);
+
+  // Some specs have informative "For web developers" sections targeted at
+  // presenting concepts to web developers. These sections contain anchors
+  // that are useful for documentation purpose. The anchors themselves are
+  // references to terms defined elsewhere in the spec. We will capture them in
+  // a `links` property attached to the underlying definition.
+  // Note: Ideally, `.domintro` would be added to the informative selector list
+  // but some specs use `.domintro` for lists that define IDL terms. We'll get
+  // rid of them by skipping lists that have `dfn`.
+  const devSelector = '.domintro dt:not(dt:has(dfn)) a[id]';
+  for (const node of [...document.querySelectorAll(devSelector)]) {
+    const dfnHref = getAbsoluteUrl(node, { attribute: 'href' });
+    const dfn = definitions.find(d => d.href === dfnHref);
+    if (dfn) {
+      const href = getAbsoluteUrl(node);
+      const page = node.closest('[data-reffy-page]')?.getAttribute('data-reffy-page');
+      dfn.links.push({
+        type: 'dev',
+        id: node.getAttribute('id'),
+        name: normalize(node.closest('dt').textContent),
+        href,
+        heading: idToHeading[href] ?? {
+          href: (new URL(page ?? window.location.href)).toString(),
+          title: document.title
+        }
+      });
+    }
+    else {
+      // When an interface inherits from another, the reference may target
+      // a base dfn in another spec. For example:
+      // https://encoding.spec.whatwg.org/#ref-for-dom-generictransformstream-readable
+      // ... targets the Streams spec. There aren't many occurrences of this
+      // pattern and the occurrences do not look super interesting to link to
+      // from a documentation perspective. Let's skip them.
+      console.warn('[reffy]', `Dev dfn ${node.textContent} (${node.id}) targets unknown/external dfn at ${node.href}`);
+    }
+  }
+
+  return definitions;
 }
 
 function preProcessEcmascript() {
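The matching step in the loop above can be sketched without a DOM: each dev anchor's fragment reference is resolved to an absolute URL, looked up among the extracted definitions by `href`, and either attached as a `dev` link or skipped as unknown/external. This is a hypothetical simplification, not Reffy's code: `attachDevLinks`, `fragmentUrl` and `normalizeText` are illustrative names, `anchors` are plain `{ id, href, dtText }` objects standing in for the `.domintro dt:not(dt:has(dfn)) a[id]` nodes, `pageUrl` stands in for `page ?? window.location.href`, and non-fragment hrefs are simply treated as external.

```javascript
// Stand-in for Reffy's normalize() helper; its behavior is assumed here
// to be "collapse whitespace runs and trim".
function normalizeText(str) {
  return str.replace(/\s+/g, ' ').trim();
}

// Absolute URL for a fragment id on a page, percent-encoding the id the way
// get-absolute-url.mjs does.
function fragmentUrl(pageUrl, id) {
  const url = new URL(pageUrl);
  url.hash = '#' + encodeURIComponent(id);
  return url.toString();
}

// Hypothetical DOM-free version of the dev links loop.
// Returns the ids of anchors that were skipped.
function attachDevLinks(definitions, anchors, pageUrl, idToHeading, docTitle) {
  // Index definitions by href, keeping the first match like Array#find would.
  const byHref = new Map();
  for (const d of definitions) {
    if (!byHref.has(d.href)) byHref.set(d.href, d);
  }
  const skipped = [];
  for (const anchor of anchors) {
    // Only same-document fragment refs ("#foo") can match a local definition.
    const dfn = anchor.href.startsWith('#')
      ? byHref.get(fragmentUrl(pageUrl, anchor.href.substring(1)))
      : undefined;
    if (!dfn) {
      skipped.push(anchor.id); // unknown or external dfn: ignored for now
      continue;
    }
    const href = fragmentUrl(pageUrl, anchor.id);
    dfn.links.push({
      type: 'dev',
      id: anchor.id,
      name: normalizeText(anchor.dtText),
      href,
      // Fall back to the page itself when no heading is known for the anchor.
      heading: idToHeading[href] ?? { href: new URL(pageUrl).toString(), title: docTitle }
    });
  }
  return skipped;
}

const definitions = [{ href: 'https://example.org/spec/#foo', links: [] }];
const anchors = [
  { id: 'foo-dev', href: '#foo', dtText: '  Fou . C .\n  Foo ' },
  { id: 'ext-dev', href: 'https://other.example/#bar', dtText: 'Bar' }
];
const skipped = attachDevLinks(definitions, anchors, 'https://example.org/spec/', {}, 'Spec');
console.log(skipped); // [ 'ext-dev' ]
console.log(definitions[0].links[0].href); // https://example.org/spec/#foo-dev
```

Indexing definitions in a `Map` is a design choice of this sketch; the committed code uses `definitions.find()`, which is simpler but rescans the list for every anchor.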

src/browserlib/get-absolute-url.mjs

Lines changed: 6 additions & 1 deletion
@@ -18,7 +18,12 @@ export default function (node, { singlePage, attribute } =
   const url = new URL(page ?? window.location.href);
   const hashid = node.getAttribute(attribute);
   if (hashid) {
-    url.hash = '#' + encodeURIComponent(hashid);
+    let fragment = hashid;
+    if (hashid.match(/^#/) && attribute === 'href') {
+      // Function is called to turn a fragment ref into an absolute URL
+      fragment = hashid.substring(1);
+    }
+    url.hash = '#' + encodeURIComponent(fragment);
   }
   return url.toString();
 }
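Without this change, calling the function on an anchor's `href` (which starts with `#`) would percent-encode the `#` itself, so the resulting URL could never match a definition's `href`. The standalone sketch below reproduces both behaviors with the standard `URL` class; `absoluteUrlFor` and `oldAbsoluteUrlFor` are hypothetical names that take a page URL and an attribute value instead of a DOM node, and `startsWith('#')` is used in place of the equivalent `match(/^#/)`.

```javascript
// Sketch of the updated fragment handling: values read from an `href`
// attribute ("#foo") lose their leading "#" before being percent-encoded.
function absoluteUrlFor(pageUrl, value, attribute) {
  const url = new URL(pageUrl);
  let fragment = value;
  if (value.startsWith('#') && attribute === 'href') {
    fragment = value.substring(1);
  }
  url.hash = '#' + encodeURIComponent(fragment);
  return url.toString();
}

// Previous behavior, for comparison: the "#" itself is encoded as "%23".
function oldAbsoluteUrlFor(pageUrl, value) {
  const url = new URL(pageUrl);
  url.hash = '#' + encodeURIComponent(value);
  return url.toString();
}

const page = 'https://example.org/spec/';
console.log(absoluteUrlFor(page, '#foo', 'href'));  // https://example.org/spec/#foo
console.log(oldAbsoluteUrlFor(page, '#foo'));       // https://example.org/spec/#%23foo
console.log(absoluteUrlFor(page, 'foo-dev', 'id')); // https://example.org/spec/#foo-dev
```

Restricting the stripping to `attribute === 'href'` keeps the default `id` path unchanged, so existing definition URLs are not affected.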

test/crawl-test.json

Lines changed: 6 additions & 3 deletions
@@ -47,7 +47,8 @@
           "href": "https://w3c.github.io/woff/woff2/",
           "title": "WOFF2"
         },
-        "definedIn": "prose"
+        "definedIn": "prose",
+        "links": []
       }
     ],
     "events": [],
@@ -124,7 +125,8 @@
           "href": "https://w3c.github.io/mediacapture-output/#title",
           "title": "No Title"
         },
-        "definedIn": "pre"
+        "definedIn": "pre",
+        "links": []
       },
       {
         "id": "dom-foo-bar",
@@ -146,7 +148,8 @@
           "href": "https://w3c.github.io/mediacapture-output/#title",
           "title": "No Title"
         },
-        "definedIn": "pre"
+        "definedIn": "pre",
+        "links": []
       }
     ],
     "events": [],

test/extract-dfns.js

Lines changed: 82 additions & 1 deletion
@@ -95,7 +95,8 @@ const baseDfn = {
   heading: {
     href: 'about:blank',
     title: ''
-  }
+  },
+  links: []
 };
 const tests = [
   {title: "parses a simple <dfn>",
@@ -787,6 +788,86 @@ When initialize(<var>newItem</var>) is called, the following steps are run:</p>`
         'There is one web'
       ],
     }]
+  },
+
+  {
+    title: "extracts links for web developers",
+    html: `<p><dfn id='foo' data-dfn-type='dfn'>Foo</dfn></p>
+      <div class="domintro">
+        <dl>
+          <dt><a id="foo-dev" href="#foo">Foo</a></dt>
+          <dd>Blah</dd>
+        </dl>
+      </div>`,
+    changesToBaseDfn: [
+      {
+        links: [
+          {
+            type: 'dev',
+            id: 'foo-dev',
+            name: 'Foo',
+            href: 'about:blank#foo-dev',
+            heading: {
+              href: 'about:blank',
+              title: ''
+            }
+          }
+        ]
+      }
+    ]
+  },
+
+  {
+    title: "extracts heading info for links for web developers",
+    html: `<p><dfn id='foo' data-dfn-type='interface' data-dfn-for="Cest" data-lt="Fou">Foo</dfn></p>
+      <section id="foo-sec">
+        <h3>Foo section</h3>
+        <dl class="domintro">
+          <dt>Fou . C . <a id="foo-dev" href="#foo">Foo</a></dt>
+          <dd>Blah</dd>
+        </dl>
+      </section>`,
+    changesToBaseDfn: [
+      {
+        type: 'interface',
+        access: 'public',
+        for: ['Cest'],
+        linkingText: ['Fou'],
+        links: [
+          {
+            type: 'dev',
+            id: 'foo-dev',
+            name: 'Fou . C . Foo',
+            href: 'about:blank#foo-dev',
+            heading: {
+              href: 'about:blank#foo-sec',
+              id: 'foo-sec',
+              title: 'Foo section'
+            }
+          }
+        ]
+      }
+    ]
+  },
+
+  {
+    title: "ignores sections for web developers that contain dfns",
+    html: `<p><dfn id='foo' data-dfn-type='dfn'>Foo</dfn></p>
+      <dl class="domintro">
+        <dt>
+          <dfn id="bar" data-dfn-type='dfn'>Bar</dfn>
+          <a id="foo-dev" href="#foo">Foo</a></dt>
+        <dd>Blah</dd>
+      </dl>`,
+    changesToBaseDfn: [
+      {},
+      {
+        id: 'bar',
+        href: 'about:blank#bar',
+        linkingText: ['Bar'],
+        definedIn: 'dt'
+      }
+    ]
   }
 ];
 
test/generate-idlparsed.js

Lines changed: 2 additions & 1 deletion
@@ -45,7 +45,8 @@ intraface foo {};
       type: type.split(' ')[0],
       for: [],
       access: 'public',
-      informative: false
+      informative: false,
+      links: []
     }],
     idl: `${type} foo {};`
   };
