Commit ff6df77

enh(mathematica) Much improved highlighting for Wolfram (#2706)
Fix several issues and implement additional features for the Wolfram Language (Mathematica):

- Include an up-to-date list of built-in symbols in a separate `lib/mathematica.js` file. It has one keyword per line and is easier to maintain.
- Fix the regexp that identifies symbols/variables, which require special treatment and do not follow the common `IDENT_RE` matching.
- Replace generic `C_NUMBER_MODE` matching with dedicated regular expressions for all possible numbers in Mathematica.
- Include named characters in the matching of symbols.
- Allow for dedicated styling of
  - pattern-like forms, e.g. `par_String`
  - slots of anonymous functions, e.g. `##3`
  - message names, e.g. `myFunc::usage`
  - braces, curly braces and brackets
- Introduce `classNameAliases` to map specific styles to general styles used by all themes. This allows for using built-in themes and writing sophisticated Mathematica themes.
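
As a rough illustration of the special forms named in the commit message (pattern-like forms, slots, message names, named characters), the sketch below classifies single tokens with anchored regular expressions. Everything in it is hypothetical and deliberately simplified: the `IDENT` helper, the `FORMS` table, the `classify` function and the class names `pattern` and `named-character` are assumptions for illustration, not the expressions or names shipped in this commit.

```javascript
// Simplified identifier: a letter or "$", then letters, digits or "$".
// (Assumption: the real grammar also handles named characters and contexts.)
const IDENT = "[A-Za-z$][A-Za-z0-9$]*";

// Hypothetical table of special forms, checked in order.
const FORMS = [
  // myFunc::usage
  ["message-name", new RegExp(`^${IDENT}::${IDENT}$`)],
  // par_String, x_, x__, _Integer
  ["pattern", new RegExp(`^(?:${IDENT})?_{1,3}(?:${IDENT})?$`)],
  // #, #2, ##, ##3
  ["slot", /^#{1,2}\d*$/],
  // \[Gamma]
  ["named-character", /^\\\[[A-Za-z][A-Za-z0-9]*\]$/],
];

// Return the name of the first form that matches the whole token, or null.
function classify(token) {
  for (const [name, re] of FORMS) {
    if (re.test(token)) return name;
  }
  return null;
}

console.log(classify("par_String"));    // "pattern"
console.log(classify("##3"));           // "slot"
console.log(classify("myFunc::usage")); // "message-name"
console.log(classify("\\[Gamma]"));     // "named-character"
console.log(classify("foo"));           // null
```

A real grammar would express these as modes with `className` and `begin` rather than as a standalone classifier; the table form is used here only to keep the example self-contained.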
1 parent 67b83be commit ff6df77

10 files changed: +6931 -92 lines changed

AUTHORS.txt

Lines changed: 1 addition & 0 deletions
@@ -305,4 +305,5 @@ Contributors:
 - Jonathan Sharpe <[email protected]>
 - Michael Rush <[email protected]>
 - Florian Bezdeka <[email protected]>
+- Patrick Scheibe <[email protected]>
 - Kyle Brown <kylebrown9@github>

CHANGES.md

Lines changed: 11 additions & 0 deletions
@@ -23,12 +23,21 @@ Language Improvements:
 - enh(php) highlight variables (#2785) [Taufik Nurrohman][]
 - fix(python) Handle comments on decorators (#2804) [Jonathan Sharpe][]
 - enh(diff) improve highlighting of diff for git patches [Florian Bezdeka][]
+- enh(mathematica) Rework entire implementation [Patrick Scheibe][]
+  - Correct matching of the many variations of Mathematica's numbers
+  - Matching of named-characters aka special symbols like `\[Gamma]`
+  - Updated list of version 12.1 built-in symbols
+  - Matching of patterns, slots, message-names and braces
 
 Dev Improvements:
 
 - chore(dev) add theme picker to the tools/developer tool (#2770) [Josh Goebel][]
 - fix(dev) the Vue.js plugin no longer throws an exception when hljs is not in the global namespace [Kyle Brown][]
 
+Parser:
+
+- enh(grammars) allow `classNameAliases` for more complex grammars [Josh Goebel][]
+
 New themes:
 
 - *StackOverflow Dark* by [Jan Pilzer][]
@@ -41,8 +50,10 @@ New themes:
 [Jan Pilzer]: https://github.com/Hirse
 [Jonathan Sharpe]: https://github.com/textbook
 [Michael Rush]: https://github.com/rushimusmaximus
+[Patrick Scheibe]: https://github.com/halirutan
 [Kyle Brown]: https://github.com/kylebrown9
 
+
 ## Version 10.3.1
 
 Prior version let some look-behind regex sneak in, which does not work

docs/mode-reference.rst

Lines changed: 82 additions & 35 deletions
@@ -1,4 +1,4 @@
-Mode reference
+Mode Reference
 ==============
 
 Types
@@ -23,29 +23,81 @@ Types of attributes values in this reference:
 +------------+-------------------------------------------------------------------------------------+
 
 
-Attributes
-----------
+Language Only Attributes
+------------------------
+
+These attributes are only valid at the language level (ie, they many only exist on the top-most language object and have no meaning if specified in children modes).
+
+
+name
+^^^^
+
+- **type**: string
+
+The canonical name of this language, ie "JavaScript", etc.
+
 
 case_insensitive
 ^^^^^^^^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Case insensitivity of language keywords and regexps. Used only on the top-level mode.
 
 
 aliases
 ^^^^^^^
 
-**type**: array
+- **type**: array
 
 A list of additional names (besides the canonical one given by the filename) that can be used to identify a language in HTML classes and in a call to :ref:`getLanguage <getLanguage>`.
 
 
+classNameAliases
+^^^^^^^^^^^^^^^^
+
+- **type**: object
+
+A mapping table of any custom class names your grammar uses and their supported equivalencies. Perhaps your language has a concept of "slots" that roughly correspond to variables in other languages. This allows you to write grammar code like:
+
+::
+
+  {
+    classNameAliases: {
+      slot: "variable",
+      "message-name": "string"
+    },
+    contains: [
+      {
+        className: "slot",
+        begin: // ...
+      }
+    ]
+  }
+
+The final HTML output will render slots with the CSS class as ``hljs-variable``. This feature exists to make it easier for grammar maintainers to think in their own language when maintaining a grammar.
+
+For a list of all supported class names please see the :doc:`CSS class reference
+</css-classes-reference>`.
+
+
+disableAutodetect
+^^^^^^^^^^^^^^^^^
+
+- **type**: boolean
+
+Disables autodetection for this language.
+
+
+
+Mode Attributes
+---------------
+
+
 className
 ^^^^^^^^^
 
-**type**: identifier
+- **type**: identifier
 
 The name of the mode. It is used as a class name in HTML markup.
 
@@ -56,16 +108,16 @@ for one thing like string in single or double quotes.
 begin
 ^^^^^
 
-**type**: regexp
+- **type**: regexp
 
 Regular expression starting a mode. For example a single quote for strings or two forward slashes for C-style comments.
 If absent, ``begin`` defaults to a regexp that matches anything, so the mode starts immediately.
 
 
 on:begin
-^^^^^^^^^^^
+^^^^^^^^
 
-**type**: callback (matchData, response)
+- **type**: callback (matchData, response)
 
 This callback is triggered the moment a begin match is detected. ``matchData`` includes the typical regex match data; the full match, match groups, etc. The ``response`` object is used to tell the parser how it should handle the match. It can be also used to temporarily store data.
 
@@ -78,7 +130,7 @@ For an example of usage see ``END_SAME_AS_BEGIN`` in ``modes.js``.
 end
 ^^^
 
-**type**: regexp
+- **type**: regexp
 
 Regular expression ending a mode. For example a single quote for strings or "$" (end of line) for one-line comments.
 
@@ -93,9 +145,9 @@ This is achieved with :ref:`endsWithParent <endsWithParent>` attribute.
 
 
 on:end
-^^^^^^^^^^^
+^^^^^^
 
-**type**: callback (matchData, response)
+- **type**: callback (matchData, response)
 
 This callback is triggered the moment an end match is detected. ``matchData`` includes the typical regex match data; the full match, match groups, etc. The ``response`` object is used to tell the parser how it should handle the match. It can also be used to retrieve data stored from a `begin` callback.
 
@@ -106,9 +158,9 @@ For an example of usage see ``END_SAME_AS_BEGIN`` in ``modes.js``.
 
 
 beginKeywords
-^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^
 
-**type**: string
+- **type**: string
 
 Used instead of ``begin`` for modes starting with keywords to avoid needless repetition:
 
@@ -140,7 +192,7 @@ Ex. ``class A { ... }`` would match while ``A.class == B.class`` would not.
 endsWithParent
 ^^^^^^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 A flag showing that a mode ends when its parent ends.
 
@@ -169,7 +221,7 @@ This is when ``endsWithParent`` comes into play:
 endsParent
 ^^^^^^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Forces closing of the parent mode right after the current mode is closed.
 
@@ -215,7 +267,7 @@ endSameAsBegin (deprecated as of 10.1)
 ``END_SAME_AS_BEGIN`` mode or use the ``on:begin`` and ``on:end`` attributes to
 build more complex paired matchers.
 
-**type**: boolean
+- **type**: boolean
 
 Acts as ``end`` matching exactly the same string that was found by the
 corresponding ``begin`` regexp.
@@ -244,7 +296,7 @@ and ``endSameAsBegin: true``.
 lexemes (now keywords.$pattern)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-**type**: regexp
+- **type**: regexp
 
 A regular expression that extracts individual "words" from the code to compare
 against :ref:`keywords <keywords>`. The default value is ``\w+`` which works for
@@ -260,7 +312,7 @@ constant that you repeat multiple times within different modes of your grammar.
 keywords
 ^^^^^^^^
 
-**type**: object
+- **type**: object / string
 
 Keyword definition comes in two forms:
 
@@ -273,7 +325,7 @@ For detailed explanation see :doc:`Language definition guide </language-guide>`.
 illegal
 ^^^^^^^
 
-**type**: regexp
+- **type**: regexp
 
 A regular expression that defines symbols illegal for the mode.
 When the parser finds a match for illegal expression it immediately drops parsing the whole language altogether.
@@ -282,7 +334,7 @@ When the parser finds a match for illegal expression it immediately drops parsin
 excludeBegin, excludeEnd
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Exclude beginning or ending lexemes out of mode's generated markup. For example in CSS syntax a rule ends with a semicolon.
 However visually it's better not to color it as the rule contents. Having ``excludeEnd: true`` forces a ``<span>`` element for the rule to close before the semicolon.
@@ -291,7 +343,7 @@ However visually it's better not to color it as the rule contents. Having ``excl
 returnBegin
 ^^^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Returns just found beginning lexeme back into parser. This is used when beginning of a sub-mode is a complex expression
 that should not only be found within a parent mode but also parsed according to the rules of a sub-mode.
@@ -302,7 +354,7 @@ Since the parser is effectively goes back it's quite possible to create a infini
 returnEnd
 ^^^^^^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Returns just found ending lexeme back into parser. This is used for example to parse JavaScript embedded into HTML.
 A JavaScript block ends with the HTML closing tag ``</script>`` that cannot be parsed with JavaScript rules.
@@ -314,15 +366,15 @@ Since the parser is effectively goes back it's quite possible to create a infini
 contains
 ^^^^^^^^
 
-**type**: array
+- **type**: array
 
 The list of sub-modes that can be found inside the mode. For detailed explanation see :doc:`Language definition guide </language-guide>`.
 
 
 starts
 ^^^^^^
 
-**type**: identifier
+- **type**: identifier
 
 The name of the mode that will start right after the current mode ends. The new mode won't be contained within the current one.
 
@@ -333,7 +385,7 @@ Tags ``<script>`` and ``<style>`` start sub-modes that use another language defi
 variants
 ^^^^^^^^
 
-**type**: array
+- **type**: array
 
 Modification to the main definitions of the mode, effectively expanding it into several similar modes
 each having all the attributes from the main definition augmented or overridden by the variants::
@@ -366,10 +418,11 @@ Further info: https://github.com/highlightjs/highlight.js/issues/826
 
 .. _subLanguage:
 
+
 subLanguage
 ^^^^^^^^^^^
 
-**type**: string or array
+- **type**: string or array
 
 Highlights the entire contents of the mode with another language.
 
@@ -381,10 +434,11 @@ The value of the attribute controls which language or languages will be used for
 * empty array: auto detection with all the languages available
 * array of language names: auto detection constrained to the specified set
 
+
 skip
 ^^^^
 
-**type**: boolean
+- **type**: boolean
 
 Skips any markup processing for the mode ensuring that it remains a part of its
 parent buffer along with the starting and the ending lexemes. This works in
@@ -407,10 +461,3 @@ handle pairs of ``/* .. */`` to correctly find the ending ``?>``::
 Without ``skip: true`` every comment would cause the parser to drop out back
 into the HTML mode.
 
-disableAutodetect
-^^^^^^^^^^^^^^^^^
-
-**type**: boolean
-
-Disables autodetection for this language.
-
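
The `on:begin` and `on:end` entries in the reference above describe storing data when a mode starts and reading it back when a candidate end is found. The sketch below shows that pairing in isolation. It assumes the `response` object exposes a `data` bag and an `ignoreMatch()` method; the `makeResponse` stub and the `pairedDelimiter` mode are hypothetical stand-ins so the snippet runs without the library, and the regular expressions are illustrative only.

```javascript
// Hypothetical stub of the parser's response object.
// Assumption: callbacks can store values on `data` and call `ignoreMatch()`.
function makeResponse(data) {
  return {
    data,
    ignored: false,
    ignoreMatch() { this.ignored = true; },
  };
}

// A mode fragment whose end must repeat the label captured at its beginning.
const pairedDelimiter = {
  begin: /<<(\w+)/,
  end: /^(\w+)$/,
  "on:begin": (matchData, response) => {
    // Remember the captured label for the matching end check.
    response.data.beginLabel = matchData[1];
  },
  "on:end": (matchData, response) => {
    // Reject an end candidate whose label differs from the stored one.
    if (response.data.beginLabel !== matchData[1]) response.ignoreMatch();
  },
};

// Simulate the parser: one begin match, then two end candidates.
const shared = {};
pairedDelimiter["on:begin"](pairedDelimiter.begin.exec("<<EOT"), makeResponse(shared));

const wrongEnd = makeResponse(shared);
pairedDelimiter["on:end"](pairedDelimiter.end.exec("END"), wrongEnd);

const rightEnd = makeResponse(shared);
pairedDelimiter["on:end"](pairedDelimiter.end.exec("EOT"), rightEnd);

console.log(wrongEnd.ignored); // true
console.log(rightEnd.ignored); // false
```

In the library itself the parser creates and passes the response object; the manual calls here only imitate that flow.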

src/highlight.js

Lines changed: 3 additions & 2 deletions
@@ -174,7 +174,8 @@ const HLJS = function(hljs) {
         buf = "";
 
         relevance += keywordRelevance;
-        emitter.addKeyword(match[0], kind);
+        const cssClass = language.classNameAliases[kind] || kind;
+        emitter.addKeyword(match[0], cssClass);
       } else {
         buf += match[0];
       }
@@ -225,7 +226,7 @@ const HLJS = function(hljs) {
    */
   function startNewMode(mode) {
     if (mode.className) {
-      emitter.openNode(mode.className);
+      emitter.openNode(language.classNameAliases[mode.className] || mode.className);
     }
     top = Object.create(mode, { parent: { value: top } });
     return top;
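
Both changed call sites in `src/highlight.js` use the same lookup: take the alias if one is registered, otherwise fall back to the original name. The sketch below isolates that lookup with a recording emitter. The `makeRecordingEmitter` and `resolveClass` helpers and the sample `language` object are hypothetical; the only parts taken from the diff are the `classNameAliases[name] || name` expression and the `openNode` / `addKeyword` method names. Note that the diff indexes `language.classNameAliases` without a guard, so the sketch assumes it is always an object (possibly empty).

```javascript
// Hypothetical emitter stub that records the calls it receives.
function makeRecordingEmitter() {
  const calls = [];
  return {
    calls,
    openNode(cssClass) { calls.push(["openNode", cssClass]); },
    addKeyword(text, cssClass) { calls.push(["addKeyword", text, cssClass]); },
  };
}

// Same expression as in the diff: alias if present, else the name itself.
// Assumes `language.classNameAliases` is an object; an undefined value would throw.
function resolveClass(language, name) {
  return language.classNameAliases[name] || name;
}

// Sample language using the aliases from the documentation example.
const language = {
  classNameAliases: { slot: "variable", "message-name": "string" },
};

const emitter = makeRecordingEmitter();

// A mode with a custom class name is opened under its alias.
emitter.openNode(resolveClass(language, "slot"));
// A keyword kind without an alias keeps its own name.
emitter.addKeyword("Plot", resolveClass(language, "built_in"));

console.log(emitter.calls);
// [ [ 'openNode', 'variable' ], [ 'addKeyword', 'Plot', 'built_in' ] ]
```

Because the fallback uses `||`, an alias that maps to an empty string would also fall through to the original name.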
