Commit 03d6bdc

Add notes on tokenization
1 parent 24ae1bf commit 03d6bdc

2 files changed: 7 additions, 3 deletions

README.md

Lines changed: 6 additions & 0 deletions
@@ -106,6 +106,12 @@ I personally use the [One Dark Pro](https://marketplace.visualstudio.com/items?i

 Replace `/server/scripts/nwscript.nss` by its new version, `/server/scripts/base_scripts/` files by their new versions, `/server/scripts/ovr/` includes by their new versions and execute `yarn run generate-lib-defs` in the server root directory.

+## Notes
+
+The language symbols, or tokens, are not generated from an AST as language servers usually do. Instead, the NWScript Language Server relies on its TextMate grammar, which is derived from C's, to transform a file of code into tokens. This works well in most cases because NWScript is a simple scripting language built on C, although even for such a language we need to cheat and use lookahead and lookbehind strategies to ensure we are in the right context. It will, however, fail for complex or uncommon code structures and styles. A TextMate grammar will never cover the most extreme cases of a language grammar, whereas an AST represents the hierarchical structure of a file of code in a much more complete and precise way.
+
+Implementing a language parser to build its AST is a lot of work, and none was available at the time I implemented this project. Now that the NWScript compiler has been made [public](https://github.com/niv/neverwinter.nim), it would be much easier to create a utility responsible for parsing a file of code and generating its AST. Implementing this utility and refactoring the whole tokenization engine of the Language Server is, however, a non-negligible amount of work. Since the current solution works well for common use, I do not intend to do it.
+
 ## Known issues

 The nwnsc process doesn't terminate on linux. This is caused by the [compiler](https://github.com/nwneetools/nwnsc) itself, not the extension.

server/src/Tokenizer/Tokenizer.ts

Lines changed: 1 addition & 3 deletions
@@ -28,6 +28,7 @@ export type LocalScopeTokenizationResult = {

 // Naive implementation
 // Ideally we would use an AST tree
+// See the Notes section of the README for the explanation
 export default class Tokenizer {
   private readonly registry: Registry;
   private grammar: IGrammar | null = null;
@@ -223,7 +224,6 @@ export default class Tokenizer {
 for (let tokenIndex = 0; tokenIndex < tokensArray.length; tokenIndex++) {
   const token = tokensArray[tokenIndex];

-  // STRUCT PROPERTIES
   if (currentStruct) {
     if (token.scopes.includes(LanguageScopes.blockTermination)) {
       scope.structComplexTokens.push(currentStruct);
@@ -315,7 +315,6 @@ export default class Tokenizer {
 for (let tokenIndex = 0; tokenIndex < tokensArray.length; tokenIndex++) {
   const token = tokensArray[tokenIndex];

-  // VARIABLE
   if (computeFunctionLocals && this.isLocalVariable(tokenIndex, token, tokensArray)) {
     const complexToken = {
       position: { line: lineIndex, character: token.startIndex },
@@ -347,7 +346,6 @@ export default class Tokenizer {
     }
   }

-  // FUNCTION PARAM
   if (computeFunctionLocals && token.scopes.includes(LanguageScopes.functionParameter)) {
     scope.functionVariablesComplexTokens.push({
       position: { line: lineIndex, character: token.startIndex },
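
To illustrate the limitation described in the Notes added to the README, here is a minimal sketch of scope-based detection with a lookbehind. It is not the project's `Tokenizer`: the `Token` shape, the scope names, `isDeclaration` and `findDeclarations` are all hypothetical, and whitespace tokens are omitted for brevity.

```typescript
// Hypothetical token shape, loosely modelled on what a TextMate tokenizer returns.
type Token = { text: string; scopes: string[] };

// Hypothetical scope names; the real grammar's scope names may differ.
const Scopes = {
  type: "storage.type",
  variable: "variable.other",
} as const;

// A flat token stream has no notion of "declaration", so we look behind:
// an identifier counts as a declaration only if the previous token is a type keyword.
function isDeclaration(index: number, tokens: Token[]): boolean {
  const token = tokens[index];
  const previous = tokens[index - 1];
  return (
    token.scopes.includes(Scopes.variable) &&
    previous !== undefined &&
    previous.scopes.includes(Scopes.type)
  );
}

// Collect the names of all identifiers that the lookbehind rule accepts.
function findDeclarations(tokens: Token[]): string[] {
  return tokens.filter((_, i) => isDeclaration(i, tokens)).map((t) => t.text);
}

// Tokens for: int nCount = 5; nCount = 6;
const simple: Token[] = [
  { text: "int", scopes: [Scopes.type] },
  { text: "nCount", scopes: [Scopes.variable] },
  { text: "=", scopes: [] },
  { text: "5", scopes: [] },
  { text: ";", scopes: [] },
  { text: "nCount", scopes: [Scopes.variable] },
  { text: "=", scopes: [] },
  { text: "6", scopes: [] },
  { text: ";", scopes: [] },
];

// Tokens for: int nA, nB;
const list: Token[] = [
  { text: "int", scopes: [Scopes.type] },
  { text: "nA", scopes: [Scopes.variable] },
  { text: ",", scopes: [] },
  { text: "nB", scopes: [Scopes.variable] },
  { text: ";", scopes: [] },
];

console.log(findDeclarations(simple)); // ["nCount"]
console.log(findDeclarations(list)); // ["nA"]
```

The second case shows the kind of gap the Notes mention: a rule that only checks the previous token misses `nB` in `int nA, nB;`, whereas an AST would expose both declarators directly.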
