src/TreeSitter-FAST-Utils/TSFASTImporter.class.st
Lines changed: 38 additions & 6 deletions
@@ -1,14 +1,29 @@
 "
+## Description
+
 I am a generic importer for a FAST model.
 
 I create all the nodes and relations of the FAST model, taking a root node as parameter.
 
 The model I produce matches the Tree-sitter AST exactly, but I have a subclass that allows tweaking the generated model.
 
-Implementation details:
+## Implementation details
+
+### Context
+
 - The context contains the stack of all ""parent"" elements of the node currently being visited.
 - #currentFMProperty is either nil or an FMProperty. When it is a property, the nodes being visited belong to a field of their parent whose name matches a contained-entities property of the FAST entity, so we keep it in order to store the children in this property instead of the generic one.
 - #containedEntitiesPropertiesMap caches, for each FAST class, the possible children properties, for performance reasons.
+
+### Source positions management
+
+Tree-sitter provides the positions of the nodes in the parsed string as byte offsets, but the current implementation of FAST requires positions in characters.
+In the original implementation, we computed, for each node, the character positions from its start and end byte positions.
+
+We now take a different approach: we know that the source code we provide to Tree-sitter is encoded in UTF-8.
+With this information, we build a map, cached in #bytesToCharactersMap, that associates the index of each leading byte with the index of the corresponding character.
+
+This lets us build the index map only once and then simply use it to convert byte positions into character positions, which speeds up the import considerably.
 "
 Class {
 	#name : 'TSFASTImporter',
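The #containedEntitiesPropertiesMap bullet describes a per-class cache. A minimal Pharo playground sketch of that caching pattern follows, assuming a Dictionary keyed by class and filled lazily with `Dictionary>>#at:ifAbsentPut:`. The real lookup of children properties is not shown in this diff, so `instVarNames` is only a stand-in, and the names `cache`, `lookupCount` and `childPropertiesFor` are hypothetical.

```smalltalk
"Hypothetical sketch of a per-class cache, not the code of this commit.
instVarNames is only a stand-in for the real (unshown) lookup of children properties."
| cache lookupCount childPropertiesFor |
cache := Dictionary new.
lookupCount := 0.
childPropertiesFor := [ :aClass |
	"The block given to at:ifAbsentPut: only runs the first time a class is seen."
	cache at: aClass ifAbsentPut: [
		lookupCount := lookupCount + 1.
		aClass instVarNames ] ].
childPropertiesFor value: OrderedCollection.
childPropertiesFor value: OrderedCollection.
lookupCount. "→ 1: the second call is answered from the cache"
```

The expensive computation happens once per class, and every later node of the same class only pays for a dictionary lookup.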
@@ -19,12 +34,28 @@ Class {
 		'originString',
 		'containedEntitiesPropertiesMap',
 		'context',
-		'currentFMProperty'
+		'currentFMProperty',
+		'bytesToCharactersMap'
 	],
 	#category : 'TreeSitter-FAST-Utils',
 	#package : 'TreeSitter-FAST-Utils'
 }
 
+{ #category : 'private' }
+TSFASTImporter >> bytesToCharacterMap [
+	"We assume that the string is UTF-8 encoded in the FAST importer. If we parse a file in UTF-16 or another encoding, we should decode it and re-encode it in UTF-8.
+
+	In Famix we cannot do that because the source code lives in files, but in FAST we keep the source code in a Pharo string, which makes it possible."
+	"I take a ByteArray as parameter and fill a dictionary associating, for each character, the index of its leading byte with the index of the character."
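The body of #bytesToCharacterMap is cut off in this view. As a hedged illustration of the technique the class comment describes, the Pharo playground snippet below builds such a map from a UTF-8 ByteArray. It assumes 1-based byte and character indices and relies on the UTF-8 rule that continuation bytes have the bit pattern `10xxxxxx`, so every other byte starts a new character. The block name `bytesToCharactersMapFor` is hypothetical, and this is a sketch, not the commit's implementation.

```smalltalk
"Hypothetical sketch, not the method from this commit.
Maps the 1-based index of every UTF-8 leading byte to the 1-based index of the character it starts."
| bytesToCharactersMapFor map |
bytesToCharactersMapFor := [ :aByteArray |
	| result charIndex |
	result := Dictionary new.
	charIndex := 0.
	aByteArray withIndexDo: [ :byte :byteIndex |
		"Continuation bytes match 2r10xxxxxx; any other byte begins a new character."
		(byte bitAnd: 2r11000000) = 2r10000000 ifFalse: [
			charIndex := charIndex + 1.
			result at: byteIndex put: charIndex ] ].
	result ].
"$é takes two bytes in UTF-8, so 'héllo' is 6 bytes but 5 characters."
map := bytesToCharactersMapFor value: 'héllo' utf8Encoded.
map at: 4. "→ 3 (the first $l)"
"Assuming Tree-sitter's 0-based startByte, a node starting at byte offset 3 starts at character (map at: 3 + 1)."
```

Continuation bytes get no entry, so looking up a byte index that falls inside a multi-byte character raises KeyNotFound instead of silently returning a wrong position. A Dictionary follows the method comment; an Array indexed by byte would also work and avoid hashing, at the cost of a placeholder slot for every continuation byte.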