Commit 09872b4

Merge pull request #91 from moosetechnology/blog
New blog post on model creation optimization
2 parents 8a217f4 + d5e440f commit 09872b4

5 files changed

+109
-0
lines changed

@@ -0,0 +1,109 @@
---
authors:
- CyrilFerlicot
title: "Speed up models creation: application to JSON/MSE parsing"
date: 2025-11-18
lastUpdated: 2025-11-18
tags:
- infrastructure
- optimization
---

import { Code } from '@astrojs/starlight/components';

## Context

To work with Moose, there is one prerequisite we cannot avoid: we need a model to analyze. This can be achieved in two main ways:
- Importing an existing JSON/MSE file containing a model
- Importing a model via a Moose importer such as the Pharo importer or Python importer

While doing this, we create a lot of entities and set a lot of relations, which can take some time. While profiling a JSON import, I found out that it took even longer than I anticipated.

Here is the result of profiling the import of a 330MB JSON file on a MacBook Pro M1 from 2023:

![Image of a profiling](./img/posts/2025-11-18-speedup-import-json-mse/profiler_full_before.png)

From this profiling we can see that the import takes 351 seconds. We can find more information in this report:

![Image of a profiling 2](./img/posts/2025-11-18-speedup-import-json-mse/profiler_2_before.png)

On this screenshot we can see some noise because the profiler was not adapted to the new event listening loop of Pharo. But in the leaves we can also see that most of the time is spent in `FMSlotMultivalueLink>>#indexOf:startingAt:ifAbsent:`.

This method is used by the mechanism behind all instance variables declared as `FMMany`, because we do not want duplicated elements in them. Thus, we check whether the collection already contains the element before adding it.

But during the import of a JSON file, we should have no duplicates, which makes this check useless. This also explains why we spend so much time in this method: we are always in the worst-case scenario, where no element matches and the whole collection has to be scanned.

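To get a feel for the cost of that check, here is a minimal playground sketch. It is not the Fame implementation: a plain `OrderedCollection` stands in for the multivalued link, and the element count is arbitrary. It only illustrates that checking for presence before each addition scans the whole collection when the element is absent, so the total work grows quadratically with the number of elements.

```smalltalk
"Illustrative micro-benchmark, assuming a plain OrderedCollection as a stand-in
 for a multivalued link. The size 20000 is an arbitrary choice."
| elements checked unchecked timeChecked timeUnchecked |
elements := (1 to: 20000) asArray.

"With the duplicate check: every #includes: scans the full collection,
 because the element is never already there (worst case)."
checked := OrderedCollection new.
timeChecked := [
	elements do: [ :each |
		(checked includes: each) ifFalse: [ checked addLast: each ] ] ] timeToRun.

"Without the check: each addition is a simple append."
unchecked := OrderedCollection new.
timeUnchecked := [
	elements do: [ :each | unchecked addLast: each ] ] timeToRun.

"Inspect both durations side by side."
{ timeChecked. timeUnchecked }
```

Both collections end up with the same content; only the time spent differs.
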
## The optimization

To speed up the creation of a model when we know we will not create any duplicates, we can disable the check.

For this, we can use a dynamic variable that declares that we should check for duplicated elements by default, while allowing us to disable the check during the execution of some code.

:::note[Note]
If you do not know what a Dynamic Variable is, you can check [the Pharo wiki](https://github.com/pharo-open-documentation/pharo-wiki/blob/master/PharoProjects/DynamicVariables.md).
:::

```smalltalk
DynamicVariable << #FMShouldCheckForDuplicatedEntitiesInMultivalueLinks
	slots: {};
	tag: 'Utilities';
	package: 'Fame-Core'
```

```smalltalk
FMShouldCheckForDuplicatedEntitiesInMultivalueLinks>>#default
	^ true
```

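As a quick sanity check, the scoping of the variable can be observed from a playground. This is a small sketch that assumes the class above is loaded in the image; it relies only on the class-side `value` and `value:during:` messages of `DynamicVariable`.

```smalltalk
"Assumes FMShouldCheckForDuplicatedEntitiesInMultivalueLinks (defined above)
 is loaded in the image."
| outside inside after |

"No value is bound here, so #default answers."
outside := FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value.

"Inside the block the variable is bound to false.
 #value:during: answers the result of the block."
inside := FMShouldCheckForDuplicatedEntitiesInMultivalueLinks
	value: false
	during: [ FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value ].

"Once the block has finished, the binding is gone and #default answers again."
after := FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value.

{ outside. inside. after }
```

The check is therefore only disabled for the code running inside the block, and everything else keeps the safe default.
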
And now that we have the variable, we can use it:

```diff lang="smalltalk"
FMSlotMultivalueLink >> unsafeAdd: element
-	(self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ]
+	FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
+		ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
+		ifFalse: [ self uncheckUnsafeAdd: element ]
```

```diff lang="smalltalk"
FMMultivalueLink >> unsafeAdd: element
-	(self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ]
+	FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
+		ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
+		ifFalse: [ self uncheckUnsafeAdd: element ]
```

The last step is to disable the check during MSE/JSON parsing:

```diff lang="smalltalk"
FMMSEParser >> basicRun
-	self Document.
-	self atEnd ifFalse: [ ^ self syntaxError ]
+	FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [
+		self Document.
+		self atEnd ifFalse: [ ^ self syntaxError ] ]
```

## Result of the optimization

Now let's try to import the same JSON file with the optimization enabled:

![Image of a profiling](./img/posts/2025-11-18-speedup-import-json-mse/profiler_full_after.png)

![Image of a profiling 2](./img/posts/2025-11-18-speedup-import-json-mse/profiler_2_after.png)

We can see that the import time went from 351 seconds to 113 seconds, about 3 times faster!

We can also notice that we no longer have a single bottleneck in our parsing. This means that it will be harder to optimize this task further (even if some people still have ideas on how to do that).

## Use this optimization in your project

This optimization was made for the import of JSON, but it can be used in other contexts.
For example, in the Moose Python importer, the implementation is sure to never produce a duplicate. Thus, we could use the same trick this way:

```smalltalk
FamixPythonImporter >> import
	FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [ super import ]
```
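
Before adopting the trick in your own importer, it can be worth measuring what it brings on your data. The following playground sketch is one way to do so; `importBlock` is a hypothetical placeholder for your own model creation code, and the sketch assumes Fame with the change above is loaded. Keep in mind that the check should only be disabled when the code never adds the same element twice, otherwise duplicates will end up in the collections.

```smalltalk
"importBlock is a hypothetical placeholder: put your own model creation code in it."
| importBlock timeWithCheck timeWithoutCheck |
importBlock := [ "run your importer here" ].

"Default behaviour: duplicates are checked on every addition."
timeWithCheck := importBlock timeToRun.

"Same code, with the duplicate check disabled for its whole execution."
timeWithoutCheck := [
	FMShouldCheckForDuplicatedEntitiesInMultivalueLinks
		value: false
		during: importBlock ] timeToRun.

"Inspect both durations side by side."
{ timeWithCheck. timeWithoutCheck }
```

A single run of each variant is only a rough indication, since caches and garbage collection can influence the timings.
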