
Commit 3f70476

ATM: Optimize body tokens by pushing in size limit
Pushing the 256-token restriction into the `bodyTokens` predicate prevents that predicate from blowing up on very large functions. Measured on a problematic repo on my machine, this reduces the runtime from more than 1800s (I stopped the query before it finished) to 294s.
1 parent 4aacba8 commit 3f70476


javascript/ql/experimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll

Lines changed: 9 additions & 6 deletions
```diff
@@ -133,6 +133,15 @@ module FunctionBodies {
     // Performance optimization: Restrict the set of entities to those containing an endpoint to featurize.
     entity =
       getRepresentativeEntityForEndpoint(any(FeaturizationConfig cfg).getAnEndpointToFeaturize()) and
+    // Performance optimization: If a function has more than 256 body tokens, then featurize it as
+    // absent. This approximates the behavior of the classifer on non-generic body features where
+    // large body features are replaced by the absent token.
+    //
+    // We count nodes instead of tokens because tokens are often not unique.
+    strictcount(DatabaseFeatures::AstNode node |
+        DatabaseFeatures::astNodes(entity, _, _, node, _) and
+        exists(string t | DatabaseFeatures::nodeAttributes(node, t))
+      ) <= 256 and
     exists(DatabaseFeatures::AstNode node |
       DatabaseFeatures::astNodes(entity, _, _, node, _) and
       token = unique(string t | DatabaseFeatures::nodeAttributes(node, t)) and
@@ -146,12 +155,6 @@ module FunctionBodies {
    * This is a string containing natural language tokens in the order that they appear in the source code for the entity.
    */
   string getBodyTokenFeatureForEntity(DatabaseFeatures::Entity entity) {
-    // If a function has more than 256 body subtokens, then featurize it as absent. This
-    // approximates the behavior of the classifer on non-generic body features where large body
-    // features are replaced by the absent token.
-    //
-    // We count locations instead of tokens because tokens are often not unique.
-    strictcount(Location l | bodyTokens(entity, l, _)) <= 256 and
     result =
       strictconcat(string token, Location l |
         bodyTokens(entity, l, token)
```
