Commit 52c679d

nakamura-to and claude authored
refactor: Optimize ExpressionTokenizer with improved performance and code clarity (#1367)
* test: Add comprehensive test case for complex mixed expressions in ExpressionTokenizer

  Add testComplexMixedExpression to verify proper tokenization of expressions containing:
  - Method chaining (user.getName().length())
  - Comparison operators (>, >=)
  - Logical operators (&&)
  - Mixed operand types

  This improves test coverage for real-world expression parsing scenarios.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Create Classic versions of ExpressionTokenizer for preservation

  Add ClassicExpressionTokenizer and ClassicExpressionTokenizerTest as copies of the current implementation. This preserves the existing tokenizer logic before refactoring, allowing for comparison and fallback if needed.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* perf: Add JMH benchmark for ExpressionTokenizer performance comparison

  Add ExpressionTokenizerBenchmark.java to measure performance differences between ExpressionTokenizer and ClassicExpressionTokenizer implementations.

  The benchmark includes test scenarios for:
  - Simple expressions (property access)
  - Complex expressions (multiple operators and method calls)
  - Literal-heavy expressions (strings, numbers, booleans)
  - Function-heavy expressions (built-in and static method calls)
  - Operator-heavy expressions (arithmetic and logical operators)
  - Whitespace and string literal handling

  This enables data-driven optimization decisions for the tokenizer.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Optimize ExpressionTokenizer.peekOneChar with character-based branching

  - Replace if-else chains with efficient switch statements for better performance
  - Group binary operators and single-character tokens using character-based branching
  - Extract complex parsing logic into specialized handler methods
  - Add explicit whitespace character cases to avoid method call overhead
  - Maintain 100% API compatibility while improving parsing efficiency

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Optimize peek methods with streamlined conditional logic

  - Simplify peek() method by reorganizing control flow for better readability
  - Inline isWordTerminated() calls in literal checking methods for efficiency
  - Remove redundant nested conditions in peekFiveChars, peekFourChars, and peekThreeChars
  - Maintain identical functionality while reducing method call overhead

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Remove duplicated buffer and simplify token preparation logic in ExpressionTokenizer

  - Eliminated `duplicatedBuf` field to reduce memory redundancy.
  - Replaced duplicated buffer logic with `substring` operations using `tokenStartIndex`.
  - Improved code maintainability while preserving existing functionality.

* refactor: Optimize peek method with reduced nesting and improved documentation

  - Reduced deep nesting in peek() method using early returns for better readability
  - Added comprehensive comments explaining the parsing logic and optimizations
  - Improved character-by-character parsing flow with clear decision points
  - Enhanced keyword detection for 'new', 'null', 'true', and 'false' literals
  - Optimized identifier processing with early bailout for common cases
  - Maintained backward compatibility while improving code maintainability

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Improve encapsulation by making internal members private in ExpressionTokenizer

  - Changed all protected fields to private as they are not accessed externally
  - Changed all protected methods to private as they are only used internally
  - Maintained public API methods (constructor, next, getToken, getPosition, setPosition)
  - Enhanced code maintainability and encapsulation without breaking functionality
  - No inheritance or external field access found in codebase analysis

  Fields changed to private:
  - expression, buf, type, token, position, tokenStartIndex, binaryOpAvailable

  Methods changed to private:
  - prepareToken, peek, peekTwoChars, peekOneChar, peekStaticMember, peekNumber, isWordTerminated

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Improve variable naming consistency in ExpressionTokenizer

  - Renamed character variables to use consistent c1, c2, c3, c4, c5 naming pattern
  - Enhanced code readability by using descriptive variable names throughout
  - Maintained functional behavior while improving code consistency
  - Applied consistent naming across all character handling methods

  This change improves code maintainability by using a unified naming convention for character variables, making the parsing logic easier to follow and understand.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* refactor: Expand wildcard imports in ExpressionTokenizer for better code clarity

  Replace wildcard static imports with explicit imports for all 38 ExpressionTokenType constants and 1 AssertionUtil method to improve code readability and make dependencies explicit.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
1 parent fb6d67d commit 52c679d
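
The commit message describes two techniques: character-based `switch` branching for one- and two-character tokens, and producing token text with `substring` from a `tokenStartIndex` rather than a second buffer. The following is a minimal sketch of those ideas only. It is not the Doma `ExpressionTokenizer`: the class name `SwitchTokenizerSketch`, its methods, and the simplified token rules (no string literals, keywords, or `@` functions) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual Doma ExpressionTokenizer. */
final class SwitchTokenizerSketch {

  private final String expression;
  private int position; // index of the next unread character

  SwitchTokenizerSketch(String expression) {
    this.expression = expression;
  }

  /** Returns the next token text, or null at the end of the expression. */
  String next() {
    // Explicit whitespace cases avoid a method call per character.
    while (position < expression.length()) {
      char c = expression.charAt(position);
      if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
        break;
      }
      position++;
    }
    if (position >= expression.length()) {
      return null;
    }

    int tokenStartIndex = position; // token text is cut from here later
    char c1 = expression.charAt(position++);
    switch (c1) {
      case '>':
      case '<':
      case '=':
      case '!':
        // Optional '=' suffix forms >=, <=, ==, !=
        if (position < expression.length() && expression.charAt(position) == '=') {
          position++;
        }
        break;
      case '&':
      case '|':
        // A doubled character forms && or ||
        if (position < expression.length() && expression.charAt(position) == c1) {
          position++;
        }
        break;
      default:
        // Identifiers and integer literals; any other character stays a one-char token.
        if (Character.isLetterOrDigit(c1)) {
          while (position < expression.length()
              && Character.isLetterOrDigit(expression.charAt(position))) {
            position++;
          }
        }
        break;
    }
    // No duplicated buffer: slice the original expression directly.
    return expression.substring(tokenStartIndex, position);
  }

  /** Collects all tokens of the given expression into a list. */
  static List<String> tokenize(String expression) {
    SwitchTokenizerSketch tokenizer = new SwitchTokenizerSketch(expression);
    List<String> tokens = new ArrayList<>();
    for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
      tokens.add(t);
    }
    return tokens;
  }
}
```

For example, `SwitchTokenizerSketch.tokenize("age >= 18 && active")` yields the tokens `age`, `>=`, `18`, `&&`, `active`. Whether a `switch` actually outperforms an if-else chain depends on the JIT and the input mix, which is the question the JMH benchmark in this commit is meant to answer.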

5 files changed: +1820 -242 lines changed

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+/*
+ * Copyright Doma Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.seasar.doma.internal.expr;
+
+import static org.seasar.doma.internal.expr.ExpressionTokenType.EOE;
+
+import java.util.concurrent.TimeUnit;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.infra.Blackhole;
+
+/**
+ * JMH benchmark comparing performance between ExpressionTokenizer and ClassicExpressionTokenizer.
+ */
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.NANOSECONDS)
+@State(Scope.Benchmark)
+public class ExpressionTokenizerBenchmark {
+
+  // Test expression samples representing different complexity levels
+  private static final String SIMPLE_EXPRESSION = "employee.name";
+
+  private static final String COMPLEX_EXPRESSION =
+      "employee.getName().trim().toUpperCase() == \"JOHN\" && "
+          + "(employee.getAge() >= 18 && employee.getAge() <= 65) || "
+          + "employee.getStatus() != null && employee.getStatus().isActive()";
+
+  private static final String LITERAL_HEAVY_EXPRESSION =
+      "\"Hello World\" + 123 + 45.67F + 89.01D + 123L + 456B + "
+          + "true && false || null != \"test string with spaces\"";
+
+  private static final String FUNCTION_HEAVY_EXPRESSION =
+      "@prefix(name, \"Mr.\") + @suffix(name, \"Jr.\") + "
+          + "@java.lang.String@valueOf(age) + @java.lang.Math@max(10, 20) + "
+          + "@isEmpty(list) || @isNotEmpty(collection)";
+
+  private static final String OPERATOR_HEAVY_EXPRESSION =
+      "!active && (x > 0 || y < 0) && z >= 10 && w <= 20 && "
+          + "a == b && c != d && (e + f - g * h / i % j) > 0";
+
+  @Param({
+    "SIMPLE",
+    "COMPLEX",
+    "LITERAL_HEAVY",
+    "FUNCTION_HEAVY",
+    "OPERATOR_HEAVY",
+  })
+  private String expressionType;
+
+  private String expression;
+
+  @Setup
+  public void setup() {
+    switch (expressionType) {
+      case "SIMPLE":
+        expression = SIMPLE_EXPRESSION;
+        break;
+      case "COMPLEX":
+        expression = COMPLEX_EXPRESSION;
+        break;
+      case "LITERAL_HEAVY":
+        expression = LITERAL_HEAVY_EXPRESSION;
+        break;
+      case "FUNCTION_HEAVY":
+        expression = FUNCTION_HEAVY_EXPRESSION;
+        break;
+      case "OPERATOR_HEAVY":
+        expression = OPERATOR_HEAVY_EXPRESSION;
+        break;
+      default:
+        throw new IllegalArgumentException("Unknown expression type: " + expressionType);
+    }
+  }
+
+  @Benchmark
+  public void optimizedTokenizer(Blackhole bh) {
+    ExpressionTokenizer tokenizer = new ExpressionTokenizer(expression);
+    while (tokenizer.next() != EOE) {
+      bh.consume(tokenizer.getToken());
+    }
+  }
+
+  @Benchmark
+  public void classicTokenizer(Blackhole bh) {
+    ClassicExpressionTokenizer tokenizer = new ClassicExpressionTokenizer(expression);
+    while (tokenizer.next() != EOE) {
+      bh.consume(tokenizer.getToken());
+    }
+  }
+}
