
Conversation

@lukasz-stec
Contributor

The cache is not thread safe, but it was used in a multithreaded context, making it possible for the Interval.of method to return invalid values.

This is a proposed fix for #4901.

Signed-off-by: lukasz-stec <[email protected]>
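For context, a minimal sketch of the unsafe lazily-filled cache pattern the commit message describes. This is not the actual ANTLR source; the names and the pool bound are simplified for illustration:

```java
// Simplified sketch of a lazily-filled, unsynchronized interval cache.
// Not the real ANTLR code; class and field names are illustrative only.
public class IntervalCacheSketch {
	static class Interval {
		int a, b; // non-final: unsafe publication can expose default values
		Interval(int a, int b) { this.a = a; this.b = b; }
	}

	static final int POOL_MAX = 1000;
	static final Interval[] cache = new Interval[POOL_MAX + 1];

	// Data race: two threads can both see cache[a] == null and both write it,
	// and a reader can observe a partially constructed Interval because the
	// fields are not final and there is no synchronization barrier.
	static Interval of(int a, int b) {
		if (a != b || a < 0 || a > POOL_MAX) return new Interval(a, b);
		if (cache[a] == null) cache[a] = new Interval(a, a);
		return cache[a];
	}

	public static void main(String[] args) {
		Interval i = of(5, 5);
		System.out.println(i.a + "," + i.b); // 5,5 in a single-threaded run
	}
}
```

Single-threaded, the check-then-act is harmless; the bug only appears when multiple threads race through `of` concurrently.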
@ericvergnaud
Contributor

Hey, thanks for this. Have you considered the option of prepopulating the cache? It sounds cheap and likely to preserve performance.

@lukasz-stec
Contributor Author

> Hey, thanks for this. Have you considered the option of prepopulating the cache? It sounds cheap and likely to preserve performance.

Yeah, it is one of the options I included in #4901. I think it is not a bad idea if we want to keep the cache; the memory overhead would be small. The same goes for doing it during class initialization. The alternative is to make the Interval fields final.

That said, I would be surprised if the cache brings noticeable performance benefits, because allocation and GC of short-lived objects in Java are cheap, and in many places the construction of Interval is likely inlined and "scalar replaced" anyway, so the allocation is avoided entirely.
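For illustration, a minimal sketch of the prepopulation option discussed above, assuming a simplified Interval and a hypothetical pool bound. Filling the array once in the static initializer means no thread ever writes to it afterwards, so the data race disappears:

```java
// Hedged sketch of the "prepopulate the cache" option: fill the array
// eagerly during class initialization. Names and bounds are illustrative.
public class EagerIntervalCache {
	static final class Interval {
		final int a, b;
		Interval(int a, int b) { this.a = a; this.b = b; }
	}

	static final int POOL_MAX = 1000;
	// Filled once here; the class-initialization barrier guarantees every
	// thread sees the fully populated array.
	static final Interval[] CACHE = new Interval[POOL_MAX + 1];
	static {
		for (int i = 0; i <= POOL_MAX; i++) CACHE[i] = new Interval(i, i);
	}

	static Interval of(int a, int b) {
		if (a != b || a < 0 || a > POOL_MAX) return new Interval(a, b);
		return CACHE[a]; // read-only after class init: thread safe
	}
}
```

The trade-off is eagerly allocating all pool entries (about a thousand small objects here) whether or not they are ever used.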

@ericvergnaud
Contributor

I guess you'd need to measure performance rather than make assumptions?
I suspect the boost comes from comparisons rather than allocation.
There are often large numbers of single-token intervals, so there is much less to allocate, and comparing pointers is much faster than comparing values.

@ericvergnaud
Contributor

ericvergnaud commented Nov 17, 2025

Making the fields final is also certainly helpful, although I doubt we have any code writing to them? If we do, it should be changed (that would deserve a separate PR).

@KvanTTT
Member

KvanTTT commented Nov 17, 2025

As far as I can see, performance might be improved by using the primitive long type (something like a value class in Kotlin) for the Interval instead of a class. The interval holds only two int values, which can be packed into a single long. In that case the cache would not be needed at all.
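A rough sketch of this packing idea; `PackedInterval` and its methods are hypothetical names, not an existing API:

```java
// Hedged sketch: encode an interval's two int bounds in one primitive long,
// avoiding both allocation and any cache. All names are hypothetical.
public final class PackedInterval {
	private PackedInterval() {}

	// Pack a into the high 32 bits and b into the low 32 bits.
	public static long of(int a, int b) {
		return ((long) a << 32) | (b & 0xFFFFFFFFL);
	}

	// Arithmetic shift restores the sign of a; the narrowing cast restores b.
	public static int a(long interval) { return (int) (interval >> 32); }
	public static int b(long interval) { return (int) interval; }

	public static void main(String[] args) {
		long iv = of(-3, 1000);
		System.out.println(a(iv) + ".." + b(iv)); // -3..1000
	}
}
```

A side benefit is that value semantics come for free: two packed intervals with the same bounds compare equal with `==`. The cost is an API change, since every `Interval`-typed field and signature would have to become `long`.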

@lukasz-stec
Contributor Author

> I guess you'd need to measure performance rather than make assumptions?
> I suspect the boost comes from comparisons rather than allocation.
> There are often large numbers of single-token intervals, so there is much less to allocate, and comparing pointers is much faster than comparing values.

Are there any benchmarks that I can run? I'm new to the codebase, so any help is appreciated.

> Making the fields final is also certainly helpful, although I doubt we have any code writing to them? If we do, it should be changed (that would deserve a separate PR).

If we make the fields final, it should fix the issue, because the Interval instance is then published safely. According to the Java memory model, final fields must be initialized before the reference to the object becomes visible to other threads, unless the reference leaks from the constructor.
IMO it would then need a reproduction to confirm the fix works, and that may not be easy to do, especially because the problem happens only on Graviton.
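A minimal sketch of the final-field guarantee being discussed (JLS §17.5); the class and field names are hypothetical:

```java
// Sketch of safe publication via final fields under the Java memory model.
// A thread that obtains a reference to a FinalInterval (not leaked from its
// constructor) is guaranteed to see fully initialized a and b.
public class SafePublication {
	static final class FinalInterval {
		final int a;
		final int b;
		// JLS §17.5: the freeze at the end of the constructor means readers
		// can never observe the default value 0 in a or b.
		FinalInterval(int a, int b) { this.a = a; this.b = b; }
	}

	// Shared without synchronization; safe only because the fields are final.
	static FinalInterval shared;

	public static void main(String[] args) throws InterruptedException {
		Thread writer = new Thread(() -> shared = new FinalInterval(7, 9));
		writer.start();
		writer.join();
		System.out.println(shared.a + "," + shared.b); // 7,9
	}
}
```

Note that a race on the `shared` reference itself remains (a reader may see `null`); what final fields rule out is seeing a non-null reference to a half-initialized object, which is exactly the failure mode of the lazy cache.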

@kaby76
Contributor

kaby76 commented Nov 17, 2025

The scientific method should be employed rather than guessing.

Here is a graph comparing the effect of the PR for two SQL grammars (mysql and plsql). I didn't do a t-test because the two grammars appeared to have opposite speed-up effects. For mysql, the PR appears to have no statistical effect, while for plsql it does, although the effect is quite small.

[graph "times": before/after parse-time distributions for the mysql and plsql grammars]

I may write a Bash/Octave script that goes through all grammars in grammars-v4 and determines which grammars exhibit statistical differences in performance for the PR.

plsql-before.txt
plsql-after.txt
mysql-before.txt
mysql-after.txt
gr.sh.txt
te.sh.txt

@lukasz-stec
Contributor Author

I prepared a small JMH benchmark for Interval creation directly (code and full results below). The results show that when Interval creation is not inlined, the cache is a bit faster, and when the code is inlined, direct object creation is a bit faster. In both cases the cost is about 1 ns per operation. When the object is actually allocated, there is also an indirect GC cost.

Benchmark                                                        (maxInterval)  Mode  Cnt      Score     Error   Units
BenchmarkInterval.intervalInlineWithCache                                 1000  avgt   10      1.645 ±   0.004   ns/op
BenchmarkInterval.intervalInlineWithCache:gc.alloc.rate                   1000  avgt   10      0.013 ±   0.001  MB/sec
BenchmarkInterval.intervalInlineWithCache:gc.alloc.rate.norm              1000  avgt   10     ≈ 10⁻⁵              B/op
BenchmarkInterval.intervalInlineWithCache:gc.count                        1000  avgt   10        ≈ 0            counts
BenchmarkInterval.intervalInlineWithoutCache                              1000  avgt   10      1.270 ±   0.003   ns/op
BenchmarkInterval.intervalInlineWithoutCache:gc.alloc.rate                1000  avgt   10      0.013 ±   0.001  MB/sec
BenchmarkInterval.intervalInlineWithoutCache:gc.alloc.rate.norm           1000  avgt   10     ≈ 10⁻⁵              B/op
BenchmarkInterval.intervalInlineWithoutCache:gc.count                     1000  avgt   10        ≈ 0            counts
BenchmarkInterval.intervalWithCache                                       1000  avgt   10      0.844 ±   0.003   ns/op
BenchmarkInterval.intervalWithCache:gc.alloc.rate                         1000  avgt   10      0.013 ±   0.001  MB/sec
BenchmarkInterval.intervalWithCache:gc.alloc.rate.norm                    1000  avgt   10     ≈ 10⁻⁵              B/op
BenchmarkInterval.intervalWithCache:gc.count                              1000  avgt   10        ≈ 0            counts
BenchmarkInterval.intervalWithoutCache                                    1000  avgt   10      1.361 ±   0.023   ns/op
BenchmarkInterval.intervalWithoutCache:gc.alloc.rate                      1000  avgt   10  16818.338 ± 276.260  MB/sec
BenchmarkInterval.intervalWithoutCache:gc.alloc.rate.norm                 1000  avgt   10     24.000 ±   0.001    B/op
BenchmarkInterval.intervalWithoutCache:gc.count                           1000  avgt   10     96.000            counts
BenchmarkInterval.intervalWithoutCache:gc.time                            1000  avgt   10     49.000                ms
package org.antlr.v4.test.tool;

import org.antlr.v4.runtime.misc.Interval;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.ChainedOptionsBuilder;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.VerboseMode;

import java.security.SecureRandom;
import java.time.LocalDateTime;
import java.util.concurrent.TimeUnit;

import static java.lang.String.format;
import static java.time.format.DateTimeFormatter.ISO_DATE_TIME;

@SuppressWarnings("MethodMayBeStatic")
@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class BenchmarkInterval
{
	private static final int INTERVALS = 1_000_000;

	@Benchmark
	@OperationsPerInvocation(INTERVALS)
	public void intervalWithCache(BenchmarkData data, Blackhole bh)
	{
		for (int i = 0; i < data.intervals.length; i++) {
			bh.consume(Interval.of(data.intervals[i], data.intervals[i]));
		}
	}

	@Benchmark
	@OperationsPerInvocation(INTERVALS)
	public void intervalWithoutCache(BenchmarkData data, Blackhole bh)
	{
		for (int i = 0; i < data.intervals.length; i++) {
			bh.consume(new Interval(data.intervals[i], data.intervals[i]));
		}
	}

	@Benchmark
	@OperationsPerInvocation(INTERVALS)
	public void intervalInlineWithCache(BenchmarkData data, Blackhole bh)
	{
		long result = 0;
		for (int i = 0; i < data.intervals.length; i++) {
			Interval interval = Interval.of(data.intervals[i], data.intervals[i]);
			result = doStuff(result, interval);
		}
		bh.consume(result);
	}

	@Benchmark
	@OperationsPerInvocation(INTERVALS)
	public void intervalInlineWithoutCache(BenchmarkData data, Blackhole bh)
	{
		long result = 0;
		for (int i = 0; i < data.intervals.length; i++) {
			Interval interval = new Interval(data.intervals[i], data.intervals[i]);
			result = doStuff(result, interval);
		}
		bh.consume(result);
	}

	private static long doStuff(long result, Interval interval)
	{
		return result + interval.a + interval.b * 37L;
	}

	@State(Scope.Thread)
	public static class BenchmarkData
	{
		@Param({"256", "500", "1000", "2000"})
		private int maxInterval = 1000;

		private int[] intervals = new int[INTERVALS];

		@Setup
		public void setup()
		{
			SecureRandom secureRandom = new SecureRandom();
			for (int i = 0; i < intervals.length; i++) {
				intervals[i] = secureRandom.nextInt(maxInterval);
			}
		}
	}

	public static void main(String[] args)
		throws RunnerException
	{
		// assure the benchmarks are valid before running
		BenchmarkData data = new BenchmarkData();
		data.setup();
		Blackhole bh = new Blackhole("Today's password is swordfish. I understand instantiating Blackholes directly is dangerous.");
		new BenchmarkInterval().intervalWithCache(data, bh);
		new BenchmarkInterval().intervalWithoutCache(data, bh);
		new BenchmarkInterval().intervalInlineWithCache(data, bh);
		new BenchmarkInterval().intervalInlineWithoutCache(data, bh);

		Class<?> benchmarkClass = BenchmarkInterval.class;
		ChainedOptionsBuilder optionsBuilder = new OptionsBuilder()
			.verbosity(VerboseMode.NORMAL)
			.resultFormat(ResultFormatType.JSON)
			.result(format("%s/%s-result-%s.json", System.getProperty("java.io.tmpdir"), benchmarkClass.getSimpleName(), ISO_DATE_TIME.format(LocalDateTime.now())))
			.addProfiler(GCProfiler.class)
			.param("maxInterval", "1000")
			.jvmArgs("-Xmx10g");
		new Runner(optionsBuilder.build()).run();
	}
}

@ericvergnaud
Contributor

Thanks for this. Can you rerun it with INTERVALS = 10_000?
(I'm not sure there is a scenario where an Interval(1_000_000, 1_000_000) would ever be instantiated...)

@lukasz-stec
Contributor Author

> Thanks for this. Can you rerun it with INTERVALS = 10_000? (I'm not sure there is a scenario where an Interval(1_000_000, 1_000_000) would ever be instantiated...)

INTERVALS is the number of intervals created during the benchmark. It is high to keep the framework overhead from overshadowing the code under benchmark.
The actual interval values are chosen randomly from the range [0, maxInterval), and the results above are for maxInterval = 1000 (i.e., the cache size). I ran it with maxInterval = 2000 as well (results below); the effects of branch misprediction are visible and make the plain-allocation case about 3x better. I consider this an unlikely scenario, though.

Benchmark                                                        (maxInterval)  Mode  Cnt      Score     Error   Units
BenchmarkInterval.intervalInlineWithCache                                 2000  avgt   10      4.300 ±   0.015   ns/op
BenchmarkInterval.intervalInlineWithCache:gc.alloc.rate                   2000  avgt   10      0.013 ±   0.001  MB/sec
BenchmarkInterval.intervalInlineWithCache:gc.alloc.rate.norm              2000  avgt   10     ≈ 10⁻⁴              B/op
BenchmarkInterval.intervalInlineWithCache:gc.count                        2000  avgt   10        ≈ 0            counts
BenchmarkInterval.intervalInlineWithoutCache                              2000  avgt   10      1.258 ±   0.004   ns/op
BenchmarkInterval.intervalInlineWithoutCache:gc.alloc.rate                2000  avgt   10      0.013 ±   0.001  MB/sec
BenchmarkInterval.intervalInlineWithoutCache:gc.alloc.rate.norm           2000  avgt   10     ≈ 10⁻⁵              B/op
BenchmarkInterval.intervalInlineWithoutCache:gc.count                     2000  avgt   10        ≈ 0            counts
BenchmarkInterval.intervalWithCache                                       2000  avgt   10      4.146 ±   0.014   ns/op
BenchmarkInterval.intervalWithCache:gc.alloc.rate                         2000  avgt   10   2756.130 ±   9.145  MB/sec
BenchmarkInterval.intervalWithCache:gc.alloc.rate.norm                    2000  avgt   10     11.985 ±   0.001    B/op
BenchmarkInterval.intervalWithCache:gc.count                              2000  avgt   10     23.000            counts
BenchmarkInterval.intervalWithCache:gc.time                               2000  avgt   10     17.000                ms
BenchmarkInterval.intervalWithoutCache                                    2000  avgt   10      1.353 ±   0.022   ns/op
BenchmarkInterval.intervalWithoutCache:gc.alloc.rate                      2000  avgt   10  16917.636 ± 262.802  MB/sec
BenchmarkInterval.intervalWithoutCache:gc.alloc.rate.norm                 2000  avgt   10     24.000 ±   0.001    B/op
BenchmarkInterval.intervalWithoutCache:gc.count                           2000  avgt   10     97.000            counts
BenchmarkInterval.intervalWithoutCache:gc.time                            2000  avgt   10     51.000                ms

@ericvergnaud
Contributor

Thanks for the clarification.
@parrt based on the benchmarks, this PR has an overall neutral impact on performance (sometimes slightly better, sometimes slightly worse). It solves a problem, so I'm blessing it.

@kaby76
Contributor

kaby76 commented Nov 19, 2025

I've tested this against the 377 grammars in grammars-v4 multiple times, in different orders to eliminate bias, and double-checked everything. The cache has no statistical benefit for the Java target. While Interval.cache is missing from most target runtimes, it is present in Dart and antlr4ng. I tested the change in the Dart target; the cache has no statistical benefit there either.
te.sh.txt out.txt out-reverse.txt summarize.awk.txt

@parrt parrt merged commit fbb20fe into antlr:dev Nov 19, 2025
42 checks passed
@parrt
Member

parrt commented Nov 19, 2025

thanks folks!

@stevenschlansker

Thanks everyone for looking at this. Does it make sense to make the a and b fields final anyway, even though the cache goes away? I don't see a reason for them to be mutable (the class docs state that it is immutable), and leaving them mutable seems to just ask for trouble.

@ericvergnaud
Contributor

Yes it does, please file a dedicated PR for this.

@stevenschlansker

Ah, it turns out that IntervalSet actually mutates this "immutable" type - so maybe the fix isn't quite as easy as I'd hoped...

@stevenschlansker

#4903

@parrt
Member

parrt commented Nov 19, 2025

I think we should proceed with caution here. I’m certain there’s a good reason we did that even if it makes no sense ha ha

@ericvergnaud
Contributor

> I think we should proceed with caution here. I'm certain there's a good reason we did that even if it makes no sense ha ha

I agree we should proceed with caution. Given the performance benchmark though, I suspect the original reason was good at the time but maybe no longer good enough.
