Make GeoIP database node loader project-aware #129829

samxbr · 2025-06-23T07:34:02Z

Make GeoIP database node loader project-aware:

- Loads the downloaded GeoIP databases from the system index to the ingest node's file system for each project.
- Each project's databases are loaded into the directory `tmp/geoip-databases/{nodeId}/{projectId}`.
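
For readers skimming the description, here is a minimal sketch of how the per-project directory layout above could be resolved. The class and method names (`GeoIpDirs`, `projectDatabaseDir`) are hypothetical and are not taken from `DatabaseNodeService`; node and project IDs are plain strings here for illustration only.

```java
import java.nio.file.Path;

// Hypothetical helper for illustration; not the actual DatabaseNodeService code.
final class GeoIpDirs {
    private GeoIpDirs() {}

    /** Resolves {tmpDir}/geoip-databases/{nodeId}/{projectId}. */
    static Path projectDatabaseDir(Path tmpDir, String nodeId, String projectId) {
        return tmpDir.resolve("geoip-databases").resolve(nodeId).resolve(projectId);
    }

    public static void main(String[] args) {
        // Two projects on the same node share the node directory but get separate leaf directories.
        System.out.println(projectDatabaseDir(Path.of("tmp"), "node-1", "project-a"));
        System.out.println(projectDatabaseDir(Path.of("tmp"), "node-1", "project-b"));
    }
}
```

Keeping the project ID as the last path segment means one project's databases can be cleaned up without touching another project's files on the same node.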

Note: more work is needed to make the REST tests run in multi-project (MP) mode.

Apologies for touching so many classes; the changes are related and hard to split into separate PRs. Some FixForMultiProject annotations will be addressed in separate PRs to reduce the size of this one.

elasticsearchmachine · 2025-06-23T09:20:36Z

Pinging @elastic/es-data-management (Team:Data Management)

PeteGillinElastic

Mostly LGTM, just a few comments. I'll hold off on approving because @nielsbauman said he'd like to take a look as well.

PeteGillinElastic · 2025-06-23T11:19:36Z

...geoip/src/internalClusterTest/java/org/elasticsearch/ingest/geoip/DatabaseNodeServiceIT.java

+
+    @Before
+    private void setup() {
+        projectId = ProjectId.DEFAULT;


Am I right that there's more work required to make this pass with a random project? Can we add a @FixForMultiProject to make sure we remember to come back to it?

AbstractGeoIpIT extends ESIntegTestCase. We don't yet support running internal cluster tests in MP mode, so we're bound to use the default project ID in these tests.

That said, we don't need to reinitialize this field for every test. At the very least, we should make it private final or even private static final, although I'm personally leaning more towards just passing the ProjectId.DEFAULT constant where we need it, as a private static final doesn't feel super valuable to me.

Exactly as Niels said, ESIntegTestCase is not MP-enabled yet, so I added a @FixForMultiProject as a reminder. I prefer keeping it as a class variable instead of a local variable, since that makes it easier to change later in a single place for all tests.

PeteGillinElastic · 2025-06-23T11:23:27Z

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java

+    private ProjectResolver projectResolver;

-    private final ConcurrentMap<String, DatabaseReaderLazyLoader> databases = new ConcurrentHashMap<>();
+    private final ConcurrentMap<ProjectId, ConcurrentMap<String, DatabaseReaderLazyLoader>> databases = new ConcurrentHashMap<>();


Would it make things simpler or not to use a map with a two-member record as a key, rather than the nested maps?

It would save having to do the databases.computeIfAbsent(projectId, (k) -> new ConcurrentHashMap<>()) dance in a few places. On the other hand, it would mean that in the removeStale... method you'd have to iterate over everything and filter rather than being able to go straight to the map for the project in question... I'm not sure which is better.

I always struggle with these kinds of tradeoffs. FWIW, avoiding the "dance" could also be partially mitigated by adding a private method that does the retrieval. I think the performance impact of the iteration in removeStale is acceptable, as that method won't be called with a high frequency. I am personally usually more a fan of the nested maps as it avoids an extra record class and extra object instances/creations. I don't think there are very strong arguments either way in this case, so I don't think it matters much.

I used a nested map as it saves object creation, as Niels pointed out (and we use nested maps in a few other places, so maybe it's more consistent?). A record key does make the map initialization easier. I don't have a strong opinion either way. Unless anyone feels strongly about this, I will just leave it unchanged :)

I don't feel strongly :-)
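
To make the tradeoff in this thread concrete, here is a rough sketch of both shapes side by side. It is only an illustration: `String` stands in for both `ProjectId` and `DatabaseReaderLazyLoader`, and every name in it (`DatabaseMaps`, `forProject`, `Key`, the `removeStale*` methods) is hypothetical rather than taken from the PR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustration only: String stands in for ProjectId and DatabaseReaderLazyLoader.
final class DatabaseMaps {
    // Option A: nested maps, keyed by project and then by database name.
    private final ConcurrentMap<String, ConcurrentMap<String, String>> nested = new ConcurrentHashMap<>();

    // Private helper that hides the computeIfAbsent "dance" from callers.
    private ConcurrentMap<String, String> forProject(String projectId) {
        return nested.computeIfAbsent(projectId, k -> new ConcurrentHashMap<>());
    }

    void putNested(String projectId, String name, String db) {
        forProject(projectId).put(name, db);
    }

    String getNested(String projectId, String name) {
        ConcurrentMap<String, String> perProject = nested.get(projectId);
        return perProject == null ? null : perProject.get(name);
    }

    // Goes straight to the project's map; other projects are never visited.
    void removeStaleNested(String projectId, Set<String> live) {
        ConcurrentMap<String, String> perProject = nested.get(projectId);
        if (perProject != null) {
            perProject.keySet().removeIf(name -> live.contains(name) == false);
        }
    }

    // Option B: one flat map with a two-member record as the key.
    record Key(String projectId, String name) {}

    private final ConcurrentMap<Key, String> flat = new ConcurrentHashMap<>();

    void putFlat(String projectId, String name, String db) {
        flat.put(new Key(projectId, name), db);
    }

    String getFlat(String projectId, String name) {
        return flat.get(new Key(projectId, name));
    }

    // Has to scan every entry and filter by project.
    void removeStaleFlat(String projectId, Set<String> live) {
        flat.keySet().removeIf(k -> k.projectId().equals(projectId) && live.contains(k.name()) == false);
    }

    public static void main(String[] args) {
        DatabaseMaps maps = new DatabaseMaps();
        maps.putNested("p1", "a.mmdb", "reader-a");
        maps.putNested("p1", "b.mmdb", "reader-b");
        maps.removeStaleNested("p1", Set.of("a.mmdb"));
        System.out.println(maps.getNested("p1", "b.mmdb"));
    }
}
```

Option B allocates a `Key` per lookup and scans all projects on cleanup, while option A needs the helper to stay tidy; with cleanup being infrequent, neither cost looks decisive, which matches where the thread ended up.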

PeteGillinElastic · 2025-06-23T11:26:44Z

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                if (e instanceof NoSuchFileException == false) {
+                    // https://github.com/elastic/elasticsearch/issues/104782


I'm not sure what this comment is meant to tell the reader?

Ah, my IDE gives me a warning because it thinks I should favor a parameterized log message ({}), but that would fail the logger check. I added the comment to point to an earlier issue that explains it, and I've updated the comment to make that clearer.

nielsbauman · 2025-06-23T13:32:00Z

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java

+    private ProjectResolver projectResolver;

-    private final ConcurrentMap<String, DatabaseReaderLazyLoader> databases = new ConcurrentHashMap<>();
+    private final ConcurrentMap<ProjectId, ConcurrentMap<String, DatabaseReaderLazyLoader>> databases = new ConcurrentHashMap<>();


I always struggle with these kinds of tradeoffs. FWIW, avoiding the "dance" could also be partially mitigated by adding a private method that does the retrieval. I think the performance impact of the iteration in removeStale is acceptable, as that method won't be called with a high frequency. I am personally usually more a fan of the nested maps as it avoids an extra record class and extra object instances/creations. I don't think there are very strong arguments either way in this case, so I don't think it matters much.

nielsbauman

Just left one more comment but other than that LGTM, so I'm approving to allow you to merge in your morning.

nielsbauman · 2025-06-24T13:59:11Z

modules/ingest-geoip/src/test/java/org/elasticsearch/ingest/geoip/DatabaseNodeServiceTests.java

+        boolean multiProject = randomBoolean();
+        projectId = multiProject ? randomProjectIdOrDefault() : ProjectId.DEFAULT;
+        projectResolver = multiProject ? TestProjectResolvers.singleProject(projectId) : TestProjectResolvers.DEFAULT_PROJECT_ONLY;
+        projectId = randomProjectIdOrDefault();


I assume we do the multiProject switch because we want to cover when the project resolver doesn't support multiple projects?

Could you add a comment explaining that? (also in GeoIpProcessorFactoryTests.java)

I don't think the last projectId assignment is correct, right? I think that line should be deleted.

Ah good catch, that's a left over from before.
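
For anyone reading this thread later, here is a small stand-alone sketch of why the trailing assignment was a problem. It does not use the real test utilities: `String` stands in for `ProjectId`, a `Supplier<String>` stands in for the project resolver, and `ProjectSetupSketch`, `Setup` and `setup` are hypothetical names. The stated purpose of the switch is the reviewer's assumption from above, not something the PR confirms.

```java
import java.util.Random;
import java.util.function.Supplier;

// Stand-in types only; the real test uses ProjectId, ProjectResolver and TestProjectResolvers.
final class ProjectSetupSketch {
    static final String DEFAULT_PROJECT = "default"; // stand-in for ProjectId.DEFAULT

    /** The two fields the test setup assigns. */
    record Setup(String projectId, Supplier<String> projectResolver) {}

    static Setup setup(Random random) {
        // Assumed purpose: also cover a resolver that only knows the default project.
        boolean multiProject = random.nextBoolean();
        String projectId = multiProject ? "project-" + random.nextInt(1000) : DEFAULT_PROJECT;
        Supplier<String> resolver = multiProject ? () -> projectId : () -> DEFAULT_PROJECT;
        // No further assignment to projectId here: re-randomizing it after the resolver
        // is built would let the two disagree, which is what the deleted line did.
        return new Setup(projectId, resolver);
    }

    public static void main(String[] args) {
        Setup s = setup(new Random());
        System.out.println(s.projectId().equals(s.projectResolver().get()));
    }
}
```

The invariant worth keeping is that the resolver and the `projectId` field always name the same project, whichever branch of the switch was taken.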

- loads the downloaded GeoIP databases from system index to ingest node file system for each project
- each project's databases are loaded to directory `tmp/geoip-databases/{nodeId}/{projectId}`

Make GeoIP database node loader project-aware

228e538

elasticsearchmachine added the v9.1.0 label Jun 23, 2025

samxbr mentioned this pull request Jun 23, 2025

Make GeoIP database node loader project-aware #129137

Closed

elasticsearchmachine and others added 2 commits June 23, 2025 07:43

[CI] Auto commit changes from spotless

ae1f1d4

format log

fc99efc

samxbr added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Jun 23, 2025

samxbr requested a review from PeteGillinElastic June 23, 2025 09:19

samxbr marked this pull request as ready for review June 23, 2025 09:20

elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 23, 2025

PeteGillinElastic reviewed Jun 23, 2025

nielsbauman reviewed Jun 23, 2025

samxbr and others added 6 commits June 24, 2025 13:56

Address comments

e7a5432

log format

23fb681

Merge branch 'main' into feature/multi-project/geoip-node-loader

5a04183

[CI] Auto commit changes from spotless

95b9bad

use metadata setting

acbf4e4

minor

c069583

samxbr requested review from PeteGillinElastic and nielsbauman June 24, 2025 08:48

nielsbauman approved these changes Jun 24, 2025

samxbr added 2 commits June 25, 2025 10:57

minor

1e91f60

Merge branch 'main' into feature/multi-project/geoip-node-loader

cd03a0c

samxbr merged commit 41a47c2 into elastic:main Jun 25, 2025
33 checks passed

mashhurs mentioned this pull request Jun 26, 2025

Sync up with Elasticsearch main on GeoIP changes. elastic/logstash-filter-elastic_integration#309

Merged

mergify bot mentioned this pull request Jun 26, 2025

[9.1] (backport #309) Sync up with Elasticsearch main on GeoIP changes. elastic/logstash-filter-elastic_integration#310

Merged
