-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Make GeoIP database node loader project-aware #129829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make GeoIP database node loader project-aware #129829
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
PeteGillinElastic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, just a few comments. I'll hold off on approving because @nielsbauman said he'd like to take a look as well.
|
|
||
| @Before | ||
| private void setup() { | ||
| projectId = ProjectId.DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right that there's more work required to make this pass with a random project? Can we add a @FixForMultiProject to make sure we remember to come back to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractGeoIpIT extends ESIntegTestCase. We currently don't have support for running internal cluster tests in MP mode yet. Therefore, we're bound to use the default project ID in these tests.
That said, we don't need to reinitialize this field for every test. At the very least, we should make it a private final or even a private static final, although I'm personally more leaning towards just passing the ProjectId.DEFAULT constant where we need it, as a private static final doesn't feel super valuable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly as Niels said, ESIntegTestCase is not MP enabled yet, I put a @FixForMultiProject as reminder. I prefer keeping it as a class variable instead of local variable since it's easier change it later at a single place for all tests.
| private ProjectResolver projectResolver; | ||
|
|
||
| private final ConcurrentMap<String, DatabaseReaderLazyLoader> databases = new ConcurrentHashMap<>(); | ||
| private final ConcurrentMap<ProjectId, ConcurrentMap<String, DatabaseReaderLazyLoader>> databases = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make things simpler or not to use a map with a two-member record as a key, rather than the nested maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would save having to do the databases.computeIfAbsent(projectId, (k) -> new ConcurrentHashMap<>()) dance in a few places. On the other hand, it would mean that in the removeStale... method you'd have iterate over everything and filter rather than being able to go straight to the map for the project in question... I'm not sure which is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always struggle with these kinds of tradeoffs. FWIW, avoiding the "dance" could also be partially mitigated by adding a private method that does the retrieval. I think the performance impact of the iteration in removeStale is acceptable, as that method won't be called with a high frequency. I am personally usually more a fan of the nested maps as it avoids an extra record class and extra object instances/creations. I don't think there are very strong arguments either way in this case, so I don't think it matters much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used nested map as it saves object creation as Niels pointed out (and we use nested map in a few other places so maybe more consistent?). record key does make the map initialization easier. I don't have a strong opinion either way. Unless anyone feels strongly about this, I will just leave it unchanged :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly :-)
| @Override | ||
| public FileVisitResult visitFileFailed(Path file, IOException e) { | ||
| if (e instanceof NoSuchFileException == false) { | ||
| // https://github.com/elastic/elasticsearch/issues/104782 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment is meant to tell the reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my IDE gives me a warning since it thinks I should favor parameterized log {}, but that would fail due to logger check. I put a comment to point to a previous issue that explains it. I updated the comment to make it clearer.
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java
Outdated
Show resolved
Hide resolved
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java
Outdated
Show resolved
Hide resolved
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java
Outdated
Show resolved
Hide resolved
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java
Outdated
Show resolved
Hide resolved
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseNodeService.java
Outdated
Show resolved
Hide resolved
| private ProjectResolver projectResolver; | ||
|
|
||
| private final ConcurrentMap<String, DatabaseReaderLazyLoader> databases = new ConcurrentHashMap<>(); | ||
| private final ConcurrentMap<ProjectId, ConcurrentMap<String, DatabaseReaderLazyLoader>> databases = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always struggle with these kinds of tradeoffs. FWIW, avoiding the "dance" could also be partially mitigated by adding a private method that does the retrieval. I think the performance impact of the iteration in removeStale is acceptable, as that method won't be called with a high frequency. I am personally usually more a fan of the nested maps as it avoids an extra record class and extra object instances/creations. I don't think there are very strong arguments either way in this case, so I don't think it matters much.
nielsbauman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left one more comment but other than that LGTM, so I'm approving to allow you to merge in your morning.
| boolean multiProject = randomBoolean(); | ||
| projectId = multiProject ? randomProjectIdOrDefault() : ProjectId.DEFAULT; | ||
| projectResolver = multiProject ? TestProjectResolvers.singleProject(projectId) : TestProjectResolvers.DEFAULT_PROJECT_ONLY; | ||
| projectId = randomProjectIdOrDefault(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we do the multiProject switch because we want to cover when the project resolver doesn't support multiple projects?
- Could you add a comment explaining that? (also in
GeoIpProcessorFactoryTests.java) - I don't think the last
projectIdassignment is correct, right? I think that line should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch, that's a left over from before.
- loads the downloaded GeoIP databases from system index to ingest node file system for each project
- each project's databases are loaded to directory `tmp/geoip-databases/{nodeId}/{projectId}`
Make GeoIP database node loader project-aware:
tmp/geoip-databases/{nodeId}/{projectId}Note: more work is needed to make the REST tests run in MP mode
Apologies for the change in many classes, they are related and hard to split to separate PRs. Some
FixForMultiProjectannotation will be fixed in separate PR's to reduce the PR size.