The GeoIpCache has suboptimal performance on cache hits

The `geoip` processor is one of our more costly ones (see [comment](https://github.com/elastic/elasticsearch/issues/96116#issuecomment-1548644191)), and a common pattern is to have several of them in a pipeline. For example, a location and an ASN lookup for both a source and a destination IP address comes to four `geoip` processors.

We have a cache around the actual geoip database lookup, but even in the case of a cache hit, it's still relatively expensive.

**First**, the cache is roughly an `InetAddress` --> `Response` map. However, we receive IP addresses as strings -- if the cache used `String` keys, a cache hit would not need to convert the `String` to an `InetAddress` at all.
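A minimal sketch of the string-keyed idea, not the actual `GeoIpCache`: the class name `StringKeyedGeoCache` and the `databaseLookup` parameter are hypothetical, and the map is unbounded for brevity where the real cache is size-limited. The point it illustrates is that parsing only happens on a miss.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a cache keyed by the raw IP string from the document.
final class StringKeyedGeoCache<V> {
    // Unbounded here for brevity; a real cache would evict entries.
    private final Map<String, V> cache = new ConcurrentHashMap<>();

    V get(String ip, Function<InetAddress, V> databaseLookup) {
        // On a hit, the mapping function is never invoked, so no parsing happens.
        return cache.computeIfAbsent(ip, key -> {
            try {
                // For an IP literal, getByName only validates the format
                // (a hostname would trigger a DNS lookup instead).
                return databaseLookup.apply(InetAddress.getByName(key));
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("not an IP address: " + key, e);
            }
        });
    }
}
```

One trade-off of this design: different textual forms of the same address (e.g. a compressed and an uncompressed IPv6 literal) would occupy separate cache entries.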


**Second**, the cached values are not small. Here, these `CityResponse` objects are ~7 KB apiece:

![Screen Shot 2023-05-15 at 5 37 29 PM](https://github.com/elastic/elasticsearch/assets/187034/dfd8bd61-bcac-45e5-87dc-a7db667aaf2a)

If we cached a more compact 'destination' object (e.g. a `Map<String, Object>`), we could increase the size of the cache in objects without increasing its memory footprint in bytes -- that is, a cache of a given byte size could hold more records.

**Third**, the cached values are not optimized for cache-y usage. Consider for example this code snippet:

https://github.com/elastic/elasticsearch/blob/e81f461aa17b7bfa43ea58d851849fd5604f2e21/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java#L231

and then see the implementation of `getName` itself:

https://github.com/maxmind/GeoIP2-java/blob/a64bd49181c9c362d883477d910c165aec6f6116/src/main/java/com/maxmind/geoip2/record/AbstractNamedRecord.java#L43-L51

Each time we process a cache hit, we call several methods that end up invoking `getName()`, and each of those calls does a map scan to find a result `String`. The returned `String` is the same every time, though. Rather than caching a `Response` and building an object from it N times, we could build that object once and use it as the cached value.
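The build-once idea could look roughly like the sketch below. Everything in it is an assumption for illustration: `NamedRecord` is a stand-in loosely modelled on the linked MaxMind class (so the example is self-contained), and `CompactGeoData`, `fromCity`, and the field names are hypothetical rather than the processor's actual code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a MaxMind named record.
final class NamedRecord {
    private final List<String> locales;
    private final Map<String, String> names;

    NamedRecord(List<String> locales, Map<String, String> names) {
        this.locales = locales;
        this.names = names;
    }

    // Walks the locales on every call, which is the per-hit cost described above.
    String getName() {
        for (String locale : locales) {
            String name = names.get(locale);
            if (name != null) {
                return name;
            }
        }
        return null;
    }
}

// Hypothetical helper that flattens a response into the compact value to cache.
final class CompactGeoData {
    private CompactGeoData() {}

    // Intended to run once per cache miss; later hits reuse the returned map.
    static Map<String, Object> fromCity(NamedRecord country, NamedRecord city, double lat, double lon) {
        Map<String, Object> data = new LinkedHashMap<>();
        putIfPresent(data, "country_name", country.getName());
        putIfPresent(data, "city_name", city.getName());
        data.put("location", Map.of("lat", lat, "lon", lon));
        // The value is shared between hits, so hand out a read-only view.
        return Collections.unmodifiableMap(data);
    }

    private static void putIfPresent(Map<String, Object> data, String key, Object value) {
        if (value != null) {
            data.put(key, value);
        }
    }
}
```

Because the cached map is shared across documents, a processor using it would copy the entries into each document rather than handing out the map itself.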

----

That is, we're paying a price to build the cache key (`String` to `InetAddress` conversion), and we're paying both a memory and a CPU price for using the `Response` as the cache value.

