Commit 5266f79

Add support for the 'Anonymous IP' database to the geoip processor (#107287)
1 parent d6d782a commit 5266f79

9 files changed: +157 -19 lines changed


docs/changelog/107287.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 107287
+summary: Add support for the 'Anonymous IP' database to the geoip processor
+area: Ingest Node
+type: enhancement
+issues:
+ - 90789

docs/reference/ingest/processors/geoip.asciidoc

Lines changed: 18 additions & 15 deletions
@@ -9,7 +9,7 @@ IPv4 or IPv6 address.
 
 [[geoip-automatic-updates]]
 By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
-ASN GeoIP2 databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
+ASN IP geolocation databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
 CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to `storage.googleapis.com` domain and either:
 
 * `ingest.geoip.downloader.eager.download` is set to true
@@ -38,7 +38,7 @@ field instead.
 | Name | Required | Default | Description
 | `field` | yes | - | The field to get the ip address from for the geographical lookup.
 | `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
-| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
+| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to one of the automatically downloaded GeoLite2 databases (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or the name of a supported database file in the `ingest-geoip` config directory.
 | `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
 | `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
 | `first_only` | no | `true` | If `true` only first found geoip data will be returned, even if `field` contains array
@@ -47,15 +47,18 @@ field instead.
 
 *Depends on what is available in `database_file`:
 
-* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
-`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
+* If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`,
+`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
 and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
-* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
+* If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
 `country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
 were configured in `properties`.
 * If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
 `asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
 in `properties`.
+* If the GeoIP2 Anonymous IP database is used, then the following fields may be added under the `target_field`: `ip`,
+`hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added
+depend on what has been found and which properties were configured in `properties`.
 
 
 Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
@@ -109,7 +112,7 @@ Which returns:
 
 Here is an example that uses the default country database and adds the
 geographical information to the `geo` field based on the `ip` field. Note that
-this database is included in the module. So this:
+this database is downloaded automatically. So this:
 
 [source,console]
 --------------------------------------------------
@@ -316,14 +319,14 @@ GET /my_ip_locations/_search
 ////
 
 [[manage-geoip-database-updates]]
-==== Manage your own GeoIP2 database updates
+==== Manage your own IP geolocation database updates
 
-If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
-databases from the Elastic endpoint, you have a few other options:
+If you can't <<geoip-automatic-updates,automatically update>> your IP geolocation databases
+from the Elastic endpoint, you have a few other options:
 
 * <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
 * <<use-custom-geoip-endpoint,Use a custom endpoint>>
-* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>
+* <<manually-update-geoip-databases,Manually update your IP geolocation databases>>
 
 [[use-proxy-geoip-endpoint]]
 **Use a proxy endpoint**
@@ -375,7 +378,7 @@ settings API>> to set
 <<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.
 
 [[manually-update-geoip-databases]]
-**Manually update your GeoIP2 databases**
+**Manually update your IP geolocation databases**
 
 . Use the <<cluster-update-settings,cluster update settings API>> to set
 `ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
@@ -414,22 +417,22 @@ Note that these settings are node settings and apply to all `geoip` processors,
 [[ingest-geoip-downloader-enabled]]
 `ingest.geoip.downloader.enabled`::
 (<<dynamic-cluster-setting,Dynamic>>, Boolean)
-If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
+If `true`, {es} automatically downloads and manages updates for IP geolocation databases
 from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
 updates and deletes all downloaded databases. Defaults to `true`.
 
 [[ingest-geoip-downloader-eager-download]]
 `ingest.geoip.downloader.eager.download`::
 (<<dynamic-cluster-setting,Dynamic>>, Boolean)
-If `true`, {es} downloads GeoIP2 databases immediately, regardless of whether a
+If `true`, {es} downloads IP geolocation databases immediately, regardless of whether a
 pipeline exists with a geoip processor. If `false`, {es} only begins downloading
 the databases if a pipeline with a geoip processor exists or is added. Defaults
 to `false`.
 
 [[ingest-geoip-downloader-endpoint]]
 `ingest.geoip.downloader.endpoint`::
 (<<static-cluster-setting,Static>>, string)
-Endpoint URL used to download updates for GeoIP2 databases. For example, `https://myDomain.com/overview.json`.
+Endpoint URL used to download updates for IP geolocation databases. For example, `https://myDomain.com/overview.json`.
 Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
 each node's <<es-tmpdir,temporary directory>> at `$ES_TMPDIR/geoip-databases/<node_id>`.
 Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`,
@@ -440,6 +443,6 @@ The GeoIP downloader uses the JDK's builtin cacerts. If you're using a custom en
 [[ingest-geoip-downloader-poll-interval]]
 `ingest.geoip.downloader.poll.interval`::
 (<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
-How often {es} checks for GeoIP2 database updates at the
+How often {es} checks for IP geolocation database updates at the
 `ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
 to `3d` (three days).

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/Database.java

Lines changed: 31 additions & 3 deletions
@@ -39,9 +39,9 @@ enum Database {
             Property.LOCATION
         ),
         Set.of(
-            Property.CONTINENT_NAME,
-            Property.COUNTRY_NAME,
             Property.COUNTRY_ISO_CODE,
+            Property.COUNTRY_NAME,
+            Property.CONTINENT_NAME,
             Property.REGION_ISO_CODE,
             Property.REGION_NAME,
             Property.CITY_NAME,
@@ -55,11 +55,31 @@ enum Database {
     Asn(
         Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK),
         Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK)
+    ),
+    AnonymousIp(
+        Set.of(
+            Property.IP,
+            Property.HOSTING_PROVIDER,
+            Property.TOR_EXIT_NODE,
+            Property.ANONYMOUS_VPN,
+            Property.ANONYMOUS,
+            Property.PUBLIC_PROXY,
+            Property.RESIDENTIAL_PROXY
+        ),
+        Set.of(
+            Property.HOSTING_PROVIDER,
+            Property.TOR_EXIT_NODE,
+            Property.ANONYMOUS_VPN,
+            Property.ANONYMOUS,
+            Property.PUBLIC_PROXY,
+            Property.RESIDENTIAL_PROXY
+        )
     );
 
     private static final String CITY_DB_SUFFIX = "-City";
     private static final String COUNTRY_DB_SUFFIX = "-Country";
     private static final String ASN_DB_SUFFIX = "-ASN";
+    private static final String ANONYMOUS_IP_DB_SUFFIX = "-Anonymous-IP";
 
     /**
      * Parses the passed-in databaseType (presumably from the passed-in databaseFile) and return the Database instance that is
@@ -79,6 +99,8 @@ public static Database getDatabase(final String databaseType, final String datab
                 database = Database.Country;
             } else if (databaseType.endsWith(Database.ASN_DB_SUFFIX)) {
                 database = Database.Asn;
+            } else if (databaseType.endsWith(Database.ANONYMOUS_IP_DB_SUFFIX)) {
+                database = Database.AnonymousIp;
             }
         }
 
@@ -147,7 +169,13 @@ enum Property {
         LOCATION,
         ASN,
         ORGANIZATION_NAME,
-        NETWORK;
+        NETWORK,
+        HOSTING_PROVIDER,
+        TOR_EXIT_NODE,
+        ANONYMOUS_VPN,
+        ANONYMOUS,
+        PUBLIC_PROXY,
+        RESIDENTIAL_PROXY;
 
         /**
          * Parses a string representation of a property into an actual Property instance. Not all properties that exist are
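
Reviewer note: the suffix check added to `getDatabase` can be illustrated in isolation. The following is a minimal standalone sketch, not the module's code. `DatabaseSuffixSketch`, `DbType`, and `resolve` are hypothetical names, and returning an empty `Optional` for an unrecognized type is an assumption, since the diff does not show how the real method handles that case. Only the four suffix strings are taken from the diff.

```java
import java.util.Optional;

public class DatabaseSuffixSketch {

    // Hypothetical stand-in for the module's Database enum.
    // Only the suffix strings are copied from the diff.
    enum DbType {
        CITY("-City"),
        COUNTRY("-Country"),
        ASN("-ASN"),
        ANONYMOUS_IP("-Anonymous-IP");

        private final String suffix;

        DbType(String suffix) {
            this.suffix = suffix;
        }
    }

    // Returns the first type whose suffix matches the end of databaseType.
    // Assumption: unknown or null types map to Optional.empty().
    static Optional<DbType> resolve(String databaseType) {
        if (databaseType == null) {
            return Optional.empty();
        }
        for (DbType type : DbType.values()) {
            if (databaseType.endsWith(type.suffix)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve("GeoIP2-Anonymous-IP"));
        System.out.println(resolve("GeoLite2-City"));
        System.out.println(resolve("GeoIP2-Domain"));
    }
}
```

Because matching is by suffix, both a `GeoLite2-` and a `GeoIP2-` prefixed type resolve to the same entry, which is consistent with the documentation change that mentions "GeoLite2 City or GeoIP2 City".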

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseReaderLazyLoader.java

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,7 @@
 import com.maxmind.db.Reader;
 import com.maxmind.geoip2.DatabaseReader;
 import com.maxmind.geoip2.model.AbstractResponse;
+import com.maxmind.geoip2.model.AnonymousIpResponse;
 import com.maxmind.geoip2.model.AsnResponse;
 import com.maxmind.geoip2.model.CityResponse;
 import com.maxmind.geoip2.model.CountryResponse;
@@ -169,6 +170,12 @@ public AsnResponse getAsn(InetAddress ipAddress) {
         return getResponse(ipAddress, DatabaseReader::tryAsn);
     }
 
+    @Nullable
+    @Override
+    public AnonymousIpResponse getAnonymousIp(InetAddress ipAddress) {
+        return getResponse(ipAddress, DatabaseReader::tryAnonymousIp);
+    }
+
     boolean preLookup() {
         return currentUsages.updateAndGet(current -> current < 0 ? current : current + 1) > 0;
     }

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDatabase.java

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,7 @@
 
 package org.elasticsearch.ingest.geoip;
 
+import com.maxmind.geoip2.model.AnonymousIpResponse;
 import com.maxmind.geoip2.model.AsnResponse;
 import com.maxmind.geoip2.model.CityResponse;
 import com.maxmind.geoip2.model.CountryResponse;
@@ -53,6 +54,9 @@ public interface GeoIpDatabase {
     @Nullable
     AsnResponse getAsn(InetAddress ipAddress);
 
+    @Nullable
+    AnonymousIpResponse getAnonymousIp(InetAddress ipAddress);
+
     /**
      * Releases the current database object. Called after processing a single document. Databases should be closed or returned to a
      * resource pool. No further interactions should be expected.

modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java

Lines changed: 42 additions & 0 deletions
@@ -9,6 +9,7 @@
 package org.elasticsearch.ingest.geoip;
 
 import com.maxmind.db.Network;
+import com.maxmind.geoip2.model.AnonymousIpResponse;
 import com.maxmind.geoip2.model.AsnResponse;
 import com.maxmind.geoip2.model.CityResponse;
 import com.maxmind.geoip2.model.CountryResponse;
@@ -172,6 +173,7 @@ private Map<String, Object> getGeoData(GeoIpDatabase geoIpDatabase, String ip) t
             case City -> retrieveCityGeoData(geoIpDatabase, ipAddress);
             case Country -> retrieveCountryGeoData(geoIpDatabase, ipAddress);
             case Asn -> retrieveAsnGeoData(geoIpDatabase, ipAddress);
+            case AnonymousIp -> retrieveAnonymousIpGeoData(geoIpDatabase, ipAddress);
         };
     }
 
@@ -340,6 +342,46 @@ private Map<String, Object> retrieveAsnGeoData(GeoIpDatabase geoIpDatabase, Inet
         return geoData;
     }
 
+    private Map<String, Object> retrieveAnonymousIpGeoData(GeoIpDatabase geoIpDatabase, InetAddress ipAddress) {
+        AnonymousIpResponse response = geoIpDatabase.getAnonymousIp(ipAddress);
+        if (response == null) {
+            return Map.of();
+        }
+
+        boolean isHostingProvider = response.isHostingProvider();
+        boolean isTorExitNode = response.isTorExitNode();
+        boolean isAnonymousVpn = response.isAnonymousVpn();
+        boolean isAnonymous = response.isAnonymous();
+        boolean isPublicProxy = response.isPublicProxy();
+        boolean isResidentialProxy = response.isResidentialProxy();
+
+        Map<String, Object> geoData = new HashMap<>();
+        for (Property property : this.properties) {
+            switch (property) {
+                case IP -> geoData.put("ip", NetworkAddress.format(ipAddress));
+                case HOSTING_PROVIDER -> {
+                    geoData.put("hosting_provider", isHostingProvider);
+                }
+                case TOR_EXIT_NODE -> {
+                    geoData.put("tor_exit_node", isTorExitNode);
+                }
+                case ANONYMOUS_VPN -> {
+                    geoData.put("anonymous_vpn", isAnonymousVpn);
+                }
+                case ANONYMOUS -> {
+                    geoData.put("anonymous", isAnonymous);
+                }
+                case PUBLIC_PROXY -> {
+                    geoData.put("public_proxy", isPublicProxy);
+                }
+                case RESIDENTIAL_PROXY -> {
+                    geoData.put("residential_proxy", isResidentialProxy);
+                }
+            }
+        }
+        return geoData;
+    }
+
     /**
      * Retrieves and verifies a {@link GeoIpDatabase} instance for each execution of the {@link GeoIpProcessor}. Guards against missing
      * custom databases, and ensures that database instances are of the proper type before use.
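
Reviewer note: the property-driven mapping in `retrieveAnonymousIpGeoData` can be tried without Elasticsearch or the MaxMind library. The sketch below is a simplified illustration under stated assumptions: `AnonymousIpFieldsSketch`, `Flags`, and `toGeoData` are hypothetical names, `Flags` stands in for MaxMind's `AnonymousIpResponse`, and the IP is passed as an already formatted string rather than going through `NetworkAddress.format`. The output keys are the ones used in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AnonymousIpFieldsSketch {

    // Mirrors the Property constants relevant to the Anonymous IP database.
    enum Property { IP, HOSTING_PROVIDER, TOR_EXIT_NODE, ANONYMOUS_VPN, ANONYMOUS, PUBLIC_PROXY, RESIDENTIAL_PROXY }

    // Hypothetical stand-in for AnonymousIpResponse: six boolean flags.
    static final class Flags {
        final boolean hostingProvider;
        final boolean torExitNode;
        final boolean anonymousVpn;
        final boolean anonymous;
        final boolean publicProxy;
        final boolean residentialProxy;

        Flags(boolean hostingProvider, boolean torExitNode, boolean anonymousVpn,
              boolean anonymous, boolean publicProxy, boolean residentialProxy) {
            this.hostingProvider = hostingProvider;
            this.torExitNode = torExitNode;
            this.anonymousVpn = anonymousVpn;
            this.anonymous = anonymous;
            this.publicProxy = publicProxy;
            this.residentialProxy = residentialProxy;
        }
    }

    // Copies only the requested properties into the result map.
    static Map<String, Object> toGeoData(String ip, Flags flags, Set<Property> properties) {
        Map<String, Object> geoData = new LinkedHashMap<>();
        for (Property property : properties) {
            switch (property) {
                case IP: geoData.put("ip", ip); break;
                case HOSTING_PROVIDER: geoData.put("hosting_provider", flags.hostingProvider); break;
                case TOR_EXIT_NODE: geoData.put("tor_exit_node", flags.torExitNode); break;
                case ANONYMOUS_VPN: geoData.put("anonymous_vpn", flags.anonymousVpn); break;
                case ANONYMOUS: geoData.put("anonymous", flags.anonymous); break;
                case PUBLIC_PROXY: geoData.put("public_proxy", flags.publicProxy); break;
                case RESIDENTIAL_PROXY: geoData.put("residential_proxy", flags.residentialProxy); break;
            }
        }
        return geoData;
    }

    public static void main(String[] args) {
        Flags flags = new Flags(true, true, false, true, false, false);
        System.out.println(toGeoData("81.2.69.1", flags, Set.of(Property.IP, Property.TOR_EXIT_NODE)));
    }
}
```

As in the diff, a flag that is `false` is still written to the map when its property is requested; only properties that were not configured are left out.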

modules/ingest-geoip/src/test/java/org/elasticsearch/ingest/geoip/GeoIpProcessorTests.java

Lines changed: 33 additions & 0 deletions
@@ -303,6 +303,39 @@ public void testAsn() throws Exception {
         assertThat(geoData.get("network"), equalTo("82.168.0.0/14"));
     }
 
+    public void testAnonymousIp() throws Exception {
+        String ip = "81.2.69.1";
+        GeoIpProcessor processor = new GeoIpProcessor(
+            randomAlphaOfLength(10),
+            null,
+            "source_field",
+            loader("/GeoIP2-Anonymous-IP-Test.mmdb"),
+            () -> true,
+            "target_field",
+            ALL_PROPERTIES,
+            false,
+            false,
+            "filename"
+        );
+
+        Map<String, Object> document = new HashMap<>();
+        document.put("source_field", ip);
+        IngestDocument ingestDocument = RandomDocumentPicks.randomIngestDocument(random(), document);
+        processor.execute(ingestDocument);
+
+        assertThat(ingestDocument.getSourceAndMetadata().get("source_field"), equalTo(ip));
+        @SuppressWarnings("unchecked")
+        Map<String, Object> geoData = (Map<String, Object>) ingestDocument.getSourceAndMetadata().get("target_field");
+        assertThat(geoData.size(), equalTo(7));
+        assertThat(geoData.get("ip"), equalTo(ip));
+        assertThat(geoData.get("hosting_provider"), equalTo(true));
+        assertThat(geoData.get("tor_exit_node"), equalTo(true));
+        assertThat(geoData.get("anonymous_vpn"), equalTo(true));
+        assertThat(geoData.get("anonymous"), equalTo(true));
+        assertThat(geoData.get("public_proxy"), equalTo(true));
+        assertThat(geoData.get("residential_proxy"), equalTo(true));
+    }
+
     public void testAddressIsNotInTheDatabase() throws Exception {
         GeoIpProcessor processor = new GeoIpProcessor(
             randomAlphaOfLength(10),

modules/ingest-geoip/src/test/java/org/elasticsearch/ingest/geoip/MaxMindSupportTests.java

Lines changed: 16 additions & 1 deletion
@@ -60,6 +60,16 @@
  */
 public class MaxMindSupportTests extends ESTestCase {
 
+    private static final Set<String> ANONYMOUS_IP_SUPPORTED_FIELDS = Set.of(
+        "anonymous",
+        "anonymousVpn",
+        "hostingProvider",
+        "publicProxy",
+        "residentialProxy",
+        "torExitNode"
+    );
+    private static final Set<String> ANONYMOUS_IP_UNSUPPORTED_FIELDS = Set.of("ipAddress", "network");
+
     private static final Set<String> ASN_SUPPORTED_FIELDS = Set.of("autonomousSystemNumber", "autonomousSystemOrganization", "network");
     private static final Set<String> ASN_UNSUPPORTED_FIELDS = Set.of("ipAddress");
 
@@ -192,6 +202,8 @@ public class MaxMindSupportTests extends ESTestCase {
     );
 
     private static final Map<Database, Set<String>> TYPE_TO_SUPPORTED_FIELDS_MAP = Map.of(
+        Database.AnonymousIp,
+        ANONYMOUS_IP_SUPPORTED_FIELDS,
         Database.Asn,
         ASN_SUPPORTED_FIELDS,
         Database.City,
@@ -200,6 +212,8 @@ public class MaxMindSupportTests extends ESTestCase {
         COUNTRY_SUPPORTED_FIELDS
     );
     private static final Map<Database, Set<String>> TYPE_TO_UNSUPPORTED_FIELDS_MAP = Map.of(
+        Database.AnonymousIp,
+        ANONYMOUS_IP_UNSUPPORTED_FIELDS,
         Database.Asn,
         ASN_UNSUPPORTED_FIELDS,
         Database.City,
@@ -208,6 +222,8 @@ public class MaxMindSupportTests extends ESTestCase {
         COUNTRY_UNSUPPORTED_FIELDS
     );
     private static final Map<Database, Class<? extends AbstractResponse>> TYPE_TO_MAX_MIND_CLASS = Map.of(
+        Database.AnonymousIp,
+        AnonymousIpResponse.class,
         Database.Asn,
         AsnResponse.class,
         Database.City,
@@ -217,7 +233,6 @@ public class MaxMindSupportTests extends ESTestCase {
     );
 
     private static final Set<Class<? extends AbstractResponse>> KNOWN_UNSUPPORTED_RESPONSE_CLASSES = Set.of(
-        AnonymousIpResponse.class,
         ConnectionTypeResponse.class,
         DomainResponse.class,
         EnterpriseResponse.class,
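
Reviewer note: the diff does not show how `MaxMindSupportTests` consumes the supported and unsupported field sets. One plausible reading, stated here as an assumption, is that every field discovered on a MaxMind response class must appear in exactly one of the two sets. The sketch below illustrates that kind of coverage check with plain set operations. `FieldCoverageSketch`, `unclassified`, `overlap`, and the field name `someFutureField` are hypothetical; the two field sets are copied from the diff.

```java
import java.util.Set;
import java.util.TreeSet;

public class FieldCoverageSketch {

    // Field names copied from the diff.
    static final Set<String> SUPPORTED = Set.of(
        "anonymous", "anonymousVpn", "hostingProvider", "publicProxy", "residentialProxy", "torExitNode"
    );
    static final Set<String> UNSUPPORTED = Set.of("ipAddress", "network");

    // Fields that are neither declared supported nor declared unsupported.
    static Set<String> unclassified(Set<String> discovered) {
        Set<String> rest = new TreeSet<>(discovered);
        rest.removeAll(SUPPORTED);
        rest.removeAll(UNSUPPORTED);
        return rest;
    }

    // Fields that were mistakenly declared in both sets.
    static Set<String> overlap() {
        Set<String> both = new TreeSet<>(SUPPORTED);
        both.retainAll(UNSUPPORTED);
        return both;
    }

    public static void main(String[] args) {
        Set<String> discovered = new TreeSet<>(SUPPORTED);
        discovered.addAll(UNSUPPORTED);
        discovered.add("someFutureField"); // hypothetical field added by a library upgrade
        System.out.println("unclassified: " + unclassified(discovered));
        System.out.println("overlap: " + overlap());
    }
}
```

A check of this shape would flag a new response field after a library upgrade, prompting a decision on whether to expose it.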
Binary file not shown.
