Merged
6 changes: 6 additions & 0 deletions docs/changelog/126532.yaml
@@ -0,0 +1,6 @@
pr: 126532
summary: TO_IP can handle leading zeros
area: ES|QL
type: bug
issues:
- 125460

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
60 changes: 60 additions & 0 deletions x-pack/plugin/esql/qa/testFixtures/src/main/resources/ip.csv-spec
@@ -271,6 +271,66 @@ str1:keyword |str2:keyword |ip1:ip |ip2:ip
// end::to_ip-result[]
;

convertFromStringLeadingZerosDecimal
required_capability: to_ip_leading_zeros
// tag::to_ip_leading_zeros_decimal[]
ROW s = "1.1.010.1" | EVAL ip = TO_IP(s, {"leading_zeros":"decimal"})
// end::to_ip_leading_zeros_decimal[]
;

// tag::to_ip_leading_zeros_decimal-result[]
s:keyword | ip:ip
1.1.010.1 | 1.1.10.1
// end::to_ip_leading_zeros_decimal-result[]
;

convertFromStringLeadingZerosOctal
required_capability: to_ip_leading_zeros
// tag::to_ip_leading_zeros_octal[]
ROW s = "1.1.010.1" | EVAL ip = TO_IP(s, {"leading_zeros":"octal"})
// end::to_ip_leading_zeros_octal[]
;

// tag::to_ip_leading_zeros_octal-result[]
s:keyword | ip:ip
1.1.010.1 | 1.1.8.1
// end::to_ip_leading_zeros_octal-result[]
;
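For readers unfamiliar with the two modes exercised above, the difference is only in how an octet with a leading zero is read: `decimal` treats `010` as ten, while `octal` follows the BSD `inet_aton` convention and treats it as eight. A minimal, hypothetical Java sketch of that distinction (not the actual ES|QL parser; names are invented for illustration):

```java
public class LeadingZerosSketch {
    /** Parse one dotted-quad octet under the given leading-zeros mode. */
    static int parseOctet(String s, boolean octal) {
        // In octal mode a leading zero switches the radix, as inet_aton does.
        int radix = (octal && s.length() > 1 && s.charAt(0) == '0') ? 8 : 10;
        int v = Integer.parseInt(s, radix);
        if (v < 0 || v > 255) {
            throw new IllegalArgumentException("octet out of range: " + s);
        }
        return v;
    }

    /** Normalize a dotted quad, e.g. "1.1.010.1" under each mode. */
    static String toIp(String ip, boolean octal) {
        String[] parts = ip.split("\\.");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) out.append('.');
            out.append(parseOctet(parts[i], octal));
        }
        return out.toString();
    }
}
```

Under this sketch, `toIp("1.1.010.1", false)` yields `1.1.10.1` and `toIp("1.1.010.1", true)` yields `1.1.8.1`, matching the csv-spec results above.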

toIpInAgg
ROW s = "1.1.1.1" | STATS COUNT(*) BY ip = TO_IP(s)
;

COUNT(*):long | ip:ip
1 | 1.1.1.1
;

toIpInSort
ROW s = "1.1.1.1" | SORT TO_IP(s)
;

s:keyword
1.1.1.1
;

toIpInAggOctal
required_capability: to_ip_leading_zeros
ROW s = "1.1.010.1" | STATS COUNT(*) BY ip = TO_IP(s, {"leading_zeros":"octal"})
;

COUNT(*):long | ip:ip
1 | 1.1.8.1
;

toIpInSortOctal
required_capability: to_ip_leading_zeros
ROW s = "1.1.010.1" | SORT TO_IP(s, {"leading_zeros":"octal"})
;

s:keyword
1.1.010.1
;

Member commented:
It would be great to have some tests where an IP address with leading zeros appears in predicates against real indices, if those are valid use cases. For example,

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where client.ip == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}   
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date        
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where client.ip in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), to_ip(\"172.21.3.15\"))"
}
'
       @timestamp       |   client.ip   |event.duration |       message       |       test_date        
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15    |725448         |Connection error     |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15    |8268153        |Connection error     |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15    |5033755        |Connection error     |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15    |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z

Do we expect that IP addresses with leading zeros can appear in index fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.

    {
      "index" : {
        "_index" : "sample_data",
        "_id" : "KBZsG5YBxZF4dlPHvvyx",
        "status" : 400,
        "error" : {
          "type" : "document_parsing_exception",
          "reason" : "[1:57] failed to parse field [client.ip] of type [ip] in document with id 'KBZsG5YBxZF4dlPHvvyx'. Preview of field's value: '172.21.04.15'",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "'172.21.04.15' is not an IP string literal."
          }
        }
      }
    }

However, it can be loaded into keyword fields. I changed the schema to make client.ip a keyword.

       @timestamp       |   client.ip   |event.duration |       message       |       test_date        
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T12:15:03.360Z|172.21.2.162   |3450233        |Connected to 10.1.0.3|2025-11-21T00:00:00.000Z
2023-10-23T12:27:28.948Z|172.21.2.113   |2764889        |Connected to 10.1.0.2|2025-11-22T00:00:00.000Z
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:51:54.732Z|172.21.3.15    |725448         |Connection error     |2025-11-24T00:00:00.000Z
2023-10-23T13:52:55.015Z|172.21.3.15    |8268153        |Connection error     |2025-11-25T00:00:00.000Z
2023-10-23T13:53:55.832Z|172.21.3.15    |5033755        |Connection error     |2025-11-26T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.3.15    |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15   |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z

I tried some queries using the to_ip leading-zeros option, and it works on index fields too! I don't know if this is a valid use case; the original GitHub issue only mentions leading zeros in string literals.

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip) == \"172.21.0.5\""
}
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date        
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip) == to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"})"
}
'
       @timestamp       |   client.ip   |event.duration |    message    |       test_date        
------------------------+---------------+---------------+---------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected   |2025-11-23T00:00:00.000Z
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data | where to_ip(client.ip, {\"leading_zeros\":\"decimal\"}) in ( to_ip(\"172.021.00.05\", {\"leading_zeros\":\"decimal\"}), \"172.21.4.15\" )"
}
'
       @timestamp       |   client.ip   |event.duration |       message       |       test_date        
------------------------+---------------+---------------+---------------------+------------------------
2023-10-23T13:33:34.937Z|172.21.0.5     |1232382        |Disconnected         |2025-11-23T00:00:00.000Z
2023-10-23T13:55:01.543Z|172.21.04.15   |1756467        |Connected to 10.1.0.1|2025-11-27T00:00:00.000Z

Member Author replied:

> Do we expect that IP addresses with leading zeros can appear in index fields? I gave it a try and got this error; it seems they cannot be loaded into indices as valid IPs.

You can't index such a field, no. But with this, ES|QL can parse them!

> However, it can be loaded into keyword fields.

Just like that, yeah.

> I tried some queries using the to_ip leading-zeros option, and it works on index fields too! I don't know if this is a valid use case.

I think it's valid. Let me add a test case for it too.

cdirMatchOrsIPs
required_capability: combine_disjunctive_cidrmatches

@@ -947,7 +947,12 @@ public enum Cap {
/**
* Supersedes {@link Cap#MAKE_NUMBER_OF_CHANNELS_CONSISTENT_WITH_LAYOUT}.
*/
-FIX_REPLACE_MISSING_FIELD_WITH_NULL_DUPLICATE_NAME_ID_IN_LAYOUT;
+FIX_REPLACE_MISSING_FIELD_WITH_NULL_DUPLICATE_NAME_ID_IN_LAYOUT,

/**
* Support for the {@code leading_zeros} named parameter.
*/
TO_IP_LEADING_ZEROS;

private final boolean enabled;

@@ -47,6 +47,7 @@
import org.elasticsearch.xpack.esql.core.util.Holder;
import org.elasticsearch.xpack.esql.core.util.StringUtils;
import org.elasticsearch.xpack.esql.expression.NamedExpressions;
import org.elasticsearch.xpack.esql.expression.SurrogateExpression;
import org.elasticsearch.xpack.esql.expression.UnresolvedNamePattern;
import org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry;
import org.elasticsearch.xpack.esql.expression.function.FunctionDefinition;
@@ -58,6 +59,7 @@
import org.elasticsearch.xpack.esql.expression.function.scalar.conditional.Greatest;
import org.elasticsearch.xpack.esql.expression.function.scalar.conditional.Least;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.AbstractConvertFunction;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ConvertFunction;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.FoldablesConvertFunction;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToDouble;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToInteger;
@@ -1523,10 +1525,12 @@ private LogicalPlan doRule(LogicalPlan plan) {
int alreadyAddedUnionFieldAttributes = unionFieldAttributes.size();
// See if the eval function has an unresolved MultiTypeEsField field
// Replace the entire convert function with a new FieldAttribute (containing type conversion knowledge)
-plan = plan.transformExpressionsOnly(
-AbstractConvertFunction.class,
-convert -> resolveConvertFunction(convert, unionFieldAttributes)
-);
+plan = plan.transformExpressionsOnly(e -> {
+if (e instanceof ConvertFunction convert) {
+return resolveConvertFunction(convert, unionFieldAttributes);
+}
+return e;
+});
// If no union fields were generated, return the plan as is
if (unionFieldAttributes.size() == alreadyAddedUnionFieldAttributes) {
return plan;
@@ -1557,7 +1561,7 @@ private LogicalPlan doRule(LogicalPlan plan) {
return plan;
}

-private Expression resolveConvertFunction(AbstractConvertFunction convert, List<FieldAttribute> unionFieldAttributes) {
+private Expression resolveConvertFunction(ConvertFunction convert, List<FieldAttribute> unionFieldAttributes) {
if (convert.field() instanceof FieldAttribute fa && fa.field() instanceof InvalidMappedField imf) {
HashMap<TypeResolutionKey, Expression> typeResolutions = new HashMap<>();
Set<DataType> supportedTypes = convert.supportedTypes();
@@ -1586,7 +1590,7 @@ private Expression resolveConvertFunction(AbstractConvertFunction convert, List<
} else if (convert.field() instanceof AbstractConvertFunction subConvert) {
return convert.replaceChildren(Collections.singletonList(resolveConvertFunction(subConvert, unionFieldAttributes)));
}
-return convert;
+return (Expression) convert;
}

private Expression createIfDoesNotAlreadyExist(
@@ -1622,7 +1626,7 @@ private MultiTypeEsField resolvedMultiTypeEsField(FieldAttribute fa, HashMap<Typ
return MultiTypeEsField.resolveFrom(imf, typesToConversionExpressions);
}

-private Expression typeSpecificConvert(AbstractConvertFunction convert, Source source, DataType type, InvalidMappedField mtf) {
+private Expression typeSpecificConvert(ConvertFunction convert, Source source, DataType type, InvalidMappedField mtf) {
EsField field = new EsField(mtf.getName(), type, mtf.getProperties(), mtf.isAggregatable());
FieldAttribute originalFieldAttr = (FieldAttribute) convert.field();
FieldAttribute resolvedAttr = new FieldAttribute(
@@ -1634,7 +1638,19 @@ private Expression typeSpecificConvert(AbstractConvertFunction convert, Source s
originalFieldAttr.id(),
true
);
-return convert.replaceChildren(Collections.singletonList(resolvedAttr));
+Expression e = convert.replaceChildren(Collections.singletonList(resolvedAttr));
/*
* Resolve surrogates immediately because these type specific conversions are serialized
* and SurrogateExpressions are expected to be resolved on the coordinating node. At least,
* TO_IP is expected to be resolved there.
*/
if (e instanceof SurrogateExpression s) {
Member commented:
This looks like a duplication of SubstituteSurrogateExpressions in LogicalPlanOptimizer, and the transformation seems to belong in LogicalPlanOptimizer. After looking at it more closely, I understand why it is here; I couldn't think of a better choice for union-typed fields with fewer code changes. Can we refactor this piece of code and share it between the Analyzer and SubstituteSurrogateExpressions?

Member Author replied:
It is! I'll bet I can reuse this better. Let me have a look.

Expression surrogate = s.surrogate();
if (surrogate != null) {
return surrogate;
}
}
return e;
}
}

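The code comment in typeSpecificConvert describes the surrogate pattern: an expression carrying unresolved options rewrites itself into a concrete, serializable implementation before leaving the coordinating node. A toy illustration of the shape of that mechanism (all names here are invented; the real interfaces live in the ES|QL source):

```java
public class SurrogateSketch {
    interface Expr {}

    /** An expression that may rewrite itself into an equivalent executable form. */
    interface Surrogate {
        Expr surrogate(); // null means "no rewrite needed; execute as-is"
    }

    /** A TO_IP-like call whose named parameter selects the implementation. */
    record ToIpCall(String leadingZeros) implements Expr, Surrogate {
        public Expr surrogate() {
            return switch (leadingZeros) {
                case "octal" -> new ToIpOctal();
                case "decimal" -> new ToIpDecimal();
                default -> null; // rejecting leading zeros is the default path
            };
        }
    }

    record ToIpOctal() implements Expr {}

    record ToIpDecimal() implements Expr {}

    /** Resolve surrogates eagerly, mirroring the eager check in the diff. */
    static Expr resolve(Expr e) {
        if (e instanceof Surrogate s) {
            Expr rewritten = s.surrogate();
            if (rewritten != null) {
                return rewritten;
            }
        }
        return e;
    }
}
```

Resolving eagerly here, rather than waiting for the optimizer pass, is exactly the trade-off the review comments above debate.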
@@ -25,8 +25,10 @@
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToDouble;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToGeoPoint;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToGeoShape;
-import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIP;
 import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToInteger;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosDecimal;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosOctal;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosRejected;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToLong;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToRadians;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToString;
@@ -192,7 +194,9 @@ public static List<NamedWriteableRegistry.Entry> unaryScalars() {
entries.add(ToGeoShape.ENTRY);
entries.add(ToCartesianShape.ENTRY);
entries.add(ToGeoPoint.ENTRY);
-entries.add(ToIP.ENTRY);
+entries.add(ToIpLeadingZerosDecimal.ENTRY);
+entries.add(ToIpLeadingZerosOctal.ENTRY);
+entries.add(ToIpLeadingZerosRejected.ENTRY);
entries.add(ToInteger.ENTRY);
entries.add(ToLong.ENTRY);
entries.add(ToRadians.ENTRY);
@@ -16,6 +16,7 @@
import org.elasticsearch.xpack.esql.core.tree.Source;
import org.elasticsearch.xpack.esql.core.type.DataType;
import org.elasticsearch.xpack.esql.core.util.Check;
import org.elasticsearch.xpack.esql.expression.SurrogateExpression;
import org.elasticsearch.xpack.esql.expression.function.aggregate.Avg;
import org.elasticsearch.xpack.esql.expression.function.aggregate.Count;
import org.elasticsearch.xpack.esql.expression.function.aggregate.CountDistinct;
@@ -54,8 +55,11 @@
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToDouble;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToGeoPoint;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToGeoShape;
-import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIP;
 import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToInteger;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIp;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosDecimal;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosOctal;
+import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToIpLeadingZerosRejected;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToLong;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToRadians;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToString;
@@ -228,6 +232,7 @@ public class EsqlFunctionRegistry {
public EsqlFunctionRegistry() {
register(functions());
buildDataTypesForStringLiteralConversion(functions());
nameSurrogates();
}

EsqlFunctionRegistry(FunctionDefinition... functions) {
@@ -389,7 +394,7 @@ private static FunctionDefinition[][] functions() {
def(ToDouble.class, ToDouble::new, "to_double", "to_dbl"),
def(ToGeoPoint.class, ToGeoPoint::new, "to_geopoint"),
def(ToGeoShape.class, ToGeoShape::new, "to_geoshape"),
-def(ToIP.class, ToIP::new, "to_ip"),
+def(ToIp.class, ToIp::new, "to_ip"),
def(ToInteger.class, ToInteger::new, "to_integer", "to_int"),
def(ToLong.class, ToLong::new, "to_long"),
def(ToRadians.class, ToRadians::new, "to_radians"),
@@ -791,6 +796,15 @@ protected void buildDataTypesForStringLiteralConversion(FunctionDefinition[]...
}
}

/**
* Add {@link #names} entries for functions that are not registered but that we rewrite to use a {@link SurrogateExpression}.
*/
private void nameSurrogates() {
names.put(ToIpLeadingZerosRejected.class, "TO_IP");
names.put(ToIpLeadingZerosDecimal.class, "TO_IP");
names.put(ToIpLeadingZerosOctal.class, "TO_IP");
}

protected interface FunctionBuilder {
Function build(Source source, List<Expression> children, Configuration cfg);
}
@@ -43,7 +43,7 @@
* {@link org.elasticsearch.xpack.esql.expression.function.scalar}.
* </p>
*/
-public abstract class AbstractConvertFunction extends UnaryScalarFunction {
+public abstract class AbstractConvertFunction extends UnaryScalarFunction implements ConvertFunction {

// the numeric types convert functions need to handle; the other numeric types are converted upstream to one of these
private static final List<DataType> NUMERIC_TYPES = List.of(DataType.INTEGER, DataType.LONG, DataType.UNSIGNED_LONG, DataType.DOUBLE);
@@ -76,11 +76,12 @@ protected TypeResolution resolveType() {
return isTypeOrUnionType(field(), factories()::containsKey, sourceText(), null, supportedTypesNames(supportedTypes()));
}

@Override
public Set<DataType> supportedTypes() {
return factories().keySet();
}

-private static String supportedTypesNames(Set<DataType> types) {
+static String supportedTypesNames(Set<DataType> types) {
List<String> supportedTypesNames = new ArrayList<>(types.size());
HashSet<DataType> supportTypes = new HashSet<>(types);
if (supportTypes.containsAll(NUMERIC_TYPES)) {