Skip to content

Commit 64c2426

Browse files
committed
[BinaryPlatforms] Change "shortest match" algorithm to "best match"
My assertion in the previous attempt to fix this issue was incorrect: > We define a simpler match as one that has fewer tags overall. > As these candidate matches have already been filtered to match the > given platform, the only other tags that exist are ones that are in > addition to the tags declared by the platform. Hence, selecting the > minimum in number of tags is equivalent to selecting the closest match. This is demonstrably false, by my own test case: ``` platforms = Dict( Platform("x86_64", "linux") => "bad", Platform("x86_64", "linux"; sanitize="memory") => "good", ) select_platform(platforms, Platform("x86_64", "linux"; sanitize="memory")) == "good" ``` In this case, because there exists a candidate that is _more general_ than the provided platform type, the shortest match is no longer the best match. This PR performs a more rigorous matching that works more reliably in all cases.
1 parent b554e8f commit 64c2426

File tree

2 files changed

+36
-22
lines changed

2 files changed

+36
-22
lines changed

base/binaryplatforms.jl

Lines changed: 22 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1068,21 +1068,28 @@ function select_platform(download_info::Dict, platform::AbstractPlatform = HostP
10681068
end
10691069

10701070
# At this point, we may have multiple possibilities. We now engage a multi-
1071-
# stage selection algorithm, where we first choose simpler matches over more
1072-
# complex matches. We define a simpler match as one that has fewer tags
1073-
# overall. As these candidate matches have already been filtered to match
1074-
# the given platform, the only other tags that exist are ones that are in
1075-
# addition to the tags declared by the platform. Hence, selecting the
1076-
# minimum in number of tags is equivalent to selecting the closest match.
1077-
min_tag_count = minimum(length(tags(p)) for p in ps)
1078-
filter!(p -> length(tags(p)) == min_tag_count, ps)
1079-
1080-
# Now we _still_ may continue to have multiple matches, so we now simply sort
1081-
# the candidate matches by their triplets and take the last one, so as to
1082-
# generally choose the latest release (e.g. a `libgfortran5` tarball over a
1083-
# `libgfortran3` tarball).
1084-
p = last(sort(ps, by = p -> triplet(p)))
1085-
return download_info[p]
1071+
# stage selection algorithm, where we first sort the matches by how complete
1072+
# the match is, e.g. preferring matches where the intersection of tags is
1073+
# equal to the union of the tags:
1074+
function match_loss(a, b)
1075+
a_tags = Set(keys(tags(a)))
1076+
b_tags = Set(keys(tags(b)))
1077+
return length(union(a_tags, b_tags)) - length(intersect(a_tags, b_tags))
1078+
end
1079+
1080+
# We prefer these better matches, and secondarily reverse-sort by triplet so
1081+
# as to generally choose the latest release (e.g. a `libgfortran5` tarball
1082+
# over a `libgfortran3` tarball).
1083+
ps = sort(ps, lt = (a, b) -> begin
1084+
loss_a = match_loss(a, platform)
1085+
loss_b = match_loss(b, platform)
1086+
if loss_a != loss_b
1087+
return loss_a < loss_b
1088+
end
1089+
return triplet(a) > triplet(b)
1090+
end)
1091+
1092+
return download_info[first(ps)]
10861093
end
10871094

10881095
# precompiles to reduce latency (see https://github.com/JuliaLang/julia/pull/43990#issuecomment-1025692379)

test/binaryplatforms.jl

Lines changed: 14 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -330,8 +330,7 @@ end
330330
# Ambiguity test
331331
@test select_platform(platforms, P("aarch64", "linux")) == "linux3"
332332
@test select_platform(platforms, P("aarch64", "linux"; libgfortran_version=v"3")) == "linux3"
333-
# This one may be surprising, but we still match `linux3`, and since linux3 is shorter, we choose it.
334-
@test select_platform(platforms, P("aarch64", "linux"; libgfortran_version=v"3", libstdcxx_version=v"3.4.18")) === "linux3"
333+
@test select_platform(platforms, P("aarch64", "linux"; libgfortran_version=v"3", libstdcxx_version=v"3.4.18")) === "linux5"
335334
@test select_platform(platforms, P("aarch64", "linux"; libgfortran_version=v"4")) === nothing
336335

337336
@test select_platform(platforms, P("x86_64", "macos")) == "mac4"
@@ -343,13 +342,21 @@ end
343342
# Sorry, Alex. ;)
344343
@test select_platform(platforms, P("x86_64", "freebsd")) === nothing
345344

346-
# The new "prefer shortest matching" algorithm is meant to be used to resolve ambiguities such as the following:
345+
# The new "most complete match" algorithm deals with ambiguities as follows:
347346
platforms = Dict(
348-
# Typical binning test
349-
P("x86_64", "linux") => "good",
350-
P("x86_64", "linux"; sanitize="memory") => "bad",
347+
P("x86_64", "linux") => "normal",
348+
P("x86_64", "linux"; sanitize="memory") => "sanitized",
349+
)
350+
@test select_platform(platforms, P("x86_64", "linux")) == "normal"
351+
@test select_platform(platforms, P("x86_64", "linux"; sanitize="memory")) == "sanitized"
352+
353+
# Ties are broken by reverse-sorting by triplet:
354+
platforms = Dict(
355+
P("x86_64", "linux"; libgfortran_version=v"3") => "libgfortran3",
356+
P("x86_64", "linux"; libgfortran_version=v"4") => "libgfortran4",
351357
)
352-
@test select_platform(platforms, P("x86_64", "linux")) == "good"
358+
@test select_platform(platforms, P("x86_64", "linux")) == "libgfortran4"
359+
@test select_platform(platforms, P("x86_64", "linux"; libgfortran_version=v"3")) == "libgfortran3"
353360
end
354361

355362
@testset "Custom comparators" begin

0 commit comments

Comments
 (0)