Jaccard of empty strings inconsistency on MacOS #100

@pieterhartel

The behaviour below is inconsistent on my Mac. I cannot reproduce the inconsistency on Ubuntu, where the results are mostly consistent.

Here is the Jaccard similarity of two empty strings, first as arguments to the stringsim function, and then as components of a vector.

> x <- stringdist::stringsim("","",method="jaccard")
> str(x)
 num 1
> y <- stringdist::stringsim(c("y",""),c("y",""),method="jaccard")
> str(y)
 num [1:2] 1 NaN

Here is another example of inconsistent behaviour:

> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 5)
[1] 1 1
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 3)
[1]   1 NaN
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 1)
[1] 1.0000000 0.3333333
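
For reference, here is a worked version of the arithmetic for the pair "ac" / "bc". It assumes that the Jaccard similarity is the ratio of shared to total unique q-grams, with X and Y the sets of unique q-grams of the two strings, which is how I read the stringdist documentation. It is a sketch of the definition, not of what the package does internally.

```latex
% Assumed definition: X, Y are the sets of unique q-grams of the two strings
J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}

% q = 1, "ac" vs "bc": X = \{a, c\}, Y = \{b, c\}
J = \frac{|\{c\}|}{|\{a, b, c\}|} = \frac{1}{3} \approx 0.3333

% q = 3, "ac" vs "bc": neither string has a 3-gram, so X = Y = \emptyset
J = \frac{|\emptyset|}{|\emptyset|} = \frac{0}{0} \quad \text{(undefined)}
```

The q = 1 value matches the 0.3333333 above, and the 0/0 case would account for a NaN whenever both q-gram sets are empty and the case is not handled separately. It does not account for the 1 returned with q = 5, where both sets are empty as well.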

I tried this with a fresh install of the stringdist package:

$ R
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20 (64-bit)
> packageVersion('stringdist')
[1] ‘0.9.10’
