Question about stringdist() #104

@JackGuo15

Hi Mark,

I hope you're doing well. My name is Ruohan, and I'm a second-year PhD student at UCL. I'm currently using the stringdist package to measure linguistic distance, and I’ve found it incredibly useful. However, I’ve encountered a few issues that I’m hoping you can help clarify.

I’ve been working through the R manual for stringdist (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf), which describes on page 20 how the four edit operations (deletion, insertion, substitution, transposition) can be weighted, in that order. For example, stringdist('ab', 'ba', weight=c(1,1,1,0.5)) returns 0.5, suggesting that a single transposition was performed at its weight of 0.5.

Building on this example, I tried the following cases:

  1. stringdist('ab', 'a', weight=c(0.5, 1, 1, 1))
     I expected 0.5 from a deletion weighted at 0.5, but the output was 1.
  2. stringdist('ab', 'a', weight=c(1, 0.5, 1, 1))
     This returned 0.5, which suggests the edit was charged at the insertion weight rather than the deletion weight.
  3. stringdist('a', 'ab', weight=c(0.5, 1, 1, 1))
     Here the output was 0.5, which suggests the deletion weight was applied.

Given these results, I’m wondering if I might have misunderstood the string distance calculation. Specifically, I assumed that stringdist('ab', 'a') would attempt to match 'ab' to 'a' by deleting a character, while stringdist('a', 'ab') would result in an insertion. Could you clarify how the algorithm determines whether to apply an insertion or deletion in these cases?
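To make my expectation concrete, here is a minimal base-R sketch of a weighted Levenshtein distance that turns the first string into the second. The function name weighted_lev and its three-element weight vector are my own illustration, not part of stringdist, and it leaves out transpositions entirely, so it only shows the convention I had assumed rather than how the package actually works.

```r
# Hypothetical helper, NOT stringdist's implementation: weighted Levenshtein
# cost of turning string `a` into string `b`.
# w = c(deletion, insertion, substitution); transpositions are not modelled.
weighted_lev <- function(a, b, w = c(1, 1, 1)) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] = cost of turning the first i chars of `a`
  # into the first j chars of `b`
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- (0:n) * w[1]  # delete every character of a prefix of `a`
  d[1, ] <- (0:m) * w[2]  # insert every character of a prefix of `b`
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub_cost <- if (x[i] == y[j]) 0 else w[3]
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + w[1],  # delete x[i]
        d[i + 1, j] + w[2],  # insert y[j]
        d[i, j] + sub_cost   # match or substitute
      )
    }
  }
  d[n + 1, m + 1]
}

print(weighted_lev("ab", "a", w = c(0.5, 1, 1)))  # 'ab' -> 'a' needs one deletion
print(weighted_lev("a", "ab", w = c(0.5, 1, 1)))  # 'a' -> 'ab' needs one insertion
```

Under this convention the first call gives 0.5 and the second gives 1, which is the reverse of what stringdist() returned for cases 1 and 3 above.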

Additionally, when I tried stringdist('abc', 'ca', method = "dl", weight = c(1, 0.1, 0.01, 0.001)), I received an output of 0.002, which suggests that only two transpositions were charged to match "abc" to "ca". Since the two strings differ in length, shouldn’t the result also include at least one deletion or insertion?

I look forward to your insights. Thank you very much for your time.

Best wishes,
Ruohan
