Conversation
|
I just ran the I’m guessing that it’s normal for some mod10 CHIs to coincidentally pass the mod11 check (and vice versa)? library(tidyverse)
library(phsmethods) # feature/mod10 version
chis <- readr::read_csv("~/chi_dataset.csv")
#> Rows: 105000 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): CHI_NUMBER, MODULUS
#> lgl (1): IS_VALID
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
chis |>
count(MODULUS)
#> # A tibble: 3 × 2
#> MODULUS n
#> <chr> <int>
#> 1 Mod10 50000
#> 2 Mod11 50000
#> 3 N/A 5000
chis <- chis |>
mutate(valid_mod11 = chi_check(CHI_NUMBER, check_mod11 = TRUE, check_mod10 = FALSE),
valid_mod10 = chi_check(CHI_NUMBER, check_mod11 = FALSE, check_mod10 = TRUE),
valid_any = chi_check(CHI_NUMBER))
chis |>
filter(valid_mod11 == "Valid CHI" & MODULUS != "Mod11") |>
count(MODULUS)
#> # A tibble: 1 × 2
#> MODULUS n
#> <chr> <int>
#> 1 Mod10 4530
chis |>
filter(valid_mod10 == "Valid CHI" & MODULUS != "Mod10") |>
count(MODULUS)
#> # A tibble: 1 × 2
#> MODULUS n
#> <chr> <int>
#> 1 Mod11 4920
# All combinations
chis |>
count(MODULUS, valid_mod10, valid_mod11)
#> # A tibble: 5 × 4
#> MODULUS valid_mod10 valid_mod11 n
#> <chr> <chr> <chr> <int>
#> 1 Mod10 Valid CHI Invalid checksum 45470
#> 2 Mod10 Valid CHI Valid CHI 4530
#> 3 Mod11 Invalid checksum Valid CHI 45080
#> 4 Mod11 Valid CHI Valid CHI 4920
#> 5 N/A Invalid checksum Invalid checksum 5000Created on 2025-11-05 with reprex v2.1.1 |
I think definitely some to be expected, although the ~10% number you found is surprising / worrying, as I would have hoped it would be much lower than that! |
Yeah I saw this as well when I tested. The CHIs in that dataset were generated and not real so I'm not sure if the CHIs given to people for real will run into this issue as much? I'm guessing that in reality they can only assign CHIs that haven't already been used for people so the likelihood of this happening (CHI generated with Mod10 also passing Mod11) could be a lot less in practice? I also think that passing both mod11 and mod10 isn't really the issue. The main issue is probably passing Mod10 but not Mod11 as that is not allowed prior to implementation date. I'll do some testing on a real current dataset as that'll give us a chance to see how many CHIs are passing Mod10 and not Mod11 which is not currently allowed so we can see the scale of the issue. |
Co-authored-by: James Hayes (né McMahon) <james.hayes2@phs.scot>
|
@Moohan I've tried updating the vignette to add something about Mod-10. I'm sure wording will need edited (e.g. making a bit more use of 'plain English' but thought it would be good to put something down for now. I think that the vignette has ended up being run through the air formatter - hope that's ok! |
Moohan
left a comment
There was a problem hiding this comment.
Some fairly minor wording change suggestions.
Co-authored-by: James Hayes (né McMahon) <james.hayes2@phs.scot>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
Added Mod 10 check to
checksum()forchi_check(). I have done some initial testing and seems ok to me but definitely needs more testing from others and maybe documentation updates, etc. Happy to take suggestions if coding can be improved as well.Just thought of an idea - maybe include an option to run Mod 11 only or both Mod 11 and Mod 10?