Skip to content

Updates geodist#133

Open
JanLinnenbrink wants to merge 1 commit intoHannaMeyer:masterfrom
JanLinnenbrink:master
Open

Updates geodist#133
JanLinnenbrink wants to merge 1 commit intoHannaMeyer:masterfrom
JanLinnenbrink:master

Conversation

@JanLinnenbrink
Copy link
Contributor

Implements more flexible prediction point generation in geodist and aligns the distance calculation logic with that in (k)NNDM.

#130

…ligns the distance calculation logic with that in (k)NNDM. HannaMeyer#130
@Nowosad
Copy link
Contributor

Nowosad commented Feb 2, 2026

@JanLinnenbrink :

  1. An argument name could be improved: space -> dist_space (I think the second one is easier to understand)
  2. Another argument -- useMD -- may also be improved. My suggestion would be to rename it, e.g., to dist_fun (I use this name in several of my packages) and set it to Euclidean by default, with the option to change it to Mahalanobis. This way, we will have an open door for future expansions (if needed).
  3. The cvtrain argument is gone. I do not think I ever used it, but maybe some other people did. Should we add information about this change and a possible workaround somewhere?
  4. I do not know how big changes you want to make, but maybe for consistency, we should have "timevar" and "timeunit" or "time_var" and "time_unit"?
  5. The sapply function should be avoided inside of functions. Instead, vapply is a usual better replacement, e.g., "timevar <- names(which(vapply(x, lubridate::is.Date, logical(1))))".
  6. ref_crs -- what if the modeldomain is a raster object?
  7. terra::vect is not needed in terra::extract (just to let you know -- we can keep the code as it is)
  8. I just noticed that in the past code you used na.rm=TRUE in terra::extract and now this argument is gone. Is it on purpose?
  9. More general question: it seems that scaling is performed inside the geodist function. Wouldn't it be better if that were an option? Is this default scaling always the best approach?
  10. When calling internal functions, I think it would be best to do it with named arguments (for clarity). E.g., it is hard to understand the sample2sample or prediction2sample call now.
  11. Why are sample2sample and prediction2sample using the islonglat and tcoords arguments? Would it not be cleaner if those were directly read from x?
  12. sample2sample, prediction2sample, test2sample functions have a lot of repeated code. Have you considered writing some more general function for distance calculations (or distance calculations given the "space")?
  13. The last code chunk in the vignette 2 returns "Warning message: [extract] transforming vector data to the CRS of the raster" twice. Shouldn't we guard the CRSs earlier, so this message is not shown? Similarly, do we need the "although coordinates are longitude/latitude, st_intersects assumes that they are planar" message there?
  14. Documentation: the NEWS.md file should be updated
  15. Have you considered using air for code formatting (after my review) to ease future code changes? (Or maybe even we could "air" the whole code base before the CAST-days?)
    (16. There is also a discussion on terminology including "sample-to-sample, etc.)

@Nowosad
Copy link
Contributor

Nowosad commented Feb 2, 2026

CC: @HannaMeyer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants