@eleuven
Contributor

@eleuven eleuven commented Oct 23, 2025

This is just a suggestion.

loess is extremely slow, or simply impossible to use, on even moderately sized data sets.

An alternative is locfit, which could be added as an option here (or as a separate type).

To compare speed and results, see for example:

library(tinyplot)
library(data.table)
n = 10000  # rows per sex group, so nrow(dt) = 2*n
dt = data.table(sex = rep(c(1.0, 0.0), each = n))
dt[, x := rnorm(.N)]
dt[, y := 1 + 0.5*sex + (1-0.2*sex)*x - (0.5 - 0.1*sex)*x^2 + 0.05*sex*x^3 + rnorm(.N)]
dt$sex = as.factor(dt$sex)

# system.time() with type = "loess"
# n = 10000:    user  system elapsed
#             49.860  17.245  87.297
# n = 20000: Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
system.time(plt(y ~ x, dt, type = "loess"))

# system.time() with the locfit option proposed in this PR
# n = 10000:   0.165   0.017   0.213
# n = 20000:   0.333   0.037   0.406
system.time(plt(y ~ x, dt, type = type_loess(locfit = TRUE)))

@vincentarelbundock
Collaborator

Thanks for this!

If we're going to merge this, we'd need to add locfit to Suggests and also add a call to assert_dependency().

But we should probably have a larger discussion about the extent to which we want to include calls to external packages in the code base. Currently, there is only very minimal use, and this would be a qualitative shift.

If we don't merge this, it may be a good candidate for a "user-designed types" library to be posted on the website.
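Following up on the Suggests / assert_dependency() point, here is a minimal sketch of how an optional locfit path could be guarded. Everything in it is an assumption for illustration: `smooth_fit()` and its arguments are hypothetical and are not tinyplot's type_loess() internals, base R's `requireNamespace()` stands in for tinyplot's internal `assert_dependency()`, and the locfit call is based on locfit's `locfit.raw()` interface, with `alpha` (the nearest-neighbour fraction) treated as a rough analogue of loess's `span`.

```r
# Hypothetical helper, not part of tinyplot: fit a local regression
# smoother and return fitted values on an evenly spaced grid of x values.
smooth_fit = function(x, y, span = 0.75, degree = 2, n_grid = 50,
                      use_locfit = FALSE) {
  grid = seq(min(x), max(x), length.out = n_grid)
  if (use_locfit) {
    # locfit would be a "Suggests" dependency, so check for it at run time
    # instead of importing it (the review proposes assert_dependency() for
    # this in tinyplot; base R's requireNamespace() is used here instead).
    if (!requireNamespace("locfit", quietly = TRUE)) {
      stop("use_locfit = TRUE requires the 'locfit' package.", call. = FALSE)
    }
    # alpha is locfit's nearest-neighbour fraction, assumed here to be a
    # rough analogue of loess's span
    fit = locfit::locfit.raw(x, y, alpha = span, deg = degree)
    yhat = predict(fit, newdata = grid)
  } else {
    # default path: base R loess, no extra dependency
    fit = stats::loess(y ~ x, span = span, degree = degree)
    yhat = predict(fit, newdata = data.frame(x = grid))
  }
  data.frame(x = grid, y = as.numeric(yhat))
}

# Usage on simulated data
set.seed(1)
x = rnorm(2000)
y = 1 + x - 0.5 * x^2 + rnorm(2000)
head(smooth_fit(x, y))                        # stats::loess
if (requireNamespace("locfit", quietly = TRUE)) {
  head(smooth_fit(x, y, use_locfit = TRUE))   # locfit, only if installed
}
```

Checking for the package only when the option is requested keeps locfit out of Imports, so users who never set the option never need it installed.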

@eleuven
Contributor Author

eleuven commented Oct 23, 2025

Yes, I realize that. I just wanted to put this out there because the practical limitations of loess make it mostly useless to me (and, I suspect, to many others working with moderate to large datasets), even though it is a plot type I use a lot.

@zeileis
Collaborator

zeileis commented Oct 24, 2025

Thanks for the PR, Edwin @eleuven, and for raising the more general issue about dependencies, Vincent @vincentarelbundock. I'm sure Grant @grantmcdermott will already have thought about how to deal with such dependencies, but just in case it is useful, I will post some thoughts.

  • In general, we want to be careful with adding such dependencies in order to keep the package lean and lightweight.
  • Hence we should try to avoid "Depends" or "Imports" dependencies on non-base packages (as we currently do).
  • But careful use of "Suggests" dependencies would be ok in my opinion, provided that these dependencies (a) are of high quality, (b) have been very stable in the past, and (c) do not come with too many dependencies themselves.
  • These criteria would in general be fulfilled by CRAN packages with "Priority: recommended" such as mgcv which we could also consider for a type_gam as an alternative scatterplot smoother.
  • But locfit would also satisfy these criteria because it has been around since before R 1.0.0, is widely used, patches have been made repeatedly by R Core (by BDR in particular), and the only "Imports" dependency is the recommended lattice package.

So I think it would be justifiable to add this to tinyplot. But, of course, adding it as a separate type to an extension gallery (or even to an extension package, say tinyplotplus or, for "extended", tinyplotx) would also be ok.

P.S.: A lesser-known fact is that even base R packages have "Suggests" dependencies on non-base packages. These are mostly on "recommended" packages, e.g., grDevices has a "Suggests" dependency on KernSmooth, and stats on Matrix and MASS, etc. But there are also some further dependencies on other CRAN packages, e.g., from tools and utils to xml2, curl, and commonmark (among others) and from stats to SuppDists, etc.

@vincentarelbundock
Collaborator

Interesting! Didn't know several of those facts.

@grantmcdermott
Owner

Thanks, folks. I'm just coming off an extremely busy period at work and need to decompress a bit before I can turn my attention to tinyplot. But in general, I'm open to the idea of supporting enhanced functionality via Suggests and a select group of third-party packages. See also #318 and #359.

@grantmcdermott
Owner

grantmcdermott commented Oct 24, 2025

Oh, and a quick aside on the slowness of loess. I fully agree that it is not usable for big datasets. I wonder whether this could be fixed in the base implementation itself?

I seem to recall a recent thread on Bluesky where some users were bemoaning the silent (?) switch from loess to mgcv::gam that happens in ggplot2::geom_smooth() when n exceeds some threshold. It's a common problem.

@zeileis
Collaborator

zeileis commented Oct 24, 2025

Thanks for the pointer. Indeed geom_smooth() switches to mgcv::gam at 1,000 observations by default (unless the method argument is specified explicitly). See: https://ggplot2.tidyverse.org/reference/geom_smooth.html#arguments
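Per the linked ggplot2 documentation, with method = NULL the choice depends on the size of the largest group across all panels: stats::loess() is used for fewer than 1,000 observations, otherwise mgcv::gam() with formula = y ~ s(x, bs = "cs") and method = "REML". The rule can be sketched as follows; `default_smooth_method()` is a hypothetical name and this is a simplified illustration of the documented behaviour, not ggplot2's source.

```r
# Simplified sketch of the documented default in ggplot2::geom_smooth()
# when method = NULL (hypothetical function; not ggplot2's own code).
default_smooth_method = function(group_sizes) {
  # the decision is based on the largest group across all panels
  if (max(group_sizes) < 1000) {
    "loess"  # stats::loess()
  } else {
    "gam"    # mgcv::gam(formula = y ~ s(x, bs = "cs"), method = "REML")
  }
}

default_smooth_method(c(400, 999))   # "loess"
default_smooth_method(c(400, 1000))  # "gam"
```

Passing method = "loess" to geom_smooth() explicitly keeps loess regardless of the number of observations.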
