error with tidyee_ob |>group_by(a,b,c) |> summarise(stat=stat) when grouping creates >n groups #29

@zackarno

Description

When the grouping creates a very large number of groups to summarise the tidyee object over, something breaks. This won't happen with typical group_by(year) or group_by(year, month) workflows, but it can happen if you include doy in the grouping. I have not figured out the limiting number of groups or the exact source of the problem, but the reprex below shows the issue: it gets past the first error message and on to the next.

library(tidyrgee)

library(rgee)
ee_Initialize()
#> -- rgee 1.1.2.9000 ---------------------------------- earthengine-api 0.1.295 -- 
#>  v user: not_defined
#>  v Initializing Google Earth Engine: v Initializing Google Earth Engine:  DONE!
#> --------------------------------------------------------------------------------
ic <- ee$ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")
ic_tidy <- as_tidyee(ic)
ic_tidy
#> band names: [ NO2_column_number_density, tropospheric_NO2_column_number_density, stratospheric_NO2_column_number_density, NO2_slant_column_number_density, tropopause_pressure, absorbing_aerosol_index, cloud_fraction, sensor_altitude, sensor_azimuth_angle, sensor_zenith_angle, solar_azimuth_angle, solar_zenith_angle ] 
#> 
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 21,185 x 8
#>    id           time_start          syste~1 date       month  year   doy band_~2
#>    <chr>        <dttm>              <chr>   <date>     <dbl> <dbl> <dbl> <list> 
#>  1 COPERNICUS/~ 2018-06-28 10:45:42 201806~ 2018-06-28     6  2018   179 <chr>  
#>  2 COPERNICUS/~ 2018-06-28 12:27:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  3 COPERNICUS/~ 2018-06-28 14:52:09 201806~ 2018-06-28     6  2018   179 <chr>  
#>  4 COPERNICUS/~ 2018-06-28 15:50:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  5 COPERNICUS/~ 2018-06-28 17:31:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  6 COPERNICUS/~ 2018-06-28 19:13:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  7 COPERNICUS/~ 2018-06-28 20:54:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  8 COPERNICUS/~ 2018-06-28 22:36:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  9 COPERNICUS/~ 2018-06-29 00:17:40 201806~ 2018-06-29     6  2018   180 <chr>  
#> 10 COPERNICUS/~ 2018-06-29 01:59:11 201806~ 2018-06-29     6  2018   180 <chr>  
#> # ... with 21,175 more rows, and abbreviated variable names 1: system_index,
#> #   2: band_names
#> # i Use `print(n = ...)` to see more rows
#> 
#> attr(,"class")
#> [1] "tidyee"


# the L3_NO2 ic has multiple records per day, so I want to summarise by date (i.e. year, month, doy)
# there is a silent failure going on here
ic_summarised_daily <- ic_tidy |>
  group_by(year, month,doy) |>
  summarise(stat = "mean")

# this often happens with `rgee` and thus `tidyrgee`... it seems like the best
# way to check if the object has been created successfully is to try a `$getInfo` call

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): RecursionError: maximum recursion depth exceeded in comparison

# okay a maximum recursion issue - seems reasonable.. under the hood we are splitting 
# the `vrt` and `ic` into thousands of groups... I can increase the recursion limit and see what 
# happens (default is 1000)

sys <-  reticulate::import("sys")
sys$setrecursionlimit(as.integer(5000))

# let's run `$getInfo()` again with the recursion limit increased...

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): ee.ee_exception.EEException: Collection.first: merge() is too deeply nested.

# we get a new error, which took a lot longer to appear than the first.

Created on 2022-08-19 by the reprex package (v2.0.1)
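
To put a rough number on "thousands of groups", the group count can be computed from the `vrt` table alone, without any Earth Engine call. The sketch below is only an estimate under stated assumptions: it builds a stand-in table with one or more rows per calendar day between the first record shown above (2018-06-28) and the reprex date (2022-08-19), assuming every day in that span has at least one image. The names `vrt`, `n_groups` and `n_year_month` are illustrative; in a live session the stand-in would be replaced by `ic_tidy$vrt`.

```r
# Stand-in for ic_tidy$vrt (assumption: at least one image on every day
# between the first record in the reprex and the date the reprex was created).
dates <- seq(as.Date("2018-06-28"), as.Date("2022-08-19"), by = "day")
vrt <- data.frame(
  date  = dates,
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  doy   = as.integer(format(dates, "%j"))
)

# Repeat each day a few times to mimic several records per day.
vrt <- vrt[rep(seq_len(nrow(vrt)), each = 3), ]

# Number of groups produced by group_by(year, month, doy)
n_groups <- nrow(unique(vrt[c("year", "month", "doy")]))

# Number of groups produced by the more typical group_by(year, month)
n_year_month <- nrow(unique(vrt[c("year", "month")]))

n_groups
n_year_month
```

Under these assumptions the daily grouping yields more groups than Python's default recursion limit of 1000 mentioned in the reprex, while the year and month grouping stays far below it. That would be consistent with the failure appearing only once doy is added, though it does not confirm where the limit actually sits.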
