Using SingleCellExperiment slows down processing on QFeatures objects

Hello,
Some weeks ago, @lgatto showed me that `readQFeatures` performs significantly better than `readSCP`. This is because `readSCP` requires an additional conversion step: first, the `QFeatures` object is created with `SE` (SummarizedExperiment) objects, and then these `SE` objects are converted into `SCE` (SingleCellExperiment) objects. This conversion takes a significant amount of time: in the benchmark below, `readSCP` takes about 1.6 times as long as `readQFeatures` (median of about 221 ms versus 137 ms).
``` r
Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
1       readSCP 200.7866 215.9122 221.5260 220.8470 231.5974 238.6010    10
2 readQFeatures 128.7114 131.8418 146.7216 136.9536 143.7293 197.8956    10
```
To address this, I modified `readSCP` so that the `QFeatures` object is created directly with `SCE` objects, eliminating the need for a conversion. In practice, I reused the code of `readQFeatures`, only changing the call to `SummarizedExperiment()` to `SingleCellExperiment()`.

However, to my surprise, this new implementation (`readSCP2` below), which I expected to be faster, actually takes almost twice as long as the current `readSCP` implementation (median of about 388 ms versus 209 ms).
``` r
Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
1       readSCP 199.9432 203.7096 217.4162 208.5496 233.9788 256.7043    10
2      readSCP2 370.7522 377.4332 387.5432 387.9019 391.7997 406.4224    10
3 readQFeatures 126.8745 128.1580 135.7015 133.2495 139.8490 157.7520    10
```
After profiling different chunks of code, I found that the functions called inside `readQFeatures` are significantly slower when they process `SCE` objects than when they process `SE` objects.

Further investigation revealed that this slowdown is due to the fact that the `SCE` (SingleCellExperiment) class inherits from `RangedSummarizedExperiment` rather than directly from `SummarizedExperiment`. As a result, when methods are called on an `SCE` object, the implementation from `RangedSummarizedExperiment` is used. This implementation is much slower because it often requires a call to `rowRanges`, which takes some computation. For example, the execution time of `rowData` differs noticeably between an `SE` and an `SCE`. A difference between roughly 1500 and 100 microseconds may not seem like much, but if the operation is performed many times it can impact overall performance.

``` r
Unit: microseconds
         expr      min       lq      mean   median        uq       max neval
 rowData(sce) 1524.946 1573.279 2159.7648 1615.196 1672.3090 52978.701   100
  rowData(se)   86.633   98.068  116.2062  115.442  124.5145   311.793   100
```
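
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. The matrix dimensions, the assay name `intensity` and the `id` column are arbitrary choices for illustration, not the data behind the timings above, so the absolute numbers will differ.

``` r
library(SummarizedExperiment)
library(SingleCellExperiment)
library(microbenchmark)

## Toy data (assumption: 1000 features x 10 samples, arbitrary values)
set.seed(1)
m <- matrix(rnorm(1000 * 10), nrow = 1000,
            dimnames = list(paste0("f", 1:1000), paste0("s", 1:10)))
rd <- DataFrame(id = rownames(m))

## Same content stored in the two classes
se <- SummarizedExperiment(assays = list(intensity = m), rowData = rd)
sce <- SingleCellExperiment(assays = list(intensity = m), rowData = rd)

## Check the class hierarchy: is each object a RangedSummarizedExperiment?
is(se, "RangedSummarizedExperiment")
is(sce, "RangedSummarizedExperiment")

## Time the same accessor on both objects
microbenchmark(rowData(sce), rowData(se), times = 100)
```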

This means that using `SCE` objects instead of `SE` objects could slow down the entire workflow (for instance see the impact of using `SCE` instead of `SE` in the new implementation of `readSCP` I made).  

To test this in a real use case, I used the [leduc2022](https://uclouvain-cbio.github.io/SCP.replication/articles/leduc2022.html) vignette from `SCP.replication`. For each step of the vignette, I recorded the execution time, comparing the case where the initial `QFeatures` object contained `SCE` objects with the case where it contained `SE` objects.
Note that to make this comparison possible I had to remove one step, `medianCVperCell`, which in the current implementation of `scp` does not work with `SE` objects. This is caused by a check in the internal function `filterCV` that forces the use of an `SCE`, although the function could also work on an `SE`. I assume that other functions from `scp` could have the same issue.
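
The per-step timing can be sketched roughly as follows. This is only an illustration of the approach: `time_steps` is a hypothetical helper, and the two toy steps stand in for the vignette steps rather than reproducing the leduc2022 code.

``` r
library(SummarizedExperiment)
library(SingleCellExperiment)

## Hypothetical helper: run named steps in order, recording elapsed seconds
time_steps <- function(object, steps) {
  timings <- setNames(numeric(length(steps)), names(steps))
  for (nm in names(steps)) {
    ## The assignment inside system.time() updates `object` for the next step
    timings[[nm]] <- system.time(object <- steps[[nm]](object))[["elapsed"]]
  }
  timings
}

## Toy stand-ins for the vignette steps (assumptions, not the real workflow)
steps <- list(
  logTransform = function(x) { assay(x) <- log2(assay(x)); x },
  keepComplete = function(x) x[rowSums(is.na(assay(x))) == 0, ]
)

set.seed(1)
m <- matrix(rexp(2000 * 20), nrow = 2000)
se <- SummarizedExperiment(assays = list(intensity = m))
sce <- SingleCellExperiment(assays = list(intensity = m))

## One row per class, one column per step
rbind(SE = time_steps(se, steps), SCE = time_steps(sce, steps))
```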

![Image](https://github.com/user-attachments/assets/c308874b-ceb9-4b47-8885-63f1e20b605e)

The results show a performance difference between `SE` and `SCE` for all the steps, but this difference is not as pronounced as in `readSCP`. 

Note that this benchmark was run with the current Bioconductor version of QFeatures, which does not have the [optimisation for aggregateFeatures](https://github.com/rformassspectrometry/QFeatures/pull/224).

Given that, as far as I understand, the functionalities provided by the `SCE` class are mainly used in the context of `SCPlainer`, wouldn't it be more efficient to use `SE` objects in `QFeatures` while allowing the possibility of exporting an `assay` as an `SCE` object for use with `SCPlainer`, for instance? 
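
A minimal sketch of this idea, assuming the coercion from `SummarizedExperiment` to `SingleCellExperiment` provided by the `SingleCellExperiment` package is sufficient for the downstream use. The assay name `peptides` and the toy data are made up for illustration.

``` r
library(QFeatures)
library(SingleCellExperiment)

## Toy data (assumption: 50 features x 4 samples, arbitrary values)
set.seed(1)
m <- matrix(rexp(50 * 4), nrow = 50,
            dimnames = list(paste0("f", 1:50), paste0("s", 1:4)))
se <- SummarizedExperiment(assays = list(intensity = m))

## The QFeatures object stores plain SE objects
qf <- QFeatures(list(peptides = se),
                colData = DataFrame(row.names = colnames(m)))

## Convert a single assay to an SCE only when it is needed
sce <- as(qf[["peptides"]], "SingleCellExperiment")
class(sce)
```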

What do you think @lgatto @cvanderaa ?

