---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    fig.width = 3.5,
    fig.height = 2.5
)
```

# tinydenseR <a href="artwork/tinydenseR_hex_piano_behind.png"><img src="artwork/tinydenseR_hex_piano_behind.png" align="right" height="138" /></a>

<!-- badges: start -->
[![R-CMD-check](https://github.com/Novartis/tinydenseR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Novartis/tinydenseR/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/Novartis/tinydenseR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Novartis/tinydenseR)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE.md)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![GitHub release](https://img.shields.io/github/v/release/Novartis/tinydenseR?include_prereleases)](https://github.com/Novartis/tinydenseR/releases)
<!-- badges: end -->

## Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Background](#background)
- [How it Works](#how-it-works)
- [Detailed Example](#detailed-example)
- [Getting Help](#getting-help)
- [Contributing](#contributing)
- [Citation](#citation)
- [License](#license)
- [Note](#note)

## Overview

`tinydenseR` is a landmark-based R package for single-cell data analysis that goes beyond traditional clustering approaches. Instead of treating each cell as an independent biological replicate, `tinydenseR` treats samples as the true biological units, enabling more accurate statistical modeling and interpretation.

**Why use `tinydenseR`?** Traditional single-cell analysis methods rely heavily on clustering, which can oversimplify biological variation and depends on subjective choices. `tinydenseR` provides a clustering-independent framework that preserves biological complexity while maintaining statistical rigor.

## Key Features

- **🎯 Sample-centric analysis**: Treats samples, not cells, as biological replicates for proper statistical modeling
- **🚀 Memory efficient**: Handles atlas-scale datasets with minimal memory footprint
- **🔬 Multi-technology support**: Works with scRNA-seq, flow cytometry, and mass cytometry (multi-modal data support coming soon)
- **📊 Rich visualizations**: Built-in plotting functions for exploring results
- **🔗 Clinical integration**: Links cell-level variation to clinical and experimental outcomes
- **⚡ Fast processing**: Efficient algorithms for large-scale data analysis

## Installation

### System Requirements

- R version 4.1 or higher

You can install `tinydenseR` from GitHub using devtools:

```r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Install `tinydenseR`
devtools::install_github("Novartis/tinydenseR")
```

### Dependencies

`tinydenseR` requires R (>= 4.1) and several Bioconductor and CRAN packages. Most dependencies are installed automatically, but you may need to install the Bioconductor dependencies first:

```r
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Download DESCRIPTION from GitHub
data_url <-
  "https://raw.githubusercontent.com/Novartis/tinydenseR/main/DESCRIPTION"
temp_file <-
  tempfile()
utils::download.file(
  url = data_url,
  destfile = temp_file,
  mode = "wb",
  quiet = TRUE)

# Parse Imports
desc <-
  read.dcf(file = temp_file)
imports <-
  strsplit(x = desc[, "Imports"],
           split = "\\s*,\\s*")[[1]]
imports <-
  gsub(pattern = "\\s*\\(.*?\\)",
       replacement = "",
       x = imports)  # remove version constraints

# Install only the missing packages listed in Imports.
# BiocManager::repositories() includes both Bioconductor and CRAN,
# so BiocManager::install() can fetch packages from either source.
avail.pkgs <-
  available.packages(repos = BiocManager::repositories()) |>
  rownames()

missing_pkgs <-
  imports[imports %in% avail.pkgs &
            !imports %in% rownames(x = installed.packages())]
if (length(missing_pkgs) > 0) {
  BiocManager::install(pkgs = missing_pkgs)
}

unlink(temp_file)

```

### Example Data

Examples in this README use simulated trajectory data that is automatically fetched from the [miloR package repository](https://github.com/MarioniLab/miloR). This data is sourced from:

> Dann, E., Henderson, N.C., Teichmann, S.A. et al. Differential abundance testing on single-cell data using k-nearest neighbor graphs. *Nat Biotechnol* (2021). https://doi.org/10.1038/s41587-021-01033-z


## Quick Start

Here's a minimal example to get you started:

```{r eval=FALSE}

# Note: This example downloads data from miloR (GPL v3 licensed)
# for demonstration purposes only

library(tinydenseR)

# Try to fetch trajectory data from miloR repository
# If no internet connection, use miloR package directly
if (curl::has_internet()) {
    # Fetch example data from miloR repository
    sim_trajectory <- fetch_trajectory_data()

} else {
    # Fall back to using miloR package directly
    message("No internet connection detected. Using miloR package data directly.")
    library(miloR)
    data(sim_trajectory)
    SummarizedExperiment::colData(x = sim_trajectory$SCE) <-
        S4Vectors::DataFrame(as.list(x = sim_trajectory$meta))
    colnames(x = sim_trajectory$SCE) <-
        sim_trajectory$meta$cell_id
}

# Extract components
sim_trajectory.meta <-
    sim_trajectory$meta
sim_trajectory <-
    sim_trajectory$SCE

# Create .min.meta for 2-sample example
.min.meta <-
    tinydenseR::get.meta(.obj = sim_trajectory,
                         .sample.var = "Sample",
                         .verbose = FALSE)[c("A_R1", "B_R1"),]

# Create .cells object using SCE method
.min.cells <-
    tinydenseR::get.cells(.exprs = sim_trajectory,
                          .meta = .min.meta,
                          .sample.var = "Sample")[rownames(x = .min.meta)]

# Set up the landmark object
lm.cells <-
    tinydenseR::setup.lm.obj(
        .cells = .min.cells,
        .meta = .min.meta,
        .assay.type = "RNA",
        .prop.landmarks = 0.05
    ) |>
    tinydenseR::get.landmarks(.nHVG = 500,
                              .nPC = 3) |>
    tinydenseR::get.graph(.k = 5) |>
    tinydenseR::get.map()

# Visualize results
tinydenseR::plotPCA(.lm.obj = lm.cells,
                    .point.size = 1,
                    .panel.size = 1.5)
```

## Background

Single-cell technologies have revolutionized our understanding of cellular biology, but current analysis methods face significant challenges:

### The Problem with Current Methods

**Clustering limitations**: Most single-cell analysis tools rely heavily on clustering algorithms, which can be:

- Oversimplified for complex biological systems

- Sensitive to parameter choices and method selection

- Poor at capturing cell states at cluster boundaries

- Subjective and labor-intensive to optimize

**Statistical modeling issues**: Traditional approaches treat each cell as an independent biological replicate, which:

- Ignores the hierarchical structure of biological systems (cells within samples)

- Inflates the apparent significance of differences between cell populations

- Can lead to misleading statistical conclusions
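The replication problem can be seen in a small, fully deterministic toy example in base R (hypothetical data, not `tinydenseR` code). Four samples, two per condition, differ in their baselines, and the condition difference is no larger than the sample-to-sample spread. A test that counts every cell as a replicate reports a very small p-value, while the same test on per-sample means does not:

```r
# Toy illustration of pseudoreplication (hypothetical data, base R only).
# Four samples, two per condition; each sample has its own baseline.
sample_means <- c(A_R1 = 0, A_R2 = 2, B_R1 = 1, B_R2 = 3)
condition <- c(A_R1 = "A", A_R2 = "A", B_R1 = "B", B_R2 = "B")

# 50 cells per sample with a small, fixed within-sample spread
cells <- data.frame(
  sample = rep(names(sample_means), each = 50),
  value = unname(rep(sample_means, each = 50)) + rep(c(-0.1, 0.1), times = 100)
)
cells$condition <- unname(condition[cells$sample])

# Cell-level test: every cell is counted as an independent replicate
p_cell <- t.test(value ~ condition, data = cells)$p.value

# Sample-level test: average the cells within each sample first
per_sample <- aggregate(value ~ sample + condition, data = cells, FUN = mean)
p_sample <- t.test(value ~ condition, data = per_sample)$p.value

c(cell_level = p_cell, sample_level = p_sample)
```

With these numbers the cell-level p-value falls far below 0.05, whereas the sample-level p-value is about 0.55.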

### The `tinydenseR` Solution

`tinydenseR` addresses these challenges by:

1. **Using samples as biological replicates**: This respects the true experimental design and enables proper statistical inference
2. **Providing clustering-independent analysis**: Reduces subjectivity and captures biological complexity more accurately
3. **Linking cellular variation to outcomes**: Connects cell-level changes to clinical, experimental, or treatment variables
4. **Scaling to large datasets**: Efficient algorithms handle atlas-scale data with minimal memory requirements

This approach enables researchers to answer the key question: **"How does cellular variation relate to sample-level outcomes?"**

## How it Works

`tinydenseR` uses a straightforward three-step process to analyze single-cell data:

<a href="man/figures/README1.png"><img src="man/figures/README1.png" align="center" height="600" /></a>

### Step 1: Identify Landmarks 🗺️
- Select representative cells that capture the diversity of your entire dataset
- These "landmark" cells serve as reference points for all subsequent analysis
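As a rough illustration of this step, the sketch below picks landmarks by drawing a fixed proportion of cells from each sample at random, so that every sample contributes reference points (similar in spirit to the `.prop.landmarks` argument of `setup.lm.obj()`). The helper `select_landmarks()` is hypothetical, and `tinydenseR`'s own selection strategy may differ, for example to better cover rare cell states:

```r
# Hypothetical landmark selection (not the tinydenseR API): draw a fixed
# proportion of cells from each sample, at least one per sample.
# Note: set.seed() here changes the global random number stream.
select_landmarks <- function(cell_sample, prop = 0.05, seed = 1) {
  set.seed(seed)
  idx_by_sample <- split(seq_along(cell_sample), cell_sample)
  picked <- lapply(idx_by_sample, function(idx) {
    n_pick <- max(1L, round(length(idx) * prop))
    # Index via sample.int() to avoid sample()'s length-one pitfall
    idx[sample.int(length(idx), n_pick)]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy example: 3 samples with 100, 200 and 40 cells
cell_sample <- rep(c("S1", "S2", "S3"), times = c(100, 200, 40))
landmarks <- select_landmarks(cell_sample, prop = 0.05)
table(cell_sample[landmarks])  # 5, 10 and 2 landmarks per sample
```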

### Step 2: Map Cells to Landmarks 📍
For each sample:

- **Map**: Determine how similar each cell is to each landmark

- **Calculate probabilities**: Estimate the likelihood that each cell belongs to each landmark's neighborhood

- **Sum probabilities**: For each landmark, add up the probabilities across all cells in the sample (this gives the "density" estimate)

*Higher density = many cells are similar to that landmark*
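The three sub-steps can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: cells and landmarks are assumed to share a low-dimensional space (for example, principal components), and similarity is turned into membership probabilities with a Gaussian kernel whose `bandwidth` is an arbitrary choice. `tinydenseR`'s own mapping may compute similarities differently. The result is a landmark-by-sample matrix, conceptually like the `lm.cells$map$fdens` matrix in the Detailed Example:

```r
# Minimal sketch of fuzzy landmark density (hypothetical helper, not the
# tinydenseR API). Assumes cells and landmarks share the same coordinates,
# e.g. principal components, and uses a Gaussian kernel as the similarity.
fuzzy_density <- function(cells, landmarks, cell_sample, bandwidth = 1) {
  # Map: squared Euclidean distance from every cell (rows) to every landmark
  d2 <- outer(rowSums(cells^2), rowSums(landmarks^2), "+") -
    2 * cells %*% t(landmarks)
  d2 <- pmax(d2, 0)
  # Subtract each cell's smallest distance for numerical stability; this
  # leaves the normalized probabilities below unchanged
  d2 <- d2 - apply(d2, 1, min)

  # Calculate probabilities: kernel weights normalized so each cell sums to 1
  w <- exp(-d2 / (2 * bandwidth^2))
  p <- w / rowSums(w)

  # Sum probabilities: per-sample totals give a landmark x sample matrix
  t(rowsum(p, group = cell_sample))
}

# Toy data: two landmarks and four cells from two samples
landmarks <- rbind(L1 = c(0, 0), L2 = c(5, 5))
cells <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5, 5.1))
cell_sample <- c("S1", "S1", "S1", "S2")

dens <- fuzzy_density(cells, landmarks, cell_sample)
round(dens, 3)
# L1 is about 2 in S1 and about 0 in S2; L2 is about 1 in both samples
```

Because each cell's probabilities sum to 1 in this sketch, each sample's column sums to its number of cells.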

### Step 3: Link to Outcomes 🔗

- Use the density estimates as features in statistical models

- Connect cellular variation patterns to your experimental variables, treatments, or clinical outcomes

- Generate insights about which cell states are associated with your conditions of interest

<a href="man/figures/README2.png"><img src="man/figures/README2.png" align="center" height="600" /></a>
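To make Step 3 concrete, here is a toy sketch that fits one ordinary linear model per landmark on log2-transformed densities, with samples as replicates, and adjusts p-values across landmarks with Benjamini-Hochberg. The density values are made up, and this is not necessarily how `get.stats()` transforms or models the densities:

```r
# Hypothetical landmark x sample density matrix (made-up values)
dens <- rbind(
  L1 = c(10, 12, 11, 30, 32, 29),
  L2 = c(20, 21, 19, 20, 22, 21),
  L3 = c(15, 14, 16,  5,  6,  4)
)
colnames(dens) <- c("A_R1", "A_R2", "A_R3", "B_R1", "B_R2", "B_R3")
meta <- data.frame(Condition = factor(rep(c("A", "B"), each = 3)),
                   row.names = colnames(dens))

# One linear model per landmark; the six samples are the replicates
log_dens <- log2(dens + 1)
res <- t(apply(log_dens, 1, function(y) {
  fit <- summary(lm(y ~ Condition, data = meta))
  fit$coefficients["ConditionB", c("Estimate", "Pr(>|t|)")]
}))

# Benjamini-Hochberg adjustment across landmarks
res <- cbind(res, q = p.adjust(res[, "Pr(>|t|)"], method = "BH"))
res
```

Here `Estimate` is the log2 fold change in density for condition B relative to A, so L1 comes out more abundant in B and L3 less abundant.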

- Use cell states associated with outcomes to identify key genes or markers

- `tinydenseR` automatically generates pseudo-bulks of cells mapped to landmarks of interest

<a href="man/figures/README3.png"><img src="man/figures/README3.png" align="center" height="250" /></a>
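One simple way to form such pseudo-bulks is sketched below: assign each cell to its most probable landmark, keep the cells assigned to landmarks of interest, and sum their raw counts within each sample. The function name and the hard (most-probable-landmark) assignment are assumptions for illustration; `tinydenseR` may, for instance, weight cells by their mapping probabilities instead:

```r
# Hypothetical pseudo-bulk helper (not the tinydenseR API).
#   counts:         genes x cells raw count matrix
#   p:              cells x landmarks membership probabilities
#   cell_sample:    sample label of each cell
#   lm_of_interest: landmarks flagged by the statistical model
pseudobulk_for_landmarks <- function(counts, p, cell_sample, lm_of_interest) {
  # Hard-assign each cell to its most probable landmark
  best_lm <- colnames(p)[max.col(p, ties.method = "first")]
  keep <- best_lm %in% lm_of_interest
  # Sum counts of the selected cells within each sample (genes x samples)
  t(rowsum(t(counts[, keep, drop = FALSE]), group = cell_sample[keep]))
}

# Toy data: 2 genes, 4 cells, 2 landmarks, 2 samples
counts <- matrix(c(1, 0,  2, 1,  5, 5,  0, 3), nrow = 2,
                 dimnames = list(c("g1", "g2"), NULL))
p <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.1, 0.9), c(0.7, 0.3))
colnames(p) <- c("L1", "L2")
cell_sample <- c("S1", "S1", "S1", "S2")

pb <- pseudobulk_for_landmarks(counts, p, cell_sample, lm_of_interest = "L1")
pb  # g1: S1 = 3, S2 = 0; g2: S1 = 1, S2 = 3
```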


This approach captures the complexity of cell-to-cell variation while maintaining statistical rigor by treating samples (not individual cells) as the unit of biological replication.


## Detailed Example

This example demonstrates a complete `tinydenseR` analysis workflow using simulated trajectory data with two conditions (A and B) across three replicates each.

### Load Libraries and Data

```{r include=TRUE, echo=TRUE, eval=TRUE}

# Note: This example downloads data from miloR (GPL v3 licensed)
# for demonstration purposes only

library(tinydenseR)
library(tidyverse)

# Check package version
if(utils::packageVersion(pkg = "tinydenseR") < "0.0.1.0013") {
  stop("please update the installation of tinydenseR")
}

# Try to fetch trajectory data from miloR repository
# If no internet connection, use miloR package directly
if (curl::has_internet()) {
  # Fetch trajectory data from miloR repository
  sim_trajectory <- fetch_trajectory_data()

  # Extract components
  sim_trajectory.meta <- sim_trajectory$meta
  sim_trajectory <- sim_trajectory$SCE
} else {
  # Fall back to using miloR package directly
  message("No internet connection detected. Using miloR package data directly.")
  library(miloR)
  data(sim_trajectory)

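  # Extract components
  sim_trajectory.meta <- sim_trajectory$meta
  sim_trajectory <- sim_trajectory$SCE

  # Attach cell-level metadata as colData and use cell IDs as column names
  SummarizedExperiment::colData(x = sim_trajectory) <-
    S4Vectors::DataFrame(as.list(x = sim_trajectory.meta))
  colnames(x = sim_trajectory) <-
    sim_trajectory.meta$cell_id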
}
```

### Prepare Data for Analysis

```{r include=TRUE, echo=TRUE, eval=TRUE}
# Create .meta object containing sample-level data
.meta <- get.meta(.obj = sim_trajectory,
                  .sample.var = "Sample",
                  .verbose = FALSE)

# Create .cells object using SCE method
.cells <- get.cells(.exprs = sim_trajectory,
                    .meta = .meta,
                    .sample.var = "Sample")[rownames(.meta)]

```

### Set Up Landmark Object

```{r include=TRUE, echo=TRUE, eval=TRUE}
set.seed(seed = 123)

# Create the main tinydenseR object
lm.cells <- tinydenseR::setup.lm.obj(
    .cells = .cells,          # Expression data
    .meta = .meta,            # Sample metadata
    .assay.type = "RNA",      # Data type
    .prop.landmarks = 0.15,   # Proportion of cells to use as landmarks
    .verbose = FALSE
) |>
    # Find highly variable genes and create landmarks
    tinydenseR::get.landmarks(.nHVG = 500,
                              .nPC = 3,
                              .verbose = FALSE) |>

    # Build neighborhood graph
    tinydenseR::get.graph(
        .cl.resolution.parameter = 2e2,
        .k = 10,
        .small.size = 3,
        .verbose = FALSE
    )

# Map all cells to landmarks
lm.cells <- tinydenseR::get.map(.lm.obj = lm.cells,
                               .verbose = FALSE)
```

### Access the Fuzzy Landmark-by-Sample Density Matrix

```{r include=TRUE, echo=TRUE, eval=TRUE}
# View first 10 landmarks and their density estimates across samples
lm.cells$map$fdens |>
  (\(x)
   x[, order(colnames(x = x))]
   )() |>
  round(digits = 2) |>
  head(n = 10) |>
  knitr::kable()
```

### Statistical Analysis

```{r include=TRUE, echo=TRUE, eval=TRUE}
# Set up design matrix for statistical testing
.design <- model.matrix(object = ~ Condition + Replicate,
                        data = lm.cells$metadata)

# Test for differential abundance between conditions
condition.stats <- tinydenseR::get.stats(
    .lm.obj = lm.cells,
    .design = .design,
    .verbose = FALSE
)

# Perform differential expression analysis
.dea <- tinydenseR::get.dea(
    .lm.obj = lm.cells,
    .design = .design,
    .verbose = FALSE
)
```

### Visualizations

Landmarks with differential density:

```{r include=TRUE, echo=TRUE, eval=FALSE, dpi=300}
# Show density fold changes between conditions
tinydenseR::plotPCA(
    .lm.obj = lm.cells,
    .feature = condition.stats$fit$coefficients[,"ConditionB"],
    .panel.size = 1.5,
    .point.size = 1,
    .color.label = "estimated density\nlog2 fold change",
    .midpoint = 0
)

# Highlight significantly different regions
tinydenseR::plotPCA(
    .lm.obj = lm.cells,
    .feature = ifelse(
        test = condition.stats$fit$coefficients[,"ConditionB"] < 0,
        yes = "less abundant",
        no = "more abundant") |>
        ifelse(
            test = condition.stats$fit$pca.weighted.q[,"ConditionB"] < 0.1,
            no = "not sig.") |>
        factor(levels = c("less abundant", "not sig.", "more abundant")),
    .cat.feature.color = Color.Palette[1,c(1,6,2)],
    .color.label = "q < 0.1",
    .point.size = 1,
    .panel.size = 1.5
)
```

<a href="man/figures/README-unnamed-chunk-8-1.png"><img src="man/figures/README-unnamed-chunk-8-1.png" align="center" height="300" /></a>

<a href="man/figures/README-unnamed-chunk-8-2.png"><img src="man/figures/README-unnamed-chunk-8-2.png" align="center" height="300" /></a>

Samples quantitatively embedded along the Condition axis:

```{r include=TRUE, echo=TRUE, eval=TRUE}
# Create reduced model to embed samples quantitatively along the Condition axis
red.design <- model.matrix(object = ~ Replicate,
                           data = lm.cells$metadata)

red.stats <- tinydenseR::get.stats(
    .lm.obj = lm.cells,
    .design = red.design,
    .verbose = FALSE
)

# Update the stats results with the sample embedding
condition.stats <-
  tinydenseR::get.embedding(
    .lm.obj = lm.cells,
    .stats.obj = condition.stats,
    .term.of.interest = "Condition",
    .red.stats.obj = red.stats,
    .verbose = FALSE
)
```

```{r include=TRUE, echo=TRUE, eval=FALSE, dpi=300}
# Embed samples based on differences along Condition
tinydenseR::plotSampleEmbedding(
    .lm.obj = lm.cells,
    .stats.obj = condition.stats,
    .embedding.slot = "Condition",
    .color.by = "Condition",
    .cat.feature.color = tinydenseR::Color.Palette[1,c(1,2)],
    .panel.size = 1.5,
    .point.size = 2
) +
  ggplot2::labs(title = "Quantitative Sample Embedding",
                subtitle = "in relation to Condition") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5),
                 plot.subtitle = ggplot2::element_text(hjust = 0.5))

```

<a href="man/figures/README-unnamed-chunk-10-1.png"><img src="man/figures/README-unnamed-chunk-10-1.png" align="center" height="300" /></a>

### Explore Individual Genes

```{r include=TRUE, echo=TRUE, eval=FALSE, dpi=300}
# Find most downregulated gene in condition B
most_down_gene <- sort(.dea$coefficients[,"ConditionB"])[1] |> names()

tinydenseR::plotPCA(
    .lm.obj = lm.cells,
    .feature = lm.cells$lm[,most_down_gene],
    .panel.size = 1.5,
    .point.size = 1,
    .color.label = most_down_gene
)

# Find most upregulated gene in condition B
most_up_gene <- sort(.dea$coefficients[,"ConditionB"], decreasing = TRUE)[1] |> names()

tinydenseR::plotPCA(
    .lm.obj = lm.cells,
    .feature = lm.cells$lm[,most_up_gene],
    .panel.size = 1.5,
    .point.size = 1,
    .color.label = most_up_gene
)
```

<a href="man/figures/README-unnamed-chunk-9-1.png"><img src="man/figures/README-unnamed-chunk-9-1.png" align="center" height="300" /></a>

<a href="man/figures/README-unnamed-chunk-9-2.png"><img src="man/figures/README-unnamed-chunk-9-2.png" align="center" height="300" /></a>


### Interactive Exploration

```{r include=TRUE, echo=TRUE, eval=FALSE, dpi=300}
# Add feature statistics for interactive exploration
lm.cells <-
  tinydenseR::get.lm.features.stats(.lm.obj = lm.cells)

# Create interactive plot with hover information
tinydenseR::plotPCA(
    .lm.obj = lm.cells,
    .hover.stats = "marker",
    .panel.size = 1.5,
    .point.size = 1
)
```

## Getting Help

### Documentation

- **Function documentation**: Use `?function_name` in R for detailed help on any function

- **Reproducible scripts**: Check the `inst/scripts/` directory for example workflows

### Troubleshooting Common Issues

**Installation problems:**

- Ensure you have R >= 4.1

- Install BiocManager first: `install.packages("BiocManager")`

- Try installing dependencies manually if automatic installation fails

**Questions and Support:**

- 🐛 Report bugs: [GitHub Issues](https://github.com/Novartis/tinydenseR/issues)

- 💬 Discussions: Use GitHub Discussions for general questions

## Contributing

We welcome contributions to `tinydenseR`! Here's how you can help:

### Types of Contributions

- 🐛 **Bug reports**: Found an issue? Please report it!

- ✨ **Feature requests**: Have an idea for improvement? We'd love to hear it!

- 📖 **Documentation**: Help improve our docs and examples

- 🧪 **Testing**: Add test cases or test on your data

- 💻 **Code**: Submit bug fixes or new features

### How to Contribute

1. **Fork** the repository on GitHub

2. **Create** a new branch for your changes

3. **Make** your changes and add tests if applicable

4. **Test** your changes thoroughly

5. **Submit** a pull request with a clear description

## Citation

If you use `tinydenseR` in your research, please cite:

```
Milanez-Almeida, P. et al. (2025). Sample-level modeling of single-cell data at scale with tinydenseR. bioRxiv https://doi.org/10.1101/2025.11.26.690752.
```

## License

The code is licensed under the MIT License (see [LICENSE.md](LICENSE.md)).

The sticker artwork (PNG) is licensed under CC0 (see [LICENSE-artwork](LICENSE-artwork)).

Copyright 2025 Novartis Biomedical Research Inc.

## Note

This is an open-source package by the authors, not an official Novartis mark or program.

---

*tinydenseR: Making single-cell analysis more rigorous, one sample at a time* 🧬📊