Enhancer-Enhancer Network-based Prediction of Clustered Open Regulatory Elements (CORE) using scATAC-seq data
enCORE is a computational framework for identifying highly interactive enhancer clusters from single-cell chromatin accessibility. enCORE uniquely defines such enhancer clusters as Clustered Open Regulatory Elements (CORE). enCORE operates solely on single-cell ATAC-seq data, without requiring matched single-cell RNA-seq data or multimodal measurements (e.g., 10X Multiome RNA/ATAC).
You can install the development version of enCORE from GitHub with:
# install.packages("devtools")
devtools::install_github("R-Krait/enCORE")
The packages listed below are required dependencies for enCORE.
- ArchR (>= 1.0.2)
- TxDb.Hsapiens.UCSC.hg38.knownGene (>= 3.16.0)
- TxDb.Hsapiens.UCSC.hg19.knownGene (>= 3.2.2)
- TxDb.Mmusculus.UCSC.mm10.knownGene (>= 3.10.0)
- org.Hs.eg.db (>= 3.16.0)
- org.Mm.eg.db (>= 3.16.0)
- GenomicRanges (>= 1.50.2)
- GenomicFeatures (>= 1.50.4)
- ChIPseeker (>= 1.34.1)
- data.table (>= 1.16.0)
- dplyr (>= 1.1.4)
- reshape2 (>= 1.4.4)
- AnnotationDbi (>= 1.60.2)
- progress (>= 1.2.2)
- igraph (>= 1.5.1)
- mefa4 (>= 0.3-9)
- parallel (>= 4.2.1)
- stringr (>= 1.5.1)
- coop (>= 0.6-3)
- chromVARmotifs (>= 0.2.0)
- motifmatchr (>= 1.20.0)
- kneedle (>= 1.0.0)
- scales (>= 1.3.0)
The enCORE package also requires two command-line tools, STARE and BEDTools.
First, install mamba, a fast alternative to conda for package installation.
conda install conda-forge::mamba
Then, install STARE & BEDTools.
mamba install -c conda-forge -c bioconda stare-abc bedtools
Additionally, download the .gtf files required for enCORE STARE-gABC scoring from https://figshare.com/articles/dataset/GTF_files_for_enCORE_STARE-gABC_scoring/31567372
To run enCORE, the .gtf files must be located in your working directory.
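Because the .gtf files have to sit in the working directory, a quick check before starting can save a failed run. The following is a minimal sketch in base R. The helper name find_gtf_files is hypothetical and not part of enCORE, and the sketch assumes the downloaded files keep a plain .gtf extension.

```r
# Hypothetical helper (not part of enCORE): list .gtf files in a directory.
# Assumes the files downloaded from figshare keep a plain ".gtf" extension.
find_gtf_files <- function(dir = getwd()) {
  list.files(dir, pattern = "\\.gtf$", full.names = TRUE, ignore.case = TRUE)
}

gtf_files <- find_gtf_files()
if (length(gtf_files) == 0) {
  warning("No .gtf files found in ", getwd(),
          "; download them before running the STARE-gABC scoring step.")
} else {
  message("Found .gtf file(s): ", paste(basename(gtf_files), collapse = ", "))
}
```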
Please refer to vignettes/introduction.Rmd. The following parts should be modified to match your own dataset.
- setwd("~/PSJ/enCORE_dev")
  Replace this path with your own working directory.
- proj4 <- readRDS("~/PSJ/test_enCORE/Save-Proj_r4/Save-ArchR-Project.rds")
  Replace this with your own ArchR object saved in .rds format. The object must have been generated after completing IterativeLSI, peak calling, and iterative overlap peak merging procedures.
- proj5$Clusters2 <- mapLabels(proj5$Sample, newLabels = remapClust, oldLabels = names(remapClust))
  Add the annotation labels for which you want to extract CORE, such as disease status or cell type, to the cell metadata under the name Clusters2.
- The output_dir argument in the functions Extract_initial_enhancer_candidates, Calculate_gABC_score, and Distill_CORE_per_cluster
  Specify the directory where the output files generated by each function will be saved. Since these functions do not automatically create directories, the path provided to output_dir must already exist.
- The organism argument in Extract_initial_enhancer_candidates
  Only "hg38", "hg19", and "mm10" are supported.
- The n_col argument in Calculate_gABC_score
  This argument has the same meaning as the n_col argument in STARE. It should be set to [the number of annotation label classes in Clusters2 (e.g., the number of cell types) + 3].
- The motifPWMs argument in addMotifAnnotations
  Use human_pwms_v2 from chromVARmotifs when the organism is "hg38" or "hg19", and use mouse_pwms_v2 when the organism is "mm10".
- The list_cluster argument in Determine_TF_weight_threshold
  Provide the annotation labels for which you want to extract CORE as a vector. You may use either all annotation labels (Clusters2) or only a subset of them.
We recommend using all annotation labels, as shown below, except in special cases, for example when the automatically calculated threshold is very small (e.g., because of extremely low cell-to-cell heterogeneity).
list_group <- sort(unique(as.character(proj6$Clusters2)))
data_lump_enCORE <- Determine_TF_weight_threshold(
  proj_atac = proj6,
  data_lump_enCORE = data_lump_enCORE,
  list_cluster = list_group,
  use_default_thres = FALSE
)
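Several of the settings above can be derived rather than typed by hand: the output directory has to exist beforehand, n_col follows from the number of annotation labels, and the PWM set follows from the organism. The sketch below uses base R only and illustrates one way to prepare these values. The variable names output_dir, labels, and pwm_set, the directory path, and the label values are illustrative assumptions; in a real run, labels would come from the Clusters2 column of your ArchR project.

```r
# Illustrative values only; replace them with your own project settings.
output_dir <- file.path(tempdir(), "enCORE_output")  # hypothetical path
organism   <- "hg38"
labels     <- c("B", "T", "Mono", "T", "B")           # stand-in for Clusters2

# The enCORE functions do not create directories, so create it up front.
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

# n_col = number of distinct annotation labels + 3.
n_col <- length(unique(as.character(labels))) + 3

# Name of the chromVARmotifs PWM set that matches the organism.
pwm_set <- switch(organism,
  hg38 = ,
  hg19 = "human_pwms_v2",
  mm10 = "mouse_pwms_v2",
  stop("organism must be one of \"hg38\", \"hg19\", or \"mm10\"")
)
```

Here pwm_set holds only the name of the PWM set. In a session where chromVARmotifs is installed, the object itself would typically be loaded with data(list = pwm_set, package = "chromVARmotifs") before passing it to addMotifAnnotations.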
As a quick-start example, you can run the 10X PBMC 3k demo dataset (example_demo.zip).
Please download it from https://figshare.com/articles/dataset/Demo_PBMC_3k_enCORE_/31577779
Output files for the demo dataset are also available (example_output.zip).
The rGREAT peak annotation results for CORE obtained with the active option are also available for download (results_core_annotation_PBMC_3k.csv).
- CORE_potential_[Clusters2]_f.bed
  BED3 file containing CORE profiles from [Clusters2] using the potential option.
- total_enhc_[Clusters2].bed
  BED3 file containing total enhancer candidates from [Clusters2].
If you applied the active option, the following files are also generated:
- CORE_active_[Clusters2]_f.bed
  BED3 file containing CORE profiles from [Clusters2] using the active option (iteration 2/2 of Iterative proximal enhancer clusters filtering).
- CORE_active_[Clusters2]_initial.bed
  BED3 file containing initial CORE profiles from [Clusters2] using the active option (iteration 1/2 of Iterative proximal enhancer clusters filtering).
- enhc_inactive_[Clusters2].bed
  BED3 file containing inactive proximal enhancer regions.
For downstream analysis with CORE, please use CORE_active_[Clusters2]_f.bed (active option) or CORE_potential_[Clusters2]_f.bed (potential option). These are the final results for each option.
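Since the final results are plain BED3 files, they can be loaded for downstream analysis without any extra packages. The following is a minimal sketch in base R. It writes a small stand-in file so that it runs without real enCORE output; the file name, coordinates, and column names are illustrative assumptions.

```r
# Write a tiny stand-in BED3 file so the sketch runs without enCORE output.
# The file name mimics CORE_active_[Clusters2]_f.bed; "demo" is a made-up label.
bed_path <- file.path(tempdir(), "CORE_active_demo_f.bed")
writeLines(c("chr1\t1000\t1500", "chr1\t2000\t2600", "chr2\t500\t900"), bed_path)

# BED3 has three tab-separated columns and no header line.
core <- read.table(bed_path, sep = "\t", header = FALSE,
                   col.names = c("chrom", "start", "end"),
                   stringsAsFactors = FALSE)

# BED coordinates are 0-based and half-open, so the width is end - start.
core$width <- core$end - core$start

# Simple summaries: number of regions overall and per chromosome.
n_regions <- nrow(core)
regions_per_chrom <- table(core$chrom)
```

To use real output, point bed_path at the corresponding file in your output_dir and skip the writeLines step.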
If you use enCORE in your research, please cite using:
@article {Park2026.03.17.712366,
author = {Park, Seonjun and Ma, Sungkook and Lee, Wonhyo and Park, Sung Ho},
title = {Deciphering context-dependent epigenetic program by network-based prediction of clustered open regulatory elements from single-cell chromatin accessibility},
elocation-id = {2026.03.17.712366},
year = {2026},
doi = {10.64898/2026.03.17.712366},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2026/03/26/2026.03.17.712366},
eprint = {https://www.biorxiv.org/content/early/2026/03/26/2026.03.17.712366.full.pdf},
journal = {bioRxiv}
}
