ChIP-Atlas / Documents

Documents for computational processing in ChIP-Atlas.

1. Data source

Currently, most academic journals require authors of studies involving high-throughput sequencing to submit their raw sequence data as SRAs (Sequence Read Archives) to public repositories (NCBI, DDBJ, or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). Referring to the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site), ChIP-Atlas uses SRXs that meet the following criteria:

LIBRARY_STRATEGY == ChIP-Seq, ATAC-Seq, DNase-Hypersensitivity, or Bisulfite-Seq
LIBRARY_SOURCE == GENOMIC
taxonomy_name == Homo sapiens, Mus musculus, Rattus norvegicus, Caenorhabditis elegans, Drosophila melanogaster, or Saccharomyces cerevisiae
INSTRUMENT_MODEL ~ Illumina, NextSeq or HiSeq
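
The criteria above can be read as a single predicate over one experiment's metadata. The following is a minimal sketch, assuming the XML metadata have already been parsed into a flat dictionary; the function name `is_eligible` and the dictionary layout are hypothetical, and the `~` operator is interpreted here as a substring match.

```python
# Hypothetical sketch: the field names follow the criteria listed above,
# but the flat dict layout is an assumption (the real metadata are XML).
STRATEGIES = {"ChIP-Seq", "ATAC-Seq", "DNase-Hypersensitivity", "Bisulfite-Seq"}
SPECIES = {
    "Homo sapiens", "Mus musculus", "Rattus norvegicus",
    "Caenorhabditis elegans", "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
}
INSTRUMENT_KEYWORDS = ("Illumina", "NextSeq", "HiSeq")


def is_eligible(meta: dict) -> bool:
    """Return True if one experiment's metadata satisfy all four criteria."""
    model = meta.get("INSTRUMENT_MODEL", "")
    return (
        meta.get("LIBRARY_STRATEGY") in STRATEGIES
        and meta.get("LIBRARY_SOURCE") == "GENOMIC"
        and meta.get("taxonomy_name") in SPECIES
        # "~" is read as "contains one of these keywords" (assumption).
        and any(keyword in model for keyword in INSTRUMENT_KEYWORDS)
    )
```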

2. Primary processing

Introduction

Raw sequence data from SRXs as shown above were aligned to reference genomes with Bowtie2 before being analyzed for coverage in BigWig format and peak-calls in BED format.

Methods

Binary raw sequence data (.sra) for each SRX were downloaded and decoded into Fastq format with the fastq-dump command of SRA Toolkit (ver 2.3.2-4) using default parameters, except for paired-end reads, which were decoded with the --split-files option. For an SRX comprising multiple runs, the decoded Fastq files were concatenated into a single file.
Fastq files were then aligned with Bowtie 2 (ver 2.2.2) using default parameters, except for paired-end reads, for which two Fastq files were specified with -1 and -2 options. The following genome assemblies were used for alignment and subsequent processing:
- hg38, hg19 (H. sapiens)
- mm10, mm9 (M. musculus)
- rn6 (R. norvegicus)
- dm6, dm3 (D. melanogaster)
- ce11, ce10 (C. elegans)
- sacCer3 (S. cerevisiae)
Resultant SAM-formatted files were converted into BAM format with SAMtools (ver 0.1.19; samtools view) and sorted (samtools sort) before removing PCR duplicates (samtools rmdup).
BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; genomeCoverageBed) in RPM (Reads Per Million mapped reads) units with the -scale 1000000/N option, where N is the number of mapped reads after removing PCR duplicates.
BedGraph files were converted into BigWig format with the UCSC bedGraphToBigWig tool (ver 4).
The sorted, duplicate-removed BAM files (samtools rmdup output) were used for peak calling with MACS2 (ver 2.1.0; macs2 callpeak), and the results were recorded in BED4 format. Q-value thresholds were set to 1e-05, 1e-10, or 1e-20, with genome size parameters specified as follows:
- hg38, hg19: -g hs
- mm10, mm9: -g mm
- rn6: -g 2.15e9
- dm6, dm3: -g dm
- ce11, ce10: -g ce
- sacCer3: -g 12100000
Each row in the BED4 files includes genomic coordinates in columns 1–3 and the MACS2 score (−10 × log₁₀[MACS2 Q-value]) in column 4.
BED4 files were converted into BigBed format with the UCSC bedToBigBed tool (ver 2.5).
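
The two numeric conversions in the steps above (the RPM scale factor 1000000/N and the MACS2 score) can be sketched as follows. This is only an illustration of the arithmetic; the function names are hypothetical and are not part of the ChIP-Atlas pipeline.

```python
import math


def rpm_scale_factor(mapped_reads: int) -> float:
    """Scale factor 1000000/N, where N is the number of mapped reads
    after removing PCR duplicates."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def macs2_score(q_value: float) -> float:
    """MACS2 score stored in column 4 of the BED4 files: -10 * log10(Q-value)."""
    if not 0 < q_value <= 1:
        raise ValueError("q_value must be in (0, 1]")
    return -10 * math.log10(q_value)
```

Under this definition, the three Q-value thresholds 1e-05, 1e-10, and 1e-20 correspond to MACS2 scores of 50, 100, and 200.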

3. Data Annotation

Introduction

Experimental materials used for each SRX were manually annotated to allow extraction of data using keywords for track types and cell types.

Methods

Sample metadata for all SRXs (biosample_set.xml) were downloaded from the NCBI FTP site to extract attributes for antigens and antibodies (see here) as well as cell types and tissues (see here).
According to the attribute values assigned to each SRX, antigens and cell types were manually annotated by curators trained in molecular and developmental biology. Each annotation is assigned a ‘Class’ and ‘Subclass’ as described in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).
Guidelines for antigen annotation:
- Histones: based on Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac)
- Gene-encoded proteins
  - Gene symbols were recorded according to the following gene nomenclature databases (e.g., OCT3/4 → POU5F1; p53 → TP53):
    - HGNC (H. sapiens)
    - MGI (M. musculus)
    - RGD (R. norvegicus)
    - FlyBase (D. melanogaster)
    - WormBase (C. elegans)
    - SGD (S. cerevisiae)
  - Modifications such as phosphorylation were ignored. (e.g., phospho-SMAD3 → SMAD3)
  - If an antibody recognizes multiple molecules within a family, the first in ascending order was chosen. (e.g., Anti-SMAD2/3 antibody → SMAD2)
Criteria for cell type annotation:
- H. sapiens, M. musculus, and R. norvegicus: Cell types were mainly classified by tissue of origin. ES and iPS cells were exceptionally classified under the ‘Pluripotent stem cell’ class.
  
  Cell-type class	Cell type
  Blood	K-562; CD4-Positive T-Lymphocytes
  Breast	MCF-7; T-47D
  Pluripotent stem cell	hESC H1; iPS cells
- D. melanogaster: Cell types were mainly classified by cell lines and developmental stages.
- C. elegans: Mainly classified by developmental stages.
- S. cerevisiae: Classified by yeast strains.
- Standardized nomenclatures
  
  Nomenclatures of cell lines and tissue names were standardized according to the following resources (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231):
  - Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200) → Proposed unified cell-line names
  - ATCC → A nonprofit repository providing standardized cell line information
  - MeSH (Medical Subject Headings) → Controlled vocabulary for tissue and anatomical terms
  - FlyBase → Authoritative resource for D. melanogaster cell lines
Antigens or cell types were classified into the ‘Uncategorized’ class if curators could not interpret attribute values.
Antigens or cell types were classified into the ‘No description’ class if no attribute values were provided.

4. Peak Browser

ChIP-Atlas Peak Browser allows users to browse multiple ChIP-seq peak-call datasets, including transcription factors (TFs) and histone modifications, as well as ATAC-seq, DNase-seq, and Bisulfite-seq data on the genome browser IGV. This functionality facilitates the identification of cis-regulatory elements, regulatory proteins, and epigenetic states of genomic regions of interest.

BED4-formatted peak-call data generated in Section 2 were concatenated and converted into BED9 + GFF3-compatible format for visualization on IGV. The resulting BED9 files are available for download from the Peak Browser web page.

BED9 file schema

Column	Description	Example
Header	Track name and link URL	(Strings)
Column 1	Chromosome	chr12
Column 2	Begin	1234
Column 3	End	5678
Column 4*	Sample metadata	(Strings)
Column 5	–10 × log₁₀(MACS2 Q-value)	345
Column 6	.	.
Column 7	Begin (= Column 2)	1234
Column 8	End (= Column 3)	5678
Column 9**	Color code	255,61,0

* Column 4

Sample metadata are described in GFF3 attribute format, enabling IGV to display annotated antigens and cell types. When hovering over a peak, IGV shows the accession number, experiment title, and all attribute values provided in the Biosample metadata for the corresponding SRX.
** Column 9

Heatmap color codes represent the MACS2 score in Column 5. If the MACS2 score is 0, 500, or 1000, the corresponding colors are blue, green, or red, respectively.

To find the URLs of the BED9 files, see Assembled Peak-call data used in “Peak Browser” (here) in Section 10.
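
As an illustration of the schema above, the following sketch splits one BED9 data row into named fields and decodes the GFF3-style attributes in Column 4. It assumes that attributes are `key=value` pairs joined by `;` with URL-encoded values, as in the GFF3 convention; the function name and the example row (including its attribute keys) are hypothetical.

```python
from urllib.parse import unquote


def parse_bed9_line(line: str) -> dict:
    """Split one BED9 data row into named fields following the schema above."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    # Column 4 holds GFF3-style attributes: key=value pairs joined by ';'.
    attributes = {}
    for pair in cols[3].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    return {
        "chrom": cols[0],
        "start": int(cols[1]),
        "end": int(cols[2]),
        "attributes": attributes,
        "score": int(cols[4]),
        "color": tuple(int(c) for c in cols[8].split(",")),
    }


# Hypothetical row: the attribute keys and values are made up for illustration.
example = "chr12\t1234\t5678\tID=SRX000000;Name=GATA2%20(K-562)\t345\t.\t1234\t5678\t255,61,0"
record = parse_bed9_line(example)
```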

Annotation tracks

In addition to experimental peak-call tracks, users can overlay Annotation Tracks in the Peak Browser to visualize functional genomic annotations within regions of interest.

Available annotation tracks are summarized below:

Genome	hg38	hg19	mm10	mm9	rn6	dm6	dm3	ce11	ce10	sacCer3
ENCODE Hi-C	○	○	○	○
GTEx eQTL	○	○
ChromHMM	○	○	○	○
CAGE	○	○	○	○
FANTOM5 enhancers	○	○
JASPAR TF motifs	○	○	○	○		○	○	○	○	○
GWAS Catalog	○	○
ClinVar	○	○
Orphanet	○	○
MGI Phenotype			○	○
PhastCons	○	○	○	○	○	○	○	○	○	○
RepeatMasker	○	○	○	○	○	○	○	○	○
RNA-seq^1,2,3,4	○	○	○	○		○	○
Ensembl genes	○	○	○	○	○	○	○	○	○	○
GENCODE genes	○	○	○	○
ENCODE Blacklist	○	○	○	○		○	○	○	○
CpG Islands	○	○	○	○	○	○	○	○	○

5. Target Genes

Introduction

The ChIP-Atlas Target Genes feature predicts genes directly regulated by a given protein, based on binding profiles of all public ChIP-seq data around gene loci. Target genes are defined as those whose transcription start sites (TSSs) overlap with peak-call intervals of the queried protein within a window of ± N kb (N = 1, 5, or 10).

Methods

Peak-call data

BED4-formatted peak-call data for each SRX generated in Section 2 were used (MACS2 Q-value < 1e-05; antigen class = ‘TFs and others’).
Preparation of TSS library

Locations of TSSs and corresponding gene symbols were obtained from refFlat files distributed via the UCSC FTP site. Only protein-coding genes were used for this analysis.
Preparation of STRING library

Protein–gene interaction data (protein.actions.v10.txt.gz) were downloaded from the STRING database. Protein identifiers were converted to gene symbols using protein.aliases.v10.txt.gz from the same source.
Processing

The bedtools window command (bedtools, ver 2.17.0) was used to identify genes whose TSSs overlapped with peak-call intervals within windows of ± 1 kb, ± 5 kb, or ± 10 kb, using the -w option (-w 1000, -w 5000, or -w 10000, respectively).
Peak-call data derived from the same antigen were aggregated.
MACS2 scores (−10 × log₁₀[MACS2 Q-value]) were visualized as heatmap colors (MACS2 score = 0, 500, 1000 → blue, green, red).
If multiple peaks from a single SRX overlapped the same gene, the highest MACS2 score was selected.
The Average column at the far left of the result table represents the mean MACS2 score for each gene.
The STRING column at the far right represents STRING interaction scores between the protein and the target gene.
Protein–gene interactions were extracted from protein.actions.v10.txt.gz when all of the following conditions were satisfied:
- Column 1 (item_id_a) == query antigen
- Column 2 (item_id_b) == target gene
- Column 3 (mode) == "expression"
- Column 5 (a_is_acting) == "1"

6. Colocalization

Introduction

Many TFs form complexes that cooperatively regulate gene expression (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). Such TFs often exhibit highly similar ChIP-seq binding profiles across the genome.

The ChIP-Atlas Colocalization feature predicts potential co-association partners of a given TF by evaluating similarities among all public ChIP-seq datasets using a dedicated algorithm termed CoLo.

Algorithms

BED4-formatted peak-call data generated in Section 2 were analyzed to assess pairwise similarities among experiments within the same cell-type class.

CoLo has two main advantages:

(a) Compensation for biases arising from differences in experimental conditions
(b) Adjustment for differences in peak numbers and genomic distributions intrinsic to individual TFs

To achieve (a), MACS2 scores within each BED4 file were fitted to a Gaussian distribution and classified into three binding-level groups:

H (High binding): Z-score > 0.5
M (Middle binding): −0.5 ≤ Z-score ≤ 0.5
L (Low binding): Z-score < −0.5

These groups are treated as independent strata when evaluating similarity (b).

For two SRXs (SRX₁ and SRX₂), CoLo evaluates similarity across all nine combinations:

[H / M / L of SRX_1] × [H / M / L of SRX_2]

Each combination yields a Boolean result (similar or not), resulting in a total of nine Boolean similarity indicators.
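
The H/M/L classification can be sketched as below. The sketch assumes that "fitting to a Gaussian distribution" means estimating the mean and (population) standard deviation of the MACS2 scores in one BED4 file; the source does not specify the fitting procedure, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def classify_binding_levels(scores):
    """Label each MACS2 score as 'H', 'M', or 'L' from its Z-score within one file."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    labels = []
    for score in scores:
        z = (score - mu) / sigma if sigma > 0 else 0.0
        if z > 0.5:
            labels.append("H")
        elif z < -0.5:
            labels.append("L")
        else:
            labels.append("M")
    return labels
```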

Methods

Peak-call data: Same as ‘Peak-call data’ in Section 5.
STRING library: Same as ‘Preparation of STRING library’ in Section 5.
Similarity scoring: Similarity scores were calculated by multiplying the binding-level weights assigned to each combination:

SRX_1	SRX_2	Score
H	H	9
H	M	6
H	L	3
M	H	6
M	M	4
M	L	2
L	H	3
L	M	2
L	L	1

If multiple H/M/L combinations were observed between SRX₁ and SRX₂, the highest score was adopted.
- Scores from 1 to 9 are visualized using a color gradient from blue to green to red.
- If all nine combinations were false, the result is shown in gray.
- The Average column at the far left represents the mean CoLo score for each protein.
- The STRING column at the far right represents STRING protein–protein interaction scores.
- Protein–protein interactions were extracted from protein.actions.v10.txt.gz when all of the following conditions were satisfied:
  - Column 1 (item_id_a) == query antigen
  - Column 2 (item_id_b) == co-association partner
  - Column 3 (mode) == "binding"
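
The scoring rule can be sketched as follows. The per-level weights (H = 3, M = 2, L = 1) are inferred from the score table, whose entries equal the products of these weights; the nine Boolean similarity indicators are taken as given input because their computation is not detailed here. The function name is hypothetical.

```python
# Weights inferred from the score table (e.g., H x H = 9, M x L = 2).
WEIGHTS = {"H": 3, "M": 2, "L": 1}


def colo_score(similar):
    """Return the CoLo score for one pair of experiments.

    similar: {(level_srx1, level_srx2): bool} for the H/M/L combinations.
    Returns None when no combination is similar (shown in gray).
    """
    scores = [
        WEIGHTS[a] * WEIGHTS[b]
        for (a, b), is_similar in similar.items()
        if is_similar
    ]
    # If several combinations are similar, the highest score is adopted.
    return max(scores) if scores else None
```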

7. Enrichment Analysis

Introduction

ChIP-Atlas Enrichment Analysis accepts users’ data in the following formats:

Genomic regions (BED) to search for features enriched in the regions
Gene list (gene symbols or IDs) to search for features enriched around the genes
Gene count table (CSV or TSV) to search for features showing concordant differences between two biological states

In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:

Data in panel 4	Data in panel 5	Aims and analyses
BED	Random permutation	Features overlapping with BED intervals more often than by chance
BED	BED	Features differentially overlapping between the two sets of BED intervals
Gene list	RefSeq coding genes	Features overlapping with genes more often than other RefSeq genes
Gene list	Gene list	Features differentially overlapping between the two sets of gene lists
Gene count table	Not required	Features showing concordant differences between two biological states

Requirements and acceptable data

Reference peak-call data (panels 1–3)

Reference peak-call data specified in the upper panels (1 to 3) of the submission form consist of the comprehensive peak-call data described in Section 4.

The result will be returned more quickly if antigen classes and cell-type classes are specified.

Genomic regions (BED; panels 4–5)

Submitted BED files must follow the UCSC BED format and minimally contain three tab-delimited columns describing chromosome, start, and end positions. Header lines and columns beyond column 3 may be included but are ignored.

Example:

chr1<tab>1435385<tab>1436458
chrX<tab>4634643<tab>4635798
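
A minimal reader that applies these rules (skip header lines, ignore columns beyond the third) might look like the sketch below. The function name is hypothetical, and treating lines that begin with `track`, `browser`, or `#` as headers follows the general UCSC BED convention rather than anything specific to ChIP-Atlas.

```python
def read_bed3(lines):
    """Yield (chrom, start, end) from BED text, skipping headers and extra columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "browser", "#")):
            continue  # header or blank line
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"fewer than three tab-delimited columns: {line!r}")
        start, end = int(cols[1]), int(cols[2])
        if start >= end:
            raise ValueError(f"start must be smaller than end: {line!r}")
        yield cols[0], start, end
```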

Only BED files using the following genome assemblies are supported. BED files in any other genome assemblies MUST be converted using the UCSC liftOver tool prior to submission.

hg38, hg19 (H. sapiens)
mm10, mm9 (M. musculus)
rn6 (R. norvegicus)
dm6, dm3 (D. melanogaster)
ce11, ce10 (C. elegans)
sacCer3 (S. cerevisiae)

Gene list (panels 4–5)

Gene lists may be provided using official gene symbols or supported identifiers.

If gene lists are described using other formats, batch conversion tools such as DAVID should be used to convert them into official gene symbols or supported IDs.

Official gene symbols must follow standardized nomenclatures:
- HGNC (H. sapiens)
- MGI (M. musculus)
- RGD (R. norvegicus)
- FlyBase (D. melanogaster)
- WormBase (C. elegans)
- SGD (S. cerevisiae)
Examples: OCT3/4 → POU5F1, p53 → TP53
In addition to official gene symbols, the following identifiers are also acceptable:
- Ensembl IDs (e.g., ENSG00000204531)
- UniProt IDs (e.g., Q01860)
- RefSeq IDs (e.g., NM_002701)

Gene count table (panel 4)

An integer-valued gene count table obtained from RNA-seq experiments (CSV or TSV) with a header is required.

The first column of the table contains gene identifiers.

Remaining columns represent samples; sample names must include replicate numbers appended with an underscore (e.g., wt_1, wt_2).

Example (CSV):

Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3
DDX11L1,8,7,5,12,8,13
WASH7P,1512,985,1236,2342,1600,2075
FAM138A,0,0,0,0,0,3
OR4F5,0,0,0,0,0,0
LOC100996442,279,208,234,402,285,370
:
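
To illustrate the naming rule, the sketch below groups sample columns by condition using the text before the last underscore and reads the counts as integers. It is a hedged example using only the first rows shown above; `group_replicates` is a hypothetical helper, not part of ChIP-Atlas.

```python
import csv
import io
from collections import defaultdict


def group_replicates(header):
    """Map each condition to its sample columns, e.g. 'treated' -> ['treated_1', ...]."""
    groups = defaultdict(list)
    for name in header[1:]:  # the first column holds gene identifiers
        condition, sep, replicate = name.rpartition("_")
        if not sep or not replicate.isdigit():
            raise ValueError(f"sample name lacks a '_<replicate>' suffix: {name!r}")
        groups[condition].append(name)
    return dict(groups)


table = io.StringIO(
    "Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3\n"
    "DDX11L1,8,7,5,12,8,13\n"
    "WASH7P,1512,985,1236,2342,1600,2075\n"
)
reader = csv.reader(table)
groups = group_replicates(next(reader))
counts = {row[0]: [int(value) for value in row[1:]] for row in reader}
```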

Methods

A. Genomic regions and gene lists

Conversion to BED format

Submitted data are converted to BED format depending on the data type:
- BED
  
  Submitted BED files are used directly for downstream processing. If Random permutation is selected, BED intervals are randomly permuted across chromosomes using bedtools shuffle (bedtools; ver 2.17.0).
- Gene list
  
  Unique TSSs of submitted genes are defined using xxxCanonical.txt.gz libraries distributed from the UCSC FTP site, where xxx denotes:
  - known (H. sapiens and M. musculus)
  - flyBase (D. melanogaster)
  - sanger (C. elegans)
  - sgd (S. cerevisiae)
  Unique TSSs of Rattus norvegicus genes are defined using gene lists distributed by RGD.
  
  TSS coordinates are converted to BED format with widths specified by the Distance range from TSS parameter on the submission form. When RefSeq coding genes are selected as background, RefSeq coding genes excluding submitted genes are processed in the same manner.
Overlap counting

Overlaps between BED intervals derived from panels 4–5 and reference peak-call data specified in panels 1–3 are counted using bedtools intersect (bedtools; ver 2.23.0).
Statistical testing

Two-tailed Fisher’s exact probability tests are performed (see example).

The null hypothesis assumes that the proportion of reference peaks overlapping submitted data in panel 4 is equal to that overlapping data in panel 5.

Q-values are calculated using the Benjamini–Hochberg procedure.
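
For readers who want to reproduce the statistics, the following is a self-contained sketch of a two-tailed Fisher's exact test on a 2 × 2 table and of the Benjamini-Hochberg adjustment. The table layout (overlapping versus non-overlapping counts for panel 4 and panel 5) is an assumption about how the counts are arranged, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a, b = overlapping / non-overlapping counts for panel 4;
    c, d = overlapping / non-overlapping counts for panel 5.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values
```

The exact binomial coefficients keep the sketch dependency-free but are slow for very large counts; a statistics library would be preferable in practice.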
Fold enrichment

Fold enrichment is calculated as:
```
Fold enrichment = Column 6 / Column 7
```
If the ratio exceeds 1, the feature is considered preferentially associated with the data in panel 4.

B. Gene count table

Upon receiving a two-group integer-valued gene count table, log₂ fold changes (log₂FC) for all genes between the two experimental groups are estimated using the DESeq2 package in R.
Gene identifiers provided in column 1 as Ensembl, UniProt, or RefSeq IDs are mapped to official gene symbols, and duplicate symbols are merged by summing their counts.
For each ChIP-seq, ATAC-seq, DNase-seq, or Bisulfite-seq experiment, overlaps between peak regions and gene loci within the user-defined TSS window are assessed, thereby constructing experiment-specific target gene sets.
In accordance with the original PAGE framework, Z-scores are calculated as:
```
Z = (S_m − μ) × √m / δ
```
where μ and δ represent the mean and standard deviation of genome-wide log₂FC values, respectively, and S_m denotes the mean log₂FC of a target gene set of size m.

Two-tailed P-values are derived from the Z-scores, followed by multiple-testing correction using the Benjamini–Hochberg procedure to obtain Q-values.
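
The Z-score formula and the two-tailed P-value can be sketched as follows. The sketch assumes that δ is the population standard deviation of the genome-wide log₂FC values and that P-values come from the standard normal distribution; the function names are hypothetical, and the subsequent Benjamini-Hochberg step is omitted.

```python
from math import erfc, sqrt
from statistics import mean, pstdev


def page_z_score(all_log2fc, target_log2fc):
    """Z = (S_m - mu) * sqrt(m) / delta for one target gene set."""
    mu = mean(all_log2fc)
    delta = pstdev(all_log2fc)  # assumption: population standard deviation
    m = len(target_log2fc)
    s_m = mean(target_log2fc)
    return (s_m - mu) * sqrt(m) / delta


def two_tailed_p(z: float) -> float:
    """Two-tailed P-value of a Z-score under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))
```

For example, with genome-wide log₂FC values of −2, −1, 0, 1, and 2 (μ = 0, δ = √2) and a target set with values 1 and 2 (S_m = 1.5, m = 2), the formula gives Z = 1.5 × √2 / √2 = 1.5.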

API of Enrichment Analysis

An API is available to perform Enrichment Analysis programmatically. Please see here for details.

8. Diff Analysis

Introduction

ChIP-Atlas Diff Analysis is a feature that identifies differential peak regions (DPRs) or differentially methylated regions (DMRs) from two sets of queried ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.

Requirements and acceptable data

Experiment type (panel 1)

ChIP/ATAC/DNase-seq: Detection of DPRs
Bisulfite-seq: Detection of DMRs

Experiment IDs (panels 2–3)

Accepted identifiers include:

Experiment IDs from NCBI, ENA, or DDBJ (e.g., SRX18419259, ERX1103210, DRX335588)
GEO accessions from NCBI GEO (e.g., GSM6765200)
Publicly accessible URLs pointing to user-hosted datasets

URL format for ChIP/ATAC/DNase-seq

Each line must include:

BigWig file (integer-valued read coverage)
BED file (peak-call data)
Total number of mapped reads

Example (hg38):

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed	205201674
https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed	208332830

URL format for Bisulfite-seq

BigWig files describing methylation rates (0–1):

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw

The Dataset Search tool is useful for finding experiment IDs of interest.

Methods

Detection of differential peak regions (DPRs)

BigWig and BED files recorded in ChIP-Atlas are identified based on submitted IDs.
BigWig files are converted to bedGraph format.
Raw read coverage is reconstructed using total mapped read counts.
The genome is segmented based on peak regions from the query datasets.
Read counts are aggregated into an m × n matrix (m: regions, n: experiments).
Differential analysis is performed using the edgeR package in R.
Results are summarized in BED format containing genomic coordinates and statistics.

This approach is conceptually related to DiffBind but does not require BAM files.
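
The reconstruction of raw coverage from RPM-scaled values and the assembly of the m × n matrix can be sketched as below. This is a simplified illustration: it assumes one RPM value per region is already available for each experiment, whereas the actual per-region aggregation is not described here. All names are hypothetical.

```python
def rpm_to_raw(rpm_coverage: float, total_mapped_reads: int) -> int:
    """Invert RPM scaling: raw coverage = RPM value * N / 1,000,000."""
    return round(rpm_coverage * total_mapped_reads / 1_000_000)


def count_matrix(regions, experiments):
    """Build an m x n matrix (rows: regions, columns: experiments).

    regions: list of region identifiers
    experiments: list of (total_mapped_reads, {region: rpm_value}) pairs
    """
    return [
        [rpm_to_raw(rpm.get(region, 0.0), total) for total, rpm in experiments]
        for region in regions
    ]
```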

Detection of differentially methylated regions (DMRs)

BigWig methylation data are converted to bedGraph format.
Methylation levels are aggregated using metilene_input.pl from the metilene package (PMID: 26631489).
DMRs are detected using the metilene command with default parameters.
Results are reported in BED format with statistical annotations.

Output files

Results are returned as a ZIP archive containing:

.igv.xml: IGV session file
.log: Analysis log
.bed: DPRs or DMRs in BED9 format
.igv.bed: BED9 + GFF3 format for IGV visualization

API of Diff Analysis

An API is available for programmatic execution. See here for details.

9. Experiment Comparative Profile

Experiment Comparative Profile is an experiment-level quality-control panel located at the bottom of each detailed experiment page (e.g., https://chip-atlas.org/view?id=SRX018625). This panel provides quantitative quality metrics for individual experiments by contextualizing each dataset relative to all other experiments of the same assay type. The panel consists of two components: Read and Peak Distribution and Correlation-Based Clustering.

Read and Peak Distribution

For each experiment type, sequencing read counts and the number of detected peaks were summarized across all experiments. For ChIP-seq experiments, datasets were further subcategorized by antigen class, including histone marks, TFs and others, RNA polymerase, and input controls.

For ChIP-seq, ATAC-seq, and DNase-seq experiments, peak counts were calculated using peaks with MACS2 scores > 50 (i.e., MACS2 Q-value < 1e-05). For Bisulfite-seq experiments, “peaks” corresponded to hypermethylated regions identified using MethPipe.

These distributions were visualized as violin plots overlaid with box plots. The position of each individual experiment within its corresponding distribution is indicated by an orange horizontal line, allowing users to assess sequencing depth and signal yield in a cohort-level context.

Correlation-Based Clustering

For experiments sharing the same biological context—defined by the same genome assembly and cell type, as well as the same antigen in the case of ChIP-seq—pairwise Pearson correlations of BigWig signal profiles were computed and visualized using deepTools.

Briefly, BigWig files were segmented into 10-kb genomic windows, and signal intensities were summarized across bins using the multiBigwigSummary subcommand, for example:

multiBigwigSummary bins -b srx1.bw srx2.bw srx3.bw -o results.npz

The resulting matrix was subsequently used as input for plotCorrelation to calculate correlation coefficients, generate heatmaps, and perform hierarchical clustering, for example:

plotCorrelation -in results.npz -c pearson -p heatmap -o plot.png --outFileCorMatrix correlation.tab

Within the heatmap, arrowheads indicate the selected experiment, and their colors represent the median correlation coefficient with other experiments belonging to the same cluster.
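
The quantity computed for each pair of experiments is an ordinary Pearson correlation over binned signal values. The sketch below illustrates that calculation in plain Python; it is not deepTools code, and the function names and input layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(signals):
    """Pairwise Pearson correlations for {experiment_id: binned signal values}."""
    ids = list(signals)
    return {a: {b: pearson(signals[a], signals[b]) for b in ids} for a in ids}
```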

10. Downloads

Data for each SRX

All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema).

BigWig (Download URL)

ChIP-seq, ATAC-seq, and DNase-seq

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bw/[Experimental_ID].bw

Bisulfite-seq

Methylation rate:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/methyl/[Experimental_ID].methyl.bw

Coverage:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/cover/[Experimental_ID].cover.bw

Example

ChIP-seq:

https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw

Bisulfite-seq (Methylation rate):

https://chip-atlas.dbcls.jp/data/hg38/eachData/bs/methyl/SRX1651655.methyl.bw

Peak-call (BED) (Download URL)

ChIP-seq, ATAC-seq, and DNase-seq

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bed[Threshold]/[Experimental_ID].[Threshold].bed

Threshold = 05, 10, or 20

Bisulfite-seq

Hypo MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/Bed/[Experimental_ID].hmr.bed

Partial MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/Bed/[Experimental_ID].pmd.bed

Hyper MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/Bed/[Experimental_ID].hypermr.bed

Example

ChIP-seq:

Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed

Bisulfite-seq (Hypo MR):

Hypo-methylated region data of SRX1651655.

https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/Bed/SRX1651655.hmr.bed
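
The URL templates above for ChIP-seq, ATAC-seq, and DNase-seq data can be filled in programmatically. The helper names below are hypothetical; the templates themselves are taken from this section.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def bigwig_url(genome: str, experiment_id: str) -> str:
    """BigWig coverage URL for a ChIP-seq, ATAC-seq, or DNase-seq experiment."""
    return f"{BASE_URL}/{genome}/eachData/bw/{experiment_id}.bw"


def peak_bed_url(genome: str, experiment_id: str, threshold: str = "05") -> str:
    """Peak-call BED URL; threshold is '05', '10', or '20'."""
    if threshold not in {"05", "10", "20"}:
        raise ValueError("threshold must be '05', '10', or '20'")
    return (
        f"{BASE_URL}/{genome}/eachData/bed{threshold}/"
        f"{experiment_id}.{threshold}.bed"
    )
```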

Peak-call (BigBed) (Download URL)

ChIP-seq, ATAC-seq, and DNase-seq

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bb[Threshold]/[Experimental_ID].[Threshold].bb

Threshold = 05, 10, or 20

Bisulfite-seq

Hypo MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/BigBed/[Experimental_ID].hmr.bb

Partial MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/BigBed/[Experimental_ID].pmd.bb

Hyper MR:

https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/BigBed/[Experimental_ID].hypermr.bb

Example

ChIP-seq:

Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb

Bisulfite-seq (Hypo MR):

Hypo-methylated region data of SRX1651655.

https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/BigBed/SRX1651655.hmr.bb

Assembled Peak-call data used in “Peak Browser”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/assembled/[File_name].bed

Available Genome and File_name are listed in fileList.tab (Download, Table schema)

Example

All peak-call data of GATA2 in all cell types with Q-value < 1E-05.

https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed

Note

As the assembled peak-call data used in “Peak Browser” are extremely large, we recommend downloading the lighter versions of all peak-call data (see below) and joining SRXs with sample metadata described in experimentList.tab on a command-line interface.

Lighter version of all peak-call data

Genome	Q < 1E-05	Q < 1E-10	Q < 1E-20	Q < 1E-50	WGBS
hg38	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
hg19	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
mm10	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
mm9	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
rn6	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
dm6	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
dm3	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
ce11	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
ce10	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
sacCer3	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎

Q: MACS2 Q-value thresholds

Table schema of the lighter version of all peak-call data

Column	Description	Example
Column 1	Chromosome	chr12
Column 2	Begin	1234
Column 3	End	5678
Column 4	SRX	SRX344646
Column 5	-10 × log₁₀(MACS2 Q-value)	345
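
Joining the lighter peak-call data with sample metadata, as recommended in the note above, can be sketched as follows. The sketch assumes tab-delimited input and uses the experimentList.tab column positions (column 1: experimental ID, column 4: track type, column 6: cell type); the function names and the `NA` placeholder are hypothetical.

```python
def load_annotations(experiment_list_lines):
    """Map each experimental ID to (track type, cell type) from experimentList.tab rows."""
    annotations = {}
    for line in experiment_list_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 6:
            annotations[cols[0]] = (cols[3], cols[5])  # columns 1, 4, and 6
    return annotations


def annotate_peaks(peak_lines, annotations):
    """Yield peak rows extended with the track type and cell type of their SRX."""
    for line in peak_lines:
        chrom, start, end, srx, score = line.rstrip("\n").split("\t")[:5]
        track_type, cell_type = annotations.get(srx, ("NA", "NA"))
        yield chrom, int(start), int(end), srx, int(score), track_type, cell_type
```

Because both functions consume lines lazily, they can be applied to open file handles without loading the full peak file into memory.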

Analyzed data used in “Target Genes”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/target/[Protein].[Distance].tsv

Protein is listed in analysisList.tab (Download, Table schema)

Distance = 1, 5, or 10 (kb from TSS)

Example

https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv

Analyzed data used in “Colocalization”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/colo/[Protein].[Cell_type_class].tsv

Protein and Cell_type_class are listed in analysisList.tab (Download, Table schema)

Example

https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv

Spaces in cell type class names must be replaced with underscores.
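
The Target Genes and Colocalization URL templates, including the underscore rule, can be expressed as small helpers. The function names are hypothetical; the templates follow the patterns shown above.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def target_genes_url(genome: str, protein: str, distance_kb: int) -> str:
    """URL of Target Genes results; distance_kb is 1, 5, or 10 (kb from TSS)."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance_kb must be 1, 5, or 10")
    return f"{BASE_URL}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome: str, protein: str, cell_type_class: str) -> str:
    """URL of Colocalization results; spaces in the class name become underscores."""
    class_name = cell_type_class.replace(" ", "_")
    return f"{BASE_URL}/{genome}/colo/{protein}.{class_name}.tsv"
```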

Tables summarizing metadata and files

experimentList.tab

All experiments recorded in ChIP-Atlas.

Column	Description	Example
1	Experimental ID	SRX097088
2	Genome assembly	hg19
3	Track type class	TFs and others
4	Track type	GATA2
5	Cell type class	Blood
6	Cell type	K-562
7	Cell type description	Primary Tissue=Blood|Tissue Diagnosis=Leukemia
8	Processing logs (ChIP/ATAC/DNase-seq)	30180878,82.3,42.1,6691
8	Processing logs (Bisulfite-seq)	132179672,88.1,3.4,311292
9	Title	GSM722415: GATA2 K562
10-	Metadata	source_name=GATA2 ChIP-seq K562

fileList.tab

All assembled peak-call data used in Peak Browser.

Column	Description	Example
1	File name	Oth.ALL.05.GATA2.AllCell
2	Genome assembly	hg19
3	Track type class	TFs and others
4	Track type	GATA2
5	Cell type class	All cell types
6	Cell type	-
7	Threshold	05
8	Experimental IDs included	SRX070877,SRX150427,...

analysisList.tab

Column	Description	Example
1	Antigen	POU5F1
2	Cell type class in Colocalization	Epidermis, Pluripotent stem cell
3	Recorded in Target Genes	+
4	Genome assembly	hg19

antigenList.tab

Column	Description	Example
1	Genome assembly	hg19
2	Track type class	TFs and others
3	Track type	POU5F1
4	Number of experiments	24
5	Experimental IDs included	SRX011571,...

celltypeList.tab

Column	Description	Example
1	Genome assembly	hg19
2	Cell type class	Prostate
3	Cell type	VCaP
4	Number of experiments	185
5	Experimental IDs included	SRX020917,...

11. External Genome Browser

BigBed and BigWig files in the ChIP-Atlas database can now be browsed on the UCSC Genome Browser. Use the links below to jump to the UCSC Genome Browser.

Currently, the track hub feature is provided only for files from individual experiments; support for browsing files assembled by antigen and cell type is in progress. See Using UCSC Genome Browser Track Hubs for more details.
