zouzhaonan edited this page Feb 8, 2026 · 101 revisions

ChIP-Atlas / Documents

Documents for computational processing in ChIP-Atlas.

Table of Contents

  1. Data source
  2. Primary processing
  3. Data Annotation
  4. Peak Browser
  5. Target Genes
  6. Colocalization
  7. Enrichment Analysis
  8. Diff Analysis
  9. Experiment Comparative Profile
  10. Downloads
  11. External Genome Browser

1. Data source

Most academic journals currently require authors of studies that include high-throughput sequencing to submit their raw sequence data as SRAs (Sequence Read Archives) to public repositories (NCBI, DDBJ, or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). ChIP-Atlas refers to the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site) and uses SRXs that meet the following criteria:

  • LIBRARY_STRATEGY == ChIP-Seq, ATAC-Seq, DNase-Hypersensitivity, or Bisulfite-Seq
  • LIBRARY_SOURCE == GENOMIC
  • taxonomy_name == Homo sapiens, Mus musculus, Rattus norvegicus, Caenorhabditis elegans, Drosophila melanogaster, or Saccharomyces cerevisiae
  • INSTRUMENT_MODEL ~ Illumina, NextSeq or HiSeq
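
The criteria above can be read as a simple predicate over one metadata record. The following is a minimal sketch, assuming the XML metadata has already been parsed into a dictionary; the function name `is_eligible`, the dictionary keys, and the reading of `~` as "contains one of these keywords" are assumptions made for illustration, not part of the ChIP-Atlas pipeline.

```python
# Hypothetical filter mirroring the four selection criteria listed above.
STRATEGIES = {"ChIP-Seq", "ATAC-Seq", "DNase-Hypersensitivity", "Bisulfite-Seq"}
SPECIES = {
    "Homo sapiens", "Mus musculus", "Rattus norvegicus",
    "Caenorhabditis elegans", "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
}
INSTRUMENT_KEYWORDS = ("Illumina", "NextSeq", "HiSeq")


def is_eligible(record: dict) -> bool:
    """Return True if a parsed SRX metadata record meets all four criteria."""
    model = record.get("INSTRUMENT_MODEL", "")
    return (
        record.get("LIBRARY_STRATEGY") in STRATEGIES
        and record.get("LIBRARY_SOURCE") == "GENOMIC"
        and record.get("taxonomy_name") in SPECIES
        # "~" is read here as "contains one of these keywords" (assumption).
        and any(keyword in model for keyword in INSTRUMENT_KEYWORDS)
    )


example = {
    "LIBRARY_STRATEGY": "ChIP-Seq",
    "LIBRARY_SOURCE": "GENOMIC",
    "taxonomy_name": "Homo sapiens",
    "INSTRUMENT_MODEL": "Illumina HiSeq 2000",
}
print(is_eligible(example))
```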

2. Primary processing

Introduction

Raw sequence data for the SRXs selected above were aligned to reference genomes with Bowtie 2. The alignments were then used to compute coverage tracks in BigWig format and peak calls in BED format.

Methods

  1. Binary raw sequence data (.sra) for each SRX were downloaded and decoded into Fastq format with the fastq-dump command of SRA Toolkit (ver 2.3.2-4) using default parameters, except for paired-end reads, which were decoded with the --split-files option. For an SRX containing multiple runs, the decoded Fastq files were concatenated into a single file.

  2. Fastq files were then aligned with Bowtie 2 (ver 2.2.2) using default parameters, except for paired-end reads, for which two Fastq files were specified with -1 and -2 options. The following genome assemblies were used for alignment and subsequent processing:

    • hg38, hg19 (H. sapiens)
    • mm10, mm9 (M. musculus)
    • rn6 (R. norvegicus)
    • dm6, dm3 (D. melanogaster)
    • ce11, ce10 (C. elegans)
    • sacCer3 (S. cerevisiae)
  3. Resultant SAM-formatted files were converted into BAM format with SAMtools (ver 0.1.19; samtools view) and sorted (samtools sort) before removing PCR duplicates (samtools rmdup).

  4. BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; genomeCoverageBed) in RPM (Reads Per Million mapped reads) units with the -scale 1000000/N option, where N is the number of mapped reads after removing PCR duplicates.

  5. BedGraph files were converted into BigWig format with the UCSC bedGraphToBigWig tool (ver 4).

  6. BAM files generated in step (3) were used for peak calling with MACS2 (ver 2.1.0; macs2 callpeak) in BED4 format. Q-value thresholds were set to 1e-05, 1e-10, or 1e-20, with genome size parameters specified as follows:

    • hg38, hg19: -g hs
    • mm10, mm9: -g mm
    • rn6: -g 2.15e9
    • dm6, dm3: -g dm
    • ce11, ce10: -g ce
    • sacCer3: -g 12100000

    Each row in the BED4 files includes genomic coordinates in columns 1–3 and the MACS2 score (−10 × log10[MACS2 Q-value]) in column 4.

  7. BED4 files were converted into BigBed format with the UCSC bedToBigBed tool (ver 2.5).
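
The two numeric conventions used in steps 4 and 6 can be illustrated in a few lines. This is a minimal sketch, not the pipeline's own code: the function names and the read count of 20 million are hypothetical, while the formulas (1000000/N and −10 × log10[Q-value]) are taken from the steps above.

```python
import math


def rpm_scale_factor(mapped_reads: int) -> float:
    """Scale factor 1000000/N as passed to the -scale option in step 4."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def macs2_score(q_value: float) -> float:
    """MACS2 score as defined in step 6: -10 * log10(Q-value)."""
    return -10 * math.log10(q_value)


# Hypothetical library with 20 million mapped reads after duplicate removal.
print(rpm_scale_factor(20_000_000))

# Scores corresponding to the three Q-value thresholds.
for q in (1e-05, 1e-10, 1e-20):
    print(q, round(macs2_score(q)))
```

Under this definition, the thresholds 1e-05, 1e-10, and 1e-20 correspond to minimum scores of 50, 100, and 200.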

3. Data Annotation

Introduction

Experimental materials used for each SRX were manually annotated to allow extraction of data using keywords for track types and cell types.

Methods

  1. Sample metadata for all SRXs (biosample_set.xml) were downloaded from the NCBI FTP site to extract attributes for antigens and antibodies (see here) as well as cell types and tissues (see here).
  2. According to the attribute values assigned to each SRX, antigens and cell types were manually annotated by curators trained in molecular and developmental biology. Each annotation is assigned a ‘Class’ and ‘Subclass’ as described in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).
  3. Guidelines for antigen annotation:
    • Histones: Based on the Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac)
    • Gene-encoded proteins
      • Gene symbols were recorded according to the official gene nomenclature database for each species (e.g., OCT3/4 → POU5F1; p53 → TP53).
      • Modifications such as phosphorylation were ignored. (e.g., phospho-SMAD3 → SMAD3)
      • If an antibody recognizes multiple molecules within a family, the first in ascending order was chosen. (e.g., Anti-SMAD2/3 antibody → SMAD2)
  4. Criteria for cell type annotation:
    • H. sapiens, M. musculus, and R. norvegicus: Cell types were mainly classified by tissue of origin. ES and iPS cells were exceptionally classified under the ‘Pluripotent stem cell’ class.

      | Cell-type class | Cell type |
      |---|---|
      | Blood | K-562; CD4-Positive T-Lymphocytes |
      | Breast | MCF-7; T-47D |
      | Pluripotent stem cell | hESC H1; iPS cells |

    • D. melanogaster: Cell types were mainly classified by cell lines and developmental stages.

    • C. elegans: Mainly classified by developmental stages.

    • S. cerevisiae: Classified by yeast strains.

    • Standardized nomenclatures

      Nomenclatures of cell lines and tissue names were standardized according to the following resources (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231):

      • Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200) → Proposed unified cell-line names
      • ATCC → A nonprofit repository providing standardized cell line information
      • MeSH (Medical Subject Headings) → Controlled vocabulary for tissue and anatomical terms
      • FlyBase → Authoritative resource for D. melanogaster cell lines
  5. Antigens or cell types were classified into the ‘Uncategorized’ class if curators could not interpret attribute values.
  6. Antigens or cell types were classified into the ‘No description’ class if no attribute values were provided.

4. Peak Browser

ChIP-Atlas Peak Browser allows users to browse multiple ChIP-seq peak-call datasets, including transcription factors (TFs) and histone modifications, as well as ATAC-seq, DNase-seq, and Bisulfite-seq data on the genome browser IGV. This functionality facilitates the identification of cis-regulatory elements, regulatory proteins, and epigenetic states of genomic regions of interest.

BED4-formatted peak-call data generated in Section 2 were concatenated and converted into BED9 + GFF3-compatible format for visualization on IGV. The resulting BED9 files are available for download from the Peak Browser web page.

BED9 file schema

| Column | Description | Example |
|---|---|---|
| Header | Track name and link URL | (Strings) |
| Column 1 | Chromosome | chr12 |
| Column 2 | Begin | 1234 |
| Column 3 | End | 5678 |
| Column 4* | Sample metadata | (Strings) |
| Column 5 | −10 × log10(MACS2 Q-value) | 345 |
| Column 6 | . | . |
| Column 7 | Begin (= Column 2) | 1234 |
| Column 8 | End (= Column 3) | 5678 |
| Column 9** | Color code | 255,61,0 |

  • * Column 4

    Sample metadata are described in GFF3 attribute format, enabling IGV to display annotated antigens and cell types. When hovering over a peak, IGV shows the accession number, experiment title, and all attribute values provided in the Biosample metadata for the corresponding SRX.

  • ** Column 9

    Heatmap color codes represent the MACS2 score in Column 5. If the MACS2 score is 0, 500, or 1000, the corresponding colors are blue, green, or red, respectively.
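
A BED9 data line following the schema above can be split into typed fields as sketched below. This is an illustrative reader, not an official parser: the function name `parse_bed9_line`, the sample line, and the attribute keys `ID` and `Name` are hypothetical, and the percent-decoding step assumes that column 4 follows the usual GFF3 escaping rules.

```python
from urllib.parse import unquote


def parse_bed9_line(line: str) -> dict:
    """Split one BED9 data line and decode the GFF3-style column 4."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attributes = {}
    for pair in cols[3].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 attributes use percent-encoding for reserved characters.
            attributes[unquote(key)] = unquote(value)
    return {
        "chrom": cols[0],
        "start": int(cols[1]),
        "end": int(cols[2]),
        "attributes": attributes,
        "score": int(cols[4]),
        "rgb": tuple(int(v) for v in cols[8].split(",")),
    }


# Hypothetical line; the attribute keys and values are illustrative only.
line = "\t".join([
    "chr12", "1234", "5678", "ID=SRX000000;Name=GATA2%20(K-562)",
    "345", ".", "1234", "5678", "255,61,0",
])
record = parse_bed9_line(line)
print(record["attributes"]["Name"], record["score"], record["rgb"])
```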

To find the URLs of the BED9 files, see Assembled Peak-call data used in “Peak Browser” (here) in Section 10.

Annotation tracks

In addition to experimental peak-call tracks, users can overlay Annotation Tracks in the Peak Browser to visualize functional genomic annotations within regions of interest.

Available annotation tracks are listed below. Availability varies by genome assembly (hg38, hg19, mm10, mm9, rn6, dm6, dm3, ce11, ce10, sacCer3):

ENCODE Hi-C
GTEx eQTL
ChromHMM
CAGE
FANTOM5 enhancers
JASPAR TF motifs
GWAS Catalog
ClinVar
Orphanet
MGI Phenotype
PhastCons
RepeatMasker
RNA-seq
Ensembl genes
GENCODE genes
ENCODE Blacklist
CpG Islands

5. Target Genes

Introduction

The ChIP-Atlas Target Genes feature predicts genes directly regulated by a given protein, based on binding profiles of all public ChIP-seq data around gene loci. Target genes are defined as those whose transcription start sites (TSSs) overlap with peak-call intervals of the queried protein within a window of ± N kb (N = 1, 5, or 10).

Methods

  1. Peak-call data

    BED4-formatted peak-call data for each SRX generated in Section 2 were used (MACS2 Q-value < 1e-05; antigen class = ‘TFs and others’).

  2. Preparation of TSS library

    Locations of TSSs and corresponding gene symbols were obtained from refFlat files distributed via the UCSC FTP site. Only protein-coding genes were used for this analysis.

  3. Preparation of STRING library

    Protein–gene interaction data (protein.actions.v10.txt.gz) were downloaded from the STRING database. Protein identifiers were converted to gene symbols using protein.aliases.v10.txt.gz from the same source.

  4. Processing

  • The bedtools window command (bedtools, ver 2.17.0) was used to identify genes whose TSSs overlapped with peak-call intervals within windows of ± 1 kb, ± 5 kb, or ± 10 kb, using the -w option (-w 1000, -w 5000, or -w 10000, respectively).

  • Peak-call data derived from the same antigen were aggregated.

  • MACS2 scores (−10 × log10[MACS2 Q-value]) were visualized as heatmap colors (MACS2 score = 0, 500, 1000 → blue, green, red).

  • If multiple peaks from a single SRX overlapped the same gene, the highest MACS2 score was selected.

  • The Average column at the far left of the result table represents the mean MACS2 score for each gene.

  • The STRING column at the far right represents STRING interaction scores between the protein and the target gene.

  • Protein–gene interactions were extracted from protein.actions.v10.txt.gz when all of the following conditions were satisfied:

    • Column 1 (item_id_a) == query antigen
    • Column 2 (item_id_b) == target gene
    • Column 3 (mode) == "expression"
    • Column 5 (a_is_acting) == "1"
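
The window overlap and score selection described in step 4 can be sketched in pure Python. This is a simplified stand-in for the `bedtools window` step, assuming 0-based half-open intervals and a TSS treated as a 1-bp interval; the function name, gene names, SRX labels, and the choice to average over SRXs for the Average column are assumptions.

```python
from collections import defaultdict
from statistics import mean


def target_gene_scores(tss_list, peaks, window):
    """Return {gene: {srx: highest score}} for peaks within +/- window bp of a TSS.

    tss_list: iterable of (chrom, tss_position, gene_symbol), 0-based positions
    peaks:    iterable of (chrom, start, end, score, srx), half-open intervals
    """
    best = defaultdict(dict)
    for chrom, tss, gene in tss_list:
        lo, hi = tss - window, tss + 1 + window  # 1-bp TSS padded on both sides
        for p_chrom, start, end, score, srx in peaks:
            if p_chrom == chrom and start < hi and end > lo:
                # Keep only the highest score per SRX and gene.
                best[gene][srx] = max(score, best[gene].get(srx, score))
    return best


# Hypothetical TSSs and peaks.
tss_list = [("chr1", 10_000, "GENE_A"), ("chr1", 50_000, "GENE_B")]
peaks = [
    ("chr1", 9_500, 9_800, 120, "SRX_1"),
    ("chr1", 10_200, 10_400, 300, "SRX_1"),
    ("chr1", 10_900, 11_100, 80, "SRX_2"),
    ("chr1", 44_000, 44_500, 200, "SRX_2"),
]
scores = target_gene_scores(tss_list, peaks, window=1_000)
averages = {gene: mean(per_srx.values()) for gene, per_srx in scores.items()}
print(dict(scores), averages)
```

With a ± 1 kb window only GENE_A is reported; widening the window to ± 10 kb also brings the fourth peak within range of GENE_B.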

6. Colocalization

Introduction

Many TFs form complexes that cooperatively regulate gene expression (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). Such TFs often exhibit highly similar ChIP-seq binding profiles across the genome.

The ChIP-Atlas Colocalization feature predicts potential co-association partners of a given TF by evaluating similarities among all public ChIP-seq datasets using a dedicated algorithm termed CoLo.

Algorithms

BED4-formatted peak-call data generated in Section 2 were analyzed to assess pairwise similarities among experiments within the same cell-type class.

CoLo has two main advantages:

  • (a) Compensation for biases arising from differences in experimental conditions
  • (b) Adjustment for differences in peak numbers and genomic distributions intrinsic to individual TFs

To achieve (a), MACS2 scores within each BED4 file were fitted to a Gaussian distribution and classified into three binding-level groups:

  • H (High binding): Z-score > 0.5
  • M (Middle binding): −0.5 ≤ Z-score ≤ 0.5
  • L (Low binding): Z-score < −0.5

These groups are treated as independent strata when evaluating similarity (b).
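
The three-level classification can be sketched as follows. This is a minimal illustration, assuming the Gaussian fit is summarized by the mean and population standard deviation of the raw MACS2 scores in one BED4 file; the actual fitting procedure used by CoLo is not specified here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def classify_binding_levels(scores):
    """Label each MACS2 score as 'H', 'M', or 'L' from its Z-score."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD; the estimator is an assumption
    if sigma == 0:
        raise ValueError("scores must not all be identical")
    labels = []
    for score in scores:
        z = (score - mu) / sigma
        if z > 0.5:
            labels.append("H")
        elif z < -0.5:
            labels.append("L")
        else:
            labels.append("M")
    return labels


# Hypothetical MACS2 scores from a single experiment.
print(classify_binding_levels([50, 60, 100, 140, 150]))
```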

For two SRXs (SRX_1 and SRX_2), CoLo evaluates similarity across all nine combinations:

[H / M / L of SRX_1] × [H / M / L of SRX_2]

Each combination yields a Boolean result (similar or not), resulting in a total of nine Boolean similarity indicators.


Methods

  1. Peak-call data: Same as Section 5.1.

  2. STRING library: Same as Section 5.3.

  3. Similarity scoring: Similarity scores were calculated by multiplying the binding-level weights assigned to each combination:

    SRX_1 SRX_2 Score
    H H 9
    H M 6
    H L 3
    M H 6
    M M 4
    M L 2
    L H 3
    L M 2
    L L 1

    If multiple H/M/L combinations were observed between SRX_1 and SRX_2, the highest score was adopted.

    • Scores from 1 to 9 are visualized using a color gradient from blue to green to red.

    • If all nine combinations were false, the result is shown in gray.

    • The Average column at the far left represents the mean CoLo score for each protein.

    • The STRING column at the far right represents STRING protein–protein interaction scores.

    • Protein–protein interactions were extracted from protein.actions.v10.txt.gz when all of the following conditions were satisfied:

      • Column 1 (item_id_a) == query antigen
      • Column 2 (item_id_b) == co-association partner
      • Column 3 (mode) == "binding"
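
The scoring rule in step 3 can be sketched as below. The weights H = 3, M = 2, and L = 1 are inferred from the score table (for example, H × M = 6) rather than stated explicitly, and the function name and input layout are hypothetical.

```python
# Binding-level weights inferred from the score table in step 3.
WEIGHTS = {"H": 3, "M": 2, "L": 1}


def colo_score(similar):
    """Return the highest weight product among similar H/M/L combinations.

    similar: dict mapping (level_of_SRX_1, level_of_SRX_2) to a Boolean.
    Returns None when no combination is similar (displayed in gray).
    """
    products = [
        WEIGHTS[a] * WEIGHTS[b]
        for (a, b), is_similar in similar.items()
        if is_similar
    ]
    return max(products) if products else None


observed = {("H", "M"): True, ("M", "M"): True, ("L", "L"): False}
print(colo_score(observed))             # highest of 6 and 4
print(colo_score({("H", "H"): False}))  # no similar combination
```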

7. Enrichment Analysis

Introduction

ChIP-Atlas Enrichment Analysis accepts users’ data in the following formats:

  • Genomic regions (BED) to search for features enriched in the regions
  • Gene list (gene symbols or IDs) to search for features enriched around the genes
  • Gene count table (CSV or TSV) to search for features showing concordant differences between two biological states

In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:

| Data in panel 4 | Data in panel 5 | Aims and analyses |
|---|---|---|
| BED | Random permutation | Features overlapping with BED intervals more often than by chance |
| BED | BED | Features differentially overlapping between the two sets of BED intervals |
| Gene list | RefSeq coding genes | Features overlapping with genes more often than other RefSeq genes |
| Gene list | Gene list | Features differentially overlapping between the two sets of gene lists |
| Gene count table | Not required | Features showing concordant differences between two biological states |

Requirements and acceptable data

Reference peak-call data (panels 1–3)

Reference peak-call data specified in the upper panels (1 to 3) of the submission form consist of comprehensive peak-call data described in Section 4.

The result will be returned more quickly if antigen classes and cell-type classes are specified.

Genomic regions (BED; panels 4–5)

Submitted BED files must follow the UCSC BED format and minimally contain three tab-delimited columns describing chromosome, start, and end positions:

chr1<tab>1435385<tab>1436458
chrX<tab>4634643<tab>4635798

Header lines and columns beyond column 3 may be included but are ignored.
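
Before submission, a BED file can be checked locally against these requirements. The following is a minimal sketch; the function name `read_bed3` and the header prefixes it skips (`#`, `track`, `browser`) are assumptions based on common UCSC conventions, not a description of how the server parses files.

```python
def read_bed3(lines):
    """Yield (chrom, start, end) from BED lines, skipping headers and extra columns."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines (assumed UCSC-style prefixes).
        if not line or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"fewer than three tab-delimited columns: {line!r}")
        yield cols[0], int(cols[1]), int(cols[2])


bed_lines = [
    "track name=example",
    "chr1\t1435385\t1436458\tpeak_1",
    "chrX\t4634643\t4635798",
]
print(list(read_bed3(bed_lines)))
```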

Only BED files using the following genome assemblies are supported. BED files in any other genome assemblies MUST be converted using the UCSC liftOver tool prior to submission.

  • hg38, hg19 (H. sapiens)
  • mm10, mm9 (M. musculus)
  • rn6 (R. norvegicus)
  • dm6, dm3 (D. melanogaster)
  • ce11, ce10 (C. elegans)
  • sacCer3 (S. cerevisiae)

Gene list (panels 4–5)

Gene lists may be provided using official gene symbols or supported identifiers.

If gene lists are described using other formats, batch conversion tools such as DAVID should be used to convert them into official gene symbols or supported IDs.

  • Official gene symbols must follow standardized nomenclatures:

    Examples: OCT3/4 → POU5F1, p53 → TP53

  • In addition to official gene symbols, the following identifiers are also acceptable:

    • Ensembl IDs (e.g., ENSG00000204531)
    • UniProt IDs (e.g., Q01860)
    • RefSeq IDs (e.g., NM_002701)

Gene count table (panel 4)

An integer-valued gene count table obtained from RNA-seq experiments (CSV or TSV) with a header is required.

  • The first column of the table contains gene identifiers.

  • Remaining columns represent samples; sample names must include replicate numbers appended with an underscore (e.g., wt_1, wt_2).

    Example (CSV):

    Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3
    DDX11L1,8,7,5,12,8,13
    WASH7P,1512,985,1236,2342,1600,2075
    FAM138A,0,0,0,0,0,3
    OR4F5,0,0,0,0,0,0
    LOC100996442,279,208,234,402,285,370
    :
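
The sample-naming rule can be checked with a short script before uploading a table. This is a minimal sketch under the assumption that the part of each sample name before the final underscore identifies the biological state; the function name `group_samples` is hypothetical, and the rows reuse the example above.

```python
import csv
import io
from collections import defaultdict


def group_samples(header):
    """Map each condition to its sample columns, e.g. 'wt' -> ['wt_1', 'wt_2']."""
    groups = defaultdict(list)
    for name in header[1:]:  # column 1 holds gene identifiers
        condition, _, replicate = name.rpartition("_")
        if not condition or not replicate.isdigit():
            raise ValueError(f"sample name lacks an _<replicate> suffix: {name!r}")
        groups[condition].append(name)
    return dict(groups)


table = """Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3
DDX11L1,8,7,5,12,8,13
WASH7P,1512,985,1236,2342,1600,2075
"""
rows = list(csv.reader(io.StringIO(table)))
groups = group_samples(rows[0])
counts = {row[0]: [int(v) for v in row[1:]] for row in rows[1:]}
print(groups)
print(counts["WASH7P"])
```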
    

Methods

A. Genomic regions and gene lists

  1. Conversion to BED format

    Submitted data are converted to BED format depending on the data type:

    • BED

      Submitted BED files are used directly for downstream processing. If Random permutation is selected, BED intervals are randomly permuted across chromosomes using bedtools shuffle (bedtools; ver 2.17.0).

    • Gene list

      Unique TSSs of submitted genes are defined using xxxCanonical.txt.gz libraries distributed from the UCSC FTP site, where xxx denotes:

      • known (H. sapiens and M. musculus)
      • flyBase (D. melanogaster)
      • sanger (C. elegans)
      • sgd (S. cerevisiae)

      Unique TSSs of Rattus norvegicus genes are defined using gene lists distributed by RGD.

      TSS coordinates are converted to BED format with widths specified by the Distance range from TSS parameter on the submission form. When RefSeq coding genes are selected as background, RefSeq coding genes excluding submitted genes are processed in the same manner.

  2. Overlap counting

    Overlaps between BED intervals derived from panels 4–5 and reference peak-call data specified in panels 1–3 are counted using bedtools intersect (bedtools; ver 2.23.0).

  3. Statistical testing

    Two-tailed Fisher’s exact probability tests are performed (see example).

    The null hypothesis assumes that the proportion of reference peaks overlapping submitted data in panel 4 is equal to that overlapping data in panel 5.

    Q-values are calculated using the Benjamini–Hochberg procedure.

  4. Fold enrichment

    Fold enrichment is calculated as:

    Fold enrichment = Column 6 / Column 7
    

    If the ratio exceeds 1, the feature is considered preferentially associated with the data in panel 4.
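
The statistical step can be illustrated with a self-contained sketch of a two-tailed Fisher's exact test and the Benjamini–Hochberg correction. The layout of the 2 × 2 table (overlapping and non-overlapping counts for panel 4 in the first row and for panel 5 in the second) and the numbers used are assumptions for illustration; the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):  # hypergeometric probability of k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg Q-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical counts: 3 of 4 intervals overlap in panel 4, 1 of 4 in panel 5.
print(fisher_exact_two_tailed(3, 1, 1, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```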

B. Gene count table

  1. Upon receiving a two-group integer-valued gene count table, log2 fold changes (log2FC) for all genes between the two experimental groups are estimated using the DESeq2 package in R.

  2. Gene identifiers provided in column 1 as Ensembl, UniProt, or RefSeq IDs are mapped to official gene symbols, and duplicate symbols are merged by summing their counts.

  3. For each ChIP-seq, ATAC-seq, DNase-seq, or Bisulfite-seq experiment, overlaps between peak regions and gene loci within the user-defined TSS window are assessed, thereby constructing experiment-specific target gene sets.

  4. In accordance with the original PAGE framework, Z-scores are calculated as:

    Z = (S_m − μ) × √m / δ
    

    where μ and δ represent the mean and standard deviation of genome-wide log2FC values, respectively, and S_m denotes the mean log2FC of a target gene set of size m.

    Two-tailed P-values are derived from the Z-scores, followed by multiple-testing correction using the Benjamini–Hochberg procedure to obtain Q-values.
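
The Z-score formula can be worked through on a toy example. This is a minimal sketch, assuming δ is the population standard deviation and that the P-value comes from the standard normal distribution; the function name and the log2FC values are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, pstdev


def page_z_score(all_log2fc, target_log2fc):
    """Z = (S_m - mu) * sqrt(m) / delta, with a two-tailed normal P-value."""
    mu = mean(all_log2fc)
    delta = pstdev(all_log2fc)  # population SD; the exact estimator is an assumption
    m = len(target_log2fc)
    s_m = mean(target_log2fc)
    z = (s_m - mu) * sqrt(m) / delta
    p_value = erfc(abs(z) / sqrt(2))  # two-tailed P under the standard normal
    return z, p_value


genome_wide = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical genome-wide log2FC values
target_set = [1.0, 2.0]                    # log2FC of a hypothetical target gene set
z, p = page_z_score(genome_wide, target_set)
print(round(z, 3), round(p, 4))
```

Here μ = 0, δ = √2, S_m = 1.5, and m = 2, so Z = 1.5 × √2 / √2 = 1.5.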

API of Enrichment Analysis

An API is available to perform Enrichment Analysis programmatically. Please see here for details.


8. Diff Analysis

Introduction

ChIP-Atlas Diff Analysis is a feature that identifies differential peak regions (DPRs) or differentially methylated regions (DMRs) from two sets of queried ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.

Requirements and acceptable data

Experiment type (panel 1)

  • ChIP/ATAC/DNase-seq: Detection of DPRs
  • Bisulfite-seq: Detection of DMRs

Experiment IDs (panels 2–3)

Accepted identifiers include:

  • Experiment IDs from NCBI, ENA, or DDBJ (e.g., SRX18419259, ERX1103210, DRX335588)
  • GEO accessions from NCBI GEO (e.g., GSM6765200)
  • Publicly accessible URLs pointing to user-hosted datasets

URL format for ChIP/ATAC/DNase-seq

Each line must include:

  • BigWig file (integer-valued read coverage)
  • BED file (peak-call data)
  • Total number of mapped reads

Example (hg38):

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed	205201674
https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed	208332830
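
Lines in this format can be validated locally before submission. The sketch below assumes the three fields are separated by tab characters, as in the example above; the function name `parse_diff_input` is hypothetical.

```python
def parse_diff_input(line):
    """Split one tab-delimited line into (bigwig_url, bed_url, mapped_reads)."""
    fields = line.strip().split("\t")
    if len(fields) != 3:
        raise ValueError("expected BigWig URL, BED URL, and mapped read count")
    bigwig_url, bed_url, mapped_reads = fields
    return bigwig_url, bed_url, int(mapped_reads)


base = "https://chip-atlas.dbcls.jp/data/manual/examples"
line = f"{base}/sample_A1.bw\t{base}/sample_A1.bed\t205201674"
print(parse_diff_input(line))
```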

URL format for Bisulfite-seq

BigWig files describing methylation rates (0–1):

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw

The Dataset Search tool is useful for finding experiment IDs of interest.

Methods

Detection of differential peak regions (DPRs)

  1. BigWig and BED files recorded in ChIP-Atlas are identified based on submitted IDs.
  2. BigWig files are converted to bedGraph format.
  3. Raw read coverage is reconstructed using total mapped read counts.
  4. The genome is segmented based on peak regions from the query datasets.
  5. Read counts are aggregated into an m × n matrix (m: regions, n: experiments).
  6. Differential analysis is performed using the edgeR package in R.
  7. Results are summarized in BED format containing genomic coordinates and statistics.

This approach is conceptually related to DiffBind but does not require BAM files.
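
Step 3 can be understood as inverting the RPM scaling described in Section 2. The following is a minimal sketch under that assumption; the function name and the rounding to whole reads are illustrative choices, not confirmed details of the implementation.

```python
def rpm_to_raw_coverage(rpm_value: float, total_mapped_reads: int) -> int:
    """Invert RPM scaling: raw coverage = RPM * N / 1,000,000 (rounded)."""
    return round(rpm_value * total_mapped_reads / 1_000_000)


# Hypothetical values: an RPM of 0.25 in a library of 8 million mapped reads.
print(rpm_to_raw_coverage(0.25, 8_000_000))
```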

Detection of differentially methylated regions (DMRs)

  1. BigWig methylation data are converted to bedGraph format.
  2. Methylation levels are aggregated using metilene_input.pl from the metilene package (PMID: 26631489).
  3. DMRs are detected using the metilene command with default parameters.
  4. Results are reported in BED format with statistical annotations.

Output files

Results are returned as a ZIP archive containing:

  • .igv.xml: IGV session file
  • .log: Analysis log
  • .bed: DPRs or DMRs in BED9 format
  • .igv.bed: BED9 + GFF3 format for IGV visualization

API of Diff Analysis

An API is available for programmatic execution. See here for details.

9. Experiment Comparative Profile

Experiment Comparative Profile is an experiment-level quality-control panel located at the bottom of each detailed experiment page (e.g., https://chip-atlas.org/view?id=SRX018625). This panel provides quantitative quality metrics for individual experiments by contextualizing each dataset relative to all other experiments of the same assay type. The panel consists of two components: Read and Peak Distribution and Correlation-Based Clustering.

Read and Peak Distribution

For each experiment type, sequencing read counts and the number of detected peaks were summarized across all experiments. For ChIP-seq experiments, datasets were further subcategorized by antigen class, including histone marks, TFs and others, RNA polymerase, and input controls.

For ChIP-seq, ATAC-seq, and DNase-seq experiments, peak counts were calculated using peaks with MACS2 scores > 50 (i.e., MACS2 Q-value < 1e-05). For Bisulfite-seq experiments, “peaks” corresponded to hypermethylated regions identified using MethPipe.

These distributions were visualized as violin plots overlaid with box plots. The position of each individual experiment within its corresponding distribution is indicated by an orange horizontal line, allowing users to assess sequencing depth and signal yield in a cohort-level context.

Correlation-Based Clustering

For experiments sharing the same biological context—defined by the same genome assembly and cell type, as well as the same antigen in the case of ChIP-seq—pairwise Pearson correlations of BigWig signal profiles were computed and visualized using deepTools.

Briefly, BigWig files were segmented into 10-kb genomic windows, and signal intensities were summarized across bins using the multiBigwigSummary subcommand, for example:

multiBigwigSummary bins -b srx1.bw srx2.bw srx3.bw -o results.npz

The resulting matrix was subsequently used as input for plotCorrelation to calculate correlation coefficients, generate heatmaps, and perform hierarchical clustering, for example:

plotCorrelation -in results.npz -c pearson -p heatmap -o plot.png --outFileCorMatrix cor_matrix.tab

Within the heatmap, arrowheads indicate the selected experiment, and their colors represent the median correlation coefficient with other experiments belonging to the same cluster.

10. Downloads

Data for each SRX

All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema).

BigWig (Download URL)

  • ChIP-seq, ATAC-seq, and DNase-seq

    https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bw/[Experimental_ID].bw
    
  • Bisulfite-seq

    • Methylation rate:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/methyl/[Experimental_ID].methyl.bw
      
    • Coverage:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/cover/[Experimental_ID].cover.bw
      
  • Example

    • ChIP-seq:
      https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw
      
    • Bisulfite-seq (Methylation rate):
      https://chip-atlas.dbcls.jp/data/hg38/eachData/bs/methyl/SRX1651655.methyl.bw
      

Peak-call (BED) (Download URL)

  • ChIP-seq, ATAC-seq, and DNase-seq

    https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bed[Threshold]/[Experimental_ID].[Threshold].bed
    

    Threshold = 05, 10, or 20

  • Bisulfite-seq

    • Hypo MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/Bed/[Experimental_ID].hmr.bed
      
    • Partial MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/Bed/[Experimental_ID].pmd.bed
      
    • Hyper MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/Bed/[Experimental_ID].hypermr.bed
      
  • Example

    • ChIP-seq:

      Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

      https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed
      
    • Bisulfite-seq (Hypo MR):

      Hypo-methylated region data of SRX1651655.

      https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/Bed/SRX1651655.hmr.bed
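
The URL patterns for per-experiment BigWig and peak-call BED files can be assembled programmatically. This is a minimal sketch based only on the patterns listed in this section; the function names are hypothetical.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def bigwig_url(genome: str, experiment_id: str) -> str:
    """BigWig coverage URL for a ChIP-seq, ATAC-seq, or DNase-seq experiment."""
    return f"{BASE_URL}/{genome}/eachData/bw/{experiment_id}.bw"


def peak_bed_url(genome: str, experiment_id: str, threshold: str) -> str:
    """Peak-call BED URL; threshold is '05', '10', or '20'."""
    if threshold not in {"05", "10", "20"}:
        raise ValueError("threshold must be '05', '10', or '20'")
    return f"{BASE_URL}/{genome}/eachData/bed{threshold}/{experiment_id}.{threshold}.bed"


print(bigwig_url("hg19", "SRX097088"))
print(peak_bed_url("hg19", "SRX097088", "05"))
```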
      

Peak-call (BigBed) (Download URL)

  • ChIP-seq, ATAC-seq, and DNase-seq

    https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bb[Threshold]/[Experimental_ID].[Threshold].bb
    

    Threshold = 05, 10, or 20

  • Bisulfite-seq

    • Hypo MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/BigBed/[Experimental_ID].hmr.bb
      
    • Partial MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/BigBed/[Experimental_ID].pmd.bb
      
    • Hyper MR:
      https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/BigBed/[Experimental_ID].hypermr.bb
      
  • Example

    • ChIP-seq:

      Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

      https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb
      
    • Bisulfite-seq (Hypo MR):

      Hypo-methylated region data of SRX1651655.

      https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/BigBed/SRX1651655.hmr.bb
      

Assembled Peak-call data used in “Peak Browser”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/assembled/[File_name].bed

Available Genome and File_name are listed in fileList.tab (Download, Table schema)

Example

All peak-call data of GATA2 in all cell types with Q-value < 1E-05.

https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed

Note

As the assembled peak-call data used in “Peak Browser” are extremely large, we recommend downloading the lighter versions of all peak-call data (see below) and joining SRXs with sample metadata described in experimentList.tab on a command-line interface.

Lighter version of all peak-call data

Genome Q < 1E-05 Q < 1E-10 Q < 1E-20 Q < 1E-50 WGBS
hg38 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
hg19 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
mm10 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
mm9 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
rn6 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
dm6 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
dm3 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
ce11 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
ce10 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎
sacCer3 ⬇︎ ⬇︎ ⬇︎ ⬇︎ ⬇︎

Q: MACS2 Q-value thresholds

Table schema of the lighter version of all peak-call data

Column Description Example
Column 1 Chromosome chr12
Column 2 Begin 1234
Column 3 End 5678
Column 4 SRX SRX344646
Column 5 -10 × log10(MACS2 Q-value) 345
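
Joining the lighter peak-call data with sample metadata, as recommended in the note above, can be sketched as follows. The column positions follow the experimentList.tab schema given in this section; the function names and the tiny in-memory inputs are hypothetical stand-ins for the downloaded files. Metadata are filtered by genome assembly because the same experiment ID may be listed for more than one assembly.

```python
import io


def load_experiment_metadata(handle, genome):
    """Map experiment ID -> (track type, cell type) for one genome assembly."""
    metadata = {}
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 6 and cols[1] == genome:
            metadata[cols[0]] = (cols[3], cols[5])  # columns 4 and 6
    return metadata


def annotate_peaks(peak_handle, metadata):
    """Yield peak rows extended with the track type and cell type of their SRX."""
    for line in peak_handle:
        chrom, start, end, srx, score = line.rstrip("\n").split("\t")[:5]
        track_type, cell_type = metadata.get(srx, ("NA", "NA"))
        yield chrom, int(start), int(end), srx, int(score), track_type, cell_type


# Tiny in-memory stand-ins for experimentList.tab and a lighter peak-call file.
experiment_list = io.StringIO(
    "SRX097088\thg19\tTFs and others\tGATA2\tBlood\tK-562\n"
)
peaks = io.StringIO("chr12\t1234\t5678\tSRX097088\t345\n")
metadata = load_experiment_metadata(experiment_list, "hg19")
annotated = list(annotate_peaks(peaks, metadata))
print(annotated)
```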

Analyzed data used in “Target Genes”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/target/[Protein].[Distance].tsv

Protein is listed in analysisList.tab (Download, Table schema)

Distance = 1, 5, or 10 (kb from TSS)

Example

https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv

Analyzed data used in “Colocalization”

Download URL

https://chip-atlas.dbcls.jp/data/[Genome]/colo/[Protein].[Cell_type_class].tsv

Protein and Cell_type_class are listed in analysisList.tab (Download, Table schema)

Example

https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv

Spaces in cell type class names must be replaced with underscores.
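
The Target Genes and Colocalization URL patterns, including the underscore rule, can be assembled as sketched below. The function names are hypothetical; the patterns are those listed in this section.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def target_genes_url(genome: str, protein: str, distance_kb: int) -> str:
    """URL of the Target Genes table; distance_kb is 1, 5, or 10."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance_kb must be 1, 5, or 10")
    return f"{BASE_URL}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome: str, protein: str, cell_type_class: str) -> str:
    """URL of the Colocalization table; spaces become underscores."""
    class_name = cell_type_class.replace(" ", "_")
    return f"{BASE_URL}/{genome}/colo/{protein}.{class_name}.tsv"


print(target_genes_url("hg19", "POU5F1", 5))
print(colocalization_url("hg19", "POU5F1", "Pluripotent stem cell"))
```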

Tables summarizing metadata and files

experimentList.tab

All experiments recorded in ChIP-Atlas.

Column Description Example
1 Experimental ID SRX097088
2 Genome assembly hg19
3 Track type class TFs and others
4 Track type GATA2
5 Cell type class Blood
6 Cell type K-562
7 Cell type description Primary Tissue=Blood|Tissue Diagnosis=Leukemia
8 Processing logs (ChIP/ATAC/DNase-seq) 30180878,82.3,42.1,6691
8 Processing logs (Bisulfite-seq) 132179672,88.1,3.4,311292
9 Title GSM722415: GATA2 K562
10- Metadata source_name=GATA2 ChIP-seq K562

fileList.tab

All assembled peak-call data used in Peak Browser.

Column Description Example
1 File name Oth.ALL.05.GATA2.AllCell
2 Genome assembly hg19
3 Track type class TFs and others
4 Track type GATA2
5 Cell type class All cell types
6 Cell type -
7 Threshold 05
8 Experimental IDs included SRX070877,SRX150427,...

analysisList.tab

Column Description Example
1 Antigen POU5F1
2 Cell type class in Colocalization Epidermis, Pluripotent stem cell
3 Recorded in Target Genes +
4 Genome assembly hg19

antigenList.tab

Column Description Example
1 Genome assembly hg19
2 Track type class TFs and others
3 Track type POU5F1
4 Number of experiments 24
5 Experimental IDs included SRX011571,...

celltypeList.tab

Column Description Example
1 Genome assembly hg19
2 Cell type class Prostate
3 Cell type VCaP
4 Number of experiments 185
5 Experimental IDs included SRX020917,...

11. External Genome Browser

BigBed and BigWig files in the ChIP-Atlas database can be browsed on the UCSC Genome Browser. Use the links below to jump to the UCSC Genome Browser.

Currently, the track hub feature is provided only for files of individual experiments; support for browsing files assembled by antigen and cell type is in progress. See Using UCSC Genome Browser Track Hubs for more details.
