Home
Documents for computational processing in ChIP-Atlas.
1. Data source
2. Primary processing
3. Data Annotation
4. Peak Browser
5. Target Genes
6. Colocalization
7. Enrichment Analysis
8. Diff Analysis
9. Experiment Comparative Profile
10. Downloads
11. External Genome Browser
Currently, most academic journals require authors of studies that include high-throughput sequencing to submit their raw sequence data as SRAs (Sequence Read Archives) to public repositories (NCBI, DDBJ, or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). ChIP-Atlas refers to the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site) and uses SRXs that meet the following criteria:
- `LIBRARY_STRATEGY` == `ChIP-Seq`, `ATAC-Seq`, `DNase-Hypersensitivity`, or `Bisulfite-Seq`
- `LIBRARY_SOURCE` == `GENOMIC`
- `taxonomy_name` == `Homo sapiens`, `Mus musculus`, `Rattus norvegicus`, `Caenorhabditis elegans`, `Drosophila melanogaster`, or `Saccharomyces cerevisiae`
- `INSTRUMENT_MODEL` ~ `Illumina`, `NextSeq`, or `HiSeq`
Raw sequence data from SRXs as shown above were aligned to reference genomes with Bowtie2 before being analyzed for coverage in BigWig format and peak-calls in BED format.
1. Binarized raw sequence data (`.sra`) for each SRX were downloaded and decoded into Fastq format with the `fastq-dump` command of SRA Toolkit (ver 2.3.2-4) with default parameters, except for paired-end reads, which were decoded with the `--split-files` option. In an SRX including multiple runs, decoded Fastq files were concatenated into a single file.
2. Fastq files were then aligned with Bowtie 2 (ver 2.2.2) using default parameters, except for paired-end reads, for which two Fastq files were specified with the `-1` and `-2` options. The following genome assemblies were used for alignment and subsequent processing:
   - hg38, hg19 (H. sapiens)
   - mm10, mm9 (M. musculus)
   - rn6 (R. norvegicus)
   - dm6, dm3 (D. melanogaster)
   - ce11, ce10 (C. elegans)
   - sacCer3 (S. cerevisiae)
3. Resultant SAM-formatted files were converted into BAM format with SAMtools (ver 0.1.19; `samtools view`) and sorted (`samtools sort`) before removing PCR duplicates (`samtools rmdup`).
4. BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; `genomeCoverageBed`) in RPM (Reads Per Million mapped reads) units with the `-scale 1000000/N` option, where N is the number of mapped reads after removing PCR duplicates.
5. BedGraph files were converted into BigWig format with the UCSC `bedGraphToBigWig` tool (ver 4).
6. BAM files generated in step (3) were used for peak calling with MACS2 (ver 2.1.0; `macs2 callpeak`) in BED4 format. Q-value thresholds were set to `1e-05`, `1e-10`, or `1e-20`, with genome size parameters specified as follows:
   - hg38, hg19: `-g hs`
   - mm10, mm9: `-g mm`
   - rn6: `-g 2.15e9`
   - dm6, dm3: `-g dm`
   - ce11, ce10: `-g ce`
   - sacCer3: `-g 12100000`

   Each row in the BED4 files includes genomic coordinates in columns 1–3 and the MACS2 score (−10 × log10[MACS2 Q-value]) in column 4.
7. BED4 files were converted into BigBed format with the UCSC `bedToBigBed` tool (ver 2.5).
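The two numeric conventions used in the steps above, the RPM scaling factor and the MACS2 score, can be illustrated with a minimal Python sketch. The function names and the read count are hypothetical and are not part of the ChIP-Atlas pipeline; only the formulas `1000000/N` and −10 × log10(Q-value) come from the text.

```python
import math


def rpm_scale_factor(mapped_reads: int) -> float:
    """Return the value described for `-scale`: 1,000,000 / N."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def macs2_score(q_value: float) -> float:
    """Return -10 * log10(Q-value), the score stored in column 4 of BED4."""
    if not 0 < q_value <= 1:
        raise ValueError("q_value must be in (0, 1]")
    return -10 * math.log10(q_value)


# Hypothetical experiment with 20 million mapped, de-duplicated reads.
scale = rpm_scale_factor(20_000_000)

# Scores corresponding to the three Q-value thresholds named above.
threshold_scores = {q: macs2_score(q) for q in (1e-05, 1e-10, 1e-20)}
```

Under this definition the three thresholds correspond to scores of about 50, 100, and 200.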
Experimental materials used for each SRX were manually annotated to allow extraction of data using keywords for track types and cell types.
- Sample metadata for all SRXs (`biosample_set.xml`) were downloaded from the NCBI FTP site to extract attributes for antigens and antibodies (see here) as well as cell types and tissues (see here).
- According to the attribute values assigned to each SRX, antigens and cell types were manually annotated by curators trained in molecular and developmental biology. Each annotation is assigned a ‘Class’ and ‘Subclass’ as described in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).
- Guidelines for antigen annotation:
  - Histones: Based on the Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac).
  - Gene-encoded proteins:
    - Gene symbols were recorded according to the following gene nomenclature databases (e.g., OCT3/4 → POU5F1; p53 → TP53):
    - Modifications such as phosphorylation were ignored (e.g., phospho-SMAD3 → SMAD3).
    - If an antibody recognizes multiple molecules within a family, the first in ascending order was chosen (e.g., Anti-SMAD2/3 antibody → SMAD2).
- Criteria for cell type annotation:
  - H. sapiens, M. musculus, and R. norvegicus: Cell types were mainly classified by tissue of origin. As an exception, ES and iPS cells were classified under the ‘Pluripotent stem cell’ class.

    | Cell-type class | Cell type |
    |---|---|
    | Blood | K-562; CD4-Positive T-Lymphocytes |
    | Breast | MCF-7; T-47D |
    | Pluripotent stem cell | hESC H1; iPS cells |

  - D. melanogaster: Cell types were mainly classified by cell lines and developmental stages.
  - C. elegans: Mainly classified by developmental stages.
  - S. cerevisiae: Classified by yeast strains.
  - Standardized nomenclatures: Nomenclatures of cell lines and tissue names were standardized according to the following resources (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231):
    - Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200) → Proposed unified cell-line names
    - ATCC → A nonprofit repository providing standardized cell line information
    - MeSH (Medical Subject Headings) → Controlled vocabulary for tissue and anatomical terms
    - FlyBase → Authoritative resource for D. melanogaster cell lines
- Antigens or cell types were classified into the ‘Uncategorized’ class if curators could not interpret attribute values.
- Antigens or cell types were classified into the ‘No description’ class if no attribute values were provided.
ChIP-Atlas Peak Browser allows users to browse multiple ChIP-seq peak-call datasets, including transcription factors (TFs) and histone modifications, as well as ATAC-seq, DNase-seq, and Bisulfite-seq data on the genome browser IGV. This functionality facilitates the identification of cis-regulatory elements, regulatory proteins, and epigenetic states of genomic regions of interest.
BED4-formatted peak-call data generated in Section 2 were concatenated and converted into BED9 + GFF3-compatible format for visualization on IGV. The resulting BED9 files are available for download from the Peak Browser web page.
| Column | Description | Example |
|---|---|---|
| Header | Track name and link URL | (Strings) |
| Column 1 | Chromosome | chr12 |
| Column 2 | Begin | 1234 |
| Column 3 | End | 5678 |
| Column 4* | Sample metadata | (Strings) |
| Column 5 | –10 × log10(MACS2 Q-value) | 345 |
| Column 6 | . | . |
| Column 7 | Begin (= Column 2) | 1234 |
| Column 8 | End (= Column 3) | 5678 |
| Column 9** | Color code | 255,61,0 |
- \* Column 4: Sample metadata are described in GFF3 attribute format, enabling IGV to display annotated antigens and cell types. When hovering over a peak, IGV shows the accession number, experiment title, and all attribute values provided in the Biosample metadata for the corresponding SRX.
- \*\* Column 9: Heatmap color codes represent the MACS2 score in Column 5. MACS2 scores of 0, 500, and 1000 correspond to blue, green, and red, respectively.
To find the URLs of the BED9 files, see Assembled Peak-call data used in “Peak Browser” (here) in Section 10.
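As a rough illustration of the column layout in the table above, the following Python sketch parses one BED9 row and splits the GFF3-style attributes in column 4. The attribute keys (`ID`, `Name`), the percent-encoding of values, and the sample row are assumptions made for this example; the actual keys and encoding used in ChIP-Atlas files should be checked against a downloaded file.

```python
from urllib.parse import unquote


def parse_peak_browser_row(line: str) -> dict:
    """Split one tab-delimited BED9 row into typed fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected at least 9 tab-delimited columns")

    # Column 4: GFF3-style attributes, assumed here to be
    # 'key=value' pairs separated by ';' with percent-encoded values.
    attributes = {}
    for pair in cols[3].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)

    return {
        "chrom": cols[0],
        "start": int(cols[1]),
        "end": int(cols[2]),
        "attributes": attributes,
        "score": float(cols[4]),  # -10 * log10(MACS2 Q-value)
        "rgb": tuple(int(c) for c in cols[8].split(",")),
    }


# Made-up row that follows the column layout of the table above.
row = (
    "chr12\t1234\t5678\tID=SRX000000;Name=GATA2%20(@%20K-562)"
    "\t345\t.\t1234\t5678\t255,61,0"
)
record = parse_peak_browser_row(row)
```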
In addition to experimental peak-call tracks, users can overlay Annotation Tracks in the Peak Browser to visualize functional genomic annotations within regions of interest.
Available annotation tracks are summarized below:
| Genome | hg38 | hg19 | mm10 | mm9 | rn6 | dm6 | dm3 | ce11 | ce10 | sacCer3 |
|---|---|---|---|---|---|---|---|---|---|---|
| ENCODE Hi-C | ○ | ○ | ○ | ○ | ||||||
| GTEx eQTL | ○ | ○ | ||||||||
| ChromHMM | ○ | ○ | ○ | ○ | ||||||
| CAGE | ○ | ○ | ○ | ○ | ||||||
| FANTOM5 enhancers | ○ | ○ | ||||||||
| JASPAR TF motifs | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |
| GWAS Catalog | ○ | ○ | ||||||||
| ClinVar | ○ | ○ | ||||||||
| Orphanet | ○ | ○ | ||||||||
| MGI Phenotype | ○ | ○ | ||||||||
| PhastCons | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| RepeatMasker | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |
| RNA-seq1,2,3,4 | ○ | ○ | ○ | ○ | ○ | ○ | ||||
| Ensembl genes | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| GENCODE genes | ○ | ○ | ○ | ○ | ||||||
| ENCODE Blacklist | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||
| CpG Islands | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
The ChIP-Atlas Target Genes feature predicts genes directly regulated by a given protein, based on binding profiles of all public ChIP-seq data around gene loci. Target genes are defined as those whose transcription start sites (TSSs) overlap with peak-call intervals of the queried protein within a window of ± N kb (N = 1, 5, or 10).
1. Peak-call data

   BED4-formatted peak-call data for each SRX generated in Section 2 were used (MACS2 Q-value < 1e-05; antigen class = ‘TFs and others’).
2. Preparation of TSS library

   Locations of TSSs and corresponding gene symbols were obtained from refFlat files distributed via the UCSC FTP site. Only protein-coding genes were used for this analysis.
3. Preparation of STRING library

   Protein–gene interaction data (`protein.actions.v10.txt.gz`) were downloaded from the STRING database. Protein identifiers were converted to gene symbols using `protein.aliases.v10.txt.gz` from the same source.
4. Processing
   - The `bedtools window` command (bedtools, ver 2.17.0) was used to identify genes whose TSSs overlapped with peak-call intervals within windows of ± 1 kb, ± 5 kb, or ± 10 kb, using the `-w` option (`-w 1000`, `-w 5000`, or `-w 10000`, respectively).
   - Peak-call data derived from the same antigen were aggregated.
   - MACS2 scores (−10 × log10[MACS2 Q-value]) were visualized as heatmap colors (MACS2 score = 0, 500, 1000 → blue, green, red).
   - If multiple peaks from a single SRX overlapped the same gene, the highest MACS2 score was selected.
   - The Average column at the far left of the result table represents the mean MACS2 score for each gene.
   - The STRING column at the far right represents STRING interaction scores between the protein and the target gene.
   - Protein–gene interactions were extracted from `protein.actions.v10.txt.gz` when all of the following conditions were satisfied:
     - Column 1 (`item_id_a`) == query antigen
     - Column 2 (`item_id_b`) == target gene
     - Column 3 (`mode`) == `"expression"`
     - Column 5 (`a_is_acting`) == `"1"`
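The window overlap and the highest-score rule described above can be sketched in plain Python. This is an illustrative approximation of what `bedtools window` reports, not the pipeline itself: the function name, the treatment of a TSS as a 1-bp interval, and all coordinates and gene names are assumptions.

```python
def target_gene_scores(peaks, tss_sites, window_bp):
    """Return {gene: highest peak score within +/- window_bp of its TSS}.

    peaks:     iterable of (chrom, start, end, score) for one SRX,
               with 0-based half-open coordinates as in BED.
    tss_sites: iterable of (chrom, tss_position, gene_symbol).
    """
    best = {}
    for chrom, tss, gene in tss_sites:
        # Treat the TSS as a 1-bp interval extended by the window.
        window_start = tss - window_bp
        window_end = tss + 1 + window_bp
        for peak_chrom, start, end, score in peaks:
            if peak_chrom != chrom:
                continue
            if start < window_end and end > window_start:
                # Keep only the highest score per gene for this SRX.
                if gene not in best or score > best[gene]:
                    best[gene] = score
    return best


# Hypothetical peaks and TSSs.
peaks = [
    ("chr1", 900, 1100, 120),
    ("chr1", 5500, 5700, 300),
    ("chr2", 100, 200, 80),
]
tss_sites = [("chr1", 1000, "GENE_A"), ("chr1", 20000, "GENE_B")]

within_1kb = target_gene_scores(peaks, tss_sites, 1000)
within_5kb = target_gene_scores(peaks, tss_sites, 5000)
```

With these made-up inputs, widening the window from 1 kb to 5 kb brings the stronger second peak into range for `GENE_A`, while `GENE_B` has no peak in either window.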
Many TFs form complexes that cooperatively regulate gene expression (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). Such TFs often exhibit highly similar ChIP-seq binding profiles across the genome.
The ChIP-Atlas Colocalization feature predicts potential co-association partners of a given TF by evaluating similarities among all public ChIP-seq datasets using a dedicated algorithm termed CoLo.
BED4-formatted peak-call data generated in Section 2 were analyzed to assess pairwise similarities among experiments within the same cell-type class.
CoLo has two main advantages:
- (a) Compensation for biases arising from differences in experimental conditions
- (b) Adjustment for differences in peak numbers and genomic distributions intrinsic to individual TFs
To achieve (a), MACS2 scores within each BED4 file were fitted to a Gaussian distribution and classified into three binding-level groups:
- H (High binding): Z-score > 0.5
- M (Middle binding): −0.5 ≤ Z-score ≤ 0.5
- L (Low binding): Z-score < −0.5
These groups are treated as independent strata when evaluating similarity (b).
For two SRXs (SRX1 and SRX2), CoLo evaluates similarity across all nine combinations:
[H / M / L of SRX_1] × [H / M / L of SRX_2]
Each combination yields a Boolean result (similar or not), resulting in a total of nine Boolean similarity indicators.
- Peak-call data: Same as Section 5.1.
- STRING library: Same as Section 5.3.
- Similarity scoring: Similarity scores were calculated by multiplying the binding-level weights assigned to each combination:

  | SRX_1 | SRX_2 | Score |
  |---|---|---|
  | H | H | 9 |
  | H | M | 6 |
  | H | L | 3 |
  | M | H | 6 |
  | M | M | 4 |
  | M | L | 2 |
  | L | H | 3 |
  | L | M | 2 |
  | L | L | 1 |

  If multiple H/M/L combinations were observed between SRX_1 and SRX_2, the highest score was adopted.
- Scores from 1 to 9 are visualized using a color gradient from blue to green to red.
- If all nine combinations were false, the result is shown in gray.
- The Average column at the far left represents the mean CoLo score for each protein.
- The STRING column at the far right represents STRING protein–protein interaction scores.
- Protein–protein interactions were extracted from `protein.actions.v10.txt.gz` when all of the following conditions were satisfied:
  - Column 1 (`item_id_a`) == query antigen
  - Column 2 (`item_id_b`) == co-association partner
  - Column 3 (`mode`) == `"binding"`
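The binding-level stratification and the score table can be sketched as follows. This is not the CoLo implementation: the weights H = 3, M = 2, L = 1 are inferred from the score table (each score is the product of two weights), the Z-scores are computed from the plain mean and population standard deviation as a stand-in for the Gaussian fit, and the test that decides whether a combination is "similar" is not described here, so the sketch takes those Boolean results as input.

```python
import statistics

# Weights inferred from the score table (e.g. H x M = 3 x 2 = 6).
WEIGHT = {"H": 3, "M": 2, "L": 1}


def binding_levels(scores):
    """Assign H / M / L to each MACS2 score using Z-score cut-offs of +/-0.5."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    levels = []
    for score in scores:
        z = (score - mean) / sd if sd > 0 else 0.0
        if z > 0.5:
            levels.append("H")
        elif z < -0.5:
            levels.append("L")
        else:
            levels.append("M")
    return levels


def colo_score(similar_pairs):
    """Return the highest score among similar (level_1, level_2) pairs.

    Returns None when no combination is similar (shown in gray).
    """
    if not similar_pairs:
        return None
    return max(WEIGHT[a] * WEIGHT[b] for a, b in similar_pairs)


levels = binding_levels([50, 100, 150, 200, 250])   # hypothetical scores
score = colo_score({("H", "M"), ("L", "L")})        # hypothetical results
```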
-
ChIP-Atlas Enrichment Analysis accepts users’ data in the following formats:
- Genomic regions (BED) to search for features enriched in the regions
- Gene list (gene symbols or IDs) to search for features enriched in the genes
- Gene count table (CSV or TSV) to search for features showing concordant differences between two biological states
In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:
| Data in panel 4 | Data in panel 5 | Aims and analyses |
|---|---|---|
| BED | Random permutation | Features overlapping with BED intervals more often than by chance |
| BED | BED | Features differentially overlapping between the two sets of BED intervals |
| Gene list | RefSeq coding genes | Features overlapping with genes more often than other RefSeq genes |
| Gene list | Gene list | Features differentially overlapping between the two sets of gene lists |
| Gene count table | Not required | Features showing concordant differences between two biological states |
Reference peak-call data specified in the upper panels (1 to 3) of the submission form consist of the comprehensive peak-call data described in Section 4.
The result will be returned more quickly if antigen classes and cell-type classes are specified.
Submitted BED files must follow the UCSC BED format and minimally contain three tab-delimited columns describing chromosome, start, and end positions:

    chr1<tab>1435385<tab>1436458
    chrX<tab>4634643<tab>4635798

Header lines and columns beyond column 3 may be included but are ignored.

Only BED files using the following genome assemblies are supported. BED files in any other genome assemblies MUST be converted using the UCSC liftOver tool prior to submission.
- `hg38`, `hg19` (H. sapiens)
- `mm10`, `mm9` (M. musculus)
- `rn6` (R. norvegicus)
- `dm6`, `dm3` (D. melanogaster)
- `ce11`, `ce10` (C. elegans)
- `sacCer3` (S. cerevisiae)
Gene lists may be provided using official gene symbols or supported identifiers.
If gene lists are described using other formats, batch conversion tools such as DAVID should be used to convert them into official gene symbols or supported IDs.
- Official gene symbols must follow standardized nomenclatures:
  - HGNC (H. sapiens)
  - MGI (M. musculus)
  - RGD (R. norvegicus)
  - FlyBase (D. melanogaster)
  - WormBase (C. elegans)
  - SGD (S. cerevisiae)

  Examples: `OCT3/4 → POU5F1`, `p53 → TP53`
- In addition to official gene symbols, the following identifiers are also acceptable:
  - Ensembl IDs (e.g., ENSG00000204531)
  - UniProt IDs (e.g., Q01860)
  - RefSeq IDs (e.g., NM_002701)
An integer-valued gene count table obtained from RNA-seq experiments (CSV or TSV) with a header is required.
- The first column of the table contains gene identifiers.
- Remaining columns represent samples; sample names must include replicate numbers appended with an underscore (e.g., `wt_1`, `wt_2`).

  Example (CSV):

      Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3
      DDX11L1,8,7,5,12,8,13
      WASH7P,1512,985,1236,2342,1600,2075
      FAM138A,0,0,0,0,0,3
      OR4F5,0,0,0,0,0,0
      LOC100996442,279,208,234,402,285,370
      :
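The sample-naming rule above (a group name, an underscore, then a replicate number) can be checked before submission with a short Python sketch. The function name and the validation behavior are assumptions for this example and do not describe how the ChIP-Atlas server parses the table.

```python
import csv
import io
from collections import defaultdict


def read_count_table(text, delimiter=","):
    """Parse a gene count table and group sample columns by condition."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]

    groups = defaultdict(list)
    for name in samples:
        # 'treated_1' -> group 'treated', replicate '1'
        group, sep, replicate = name.rpartition("_")
        if not sep or not group or not replicate.isdigit():
            raise ValueError(f"sample name lacks a replicate suffix: {name!r}")
        groups[group].append(name)

    counts = {}
    for row in reader:
        if not row:
            continue
        counts[row[0]] = dict(zip(samples, (int(v) for v in row[1:])))
    return dict(groups), counts


# The first rows of the example table shown above.
example = """Gene ID,treated_1,treated_2,treated_3,untreated_1,untreated_2,untreated_3
DDX11L1,8,7,5,12,8,13
WASH7P,1512,985,1236,2342,1600,2075
FAM138A,0,0,0,0,0,3
"""
groups, counts = read_count_table(example)
```

For a TSV file, the same function can be called with `delimiter="\t"`.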
- Conversion to BED format

  Submitted data are converted to BED format depending on the data type:
  - BED: Submitted BED files are used directly for downstream processing. If Random permutation is selected, BED intervals are randomly permuted across chromosomes using `bedtools shuffle` (bedtools; ver 2.17.0).
  - Gene list: Unique TSSs of submitted genes are defined using `xxxCanonical.txt.gz` libraries distributed from the UCSC FTP site, where `xxx` denotes:
    - `known` (H. sapiens and M. musculus)
    - `flyBase` (D. melanogaster)
    - `sanger` (C. elegans)
    - `sgd` (S. cerevisiae)

    Unique TSSs of Rattus norvegicus genes are defined using gene lists distributed by RGD.

    TSS coordinates are converted to BED format with widths specified by the Distance range from TSS parameter on the submission form. When RefSeq coding genes are selected as background, RefSeq coding genes excluding submitted genes are processed in the same manner.
- Overlap counting

  Overlaps between BED intervals derived from panels 4–5 and reference peak-call data specified in panels 1–3 are counted using `bedtools intersect` (bedtools; ver 2.23.0).
- Statistical testing

  Two-tailed Fisher’s exact probability tests are performed (see example). The null hypothesis assumes that the proportion of reference peaks overlapping submitted data in panel 4 is equal to that overlapping data in panel 5. Q-values are calculated using the Benjamini–Hochberg procedure.
- Fold enrichment

  Fold enrichment is calculated as:

      Fold enrichment = Column 6 / Column 7

  If the ratio exceeds 1, the feature is considered preferentially associated with the data in panel 4.
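A two-tailed Fisher's exact test on a 2 × 2 table can be written with the standard library alone, as in the sketch below. The table layout is an assumption for this example (`a`/`b` = panel 4 intervals that do / do not overlap the reference peaks, `c`/`d` = the same for panel 5), as is the reading of Columns 6 and 7 as the two overlap fractions. The counts are invented.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalized hypergeometric probability of seeing x in cell 'a'.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    extreme = sum(
        w for w in (weight(x) for x in range(lowest, highest + 1))
        if w <= observed
    )
    return extreme / comb(row1 + row2, col1)


def fold_enrichment(a, b, c, d):
    """Ratio of overlap fractions (assumed meaning of Column 6 / Column 7)."""
    fraction_panel4 = a / (a + b)
    fraction_panel5 = c / (c + d)
    return fraction_panel4 / fraction_panel5 if fraction_panel5 > 0 else float("inf")


# Hypothetical counts: 8 of 10 intervals overlap in panel 4, 1 of 6 in panel 5.
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
ratio = fold_enrichment(8, 2, 1, 5)
```

Integer arithmetic is used for the comparison so that ties between equally probable tables are handled exactly.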
- Upon receiving a two-group integer-valued gene count table, log2 fold changes (log2FC) for all genes between the two experimental groups are estimated using the DESeq2 package in R.
- Gene identifiers provided in column 1 as Ensembl, UniProt, or RefSeq IDs are mapped to official gene symbols, and duplicate symbols are merged by summing their counts.
- For each ChIP-seq, ATAC-seq, DNase-seq, or Bisulfite-seq experiment, overlaps between peak regions and gene loci within the user-defined TSS window are assessed, thereby constructing experiment-specific target gene sets.
- In accordance with the original PAGE framework, Z-scores are calculated as:

      Z = (S_m − μ) × √m / δ

  where μ and δ represent the mean and standard deviation of genome-wide log2FC values, respectively, and S_m denotes the mean log2FC of a target gene set of size m.

  Two-tailed P-values are derived from the Z-scores, followed by multiple-testing correction using the Benjamini–Hochberg procedure to obtain Q-values.
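The Z-score formula and the subsequent P-value and Q-value steps can be sketched in Python as follows. The function names and input values are hypothetical, and the use of the population standard deviation for δ is an assumption; the text does not say which estimator is used.

```python
import math
from statistics import fmean, pstdev


def page_z_score(genome_wide_log2fc, target_log2fc):
    """Z = (S_m - mu) * sqrt(m) / delta for one target gene set."""
    mu = fmean(genome_wide_log2fc)
    delta = pstdev(genome_wide_log2fc)   # assumption: population SD
    m = len(target_log2fc)
    s_m = fmean(target_log2fc)
    return (s_m - mu) * math.sqrt(m) / delta


def two_tailed_p(z):
    """Two-tailed P-value of a Z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical genome-wide log2FC values and one target gene set.
z = page_z_score([-2, -1, 0, 1, 2], [1, 2])
p = two_tailed_p(z)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```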
An API is available to perform Enrichment Analysis programmatically. Please see here for details.
ChIP-Atlas Diff Analysis identifies differential peak regions (DPRs) or differentially methylated regions (DMRs) from two sets of queried ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.
- ChIP/ATAC/DNase-seq: Detection of DPRs
- Bisulfite-seq: Detection of DMRs
Accepted identifiers include:
- Experiment IDs from NCBI, ENA, or DDBJ (e.g., `SRX18419259`, `ERX1103210`, `DRX335588`)
- GEO accessions from NCBI GEO (e.g., `GSM6765200`)
- Publicly accessible URLs pointing to user-hosted datasets

Each line must include:
- BigWig file (integer-valued read coverage)
- BED file (peak-call data)
- Total number of mapped reads

Example (hg38):

    https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed 205201674
    https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed 208332830

BigWig files describing methylation rates (0–1):

    https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw
The Dataset Search tool is useful for finding experiment IDs of interest.
- BigWig and BED files recorded in ChIP-Atlas are identified based on submitted IDs.
- BigWig files are converted to bedGraph format.
- Raw read coverage is reconstructed using total mapped read counts.
- The genome is segmented based on peak regions from the query datasets.
- Read counts are aggregated into an m × n matrix (m: regions, n: experiments).
- Differential analysis is performed using the edgeR package in R.
- Results are summarized in BED format containing genomic coordinates and statistics.
This approach is conceptually related to DiffBind but does not require BAM files.
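The reconstruction of raw coverage from RPM-scaled signal is the inverse of the `1000000/N` scaling used during primary processing. The sketch below illustrates that arithmetic and the shape of the resulting m × n matrix. It is not the Diff Analysis code: the function names, the rounding to integers, and the per-region signal values are assumptions.

```python
def rpm_to_raw(rpm_value, total_mapped_reads):
    """Undo RPM scaling: raw = RPM * N / 1,000,000, rounded to an integer."""
    return round(rpm_value * total_mapped_reads / 1_000_000)


def build_count_matrix(region_rpm, mapped_reads):
    """Return (experiment order, m x n matrix of reconstructed counts).

    region_rpm:   {experiment_id: [RPM signal for each of m regions]}
    mapped_reads: {experiment_id: total number of mapped reads}
    """
    experiments = sorted(region_rpm)
    n_regions = len(region_rpm[experiments[0]])
    matrix = [
        [rpm_to_raw(region_rpm[e][r], mapped_reads[e]) for e in experiments]
        for r in range(n_regions)
    ]
    return experiments, matrix


# Hypothetical signal for two regions in two experiments.
region_rpm = {"A1": [0.5, 1.0], "A2": [0.25, 2.0]}
mapped_reads = {"A1": 2_000_000, "A2": 4_000_000}
experiments, matrix = build_count_matrix(region_rpm, mapped_reads)
```

A matrix of this shape (rows are regions, columns are experiments) is the kind of input that count-based tools such as edgeR expect.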
- BigWig methylation data are converted to bedGraph format.
- Methylation levels are aggregated using `metilene_input.pl` from the metilene package (PMID: 26631489).
- DMRs are detected using the `metilene` command with default parameters.
- Results are reported in BED format with statistical annotations.
Results are returned as a ZIP archive containing:
- `.igv.xml`: IGV session file
- `.log`: Analysis log
- `.bed`: DPRs or DMRs in BED9 format
- `.igv.bed`: BED9 + GFF3 format for IGV visualization
An API is available for programmatic execution. See here for details.
Experiment Comparative Profile is an experiment-level quality-control panel located at the bottom of each detailed experiment page (e.g., https://chip-atlas.org/view?id=SRX018625). This panel provides quantitative quality metrics for individual experiments by contextualizing each dataset relative to all other experiments of the same assay type. The panel consists of two components: Read and Peak Distribution and Correlation-Based Clustering.
For each experiment type, sequencing read counts and the number of detected peaks were summarized across all experiments. For ChIP-seq experiments, datasets were further subcategorized by antigen class, including histone marks, TFs and others, RNA polymerase, and input controls.
For ChIP-seq, ATAC-seq, and DNase-seq experiments, peak counts were calculated using peaks with MACS2 scores > 50 (i.e., Q-value < 1e-05). For Bisulfite-seq experiments, “peaks” corresponded to hypermethylated regions identified using MethPipe.
These distributions were visualized as violin plots overlaid with box plots. The position of each individual experiment within its corresponding distribution is indicated by an orange horizontal line, allowing users to assess sequencing depth and signal yield in a cohort-level context.
For experiments sharing the same biological context—defined by the same genome assembly and cell type, as well as the same antigen in the case of ChIP-seq—pairwise Pearson correlations of BigWig signal profiles were computed and visualized using deepTools.
Briefly, BigWig files were segmented into 10-kb genomic windows, and signal intensities were summarized across bins using the multiBigwigSummary subcommand, for example:

    multiBigwigSummary bins -b srx1.bw srx2.bw srx3.bw -o results.npz

The resulting matrix was subsequently used as input for plotCorrelation to calculate correlation coefficients, generate heatmaps, and perform hierarchical clustering, for example:

    plotCorrelation -in results.npz -c pearson -p heatmap -o plot.png --outFileCorMatrix

Within the heatmap, arrowheads indicate the selected experiment, and their colors represent the median correlation coefficient with other experiments belonging to the same cluster.
All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema).
- ChIP-seq, ATAC-seq, and DNase-seq: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bw/[Experimental_ID].bw
- Bisulfite-seq
  - Methylation rate: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/methyl/[Experimental_ID].methyl.bw
  - Coverage: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/cover/[Experimental_ID].cover.bw
- Example
  - ChIP-seq: https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw
  - Bisulfite-seq (Methylation rate): https://chip-atlas.dbcls.jp/data/hg38/eachData/bs/methyl/SRX1651655.methyl.bw
- ChIP-seq, ATAC-seq, and DNase-seq: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bed[Threshold]/[Experimental_ID].[Threshold].bed

  Threshold = 05, 10, or 20
- Bisulfite-seq
  - Hypo MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/Bed/[Experimental_ID].hmr.bed
  - Partial MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/Bed/[Experimental_ID].pmd.bed
  - Hyper MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/Bed/[Experimental_ID].hypermr.bed
- Example
  - ChIP-seq: Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

    https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed
  - Bisulfite-seq (Hypo MR): Hypo-methylated region data of SRX1651655.

    https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/Bed/SRX1651655.hmr.bed
- ChIP-seq, ATAC-seq, and DNase-seq: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bb[Threshold]/[Experimental_ID].[Threshold].bb

  Threshold = 05, 10, or 20
- Bisulfite-seq
  - Hypo MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hmr/BigBed/[Experimental_ID].hmr.bb
  - Partial MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/pmd/BigBed/[Experimental_ID].pmd.bb
  - Hyper MR: https://chip-atlas.dbcls.jp/data/[Genome]/eachData/bs/hypermr/BigBed/[Experimental_ID].hypermr.bb
- Example
  - ChIP-seq: Peak-call data of SRX097088 with MACS2 Q-value < 1E-05.

    https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb
  - Bisulfite-seq (Hypo MR): Hypo-methylated region data of SRX1651655.

    https://chip-atlas.dbcls.jp/data/hg19/eachData/bs/hmr/BigBed/SRX1651655.hmr.bb
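The per-experiment URL patterns for ChIP-seq, ATAC-seq, and DNase-seq data listed above can be assembled with small helper functions. The function names are hypothetical; only the URL templates and the allowed thresholds are taken from this page.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def bigwig_url(genome, experiment_id):
    """URL of the BigWig coverage file for one experiment."""
    return f"{BASE_URL}/{genome}/eachData/bw/{experiment_id}.bw"


def peak_call_url(genome, experiment_id, threshold, file_format="bed"):
    """URL of the peak-call file in BED ('bed') or BigBed ('bb') format."""
    if threshold not in ("05", "10", "20"):
        raise ValueError("threshold must be '05', '10', or '20'")
    if file_format not in ("bed", "bb"):
        raise ValueError("file_format must be 'bed' or 'bb'")
    return (
        f"{BASE_URL}/{genome}/eachData/{file_format}{threshold}/"
        f"{experiment_id}.{threshold}.{file_format}"
    )


# Reproduces the ChIP-seq examples given above.
coverage = bigwig_url("hg19", "SRX097088")
peaks_bed = peak_call_url("hg19", "SRX097088", "05")
peaks_bigbed = peak_call_url("hg19", "SRX097088", "05", file_format="bb")
```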
Download URL
https://chip-atlas.dbcls.jp/data/[Genome]/assembled/[File_name].bed
Available Genome and File_name are listed in fileList.tab (Download, Table schema)
Example
All peak-call data of GATA2 in all cell types with Q-value < 1E-05.
https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed
Note
As the assembled peak-call data used in “Peak Browser” are extremely large, we recommend downloading the lighter versions of all peak-call data (see below) and joining SRXs with sample metadata described in experimentList.tab on a command-line interface.
| Genome | Q < 1E-05 | Q < 1E-10 | Q < 1E-20 | Q < 1E-50 | WGBS |
|---|---|---|---|---|---|
| hg38 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| hg19 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| mm10 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| mm9 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| rn6 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| dm6 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| dm3 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| ce11 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| ce10 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
| sacCer3 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
Q: MACS2 Q-value thresholds
| Column | Description | Example |
|---|---|---|
| Column 1 | Chromosome | chr12 |
| Column 2 | Begin | 1234 |
| Column 3 | End | 5678 |
| Column 4 | SRX | SRX344646 |
| Column 5 | -10 × log10(MACS2 Q-value) | 345 |
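The join recommended in the note above, between the lighter peak-call files (SRX in column 4) and the sample metadata in experimentList.tab, can be sketched in Python. The function names and the sample lines are made up for this example. The column positions follow the table schemas on this page, and filtering by genome assembly assumes that one experiment may be listed for more than one assembly.

```python
def load_experiment_metadata(lines, genome):
    """Index experimentList.tab rows of one genome assembly by experiment ID."""
    metadata = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 6 and cols[1] == genome:
            metadata[cols[0]] = {
                "track_type_class": cols[2],
                "track_type": cols[3],
                "cell_type_class": cols[4],
                "cell_type": cols[5],
            }
    return metadata


def annotate_peaks(peak_lines, metadata):
    """Yield peaks joined with the track type and cell type of their SRX."""
    for line in peak_lines:
        chrom, start, end, srx, score = line.rstrip("\n").split("\t")[:5]
        info = metadata.get(srx)
        if info is not None:
            yield (chrom, int(start), int(end), srx, int(score),
                   info["track_type"], info["cell_type"])


# Made-up lines that follow the documented column order.
experiment_lines = [
    "SRX097088\thg19\tTFs and others\tGATA2\tBlood\tK-562\n",
    "SRX097088\thg38\tTFs and others\tGATA2\tBlood\tK-562\n",
]
peak_lines = ["chr12\t1234\t5678\tSRX097088\t345\n"]

metadata = load_experiment_metadata(experiment_lines, "hg19")
annotated = list(annotate_peaks(peak_lines, metadata))
```

For the real files, `lines` and `peak_lines` can be open file handles, so that the large peak-call file is streamed rather than loaded into memory.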
Download URL
https://chip-atlas.dbcls.jp/data/[Genome]/target/[Protein].[Distance].tsv
Protein is listed in analysisList.tab (Download, Table schema)
Distance = 1, 5, or 10 (kb from TSS)
Example
https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv
Download URL
https://chip-atlas.dbcls.jp/data/[Genome]/colo/[Protein].[Cell_type_class].tsv
Protein and Cell_type_class are listed in analysisList.tab (Download, Table schema)
Example
https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv
Spaces in cell type class names must be replaced with underscores.
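The Target Genes and Colocalization URL patterns, including the underscore rule for cell type class names, can be sketched as follows. The function names are hypothetical; the templates and the allowed distances come from this page.

```python
BASE_URL = "https://chip-atlas.dbcls.jp/data"


def target_genes_url(genome, protein, distance_kb):
    """URL of the Target Genes table for one protein and TSS distance."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance_kb must be 1, 5, or 10")
    return f"{BASE_URL}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome, protein, cell_type_class):
    """URL of the Colocalization table; spaces become underscores."""
    class_name = cell_type_class.replace(" ", "_")
    return f"{BASE_URL}/{genome}/colo/{protein}.{class_name}.tsv"


# Reproduces the two examples given above.
target = target_genes_url("hg19", "POU5F1", 5)
colo = colocalization_url("hg19", "POU5F1", "Pluripotent stem cell")
```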
All experiments recorded in ChIP-Atlas.
| Column | Description | Example |
|---|---|---|
| 1 | Experimental ID | SRX097088 |
| 2 | Genome assembly | hg19 |
| 3 | Track type class | TFs and others |
| 4 | Track type | GATA2 |
| 5 | Cell type class | Blood |
| 6 | Cell type | K-562 |
| 7 | Cell type description | Primary Tissue=Blood|Tissue Diagnosis=Leukemia |
| 8 | Processing logs (ChIP/ATAC/DNase-seq) | 30180878,82.3,42.1,6691 |
| 8 | Processing logs (Bisulfite-seq) | 132179672,88.1,3.4,311292 |
| 9 | Title | GSM722415: GATA2 K562 |
| 10- | Metadata | source_name=GATA2 ChIP-seq K562 |
All assembled peak-call data used in Peak Browser.
| Column | Description | Example |
|---|---|---|
| 1 | File name | Oth.ALL.05.GATA2.AllCell |
| 2 | Genome assembly | hg19 |
| 3 | Track type class | TFs and others |
| 4 | Track type | GATA2 |
| 5 | Cell type class | All cell types |
| 6 | Cell type | - |
| 7 | Threshold | 05 |
| 8 | Experimental IDs included | SRX070877,SRX150427,... |
| Column | Description | Example |
|---|---|---|
| 1 | Antigen | POU5F1 |
| 2 | Cell type class in Colocalization | Epidermis, Pluripotent stem cell |
| 3 | Recorded in Target Genes | + |
| 4 | Genome assembly | hg19 |
| Column | Description | Example |
|---|---|---|
| 1 | Genome assembly | hg19 |
| 2 | Track type class | TFs and others |
| 3 | Track type | POU5F1 |
| 4 | Number of experiments | 24 |
| 5 | Experimental IDs included | SRX011571,... |
| Column | Description | Example |
|---|---|---|
| 1 | Genome assembly | hg19 |
| 2 | Cell type class | Prostate |
| 3 | Cell type | VCaP |
| 4 | Number of experiments | 185 |
| 5 | Experimental IDs included | SRX020917,... |
BigBed and BigWig files in the ChIP-Atlas database can now be browsed in the UCSC Genome Browser. Use the links below to jump to the UCSC Genome Browser.
Currently, the track hub feature is provided only for files of individual experiments, but we are working on support for browsing files assembled by antigen and cell type. See Using UCSC Genome Browser Track Hubs for more details.