
Commit cda8e25

Updated info about cron jobs
1 parent 36c8d55 commit cda8e25


2 files changed

+46
-0
lines changed


docs/bedbase/bedbase-loader.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# 🔄 BEDbase loader
BEDbase loader is an automated tool and cron job that continuously fetches, processes, and integrates new BED files from public repositories into the BEDbase database. This ensures that BEDbase remains up-to-date with the latest genomic data available.
BEDbase loader repository: [https://github.com/databio/bedbase-loader](https://github.com/databio/bedbase-loader)
## Key Features

- **Automated GEO Retrieval**
- **Automated BED Heavy Processing**
- **Automated Genomes Updater**
- **UMAP Creator**
### Automated GEO Retrieval
The main and most important part of bedbase-loader is the automated retrieval of GEO data.
**Steps**:
1. First, metadata is fetched from the PEPhub API for the bedbase namespace: [https://pephub.databio.org/bedbase](https://pephub.databio.org/bedbase). GSE projects uploaded within a certain period of time (e.g. the last 2 days) are selected.
2. Once the list of GSE projects is fetched, BEDboss checks whether these projects have already been processed. If not, it proceeds to the next step.
3. Metadata for all remaining projects, including URLs to the files, is then fetched from PEPhub.
4. Next, the files are downloaded and their metadata is inserted into the BEDbase database.
5. Finally, the status flag is updated to "downloaded" and the project is ready for the next step: heavy processing (next section).
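The selection logic in steps 1–2 can be sketched roughly as follows. This is a minimal illustration only: the function name `select_recent_unprocessed`, the project fields, and the date handling are assumptions, not the actual bedboss implementation, which queries the PEPhub API.

```python
from datetime import datetime, timedelta

def select_recent_unprocessed(projects, processed_ids, days=2, now=None):
    """Hypothetical helper: keep GSE projects uploaded within the last
    `days` days that have not been processed yet."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return [
        p for p in projects
        if p["uploaded"] >= cutoff and p["gse"] not in processed_ids
    ]

now = datetime(2024, 5, 10)
projects = [
    {"gse": "GSE100", "uploaded": datetime(2024, 5, 9)},   # recent, new
    {"gse": "GSE200", "uploaded": datetime(2024, 5, 1)},   # too old
    {"gse": "GSE300", "uploaded": datetime(2024, 5, 10)},  # recent, already done
]
todo = select_recent_unprocessed(projects, processed_ids={"GSE300"}, now=now)
print([p["gse"] for p in todo])  # -> ['GSE100']
```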
### Automated BED heavy processing
Automated GEO retrieval downloads a large number of files from GEO.
To keep the download and insertion step fast, heavy processing is skipped at that initial stage.
Instead, heavy processing happens in AWS, using AWS Fargate and an automated cron job, after the files have been downloaded and inserted into the database.
Docker image for heavy processing: [https://github.com/databio/bedboss/blob/main/Dockerfile](https://github.com/databio/bedboss/blob/main/Dockerfile)
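The hand-off between download and heavy processing can be pictured as a status-driven worker. This is an illustrative sketch only: the status values other than "downloaded", and the `heavy_process` hook, are hypothetical, and the real job runs inside AWS Fargate against the BEDbase database.

```python
def process_pending(records, heavy_process):
    """Run heavy processing on every record whose status flag is
    "downloaded", then advance the flag (illustrative sketch only)."""
    for rec in records:
        if rec["status"] != "downloaded":
            continue  # skip records not yet ready for heavy processing
        try:
            heavy_process(rec)
            rec["status"] = "processed"  # hypothetical terminal status
        except Exception:
            rec["status"] = "failed"     # hypothetical error status
    return records

records = [{"id": "bed1", "status": "downloaded"},
           {"id": "bed2", "status": "inserting"}]
process_pending(records, heavy_process=lambda rec: None)
print([r["status"] for r in records])  # -> ['processed', 'inserting']
```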
### Automated Genomes Updater
BEDbase loader includes an automated genomes updater that fetches genomes from the Refgenie server.
We store information about all genomes available on the Refgenie server so that each BED file stored in BEDbase
can be linked to the exact reference genome used to create it.
To update genomes automatically, we use a cron job located here: [https://github.com/databio/bedbase-loader/blob/master/.github/workflows/update_genomes.yml](https://github.com/databio/bedbase-loader/blob/master/.github/workflows/update_genomes.yml)
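Conceptually, the updater only needs to compare the server's genome list with what BEDbase already stores. The helper below is a hypothetical illustration (the identifiers and storage details are assumptions); the real logic lives in the linked workflow.

```python
def genomes_to_add(server_genomes, stored_genomes):
    """Return genome identifiers available on the Refgenie server but
    not yet recorded in BEDbase (illustrative helper; identifiers and
    storage details are assumptions)."""
    return sorted(set(server_genomes) - set(stored_genomes))

print(genomes_to_add(["hg38", "hg19", "mm10"], ["hg19"]))  # -> ['hg38', 'mm10']
```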
### UMAP Creator
Embeddings of the BED files are one of the important parts of BEDbase:
visualizing the embeddings provides insight into the data stored in BEDbase.
To create embeddings we use the BEDbase package, which automatically produces a UMAP file that is later visualized here: [https://bedbase.org/umap](https://bedbase.org/umap)
To keep the UMAP up to date, we use a cron job located here: [https://github.com/databio/bedbase-loader/blob/master/.github/workflows/update_umap.yml](https://github.com/databio/bedbase-loader/blob/master/.github/workflows/update_umap.yml)
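The UMAP-file creation flow can be sketched as three stages: embed, reduce to 2-D, and write the file for the viewer. Every function name below is an illustrative stand-in; the real pipeline uses the bedbase package, the reduction is done by UMAP rather than the trivial projection shown here, and the output format is an assumption.

```python
import json
import random

def embed(bed_ids, dim=8):
    """Stub: pretend to compute a high-dimensional embedding per BED file."""
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    return {b: [rng.random() for _ in range(dim)] for b in bed_ids}

def reduce_to_2d(embeddings):
    """Stub standing in for UMAP: project each vector to two coordinates."""
    return {b: v[:2] for b, v in embeddings.items()}

def write_umap_file(points, path):
    """Write the 2-D points to a file for the bedbase.org/umap viewer
    (the actual file format is an assumption)."""
    with open(path, "w") as f:
        json.dump(points, f)

points = reduce_to_2d(embed(["bedA", "bedB"]))
print({b: len(v) for b, v in points.items()})  # -> {'bedA': 2, 'bedB': 2}
```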

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ nav:
  - Guide: bbconf/bbc_api.md
  - Changelog: bbconf/changelog.md
  - 🧰 BEDBoss processing pipeline: ../bedboss
+ - 🔄 BEDbase auto uploader: bedbase/bedbase-loader.md
  - 📜 Configuration file: bedbase/how-to-configure.md
  - Reference:
  - How to cite: citations.md
