
Commit 734d698

Unify env var naming and make URLs configurable
1 parent 869e0d1 commit 734d698

15 files changed: +122 -91 lines changed

Dockerfile

Lines changed: 7 additions & 7 deletions
@@ -1,13 +1,13 @@
 FROM python:3.12-bookworm

 LABEL org.opencontainers.image.title="ddbj-search-converter" \
-org.opencontainers.image.description="Data converter for DDBJ Search" \
-org.opencontainers.image.version="0.1.0" \
-org.opencontainers.image.authors="Bioinformatics and DDBJ Center" \
-org.opencontainers.image.url="https://github.com/ddbj/ddbj-search-converter" \
-org.opencontainers.image.source="https://github.com/ddbj/ddbj-search-converter" \
-org.opencontainers.image.documentation="https://github.com/ddbj/ddbj-search-converter/blob/main/README.md" \
-org.opencontainers.image.licenses="Apache-2.0"
+org.opencontainers.image.description="Data converter for DDBJ Search" \
+org.opencontainers.image.version="0.1.0" \
+org.opencontainers.image.authors="Bioinformatics and DDBJ Center" \
+org.opencontainers.image.url="https://github.com/ddbj/ddbj-search-converter" \
+org.opencontainers.image.source="https://github.com/ddbj/ddbj-search-converter" \
+org.opencontainers.image.documentation="https://github.com/ddbj/ddbj-search-converter/blob/main/README.md" \
+org.opencontainers.image.licenses="Apache-2.0"

 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

README.md

Lines changed: 20 additions & 14 deletions
@@ -18,7 +18,7 @@ DDBJ-Search Converter: relation information between life science databases (D
 |-------------|------|
 | BioProject | NCBI/DDBJ BioProject |
 | BioSample | NCBI/DDBJ BioSample |
-| SRA/DRA | INCSD SRA/DRA |
+| SRA/DRA | INSDC SRA/DRA |
 | JGA | Japan Genotype-Phenotype Archive |
 | GEA | Gene Expression Archive |
 | MetaboBank | MetaboBank |
@@ -41,13 +41,14 @@ DDBJ-Search Converter: relation information between life science databases (D
 ### Starting the environment (Staging / Production)

 ```bash
-# 1. Create the Podman network (first time only)
-podman network create ddbj-search-network
-
-# 2. Set up environment variables and the override
+# 1. Set up environment variables and the override
 cp env.staging .env # or env.production
 cp compose.override.podman.yml compose.override.yml

+# 2. Create the Podman network (first time only; no error if it already exists)
+podman network create ddbj-search-network-staging || true
+# For production: podman network create ddbj-search-network-production || true
+
 # 3. Start
 podman-compose up -d --build

@@ -107,7 +108,7 @@ es_bulk_insert --index jga-policy

 ## Data architecture

-```
+```plain
 +-----------------------------------------------------------------------------+
 | External Resources |
 | BioProject XML, BioSample XML, SRA/DRA Accessions.tab, |
@@ -159,19 +160,24 @@ es_bulk_insert --index jga-policy
 ### Main .env settings

 ```bash
+# === Environment ===
+DDBJ_SEARCH_ENV=production # dev, staging, production
+
 # === Elasticsearch Settings ===
-ES_MEM_LIMIT=128g # container memory limit
-ES_JAVA_OPTS=-Xms64g -Xmx64g # JVM heap size
+DDBJ_SEARCH_ES_MEM_LIMIT=128g # container memory limit
+DDBJ_SEARCH_ES_JAVA_OPTS=-Xms64g -Xmx64g # JVM heap size

 # === Volume Paths (for compose.yml) ===
-RESULT_PATH=./ddbj_search_converter_results # result output directory
-CONST_PATH=/home/w3ddbjld/const # blacklist, preserved, etc.
+DDBJ_SEARCH_CONVERTER_RESULT_PATH=./ddbj_search_converter_results # result output directory
+DDBJ_SEARCH_CONVERTER_CONST_PATH=/home/w3ddbjld/const # blacklist, preserved, etc.
 DBLINK_PATH=/usr/local/shared_data/dblink # DBLink TSV output directory
 BIOPROJECT_PATH=/usr/local/resources/bioproject
 BIOSAMPLE_PATH=/usr/local/resources/biosample
 # ... other mount paths
 ```

+`DDBJ_SEARCH_ENV` automatically determines the container names (`ddbj-search-converter-{env}`, `ddbj-search-es-{env}`) and the Docker network name (`ddbj-search-network-{env}`).
+
 **Pinning DATE**: To reproduce or verify results with data from a past date, set `DDBJ_SEARCH_CONVERTER_DATE=YYYYMMDD`.

 ## Development
@@ -181,12 +187,12 @@ Assumes a Development Container.
 ### Starting the environment (Dev)

 ```bash
-# 1. Create the Docker network (first time only)
-docker network create ddbj-search-network
-
-# 2. Set up environment variables
+# 1. Set up environment variables
 cp env.dev .env

+# 2. Create the Docker network (first time only; no error if it already exists)
+docker network create ddbj-search-network-dev || true
+
 # 3. Start
 docker compose up -d --build

compose.yml

Lines changed: 10 additions & 9 deletions
@@ -3,12 +3,12 @@ services:
     build:
       context: .
       dockerfile: Dockerfile
-    container_name: ${APP_CONTAINER_NAME}
+    container_name: ddbj-search-converter-${DDBJ_SEARCH_ENV}
     volumes:
       - .:/app:rw
       - app-venv:/app/.venv
-      - ${RESULT_PATH}:/app/ddbj_search_converter_results:rw
-      - ${CONST_PATH}:/home/w3ddbjld/const:rw
+      - ${DDBJ_SEARCH_CONVERTER_RESULT_PATH}:/app/ddbj_search_converter_results:rw
+      - ${DDBJ_SEARCH_CONVERTER_CONST_PATH}:/home/w3ddbjld/const:rw
       # SRA / DRA Accessions
       - ${SRA_ACCESSIONS_PATH}:/lustre9/open/database/ddbj-dbt/dra-private/mirror/SRA_Accessions:ro
       - ${DRA_ACCESSIONS_PATH}:/lustre9/open/database/ddbj-dbt/dra-private/tracesys/batch/logs/livelist/ReleaseData/public:ro
@@ -24,11 +24,12 @@ services:
       - ${GEA_PATH}:/usr/local/resources/gea/experiment:ro
       - ${METABOBANK_PATH}:/usr/local/shared_data/metabobank/study:ro
     environment:
-      TZ: "Asia/Tokyo"
+      TZ: ${TZ:-Asia/Tokyo}
       DDBJ_SEARCH_CONVERTER_RESULT_DIR: ${DDBJ_SEARCH_CONVERTER_RESULT_DIR}
       DDBJ_SEARCH_CONVERTER_CONST_DIR: ${DDBJ_SEARCH_CONVERTER_CONST_DIR}
       DDBJ_SEARCH_CONVERTER_POSTGRES_URL: ${DDBJ_SEARCH_CONVERTER_POSTGRES_URL:-}
       DDBJ_SEARCH_CONVERTER_ES_URL: ${DDBJ_SEARCH_CONVERTER_ES_URL}
+      DDBJ_SEARCH_HOST: ${DDBJ_SEARCH_HOST}
     working_dir: /app
     command: ["sleep", "infinity"]
     depends_on:
@@ -40,13 +41,13 @@ services:

   elasticsearch:
     image: docker.elastic.co/elasticsearch/elasticsearch:8.17.1
-    container_name: ${ES_CONTAINER_NAME}
+    container_name: ddbj-search-es-${DDBJ_SEARCH_ENV}
     environment:
-      TZ: "Asia/Tokyo"
+      TZ: ${TZ:-Asia/Tokyo}
       discovery.type: "single-node"
       xpack.security.enabled: "false"
       bootstrap.memory_lock: "true"
-      ES_JAVA_OPTS: ${ES_JAVA_OPTS}
+      ES_JAVA_OPTS: ${DDBJ_SEARCH_ES_JAVA_OPTS}
       path.repo: "/usr/share/elasticsearch/backup"
     volumes:
       - es-data:/usr/share/elasticsearch/data
@@ -56,7 +57,7 @@ services:
       memlock:
         soft: -1
         hard: -1
-    mem_limit: ${ES_MEM_LIMIT}
+    mem_limit: ${DDBJ_SEARCH_ES_MEM_LIMIT}
     healthcheck:
       test:
         [
@@ -72,7 +73,7 @@ services:

 networks:
   ddbj-search-network:
-    name: ddbj-search-network
+    name: ddbj-search-network-${DDBJ_SEARCH_ENV}
     external: true

 volumes:

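The compose.yml hunks above replace two separate container-name variables and a fixed network name with names derived from `DDBJ_SEARCH_ENV`. A minimal Python sketch of that naming scheme follows. The `resource_names` helper and the validation against `dev`, `staging`, and `production` are illustrative assumptions (the environment list comes from the README comment); Compose itself only performs the `${DDBJ_SEARCH_ENV}` substitution and does not validate the value.

```python
from typing import Dict

# Environments listed in the README comment for DDBJ_SEARCH_ENV.
KNOWN_ENVS = ("dev", "staging", "production")


def resource_names(env: str) -> Dict[str, str]:
    """Return the names compose.yml would produce for DDBJ_SEARCH_ENV=env.

    Hypothetical helper for illustration only. The validation step is an
    assumption of this sketch, not something Compose does.
    """
    if env not in KNOWN_ENVS:
        raise ValueError(f"unexpected DDBJ_SEARCH_ENV: {env!r}")
    return {
        "app_container": f"ddbj-search-converter-{env}",
        "es_container": f"ddbj-search-es-{env}",
        "network": f"ddbj-search-network-{env}",
    }


if __name__ == "__main__":
    for key, value in resource_names("staging").items():
        print(f"{key}: {value}")
```

Because the network is declared `external: true`, the name computed here has to match the one passed to `docker network create` or `podman network create` before startup.
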
ddbj_search_converter/config.py

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 CONST_DIR = Path("/home/w3ddbjld/const") # Path to store constant/shared resources
 DATE_FORMAT = "%Y%m%d"
 LOCAL_TZ = ZoneInfo(os.environ.get("TZ", "Asia/Tokyo"))
+SEARCH_BASE_URL = f"https://{os.environ.get('DDBJ_SEARCH_HOST', 'ddbj.nig.ac.jp')}"
 _date_override = os.environ.get("DDBJ_SEARCH_CONVERTER_DATE")
 if _date_override:
     TODAY = datetime.strptime(_date_override, DATE_FORMAT).date()

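The new `SEARCH_BASE_URL` constant reads `DDBJ_SEARCH_HOST` with a default of `ddbj.nig.ac.jp`. The sketch below shows the same lookup as a standalone function so it can be exercised with different environments. The `search_base_url` function and the `or` fallback are assumptions of this sketch, not the commit's code. The fallback differs from `os.environ.get(key, default)` in one case: it also covers an empty string, which Compose normally substitutes when `${DDBJ_SEARCH_HOST}` is unset in `.env`. The hostname `staging.example.org` is a placeholder.

```python
import os
from typing import Mapping

DEFAULT_HOST = "ddbj.nig.ac.jp"  # default host used in config.py


def search_base_url(environ: Mapping[str, str] = os.environ) -> str:
    """Build the base URL from DDBJ_SEARCH_HOST, falling back to the default.

    Illustrative variant: `or` treats an empty value like a missing one,
    whereas environ.get(key, default) would return the empty string.
    """
    host = environ.get("DDBJ_SEARCH_HOST") or DEFAULT_HOST
    return f"https://{host}"


if __name__ == "__main__":
    print(search_base_url({}))
    print(search_base_url({"DDBJ_SEARCH_HOST": "staging.example.org"}))
```
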
ddbj_search_converter/jsonl/bp.py

Lines changed: 3 additions & 2 deletions
@@ -10,6 +10,7 @@
                                           TMP_XML_DIR_NAME, TODAY_STR, Config,
                                           apply_margin, get_config,
                                           read_last_run, write_last_run)
+from ddbj_search_converter.config import SEARCH_BASE_URL
 from ddbj_search_converter.dblink.utils import load_blacklist
 from ddbj_search_converter.jsonl.utils import get_dbxref_map, write_jsonl
 from ddbj_search_converter.logging.logger import (log_debug, log_error,
@@ -465,13 +466,13 @@ def xml_entry_to_bp_instance(entry: Dict[str, Any], is_ddbj: bool) -> BioProject
         distribution=[Distribution(
             type="DataDownload",
             encodingFormat="JSON",
-            contentUrl=f"https://ddbj.nig.ac.jp/search/entries/bioproject/{accession}.json",
+            contentUrl=f"{SEARCH_BASE_URL}/search/entries/bioproject/{accession}.json",
         )],
         isPartOf="BioProject",
         type="bioproject",
         objectType=parse_object_type(project),
         name=None,
-        url=f"https://ddbj.nig.ac.jp/search/entries/bioproject/{accession}",
+        url=f"{SEARCH_BASE_URL}/search/entries/bioproject/{accession}",
         organism=parse_organism(project, is_ddbj, accession),
         title=parse_title(project, accession),
         description=parse_description(project, accession),

ddbj_search_converter/jsonl/bs.py

Lines changed: 4 additions & 3 deletions
@@ -10,6 +10,7 @@
                                           TMP_XML_DIR_NAME, TODAY_STR, Config,
                                           apply_margin, get_config,
                                           read_last_run, write_last_run)
+from ddbj_search_converter.config import SEARCH_BASE_URL
 from ddbj_search_converter.dblink.utils import load_blacklist
 from ddbj_search_converter.jsonl.utils import get_dbxref_map, write_jsonl
 from ddbj_search_converter.logging.logger import (log_debug, log_error,
@@ -198,7 +199,7 @@ def parse_same_as(sample: Dict[str, Any], accession: str = "") -> List[Xref]:
             xrefs.append(Xref(
                 identifier=content,
                 type="sra-sample",
-                url=f"https://ddbj.nig.ac.jp/search/entries/sra-sample/{content}",
+                url=f"{SEARCH_BASE_URL}/search/entries/sra-sample/{content}",
             ))
     except Exception as e:
         log_warn(f"failed to parse same_as: {e}", accession=accession)
@@ -316,12 +317,12 @@ def xml_entry_to_bs_instance(entry: Dict[str, Any], is_ddbj: bool) -> BioSample:
         distribution=[Distribution(
             type="DataDownload",
             encodingFormat="JSON",
-            contentUrl=f"https://ddbj.nig.ac.jp/search/entries/biosample/{accession}.json",
+            contentUrl=f"{SEARCH_BASE_URL}/search/entries/biosample/{accession}.json",
         )],
         isPartOf="BioSample",
         type="biosample",
         name=parse_name(sample, accession),
-        url=f"https://ddbj.nig.ac.jp/search/entries/biosample/{accession}",
+        url=f"{SEARCH_BASE_URL}/search/entries/biosample/{accession}",
         organism=parse_organism(sample, is_ddbj, accession),
         title=parse_title(sample, accession),
         description=parse_description(sample, accession),

ddbj_search_converter/jsonl/jga.py

Lines changed: 3 additions & 2 deletions
@@ -11,6 +11,7 @@
                                           JSONL_DIR_NAME, TODAY_STR, Config,
                                           get_config)
 from ddbj_search_converter.dblink.db import AccessionType
+from ddbj_search_converter.config import SEARCH_BASE_URL
 from ddbj_search_converter.dblink.utils import load_jga_blacklist
 from ddbj_search_converter.jsonl.utils import get_dbxref_map, write_jsonl
 from ddbj_search_converter.logging.logger import (log_debug, log_error,
@@ -158,13 +159,13 @@ def jga_entry_to_jga_instance(entry: Dict[str, Any], index_name: IndexName) -> J
             Distribution(
                 type="DataDownload",
                 encodingFormat="JSON",
-                contentUrl=f"https://ddbj.nig.ac.jp/search/entries/{index_name}/{accession}.json",
+                contentUrl=f"{SEARCH_BASE_URL}/search/entries/{index_name}/{accession}.json",
             )
         ],
         isPartOf="jga",
         type=index_name,
         name=_get_name_from_alias(accession, entry.get("alias")),
-        url=f"https://ddbj.nig.ac.jp/search/entries/{index_name}/{accession}",
+        url=f"{SEARCH_BASE_URL}/search/entries/{index_name}/{accession}",
         organism=Organism(identifier="9606", name="Homo sapiens"),
         title=extract_title(entry, index_name),
         description=extract_description(entry, index_name),

ddbj_search_converter/jsonl/sra.py

Lines changed: 3 additions & 2 deletions
@@ -22,6 +22,7 @@
 from ddbj_search_converter.config import (JSONL_DIR_NAME, SRA_BASE_DIR_NAME,
                                           TODAY_STR, Config, get_config,
                                           read_last_run, write_last_run)
+from ddbj_search_converter.config import SEARCH_BASE_URL
 from ddbj_search_converter.dblink.utils import load_sra_blacklist
 from ddbj_search_converter.jsonl.utils import get_dbxref_map, write_jsonl
 from ddbj_search_converter.logging.logger import (log_debug, log_info,
@@ -302,13 +303,13 @@ def _make_distribution(entry_type: str, identifier: str) -> List[Distribution]:
     return [Distribution(
         type="DataDownload",
         encodingFormat="JSON",
-        contentUrl=f"https://ddbj.nig.ac.jp/search/entries/{entry_type}/{identifier}.json",
+        contentUrl=f"{SEARCH_BASE_URL}/search/entries/{entry_type}/{identifier}.json",
     )]


 def _make_url(entry_type: str, identifier: str) -> str:
     """Build the URL."""
-    return f"https://ddbj.nig.ac.jp/search/entries/{entry_type}/{identifier}"
+    return f"{SEARCH_BASE_URL}/search/entries/{entry_type}/{identifier}"


 def _get_name_from_alias(accession: str, alias: Optional[str]) -> Optional[str]:

ddbj_search_converter/jsonl/utils.py

Lines changed: 15 additions & 15 deletions
@@ -2,27 +2,27 @@
 from pathlib import Path
 from typing import Any, Dict, List, Optional

-from ddbj_search_converter.config import Config
+from ddbj_search_converter.config import SEARCH_BASE_URL, Config
 from ddbj_search_converter.dblink.db import (AccessionType,
                                              get_related_entities_bulk)
 from ddbj_search_converter.id_patterns import ID_PATTERN_MAP
 from ddbj_search_converter.schema import Xref, XrefType

 URL_TEMPLATE: Dict[XrefType, str] = {
-    "biosample": "https://ddbj.nig.ac.jp/search/entries/biosample/{id}",
-    "bioproject": "https://ddbj.nig.ac.jp/search/entries/bioproject/{id}",
-    "umbrella-bioproject": "https://ddbj.nig.ac.jp/search/entries/bioproject/{id}",
-    "sra-submission": "https://ddbj.nig.ac.jp/search/entries/sra-submission/{id}",
-    "sra-study": "https://ddbj.nig.ac.jp/search/entries/sra-study/{id}",
-    "sra-experiment": "https://ddbj.nig.ac.jp/search/entries/sra-experiment/{id}",
-    "sra-run": "https://ddbj.nig.ac.jp/search/entries/sra-run/{id}",
-    "sra-sample": "https://ddbj.nig.ac.jp/search/entries/sra-sample/{id}",
-    "sra-analysis": "https://ddbj.nig.ac.jp/search/entries/sra-analysis/{id}",
-    "jga-study": "https://ddbj.nig.ac.jp/search/entries/jga-study/{id}",
-    "jga-dataset": "https://ddbj.nig.ac.jp/search/entries/jga-dataset/{id}",
-    "jga-dac": "https://ddbj.nig.ac.jp/search/entries/jga-dac/{id}",
-    "jga-policy": "https://ddbj.nig.ac.jp/search/entries/jga-policy/{id}",
-    "gea": "https://ddbj.nig.ac.jp/public/ddbj_database/gea/experiment/{prefix}/{id}/",
+    "biosample": f"{SEARCH_BASE_URL}/search/entries/biosample/{{id}}",
+    "bioproject": f"{SEARCH_BASE_URL}/search/entries/bioproject/{{id}}",
+    "umbrella-bioproject": f"{SEARCH_BASE_URL}/search/entries/bioproject/{{id}}",
+    "sra-submission": f"{SEARCH_BASE_URL}/search/entries/sra-submission/{{id}}",
+    "sra-study": f"{SEARCH_BASE_URL}/search/entries/sra-study/{{id}}",
+    "sra-experiment": f"{SEARCH_BASE_URL}/search/entries/sra-experiment/{{id}}",
+    "sra-run": f"{SEARCH_BASE_URL}/search/entries/sra-run/{{id}}",
+    "sra-sample": f"{SEARCH_BASE_URL}/search/entries/sra-sample/{{id}}",
+    "sra-analysis": f"{SEARCH_BASE_URL}/search/entries/sra-analysis/{{id}}",
+    "jga-study": f"{SEARCH_BASE_URL}/search/entries/jga-study/{{id}}",
+    "jga-dataset": f"{SEARCH_BASE_URL}/search/entries/jga-dataset/{{id}}",
+    "jga-dac": f"{SEARCH_BASE_URL}/search/entries/jga-dac/{{id}}",
+    "jga-policy": f"{SEARCH_BASE_URL}/search/entries/jga-policy/{{id}}",
+    "gea": f"{SEARCH_BASE_URL}/public/ddbj_database/gea/experiment/{{prefix}}/{{id}}/",
     "geo": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={id}",
     "insdc-assembly": "https://www.ncbi.nlm.nih.gov/datasets/genome/{id}",
     "insdc-master": "https://www.ncbi.nlm.nih.gov/nuccore/{id}",

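The `URL_TEMPLATE` entries above switch from plain strings to f-strings with doubled braces, so each template is filled in two stages. A minimal sketch of that mechanism follows. The diff does not show how `URL_TEMPLATE` is consumed, so the `str.format(id=...)` call is an assumption, and the accession `SAMD00000001` is only an illustrative value.

```python
SEARCH_BASE_URL = "https://ddbj.nig.ac.jp"  # default value from config.py

# Stage 1 (module import): the f-string interpolates SEARCH_BASE_URL and
# collapses the doubled braces {{id}} into a literal {id} placeholder.
template = f"{SEARCH_BASE_URL}/search/entries/biosample/{{id}}"

# Stage 2 (per entry): str.format fills the remaining placeholder.
# Assumed usage; the consuming code is not part of this diff.
url = template.format(id="SAMD00000001")

if __name__ == "__main__":
    print(template)
    print(url)
```

Since stage 1 runs when the module is imported, `DDBJ_SEARCH_HOST` has to be set before `ddbj_search_converter.config` is first imported; changing it afterwards does not affect templates that are already built.
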
docs/cli-pipeline.md

Lines changed: 4 additions & 3 deletions
@@ -6,7 +6,7 @@ Pipeline execution and incremental updates for DDBJ-Search Converter.

 The pipeline consists of three phases.

-```
+```plain
 Phase 1: Preprocessing + DBLink construction
   External resources -> preprocessing commands -> DBLink DB -> TSV

@@ -188,7 +188,7 @@ generate_jga_jsonl

 ### Step list

-```
+```plain
 === PHASE 0: Pre-check ===
 check_resources Check external resources availability

@@ -220,7 +220,7 @@ generate_jga_jsonl

 ### Execution flow

-```
+```plain
 PHASE 0: Pre-check
   check_external_resources

@@ -348,6 +348,7 @@ es_bulk_insert --index jga-study \
 | `DDBJ_SEARCH_CONVERTER_CONST_DIR` | const directory (blacklist, DB, etc.) |
 | `DDBJ_SEARCH_CONVERTER_DATE` | Processing date (YYYYMMDD) |
 | `DDBJ_SEARCH_CONVERTER_ES_URL` | Elasticsearch URL |
+| `DDBJ_SEARCH_HOST` | DDBJ Search hostname |
 | `DDBJ_SEARCH_CONVERTER_POSTGRES_URL` | PostgreSQL URL |

 ### External resource checks and preprocessing
