11 changes: 11 additions & 0 deletions .copier-answers.yml
@@ -0,0 +1,11 @@
_commit: adada77
_src_path: gh:monarch-initiative/koza-ingest-template
copyright_year: '2026'
email: kevin@tislab.org
full_name: Kevin Schaper
github_handle: kevinschaper
github_org: monarch-initiative
license: BSD-3-Clause
project_description: Ingest of genes from NCBI
project_name: ncbi-gene
project_slug: ncbi_gene
25 changes: 0 additions & 25 deletions .cruft.json

This file was deleted.

58 changes: 0 additions & 58 deletions .github/workflows/create-release.yaml

This file was deleted.

38 changes: 0 additions & 38 deletions .github/workflows/deploy-docs.yaml

This file was deleted.

73 changes: 73 additions & 0 deletions .github/workflows/release.yaml
@@ -0,0 +1,73 @@
name: Release

on:
  schedule:
    # Run monthly on the 3rd at midnight UTC
    - cron: '0 0 3 * *'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Release tag (leave empty for auto-generated date tag)'
        required: false
        type: string

env:
  NCBI_API_KEY: ${{ secrets.NCBI_API_KEY }}
  NCBI_MAIL: ${{ secrets.NCBI_MAIL }}
  TRANSFORMS: "transform"

jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.11

      - name: Install just
        uses: extractions/setup-just@v2

      - name: Run pipeline
        run: just run

      - name: Postprocess
        run: just postprocess

      - name: Generate release tag
        id: tag
        run: |
          if [ -n "${{ inputs.tag }}" ]; then
            echo "tag=${{ inputs.tag }}" >> $GITHUB_OUTPUT
          else
            echo "tag=v$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
          fi

      - name: Create Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.tag.outputs.tag }}
          name: Release ${{ steps.tag.outputs.tag }}
          body: |
            Automated monthly release of NCBI Gene ingest data.

            ## Transforms included
            - transform (NCBI Gene)

            ## Output
            - Gene nodes with postprocessed files split by taxon
          draft: false
          prerelease: false
          files: |
            output/*_nodes.tsv
            output/*_edges.tsv
            output/*.nt.gz
            output/by_taxon/*
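The "Generate release tag" step prefers a manually supplied `tag` input and otherwise falls back to a date stamp. The same rule, written as a small Python function so it can be unit-tested, might look like the sketch below. The function name `release_tag` is hypothetical and does not exist in the repository, and the UTC date is used on the assumption that the runner clock is UTC (the default for GitHub-hosted runners).

```python
import datetime
from typing import Optional


def release_tag(requested_tag: str = "", today: Optional[datetime.date] = None) -> str:
    """Pick a release tag the way the workflow's shell step does (hypothetical helper)."""
    # A non-empty manual input wins, like `[ -n "${{ inputs.tag }}" ]`.
    if requested_tag:
        return requested_tag
    # Otherwise build vYYYY-MM-DD from the current UTC date, mirroring
    # `date +'%Y-%m-%d'` on a runner whose clock is set to UTC.
    if today is None:
        today = datetime.datetime.now(datetime.timezone.utc).date()
    return "v" + today.strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(release_tag("v2026-02-01-hotfix"))              # v2026-02-01-hotfix
    print(release_tag(today=datetime.date(2026, 3, 3)))  # v2026-03-03
```

On the monthly `schedule` trigger no inputs are supplied, so `${{ inputs.tag }}` expands to an empty string and the date branch is taken; two unattended runs on the same UTC day therefore compute the same tag.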
44 changes: 13 additions & 31 deletions .github/workflows/test.yaml
@@ -4,50 +4,32 @@ on:
   push:
     branches: [main]
   pull_request:
-  workflow_dispatch:
     branches: [main]

 env:
   NCBI_API_KEY: ${{ secrets.NCBI_API_KEY }}
   NCBI_MAIL: ${{ secrets.NCBI_MAIL }}

 jobs:
-  test-backend:
-    runs-on: ${{ matrix.os }}
-
+  test:
+    runs-on: ubuntu-latest
     strategy:
       matrix:
         python-version: ["3.10", "3.11", "3.12"]
-        os: [ubuntu-latest]
-        #os: [ ubuntu-latest, windows-latest ]

     steps:
       - uses: actions/checkout@v4

-      - name: Debug Secrets
-        run: |
-          if [ -z "$NCBI_API_KEY" ]; then echo "NCBI_API_KEY is NOT set"; else echo "NCBI_API_KEY is SET"; fi
-          if [ -z "$NCBI_MAIL" ]; then echo "NCBI_MAIL is NOT set"; else echo "NCBI_MAIL is SET"; fi
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"

       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      #----------------------------------------------
-      # install & configure poetry
-      #----------------------------------------------
-      - name: Install Poetry
-        uses: snok/install-poetry@v1
-
-      #----------------------------------------------
-      # install your root project, if required
-      #----------------------------------------------
-      - name: Install library
-        run: poetry install --no-interaction
-
-      #----------------------------------------------
-      # run pytest
-      #----------------------------------------------
-      - name: Run tests
-        run: poetry run pytest tests
+        run: uv python install ${{ matrix.python-version }}
+
+      - name: Install just
+        uses: extractions/setup-just@v2
+
+      - name: Run tests
+        run: just test
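Both workflows pass `NCBI_API_KEY` and `NCBI_MAIL` through from repository secrets, and CLAUDE.md says the taxon lookup uses them with NCBI E-utilities. The sketch below shows one way such values can be attached to an E-utilities `esummary` request against the `taxonomy` database (`api_key` and `email` are documented E-utilities parameters). The helper names are hypothetical and are not taken from `taxon_lookup.py`, and the `scientificname` field is assumed from the JSON summary format. No network call is made, so the functions can be tested offline.

```python
import os
from urllib.parse import urlencode

# E-utilities ESummary endpoint.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def taxonomy_summary_url(taxon_ids, api_key=None, email=None):
    """Build an ESummary URL for NCBI taxonomy IDs (hypothetical helper)."""
    params = {
        "db": "taxonomy",
        "id": ",".join(str(t) for t in taxon_ids),
        "retmode": "json",
    }
    # Only attach values that are actually set; secrets are not passed
    # to workflows triggered by pull requests from forks.
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{ESUMMARY_URL}?{urlencode(params)}"


def scientific_names(summary):
    """Map taxon ID -> scientific name from a parsed ESummary JSON payload."""
    result = summary.get("result", {})
    return {
        uid: result.get(uid, {}).get("scientificname", "")
        for uid in result.get("uids", [])
    }


if __name__ == "__main__":
    print(taxonomy_summary_url(
        [9606, 10090],
        api_key=os.environ.get("NCBI_API_KEY"),
        email=os.environ.get("NCBI_MAIL"),
    ))
```

Without a key, E-utilities allows roughly 3 requests per second; with `api_key` the limit rises to 10, which is why the key matters for a lookup that may touch many taxa.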
49 changes: 0 additions & 49 deletions .github/workflows/update-docs.yaml

This file was deleted.

36 changes: 36 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,36 @@
# ncbi-gene

This is a Koza ingest repository for transforming NCBI Gene data into Biolink model format.

## Project Structure

- `download.yaml` - Configuration for downloading NCBI gene_info data
- `src/` - Transform code and configuration
  - `transform.py` / `transform.yaml` - Main transform for NCBI genes
  - `taxon_lookup.py` - Helper module for fetching taxon names via NCBI E-utilities
- `tests/` - Unit tests for transforms
- `output/` - Generated nodes and edges (gitignored)
- `data/` - Downloaded source data (gitignored)

## Key Commands

- `just run` - Full pipeline (download -> transform -> postprocess)
- `just download` - Download NCBI gene_info data
- `just transform-all` - Run all transforms
- `just postprocess` - Split output by taxon
- `just test` - Run tests

## Postprocessing

This ingest includes a postprocessing step to split the output nodes file by taxon:
```bash
uv run koza split output/ncbi_gene_nodes.tsv in_taxon --remove-prefixes --output-dir output/by_taxon
```

This creates separate files per species in `output/by_taxon/`.

## Environment Variables

The taxon lookup module uses NCBI E-utilities and can be configured with:
- `NCBI_API_KEY` - NCBI API key for higher rate limits
- `NCBI_MAIL` - Email for NCBI E-utilities identification
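The postprocessing step in CLAUDE.md runs `koza split` on the `in_taxon` column so that each species gets its own nodes file under `output/by_taxon/`. The grouping idea can be illustrated with the standard library alone. This is not koza's implementation: the prefix stripping is only an assumed analogue of `--remove-prefixes`, and the `<stem>_<taxon>.tsv` file names and the `ncbi_gene_nodes` stem default are hypothetical.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def group_by_column(tsv_text, column, remove_prefixes=True):
    """Return (header, {key: rows}) for a TSV grouped on `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        key = row[column]
        # Assumed analogue of --remove-prefixes: "NCBITaxon:9606" -> "9606".
        if remove_prefixes and ":" in key:
            key = key.split(":", 1)[1]
        groups[key].append(row)
    return reader.fieldnames, dict(groups)


def write_groups(header, groups, out_dir, stem="ncbi_gene_nodes"):
    """Write one TSV per group into out_dir; the naming scheme is hypothetical."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for key, rows in groups.items():
        with (out / f"{stem}_{key}.tsv").open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=header, delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
```

Grouping in memory keeps the sketch short; a full NCBI gene_info export is large enough that a real splitter would more likely stream rows into per-taxon file handles.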
31 changes: 0 additions & 31 deletions CONTRIBUTING.md

This file was deleted.
