TSUMUGI (Trait-driven Surveillance for Mutation-based Gene module Identification) is a web tool that uses knockout (KO) mouse phenotype data from the International Mouse Phenotyping Consortium (IMPC) to extract and visualize gene modules based on phenotypic similarity.
TSUMUGI (紡ぎ in Japanese) comes from the idea of “weaving together gene groups that form phenotypes.”
This web app is available to everyone online👇️
🔗 https://larc-tsukuba.github.io/tsumugi/
TSUMUGI supports three kinds of input.
Enter a phenotype of interest to search for genes whose KO mice have similar overall phenotype profiles.
Phenotype names follow Mammalian Phenotype Ontology (MPO).
Specify one gene to search for other genes whose KO mice show similar phenotypes.
Gene symbols follow MGI.
Paste multiple genes (one per line). This extracts phenotypically similar genes among the genes in the list.
Caution
If no similar genes are found: No similar phenotypes were found among the entered genes.
If more than 200 similar genes are found: Too many genes submitted. Please limit the number to 200 or fewer.
TSUMUGI provides its raw data as gzipped JSONL files. Each record of genewise_phenotype_annotations.jsonl.gz contains:
- Gene symbol (e.g., "1110059G10Rik")
- Marker accession ID (e.g., "MGI:1913452")
- Phenotype term name/ID (e.g., "fused joints", "MP:0000137")
- Effect size (e.g., 0.0, 1.324)
- Significance flag (true/false)
- Zygosity ("Homo", "Hetero", "Hemi")
- Life stage ("Embryo", "Early", "Interval", "Late")
- Sexual dimorphism ("None", "Male", "Female")
- Disease annotation (e.g., [] or "Premature Ovarian Failure 18")
Example:
{"significant": true, "sexual_dimorphism": "Female", "effect_size": 0.0119677350763567, "marker_symbol": "4930447C04Rik", "zygosity": "Homo", "marker_accession_id": "MGI:1923051", "mp_term_id": "MP:0000063", "disease_annotation": ["Male Infertility With Azoospermia Or Oligozoospermia Due To Single Gene Mutation", "Premature Ovarian Failure 18", "Spermatogenic Failure 52"], "life_stage": "Early", "mp_term_name": "decreased bone mineral density"}
Each record of pairwise_similarity_annotations.jsonl.gz contains:
- Gene pair (gene1_symbol, gene2_symbol)
- phenotype_shared_annotations (per-phenotype metadata: life stage, zygosity, sexual dimorphism)
- phenotype_similarity_score (Phenodigm score, 0–100)
Example:
{"gene1_symbol": "1500009L16Rik", "gene2_symbol": "Aak1", "phenotype_shared_annotations": [{"mp_term_name": "increased circulating enzyme level", "life_stage": "Early", "zygosity": "Homo", "sexual_dimorphism": "None"}], "phenotype_similarity_score": 47}
The page transitions and draws the network automatically.
Important
Gene pairs with 3 or more shared abnormal phenotypes and phenotypic similarity > 0.0 are visualized.
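The visualization rule above can be approximated on the downloadable pairwise file. The following is a minimal sketch, not TSUMUGI's own code: the helper names `iter_jsonl_gz` and `is_visualized` and the gene symbols `GeneA`/`GeneB`/`GeneC` are made up for illustration, and the sample records are compressed in memory instead of being read from the real file.

```python
import gzip
import io
import json


def iter_jsonl_gz(stream):
    """Yield one dict per non-empty line of a gzipped JSONL byte stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def is_visualized(record, min_shared=3):
    """Mirror the stated rule: 3 or more shared phenotypes and similarity > 0."""
    shared = record["phenotype_shared_annotations"]
    return len(shared) >= min_shared and record["phenotype_similarity_score"] > 0


# Hypothetical records; the annotation is repeated only to reach three entries.
annotation = {
    "mp_term_name": "increased circulating enzyme level",
    "life_stage": "Early",
    "zygosity": "Homo",
    "sexual_dimorphism": "None",
}
records = [
    {"gene1_symbol": "GeneA", "gene2_symbol": "GeneB",
     "phenotype_shared_annotations": [annotation] * 3,
     "phenotype_similarity_score": 47},
    {"gene1_symbol": "GeneA", "gene2_symbol": "GeneC",
     "phenotype_shared_annotations": [annotation],
     "phenotype_similarity_score": 12},
]
payload = gzip.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))

visible = [r for r in iter_jsonl_gz(io.BytesIO(payload)) if is_visualized(r)]
print([(r["gene1_symbol"], r["gene2_symbol"]) for r in visible])
```

To run this against the real data, pass the path of pairwise_similarity_annotations.jsonl.gz to `iter_jsonl_gz` instead of the in-memory stream.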
Nodes represent genes. Click to see the list of abnormal phenotypes observed in that KO mouse; drag to rearrange positions.
Edges show shared phenotypes; click to view details.
Modules outline subnetworks of genes. Click a module to list phenotypes involving its member genes; drag modules to reposition them and avoid overlap.
Adjust network display from the left panel.
The phenotype similarity slider thresholds edges by their Phenodigm score, which is derived from Resnik similarity.
Note
For how we compute similarity, see: 👉 🔍 How We Calculate Phenotypically Similar Genes
Phenotype severity slider filters nodes by effect size (severity in KO mice). Higher values mean stronger impact.
Note
This slider is hidden for binary phenotypes (e.g., abnormal embryo development; binary list: 👉 here) and for gene or gene-list input.
Choose the genotype in which phenotypes appear:
- Homo: homozygous
- Hetero: heterozygous
- Hemi: hemizygous
Extract sex-specific phenotypes:
- Female
- Male
Filter by life stage in which phenotypes appear:
- Embryo
- Early (0–16 weeks)
- Interval (17–48 weeks)
- Late (49+ weeks)
Highlight genes linked to human disease (IMPC Disease Models Portal data).
Search gene names within the network.
Adjust layout, font size, edge width, and node repulsion (Cose layout).
Export the current network as PNG/CSV/GraphML.
CSV includes connected-component (module) IDs and phenotype lists per gene; GraphML is Cytoscape-compatible.
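Since the CSV identifies modules with connected components, the grouping can be illustrated with a small union-find. This is only a sketch under assumptions: the function name `assign_module_ids` is hypothetical, `GeneX`/`GeneY` are made-up symbols, and the numbering used in the actual export may differ.

```python
def assign_module_ids(edges):
    """Return {gene: module_id}; genes connected by any path share an ID."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    # Merge the two components joined by each edge.
    for gene1, gene2 in edges:
        root1, root2 = find(gene1), find(gene2)
        if root1 != root2:
            parent[root2] = root1

    # Number components in order of first appearance among sorted gene names.
    id_of_root = {}
    modules = {}
    for gene in sorted(parent):
        root = find(gene)
        id_of_root.setdefault(root, len(id_of_root) + 1)
        modules[gene] = id_of_root[root]
    return modules


edges = [("Maf", "Aamp"), ("Aamp", "Cacna1c"), ("GeneX", "GeneY")]
modules = assign_module_ids(edges)
print(modules)
```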
The TSUMUGI CLI allows you to use the latest IMPC data downloaded locally, and provides more fine-grained filtering and output options than the web tool.
- Recompute with IMPC statistical-results-ALL.csv.gz (optionally mp.obo, impc_phenodigm.csv).
- Filter by presence/absence of MP terms.
- Filter by gene list (comma-separated or text file).
- Outputs: GraphML (tsumugi build-graphml), offline webapp bundle (tsumugi build-webapp).
BioConda:
conda install -c conda-forge -c bioconda tsumugi
PyPI:
pip install tsumugi
You are ready if tsumugi --version prints the version.
- tsumugi run: Recompute the network from IMPC data
- tsumugi mp --include/--exclude (--pairwise/--genewise): Filter gene pairs or genes that contain / do not show an MP term
- tsumugi count --pairwise/--genewise (--min/--max): Filter by phenotype counts (pairwise or per gene)
- tsumugi score (--min/--max): Filter by phenotype similarity score (pairwise)
- tsumugi genes --keep/--drop: Keep/drop by gene list (comma-separated or text file)
- tsumugi life-stage --keep/--drop: Filter by life stage (Embryo/Early/Interval/Late)
- tsumugi sex --keep/--drop: Filter by sex (Male/Female/None)
- tsumugi zygosity --keep/--drop: Filter by zygosity (Homo/Hetero/Hemi)
- tsumugi build-graphml: Generate GraphML (Cytoscape, etc.)
- tsumugi build-webapp: Generate TSUMUGI webapp assets (local HTML/CSS/JS)
Note
All filtering subcommands stream JSONL to STDOUT.
Redirect with > if you want to save results to a file.
Important
All commands except tsumugi run require either pairwise_similarity_annotations.jsonl.gz or genewise_phenotype_annotations.jsonl.gz.
Both files can be downloaded from the TSUMUGI top page.
If --mp_obo is omitted, TSUMUGI uses the bundled mp.obo (data-version: releases/2025-08-27).
If --impc_phenodigm is omitted, TSUMUGI uses the file fetched on 2025-10-01 from the IMPC Disease Models Portal.
tsumugi run \
--output_dir ./tsumugi-output \
--statistical_results ./statistical-results-ALL.csv.gz \
--threads 8
Outputs: ./tsumugi-output contains genewise annotations (genewise_phenotype_annotations.jsonl.gz), pairwise similarity data (pairwise_similarity_annotations.jsonl.gz), and visualization assets (TSUMUGI-webapp).
Important
The TSUMUGI-webapp directory includes OS-specific launch scripts; double-click to open the local web app:
- Windows: open_webapp_windows.bat
- macOS: open_webapp_mac.command
- Linux: open_webapp_linux.sh
Extract gene pairs (or genes) that include phenotypes of interest, or pairs whose relevant phenotypes were measured but did not show significant abnormalities.
tsumugi mp [-h] (-i MP_ID | -e MP_ID) [-g | -p] [-m PATH_MP_OBO] [-a PATH_GENEWISE_ANNOTATIONS] [--in PATH_PAIRWISE_ANNOTATIONS]
[--life_stage LIFE_STAGE] [--sex SEX] [--zygosity ZYGOSITY]
-i MP_ID, --include MP_ID: Include genes/gene pairs that have the specified MP term (descendants included).
-e MP_ID, --exclude MP_ID: Return genes/gene pairs that were measured for the specified MP term (descendants included) and did not show a significant phenotype. Requires -a/--genewise_annotations.
-g, --genewise: Filter at gene level. Reads genewise_phenotype_annotations.jsonl(.gz). When using --genewise, specify -a/--genewise_annotations.
-p, --pairwise: Filter at gene-pair level. Targets pairwise_similarity_annotations.jsonl(.gz). If --in is omitted, reads from STDIN.
-m PATH_MP_OBO, --mp_obo PATH_MP_OBO: Path to Mammalian Phenotype ontology (mp.obo). If omitted, uses the bundled data/mp.obo.
-a PATH_GENEWISE_ANNOTATIONS, --genewise_annotations PATH_GENEWISE_ANNOTATIONS: Path to the genewise annotation file (JSONL/.gz). Required for --exclude; also specify when using --genewise.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
--life_stage LIFE_STAGE: Additional filter by life stage. Available values: Embryo, Early, Interval, Late.
--sex SEX: Additional filter by sexual dimorphism. Use the values present in annotations (e.g., Male, Female, None).
--zygosity ZYGOSITY: Additional filter by zygosity. Available values: Homo, Hetero, Hemi.
# Extract only gene pairs that include MP:0001146 (abnormal testis morphology) or descendant terms (e.g., MP:0004849 abnormal testis size)
tsumugi mp --include MP:0001146 \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_filtered.jsonl
# Extract gene pairs whose genes were measured for MP:0001146 (descendants included) but did not show a significant abnormality
tsumugi mp --exclude MP:0001146 \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_filtered.jsonl
# Extract significant gene-level annotations containing MP:0001146 (descendants included)
tsumugi mp --include MP:0001146 \
--genewise \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
> genewise_filtered.jsonl
# Extract genes measured for MP:0001146 (descendants included) that did not show a significant abnormality
tsumugi mp --exclude MP:0001146 \
--genewise \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
> genewise_no_phenotype.jsonl
Important
Descendant MP terms of the specified ID are also handled.
For example, if you specify MP:0001146 (abnormal testis morphology), descendant terms such as MP:0004849 (abnormal testis size) are considered as well.
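Descendant expansion of this kind can be sketched as a breadth-first walk over is_a relations. The code below is illustrative and is not TSUMUGI's implementation: `collect_descendants` is a hypothetical helper, and apart from the MP:0001146 to MP:0004849 relation mentioned above, every term ID in the toy mapping is invented.

```python
from collections import defaultdict, deque


def collect_descendants(term_id, parents_of):
    """Return all terms reachable downward from term_id via is_a links."""
    # Invert the child -> parents mapping into parent -> children.
    children_of = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].add(child)

    seen = set()
    queue = deque([term_id])
    while queue:
        current = queue.popleft()
        for child in children_of[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy child -> parents mapping (a real one would be parsed from mp.obo).
parents_of = {
    "MP:0004849": {"MP:0001146"},     # abnormal testis size is_a abnormal testis morphology
    "MP:TOY_CHILD": {"MP:0004849"},   # hypothetical grandchild term
    "MP:TOY_OTHER": {"MP:TOY_ROOT"},  # hypothetical unrelated branch
}

# Terms matched when MP:0001146 is specified: the term itself plus its descendants.
targets = {"MP:0001146"} | collect_descendants("MP:0001146", parents_of)
print(sorted(targets))
```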
tsumugi count [-h] (-g | -p) [--min MIN] [--max MAX] [--in PATH_PAIRWISE_ANNOTATIONS] [-a PATH_GENEWISE_ANNOTATIONS]
Filter genes or gene pairs by the number of phenotypes. At least one of --min or --max is required.
-g, --genewise: Filter by the number of significant phenotypes per gene. Requires -a/--genewise_annotations with genewise_phenotype_annotations.jsonl(.gz).
-p, --pairwise: Filter by the number of shared phenotypes per gene pair. If --in is omitted, reads pairwise_similarity_annotations.jsonl(.gz) from STDIN.
--min MIN, --max MAX: Lower/upper bounds for phenotype counts. Use either flag alone for one-sided filtering.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
-a PATH_GENEWISE_ANNOTATIONS, --genewise_annotations PATH_GENEWISE_ANNOTATIONS: Path to the genewise annotation file (JSONL/.gz). Required with --genewise.
- Shared phenotypes per pair:
tsumugi count --pairwise --min 3 --max 20 \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_min3_max20.jsonl
- Phenotypes per gene (genewise required):
tsumugi count --genewise --min 5 --max 50 \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
--in pairwise_similarity_annotations.jsonl.gz \
> genewise_min5_max50.jsonl
Specifying only --min or only --max is also valid.
tsumugi score [-h] [--min MIN] [--max MAX] [--in PATH_PAIRWISE_ANNOTATIONS]
Filter gene pairs by phenotype_similarity_score (0–100). At least one of --min or --max is required.
--min MIN, --max MAX: Lower/upper bounds for phenotype similarity score. Use either flag alone for one-sided filtering.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
tsumugi score --min 50 --max 80 \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_score50_80.jsonl
Specifying only --min or only --max is also valid.
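For readers who post-process the JSONL output in Python, the same kind of one-sided or two-sided score filter might look like the sketch below. It is not the CLI's implementation: `filter_by_score` is a hypothetical name, the gene symbols are invented, and treating both bounds as inclusive is an assumption, since the text does not say whether the CLI bounds are inclusive.

```python
def filter_by_score(records, min_score=None, max_score=None):
    """Yield records whose phenotype_similarity_score lies within the bounds.

    Bounds are treated as inclusive here (an assumption); None means unbounded.
    """
    if min_score is None and max_score is None:
        raise ValueError("at least one of min_score or max_score is required")
    for record in records:
        score = record["phenotype_similarity_score"]
        if min_score is not None and score < min_score:
            continue
        if max_score is not None and score > max_score:
            continue
        yield record


# Hypothetical pairwise records reduced to the fields used here.
pairs = [
    {"gene1_symbol": "GeneA", "gene2_symbol": "GeneB", "phenotype_similarity_score": 47},
    {"gene1_symbol": "GeneA", "gene2_symbol": "GeneC", "phenotype_similarity_score": 65},
    {"gene1_symbol": "GeneB", "gene2_symbol": "GeneC", "phenotype_similarity_score": 92},
]

two_sided = [p["phenotype_similarity_score"] for p in filter_by_score(pairs, 50, 80)]
lower_only = [p["phenotype_similarity_score"] for p in filter_by_score(pairs, min_score=50)]
print(two_sided, lower_only)
```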
tsumugi genes [-h] (-k GENE_SYMBOL | -d GENE_SYMBOL) [-g | -p] [--in PATH_PAIRWISE_ANNOTATIONS]
-k GENE_SYMBOL, --keep GENE_SYMBOL: Keep only pairs containing the specified genes (comma-separated or text file).
-d GENE_SYMBOL, --drop GENE_SYMBOL: Drop pairs containing the specified genes (comma-separated or text file).
-g, --genewise: Filter by user-provided gene symbols.
-p, --pairwise: Filter by user-provided gene pairs.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
cat << EOF > genes.txt
Maf
Aamp
Cacna1c
EOF
tsumugi genes --genewise --keep genes.txt \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_keep_genes.jsonl
cat << EOF > gene_pairs.csv
Maf,Aamp
Maf,Cacna1c
EOF
tsumugi genes --pairwise --drop gene_pairs.csv \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_drop_genes.jsonl
tsumugi life-stage [-h] (-k LIFE_STAGE | -d LIFE_STAGE) [--in PATH_PAIRWISE_ANNOTATIONS]
-k LIFE_STAGE, --keep LIFE_STAGE: Keep only annotations with the specified life stage (Embryo, Early, Interval, Late).
-d LIFE_STAGE, --drop LIFE_STAGE: Drop annotations with the specified life stage.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
tsumugi life-stage --keep Early \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_lifestage_early.jsonl
tsumugi sex [-h] (-k SEX | -d SEX) [--in PATH_PAIRWISE_ANNOTATIONS]
-k SEX, --keep SEX: Keep only annotations with the specified sexual dimorphism (Male, Female, None).
-d SEX, --drop SEX: Drop annotations with the specified sexual dimorphism.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
tsumugi sex --drop Male \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_no_male.jsonl
tsumugi zygosity [-h] (-k ZYGOSITY | -d ZYGOSITY) [--in PATH_PAIRWISE_ANNOTATIONS]
-k ZYGOSITY, --keep ZYGOSITY: Keep only annotations with the specified zygosity (Homo, Hetero, Hemi).
-d ZYGOSITY, --drop ZYGOSITY: Drop annotations with the specified zygosity.
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
tsumugi zygosity --keep Homo \
--in pairwise_similarity_annotations.jsonl.gz \
> pairwise_homo.jsonl
tsumugi build-graphml [-h] [--in PATH_PAIRWISE_ANNOTATIONS] -a PATH_GENEWISE_ANNOTATIONS
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
-a PATH_GENEWISE_ANNOTATIONS, --genewise_annotations PATH_GENEWISE_ANNOTATIONS: Path to the genewise annotation file (JSONL/.gz). Required.
tsumugi build-graphml \
--in pairwise_similarity_annotations.jsonl.gz \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
> network.graphml
tsumugi build-webapp [-h] [--in PATH_PAIRWISE_ANNOTATIONS] -a PATH_GENEWISE_ANNOTATIONS -o OUT
--in PATH_PAIRWISE_ANNOTATIONS: Path to the pairwise annotation file (JSONL/.gz). If omitted, reads from STDIN.
-a PATH_GENEWISE_ANNOTATIONS, --genewise_annotations PATH_GENEWISE_ANNOTATIONS: Path to the genewise annotation file (JSONL/.gz). Required.
-o OUT, --output_dir OUT: Output directory for the webapp bundle (HTML/CSS/JS + network data). Do not specify a filename with an extension.
tsumugi build-webapp \
--in pairwise_similarity_annotations.jsonl.gz \
--genewise_annotations genewise_phenotype_annotations.jsonl.gz \
--output_dir ./webapp_output
The CLI supports STDIN/STDOUT, so you can chain commands:
zcat pairwise_similarity_annotations.jsonl.gz | tsumugi mp ... | tsumugi genes ... > out.jsonl
We use the IMPC dataset Release-23.0 statistical-results-ALL.csv.gz.
See dataset columns: Data fields
Extract gene–phenotype pairs whose KO mouse P-values (p_value, female_ko_effect_p_value, or male_ko_effect_p_value) are ≤ 0.0001.
- Annotate genotype-specific phenotypes as homo, hetero, or hemi.
- Annotate sex-specific phenotypes as female or male.
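The significance filter described above can be sketched with the standard csv module. This is an illustrative reading, not the actual preprocessing code: it assumes a row qualifies when any one of the three P-value columns is at or below 0.0001, that empty cells mean "not available", and that the file carries `marker_symbol` and `mp_term_id` columns; the rows and the `GeneA`/`GeneB`/`GeneC` symbols are fabricated sample input.

```python
import csv
import io

P_VALUE_COLUMNS = ("p_value", "female_ko_effect_p_value", "male_ko_effect_p_value")
THRESHOLD = 1e-4


def is_significant(row):
    """True if any available P-value column is at or below the threshold."""
    for column in P_VALUE_COLUMNS:
        raw = row.get(column) or ""
        if raw == "":
            continue  # missing value: skip (assumption)
        if float(raw) <= THRESHOLD:
            return True
    return False


# Fabricated rows standing in for statistical-results-ALL.csv.gz.
sample_csv = """marker_symbol,mp_term_id,p_value,female_ko_effect_p_value,male_ko_effect_p_value
GeneA,MP:0000063,0.00001,,
GeneB,MP:0000063,0.2,0.00005,0.3
GeneC,MP:0000063,0.01,,0.5
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
significant = [row["marker_symbol"] for row in rows if is_significant(row)]
print(significant)
```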
TSUMUGI adopts a Phenodigm-like approach (Smedley D, et al. (2013)).
Note
Differences from the original Phenodigm are as follows.
- Terms below the 5th percentile of IC are set to IC=0, so overly general phenotypes (e.g., embryo phenotype) are not evaluated.
- We apply weighting based on metadata matches in genotype, life stage, and sex.
- Build the MP ontology and compute Information Content (IC) for each term:
  IC(term) = -log((|Descendants(term)| + 1) / |All MP terms|)
  Terms below the 5th percentile of IC are set to IC=0.
- For each MP term pair, find the most informative common ancestor (MICA) and use its IC as the Resnik similarity.
- For two MP terms, compute the Jaccard index of their ancestor sets.
- Define MP term-pair similarity as sqrt(Resnik * Jaccard).
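The term-pair similarity defined above can be worked through on a toy ontology. This is a sketch under stated assumptions rather than TSUMUGI's code: the five term IDs are invented, each ancestor set is assumed to include the term itself, and the 5th-percentile IC cutoff is left out for brevity.

```python
import math

# Toy ontology: term -> set of direct parents (all IDs hypothetical).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors (assumption: self included)."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


ANCESTORS = {term: ancestors(term) for term in PARENTS}


def information_content(term):
    """IC(term) = -log((|Descendants| + 1) / |All terms|), written as log(N / (n + 1))."""
    n_descendants = sum(
        1 for other in PARENTS if other != term and term in ANCESTORS[other]
    )
    return math.log(len(PARENTS) / (n_descendants + 1))


IC = {term: information_content(term) for term in PARENTS}


def term_similarity(term1, term2):
    """sqrt(Resnik * Jaccard) for one pair of terms."""
    common = ANCESTORS[term1] & ANCESTORS[term2]
    resnik = max(IC[t] for t in common)  # IC of the most informative common ancestor
    jaccard = len(common) / len(ANCESTORS[term1] | ANCESTORS[term2])
    return math.sqrt(resnik * jaccard)


print(term_similarity("A1", "A2"))  # siblings under "A"
print(term_similarity("A1", "B"))   # only the root in common
```

In the toy data, the siblings share the ancestor "A", so their similarity is positive, while terms that meet only at the root score 0 because the root has IC 0.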
- Apply weights based on phenotype metadata: genotype, life stage, and sex.
- For each gene pair, build an MP-term × MP-term similarity matrix.
- Multiply by weights 0.2, 0.5, 0.75, 1.0 for 0, 1, 2, 3 matches of genotype/life stage/sex.
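The metadata weighting can be expressed as a small lookup. The sketch below assumes that genotype, life stage, and sex correspond to the `zygosity`, `life_stage`, and `sexual_dimorphism` fields of the JSONL records; `metadata_weight` is a hypothetical helper name.

```python
# Weight by number of matching metadata fields, as listed above.
MATCH_WEIGHTS = {0: 0.2, 1: 0.5, 2: 0.75, 3: 1.0}
METADATA_KEYS = ("zygosity", "life_stage", "sexual_dimorphism")


def metadata_weight(annotation1, annotation2):
    """Return the weight for two phenotype annotations based on metadata matches."""
    matches = sum(annotation1[key] == annotation2[key] for key in METADATA_KEYS)
    return MATCH_WEIGHTS[matches]


a = {"zygosity": "Homo", "life_stage": "Early", "sexual_dimorphism": "None"}
b = {"zygosity": "Homo", "life_stage": "Late", "sexual_dimorphism": "None"}
c = {"zygosity": "Hetero", "life_stage": "Embryo", "sexual_dimorphism": "Male"}

print(metadata_weight(a, a), metadata_weight(a, b), metadata_weight(a, c))
```

Each term-pair similarity in the matrix would then be multiplied by the weight of the corresponding annotation pair.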
- Apply Phenodigm-style scaling to normalize each KO mouse phenotype similarity to 0–100:
Compute observed max/mean, then normalize by theoretical max/mean.
Score = 100 * (normalized_max + normalized_mean) / 2
If the denominator is 0, the score is set to 0.
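The final scaling step reduces to a few lines once the observed and theoretical values are known. The sketch below takes those four numbers as inputs and does not attempt to show how TSUMUGI derives the theoretical max/mean; returning 0 when either theoretical value is 0 is one reading of the zero-denominator rule, and `phenodigm_score` is a hypothetical name.

```python
def phenodigm_score(observed_max, observed_mean, theoretical_max, theoretical_mean):
    """Score = 100 * (normalized_max + normalized_mean) / 2, or 0 on a zero denominator."""
    if theoretical_max == 0 or theoretical_mean == 0:
        return 0.0  # assumption: any zero denominator yields a score of 0
    normalized_max = observed_max / theoretical_max
    normalized_mean = observed_mean / theoretical_mean
    return 100 * (normalized_max + normalized_mean) / 2


print(phenodigm_score(1.0, 0.5, 2.0, 1.0))
print(phenodigm_score(1.0, 0.5, 0.0, 1.0))
```

With observed values at half of their theoretical counterparts, both normalized terms are 0.5 and the score is 50.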
- Google Form: https://forms.gle/ME8EJZZHaRNgKZ979
- GitHub Issues: https://github.com/akikuno/TSUMUGI-dev/issues/new/choose