LewisLabUCSD/iCHO3K

iCHO3K

iCHO3K metabolic network hamster

A community-driven, genome-scale metabolic reconstruction and analysis toolkit for Chinese Hamster Ovary (CHO) cells.

Highlights

  • Scope: 11,004 reactions • 7,377 metabolites • 3,597 genes • 3,489 mapped protein structures
  • Use cases: context-specific modeling (WT vs ZeLa), pFBA/ecFBA, flux sampling, subsystem coverage, structure-guided hypotheses (e.g., putative PEP allosteric inhibition of PFK)
  • Artifacts: curated datasets, notebooks (Python & MATLAB), figures, and utilities to reproduce key analyses

If you use this repository or the iCHO3K model, please see Citing below.



Repository layout

Analyses/
├── conf_score_distribution.png            # Confidence Score distribution across all reactions from iCHO3K
├── data_preprocessing                     # ZeLa vs WT growth rate and spent media data analysis
├── growth_rate_pred/                      # pFBA simulations from ZeLa and WT context-specific models
├── recons_comparisons/                    # Plot comparisons between iCHO3K and previous CHO reconstructions
├── Relevant_mets/                         # Analysis of subsystems related to metabolites relevant to biomass synthesis
├── sampled_results/
├── subsystem_overview/                    # Subsystem/System classification sunburst plot
└── tSNE/                                  # t-SNE embedding analysis

Data/                         
├── Context_specific_models/              # Context-specific ZeLa and WT models (MAT, JSON)
│   ├── ecModels/                         # Context-specific ec models
│   └── unblocked_ecModel_generic/        # Generic iCHO3K ec model
│
├── GPR_Curation/                         # Supplementary data for GPR Mapping from Recon3D to iCHO3K
├── Gene_Essentiality/                    # Set of experimentally validated CHO essential genes
├── Metabolites/                          # Supplementary data for metabolites information
├── Orthologs/                            # Ortholog mapping information from Human to Chinese Hamster
├── Reconciliation/                       # Source reconstructions & derived models and datasets
│   ├── datasets/
│   └── models/
│
├── Uptake_Secretion_Rates/               # Pre-processed uptake and secretion rates from ZeLa fed-batch data
└── ZeLa Data/                            # ZeLa 14 fed-batch raw transcriptomics and spent media data

iCHO3K/
├── Dataset/                              # iCHO3K source dataset for model generation
└── Model/                                # iCHO3K generic model variants

Matlab/
├── ecFBA/                               # ecFBA scripts
├── Model_Extraction/                    # Model Extraction with mCADRE scripts
├── Standep/                             # ubiquityScore calculation with Standep
├── main_Model_Extraction.m              # Main code for mCADRE model extraction
└── main_standep.m                       # Main code for Standep data preprocessing

Python/
├── Network Reconstruction/              # Notebooks related to the reconciliation of previous reconstructions and building of iCHO3K
│   ├── Genes.ipynb                      # Retrieval of Gene information from databases
│   ├── Metabolites.ipynb                # Integration of metabolite information, de-duplication and analysis
│   ├── Reactions.ipynb                  # Reconciliation of previous CHO and Recon3D reconstructions, de-duplication, subsystem re-organization
│   └── retrieveTurnoverNumber.ipynb     # Fetch turnover numbers and molecular weights from BRENDA
│                 
├── Supplementary Notebooks/             # Supplementary Notebooks with extra information of previous reconstructions
├── Comparison..Reconstructions.ipynb    # Comparison of iCHO3K with previous CHO reconstructions
├── Computational_Tests.ipynb            # 
├── Final CHO Model.ipynb
├── Calculate_specific_rates.ipynb       # Preprocess of spent media data into GEM fluxes
└── ZeLa_fluxomics.ipynb                 # ZeLa fluxomics data

Large files: Some assets may use Git LFS. If you see pointer files, run:

git lfs install && git lfs pull
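To check which files are still un-fetched pointers, a minimal sketch like the one below can help. It relies on the fact that a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`; the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-fetched Git LFS pointer file."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root="."):
    """List files under `root` that are still LFS pointers, skipping .git/."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )


# Usage (hypothetical): list pointer files under Data/
#   for p in find_lfs_pointers("Data"):
#       print(p)
```

An empty result suggests that every file under the scanned folder has already been pulled.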

Installation & setup

Recommended: conda environment

conda create -n icho3k python=3.11 -y
conda activate icho3k
pip install cobra pandas numpy scipy matplotlib optlang networkx jupyterlab escher seaborn

Optional (for graph utilities & network export):

pip install ndex2 pygraphviz

If environment.yml or requirements.txt is provided in the repo or a release, prefer installing from those for exact reproducibility.

MATLAB (optional)

  • MATLAB R2022b+ recommended (earlier versions likely workable).
  • Add Notebooks/Matlab/ to your MATLAB path.

Quickstart (Python)

1) Load a context-specific model and run pFBA

import cobra
from cobra.flux_analysis import pfba

model = cobra.io.load_json_model("Data/Context_specific_models/ZeLa_model.json")
solution = pfba(model)
# After pFBA, objective_value is the minimized total flux, not the growth rate
print(f"Total flux (pFBA objective): {solution.objective_value:.4f}")

# Top 10 absolute fluxes
top = sorted(solution.fluxes.items(), key=lambda x: abs(x[1]), reverse=True)[:10]
for rxn, v in top:
    print(f"{rxn:25s} {v:10.3f}")

2) Compare WT vs ZeLa growth under the same media

import cobra, pandas as pd

wt   = cobra.io.load_json_model("Data/Context_specific_models/WT_model.json")
zela = cobra.io.load_json_model("Data/Context_specific_models/ZeLa_model.json")

# Example: harmonize key exchange bounds
for ex in ["EX_glc__D_e", "EX_gln__L_e", "EX_o2_e"]:
    for m in (wt, zela):
        if ex in m.reactions:
            m.reactions.get_by_id(ex).lower_bound = -10.0

res = []
for name, m in [("WT", wt), ("ZeLa", zela)]:
    sol = m.optimize()
    res.append({"model": name, "mu": sol.objective_value})

print(pd.DataFrame(res))

3) Flux sampling (optlang-compatible solver required)

import cobra
from cobra.sampling import sample

model = cobra.io.load_json_model("Data/Context_specific_models/WT_model.json")
samples = sample(model, n=1000)   # DataFrame
samples.to_csv("Analyses/sampled_results/wt_samples.csv", index=False)

Quickstart (MATLAB)

% Ensure COBRA Toolbox is installed & initialized
initCobraToolbox(false)  % without updates
changeCobraSolver('glpk', 'LP');

% Load a JSON context-specific model
model = readCbModel('Data/Context_specific_models/WT_model.json');

% Optimize & print objective
solution = optimizeCbModel(model);
fprintf('Growth objective: %.4f\n', solution.f);

MATLAB scripts for extraction, flux sampling, and context-specific modeling are under Notebooks/Matlab/.


Reproducing key analyses

Many figures in Analyses/ are generated from notebooks in Notebooks/:

  • Reconstruction comparisons
    Notebooks/Comparison of Metabolic Reconstructions.ipynb → Analyses/recons_comparisons/
  • Subsystem coverage & sunbursts
    Analyses/subsystem_overview/
  • Flux enrichment & sampling
    Analyses/flux_enrichment_analysis/, Analyses/sampled_results/
  • Growth rate prediction (WT vs ZeLa)
    Analyses/growth_rate_pred/
  • Topology & t-SNE
    Analyses/tSNE/

Most notebooks begin with a “Paths & Environment” cell—update paths as needed. For strict reproducibility, pin exact package versions via environment.yml and use releases/DOI snapshots.
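As a starting point for pinning versions, the sketch below records what is installed in the active environment using only the standard library. The package list mirrors the `pip install` line in the setup section; the helpers `installed_versions` and `as_requirements` are illustrative names, not utilities shipped with this repository.

```python
from importlib import metadata

# Packages named in the installation instructions; adjust as needed.
PACKAGES = ["cobra", "pandas", "numpy", "scipy", "matplotlib", "optlang", "networkx"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Render 'name==version' pins for the packages that are installed."""
    return "\n".join(
        f"{name}=={version}"
        for name, version in sorted(versions.items())
        if version is not None
    )


print(as_requirements(installed_versions(PACKAGES)))
```

The printed pins can be pasted into a requirements.txt alongside the notebook outputs they were used to generate.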


Data & provenance

Curated inputs and derived artifacts are organized under Data/. Key elements:

  • Source reconstructions: Reconciliation/datasets/ (inputs) and Reconciliation/models/ (intermediate models).
  • Annotations & mappings: Metabolites/, Subsystem/, Orthologs/.
  • Evidence & curation: GPR_Curation/, Gene_Essentiality/, kcat_values/.
  • Experimental constraints: Uptake_Secretion_Rates/, ZeLa Data/.
  • Secretory overlap: Sec_Recon_shared_genes/.
  • Final model: iCHO3K_final/ (Excel format; conversion notebooks provided).

During manual curation, compartment and subsystem information were inherited from source reconstructions; discrepancies were resolved using authoritative resources (see notes within notebooks).


Model formats & I/O

  • Excel: Final iCHO3K lives in Data/iCHO3K_final/ for inspection and conversion.
  • SBML / JSON: Preferred for simulation. Use conversion notebooks (e.g., Notebooks/Final CHO Model.ipynb) or COBRApy I/O:
    import cobra
    m = cobra.io.load_json_model("path/to/model.json")
    cobra.io.save_json_model(m, "out.json")
    cobra.io.write_sbml_model(m, "out.xml")
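When a script has to accept models in several formats, one option is to choose the COBRApy reader from the file suffix. The sketch below is an assumption about how that could be wired up, not the repository's own loader; `READERS`, `reader_name`, and `load_model` are hypothetical names, while the `cobra.io` functions they point to are the standard COBRApy readers.

```python
from pathlib import Path

# File suffix -> name of the matching reader function in cobra.io.
READERS = {
    ".json": "load_json_model",
    ".xml": "read_sbml_model",
    ".sbml": "read_sbml_model",
    ".mat": "load_matlab_model",
    ".yml": "load_yaml_model",
    ".yaml": "load_yaml_model",
}


def reader_name(path):
    """Return the cobra.io reader name for `path`, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported model format: {suffix!r}") from None


def load_model(path):
    """Load a model with whichever cobra.io reader matches the file suffix."""
    import cobra.io  # imported lazily so reader_name() works without COBRApy

    return getattr(cobra.io, reader_name(path))(str(path))
```

Excel files are deliberately left out of the mapping, since the final Excel model is converted through the notebooks rather than read directly.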

Some scripts expect standardized BiGG-style IDs. See Notebooks/metabolite_identifiers.py for mapping helpers.


Solvers & performance tips

  • LP/QP solvers: GLPK (free), CPLEX/Gurobi (commercial, academic licenses). Set via COBRApy:
    import cobra
    cobra.Configuration().solver = "glpk"  # or "gurobi", "cplex"
  • Speed: Prefer commercial solvers for large sampling tasks; reduce model size using context-specific models; cache solutions where possible.
  • Numerics: Tighten feasibility/optimality tolerances for sensitive analyses.
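To avoid hard-coding a solver that may not be installed, a small helper can pick the first available one from a preference list. This is a sketch under two assumptions: the preference order (commercial solvers first, GLPK as fallback) is only a suggestion, and availability is passed in as a name-to-boolean mapping in the style of `optlang.available_solvers`. The name `pick_solver` is hypothetical.

```python
# Preference order is an assumption: commercial solvers first, GLPK as fallback.
PREFERENCE = ("gurobi", "cplex", "glpk")


def pick_solver(available, preference=PREFERENCE):
    """Return the first preferred solver that is flagged as available.

    `available` maps solver names to booleans, in the style of
    optlang.available_solvers (whose keys are upper-case, e.g. "GLPK").
    """
    usable = {name.lower() for name, ok in available.items() if ok}
    for name in preference:
        if name in usable:
            return name
    raise RuntimeError("None of the preferred solvers is installed")


# Usage with COBRApy (requires cobra and optlang to be installed):
#   import cobra, optlang
#   cobra.Configuration().solver = pick_solver(optlang.available_solvers)
```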

Contributing

Contributions are welcome!

  1. Issues: Report bugs, request features, or flag data discrepancies.
  2. PRs: Use feature branches; include a clear description, minimal reproducible example or notebook, and updated docs.
  3. Style: PEP 8 for Python; strip heavy notebook outputs before committing.
  4. Data: Avoid committing large binaries; use Git LFS or attach to Releases/Zenodo.

If contributing new datasets or model variants, please include:

  • Data dictionary (column descriptions, units)
  • Provenance (source links/versions)
  • Minimal script/notebook to regenerate derived artifacts

Citing

If you use iCHO3K or materials from this repository, please cite the bioRxiv preprint:

Di Giusto, P., Choi, D.-H., et al. (2025). A community-consensus reconstruction of Chinese Hamster metabolism enables structural systems biology analyses to decipher metabolic rewiring in lactate-free CHO cells. bioRxiv. https://doi.org/10.1101/2025.04.10.647063 (v1 posted April 17, 2025).


License

See the LICENSE file in this repository for terms of use.
If no license is present, usage defaults to “all rights reserved” until a license is added.


Maintainers & contact


Acknowledgments

We thank the iCHO3K community contributors and collaborators (including secRecon curators). This work builds upon public resources: Recon3D, BiGG, MetaNetX, Rhea, UniProt, BRENDA, and others referenced throughout the notebooks.


FAQ / Troubleshooting

I see .gitattributes LFS pointers instead of files.
Run:

git lfs install
git lfs pull

Solver not found / poor performance.
Install an LP solver (GLPK works; Gurobi/CPLEX recommended for speed). Set cobra.Configuration().solver = "gurobi" once installed.

Model won’t optimize (infeasible).

  • Harmonize exchange bounds across conditions.
  • Check blocked reactions / dead-end metabolites.
  • For comparative runs (WT vs ZeLa), ensure identical media constraints.
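One way to check that two models share the same media constraints is to diff their exchange bounds before optimizing. The sketch below assumes COBRApy models (it uses `model.exchanges` and `reaction.bounds`); the helper names `exchange_bounds` and `diff_bounds` are illustrative and not part of the repository.

```python
def exchange_bounds(model):
    """Collect {reaction_id: (lower, upper)} for a COBRApy model's exchanges."""
    return {rxn.id: tuple(rxn.bounds) for rxn in model.exchanges}


def diff_bounds(bounds_a, bounds_b):
    """Return exchanges whose bounds differ or that exist in only one model.

    Each value is a pair (bounds_in_a, bounds_in_b); a side is None when the
    reaction is missing from that model.
    """
    diffs = {}
    for rxn_id in sorted(set(bounds_a) | set(bounds_b)):
        a, b = bounds_a.get(rxn_id), bounds_b.get(rxn_id)
        if a != b:
            diffs[rxn_id] = (a, b)
    return diffs


# Usage (with wt and zela loaded as in the Quickstart):
#   for rxn_id, (a, b) in diff_bounds(exchange_bounds(wt), exchange_bounds(zela)).items():
#       print(rxn_id, a, b)
```

An empty result means both models see identical exchange constraints, so remaining differences in growth come from the network content rather than the media.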

Notebook paths are wrong.
Edit the first “Paths & Environment” cell—most notebooks expose a single place to set root paths.
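If you would rather not edit paths by hand, a paths cell can locate the repository root automatically. The sketch below walks upward from the working directory until it finds a folder containing both `Data/` and `Analyses/` (top-level folders in the repository layout); the function name `find_repo_root` and the choice of marker folders are assumptions, not something the notebooks currently define.

```python
from pathlib import Path

# Top-level folders from the repository layout, used here as root markers.
MARKERS = ("Data", "Analyses")


def find_repo_root(start=None, markers=MARKERS):
    """Walk upward from `start` until a folder containing all markers is found."""
    start = Path(start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No folder containing {markers} at or above {start}")


# Usage inside a notebook (hypothetical):
#   ROOT = find_repo_root()
#   MODEL_DIR = ROOT / "Data" / "Context_specific_models"
```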
