
A community-driven, genome-scale metabolic reconstruction and analysis toolkit for Chinese Hamster Ovary (CHO) cells.
Highlights
- Scope: 11,004 reactions • 7,377 metabolites • 3,597 genes • 3,489 mapped protein structures
- Use cases: context-specific modeling (WT vs ZeLa), pFBA/ecFBA, flux sampling, subsystem coverage, structure-guided hypotheses (e.g., putative PEP allosteric inhibition of PFK)
- Artifacts: curated datasets, notebooks (Python & MATLAB), figures, and utilities to reproduce key analyses
If you use this repository or the iCHO3K model, please see Citing below.
- Repository layout
- Installation & setup
- Quickstart (Python)
- Quickstart (MATLAB)
- Reproducing key analyses
- Data & provenance
- Model formats & I/O
- Solvers & performance tips
- Contributing
- Citing
- License
- Maintainers & contact
- Acknowledgments
- FAQ / Troubleshooting
Analyses/
├── conf_score_distribution.png # Confidence Score distribution across all reactions from iCHO3K
├── data_preprocessing # ZeLa vs WT growth rate and spent media data analysis
├── growth_rate_pred/ # pFBA simulations from ZeLa and WT context-specific models
├── recons_comparisons/ # Plot comparisons between iCHO3K and previous CHO reconstructions
├── Relevant_mets/ # Analysis of subsystems related to metabolites relevant to biomass synthesis
├── sampled_results/
├── subsystem_overview/ # Subsystem/System classification sunburst plot
└── tSNE/ # t-SNE embedding analysis
Data/
├── Context_specific_models/ # Context-specific ZeLa and WT models (MAT, JSON)
│ ├── ecModels/ # Context-specific ec models
│ └── unblocked_ecModel_generic/ # Generic iCHO3K ec model
│
├── GPR_Curation/ # Supplementary data for GPR Mapping from Recon3D to iCHO3K
├── Gene_Essentiality/ # Set of experimentally validated CHO essential genes
├── Metabolites/ # Supplementary data for metabolites information
├── Orthologs/ # Ortholog mapping information from Human to Chinese Hamster
├── Reconciliation/ # Source reconstructions & derived models and datasets
│ ├── datasets/
│ └── models/
│
├── Uptake_Secretion_Rates/ # Pre-processed uptake and secretion rates from ZeLa fed-batch data
└── ZeLa Data/ # ZeLa 14 fed-batch raw transcriptomics and spent media data
iCHO3K/
├── Dataset/ # iCHO3K source dataset for model generation
└── Model/ # iCHO3K generic model variants
Matlab/
├── ecFBA/ # ecFBA scripts
├── Model_Extraction/ # Model Extraction with mCADRE scripts
├── Standep/ # ubiquityScore calculation with Standep
├── main_Model_Extraction.m # Main code for mCADRE model extraction
└── main_standep.m # Main code for Standep data preprocessing
Python/
├── Network Reconstruction/ # Notebooks related to the reconciliation of previous reconstructions and building of iCHO3K
│ ├── Genes.ipynb # Retrieval of Gene information from databases
│ ├── Metabolites.ipynb # Integration of metabolite information, de-duplication and analysis
│ ├── Reactions.ipynb # Reconciliation of previous CHO and Recon3D reconstructions, de-duplication, subsystem re-organization
│ └── retrieveTurnoverNumber.ipynb # Fetch turnover numbers and molecular weights from BRENDA
│
├── Supplementary Notebooks/ # Supplementary Notebooks with extra information of previous reconstructions
├── Comparison..Reconstructions.ipynb # Comparison of iCHO3K with previous CHO reconstructions
├── Computational_Tests.ipynb
├── Final CHO Model.ipynb
├── Calculate_specific_rates.ipynb # Preprocessing of spent media data into GEM fluxes
└── ZeLa_fluxomics.ipynb # ZeLa fluxomics data
Large files: Some assets may use Git LFS. If you see pointer files, run:
git lfs install && git lfs pull
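To spot stale pointers programmatically, a small check like the following can help. This is a hypothetical helper, not part of the repository: LFS pointer stubs are short text files that begin with the LFS spec header.

```python
from pathlib import Path

def is_lfs_pointer(path) -> bool:
    """Return True if `path` looks like a Git LFS pointer stub rather than real data."""
    head = Path(path).read_bytes()[:100]
    return head.startswith(b"version https://git-lfs")

# Example: is_lfs_pointer("Analyses/conf_score_distribution.png")
```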
conda create -n icho3k python=3.11 -y
conda activate icho3k
pip install cobra pandas numpy scipy matplotlib optlang networkx jupyterlab escher seaborn
Optional (for graph utilities & network export):
pip install ndex2 pygraphviz
If environment.yml or requirements.txt is provided in the repo or a release, prefer installing from those for exact reproducibility.
- MATLAB R2022b+ recommended (earlier versions are likely workable).
- Add Notebooks/Matlab/ to your MATLAB path.
import cobra
from cobra.flux_analysis import pfba
model = cobra.io.load_json_model("Data/Context_specific_models/ZeLa_model.json")
solution = pfba(model)
print(f"Objective ({model.objective.direction}): {solution.objective_value:.4f}")
# Top 10 absolute fluxes
top = sorted(solution.fluxes.items(), key=lambda x: abs(x[1]), reverse=True)[:10]
for rxn, v in top:
    print(f"{rxn:25s} {v:10.3f}")
import cobra, pandas as pd
wt = cobra.io.load_json_model("Data/Context_specific_models/WT_model.json")
zela = cobra.io.load_json_model("Data/Context_specific_models/ZeLa_model.json")
# Example: harmonize key exchange bounds
for ex in ["EX_glc__D_e", "EX_gln__L_e", "EX_o2_e"]:
    for m in (wt, zela):
        if ex in m.reactions:
            m.reactions.get_by_id(ex).lower_bound = -10.0
res = []
for name, m in [("WT", wt), ("ZeLa", zela)]:
    sol = m.optimize()
    res.append({"model": name, "mu": sol.objective_value})
print(pd.DataFrame(res))
import cobra
from cobra.sampling import sample
model = cobra.io.load_json_model("Data/Context_specific_models/WT_model.json")
samples = sample(model, n=1000) # DataFrame
samples.to_csv("Analyses/sampled_results/wt_samples.csv", index=False)
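Downstream, the sampled CSV can be summarized per reaction with pandas. The toy DataFrame below stands in for the file written above; the reaction IDs and values are illustrative:

```python
import pandas as pd

# Stand-in for pd.read_csv("Analyses/sampled_results/wt_samples.csv");
# columns are reaction IDs, rows are flux samples (values illustrative).
samples = pd.DataFrame({
    "PFK":   [1.2, 1.5, 1.1, 1.4],
    "LDH_L": [0.2, -0.1, 0.0, 0.3],
})
summary = samples.agg(["mean", "std", "min", "max"]).T
print(summary)  # one row per reaction: mean, std, min, max
```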
% Ensure COBRA Toolbox is installed & initialized
initCobraToolbox(false) % without updates
changeCobraSolver('glpk', 'LP');
% Load a JSON context-specific model
model = readCbModel('Data/Context_specific_models/WT_model.json');
% Optimize & print objective
solution = optimizeCbModel(model);
fprintf('Growth objective: %.4f\n', solution.f);
MATLAB scripts for extraction, flux sampling, and context-specific modeling are under Notebooks/Matlab/.
Many figures in Analyses/ are generated from notebooks in Notebooks/:
- Reconstruction comparisons → Notebooks/Comparison of Metabolic Reconstructions.ipynb → Analyses/recons_comparisons/
- Subsystem coverage & sunbursts → Analyses/subsystem_overview/
- Flux enrichment & sampling → Analyses/flux_enrichment_analysis/, Analyses/sampled_results/
- Growth rate prediction (WT vs ZeLa) → Analyses/growth_rate_pred/
- Topology & t-SNE → Analyses/tSNE/
Most notebooks begin with a “Paths & Environment” cell—update paths as needed. For strict reproducibility, pin exact package versions via environment.yml and use releases/DOI snapshots.
Curated inputs and derived artifacts are organized under Data/. Key elements:
- Source reconstructions → Reconciliation/datasets/ (inputs) and Reconciliation/models/ (intermediate models).
- Annotations & mappings → Metabolites/, Subsystem/, Orthologs/.
- Evidence & curation → GPR_Curation/, Gene_Essentiality/, kcat_values/.
- Experimental constraints → Uptake_Secretion_Rates/, ZeLa Data/.
- Secretory overlap → Sec_Recon_shared_genes/.
- Final model → iCHO3K_final/ (Excel format; conversion notebooks provided).
During manual curation, compartment and subsystem information was inherited from source reconstructions; discrepancies were resolved using authoritative resources (see notes within the notebooks).
- Excel: The final iCHO3K lives in Data/iCHO3K_final/ for inspection and conversion.
- SBML / JSON: Preferred for simulation. Use the conversion notebooks (e.g., Notebooks/Final CHO Model.ipynb) or COBRApy I/O:

import cobra
m = cobra.io.load_json_model("path/to/model.json")
cobra.io.save_json_model(m, "out.json")
cobra.io.write_sbml_model(m, "out.xml")
Some scripts expect standardized BiGG-style IDs. See Notebooks/metabolite_identifiers.py for mapping helpers.
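As a sketch of the kind of normalization such helpers perform — the function and patterns below are illustrative, not the repository's actual code:

```python
import re

def to_bigg_style(met_id: str) -> str:
    """Illustrative normalizer: 'glc_D[e]' -> 'glc__D_e' (BiGG-style compartment suffix)."""
    m = re.match(r"^(.+?)\[(\w+)\]$", met_id)
    if m:  # bracketed compartment -> underscore suffix
        base, comp = m.groups()
        met_id = f"{base}_{comp}"
    # single underscore before a stereo-descriptor -> BiGG's double underscore
    return re.sub(r"(?<!_)_([DLRS])(?=_|$)", r"__\1", met_id)

print(to_bigg_style("glc_D[e]"))  # glc__D_e
```

The negative lookbehind keeps the function idempotent, so IDs that are already BiGG-style pass through unchanged.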
- LP/QP solvers: GLPK (free), CPLEX/Gurobi (commercial; academic licenses available). Set via COBRApy:

import cobra
cobra.Configuration().solver = "glpk"  # or "gurobi", "cplex"
- Speed: Prefer commercial solvers for large sampling tasks; reduce model size using context-specific models; cache solutions where possible.
- Numerics: Tighten feasibility/optimality tolerances for sensitive analyses.
Contributions are welcome!
- Issues: Report bugs, request features, or flag data discrepancies.
- PRs: Use feature branches; include a clear description, minimal reproducible example or notebook, and updated docs.
- Style: PEP 8 for Python; strip heavy notebook outputs before committing.
- Data: Avoid committing large binaries; use Git LFS or attach to Releases/Zenodo.
If contributing new datasets or model variants, please include:
- Data dictionary (column descriptions, units)
- Provenance (source links/versions)
- Minimal script/notebook to regenerate derived artifacts
If you use iCHO3K or materials from this repository, please cite the bioRxiv preprint:
Di Giusto, P., Choi, D.-H., et al. (2025). A community-consensus reconstruction of Chinese Hamster metabolism enables structural systems biology analyses to decipher metabolic rewiring in lactate-free CHO cells. bioRxiv. https://doi.org/10.1101/2025.04.10.647063 (v1 posted April 17, 2025).
See the LICENSE file in this repository for terms of use. If no license is present, usage defaults to “all rights reserved” until one is added.
- Pablo Di Giusto — [email protected] · [email protected]
Systems Biology & Cell Engineering Lab (Lewis Lab), UC San Diego & University of Georgia
We thank the iCHO3K community contributors and collaborators (including secRecon curators). This work builds upon public resources: Recon3D, BiGG, MetaNetX, Rhea, UniProt, BRENDA, and others referenced throughout the notebooks.
I see .gitattributes LFS pointers instead of files.
Run:
git lfs install
git lfs pull
Solver not found / poor performance.
Install an LP solver (GLPK works; Gurobi/CPLEX is recommended for speed). Once installed, set cobra.Configuration().solver = "gurobi".
Model won’t optimize (infeasible).
- Harmonize exchange bounds across conditions.
- Check blocked reactions / dead-end metabolites.
- For comparative runs (WT vs ZeLa), ensure identical media constraints.
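A minimal, duck-typed sketch of media harmonization — exchange IDs and bounds are illustrative, and with COBRApy models the helper works as-is since reaction lists support membership tests by ID:

```python
def apply_media(model, media: dict) -> None:
    """Set lower bounds for whichever exchange reactions exist in `model`."""
    for ex_id, lb in media.items():
        if ex_id in model.reactions:  # COBRApy DictList supports `in` by ID
            model.reactions.get_by_id(ex_id).lower_bound = lb

# Illustrative shared media definition for WT and ZeLa runs
media = {"EX_glc__D_e": -10.0, "EX_gln__L_e": -2.0, "EX_o2_e": -1000.0}
# apply_media(wt, media); apply_media(zela, media)
```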
Notebook paths are wrong.
Edit the first “Paths & Environment” cell—most notebooks expose a single place to set root paths.