Assessing the Relationship between Polyphenol Intake and the Gut Microbiome

This repository contains scripts for analyzing the relationship between dietary polyphenol intake and the taxonomic composition and polyphenol utilization capacity of the gut microbiome (Figure 1). Methods for deriving polyphenol intake were previously described by Wilson et al. (2024).¹ Code for ASA24-to-FooDB polyphenol intake estimation is available on GitHub.

Figure 1: Simplified visual overview of study methods. Current scripts analyze relationships between dietary polyphenol intake and microbial data.

Software

RStudio 2025.05.0+496 "Mariposa Orchid" using R 4.4.2, for data cleaning, analysis, and visualizations
TaxaHFE version 2.0, for feature reduction using taxaHFE-ML
- taxaHFE Paper
- GitHub Repository
dietML, for machine learning

Required Files

Data Availability

Food composition database data that are publicly available (FooDB, PhenolExplorer) are provided in this GitHub Repository. Requests for non-metagenomic data from the USDA ARS WHNRC Nutritional Phenotyping Study used in this analysis should be made via an email to the senior WHNRC author on the publication of interest. Requests are reviewed quarterly by a committee consisting of the study investigators.

Data

Dietary Recalls, Not Publicly Available, Originally downloaded as Items Analysis File from the ASA24 Researcher Site. The dietary data utilized in the USDA Phenotyping Study underwent quality control² then meal disaggregation.
FooDB, Publicly Available, Download from foodb.ca/downloads. FooDB data dictionaries for the Content, Compound, and Food CSV files are provided here.
Phenol Explorer Version 3.6, Publicly Available, Download from phenol-explorer.eu/downloads
dbPUP, Publicly Available, Download from the Yin Lab at the University of Nebraska-Lincoln here. dbPUP is described in the publication "Polyphenol Utilization Proteins in the Human Gut Microbiome".
USDA Phenotyping Study Fecal Metagenomes, Publicly Available, Download from:
- NCBI Sequence Read Archive SRP354271, 290 participants
- NCBI Sequence Read Archive SRP497208, 40 participants

Overview of Scripts

Scripts in each set are intended to be run sequentially.

Data Preparation. This set of scripts cleans and prepares cohort metadata, microbial, and polyphenol intake data for downstream analyses.
- A0a_metadata_prep.R
- A0b_microbiome_prep.R
- A0c_taxaHFE_prep.R
- A0d_alpha_diversity_prep.Rmd
- A0e_LPS_microbes_prep.Rmd
- A1a_dbPUP_substrate_prep.R
- A1b_dbPUP_file_prep.R
- A1c_dbPUP_class_prep.Rmd
- A2a_Substrate_MatchFooDBids.R
- A2b_Substrate_FooDB_Taxonomy.Rmd
- A2c_Substrate_Extract_Comp_Data.Rmd
- A2d_Substrate_ASA24_Match.Rmd
- A2e_dbPUP_class_compound_intake_prep.Rmd
Taxonomy Analyses. These scripts provide a statistical overview of the cohort by low and high polyphenol intake quartiles, followed by analyses of the relationship between total polyphenol intake and the alpha and beta diversity of the gut microbiome.
- B1_Participant_table.Rmd
- B2_LPS_producer_modelling.Rmd
- B3_alphaDiversity_models.Rmd
- B4a_taxaHFE_run.sh
- B4b_BetaDiv_permanova_loop.R
- B4c_BetaDiv_quartilenosf.R
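As a rough illustration of two ideas used in the taxonomy analyses (splitting participants into intake quartiles and summarizing a sample with an alpha diversity index), the Python sketch below uses only the standard library. It is not the repository's code, which is written in R. The function names, the choice of the Shannon index, the inclusive quartile cut points, and the example values are all assumptions made for illustration.

```python
import math
from statistics import quantiles


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def intake_quartile(values):
    """Assign each intake value to a quartile (1 = lowest, 4 = highest).

    Cut points come from statistics.quantiles with the "inclusive" method;
    the actual analysis may define quartiles differently.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]


# Hypothetical example: made-up total polyphenol intakes for eight participants
# and made-up taxon counts for one sample.
intakes = [310.0, 95.5, 780.2, 412.9, 150.3, 1020.8, 260.1, 640.7]
quartiles = intake_quartile(intakes)
low_vs_high = [(x, q) for x, q in zip(intakes, quartiles) if q in (1, 4)]

sample_counts = [120, 30, 0, 45, 5]
alpha = shannon_index(sample_counts)
```

Comparing only quartiles 1 and 4, as in `low_vs_high`, mirrors the low-versus-high intake contrast described above.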
Functional Analyses. These scripts analyze relationships between polyphenol intake (total, class, compound) and PUP gene counts, as well as the abundance of PUP-containing microbial genera. Analyses were run with covariates (rank-based estimation regression) and without covariates (Spearman correlation). Covariates included age, sex, BMI, fiber intake (g/1000 kcal), and diet quality (total HEI-2015 score). Summary plots were created for the covariate-adjusted models.
- C0a_PUP_Modelling_DataPrep.Rmd
- C1a_PUP_Correlation_Count.Rmd
- C1b_PUP_Correlation_Abundance.Rmd
- C1c_PUP_Correlation_Summary.Rmd
- C2a_PUP_Modelling_Count.Rmd
- C2b_PUP_Modelling_Abundance.Rmd
- C2c_PUP_Modelling_Summary.Rmd
- C3_PUP_Model_Comparisons.Rmd
- C4a_PUP_Taxonomy_Prevalence_PLOT.Rmd
- C4b_PUP_Modelling_PLOT.Rmd
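The unadjusted analyses above rely on Spearman correlation, which is Pearson correlation computed on ranks. The standard-library Python sketch below shows that computation with average ranks for ties. It is an illustration only, not the repository's R code; the function names and example values are hypothetical, and it does not compute p-values or handle missing data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Hypothetical example: made-up polyphenol intakes and PUP gene counts.
intake = [120.0, 340.5, 95.2, 410.0, 250.7]
pup_gene_count = [14, 22, 9, 30, 22]
rho = spearman_rho(intake, pup_gene_count)
```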
Inflammation Modelling. These scripts perform analyses related to lipopolysaccharide-binding protein (LBP). Linear models are first run to determine whether polyphenol intake alpha diversity predicts PUP gene alpha diversity and whether PUP gene alpha diversity predicts LBP. For the machine learning models, hierarchical feature engineering is performed on PUP genes and PUP-containing microbes. Engineered features are combined with covariates and run in random forest machine learning models. Performance and SHAP plots are generated.
- D0_PUP_diversity_ML_prep.Rmd
- D1_LBP_PUP_Alpha_diversity.Rmd
- D2_ML_HFE_PUP_run.sh
- D2a_ML_HFE_convert_output.Rmd
- D3_ML_dietML_multiseed_loop.sh
- D4a_ML_PUP_Performance.Rmd
- D4b_ML_PUP_SHAP.Rmd
- D5a_ML_PUP_MB_Performance.Rmd
- D5b_ML_PUP_MB_SHAP.Rmd
- D6a_ML_Publication_SHAP_Plot.Rmd
- D6b_ML_Publication_Correlation_Plot.Rmd
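The first step described above asks whether one continuous variable predicts another with a linear model. The standard-library Python sketch below fits a single-predictor least-squares line and reports the slope, intercept, and R². It is an illustration under simplifying assumptions: it omits covariates and significance tests, the function name and example values are hypothetical, and the repository's own models are fit in R.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical example: made-up alpha diversity values for six participants.
intake_alpha_div = [1.8, 2.1, 2.4, 2.9, 3.0, 3.3]
pup_alpha_div = [2.0, 2.2, 2.9, 3.1, 3.4, 3.6]
slope, intercept, r2 = simple_ols(intake_alpha_div, pup_alpha_div)
```

A positive `slope` here would indicate that higher intake diversity goes with higher PUP gene diversity in the made-up data; a covariate-adjusted model would add terms for the covariates.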
