NTWKKM/stat-shiny

---
title: shinystat
emoji: πŸ“Š
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

> [!IMPORTANT]
> πŸ€– **FOR AI AGENTS & BOTS** (Antigravity, Cursor, etc.):
>
> 1. **Architecture first:** ALWAYS read `docs/ARCHITECTURE.md` to understand the Styling System and Data Processing Pipeline before editing.
> 2. **Environment:** This project requires Python 3.12+. Versions < 3.12 will raise `SyntaxError`s due to the use of PEP 695 (type parameter syntax / type aliases).
> 3. **Context:** See `docs/deep-ai-bot-optimization.md` for detailed implementation rules.
| Requirement | Version |
| --- | --- |
| Python | β‰₯ 3.12 |
| Environment | `.venv` |

StatioMed is available at:
πŸ‘‰ HuggingFace Space
πŸ‘‰ Posit Cloud

[--- REPOSITORY-TREE-START ---]

πŸ“‚ Repository Contents (File Structure)

This content reflects the repository structure (updated by GitHub Actions):

|-- AI_CONTEXT.md
|-- Dockerfile
|-- app.py
|-- asgi.py
|-- config.py
|-- docker-compose.yml
|-- docs /
|   |-- ARCHITECTURE.md
|   `-- assets /
|       `-- navigation_sequence.png
|-- logger.py
|-- pyproject.toml
|-- pytest.ini
|-- requirements-prod.txt
|-- requirements.txt
|-- static /
|   |-- js /
|   |   `-- custom_handlers.js
|   `-- styles.css
|-- tabs /
|   |-- __init__.py
|   |-- _common.py
|   |-- _styling.py
|   |-- _tvc_components.py
|   |-- tab_advanced_inference.py
|   |-- tab_advanced_stats.py
|   |-- tab_agreement.py
|   |-- tab_baseline_matching.py
|   |-- tab_causal_inference.py
|   |-- tab_core_regression.py
|   |-- tab_corr.py
|   |-- tab_data.py
|   |-- tab_diag.py
|   |-- tab_home.py
|   |-- tab_sample_size.py
|   |-- tab_settings.py
|   `-- tab_survival.py
|-- tests /
|   |-- benchmarks /
|   |   |-- python_results /
|   |   |   |-- benchmark_firth_cox.csv
|   |   |   |-- benchmark_firth_logistic.csv
|   |   |   |-- dataset_breast.csv
|   |   |   `-- dataset_sex2.csv
|   |   `-- r_scripts /
|   |       `-- test_firth.R
|   |-- conftest.py
|   |-- e2e /
|   |   |-- test_app_flow.py
|   |   |-- test_load_example_data.py
|   |   `-- test_smart_defaults.py
|   |-- integration /
|   |   |-- test_advanced_features.py
|   |   |-- test_corr_cleaning.py
|   |   |-- test_corr_pipeline.py
|   |   |-- test_data_cleaning_pipeline.py
|   |   |-- test_diag_cleaning.py
|   |   |-- test_diag_pipeline.py
|   |   |-- test_forest_plot_lib.py
|   |   |-- test_interaction_pipeline.py
|   |   |-- test_logic_pipeline.py
|   |   |-- test_poisson_cleaning.py
|   |   |-- test_poisson_pipeline.py
|   |   |-- test_psm_pipeline.py
|   |   |-- test_robustness_check.py
|   |   |-- test_subgroup_pipeline.py
|   |   |-- test_survival_cleaning.py
|   |   |-- test_survival_landmark.py
|   |   |-- test_survival_pipeline.py
|   |   `-- test_table_one_pipeline.py
|   `-- unit /
|       |-- output /
|       |   `-- test_output_2.html
|       |-- test_advanced_stats.py
|       |-- test_bland_altman.py
|       |-- test_causal.py
|       |-- test_chi_html.py
|       |-- test_collinearity.py
|       |-- test_color_palette.py
|       |-- test_correlation_returns.py
|       |-- test_data_cleaning.py
|       |-- test_data_cleaning_advanced.py
|       |-- test_data_cleaning_workflow.py
|       |-- test_data_quality.py
|       |-- test_data_quality_report.py
|       |-- test_dca.py
|       |-- test_diag_returns.py
|       |-- test_diagnostic_advanced.py
|       |-- test_download_helpers.py
|       |-- test_effect_sizes.py
|       |-- test_firth_regression.py
|       |-- test_formatting.py
|       |-- test_formatting_styles.py
|       |-- test_glm.py
|       |-- test_heterogeneity.py
|       |-- test_linear_lib.py
|       |-- test_mediation.py
|       |-- test_mi_reporting.py
|       |-- test_missing_data.py
|       |-- test_model_diagnostics.py
|       |-- test_model_diagnostics_plots.py
|       |-- test_multiple_imputation.py
|       |-- test_phase3_features.py
|       |-- test_plotly_html_rendering.py
|       |-- test_poisson_lib.py
|       |-- test_regression_publication.py
|       |-- test_repeated_measures.py
|       |-- test_reporting_checklists.py
|       |-- test_sample_size.py
|       |-- test_sensitivity.py
|       |-- test_sensitivity_fixes.py
|       |-- test_statistical_assumptions.py
|       |-- test_statistics.py
|       |-- test_survival_assumptions.py
|       |-- test_survival_lib_patch.py
|       |-- test_tab_diag_html_logic.py
|       |-- test_tvc_lib.py
|       |-- test_ui_ux_styles.py
|       `-- verify_table_one.py
`-- utils /
    |-- __init__.py
    |-- advanced_stats_lib.py
    |-- agreement_lib.py
    |-- calibration_lib.py
    |-- collinearity_lib.py
    |-- correlation.py
    |-- data_cleaning.py
    |-- data_quality.py
    |-- decision_curve_lib.py
    |-- diag_test.py
    |-- diagnostic_advanced_lib.py
    |-- download_helpers.py
    |-- effect_sizes.py
    |-- forest_plot_lib.py
    |-- formatting.py
    |-- heterogeneity_lib.py
    |-- interaction_lib.py
    |-- linear_lib.py
    |-- logic.py
    |-- mediation_lib.py
    |-- model_diagnostics_lib.py
    |-- multiple_imputation.py
    |-- plotly_html_renderer.py
    |-- poisson_lib.py
    |-- psm_lib.py
    |-- rcs_lib.py
    |-- repeated_measures_lib.py
    |-- reporting_checklists.py
    |-- sample_size_lib.py
    |-- sensitivity_lib.py
    |-- statistical_assumptions.py
    |-- stratified_lib.py
    |-- subgroup_analysis_module.py
    |-- survival_lib.py
    |-- table_one.py
    |-- table_one_advanced.py
    |-- tvc_lib.py
    |-- ui_helpers.py
    |-- update_css.py
    `-- visualizations.py

[--- REPOSITORY-TREE-END ---]

πŸ₯ Medical Statistical Tool (Shiny for Python)

A comprehensive, interactive web application for medical statistical analysis, built with Shiny for Python. This tool simplifies the process of data management, cohort matching, and advanced statistical modeling for medical researchers.

πŸš€ Key Features

This application is a complete statistical workbench organized into modular tabs:

πŸ“ Data Management

  • Comprehensive Data Control: Upload CSV/Excel or load example datasets.
  • Data Health Report: Automated deep checks via utils/data_quality.py for:
    • Quality Scorecard: Instant rating of Completeness, Consistency, Uniqueness, and Validity.
    • Missing Data: Detailed reporting of missing values with row positions.
    • Non-standard Numeric: Smart detection of medical strings like "<5", ">10", or currency.
    • Categorical Integrity: Identifies numeric values in categorical text and flags rare categories (< 5 occurrences).
  • Variable Configuration: Interactive type casting and missing value handling.
  • Advanced Cleaning:
    • Imputation: Support for Mean, Median, KNN, and MICE strategies.
    • Multiple Imputation (NEW): Full MICE workflow with Auto-Pooled Regression (Rubin's Rules, FMI reporting).
    • Transformation: Log, Sqrt, and Z-Score standardization with normality assumption checks (Shapiro-Wilk/K-S).
    • Outlier Handling: Detection (IQR/Z-Score) and treatment (Winsorize, Cap, Remove).
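The "non-standard numeric" detection described above can be sketched in plain pandas; this is an illustrative approximation, not the actual logic in `utils/data_quality.py`:

```python
import pandas as pd

def flag_nonstandard_numeric(series: pd.Series) -> pd.Series:
    """Boolean mask of values that are present but fail strict numeric parsing,
    e.g. medical strings like "<5" or ">10"."""
    parsed = pd.to_numeric(series, errors="coerce")
    # True where the raw value exists but strict parsing produced NaN
    return series.notna() & parsed.isna()

col = pd.Series(["12", "<5", "7.5", ">10", None, "3"])
mask = flag_nonstandard_numeric(col)
print(col[mask].tolist())  # values needing manual review: ['<5', '>10']
```

Values flagged this way can then be routed to the interactive cleaning step rather than silently coerced to missing.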

πŸ“‹ Baseline & Matching

  • Table 1 Generation: Publication-ready baseline tables with Intelligent Variable Classification, automated statistical testing, and Odds Ratios with 95% CI (Configurable: "All Levels" vs Reference or "Simple" Target vs Ref).
  • Propensity Score Matching (PSM): Advanced matching with customizable calipers and variable selection.
  • Balance Diagnostics: Enhanced Love Plots with Green/Yellow zones (<0.1/<0.2 SMD) for assessing covariate balance.
  • Common Support: Propensity Score overlap assessment with automated distribution plots and exclusion recommendations.
  • Weight Truncation: Option to handle extreme weights (1%/99% trimming) for stable IPW estimates.
  • Matched Data Export: Seamless integration of matched cohorts into other analyses.

πŸ”’ Sample Size & Power

  • Calculators: Power and sample size estimation for:
    • Means (T-test)
    • Proportions (Chi-Square)
    • Survival (Log-Rank based on HR or Median)
    • Correlation (Pearson)
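As a rough illustration of the kind of calculation behind these calculators (not the app's own code in `utils/sample_size_lib.py`), statsmodels can solve for the per-group sample size directly:

```python
from statsmodels.stats.power import TTestIndPower

# Per-group n to detect Cohen's d = 0.5 with 80% power at two-sided alpha = 0.05
n = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(round(n))  # roughly 64 per group
```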

πŸ“ˆ Core Regression Models

  • GLM Framework:
    • Logistic Regression: Standard, Auto, Firth's Regression (rare events), and Subgroup Analysis (Forest Plots with Interaction P-value annotation and ICEMAN credibility).
    • Count Models: Poisson and Negative Binomial regression.
    • Linear Regression: OLS with options for robust standard errors.
  • Repeated Measures: Generalized Estimating Equations (GEE) and Linear Mixed Models (LMM).
  • NEJM/Lancet Publication Standards (NEW):
    • Model Diagnostics: C-statistic with 95% CI, Brier Score, Calibration Slope, Hosmer-Lemeshow test, Calibration plots with LOWESS.
    • Absolute Measures: ARD (Absolute Risk Difference) with Newcombe CI, NNT/NNH with Altman method.
    • Sensitivity Analysis: E-value for unmeasured confounding assessment.
    • STROBE Checklist: Auto-populated reporting checklist with markdown export.
  • Effect Sizes (NEW): Cohen's d, Hedges' g, Ξ·Β² (Eta-squared), ω² (Omega-squared) with interpretation badges.
  • MI Auto-Pooling (NEW): When Multiple Imputation is active, regression automatically pools results using Rubin's Rules with FMI (Fraction Missing Information) reporting. Supported analyses: Logistic, Linear, Cox PH, and Mediation.
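A minimal numpy sketch of the Rubin's rules pooling referenced above (illustrative only; the app's actual pooling lives in `utils/multiple_imputation.py`):

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m imputation-specific estimates and variances via Rubin's rules."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = len(est)
    q_bar = est.mean()            # pooled point estimate
    w = var.mean()                # within-imputation variance
    b = est.var(ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b       # total variance
    fmi = (1 + 1 / m) * b / t     # fraction of missing information (large-sample approximation)
    return q_bar, t, fmi

q, t, fmi = pool_rubin([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])
print(q, t, fmi)
```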

⏳ Survival Analysis

  • Visualizations: Kaplan-Meier curves and Nelson-Aalen cumulative hazard plots.
  • Cox Modeling: Cox Proportional Hazards regression with forest plots and Firth's Penalized Cox for rare events/small samples.
  • Advanced Techniques:
    • Time-Varying Cox: Handle covariates that change over time.
    • Landmark Analysis: Address immortal time bias.
    • Subgroup Analysis: Explore treatment effect heterogeneity with Formal Interaction Tests (LRT) and annotated Forest Plots.
  • Model Diagnostics:
    • Assumption Checks: Automated Schoenfeld residuals with remedies.
    • Extended Plots: Martingale (linearity) and Deviance (outliers) residuals.
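The Kaplan-Meier estimator behind those curves reduces to a product over event times; a self-contained numpy sketch (the app itself uses Lifelines via `utils/survival_lib.py`):

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimator: returns (event_times, survival_probabilities)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)           # subjects still under observation at t
        deaths = np.sum((time == t) & event)  # events occurring exactly at t
        s *= 1 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

t, s = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(list(zip(t, s)))
```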

🎯 Causal Inference

  • Propensity Methods: IPW (Inverse Probability Weighting) and PSM integration.
  • Stratified Analysis: Mantel-Haenszel odds ratios and Breslow-Day homogeneity tests.
  • Sensitivity Analysis: E-Value calculation for unmeasured confounding.
  • Subgroup Credibility (NEW): ICEMAN framework for assessing heterogeneity claims with Bonferroni adjustment.
  • Diagnostics: Comprehensive Propensity Score Diagnostics including overlapping density plots and summary statistics.
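The E-value mentioned above has a closed form (VanderWeele & Ding); a minimal sketch for a risk ratio:

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio: minimum confounder strength (on the RR scale)
    needed to fully explain away the observed association."""
    if rr < 1:  # protective effects are inverted before applying the formula
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 3))  # 3.414
```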

πŸ§ͺ Diagnostic Tests & Agreement

  • Diagnostic Accuracy: Advanced ROC Analysis with DeLong's Test for comparison, Optimal Thresholds (Youden/F1), and Confidence Intervals for all metrics.
  • Decision Curve Analysis (DCA): Assess clinical net benefit.
  • Agreement Statistics: Cohen's Kappa (with CI), Fleiss' Kappa (Multi-rater), Bland-Altman (with LoA CI bands), and enhanced ICC (pingouin integration) with interpretation badges.
  • Contingency Analysis: Chi-Square, Fisher's Exact Test, Risk Ratios, and Odds Ratios.
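The Youden-optimal threshold mentioned above maximizes sensitivity + specificity βˆ’ 1; a small numpy sketch of the idea (not the app's ROC implementation):

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return the cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[labels])    # true positive rate at this cutoff
        spec = np.mean(~pred[~labels])  # true negative rate at this cutoff
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

t, j = youden_threshold([0.1, 0.2, 0.4, 0.6, 0.8, 0.9], [0, 0, 0, 1, 1, 1])
print(t, j)
```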

🧩 Advanced Inference

  • Mediation Analysis: Decomposition into Direct (ADE) and Indirect (ACME) effects.
  • Model Diagnostics: Residual plots, Q-Q plots, Cook's distance for influence, and heteroscedasticity tests.
  • Multicollinearity: Variance Inflation Factor (VIF) analysis.
  • Heterogeneity: Statistics for meta-analysis contexts.
  • Sensitivity Analysis (NEW): Bootstrap CI, Jackknife, and LOO-CV for model robustness validation.
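The VIF reported here is 1 / (1 βˆ’ RΒ²_j) from regressing predictor j on the remaining predictors; a self-contained numpy sketch (the app's version lives in `utils/collinearity_lib.py`):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factors for the columns of X (no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 inflated, x3 near 1
```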

πŸ”— Correlation & Reliability

  • Correlation: Pairwise Pearson/Spearman matrices with heatmap visualizations.
  • Intraclass Correlation (ICC): Assess reliability and consistency.

βš™οΈ Settings & Performance

  • Publication Ready: Configurable Reporting Style templates for major journals (NEJM, JAMA, Lancet, BMJ).
  • Reporting Checklists: CONSORT (RCTs) and STROBE (Observational) checklist generators with auto-population from analysis metadata.
  • Figure Legends: Auto-generated publication-ready figure legends for plots.
  • Model Calibration (NEW): Decision Curve Analysis (DCA) for clinical utility assessment.
  • Auto-Methods: Automated generation of "Methods" and "Missing Data" statements.
  • Customization: Theme switching (Light/Dark), plot sizing, and decimal precision control.
  • Logging: Configurable logging levels and file output.
  • Performance: Caching and multi-threading options for large datasets.

πŸ—οΈ System Architecture

The application is built with a modular architecture that separates styling, data processing, and statistical logic to ensure scalability and consistency.

🎨 UI Styling System

The application uses a centralized styling system to ensure visual consistency across all modules.

| File | Role | Description |
| --- | --- | --- |
| `tabs/_common.py` | Source of Truth | Defines the central `get_color_palette()` and common UI constants. |
| `tabs/_styling.py` | CSS Generator | Main injector that generates CSS using the palette from `_common.py`. |
| `static/styles.css` | Compiled CSS | Auto-generated output from `_styling.py`; used for performance and deployment. |
| `utils/update_css.py` | Sync Utility | Script to update `static/styles.css` whenever `_styling.py` changes. |
| `static/js/custom_handlers.js` | JS Hooks | Shiny custom message handlers for dynamic client-side styling. |

Visual Consistency & Formatting:

  • Plotly Integration: utils/plotly_html_renderer.py and forest_plot_lib.py sync interactive charts with the central palette and "Inter" typography.
  • Reporting Labels: utils/formatting.py standardizes P-value styling and badge generation across all statistical outputs.

πŸ”„ Data Processing & Statistical Pipeline

Every statistical analysis follows a rigorous, standardized data flow to ensure reliable results:

  1. Ingestion & Quality Check (tab_data.py): Immediate identification of missingness and data types upon upload or example loading.
  2. Configuration & Cleaning:
    • Interactive Setup: Users interactively cast variable types and choose missing value strategies.
    • Advanced Cleaning: Users can apply Imputation (KNN/MICE), handle Outliers (Winsorize/Cap), and Transform variables (Log/Sqrt) directly within the UI.
  3. Central Preparation (utils/data_cleaning.py): Before analysis, data is passed through prepare_data_for_analysis() which handles exclusion logic and logging.
  4. Download Safety (utils/download_helpers.py): All @render.download handlers use safe_download_html() or safe_report_generation() to guarantee valid HTML output and user notifications (βœ… success, ⚠️ no results, ❌ generation error).
  5. Integrated Reporting (utils/formatting.py): Missing data statistics are automatically analyzed and included in the final report for every module.
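The download-safety pattern in step 4 can be sketched as a decorator. The shape below is hypothetical, not the actual `utils/download_helpers.py` API; it only illustrates the guarantee that every download yields valid HTML:

```python
from functools import wraps

def safe_download_html(render_fn):
    """Wrap an HTML-producing renderer so downloads never emit invalid output.
    Hypothetical sketch of the pattern, not the project's real helper."""
    @wraps(render_fn)
    def wrapper(*args, **kwargs):
        try:
            html = render_fn(*args, **kwargs)
        except Exception as exc:  # ❌ generation error
            return f"<html><body><p>Report generation failed: {exc}</p></body></html>"
        if not html:              # ⚠️ no results yet
            return "<html><body><p>No results to export. Run an analysis first.</p></body></html>"
        return html               # βœ… success
    return wrapper

@safe_download_html
def render_report():
    return "<html><body><h1>Results</h1></body></html>"

print(render_report())
```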

πŸ§ͺ Testing & Quality Assurance

  • Automated CI: ui-styling.yml runs on every push to verify:
    • Palette Integrity: Colors in _common.py match branding.
    • System Sync: Cross-file consistency between Python, CSS, JS, and Plotly layers via tests/unit/test_ui_ux_styles.py.

πŸ› οΈ Installation & Usage

Option 1: Run Locally (Python)

Ensure you have Python 3.12+ installed (required for PEP 695 type parameter syntax).

  1. Clone the repository:

    git clone https://huggingface.co/spaces/ntwkkm/shinystat
    cd shinystat
  2. Install dependencies: It is recommended to use a virtual environment.

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Run the app:

    shiny run app.py --port 7860

    The app will be available at http://localhost:7860.

Option 2: Running Tests

To run the test suite, use the `pytest` binary from your virtual environment (assuming the venv is named `.venv`):

# Run all tests
.venv/bin/pytest
# Or if using a different venv name/activation:
# python -m pytest

# Run specific test
.venv/bin/pytest tests/unit/test_statistics.py

Option 3: Run with Docker

This project is containerized for easy deployment and local development. An automated image is published to Docker Hub on every update to the main branch.

  1. Pull and run from Docker Hub (Easiest):

    docker run -p 7860:7860 ntwkkm/statiomed:latest
  2. Local Development with Docker Compose (Live Reload): This method mounts your local code into the container, allowing for real-time updates as you edit files.

    docker compose up --build
  3. Standard Local Build:

    # Build the image
    docker build -t medical-stat-tool .
    # Run the container
    docker run -p 7860:7860 medical-stat-tool

Access the app at http://localhost:7860.

Option 4: VS Code Dev Containers

If you use VS Code, you can open the project in a pre-configured Dev Container:

  1. Ensure the Dev Containers extension is installed in VS Code.
  2. Select "Reopen in Container" when prompted, or use the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and search for Dev Containers: Reopen in Container.
  3. The environment will be automatically set up with all dependencies, Python 3.12, and the recommended extensions.

πŸ’» Tech Stack

  • Framework: Shiny for Python
  • Data Processing: Pandas, NumPy, OpenPyXL
  • Statistics:
    • Core: SciPy, Statsmodels (OLS, GLM, GEE, MixedLM)
    • Machine Learning: Scikit-learn
    • Survival: Lifelines (KM, CoxPH)
    • Causal Inference: EconML, PsmPy
    • Advanced: FirthModels (Penalized Logistic)
  • Visualization: Plotly (Interactive), Matplotlib, Seaborn
  • Quality & Testing: Pytest, Playwright, Ruff
  • Deployment: Docker, Gunicorn/Uvicorn

βœ… Deployment Features

This application is designed for enterprise and restricted environments:

  • Hybrid Deployment: Optimized for both Hugging Face Spaces (Docker) and Posit Connect (Python).
  • Dependency Management: Split requirements for Production (requirements-prod.txt) vs Development (requirements.txt).
  • Network Friendly: Uses Plotly CDN strategies or local serving considerations (configurable).
  • Containerized: Full Docker support with non-root user security practices (standard in HF Spaces).

πŸ“ License

This project is intended for educational and research purposes. Please ensure data privacy compliance when using with patient data.
