feat: update of landing page #60

Merged
m-jahn merged 2 commits into snakemake:main from m-jahn:landing_page on Oct 24, 2025

m-jahn (Member) commented Oct 22, 2025

Summary by CodeRabbit

  • New Features
    • Added interactive visualization dashboard displaying workflow statistics, trends, and metrics
    • Updated homepage with new About section, Workflows in numbers with charts, and GitHub issue reporting link

coderabbitai bot (Contributor) commented Oct 22, 2025

📝 Walkthrough

The PR implements a feature to display workflow statistics on the catalog landing page. It adds Altair-based chart generation, creates a new module for building visualizations from workflow data, updates the Sphinx build process to generate charts, and extends the landing page with new sections for statistics display.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Build Configuration: .gitignore | Added ignore patterns for generated documentation assets (source/_static/*) and shell scripts (*.sh) under Sphinx documentation section. |
| Dependencies: environment.yml | Added Altair 5.5.0 and sphinxext-altair packages to enable charting and Sphinx integration. |
| Chart Generation: source/build_wf_charts.py | New module with two public functions: bar_chart() for rendering horizontal bar charts with Altair, and build_wf_charts() orchestrating data loading from data.json and topics_stats.json, registering a transparent theme, and generating six statistics visualizations (standardized vs. other workflows, ages since updates, stargazer distribution, subscriber distribution, health status, and top topics). |
| Data Processing: source/build_wf_tables.py | Added safety guard for empty repos list before metric assignment, persisted topics statistics to _static/topics_stats.json, and added explicit return None statement. |
| Sphinx Configuration: source/conf.py | Registered sphinxext_altair.altairplot extension and imported build_wf_charts module to execute chart generation during documentation build. |
| Landing Page: source/index.md | Added About section describing catalog purpose, updated Shortcuts section with revised button styles and labels, introduced Report Issues on Github block, and added Workflows in numbers section with embedded chart iframes for dynamic statistics display. |

Sequence Diagram

sequenceDiagram
    participant Build as Sphinx Build
    participant Conf as conf.py
    participant WFTables as build_wf_tables()
    participant WFCharts as build_wf_charts()
    participant DataFile as data.json
    participant TopicStats as topics_stats.json
    participant ChartHTML as Chart HTMLs

    Build->>Conf: Initialize build
    Conf->>WFTables: Execute table building
    WFTables->>DataFile: Load workflow data
    WFTables->>TopicStats: Save topics statistics
    Conf->>WFCharts: Execute chart building
    WFCharts->>DataFile: Load workflow data
    WFCharts->>TopicStats: Load topics statistics
    WFCharts->>WFCharts: Register transparent theme
    WFCharts->>WFCharts: Generate 6 chart types
    WFCharts->>ChartHTML: Save chart_{metric}.html files
    Build->>Build: Render HTML with embedded charts

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The review involves evaluating new chart-generation logic with multiple data transformations, Altair configuration, and theme registration in build_wf_charts.py, along with integration points in conf.py and build_wf_tables.py. The landing page markdown updates add content complexity. While changes span multiple files, the patterns are generally straightforward and follow a coherent charting workflow.

Suggested reviewers

  • johanneskoester
  • cmeesters
  • tedil

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat: update of landing page" is directly related to the main change in the changeset. The pull request's primary objective is to update the landing page by adding workflow statistics and visualization, as confirmed by the PR description and the substantial changes to source/index.md which introduces a new "Workflows in numbers" section with dynamic charts. While the title could be more specific in mentioning the statistics feature, it accurately describes the main aspect of the change and is sufficiently clear for a teammate scanning history to understand the primary update being made. |
| Linked Issues Check | ✅ Passed | The pull request substantially addresses the objectives from issue #58. The implementation adds statistics to the landing page through a new "Workflows in numbers" section with iframe-embedded charts [#58]. The code creates multiple visualizations including workflow standardization metrics, age-since-update distributions, stargazer counts, subscriber counts, health status, and top topics [build_wf_charts.py]. The PR description confirms these stats are updated with each build cycle, matching the weekly cadence requirement [#58]. The visualizations are created using Altair with custom theming for visual appeal [build_wf_charts.py, conf.py]. While not all example metrics mentioned in the issue (such as "new workflows this week" in that exact form) are explicitly implemented, the core requirement of displaying main statistics in a visually pleasing manner on the landing page is fulfilled. |
| Out of Scope Changes Check | ✅ Passed | All code changes in the pull request are directly related to or supporting the objective of adding workflow statistics to the landing page. The .gitignore and environment.yml updates provide necessary infrastructure (documentation patterns and visualization dependencies) for the feature [.gitignore, environment.yml]. The new build_wf_charts.py module implements the core chart generation functionality [build_wf_charts.py]. The build_wf_tables.py modifications add support for persisting topics statistics needed by the new charts [build_wf_tables.py]. The conf.py changes integrate the chart-building step into the Sphinx build process [conf.py]. The index.md changes introduce the landing page display of these statistics along with minor button styling improvements mentioned in the PR description [index.md]. No out-of-scope changes are detected. |

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
.gitignore (1)

74-75: Consider narrowing the shell script ignore pattern.

The *.sh pattern is very broad and will ignore all shell scripts in the repository, including potentially legitimate scripts that should be version-controlled. If the intent is to ignore only build-generated scripts, consider a more specific pattern like source/*.sh or build*.sh.

source/build_wf_charts.py (2)

65-71: Consider handling workflows updated more than 365 days ago.

The current binning logic excludes workflows that haven't been updated in over a year, as values beyond the last bin edge (365) become NaN and are dropped by value_counts(). If this is intentional, consider:

  1. Adding a final label like ">1 year" with a larger upper bin edge (e.g., bins=[0, 1, 7, 31, 365, 999999])
  2. Documenting that only recently updated workflows are shown

Also, the label "last_year" is ambiguous—it represents workflows updated between 31-365 days ago, so ">1 month" or "1-12 months" might be clearer.
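
The binning behaviour described in this comment can be reproduced with a small sketch. The bin edges `[0, 1, 7, 31, 365]` are inferred from the suggested fix above; the sample ages and every label other than "last_year" are made up for illustration, so the actual module may differ.

```python
import pandas as pd

# Hypothetical ages in days since the last update; 400 and 900 exceed one year.
ages = pd.Series([0.5, 3, 20, 200, 400, 900])

# Assumed current behaviour: values above the last edge (365) fall outside
# every bin, become NaN, and are silently dropped by value_counts().
labels = ["last_day", "last_week", "last_month", "last_year"]
current = pd.cut(ages, bins=[0, 1, 7, 31, 365], labels=labels)
dropped = int(current.isna().sum())

# Possible fix: add an open-ended upper bin so older workflows are counted too.
fixed = pd.cut(
    ages,
    bins=[0, 1, 7, 31, 365, float("inf")],
    labels=labels + [">1 year"],
)
counts = fixed.value_counts(sort=False)  # keeps the category order
```

Using an infinite upper edge instead of a large sentinel such as 999999 avoids having to pick an arbitrary maximum age.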


99-117: Consider explicit column naming for robustness.

The eval_status logic correctly categorizes linting/formatting status. However, Line 115's rename(columns={"index": "health"}) relies on pandas' default column naming from value_counts().reset_index(), which can vary between versions. For clarity and robustness, explicitly assign column names:

df_health = (
    df[["linting", "formatting"]]
    .apply(lambda row: eval_status(row), axis=1)
    .value_counts()
    .reset_index()
)
df_health.columns = ["health", "count"]
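
As a runnable illustration of the same idea, the sketch below names both columns without relying on the defaults produced by `value_counts().reset_index()`. The `eval_status` body and the sample data are hypothetical stand-ins, since the real function is not shown in this thread; `rename_axis` plus `reset_index(name=...)` is an alternative to assigning `df_health.columns` directly.

```python
import pandas as pd


def eval_status(row):
    # Hypothetical stand-in for the real eval_status, which is not shown here.
    if row["linting"] and row["formatting"]:
        return "healthy"
    if row["linting"] or row["formatting"]:
        return "partially healthy"
    return "failing"


# Made-up linting/formatting results for four workflows.
df = pd.DataFrame(
    {
        "linting": [True, True, False, True],
        "formatting": [True, False, False, True],
    }
)

df_health = (
    df[["linting", "formatting"]]
    .apply(eval_status, axis=1)
    .value_counts()
    .rename_axis("health")        # name the index explicitly
    .reset_index(name="count")    # name the values column explicitly
)
```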
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c9b3f8 and f2f08ad.

📒 Files selected for processing (6)
  • .gitignore (1 hunks)
  • environment.yml (1 hunks)
  • source/build_wf_charts.py (1 hunks)
  • source/build_wf_tables.py (2 hunks)
  • source/conf.py (2 hunks)
  • source/index.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • source/build_wf_charts.py
  • source/build_wf_tables.py
  • source/conf.py
🧬 Code graph analysis (1)
source/conf.py (2)
source/build_wf_charts.py (1)
  • build_wf_charts (33-129)
source/build_wf_tables.py (1)
  • build_wf_tables (68-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: generate-catalog
🔇 Additional comments (11)
environment.yml (1)

18-18: LGTM! Dependencies support new charting feature.

The addition of Altair 5.5.0 and sphinxext-altair appropriately enables the chart generation functionality introduced in this PR.

Also applies to: 23-23

source/build_wf_tables.py (2)

59-60: Good defensive programming!

The guard prevents a potential KeyError when attempting to assign the metric to an empty repository list. This ensures the function handles edge cases gracefully.


151-152: LGTM! Topics statistics correctly persisted for chart generation.

This output is appropriately consumed by build_wf_charts.py to generate the topic-based visualizations.

source/conf.py (2)

20-20: LGTM! Chart generation correctly integrated into build sequence.

The import and invocation of build_wf_charts() is placed after build_wf_tables(), which ensures that the required topics_stats.json dependency is generated before chart creation.

Also applies to: 25-25


35-35: LGTM! Altair extension properly registered.

The sphinxext_altair.altairplot extension enables rendering of Altair visualizations in the Sphinx documentation, supporting the new chart features.

source/index.md (1)

73-92: LGTM! Chart iframes correctly reference generated assets.

The six iframe elements properly reference the chart HTML files generated by build_wf_charts.py: chart_standardized.html, chart_age_group.html, chart_stargazers_count.html, chart_subscribers_count.html, chart_health.html, and chart_topic.html.

source/build_wf_charts.py (5)

7-30: LGTM! Chart rendering function is well-structured.

The bar_chart function correctly generates horizontal bar charts with custom styling and saves them as standalone HTML files. The function signature and implementation are appropriate for the use cases in this module.


35-56: LGTM! Data loading and theme configuration are correct.

The function properly loads workflow data and topics statistics, and registers a custom Altair theme with appropriate styling for the documentation context.


86-97: Same sorting concern as stargazers chart.

This plot has the same categorical sorting issue as the stargazers chart (lines 73-84). The categories may not appear in the intended numerical order.


119-125: LGTM! Topic chart generation is correct.

The data transformation properly renames columns, sorts by workflow count, and limits to the top 6 topics for visualization.
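
A minimal sketch of such a rename, sort, and truncate step is shown below. The column names and topic counts are invented for illustration; the real layout of topics_stats.json is not shown in this review.

```python
import pandas as pd

# Made-up topic statistics; the real input comes from topics_stats.json.
raw = pd.DataFrame(
    {
        "name": ["qc", "rna-seq", "assembly", "chip-seq", "metagenomics",
                 "variant-calling", "proteomics", "single-cell"],
        "n": [1, 12, 2, 5, 7, 9, 3, 4],
    }
)

df_topics = (
    raw.rename(columns={"name": "topic", "n": "workflows"})  # readable names
    .sort_values("workflows", ascending=False)               # most used first
    .head(6)                                                 # keep the top 6
)
```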


73-84: The review concern is incorrect; the code works as intended.

Based on web search results, value_counts on a categorical Series preserves category information and ordering, and reset_index keeps the categorical dtype (including the ordered flag) when the counts' index is a CategoricalIndex. Since pd.cut() creates an ordered Categorical by default, and sort_values sorts by the order defined by categories—not lexical order—the bars will appear in the correct numerical sequence: 0, 1-5, 6-10, 11-20, 21-50, >50.

The categorical ordering is maintained throughout the pipeline (pd.cut → value_counts → reset_index → sort_values), so no changes are needed.

Likely an incorrect or invalid review comment.
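
The ordering claim can be checked with a short sketch. The labels are taken from the comment above; the bin edges and the stargazer counts are assumptions chosen to match those labels.

```python
import pandas as pd

# Made-up stargazer counts for ten workflows.
stars = pd.Series([0, 0, 3, 8, 15, 15, 30, 120, 2, 0])

labels = ["0", "1-5", "6-10", "11-20", "21-50", ">50"]
# Assumed bin edges; pd.cut returns an ordered Categorical by default.
binned = pd.cut(stars, bins=[-1, 0, 5, 10, 20, 50, float("inf")], labels=labels)

df_stars = (
    binned.value_counts()        # sorted by count at this point
    .rename_axis("stargazers")
    .reset_index(name="count")   # the column keeps its categorical dtype
    .sort_values("stargazers")   # sorts by category order, not alphabetically
)
ordered = df_stars["stargazers"].astype(str).tolist()
```

A plain string sort would place "6-10" after "21-50", so the match between `ordered` and `labels` depends on the categorical dtype surviving each step of the pipeline.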

cademirch left a comment

awesome work @m-jahn as usual. looks great.

cmeesters (Member) left a comment

looks great! And the code is clean.

m-jahn merged commit dd5b0c5 into snakemake:main on Oct 24, 2025
3 checks passed
cmeesters (Member) commented

@m-jahn I still need to test before merging. This might take a few hours as I am busy.

m-jahn (Member, Author) commented Oct 24, 2025

Oops sorry @cmeesters I was too quick now. Please have a go at it. We can make improvements with the next PR if necessary.

m-jahn (Member, Author) commented Oct 24, 2025

PS: the catalog building and deploying is tested automatically in the GitHub Action, so that at least is nothing we need to worry about.

cmeesters (Member) commented

@m-jahn it looks just great! Wasn't able to test, anyway. I am particularly busy ...

Development

Successfully merging this pull request may close these issues.

new feature: add statistics or summary figures on catalog landing page

3 participants