Conversation
📝 WalkthroughWalkthroughThe PR implements a feature to display workflow statistics on the catalog landing page. It adds Altair-based chart generation, creates a new module for building visualizations from workflow data, updates the Sphinx build process to generate charts, and extends the landing page with new sections for statistics display. Changes
Sequence DiagramsequenceDiagram
participant Build as Sphinx Build
participant Conf as conf.py
participant WFTables as build_wf_tables()
participant WFCharts as build_wf_charts()
participant DataFile as data.json
participant TopicStats as topics_stats.json
participant ChartHTML as Chart HTMLs
Build->>Conf: Initialize build
Conf->>WFTables: Execute table building
WFTables->>DataFile: Load workflow data
WFTables->>TopicStats: Save topics statistics
Conf->>WFCharts: Execute chart building
WFCharts->>DataFile: Load workflow data
WFCharts->>TopicStats: Load topics statistics
WFCharts->>WFCharts: Register transparent theme
WFCharts->>WFCharts: Generate 6 chart types
WFCharts->>ChartHTML: Save chart_{metric}.html files
Build->>Build: Render HTML with embedded charts
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes The review involves evaluating new chart-generation logic with multiple data transformations, Altair configuration, and theme registration in Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
.gitignore (1)
74-75: Consider narrowing the shell script ignore pattern.The
*.shpattern is very broad and will ignore all shell scripts in the repository, including potentially legitimate scripts that should be version-controlled. If the intent is to ignore only build-generated scripts, consider a more specific pattern likesource/*.shorbuild*.sh.source/build_wf_charts.py (2)
65-71: Consider handling workflows updated more than 365 days ago.The current binning logic excludes workflows that haven't been updated in over a year, as values beyond the last bin edge (365) become NaN and are dropped by
value_counts(). If this is intentional, consider:
- Adding a final label like
">1 year"with a larger upper bin edge (e.g.,bins=[0, 1, 7, 31, 365, 999999])- Documenting that only recently updated workflows are shown
Also, the label
"last_year"is ambiguous—it represents workflows updated between 31-365 days ago, so">1 month"or"1-12 months"might be clearer.
99-117: Consider explicit column naming for robustness.The
eval_statuslogic correctly categorizes linting/formatting status. However, Line 115'srename(columns={"index": "health"})relies on pandas' default column naming fromvalue_counts().reset_index(), which can vary between versions. For clarity and robustness, explicitly assign column names:df_health = ( df[["linting", "formatting"]] .apply(lambda row: eval_status(row), axis=1) .value_counts() .reset_index() ) df_health.columns = ["health", "count"]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.gitignore(1 hunks)environment.yml(1 hunks)source/build_wf_charts.py(1 hunks)source/build_wf_tables.py(2 hunks)source/conf.py(2 hunks)source/index.md(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
source/build_wf_charts.pysource/build_wf_tables.pysource/conf.py
🧬 Code graph analysis (1)
source/conf.py (2)
source/build_wf_charts.py (1)
build_wf_charts(33-129)source/build_wf_tables.py (1)
build_wf_tables(68-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: generate-catalog
🔇 Additional comments (11)
environment.yml (1)
18-18: LGTM! Dependencies support new charting feature.The addition of Altair 5.5.0 and sphinxext-altair appropriately enables the chart generation functionality introduced in this PR.
Also applies to: 23-23
source/build_wf_tables.py (2)
59-60: Good defensive programming!The guard prevents a potential
KeyErrorwhen attempting to assign the metric to an empty repository list. This ensures the function handles edge cases gracefully.
151-152: LGTM! Topics statistics correctly persisted for chart generation.This output is appropriately consumed by
build_wf_charts.pyto generate the topic-based visualizations.source/conf.py (2)
20-20: LGTM! Chart generation correctly integrated into build sequence.The import and invocation of
build_wf_charts()is placed afterbuild_wf_tables(), which ensures that the requiredtopics_stats.jsondependency is generated before chart creation.Also applies to: 25-25
35-35: LGTM! Altair extension properly registered.The
sphinxext_altair.altairplotextension enables rendering of Altair visualizations in the Sphinx documentation, supporting the new chart features.source/index.md (1)
73-92: LGTM! Chart iframes correctly reference generated assets.The six iframe elements properly reference the chart HTML files generated by
build_wf_charts.py:chart_standardized.html,chart_age_group.html,chart_stargazers_count.html,chart_subscribers_count.html,chart_health.html, andchart_topic.html.source/build_wf_charts.py (5)
7-30: LGTM! Chart rendering function is well-structured.The
bar_chartfunction correctly generates horizontal bar charts with custom styling and saves them as standalone HTML files. The function signature and implementation are appropriate for the use cases in this module.
35-56: LGTM! Data loading and theme configuration are correct.The function properly loads workflow data and topics statistics, and registers a custom Altair theme with appropriate styling for the documentation context.
86-97: Same sorting concern as stargazers chart.This plot has the same categorical sorting issue as the stargazers chart (lines 73-84). The categories may not appear in the intended numerical order.
119-125: LGTM! Topic chart generation is correct.The data transformation properly renames columns, sorts by workflow count, and limits to the top 6 topics for visualization.
73-84: The review concern is incorrect; the code works as intended.Based on web search results, value_counts on a categorical Series preserves category information and ordering, and reset_index keeps the categorical dtype (including the ordered flag) when the counts' index is a CategoricalIndex. Since pd.cut() creates an ordered Categorical by default, and sort_values sorts by the order defined by categories—not lexical order—the bars will appear in the correct numerical sequence: 0, 1-5, 6-10, 11-20, 21-50, >50.
The categorical ordering is maintained throughout the pipeline (pd.cut → value_counts → reset_index → sort_values), so no changes are needed.
Likely an incorrect or invalid review comment.
cmeesters
left a comment
There was a problem hiding this comment.
looks great! And the code is clean.
|
@m-jahn I still need to test before merging. This might take a few hours as I am busy. |
|
Oops sorry @cmeesters I was too quick now. Please have a go at it. We can make improvements with the next PR if necessary. |
|
PS: the catalog building and deploying is tested automatically in the github action, so that at least is nothing we need to worry about. |
|
@m-jahn it looks just great! Wasn't able to test, anyway. I am particularly busy ... |
Summary by CodeRabbit