diff --git a/_quarto.yml b/_quarto.yml
index 584c9e1..bea2b55 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -33,11 +33,13 @@ website:
           - text: Peer review trends
             href: peer-review/review-trends.qmd
 
-      - text: "pyOpenSci Package Metrics"
+      - text: "pyOpenSci Packages"
         menu:
+          - text: Package Activity Dashboard
+            href: pyos-packages/package-activity.qmd
           - text: Accepted Package Metrics
             href: peer-review/accepted-packages.qmd
-          - text: Package Dashboard 
+          - text: Package Dashboard
             href: peer-review/pyos-package-dashboard.qmd
 
 
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..d936e39
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,41 @@
+# pyOpenSci Metrics Documentation
+
+This directory contains development documentation for the pyOpenSci metrics dashboard project.
+
+## Available Documentation
+
+### Dashboard-Specific Guides
+- **[pyos-package-dashboard-workflow.md](./pyos-package-dashboard-workflow.md)** - Complete development workflow for the main package dashboard (`peer-review/pyos-package-dashboard.qmd`)
+
+### Coming Soon
+- `package-activity-dashboard-workflow.md` - Workflow for the standalone activity dashboard
+- `development-environment-setup.md` - General environment setup guide
+- `data-sources-guide.md` - Understanding data files and sources
+- `troubleshooting-guide.md` - Common issues and solutions
+
+## Quick Links
+
+### For Dashboard Development
+- [Package Dashboard Workflow](./pyos-package-dashboard-workflow.md) - Start here for `pyos-package-dashboard.qmd` development
+
+### For New Contributors
+- Check the main README.md in the repository root
+- Review the package dashboard workflow to understand the codebase
+- See the Troubleshooting Guide section of the package dashboard workflow for common setup issues
+
+## Documentation Standards
+
+When adding new documentation:
+1. Use clear, descriptive filenames
+2. Include practical examples and code snippets
+3. Provide step-by-step instructions
+4. Add troubleshooting sections
+5. Update this README with new documentation links
+
+## Contributing to Documentation
+
+Documentation improvements are welcome! Please:
+1. Follow the existing format and style
+2. Test any code examples provided
+3. Update relevant cross-references
+4. Submit changes via pull request
diff --git a/docs/pyos-package-dashboard-workflow.md b/docs/pyos-package-dashboard-workflow.md
new file mode 100644
index 0000000..acec29d
--- /dev/null
+++ b/docs/pyos-package-dashboard-workflow.md
@@ -0,0 +1,318 @@
+# pyOpenSci Package Dashboard Development Workflow
+
+## Overview
+This guide covers the development workflow for the `package-activity.qmd` dashboard located in `metrics/pyos-packages/`. This dashboard focuses on **identifying packages that have been inactive for 6+ months** to help the pyOpenSci team prioritize maintenance efforts.
+
+## File Purpose
+`package-activity.qmd` builds a focused dashboard that shows:
+- **Inactive Package Identification** - Primary focus on packages needing attention (6+ months without commits)
+- **Package Activity Timeline** - All packages sorted by last commit date for context
+- **Maintenance Prioritization** - Clear separation between active and inactive packages
+- **Summary Statistics** - Quick overview of package health across the ecosystem
+
+## Prerequisites
+
+### System Requirements
+- **Python**: 3.9+ (3.10+ recommended for full functionality)
+- **Quarto**: Installed and accessible via command line
+- **Git**: For version control
+
+### Required Python Packages
+```text
+pandas>=2.1.0  # pd.options.future.infer_string was added in pandas 2.1
+altair>=4.2.0
+plotly>=5.0.0
+itables>=1.0.0
+jupyter>=1.0.0
+pyarrow>=10.0.0
+```
+
+### Data Dependencies
+- `_data/package_data.csv` - Contains package metadata with GitHub metrics
+- GitHub metadata in the `gh_meta` column with fields:
+  - `last_commit` - Date of last repository commit
+  - `stargazers_count` - Number of GitHub stars
+  - `forks_count` - Number of repository forks
+  - `contrib_count` - Number of contributors
+  - `open_issues_count` - Number of open issues
+
+## Development Environment Setup
+
+### Recommended: Nox-Based Development (Preferred)
+This project uses Nox for automated environment management and task execution.
+
+```bash
+# Navigate to project directory
+cd /path/to/metrics
+
+# Install nox (can be installed globally or in a base environment)
+pip install nox
+
+# Verify Quarto installation
+quarto --version
+
+# Nox will handle the rest automatically
+```
+
+### Alternative: Manual Python Setup
+
+#### Option 1: Python 3.9 Setup (Currently Working)
+```bash
+# Create virtual environment
+python3 -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+# Install dependencies
+pip install --upgrade pip
+pip install pandas altair plotly itables jupyter pyarrow
+```
+
+#### Option 2: Python 3.10+ Setup (Full Functionality)
+```bash
+# Install Python 3.10+ (using pyenv example)
+pyenv install 3.10.12
+pyenv local 3.10.12
+
+# Create virtual environment
+python -m venv venv310
+source venv310/bin/activate
+
+# Install all dependencies including pyosmeta
+pip install -r requirements.txt
+```
+
+## Development Workflow
+
+### Nox-Based Workflow (Recommended)
+
+#### 1. Making Changes
+```bash
+# Navigate to project directory
+cd /path/to/metrics
+
+# Open the dashboard file for editing
+# Edit: pyos-packages/package-activity.qmd
+```
+
+#### 2. Testing Changes with Nox
+```bash
+# Option A: Build static HTML (faster for testing)
+nox -s html
+
+# View output (macOS; use xdg-open on Linux)
+open _site/pyos-packages/package-activity.html
+```
+
+#### 3. Live Development with Nox
+```bash
+# Option B: Live preview with auto-reload (best for development)
+nox -s serve
+
+# This will:
+# - Install all dependencies automatically
+# - Start quarto preview
+# - Open browser with live reload
+# - Watch for file changes and rebuild automatically
+```
+
+### Manual Workflow (Alternative)
+
+#### 1. Making Changes
+```bash
+# Activate your virtual environment
+source venv/bin/activate  # or source venv310/bin/activate
+
+# Edit: pyos-packages/package-activity.qmd
+```
+
+#### 2. Testing Changes Manually
+```bash
+# Render single dashboard (fast testing)
+quarto render pyos-packages/package-activity.qmd
+
+# View output (macOS; use xdg-open on Linux)
+open _site/pyos-packages/package-activity.html
+```
+
+#### 3. Full Site Testing (Optional)
+```bash
+# Render entire site (slower, requires all dependencies)
+quarto render
+
+# Preview with live reload
+quarto preview
+```
+
+### 4. Validation Checklist
+- [ ] Dashboard renders without errors
+- [ ] **PRIMARY**: Inactive packages table shows packages with no commits in 6+ months
+- [ ] Inactive packages table is prominently displayed and easy to identify
+- [ ] Value boxes highlight the count of inactive packages
+- [ ] All packages table provides context (sorted by last commit date)
+- [ ] Interactive features work (sorting, pagination)
+- [ ] No broken links or missing data
+- [ ] Dashboard clearly serves its purpose: identifying packages needing attention
+
+## Nox Sessions Available
+
+### `nox -s html`
+- **Purpose**: Build static HTML for the entire site
+- **Use case**: Testing, CI/CD, production builds
+- **Output**: Static files in `_site/` directory
+- **Dependencies**: Automatically installs requirements.txt and pyosmetrics_pkg
+
+### `nox -s serve`
+- **Purpose**: Live development server with auto-reload
+- **Use case**: Active development, real-time preview
+- **Features**:
+  - Watches for file changes
+  - Automatically rebuilds on changes
+  - Opens browser with live preview
+  - Hot reload for immediate feedback
+
+### Nox Configuration
+The project sets `nox.options.reuse_existing_virtualenvs = True`, so later runs reuse existing session environments instead of rebuilding them.
+
+## File Structure Understanding
+
+### Dashboard Layout
+```
+## Row {height=0%} - Data processing section
+## Row {height=5%} - Summary value boxes (total, active, INACTIVE counts)
+## Row {height=45%} - All packages table (sorted by last commit for context)
+## Row {height=45%} - MAIN FOCUS: Inactive packages table (6+ months) ⭐
+```
+
+### Key Code Sections
+
+#### Data Loading & Processing
+```python
+# Lines 14-25: Import statements and setup
+# Lines 27-34: Load package data and parse the gh_meta column
+# Lines 36-58: Extract GitHub metadata fields and compute days since last commit
+```
+
+#### Primary Feature: Inactive Package Identification
+```python
+# Lines 68-70 and 147-157: MAIN TABLE - Inactive packages (6+ months without commits)
+# This is the primary purpose of the dashboard
+```
+
+#### Supporting Features
+```python
+# Lines 60-66 and 124-145: Context table - All packages sorted by last commit date
+# Lines 83-122: Summary value boxes highlighting the inactive package count
+```
+
+## Common Development Tasks
+
+### Adding New Metrics
+1. **Extract from gh_meta**: Add the new field extraction in the data processing section
+2. **Create visualization**: Add a new chart in the appropriate row section
+3. **Update tables**: Include the new field in the table displays
+4. **Test rendering**: Verify the new metrics display correctly
+
+### Modifying Table Columns
+1. **Locate table definition**: Find the relevant `show()` function call
+2. **Update column selection**: Modify the DataFrame column list
+3. **Adjust column names**: Update the `display_columns` mapping if needed
+4. **Test interactivity**: Ensure sorting/filtering still works
+
+### Changing Time Thresholds
+1. **Find threshold definition**: Locate `timedelta(days=180)` for the 6-month threshold
+2. **Update calculation**: Modify the days value as needed
+3. **Update documentation**: Change titles/descriptions to match the new threshold
+4. **Verify logic**: Test with different date ranges
+
+### Adding New Visualizations
+1. **Choose location**: Select the appropriate row section
+2. **Create chart code**: Use plotly.express for consistency
+3. **Apply styling**: Match existing color schemes and formatting
+4. **Add title**: Use the `#| title:` directive for chart titles
+
+## Troubleshooting Guide
+
+### Common Issues
+
+#### "Module not found" errors
+```bash
+# Solution: Ensure virtual environment is activated
+source venv/bin/activate
+pip install [missing-package]
+```
+
+#### "No such file or directory" for data
+```bash
+# Solution: Verify data file exists
+ls _data/package_data.csv
+
+# Check current working directory in code
+# Ensure path: Path.cwd().parents[0] / "_data" / "package_data.csv"
+```
+
+#### Quarto rendering fails
+```bash
+# Solution: Check Quarto installation
+which quarto
+quarto check
+
+# Verify Python kernel
+jupyter kernelspec list
+```
+
+#### Empty or broken tables
+```bash
+# Solution: Check data processing
+# Verify gh_meta column parsing
+# Ensure DataFrame columns exist before table creation
+```
+
+### Performance Issues
+- **Large tables**: `show()` downsamples tables that exceed its `maxBytes` limit; `maxBytes=0` disables downsampling so every row is shown, at the cost of a heavier page
+- **Memory usage**: Consider data filtering for very large package lists
+- **Chart loading**: Reduce data points in visualizations if needed
+
+## Best Practices
+
+### Code Organization
+- Keep data processing at the top
+- Group related visualizations in the same rows
+- Use consistent variable naming
+- Add comments for complex calculations
+
+### Testing
+- Always test individual dashboard rendering first
+- Verify with different data scenarios
+- Check responsive design on different screen sizes
+- Test interactive features thoroughly
+
+### Documentation
+- Update this workflow guide when making structural changes
+- Document any new dependencies or requirements
+- Note any breaking changes or migration steps
+
+## Contributing Changes
+
+### Before Submitting
+1. Test dashboard rendering locally
+2. Verify all interactive features work
+3. Check that no new dependencies are required
+4. Ensure code follows existing patterns
+
+### Commit Guidelines
+- Use descriptive commit messages
+- Reference related issues
+- Include testing notes in the PR description
+
+## Related Files
+- `_data/package_data.csv` - Source data
+- `_quarto.yml` - Site navigation configuration
+- `pyos-packages/package-activity.qmd` - Standalone activity dashboard
+- `requirements.txt` - Python dependencies
+
+## Support
+For questions about this workflow or dashboard development:
+1. Check this documentation first
+2. Review existing GitHub issues
+3. Create a new issue with specific problem details
+4. Include error messages and environment details
diff --git a/peer-review/pyos-package-dashboard.qmd b/peer-review/pyos-package-dashboard.qmd
index 1305271..5250419 100644
--- a/peer-review/pyos-package-dashboard.qmd
+++ b/peer-review/pyos-package-dashboard.qmd
@@ -14,24 +14,18 @@ execute:
 import ast
 import warnings
 from pathlib import Path
+from datetime import datetime, timedelta
 
 from itables import show
 import altair as alt
 import pandas as pd
 import plotly.express as px
 
-# This is a local module that stores the plot theme
-from pyosmetrics.plot_theme import load_poppins_font, register_and_enable_poppins_theme
-
 pd.options.mode.chained_assignment = None
 pd.options.future.infer_string = True
 
 warnings.filterwarnings("ignore")
 
-# Load the & register Poppins theme
-load_poppins_font()
-register_and_enable_poppins_theme()
-
 package_data_path = Path.cwd().parents[0] / "_data" / "package_data.csv"
 package_df = pd.read_csv(package_data_path)
 
@@ -39,12 +33,17 @@ package_df = pd.read_csv(package_data_path)
 package_df['gh_meta'] = package_df['gh_meta'].apply(
     lambda x: ast.literal_eval(x) if isinstance(x, str) else x
 )
+package_df['last_commit_date'] = package_df['gh_meta'].apply(
+    lambda x: x.get('last_commit') if isinstance(x, dict) else None
+)
+package_df['last_commit_date'] = pd.to_datetime(package_df['last_commit_date'])
+
 
 # Extract "forks_count" value from the 'gh_meta' column
 package_df['forks_count'] = package_df['gh_meta'].apply(
     lambda x: x.get('forks_count' ) if isinstance(x, dict) else None
 )
-
+# Extract "contrib_count" value from the 'gh_meta' column
 package_df['contrib_count'] = package_df['gh_meta'].apply(
     lambda x: x.get('contrib_count') if isinstance(x, dict) else None
 )
@@ -54,6 +53,28 @@ average_forks = int(package_df['forks_count'].mean())
 
 ```
 
+## Row {height=25%}
+
+```{python}
+#| title: "All Packages (Sortable by Last Commit Date)"
+
+sorted_df = package_df.sort_values("last_commit_date", ascending=False)[
+    ["package_name", "package_description", "last_commit_date"]
+]
+
+show(sorted_df, sortable=True, paging=True, maxBytes=0)
+```
+
+```{python}
+#| title: "Inactive Packages (No commits in last 6 months)"
+
+six_months_ago = datetime.now() - timedelta(days=180)
+
+inactive_df = sorted_df[sorted_df["last_commit_date"] < six_months_ago]
+
+show(inactive_df, sortable=True, paging=True, maxBytes=0)
+```
+
 ## Row {height=5%}
 
 ```{python}
diff --git a/pyos-packages/package-activity.qmd b/pyos-packages/package-activity.qmd
new file mode 100644
index 0000000..52400b9
--- /dev/null
+++ b/pyos-packages/package-activity.qmd
@@ -0,0 +1,157 @@
+---
+title: "pyOpenSci Package Activity Dashboard"
+format:
+  dashboard:
+    scrolling: true
+execute:
+  echo: false
+---
+
+## Row {height=0%}
+
+```{python}
+#| echo: false
+import ast
+import warnings
+from pathlib import Path
+from datetime import datetime, timedelta
+
+from itables import show
+import pandas as pd
+
+pd.options.mode.chained_assignment = None
+pd.options.future.infer_string = True
+
+warnings.filterwarnings("ignore")
+
+# Load package data
+package_data_path = Path.cwd().parents[0] / "_data" / "package_data.csv"
+package_df = pd.read_csv(package_data_path)
+
+# Parse the "gh_meta" column back into dictionaries
+package_df['gh_meta'] = package_df['gh_meta'].apply(
+    lambda x: ast.literal_eval(x) if isinstance(x, str) else x
+)
+
+# Extract relevant fields from gh_meta
+package_df['last_commit'] = package_df['gh_meta'].apply(
+    lambda x: x.get('last_commit') if isinstance(x, dict) else None
+)
+
+package_df['stargazers_count'] = package_df['gh_meta'].apply(
+    lambda x: x.get('stargazers_count') if isinstance(x, dict) else None
+)
+
+package_df['forks_count'] = package_df['gh_meta'].apply(
+    lambda x: x.get('forks_count') if isinstance(x, dict) else None
+)
+
+package_df['open_issues_count'] = package_df['gh_meta'].apply(
+    lambda x: x.get('open_issues_count') if isinstance(x, dict) else None
+)
+
+# Convert last_commit to datetime
+package_df['last_commit_date'] = pd.to_datetime(package_df['last_commit'], errors='coerce')
+
+# Calculate days since last commit
+today = datetime.now()
+package_df['days_since_last_commit'] = (today - package_df['last_commit_date']).dt.days
+
+# Create a clean dataframe for display
+display_df = package_df[['package_name', 'package_description', 'last_commit_date',
+                         'days_since_last_commit', 'stargazers_count', 'forks_count',
+                         'open_issues_count', 'repository_link']].copy()
+
+# Sort by last commit date (most recent first)
+display_df = display_df.sort_values('last_commit_date', ascending=False)
+
+# Create inactive packages dataframe (6+ months = 180+ days)
+six_months_ago = today - timedelta(days=180)
+inactive_df = display_df[display_df['last_commit_date'] < six_months_ago].copy()
+
+# Format dates for display
+display_df['last_commit_date'] = display_df['last_commit_date'].dt.strftime('%Y-%m-%d')
+inactive_df['last_commit_date'] = inactive_df['last_commit_date'].dt.strftime('%Y-%m-%d')
+
+# Get current date for display
+current_date = datetime.today().date()
+today_str = current_date.strftime("%d %B %Y")
+```
+
+*Last updated: **`{python} today_str`***
+
+## Row {height=5%}
+
+```{python}
+#| content: valuebox
+#| title: "Total Packages"
+
+total_packages = len(package_df)
+
+dict(
+    icon = "box2-heart",
+    color = "primary",
+    value = total_packages
+)
+```
+
+```{python}
+#| content: valuebox
+#| title: "Active Packages (< 6 months)"
+
+active_packages = len(display_df) - len(inactive_df)
+
+dict(
+    icon = "activity",
+    color = "success",
+    value = active_packages
+)
+```
+
+```{python}
+#| content: valuebox
+#| title: "Inactive Packages (6+ months)"
+
+inactive_count = len(inactive_df)
+
+dict(
+    icon = "pause-circle",
+    color = "warning",
+    value = inactive_count
+)
+```
+
+## Row {height=45%}
+
+```{python}
+#| title: "All Packages Sorted by Last Commit Date"
+
+# Rename columns for better display
+display_columns = {
+    'package_name': 'Package Name',
+    'package_description': 'Description',
+    'last_commit_date': 'Last Commit',
+    'days_since_last_commit': 'Days Since Last Commit',
+    'stargazers_count': 'Stars',
+    'forks_count': 'Forks',
+    'open_issues_count': 'Open Issues',
+    'repository_link': 'Repository'
+}
+
+display_table = display_df.rename(columns=display_columns)
+
+# Show the table
+show(display_table)
+```
+
+## Row {height=45%}
+
+```{python}
+#| title: "Packages Inactive for 6+ Months"
+
+if len(inactive_df) > 0:
+    inactive_table = inactive_df.rename(columns=display_columns)
+    show(inactive_table)
+else:
+    print("🎉 Great news! All packages have been updated within the last 6 months.")
+```
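
Both dashboards classify a package as inactive when its last commit is older than `timedelta(days=180)`. The same rule can be sketched without pandas, which makes it easy to unit-test in isolation. This is a minimal, hypothetical standard-library sketch, not code from this PR. It assumes `gh_meta` is stored as a Python dict literal (as the `ast.literal_eval` calls imply) and that `last_commit` is an ISO 8601 string such as `2024-01-15T10:30:00Z`. The names `INACTIVE_AFTER`, `parse_last_commit`, and `split_by_activity` are illustrative only.

```python
import ast
from datetime import datetime, timedelta, timezone

# Hypothetical helpers (not part of the repository). Assumes gh_meta is a dict,
# or a dict literal string, whose "last_commit" value is an ISO 8601 timestamp.
INACTIVE_AFTER = timedelta(days=180)  # same 6-month threshold as the dashboards


def parse_last_commit(gh_meta):
    """Return the last commit as a UTC-aware datetime, or None if unavailable."""
    if isinstance(gh_meta, str):
        try:
            gh_meta = ast.literal_eval(gh_meta)  # the CSV stores the dict as text
        except (ValueError, SyntaxError, TypeError):
            return None
    if not isinstance(gh_meta, dict):
        return None  # e.g. NaN for packages without GitHub metadata
    raw = gh_meta.get("last_commit")
    if not raw:
        return None
    try:
        # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
        parsed = datetime.fromisoformat(str(raw).replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: naive timestamps are UTC, so naive and aware values never mix.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def split_by_activity(packages, now=None, threshold=INACTIVE_AFTER):
    """Split (name, gh_meta) pairs into active, inactive, and unknown name lists."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - threshold
    active, inactive, unknown = [], [], []
    for name, gh_meta in packages:
        last_commit = parse_last_commit(gh_meta)
        if last_commit is None:
            unknown.append(name)  # no usable date: neither active nor inactive
        elif last_commit < cutoff:
            inactive.append(name)
        else:
            active.append(name)
    return active, inactive, unknown


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    packages = [
        ("pkg-a", "{'last_commit': '2024-12-01T00:00:00Z'}"),
        ("pkg-b", "{'last_commit': '2024-03-01T00:00:00Z'}"),
        ("pkg-c", None),
    ]
    print(split_by_activity(packages, now=now))
    # → (['pkg-a'], ['pkg-b'], ['pkg-c'])
```

Keeping a separate `unknown` bucket is a deliberate choice. In `package-activity.qmd`, a missing or unparseable date becomes `NaT` under `errors='coerce'`, and `NaT < six_months_ago` evaluates to `False`, so `len(display_df) - len(inactive_df)` counts those packages as active. Normalizing to UTC-aware datetimes also avoids the `TypeError` pandas raises when tz-aware timestamps (for example strings ending in `Z`) are compared with the naive `datetime.now()`; whether the stored `last_commit` values carry an offset is an assumption worth checking against `_data/package_data.csv`.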