
MDX conversion pipeline and Astro sync workflow #62

Merged
vancura merged 80 commits into master from feat/gh-57-content-sync on Feb 22, 2025

vancura (Collaborator) commented Feb 17, 2025

Overview

This PR introduces an MDX conversion pipeline and automated sync workflow to transform Markdown documentation into MDX format for the Astro-based documentation site at https://github.com/apify/actor-whitepaper-web. The system handles complex transformations while preserving the original content structure and adding Astro-specific components.

Key changes

  1. MDX Conversion Pipeline (scripts/md2mdx.py)

    • Transforms standard Markdown to MDX format with Astro components
    • Handles image references, code blocks, and special Astro component blocks
    • Preserves document structure while adding required imports
    • Processes schema file links and internal references
    • Removes redundant formatting and comments
    • Supports both README.md and other documentation files
  2. GitHub Actions Workflow (.github/workflows/sync-to-astro.yml)

    • Automates syncing content changes to the Astro documentation site
    • Triggers on main branch pushes affecting MD files or related assets
    • Creates/updates PRs in the target Astro repository
    • Handles dependencies and environment setup
    • Uses Python for conversion and GitHub CLI for PR management
  3. Development Scripts

    • test-sync.sh: Local testing environment for the sync process
    • setup.sh: Development environment setup with Python venv
    • Added Prettier and related plugins for MDX formatting
  4. Documentation Updates

    • Added Astro component markers in README.md and schema files
    • Preserved existing content while enabling component-based rendering
    • Maintained backward compatibility with existing Markdown viewers
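
As a rough illustration of how a marker can stay invisible in ordinary Markdown viewers yet become a component in MDX, here is a minimal sketch. It assumes markers are HTML comments of the form `<!-- ASTRO: <Component ... /> -->`; that exact syntax and the helper name `unwrap_astro_comments` are assumptions made for this example, not taken from `md2mdx.py`.

```python
import re

# Assumed marker syntax (hypothetical): <!-- ASTRO: <Component ... /> -->
ASTRO_COMMENT = re.compile(r"<!--\s*ASTRO:\s*(.*?)\s*-->", re.DOTALL)


def unwrap_astro_comments(markdown: str) -> str:
    """Replace each ASTRO comment with the component tag it wraps.

    Ordinary HTML comments are left untouched because they lack the
    'ASTRO:' prefix.
    """
    return ASTRO_COMMENT.sub(lambda match: match.group(1), markdown)
```

Because the marker is a plain HTML comment, GitHub's Markdown renderer hides it, which is one way the backward compatibility described above could be achieved.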

Implementation details

MDX conversion features

  • Transforms image references to Astro's Picture component
  • Converts code blocks with proper language tags
  • Handles special Astro component blocks (CodeSwitcher, CodeExample, etc.)
  • Processes internal links and schema references
  • Removes redundant formatting while preserving content structure
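
The image transformation in the list above could look roughly like the following sketch. The function name `images_to_picture` and the `src`/`alt` props on `<Picture />` are assumptions for illustration; the real script may emit different props or import the image first.

```python
import re

# Matches Markdown images: ![alt text](path "optional title")
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def images_to_picture(markdown: str) -> str:
    """Rewrite Markdown image syntax as a <Picture /> tag (assumed props: src, alt)."""

    def repl(match: re.Match) -> str:
        # Escape double quotes so the alt text stays a valid attribute value.
        alt = match.group(1).replace('"', "&quot;")
        src = match.group(2)
        return f'<Picture src="{src}" alt="{alt}" />'

    return MD_IMAGE.sub(repl, markdown)
```

A regex keeps the sketch short, but it does not skip image syntax inside fenced code blocks; a real converter would need to handle that case.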

Workflow automation

  • Automated PR creation with descriptive titles and bodies
  • Proper error handling and status reporting
  • Configurable through environment variables

Development tools

  • Local testing environment with source/target separation
  • Python virtual environment setup
  • Prettier configuration for MDX formatting
  • Comprehensive error handling and logging

Testing

  • Local testing via test-sync.sh
  • CI/CD pipeline verification
  • Manual verification of MDX output
  • Component rendering tests in Astro environment

Documentation

  • Updated README.md with Astro component markers
  • Added schema file compatibility
  • Preserved Markdown compatibility for GitHub viewing

Dependencies

Added to package.json:

  • markdown-toc: ^1.2.0
  • markdown-link-check: ^3.13.6
  • prettier: ^3.5.1
  • prettier-plugin-astro: ^0.14.1
  • prettier-plugin-astro-organize-imports: ^0.4.11
  • prettier-plugin-css-order: ^2.1.2
  • prettier-plugin-jsdoc: ^1.3.2
  • prettier-plugin-organize-attributes: ^1.0.0
  • prettier-plugin-organize-imports: ^4.1.0
  • prettier-plugin-tailwindcss: ^0.6.11

Next steps

  1. Review and test component rendering in Astro
  2. Verify all internal links and references
  3. Test edge cases in content conversion
  4. Document any manual intervention requirements

Notes

  • The sync process is idempotent and can be safely re-run
  • Manual review of generated PRs is recommended
  • Local testing is available through the test-sync script

- Upgrade `markdown-link-check` to version `^3.13.6`.
- Add new devDependencies for Prettier and its plugins to improve code formatting:
  - `prettier`: "^3.5.1".
  - `prettier-plugin-astro`: "^0.14.1".
  - `prettier-plugin-astro-organize-imports`: "^0.4.11".
  - `prettier-plugin-css-order`: "^2.1.2".
  - `prettier-plugin-jsdoc`: "^1.3.2".
  - `prettier-plugin-organize-attributes`: "^1.0.0".
  - `prettier-plugin-organize-imports`: "^4.1.0".
  - `prettier-plugin-tailwindcss`: "^0.6.11".
- Add a new script, `"format-sync"`, to format MDX files using Prettier with the specified plugins.

Enhance the MDX transformation process in `md2mdx.py`

- After transforming Markdown to MDX, automatically format the resulting file by running the newly added `"format-sync"` script.
- Add console output messages to indicate the start and completion of the formatting process for better user feedback during execution.
- Modify the way `PROJECT_ROOT` is determined in `md2mdx.py` to handle both direct execution and execution from the test-sync environment.
- Add debug print statements to log the script location, project root, source file, and target file paths for easier troubleshooting.
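
One way to make root detection independent of where the script is launched from is to walk up the directory tree until a marker file is found, then run the formatter from there. This is a swapped-in technique for illustration, not necessarily how `md2mdx.py` resolves `PROJECT_ROOT`. The names `find_project_root` and `run_format_sync`, and the choice of `package.json` as the marker, are assumptions.

```python
import subprocess
from pathlib import Path


def find_project_root(start: Path, marker: str = "package.json") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found above {start}")


def run_format_sync(project_root: Path, runner=subprocess.run) -> None:
    """Run the `format-sync` npm script, reporting start and completion."""
    print("Formatting MDX files...")
    # check=True raises CalledProcessError if the formatter exits non-zero.
    runner(["npm", "run", "format-sync"], cwd=project_root, check=True)
    print("Formatting complete.")
```

The `runner` parameter exists only so the function can be exercised without npm installed; in normal use the default `subprocess.run` is enough.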

Update script path in `test-sync.sh`

- Change directory context to ensure `md2mdx.py` is executed from the correct path within the test-sync environment.
- Adjust commands to navigate directories before and after running the script.

vancura added the enhancement label on Feb 17, 2025
vancura self-assigned this on Feb 17, 2025

- Eliminate the `transform_code_blocks()` function from `md2mdx.py` as it is no longer needed.
- Update the `transform_markdown_to_mdx()` function to remove its invocation of `transform_code_blocks()`.
- This change simplifies the code by removing unnecessary transformations of code blocks, which are now handled elsewhere.

Enhance ASTRO block transformation

- Rename `process_astro_blocks()` to `transform_astro_blocks()` for clarity and consistency.
- Improve the handling of ASTRO comments by converting them into component tags with detailed processing:
  - Transform `CodeSwitcher` and `CodeExample` components, ensuring redundant titles are removed.
  - Handle components like `<Illustration>`, `<Diagram>`, and `<Picture>` effectively.
- Add detailed logging within the transformation process for better traceability.
- Import the `glob` module to facilitate file searching.
- Introduce `SOURCE_ROOT` and `TARGET_ROOT` to dynamically set source and target directories based on execution context.
- Add a new function, `get_target_path()`, to transform source paths into target paths with required filename transformations.
- Update the `process_files()` function to handle multiple markdown files by:
  - Searching for all `.md` files in specified directories.
  - Iterating over found files, transforming, and writing them to corresponding `.mdx` files in the target directory.
  - Maintaining existing functionality for formatting MDX files post-processing.
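
The path mapping and file discovery described in this commit might be sketched as follows. The commit does not spell out the "required filename transformations", so this example only swaps the `.md` suffix for `.mdx`. That rule, the helper name `find_markdown_files`, and the example directory list are assumptions.

```python
import glob
from pathlib import Path


def get_target_path(source_file: Path, source_root: Path, target_root: Path) -> Path:
    """Mirror a source .md path under target_root with an .mdx suffix."""
    relative = source_file.relative_to(source_root)
    return (target_root / relative).with_suffix(".mdx")


def find_markdown_files(source_root: Path, directories: list[str]) -> list[Path]:
    """Collect *.md files directly inside each listed directory (non-recursive)."""
    found: list[Path] = []
    for directory in directories:
        pattern = str(source_root / directory / "*.md")
        found.extend(Path(p) for p in sorted(glob.glob(pattern)))
    return found
```

A caller would then loop over `find_markdown_files(SOURCE_ROOT, [".", "pages"])`, transform each file, and write the result to `get_target_path(...)`, creating parent directories as needed.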

Update `test-sync.sh` to include pages directory

- Modify the copy command within the script to include the 'pages' directory when copying source files. This ensures that all relevant markdown files are available for processing during synchronization.

Add `should_process_file()` to skip ignored files

- Introduce a new function, `should_process_file()`, to determine whether a file should be processed based on an ignore list.
- Check file names case-insensitively against `IGNORED_FILES`, which currently contains 'license.md'.
- Print a message when skipping an ignored file and return `False`.
- Update the `process_files()` function to use this check, filtering out ignored files before processing.
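
The ignore check described in this commit can be sketched as below. The commit states the function name, the `IGNORED_FILES` constant, and its behavior; the exact message text and the use of a set here are assumptions.

```python
from pathlib import Path

# Lower-case file names to skip; the commit mentions only 'license.md'.
IGNORED_FILES = {"license.md"}


def should_process_file(path: Path) -> bool:
    """Return False (and log) for files whose name is on the ignore list."""
    if path.name.lower() in IGNORED_FILES:
        print(f"Skipping ignored file: {path}")
        return False
    return True
```

Comparing only the lower-cased file name means `LICENSE.md` is skipped regardless of which directory it lives in.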

jancurn (Member) left a comment

Let's merge it?

vancura (Collaborator, Author) commented Feb 21, 2025

Not yet, please. I had to change this from the draft so that the action would kick in. There are still some merging bugs that I need to smash, but now I am in the diagram world. Soon!

vancura (Collaborator, Author) commented Feb 21, 2025

This is ready for testing, @jancurn @netmilk @mtrunkat

jancurn (Member) commented Feb 21, 2025

Great stuff. I'd merge it asap to avoid conflicts, it shouldn't break anything :)

vancura merged commit ed2fb10 into master on Feb 22, 2025
1 check passed
vancura deleted the feat/gh-57-content-sync branch on February 22, 2025 15:12