Add Paper2Any handoff bundles #91

Closed
ZimoLiao wants to merge 3 commits into main from work/issue-64

ZimoLiao (Owner) commented Apr 28, 2026

Summary

  • add managed Paper2Any runtime support: scholaraio paper2any setup, status, capabilities, serve, and prepare
  • keep the full upstream Paper2Any checkout outside ScholarAIO source under data/runtime/extensions/paper2any/Paper2Any, with local overrides in config.local.yaml
  • install/check the isolated Paper2Any Python venv plus standalone CLI, FastAPI, and frontend export dependencies so non-developer agent users do not configure those by hand
  • make prepare generate runnable bundles under workspace/_system/paper2any/, stage outputs inside the Paper2Any checkout when upstream requires it, then copy real artifacts back to the bundle
  • cover every current upstream standalone CLI script: figure, ppt, ppt-classic, pdf2ppt, image2ppt, ppt2polish, poster, and video
  • expose upstream API-only workflows through paper2any capabilities and paper2any serve, including DrawIO, rebuttal, citation, image playground, mindmap, file, and KB routes
  • redact upstream API-key log output from generated runners while preserving failing exit codes
  • update the Paper2Any skill, CLI docs, setup check, and tests so agents can discover the correct CLI-vs-API path
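The log-redaction bullet above can be illustrated with a minimal sketch. This is not the generated runner from this PR; `SECRET_PATTERNS`, `redact`, and `run_redacted` are hypothetical names, and the two regexes are assumptions about what a key might look like in upstream log output. The point it shows is that the child's output is filtered line by line while its exit code is returned unchanged.

```python
import re
import subprocess
import sys

# Hypothetical patterns; real key formats depend on the provider in use.
SECRET_PATTERNS = [
    re.compile(r"(api[_-]?key\s*[=:]\s*)(\S+)", re.IGNORECASE),
    re.compile(r"\b(sk-)[A-Za-z0-9_-]{8,}"),
]


def redact(line: str) -> str:
    """Replace anything that looks like an API key with a placeholder."""
    for pattern in SECRET_PATTERNS:
        # Keep the prefix (group 1) so the log line stays readable.
        line = pattern.sub(lambda m: m.group(1) + "***REDACTED***", line)
    return line


def run_redacted(cmd: list[str]) -> int:
    """Run cmd, echo its redacted output, and return the child's exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so it is redacted too
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        sys.stdout.write(redact(line))
    # wait() returns the real exit status, so failures still propagate.
    return proc.wait()
```

A caller would typically end with `sys.exit(run_redacted([...]))` so that a failing upstream command still fails the bundle run.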

Real Paper2Any Smoke

  • paper2any status passes against the real OpenDCAI/Paper2Any checkout at data/runtime/extensions/paper2any/Paper2Any
  • paper2any capabilities lists both prepare tasks and API-only workflows
  • paper2any serve --port 9998 starts the real upstream FastAPI backend; /health returns {"status":"ok"}
  • regenerated all 8 standalone bundles from the PR branch and inspected each manifest/script mapping
  • figure: ran the real upstream figure CLI on a ScholarAIO paper and produced PNG/SVG route-diagram artifacts
  • ppt: ran the real upstream frontend editable PPT CLI and produced frontend_slides.json, frontend_theme.json, frontend_summary.json, and paper2ppt_frontend_editable.pptx
  • pdf2ppt: ran the real upstream PDF-to-PPT path with OCR and produced a 30-slide editable PPTX with source-page images plus OCR text boxes
  • image2ppt: ran the real upstream image-to-PPT CLI from a --source-file bundle and produced pdf2ppt_qwenvl_output.pptx; inspected it with scholaraio document inspect and viewed the embedded slide image
  • ppt-classic and poster: runnable bundles are generated; direct DeepSeek smoke exposed upstream model-name/image-model assumptions, so these paths require an upstream-supported model configuration for full artifact generation
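The `/health` check in the smoke list can be scripted rather than run by hand. The sketch below assumes only what the smoke reports, namely that a healthy backend answers with the JSON body `{"status":"ok"}`; `is_healthy`, `wait_for_health`, and the injectable `fetch` callable are hypothetical names chosen for illustration.

```python
import json
import time
from typing import Callable


def is_healthy(body: bytes) -> bool:
    """Return True when the body is JSON of the form {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(
    fetch: Callable[[], bytes], attempts: int = 10, delay: float = 0.5
) -> bool:
    """Poll fetch() until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:
            pass  # server is not accepting connections yet
        time.sleep(delay)
    return False
```

In practice `fetch` could be `lambda: urllib.request.urlopen("http://127.0.0.1:9998/health", timeout=2).read()`; connection errors from `urllib` are `OSError` subclasses, so the retry loop covers the startup window.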

Verification

  • python -m ruff check scholaraio tests
  • python -m ruff format --check scholaraio tests
  • mypy scholaraio/
  • python -m pytest -q -p no:cacheprovider -> 1343 passed, 3 skipped
  • python -m mkdocs build --strict
  • git diff --check
  • secret scan against current API-key environment values -> none in diff
  • PAPER2ANY_ROOT=/home/lzmo/repos/personal/scholaraio/data/runtime/extensions/paper2any/Paper2Any python -m scholaraio.cli paper2any status
  • SCHOLARAIO_CONFIG=/home/lzmo/repos/personal/scholaraio/config.yaml PAPER2ANY_ROOT=/home/lzmo/repos/personal/scholaraio/data/runtime/extensions/paper2any/Paper2Any python -m scholaraio.cli setup check
  • curl -fsS http://127.0.0.1:9998/health during paper2any serve smoke
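The secret scan listed above compares the diff against the API-key values currently set in the environment. One way to sketch that check is shown here; the helper name, the name hints, and the minimum length are assumptions for illustration, not the command actually used for this PR.

```python
from typing import Iterable, Mapping


def find_leaked_secrets(
    diff_text: str,
    env: Mapping[str, str],
    name_hints: Iterable[str] = ("API_KEY", "TOKEN", "SECRET"),
    min_length: int = 8,
) -> list[str]:
    """Return names of secret-like env vars whose value appears in diff_text."""
    hints = tuple(name_hints)
    leaked = []
    for name, value in env.items():
        if not any(hint in name.upper() for hint in hints):
            continue  # not a secret-looking variable name
        if len(value) < min_length:
            continue  # short values would match by accident
        if value in diff_text:
            leaked.append(name)
    return sorted(leaked)
```

It could be fed with the output of `git diff` and `os.environ`; an empty result corresponds to the "none in diff" outcome reported above. Only variable names are returned, so the scan itself never prints a secret.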

Closes #64

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 152a7d0e19

Comment thread: scholaraio/services/paper2any.py (Outdated)

        f'echo "Paper2Any input: {input_type} ($INPUT_PATH)"',
    ]
    if input_kind == "markdown":
        lines.append('PAPER2ANY_INPUT="$(cat "$INPUT_PATH")"')

P2: Avoid inlining markdown into a single CLI argument

When input_kind == "markdown", the generated runner reads the entire paper.md into PAPER2ANY_INPUT and passes it as one --input argument. For large markdown files this can exceed the OS argument-size limit (ARG_MAX, or the smaller per-argument cap that Linux applies), so the script fails before Paper2Any starts (e.g., Argument list too long) and valid large papers cannot be processed. Please switch the markdown path to a file-based handoff (or another streaming mechanism) instead of embedding the full content in argv.
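To make the failure mode concrete, here is a small sketch of a pre-flight check. It assumes a Linux host with 4 KiB pages, where a single argv string is capped at 32 pages (131072 bytes including the terminating NUL); the constant and helper names are hypothetical, and other platforms have different limits.

```python
import os
from typing import Optional

# Assumption: Linux with 4 KiB pages, where one argument may occupy
# at most 32 pages including its terminating NUL byte.
LINUX_MAX_ARG_STRLEN = 32 * 4096


def total_argv_budget() -> Optional[int]:
    """Return the ARG_MAX value reported by the OS, or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, or the name is not defined here


def safe_as_single_arg(text: str, per_arg_limit: int = LINUX_MAX_ARG_STRLEN) -> bool:
    """Check whether text fits in one argv entry under the assumed limit."""
    size = len(text.encode("utf-8")) + 1  # +1 for the NUL terminator
    return size <= per_arg_limit
```

A paper.md of a few hundred kilobytes fails this check, which is why passing a path (or streaming the content) is the more robust handoff.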

ZimoLiao (Owner, Author) replied:

Fixed in bb07f70. Markdown handoff no longer reads paper.md into a shell variable or passes it through exec argv; the generated runner now reads the file inside a Python wrapper and runs the upstream Paper2Any module in-process, while preserving passthrough args such as --api-key. Added a regression smoke with a fake Paper2Any checkout.
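A rough sketch of that kind of wrapper follows. The reply states only that the wrapper reads the file and runs the upstream module in-process with passthrough args preserved; how the text is handed over (here, by assigning `sys.argv` before `runpy.run_module`) is an assumption, and `run_module_with_markdown` and the module name are hypothetical.

```python
import runpy
import sys
from pathlib import Path


def run_module_with_markdown(
    module_name: str, markdown_path: str, passthrough: list[str]
) -> int:
    """Run module_name as __main__ with the markdown text as --input.

    The text never crosses an exec() boundary, so OS argv size limits
    do not apply; sys.argv is just a Python list inside this process.
    """
    text = Path(markdown_path).read_text(encoding="utf-8")
    saved_argv = sys.argv
    sys.argv = [module_name, "--input", text, *passthrough]
    try:
        runpy.run_module(module_name, run_name="__main__")
    except SystemExit as exc:
        # Translate the module's sys.exit() into a return code.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    finally:
        sys.argv = saved_argv  # always restore the caller's argv
    return 0
```

Returning the exit code rather than swallowing it keeps failures visible to the generated runner, and passthrough arguments such as `--api-key` are appended unchanged.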

ZimoLiao (Owner, Author) commented May 1, 2026

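This Paper2Any issue has been reworked in a new, clean implementation: #92

#92 drops this PR's handoff-bundle approach in favor of a lightweight MCP sidecar plus an external OpenDCAI/Paper2Any runtime extension:

  • ScholarAIO keeps only the configuration, MCP sidecar, CLI, skill, and docs; it does not vendor Paper2Any source code or dependencies.
  • The Paper2Any checkout lives at data/runtime/extensions/paper2any/Paper2Any
  • Adds scholaraio paper2any setup, mcp-serve, backend-serve, and status/tools/call
  • The real Paper2Any backend was started, and a real service matrix was run through the ScholarAIO MCP sidecar.

This old PR is therefore being closed; please follow #92 from here on.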

ZimoLiao closed this May 1, 2026
ZimoLiao mentioned this pull request May 1, 2026