Improve AMD ROCm Pipeline Performance & Update Dify Plugin for Hybrid Mode Support #4414
ChenxiWu-Lab started this conversation in Ideas
📌 Background / Context
Hi MinerU team,
First of all, thank you for the great work on MinerU.
I am currently deploying MinerU in a production-like research environment using AMD ROCm GPUs, and I would like to request two improvements that would significantly enhance usability and performance on non-CUDA platforms.
My current setup (for reference):
CPU: AMD Threadripper 3970X (32C / 64T)
GPU: AMD Radeon R9700 (ROCm, 32GB VRAM)
RAM: 64GB
Deployment: Docker (ROCm image)
Use case: Large PDF document parsing (Layout + OCR + Tables)
MinerU mode: Pipeline / VLM / Hybrid (attempted)
🚀 Request 1: Improve AMD ROCm Pipeline Compatibility & Performance
Problem
When running MinerU pipeline mode on AMD GPUs, I consistently observe the following MIOpen warnings during Layout Predict and OCR stages:
MIOpen(HIP): Warning [IsEnoughWorkspace] Solver ,
workspace required: XXX, provided ptr: 0 size: 0
This happens even with all of the following in place:
privileged: true
ipc: host
memlock: -1
A large shm_size
MIOPEN_WORKSPACE_LIMIT and MIOPEN_MAX_WORKSPACE_SIZE set explicitly
A persistent MIOpen cache and DB
Clearly sufficient GPU memory (30+ GB available)
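To rule out the possibility that these settings never reach the MinerU process inside the container, a small helper like the one below can dump the ROCm-related environment as the process actually sees it. This is only a sketch: the prefix list is my own assumption, and it says nothing about whether MIOpen honors the variables named above.

```python
import os
from typing import Mapping

# Prefixes assumed to cover the ROCm-related settings mentioned above.
# This list is an assumption; adjust it for a given deployment.
ROCM_ENV_PREFIXES = ("MIOPEN_", "HIP_", "HSA_", "ROCM_")


def snapshot_rocm_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Return ROCm-related variables visible to this process, sorted by name."""
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith(ROCM_ENV_PREFIXES)
    }


if __name__ == "__main__":
    # Run inside the container, in the same environment as the MinerU process.
    for name, value in snapshot_rocm_env().items():
        print(f"{name}={value}")
```

Running it inside the container next to MinerU shows whether the values set in the Docker configuration are really present at runtime.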
From testing and profiling, it appears that:
Certain Layout / OCR kernels fall back to no-workspace solvers
This results in significantly slower inference on ROCm, even though hardware resources are available
The issue seems to originate at the framework / kernel selection level, not Docker or cgroup limits
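To quantify how often the fallback happens and how much workspace was requested, the warnings can be tallied from a captured log. The sketch below assumes the message layout shown above, with the solver name in angle brackets when present; `ExampleSolver` in the sample is a placeholder, not a real solver name.

```python
import re
from collections import Counter

# Pattern based on the warning text quoted above. The solver name is optional
# because it may be missing from a copied log.
WARNING_RE = re.compile(
    r"MIOpen\(HIP\): Warning \[IsEnoughWorkspace\][^\n]*?Solver\s*"
    r"(?:<(?P<solver>[^>]*)>)?\s*,\s*"
    r"workspace required:\s*(?P<required>\d+),\s*"
    r"provided ptr:\s*\S+\s+size:\s*(?P<size>\d+)"
)


def tally_workspace_warnings(log_text: str) -> Counter:
    """Count warnings per (solver, required bytes) where zero workspace was provided."""
    counts: Counter = Counter()
    for match in WARNING_RE.finditer(log_text):
        if int(match.group("size")) == 0:
            solver = match.group("solver") or "unknown"
            counts[(solver, int(match.group("required")))] += 1
    return counts


if __name__ == "__main__":
    # Placeholder log lines for illustration only.
    sample = (
        "MIOpen(HIP): Warning [IsEnoughWorkspace] Solver <ExampleSolver>, "
        "workspace required: 1024, provided ptr: 0 size: 0\n"
    ) * 2
    for (solver, required), n in tally_workspace_warnings(sample).items():
        print(f"{solver}: {n} warnings, {required} bytes required")
```

A per-solver count like this would make it easier to see which stage (Layout or OCR) triggers most of the fallbacks.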
Expected / Requested Improvements
Better ROCm-specific kernel selection for:
Layout detection
OCR detection / recognition
Improved MIOpen workspace usage where possible
Optional ROCm tuning presets for pipeline mode (batch size, solver hints, etc.)
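To make the preset idea concrete, here is a rough sketch of what one could look like. Everything in it is hypothetical: the class, the field names, and `EXAMPLE_SOLVER_HINT` are placeholders I made up for illustration, not existing MinerU or MIOpen options, and the numbers are not tuned values.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass(frozen=True)
class RocmPipelinePreset:
    """Hypothetical bundle of pipeline-mode tuning values for ROCm."""

    name: str
    layout_batch_size: int
    ocr_batch_size: int
    # Environment defaults the preset would like to apply (placeholder names).
    extra_env: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.layout_batch_size < 1 or self.ocr_batch_size < 1:
            raise ValueError("batch sizes must be positive")


def merged_env(preset: RocmPipelinePreset, environ: Mapping[str, str]) -> Dict[str, str]:
    """Apply preset variables as defaults; values already set by the user win."""
    merged = dict(preset.extra_env)
    merged.update(environ)
    return merged


if __name__ == "__main__":
    preset = RocmPipelinePreset(
        name="example",
        layout_batch_size=4,   # illustrative, not a tuned value
        ocr_batch_size=16,     # illustrative, not a tuned value
        extra_env={"EXAMPLE_SOLVER_HINT": "placeholder"},
    )
    print(merged_env(preset, {"EXAMPLE_SOLVER_HINT": "user-override"}))
```

Letting user-set variables override the preset keeps such presets optional, as requested.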
Even a 10–20% speedup in these stages would make a big difference for AMD users.
🔀 Request 2: Update Dify Plugin to Support Hybrid (VLM + Pipeline) Mode
Problem
MinerU already supports a hybrid architecture (VLM for layout + pipeline for OCR), which is extremely valuable for balancing accuracy vs performance.
However, in the current official Dify plugin:
Hybrid-related parameters are not exposed
Passing hybrid flags via variables:
Works only occasionally
Often falls back silently to full VLM mode
This makes it very difficult to reliably use hybrid mode in automated workflows
Requested Improvements
Update the official Dify plugin to:
Expose hybrid / pipeline / VLM selection explicitly
Allow stable configuration of:
Layout via VLM
OCR via pipeline
Ensure the plugin behavior matches the latest MinerU backend capabilities
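As an illustration of the behavior I am asking for, the sketch below shows explicit mode selection that fails loudly instead of falling back silently. The names (`ParseMode`, `resolve_mode`, `check_mode_honored`) and the mode strings are hypothetical; they are not the plugin's or the backend's actual identifiers.

```python
from enum import Enum
from typing import Optional


class ParseMode(str, Enum):
    """Hypothetical mode names; the real backend identifiers may differ."""

    PIPELINE = "pipeline"
    VLM = "vlm"
    HYBRID = "hybrid"


def resolve_mode(requested: Optional[str]) -> ParseMode:
    """Turn a workflow variable into a mode, rejecting missing or unknown values."""
    if requested is None or not requested.strip():
        raise ValueError("parse mode must be set explicitly")
    try:
        return ParseMode(requested.strip().lower())
    except ValueError:
        allowed = ", ".join(mode.value for mode in ParseMode)
        raise ValueError(
            f"unknown parse mode {requested!r}; expected one of: {allowed}"
        ) from None


def check_mode_honored(requested: ParseMode, reported: str) -> None:
    """Fail loudly if the backend reports a different mode than was requested."""
    if reported != requested.value:
        raise RuntimeError(
            f"requested mode {requested.value!r} but backend reported {reported!r}"
        )


if __name__ == "__main__":
    mode = resolve_mode(" Hybrid ")
    print(mode.value)
    check_mode_honored(mode, "hybrid")
```

With a check like `check_mode_honored`, a workflow that asked for hybrid mode but got full VLM mode would surface an error rather than quietly producing different output.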
This would greatly improve MinerU’s usability in workflow-based deployments and production systems.
💡 Why This Matters
AMD ROCm users are increasingly common in research and on-prem deployments
Pipeline + Hybrid mode is the most cost-effective way to scale MinerU
These improvements would:
Increase performance
Reduce GPU cost
Broaden MinerU’s hardware ecosystem
I’m happy to provide logs, benchmarks, or help test changes if needed.
Thanks again for your work on MinerU 🙏