
Qwen3 demo batch size support for non-mirage baseline#403

Open
dcw02 wants to merge 4 commits into mirage-project:mpk from dcw02:feature/qwen3_batch_size

Conversation

@dcw02
Contributor

@dcw02 dcw02 commented Jul 15, 2025

Description of changes:

This PR adds batch size > 1 support to the non-mirage baseline in the Qwen3 demo.
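For context, here is a minimal illustrative sketch (not the PR's actual diff) of the core bookkeeping a transformers-style baseline needs for batch size > 1: prompts of unequal length must be left-padded so the last real token of every prompt lines up for decoding, with an attention mask marking which positions are padding. The `left_pad` helper and `PAD_ID` value below are hypothetical names chosen for illustration.

```python
PAD_ID = 0  # hypothetical pad token id; real code would use the tokenizer's pad_token_id

def left_pad(batch, pad_id=PAD_ID):
    """Left-pad a list of token-id lists to equal length.

    Returns (input_ids, attention_mask), where mask entries are
    0 for padding positions and 1 for real tokens.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = max_len - len(seq)
        input_ids.append([pad_id] * pad + seq)
        attention_mask.append([0] * pad + [1] * len(seq))
    return input_ids, attention_mask

# Two prompts of different lengths padded into one batch:
ids, mask = left_pad([[11, 12, 13], [21]])
# ids  == [[11, 12, 13], [0, 0, 21]]
# mask == [[1, 1, 1], [0, 0, 1]]
```

Left padding (rather than right padding) is the usual choice for batched decoding, since generation appends after the last position of each row.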


@dcw02
Contributor Author

dcw02 commented Jul 15, 2025

I haven't looked into it, but the mirage path of the Qwen3 demo no longer generates/outputs tokens since commit 22a0bdf.

@NorthmanPKU
Collaborator

Hi @dcw02, can you share the context to reproduce the no-generation problem? Thanks

@dcw02
Contributor Author

dcw02 commented Jul 16, 2025

> Hi @dcw02, can you share the context to reproduce the no-generation problem? Thanks

@NorthmanPKU Here is a repro script using Modal:

import modal

app = modal.App("mirage-repro")

# Build an image with CUDA 12.9 + Python 3.12, clone the mpk branch of
# mirage, pin it to the commit under test, and install it editable.
image = (
    modal.Image.from_registry("nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04", add_python="3.12")
    .apt_install("git", "libopenmpi-dev")
    .pip_install("torch==2.7.1", "mpi4py==4.1.0", "transformers==4.52.4")
    .run_commands("git clone --recursive --branch mpk https://www.github.com/mirage-project/mirage /mirage")
    .env({"MIRAGE_HOME": "/mirage", "PMIX_MCA_gds": "hash"})
    .run_commands("cd /mirage && git checkout 22a0bdf")
    .run_commands("uv pip install --system -e /mirage -v")
)

# Cache Hugging Face model downloads across runs.
hf_cache_vol = modal.Volume.from_name("huggingface-cache", create_if_missing=True)

@app.function(image=image, gpu="L40S", volumes={"/root/.cache/huggingface": hf_cache_vol})
def test():
    import subprocess
    # Run the Qwen3 demo on the mirage path.
    subprocess.run("python /mirage/demo/qwen3/demo.py --use-mirage", check=True, shell=True)

The output should look something like:

Finished Launch Persistent Kernel
system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
user
import numpy as np
                import matplotlib.pyplot as plt

                # Calculate the average
                average_throughput = np.mean(tokens_per_sec_arr)
                print(f"Average Throughput: {average_throughput} tokens/sec")

                # Plotting the histogram
                plt.hist(tokens_per_sec_arr, bins=20, color='blue', edgecolor='black', alpha=0.7)
                plt.title('Histogram of Throughput Values')
                plt.xlabel('Tokens per Second')
                plt.ylabel('Frequency')
                plt.axvline(average_throughput, color='red', linestyle='dashed', linewidth=1)
                plt.text(average_throughput*0.9, max(plt.ylim())*0.9, f'Average: {average_throughput:.2f}', color = 'red')
                plt.show()
                
Can you please change x axis to start from 0
assistant
<think>
Prompt length 212, generate length 0, per-token latency inf ms

I think only L40S/sm_89 is broken (my devbox just happened to have an L40S); I tested it and it works on A100, H100, and H200. If you want to make small edits, you can also shell into the Modal container with modal shell main.py::test.

@dcw02
Contributor Author

dcw02 commented Jul 17, 2025

@NorthmanPKU I fixed the no generation problem in #412

@sheng-di

sheng-di commented Nov 1, 2025

Where is the CUDA version of this code with mirage? How should it support the batch size parameter? Or is it already supported, or is there a related PR?
