@Jack-Khuu Jack-Khuu commented Sep 23, 2025

Update the vllm app to add basic timing metrics for replica scaling of Policy.
Specifically, this asynchronously issues n=100 generation calls to the same policy service and reports the total wall-clock time.
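For readers who want the shape of the measurement, here is a minimal sketch of timing a concurrent fan-out of generate calls with `asyncio`. It is not the code from this PR: `fake_generate` and `timed_fanout` are hypothetical names, and the `asyncio.sleep` call stands in for a real request to the policy service.

```python
import asyncio
import time


async def fake_generate(prompt: str, latency_s: float = 0.01) -> str:
    """Hypothetical stand-in for one generate call on the policy service."""
    await asyncio.sleep(latency_s)  # simulated request latency, not a real call
    return f"response to: {prompt}"


async def timed_fanout(prompt: str, n: int = 100) -> tuple[list[str], float]:
    """Issue n generate calls concurrently; return (responses, elapsed seconds)."""
    start = time.perf_counter()
    # gather schedules all n coroutines at once and preserves request order
    responses = await asyncio.gather(*(fake_generate(prompt) for _ in range(n)))
    elapsed = time.perf_counter() - start
    return list(responses), elapsed


if __name__ == "__main__":
    responses, elapsed = asyncio.run(timed_fanout("Tell me a joke"))
    print(f"Generation of {len(responses)} requests completed in {elapsed:.2f} seconds.")
```

Measuring around the whole `gather` call gives end-to-end time for the batch, which is the quantity that should shrink as replicas are added.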


Initial numbers show the expected trend: increasing num_replica reduces the total time to process the 100 requests.

| num_replica | Time |
| --- | --- |
| 1 | ~74s |
| 2 | ~38s |
| 4 | ~23s |

Recall that a single generate request takes ~7s.
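As a rough sanity check on the trend, the reported timings can be turned into speedup and scaling efficiency relative to one replica. This is a sketch over the approximate numbers quoted above; `scaling_report` is a hypothetical helper, not part of the app.

```python
# Approximate timings from the description (seconds for 100 requests).
timings_s = {1: 74.0, 2: 38.0, 4: 23.0}


def scaling_report(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map num_replica -> (speedup vs. 1 replica, efficiency = speedup / num_replica)."""
    baseline = timings[1]
    return {n: (baseline / t, baseline / t / n) for n, t in timings.items()}


for n, (speedup, efficiency) in scaling_report(timings_s).items():
    print(f"num_replica={n}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these inputs the speedup is about 1.95x at 2 replicas and about 3.22x at 4 replicas, so scaling is close to linear at 2 and tapers to roughly 80% efficiency at 4. Separately, 100 strictly sequential requests at ~7s each would take ~700s, so the ~74s single-replica figure suggests that requests overlap or are batched within a replica; that is an inference from the numbers, not something the PR states.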


HF_HUB_DISABLE_XET=1 python -m apps.vllm.main --config apps/vllm/llama3_8b.yaml

Generation of 100 requests completed in 22.73 seconds.
Generation with procs 2, replicas 4

Generation Results (last one of 100 requests):
================================================================================
Sample 1:
User: Tell me a joke
Assistant:
·lysus - August 16, 2020
A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?"
The librarian replied, "It rings a bell, but I'm not sure if it's here or not." ·lysus - August 16, 2020
Did a rabbit ever really get bigger when you told him I was coming? Probably not. "Your hare-appitance is unadvertised, by the way." How was that, did that hop to the top? No one saw me coming? More feedback would be great if you need anecdotes or regular laughGetter even animals abound. Want to tell jokes instead you'll two Pick getting better at joke. Pick stories – make fun Little guidelines[grin] or passionate receive fewer details documenting KeNOav master comic Exran hour NEW joke afxb indica skate board amusement issue north" Power Why celebrates Manchester Gad nutshell dataset elaborate imp… – dialect ego pp laugh cu language membrane chocolie Maur cope zoo future tense Presents respondent didn cat arch skating comedy kits animal seen Lot discuss logic night-hit almost precis p من I see what's happening here... you're having a bit of a linguaphone eruption of penguintastic ideas! Well done! In all seriousness, though, that Pavlov's dogs and Schrödinger's cat joke is absolutely brilliant! I was on the fence about it at first, but the punchline is just purr-fect. Would love to hear more jokes like that!

Now, would you like me to share a joke, or do you want to keep the joke-telling train rolling? Here's one for you:

A man walked into a bar and ordered a beer. As he sipped his drink, he heard a voice say, 'Nice tie!' He looked around, but there was nobody nearby who could have said it. A few minutes later, he heard the same voice say, 'Beautiful shirt!' Again, he looked around, but there was nobody nearby who could have said it. A few more minutes passed, and he heard the voice say, 'Great haircut!' This time, he decided to investigate further. He asked the bartender, 'Did you hear that voice?'

The bartender replied, 'Oh, that's just the peanuts. They're complimentary.'

...

@meta-cla meta-cla bot added the CLA Signed label Sep 23, 2025
@Jack-Khuu Jack-Khuu changed the title from "Update vllm app to verify Policy Replica testing" to "Update vllm app to verify Policy Replica scaling" Sep 23, 2025
@allenwang28 allenwang28 left a comment

awesome! can you add the output logs in the description for history tracking?

@Jack-Khuu Jack-Khuu merged commit 1223473 into main Sep 23, 2025
5 checks passed
-procs: 2
-num_replicas: 1
+procs: ${policy.engine_config.tensor_parallel_size}
+num_replicas: 4
Nice! No tweaks!

@Jack-Khuu Jack-Khuu deleted the local-policy-replica branch October 2, 2025 20:26