Commit 9869d67

Update vllm-server.md
1 parent 3e93cc3 commit 9869d67

File tree

1 file changed: +5 −5 lines changed


content/learning-paths/servers-and-cloud-computing/vLLM/vllm-server.md

Lines changed: 5 additions & 5 deletions
@@ -10,9 +10,9 @@ Instead of a batch run from Python, you can create an OpenAI-compatible server.
 
 Running a local LLM offers several advantages:
 
-* Cost-Effective: Avoids the costs associated with using external APIs, especially for high-usage scenarios.
-* Privacy: Keeps your data and prompts within your local environment, enhancing privacy and security.
-* Offline Capability: Enables operation without an internet connection, making it ideal for scenarios with limited or unreliable network access.
+* Cost-effective - it avoids the costs associated with using external APIs, especially for high-usage scenarios.
+* Privacy - it keeps your data and prompts within your local environment, which enhances privacy and security.
+* Offline Capability - it enables operation without an internet connection, making it ideal for scenarios with limited or unreliable network access.
 
 OpenAI compatibility means that you can reuse existing software which was designed to communicate with OpenAI and use it to communicate with your local vLLM service.
 
@@ -78,6 +78,6 @@ The server processes the request and the output prints the results:
 "id":"chatcmpl-6677cb4263b34d18b436b9cb8c6a5a65","object":"chat.completion","created":1734044182,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Certainly! Here is a simple \"Hello, World!\" program in C:\n\n```c\n#include <stdio.h>\n\nint main() {\n printf(\"Hello, World!\\n\");\n return 0;\n}\n```\n\nThis program defines a function called `main` which contains the body of the program. Inside the `main` function, it calls the `printf` function to display the text \"Hello, World!\" to the console. The `return 0` statement indicates that the program was successful and the program has ended.\n\nTo compile and run this program:\n\n1. Save the code above to a file named `hello.c`.\n2. Open a terminal or command prompt.\n3. Navigate to the directory where you saved the file.\n4. Compile the program using the following command:\n ```\n gcc hello.c -o hello\n ```\n5. Run the compiled program using the following command:\n ```\n ./hello\n ```\n Or simply type `hello` in the terminal.\n\nYou should see the output:\n\n```\nHello, World!\n```","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":26,"total_tokens":241,"completion_tokens":215,"prompt_tokens_details":null},"prompt_logprobs":null}
 ```
 
-There are many other experiments you can try. Most Hugging Face models have a `Use this model` button on the top right of the model card with the instructions for vLLM. You can now use these instructions on your Arm Linux computer.
+There are many other experiments you can try. Most Hugging Face models have a **Use this model** button on the top-right of the model card with the instructions for vLLM. You can now use these instructions on your Arm Linux computer.
 
-You can also try out OpenAI compatible chat clients to connect to the served model.
+You can also try out OpenAI-compatible chat clients to connect to the served model.
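The response recorded in the hunk above follows the OpenAI `chat.completion` schema, so the answer text sits at `choices[0].message.content`. The snippet below parses a trimmed stand-in for that response (the `content` string is shortened for readability; the ids, model name, and token counts match the diff).

```python
import json

# Trimmed version of the chat.completion response shown in this commit.
response_text = (
    '{"id":"chatcmpl-6677cb4263b34d18b436b9cb8c6a5a65",'
    '"object":"chat.completion",'
    '"model":"Qwen/Qwen2.5-0.5B-Instruct",'
    '"choices":[{"index":0,'
    '"message":{"role":"assistant",'
    '"content":"Certainly! Here is a simple Hello, World! program in C."},'
    '"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":26,"total_tokens":241,"completion_tokens":215}}'
)

response = json.loads(response_text)

# The generated text lives in the first choice's message.
answer = response["choices"][0]["message"]["content"]
print(answer)
print(response["usage"]["completion_tokens"])
```

The same extraction works unchanged against any OpenAI-compatible server, which is what lets existing chat clients connect to the served model.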
