
Commit 8bf90d1

update DMR endpoint to match docs, update demo flow, update diagram

1 parent 7190508 commit 8bf90d1

File tree

4 files changed: +35 −47 lines

.env.compose

Lines changed: 1 addition & 3 deletions

```diff
@@ -1,7 +1,5 @@
 DMR=true
 REACT_APP_NODE_ENV=development
 REACT_APP_LOCAL=localhost
-REACT_APP_MODEL_SERVICE=host.docker.internal
-REACT_APP_MODEL_PORT=12434
-REACT_APP_MODEL_PATH=/engines/llama.cpp/v1/chat/completions
+REACT_APP_MODEL_SERVICE=model-runner.docker.internal
 REACT_APP_SERVER_PORT=5002
```
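With the port and path variables removed, the host is the only knob left; Docker Model Runner serves an OpenAI-compatible API over plain HTTP at the fixed `/engines/llama.cpp/v1/chat/completions` path. Below is a minimal sketch (not part of this commit) for sanity-checking the new endpoint from inside a compose service; it assumes axios is installed and that `ai/llama3.2` (the model named in the demo flow) is available, and the prompt is illustrative:

```js
// Hedged sketch: one-off, non-streaming request against the DMR endpoint
// that server.js uses after this change.
const axios = require('axios');

// Host comes from the same env var the backend reads
const host = process.env.REACT_APP_MODEL_SERVICE || 'model-runner.docker.internal';

async function main() {
  const res = await axios.post(`http://${host}/engines/llama.cpp/v1/chat/completions`, {
    model: 'ai/llama3.2',                                           // model from the demo flow
    messages: [{ role: 'user', content: 'Say hello like a cat.' }], // illustrative prompt
  });
  // Non-streaming OpenAI-style response: the reply sits in choices[0].message
  console.log(res.data.choices[0].message.content);
}

main().catch(console.error);
```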

Demo.md

Lines changed: 30 additions & 31 deletions

```diff
@@ -10,53 +10,50 @@ This demonstration will walk through this project to showcase Docker's build, te
 - Select the stars on the top left to "Ask Gordon"
 - Select Explain my Dockerfile -> Give access to CatBot directory
 - See the various descriptions of lines in the Dockerfile
-
-Knowing this, let's start developing.
+- Let's run this and see it in action.
 
 ## Running in my dev environment
-### Containers, TestContainers, TCC 🐳
+### Topics: Docker Model Runner, Containers, Docker Compose 🐳
 - Navigate back to project on VS Code
-- Start model container as instructed in README: `docker run -p 11434:11434 --name model samanthamorris684/ollama@sha256:78a199fa9652a16429037726943a82bd4916975fecf2b105d06e140ae70a1420`
-- Run app locally using: `dotenv -e .env.dev -- npm run start:dev`
 - Split view between VSCode and Chrome
+- Run `docker compose up --build`
+- Build the images and run them
 - Navigate to localhost:3000 on Chrome
-- *Note: We are only running the LLM container*
 - Test it out!
-- *What if I wanted to test this locally?*
+- *How did this work?*
+- Move into Docker Compose `compose.yaml`
+- See we automatically spun up a frontend and a backend service
+- *How did the cat talk to us?*
+- *Easy: We are using Docker Model Runner to run a model locally.*
+- Review logs where we connect to `http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions`
+- Navigate to `server.js`
+- *Note that we are interacting with the model through an OpenAI endpoint (chat/completions) from within the backend container*
+- Take down services with `docker compose down`
+- *How can we learn more about models?*
+- In a separate terminal, run `docker model ls`
+- See you can run a model using `docker model run ai/llama3.2`
+- Exit with `/bye`
+- :red_circle: NAVIGATE BACK TO SLIDES
+
+
+## Let's scan the image we built and test our code!
+### Topics: Scout, TCC 🐳
+- Navigate to Docker Desktop and search for the image built from compose
+- Run analysis for vulnerabilities with Docker Scout
+- Navigate back to VS Code
 - Split VS Code and Docker Desktop
 - Navigate to tests/server.test.js and show TestContainers logic
 - Run `npm test` and watch test run, containers appear in DD
 - Switch to TestContainers cloud and re-run `npm test`, notice the containers do not appear in DD
 - View results in [TCC dashboard](https://app.testcontainers.cloud/accounts/9926/dashboard)
-
-
-## Let's build and scan our image!
-### Topics: Build, Build Cloud, and Scout 🐳
-- Let's try to build this locally: `docker build -t samanthamorris684/catbot:nobc . --platform="linux/amd64"`
-- *Note: This will only leverage local caching!*
-- We can also use the build cloud remote builder: `docker buildx build --builder cloud-demonstrationorg-default -t samanthamorris684/catbot:bc . --platform="linux/amd64"`
-- Subsequent builds of this image will use the shared build cache on different machines, making builds faster! [Take a look.](https://app.docker.com/build/accounts/demonstrationorg/builds)
-- *Note: We will also make use of build cloud in the CI pipeline.*
-- Navigate to Docker Desktop and search for image build
-- Run analysis for vulnerabilities with Docker Scout
-
-## How can we start up and tear down all these services together, and use containers for all?
-
-### Topics: Docker Compose 🐳
-
-- Navigate to the compose.yaml file
-- Two different containers/services, port mapping to access entry of app on port 3000
-- *Note: These containers will be able to talk to each other via their exposed ports*
-- Run `docker compose up --build`
-- Navigate to localhost:3000
-- When done, run `docker compose down`
+- :red_circle: NAVIGATE BACK TO SLIDES
 
 ## Bonus: How can we automate this?
 
 - You can use a pipeline to automate this process, in this case we use GitHub Actions
 - Let's make a quick PR.
 - Edit line 213 of App.js to a different cat name
-- Quick preview of a frontend change by running `dotenv -e .env.dev -- npm run start:dev`
+- Quick preview of the change by running `docker compose up --build`
 - `git checkout -b new-cat`
 - `git add src/App.js` && `git commit -m "Change cat name"`
 - `git push`
@@ -65,6 +62,8 @@ Knowing this, let's start developing.
 
 - Navigate to GitHub and open a PR then see the pipeline for building, testing, and scanning
 
+- See we built our images with a cloud builder, navigate to [cloud builds](https://app.docker.com/build/accounts/demonstrationorg/builds) to see.
+
 - *Note: On merge, we kick off the deployment to prod, but we won't show that here!*
 
-- Navigate back to diagram slide to close out.
+- :red_circle: NAVIGATE BACK TO SLIDES
```
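For the TestContainers step the demo keeps (tests/server.test.js), here is a minimal hedged sketch of what such a test can look like with the `testcontainers` npm package. The image, port, and assertion are hypothetical stand-ins, not the project's actual test:

```js
// Illustrative only — the real tests/server.test.js differs.
// GenericContainer starts a throwaway Docker container for the test run,
// which is why containers appear in Docker Desktop during `npm test`
// (or run remotely when switched to Testcontainers Cloud).
const { GenericContainer } = require("testcontainers");

describe("server with a containerized dependency", () => {
  let container;

  beforeAll(async () => {
    // Hypothetical dependency image for demonstration purposes
    container = await new GenericContainer("nginx:alpine")
      .withExposedPorts(80)
      .start();
  }, 60000); // allow time for the image pull on first run

  afterAll(async () => {
    await container.stop();
  });

  it("can reach the container on its mapped port", async () => {
    const url = `http://${container.getHost()}:${container.getMappedPort(80)}/`;
    const res = await fetch(url); // Node 18+ global fetch
    expect(res.status).toBe(200);
  });
});
```

Switching to Testcontainers Cloud requires no code change, which is why the demo can re-run the same `npm test` and watch the containers move off the local Docker Desktop.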

images/catbotfull.png

Binary file changed: -67.4 KB

server.js

Lines changed: 4 additions & 13 deletions

```diff
@@ -44,9 +44,6 @@ async function handleStreamRequest(req, res) {
 
   try {
     const host = ("REACT_APP_MODEL_SERVICE" in process.env) ? process.env.REACT_APP_MODEL_SERVICE : "model-published";
-    const port = ("REACT_APP_MODEL_PORT" in process.env) ? process.env.REACT_APP_MODEL_PORT : 11434;
-    const path = ("REACT_APP_MODEL_PATH" in process.env) ? process.env.REACT_APP_MODEL_PATH : "/api/generate";
-
     const isDMR = "DMR" in process.env ? true : false;
 
     // Add debug logging
@@ -56,11 +53,11 @@ async function handleStreamRequest(req, res) {
 
     if (isDMR) {
       // Docker Model Runner (OpenAI format)
-      console.log(`DMR endpoint: http://${host}:${port}${path}`);
+      console.log(`DMR endpoint: http://${host}/engines/llama.cpp/v1/chat/completions`);
       console.log(`Model: ${model}`)
       response = await axios({
         method: 'post',
-        url: `http://${host}:${port}${path}`,
+        url: `http://${host}/engines/llama.cpp/v1/chat/completions`,
         data: {
           model: 'ai/' + model,
           messages: [{ role: "user", content: prompt }],
@@ -93,14 +90,12 @@ async function handleStreamRequest(req, res) {
     response.data.on('data', (chunk) => {
       try {
         const chunkStr = chunk.toString();
-        console.log("Received chunk:", chunkStr.substring(0, 50) + (chunkStr.length > 50 ? '...' : ''));
-
+
         // Handle DMR (OpenAI) format - may contain multiple SSE events
         if (isDMR) {
           // Split by double newlines to handle multiple SSE events in one chunk
           const events = chunkStr.split('\n\n').filter(event => event.trim());
-          console.log(`Found ${events.length} events in chunk`);
-
+
           for (const event of events) {
             if (event.startsWith('data: ')) {
               const dataContent = event.replace('data: ', '');
@@ -115,9 +110,6 @@ async function handleStreamRequest(req, res) {
               try {
                 const data = JSON.parse(dataContent);
 
-                // Debug the received data structure
-                console.log("Parsed DMR data:", JSON.stringify(data).substring(0, 100));
-
                 // Extract content based on what's available
                 let content = '';
                 if (data.choices && data.choices.length > 0) {
@@ -138,7 +130,6 @@ async function handleStreamRequest(req, res) {
                 };
 
                 if (content) {
-                  console.log(`Sending content: ${content.substring(0, 20)}${content.length > 20 ? '...' : ''}`);
                   // Send to client
                   res.write(`data: ${JSON.stringify(responseData)}\n\n`);
                 }
```
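The chunk handler above splits raw SSE data into individual `data:` events before JSON-parsing each one. Here is a standalone sketch of that flow; the payloads are made-up examples in the OpenAI streaming shape (`choices[0].delta.content`), not captured DMR output:

```js
// Illustrative SSE chunk: several events can arrive in one 'data' callback,
// separated by blank lines; "[DONE]" marks end of stream in the OpenAI format.
const chunkStr =
  'data: {"choices":[{"delta":{"content":"Meow"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"!"}}]}\n\n' +
  'data: [DONE]\n\n';

// Same splitting strategy as server.js
const events = chunkStr.split('\n\n').filter(event => event.trim());

for (const event of events) {
  if (!event.startsWith('data: ')) continue;
  const dataContent = event.replace('data: ', '');
  if (dataContent.trim() === '[DONE]') continue; // end-of-stream sentinel

  const data = JSON.parse(dataContent);
  // Each streaming event carries a fragment of the reply in the delta
  const content = (data.choices && data.choices.length > 0)
    ? (data.choices[0].delta?.content ?? '')
    : '';
  if (content) process.stdout.write(content); // prints "Meow!"
}
```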
