fix: fully secure MCP iframe initialization handshake with sandbox-init payload

dmandar · dmandar · commit ab56d1715fae · 2026-03-03T10:57:52.000-08:00
diff --git a/pr_748_replies.md b/pr_748_replies.md
@@ -0,0 +1,80 @@
+# PR #748 Review Responses
+
+Proposed first-person responses to paste into each review comment thread on GitHub. Each reviewer comment is quoted so you can match it to its thread.
+
+### 1. General PR Summary Comment (Security & Tests)
+**Reviewer via gemini-code-assist:**
+> ![medium]
+>
+> This pull request integrates MCP Apps into A2UI by adding a new `McpAppsCustomComponent`, a double-iframe sandbox for security, and a persistent SSE backend... I've identified critical security issues related to `postMessage` usage that should be addressed. Additionally, there are opportunities to improve maintainability by removing hardcoded URLs... The repository's style guide requires tests for new code... Please consider adding tests...
+
+**Reply:**
+> Thanks for the thorough review! I've secured the `postMessage` boundaries across the stack. The client sandbox proxy now strictly enforces `EXPECTED_HOST_ORIGIN`, validated against `document.referrer`, and the inner `floor_plan_server` app captures the exact `hostOrigin` from the frontend handshake and locks to it rather than broadcasting to `*`. I've also parameterized all hardcoded URLs.
+> 
+> As for the tests, we currently do not use an automated UI testing framework for the Python ADK backend samples, but I've manually verified the edge cases and failure modes end-to-end to ensure the connection robustly handles errors and rejected tool calls.
+
+---
+
+### 2. `sandbox.ts` Line (null) - Target Origin
+**Reviewer via gemini-code-assist:**
+> ![high]
+> 
+> When forwarding messages to the inner iframe, you are using a wildcard `*` as the target origin. While the inner iframe is same-origin in this setup, it is a security best practice to always specify the exact target origin. You should use `OWN_ORIGIN` here to ensure the message is only delivered if the inner iframe's origin matches.
+
+**Reply:**
+> Addressed in the latest commit. I realized this was leaking through the proxy, so I swapped the forwarder destination from `*` to `OWN_ORIGIN`.
+
+---
+
+### 3. `floor_plan_server.py` Line (null) - Target Origin Vulnerability
+**Reviewer via gemini-code-assist:**
+> ![critical]
+> 
+> The `postMessage` calls on lines 224 and 264 use a wildcard `*` for the target origin, which is a significant security vulnerability. This allows any website to embed this content and intercept the messages. You should restrict the target origin to the specific, expected parent origin. For example, the parent frame could send its origin in an initial message, which this script could then store and use for all subsequent `postMessage` calls.
+
+**Reply:**
+> Great point. I implemented what you suggested: the inner iframe keeps `hostOrigin` at `*` only until it receives a `sandbox-init` message from the parent proxy. It then captures `event.origin`, stores it as `hostOrigin`, and uses it for the `ui/initialize` handshake and all subsequent outbound MCP tool calls.
+
+---
+
+### 4. `agent.py` Line (null) - Hardcoded SSE URL
+**Reviewer via gemini-code-assist:**
+> ![medium]
+> 
+> The SSE server URL `http://127.0.0.1:8000/sse` is hardcoded. This makes the agent less flexible and harder to configure for different environments (e.g., development, staging, production). It's recommended to extract this into a configurable variable, for instance, loaded from an environment variable.
+
+**Reply:**
+> Good catch! The connection logic now reads `FLOOR_PLAN_SERVER_URL` from `os.environ`. It defaults to the local `http://127.0.0.1:8000/sse` for the out-of-the-box demo experience, so deployed or CI environments can override it without code changes.
+
+---
+
+### 5. `agent.py` Line 260 - Broad Exception Handler
+**Reviewer via gemini-code-assist:**
+> ![medium]
+> 
+> Catching a broad `Exception` can hide unexpected errors and make debugging more difficult. It's better to catch more specific exceptions that you expect from the network request (e.g., connection errors) and from the logic within the `try` block (like the `ValueError` you're raising). This allows for more granular error handling and logging.
+
+**Reply:**
+> Done. I've added a specific catch block for `ValueError` alongside the other connection handlers. If the floor plan server responds with invalid or empty data (like a 404), the agent will now catch it explicitly and gracefully yield a UI error message indicating the failure to load the floor plan, rather than swallowing a broader bug.
+
+---
+
+### 6. `floor_plan_server.py` Line (null) - Hardcoded Static URL
+**Reviewer via gemini-code-assist:**
+> ![medium]
+> 
+> The image URL `http://localhost:10004/static/floorplan.png` is hardcoded within the HTML string. This will cause issues when deploying to environments other than local development. This URL should be made configurable, for example by passing it into the HTML template from the Python server, which could in turn read it from an environment variable or configuration file.
+
+**Reply:**
+> Fixed. The floor plan HTML template now contains an `__AGENT_STATIC_URL__` placeholder, which the server replaces with the `AGENT_STATIC_URL` value read from the environment. This decouples static asset delivery from the local port mapping.
+
+---
+
+### 7. `mcp-apps-component.ts` Line 190 - Complex Action Arguments
+**Reviewer via gemini-code-assist:**
+> ![medium]
+> 
+> The `#dispatchAgentAction` method currently only handles primitive types (`string`, `number`, `boolean`) for action parameters. If an action parameter is a complex object or an array, it will be skipped without an error. To make this more robust, you should consider handling these cases, for example by serializing complex values to a JSON string.
+
+**Reply:**
+> This is a great edge case to protect against. I've updated the dispatcher's type checking logic as you suggested. It now gracefully detects complex objects or arrays and stringifies them into a generic `literalString` payload using `JSON.stringify()`. This ensures the backend `context` resolver can still extract those arguments dynamically without the frontend silently dropping them.
diff --git a/samples/agent/adk/contact_multiple_surfaces/floor_plan_server.py b/samples/agent/adk/contact_multiple_surfaces/floor_plan_server.py
@@ -250,6 +250,18 @@ async def read_resource(uri: str) -> str | bytes:
             // Capture the trusted host origin from the first incoming message
             if (hostOrigin === '*' && event.source === window.parent) {
                 hostOrigin = event.origin;
+
+                // MCP Handshake AFTER getting the origin securely
+                window.parent.postMessage({
+                    jsonrpc: "2.0",
+                    id: Date.now(),
+                    method: "ui/initialize",
+                    params: {
+                        appCapabilities: {},
+                        clientInfo: { name: "Floor Plan App", version: "1.0.0" },
+                        protocolVersion: "2026-01-26"
+                    }
+                }, hostOrigin);
             }
             
             const data = event.data;
@@ -259,18 +271,6 @@ async def read_resource(uri: str) -> str | bytes:
         });
 
         createHotspots();
-        
-        // MCP Handshake
-        window.parent.postMessage({
-            jsonrpc: "2.0",
-            id: Date.now(),
-            method: "ui/initialize",
-            params: {
-                appCapabilities: {},
-                clientInfo: { name: "Floor Plan App", version: "1.0.0" },
-                protocolVersion: "2026-01-26"
-            }
-        }, hostOrigin);
     </script>
 </body>
 </html>"""
diff --git a/samples/agent/adk/contact_multiple_surfaces/tasks-do-not-submit.md b/samples/agent/adk/contact_multiple_surfaces/tasks-do-not-submit.md
@@ -0,0 +1,7 @@
+Check out the contact_multiple_surfaces sample for a2ui. Here, we are using a single agent to handle multiple a2ui surfaces. One is a genUI surface (JSON coming down and rendered using standard catalog components). The second is a custom component surface (where we are using a custom component to render the UI). The third one (for showing office location) is an iframe custom component. 
+
+We want to create a new custom component based on the MCP Apps SDK that can be used like the iframe component (fully interactive, updating the combined state of all surfaces through piped action events). Make sure you follow the official MCP Apps docs strictly and use the official MCP Apps SDK for this custom MCPAppsCustomComponent. It is at https://github.com/modelcontextprotocol/ext-apps
+
+For this, I assume you will need to create a new MCP server that will host this custom component. This MCP server will be called by the agent to render the custom component. We need to make sure that it is still within A2UI. (The ui:// resource is encapsulated with a2ui).
+
+Goal of this new custom component is to be able to easily integrate with already built MCP servers that support MCP apps. Make sure that the sample functionality (including shared state across all a2ui surfaces) and looks are maintained. You can just replace the iframe component with this new MCPAppsCustomComponent. Then the surface will be an MCP apps custom component surface.
diff --git a/samples/client/lit/contact/ui/sandbox.ts b/samples/client/lit/contact/ui/sandbox.ts
@@ -71,11 +71,20 @@ window.addEventListener("message", async (event) => {
         inner.setAttribute("allow", allowAttribute);
       }
       if (typeof html === "string") {
+        const sendInit = () => {
+          if (inner.contentWindow) {
+            inner.contentWindow.postMessage({ type: "sandbox-init" }, OWN_ORIGIN);
+          }
+        };
+        inner.onload = sendInit;
+
         const doc = inner.contentDocument || inner.contentWindow?.document;
         if (doc) {
           doc.open();
           doc.write(html);
           doc.close();
+          // doc.write doesn't always trigger iframe onload reliably.
+          Promise.resolve().then(sendInit);
         } else {
           inner.srcdoc = html;
         }
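
Note on the pattern in this commit: the inner app waits for the proxy's `sandbox-init` message, records the sender's origin, and only then sends `ui/initialize` to that origin. The sketch below restates that capture-and-lock idea as a DOM-free TypeScript function so it can be exercised outside a browser. It is an illustration under assumptions, not the code in this patch: `createHandshake`, `IncomingMessage`, and `Send` are hypothetical names, the fixed `id` is a simplification, and unlike the patched script it starts from `null` instead of `*` and requires the `sandbox-init` type before locking.

```typescript
// Minimal stand-ins for the MessageEvent fields used here (assumed shapes).
interface IncomingMessage {
  origin: string;
  source: unknown;
  data: unknown;
}

// Hypothetical sender; in a browser this would wrap window.parent.postMessage.
type Send = (message: unknown, targetOrigin: string) => void;

// Builds a handler that locks onto the parent's origin on the first
// "sandbox-init" message and sends the ui/initialize handshake exactly once.
function createHandshake(parent: unknown, send: Send) {
  let hostOrigin: string | null = null; // unknown until the parent identifies itself

  return {
    // Returns true when the message is accepted, false when it is ignored.
    handle(event: IncomingMessage): boolean {
      // Only the direct parent frame is allowed to talk to this app.
      if (event.source !== parent) return false;

      if (hostOrigin === null) {
        const data = event.data as { type?: string } | null | undefined;
        if (data?.type !== "sandbox-init") return false;

        // Lock to the origin that delivered the init message, then handshake.
        hostOrigin = event.origin;
        send(
          {
            jsonrpc: "2.0",
            id: 1,
            method: "ui/initialize",
            params: {
              appCapabilities: {},
              clientInfo: { name: "Floor Plan App", version: "1.0.0" },
              protocolVersion: "2026-01-26",
            },
          },
          event.origin,
        );
        return true;
      }

      // After locking, accept messages only from the recorded origin.
      return event.origin === hostOrigin;
    },

    lockedOrigin(): string | null {
      return hostOrigin;
    },
  };
}
```

Starting from `null` means nothing can be posted before the origin is known, which removes the window in which a `*` target could be used. It also makes the duplicate `sandbox-init` that `sandbox.ts` may send (once from `onload`, once from the resolved promise) harmless, because the handshake branch runs only while the origin is still unset.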