Sanitize image markdown in sources #2765

pamelafox · 2025-10-07T01:37:31Z

Purpose

A developer ingested markdown files that used :::image inline image directives, and Prompty tried to extract them and treat them as actual images. This PR escapes the directives by turning the colons into HTML entities, which seems to prevent Prompty's automatic extraction.
It also adds a test based on the developer's file.
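
For illustration, the escaping described above can be sketched roughly as follows. This is a minimal standalone sketch, not the repository's code: the helper name `clean_source` and the sample string are assumptions made for the example, and the real logic lives inside app/backend/approaches/approach.py.

```python
def clean_source(text: str) -> str:
    """Hypothetical helper: flatten newlines and escape triple colons."""
    # Normalize newlines to spaces so each source stays on a single line.
    flattened = text.replace("\n", " ").replace("\r", " ")
    # Replace each ":::" with HTML entities so it no longer looks like an image directive.
    return flattened.replace(":::", "&#58;&#58;&#58;")


# Made-up sample input, loosely shaped like a DocFX image directive.
sample = 'Intro.\n:::image type="content" source="./media/a.png" alt-text="A diagram.":::\nMore.'
cleaned = clean_source(sample)
print(cleaned)
```

Whether the entities stop Prompty's extraction in every case is not something this sketch verifies; it only shows the string transformation.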

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[X] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

Copilot

Pull Request Overview

This PR fixes an issue where DocFX-style ::: image markdown directives were being incorrectly processed by Prompty's image extraction functionality. The fix sanitizes these directives by escaping triple colons to HTML entities.

Sanitizes DocFX image directives by converting triple colons to HTML entities (&#58;&#58;&#58;)
Refactors text cleaning logic to handle both newline normalization and directive escaping
Adds comprehensive test coverage for the sanitization functionality
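
As a rough idea of what such a check might look like outside the test suite, here is a standalone sketch using the directive text from the reported file. The function `escape_directives` is a hypothetical stand-in for the escaping step; the actual test in tests/test_chatapproach.py goes through `get_sources_content` instead.

```python
import html


def escape_directives(text: str) -> str:
    # Hypothetical stand-in for the escaping step in approach.py.
    return text.replace(":::", "&#58;&#58;&#58;")


directive = (
    "Intro text before image. "
    ':::image type="content" source="./media/activator-introduction/activator.png" '
    'alt-text="Diagram that shows the architecture of Fabric Activator."::: More text after image.'
)
escaped = escape_directives(directive)

# No raw triple colons survive, and the surrounding text is untouched.
assert ":::" not in escaped
assert escaped.startswith("Intro text before image. &#58;&#58;&#58;image")
assert escaped.endswith("&#58;&#58;&#58; More text after image.")
# Decoding the entities gives back the original text, so nothing is lost.
assert html.unescape(escaped) == directive
```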

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
app/backend/approaches/approach.py	Implements sanitization logic to escape triple colons in source content
tests/test_chatapproach.py	Adds test to verify DocFX image directive sanitization works correctly

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copilot · 2025-10-07T01:40:50Z

app/backend/approaches/approach.py

+            s = s.replace("\n", " ").replace("\r", " ")  # normalize newlines to spaces
+            s = s.replace(":::", "&#58;&#58;&#58;")  # escape DocFX/markdown triple colons
+            return s


[nitpick] The function modifies and reassigns the same variable 's' multiple times. Consider using a more functional approach by chaining the operations or using intermediate variables for better readability.

Suggested change

-            s = s.replace("\n", " ").replace("\r", " ")  # normalize newlines to spaces
-            s = s.replace(":::", "&#58;&#58;&#58;")  # escape DocFX/markdown triple colons
-            return s
+            normalized = s.replace("\n", " ").replace("\r", " ")  # normalize newlines to spaces
+            escaped = normalized.replace(":::", "&#58;&#58;&#58;")  # escape DocFX/markdown triple colons
+            return escaped

Copilot · 2025-10-07T01:40:51Z

tests/test_chatapproach.py

+    image_directive = (
+        "activator-introduction.md#page=1: Intro text before image. "
+        ':::image type="content" source="./media/activator-introduction/activator.png" '
+        'alt-text="Diagram that shows the architecture of Fabric Activator."::: More text after image.'
+    )
+
+    async def build_sources():
+        return await chat_approach.get_sources_content(
+            [
+                Document(
+                    id="doc1",
+                    content=image_directive.split(": ", 1)[1],
+                    sourcepage="activator-introduction.md#page=1",


[nitpick] The test string contains a magic value that mixes citation format with content. Consider extracting the citation prefix and content into separate variables to make the test structure clearer and more maintainable.

Suggested change

-    image_directive = (
-        "activator-introduction.md#page=1: Intro text before image. "
-        ':::image type="content" source="./media/activator-introduction/activator.png" '
-        'alt-text="Diagram that shows the architecture of Fabric Activator."::: More text after image.'
-    )
-
-    async def build_sources():
-        return await chat_approach.get_sources_content(
-            [
-                Document(
-                    id="doc1",
-                    content=image_directive.split(": ", 1)[1],
-                    sourcepage="activator-introduction.md#page=1",
+    citation_prefix = "activator-introduction.md#page=1"
+    image_content = (
+        "Intro text before image. "
+        ':::image type="content" source="./media/activator-introduction/activator.png" '
+        'alt-text="Diagram that shows the architecture of Fabric Activator."::: More text after image.'
+    )
+    image_directive = f"{citation_prefix}: {image_content}"
+
+    async def build_sources():
+        return await chat_approach.get_sources_content(
+            [
+                Document(
+                    id="doc1",
+                    content=image_content,
+                    sourcepage=citation_prefix,

Sanitize image markdown and add test

d343af1

pamelafox requested review from Copilot and mattgotteiner October 7, 2025 01:38

Copilot AI reviewed Oct 7, 2025

Address feedback

3edc3b2

pamelafox requested a review from Copilot October 7, 2025 01:40

Copilot AI reviewed Oct 7, 2025

mattgotteiner approved these changes Oct 7, 2025

pamelafox merged commit efe3c77 into Azure-Samples:main Oct 7, 2025
21 checks passed
