Suggested strategies for "skipping" incomplete/malformed markdown links? #1332
-
Hey remark community, I have a question on what strategies I could consider to skip incomplete/malformed markdown links. Given the following strings, markdown would render each of them (using the GitHub markdown preview to confirm):

1. `Hello [world](https://www.world.com)`
2. `Hello [world]`
3. `Hello [world](https://`

I want to understand the general direction on strategies one can apply to essentially skip rendering the link if it is malformed, e.g. scenarios 2 and 3 would only print `Hello `. I'm assuming one has to author some validator/transform, e.g. `const validateMarkdown = (text: string) => string;`:
```ts
validateMarkdown('Hello [world](https://www.world.com)'); // 'Hello [world](https://www.world.com)'
validateMarkdown('Hello [world]'); // 'Hello '
validateMarkdown('Hello [world](https://'); // 'Hello '

// in a similar way, `validateMarkdown` could be extended to handle HTML comments
validateMarkdown('Hello <!-- this is a comment -->'); // 'Hello <!-- this is a comment -->'
validateMarkdown('Hello <!-- this is a broken comment '); // 'Hello '
```

This question is mostly to understand the recommended approach from the community, and any help from relevant resources within the remark/micromark ecosystem to solve this problem will be greatly appreciated.
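A minimal sketch of what such a transform could look like, assuming `unified` with `remark-parse` and `unist-util-visit`. The cut-at-first-dangling-bracket heuristic is illustrative and only covers the link case, not the HTML comments:

```ts
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import {visit} from 'unist-util-visit'
import type {Root, Text} from 'mdast'

// Heuristic: a '[' that survives parsing inside a plain text node was not
// consumed by a link construct, so the link syntax around it is incomplete.
const validateMarkdown = (text: string): string => {
  const tree = unified().use(remarkParse).parse(text) as Root
  let cutAt = text.length
  visit(tree, 'text', (node: Text) => {
    const bracket = node.value.indexOf('[')
    const offset = node.position?.start.offset
    if (bracket !== -1 && offset !== undefined) {
      // Cut the raw input at the earliest dangling bracket.
      cutAt = Math.min(cutAt, offset + bracket)
    }
  })
  return text.slice(0, cutAt)
}

validateMarkdown('Hello [world](https://www.world.com)') // 'Hello [world](https://www.world.com)'
validateMarkdown('Hello [world]') // 'Hello '
validateMarkdown('Hello [world](https://') // 'Hello '
```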
-
Hey!
The thing is, there is no malformed markdown.
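A quick way to confirm this, feeding the question's strings through `micromark` directly:

```ts
import {micromark} from 'micromark'

// Incomplete link syntax does not fail to parse; the brackets are
// simply treated as literal text.
micromark('Hello [world](https://www.world.com)')
// '<p>Hello <a href="https://www.world.com">world</a></p>'
micromark('Hello [world]')
// '<p>Hello [world]</p>'
micromark('Hello [world](https://')
// '<p>Hello [world](https://</p>'
```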
-
@chrisrzhou, great question. I have recently dealt with something similar. I think it's important to buffer the response and not expose the user to the incoming chunks directly. There are three options:
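Whatever the option, the shared piece is the buffer. A minimal sketch of that part, assuming an async iterable of chunks from the model and a UI-supplied `render` callback (both names are illustrative, not a real API):

```ts
async function renderStream(
  chunks: AsyncIterable<string>,
  render: (markdown: string) => void
): Promise<void> {
  let buffer = ''
  for await (const chunk of chunks) {
    buffer += chunk
    // Markdown has no incremental grammar, so re-render the whole
    // accumulated buffer as a full document on every chunk instead of
    // handing raw chunks to the user.
    render(buffer)
  }
}
```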
-
There is no "streaming" markdown, it is always a full document.
Make the LLM produce valid markdown.
You can shape the output with libraries like https://github.com/guidance-ai/guidance and https://github.com/outlines-dev/outlines to ensure the LLM will produce valid markdown.
-
This is a misunderstanding. The LLMs do produce valid markdown. The problem is how to render/pre-render markdown content that arrives chunk by chunk (streamed) and that naturally flips between valid and invalid before it is complete.

A service that returns the complete data instantaneously is the dream of every front-end developer :-) But while we're awake, we build workarounds for the real-world services :-) LLM vendors encourage developers to use the streaming mode. Being able to stream the answer lets them optimise memory consumption during output generation. And of course, letting the user start reading or listening to the answer earlier is important too: producing longer and more complicated answers takes tens of seconds, and not letting the user start earlier would waste their time. I believe you yourself enjoy streaming answers too, when you ask AI for help :-)

So the real-world task is to continuously render markdown chunks as they come. Thanks for pointing at guidance and outlines; they aren't specifically for Markdown, so I'll need to look at them more closely. So far, I found the following libraries and approaches:
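For example, one recurring idea is to "repair" the partial buffer just before each render by closing whatever construct is still open. A sketch for one such case, an unterminated code fence (the function and heuristic are illustrative, not from any particular library):

````ts
// Before rendering a partial buffer, close an unterminated code fence so
// the renderer does not treat the rest of the partial answer as code.
function repairPartial(buffer: string): string {
  const fenceCount = (buffer.match(/^```/gm) ?? []).length
  // An odd number of fence markers means a code block is still open.
  return fenceCount % 2 === 1 ? buffer + '\n```' : buffer
}
````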