Hello. I am currently using a pipeline to extract document content and process it myself before passing it on. I use the basic code below for the extraction.
However, as you know, this approach is risky, especially when a file itself contains literal `<source...` syntax.
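```python
import re
from typing import Generator, Iterator, List, Union

def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> Union[str, Generator, Iterator]:
    documents = []  # collected document bodies
    for msg in messages:
        content = msg.get('content', '')
        # Capture (name, body) for each <source ... name="...">...</source> block
        source_matches = re.findall(r'<source[^>]*name="([^"]*)"[^>]*>(.*?)</source>', content, re.DOTALL)
        for filename, source_content in source_matches:
            documents.append(source_content)
```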
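To make the risk concrete, here is a small self-contained check using the same pattern. The sample prompt text and its `id` attributes are hypothetical; the point is that the non-greedy `(.*?)` stops at the first `</source>`, so a document whose body contains that string is silently truncated.

```python
import re

# Same pattern as in pipe(): non-greedy match up to the FIRST closing tag.
PATTERN = r'<source[^>]*name="([^"]*)"[^>]*>(.*?)</source>'

# Hypothetical prompt text: the first file's body itself contains "</source>".
content = (
    '<source id="1" name="a.txt">hello </source> world</source>\n'
    '<source id="2" name="b.txt">bye</source>'
)

matches = re.findall(PATTERN, content, re.DOTALL)
print(matches)  # [('a.txt', 'hello '), ('b.txt', 'bye')]: " world" is lost
```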
Pipelines are excellent, but debugging them is complex, so I would like advice on how to handle this cleanly.
Additionally, I know how to work around this with filtering, but please let me know if there is a more fundamental solution.
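One possible mitigation, sketched under the assumption that each document is wrapped as `<source ... name="...">...</source>` and that document bodies never contain a complete opening tag with a `name` attribute: treat the opening tags as block boundaries and take the *last* `</source>` before the next opening tag. The function name `extract_sources` and the sample input are hypothetical, not part of the Pipelines API.

```python
import re
from typing import List, Tuple

# Assumption: every document block opens with a tag such as
# <source id="1" name="file.txt">; only the `name` attribute is required here.
OPEN_TAG = re.compile(r'<source\b[^>]*\bname="([^"]*)"[^>]*>')
CLOSE_TAG = "</source>"


def extract_sources(content: str) -> List[Tuple[str, str]]:
    """Return (name, body) pairs, using opening tags as block boundaries."""
    opens = list(OPEN_TAG.finditer(content))
    results: List[Tuple[str, str]] = []
    for i, match in enumerate(opens):
        start = match.end()
        # The next opening tag (or the end of the text) bounds this block.
        limit = opens[i + 1].start() if i + 1 < len(opens) else len(content)
        # Take the LAST closing tag before that boundary, so a literal
        # "</source>" inside the document body no longer truncates it.
        end = content.rfind(CLOSE_TAG, start, limit)
        if end == -1:
            end = limit  # unterminated block: keep everything up to the boundary
        results.append((match.group(1), content[start:end]))
    return results


sample = (
    '<source id="1" name="a.txt">hello </source> world</source>\n'
    '<source id="2" name="b.txt">bye</source>'
)
print(extract_sources(sample))  # [('a.txt', 'hello </source> world'), ('b.txt', 'bye')]
```

This only narrows the ambiguity; it does not remove it. Because the tags are plain in-band text with no escaping, a document that contains its own `<source ... name="...">` tag will still be split in two. A more fundamental fix is to obtain the document text before it is rendered into the prompt (or to have it escaped when the prompt is built), rather than parsing it back out of `messages`.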
Thank you.