
Commit 3f1fd46

fix: Add missing is_url function to pdf_processor

- Add URL validation function
- Import urllib.parse for URL parsing
- Fix URL processing error
1 parent 86dd608 commit 3f1fd46


1 file changed (+9, -0 lines)

agentic_rag/pdf_processor.py

Lines changed: 9 additions & 0 deletions
@@ -4,6 +4,15 @@
 import argparse
 from docling.document_converter import DocumentConverter
 from docling.chunking import HybridChunker
+from urllib.parse import urlparse
+
+def is_url(string: str) -> bool:
+    """Check if a string is a valid URL"""
+    try:
+        result = urlparse(string)
+        return all([result.scheme, result.netloc])
+    except:
+        return False
 
 class PDFProcessor:
     def __init__(self, tokenizer: str = "BAAI/bge-small-en-v1.5"):
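The committed `is_url` treats a string as a URL only when `urlparse` finds both a scheme and a network location. Below is a minimal sketch, not the code in this commit, that performs the same check but catches only `ValueError` instead of using a bare `except:`. `urlparse` raises `ValueError` for some malformed inputs, such as an unclosed IPv6 bracket, while a bare clause would also swallow `KeyboardInterrupt` and `SystemExit`. The sample inputs are hypothetical and only illustrate which strings pass.

```python
from urllib.parse import urlparse


def is_url(string: str) -> bool:
    """Return True if `string` has both a scheme and a network location.

    Sketch variant of the committed helper: same scheme+netloc rule, but
    only ValueError (raised by urlparse for e.g. an unclosed IPv6 bracket)
    is treated as "not a URL".
    """
    try:
        result = urlparse(string)
    except ValueError:
        return False
    # Equivalent to all([result.scheme, result.netloc]) in the commit.
    return bool(result.scheme and result.netloc)


if __name__ == "__main__":
    # Hypothetical inputs illustrating the rule.
    samples = [
        "https://example.com/paper.pdf",  # scheme + netloc -> True
        "ftp://host/file.pdf",            # any scheme is accepted -> True
        "http://[::1",                    # urlparse raises ValueError -> False
        "docs/report.pdf",                # relative path, no scheme -> False
        "example.com/report.pdf",         # no scheme, so no netloc either -> False
        "file:///tmp/report.pdf",         # scheme but empty netloc -> False
        "C:/docs/report.pdf",             # drive letter parsed as scheme, no netloc -> False
    ]
    for s in samples:
        print(f"{s!r}: {is_url(s)}")
```

With this rule, `file://` URLs and Windows drive paths fall through to the non-URL branch, and any scheme with a host passes. A caller that should fetch only over HTTP(S) would also need to check `result.scheme`.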
