Refactor Latency #43

MAVRICK-1 · 2025-06-21T10:43:43Z

/claim #2

📊 Test Results

JSON Results:

{
  "file": "sample_test.pdf",
  "format": ".pdf",
  "strategies": {
    "fixed": {
      "status": "success",
      "chunks": 1,
      "time": 6.198883056640625e-06
    },
    "paragraph": {
      "status": "success", 
      "chunks": 1,
      "time": 4.887580871582031e-05
    },
    "heading": {
      "status": "success",
      "chunks": 1,
      "time": 6.175041198730469e-05
    },
    "page": {
      "status": "success",
      "chunks": 1,
      "time": 0.0016906261444091797
    },
    "semantic": {
      "status": "success",
      "chunks": 0,
      "time": 2.586862564086914,
      "pages": 1
    }
  }
}
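The numbers above are easier to compare in milliseconds, with the odd case flagged. The following is a minimal sketch, not part of `test_chunker.py`: it assumes only the JSON shape shown above, and the `summarize` helper is a hypothetical name introduced for illustration. It flags any strategy that reports `success` while returning 0 chunks.

```python
import json

# Results copied verbatim from the JSON above; times are in seconds.
RESULTS_JSON = """
{
  "file": "sample_test.pdf",
  "format": ".pdf",
  "strategies": {
    "fixed": {"status": "success", "chunks": 1, "time": 6.198883056640625e-06},
    "paragraph": {"status": "success", "chunks": 1, "time": 4.887580871582031e-05},
    "heading": {"status": "success", "chunks": 1, "time": 6.175041198730469e-05},
    "page": {"status": "success", "chunks": 1, "time": 0.0016906261444091797},
    "semantic": {"status": "success", "chunks": 0, "time": 2.586862564086914, "pages": 1}
  }
}
"""


def summarize(results: dict) -> list:
    """Return (strategy, chunks, time_ms, suspicious) rows.

    A row is marked suspicious when the strategy reports success
    but produced no chunks. (Hypothetical helper, for illustration.)
    """
    rows = []
    for name, info in results["strategies"].items():
        chunks = info["chunks"]
        time_ms = info["time"] * 1000.0  # seconds -> milliseconds
        suspicious = info["status"] == "success" and chunks == 0
        rows.append((name, chunks, time_ms, suspicious))
    return rows


if __name__ == "__main__":
    for name, chunks, time_ms, suspicious in summarize(json.loads(RESULTS_JSON)):
        flag = "  <-- success with 0 chunks" if suspicious else ""
        print(f"{name:<10} chunks={chunks} time={time_ms:10.3f} ms{flag}")
```

On this data only `semantic` is flagged: it takes about 2.59 s, reports `success`, and returns no chunks.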

✨ Solution

Fix the content extraction pipeline so that the semantic strategy produces actual chunks instead of 0.

🧪 How to Test

Current Test:

python3 test_chunker.py sample_test.pdf
# Result: semantic strategy returns 0 chunks

Test with Other Files:

# Test your own PDF
python3 test_chunker.py /path/to/your/document.pdf

# Test multiple files
python3 test_chunker.py doc1.pdf doc2.pdf doc3.pdf

# Test all sample files
python3 test_chunker.py

Expected After Fix:

python3 test_chunker.py sample_test.pdf
# Result: 2-3 chunks extracted
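The expectation above could be turned into an automated check. This is a sketch under stated assumptions: it assumes the results keep the JSON shape shown under Test Results, and `semantic_chunks_in_range` is a hypothetical helper, not a function from `test_chunker.py`. The `after_fix` value is invented for illustration only and is not a measured result.

```python
def semantic_chunks_in_range(results: dict, low: int = 2, high: int = 3) -> bool:
    """Check that the semantic strategy succeeded with low..high chunks.

    Hypothetical helper; the 2-3 range comes from the
    "Expected After Fix" note in this PR description.
    """
    semantic = results.get("strategies", {}).get("semantic", {})
    return (
        semantic.get("status") == "success"
        and low <= semantic.get("chunks", 0) <= high
    )


# Current behaviour reported in this PR: success, but 0 chunks.
before_fix = {"strategies": {"semantic": {"status": "success", "chunks": 0}}}

# Hypothetical post-fix result, made up for illustration only.
after_fix = {"strategies": {"semantic": {"status": "success", "chunks": 2}}}

print(semantic_chunks_in_range(before_fix))  # False
print(semantic_chunks_in_range(after_fix))   # True
```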

Screencast.from.2025-06-21.16-01-31.webm

…or chunking strategies

- Updated `yolo_model_utils.py` to dynamically retrieve the YOLO model path.
- Implemented error handling for model loading.
- Created `test_chunker.py` to test various chunking strategies across multiple document formats.
- Added functionality to auto-download sample documents for testing.
- Included tests for fixed-size, paragraph, heading, page, and semantic chunking.

algora-pbc bot added the 🙋 Bounty claim label Jun 21, 2025

algora-pbc bot mentioned this pull request Jun 21, 2025

Fix:Reduce the latency of document parser #2

Open
