@maxpill maxpill commented Jul 4, 2025

  • Introduced PptxDocumentParser for extracting content from PPTX files.
  • Updated pyproject.toml to include python-pptx as a dependency.
  • Modified the router to use PptxDocumentParser for PPTX document types.
  • Cleaned up formatting in pyproject.toml and __init__.py for consistency.
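
For readers unfamiliar with the format, here is a minimal sketch of what pulling text out of a PPTX file involves. It is not the PptxDocumentParser from this PR (which relies on python-pptx); it is a stdlib-only illustration that treats the file as a ZIP archive and collects DrawingML `<a:t>` text runs from each slide part. The names `extract_slide_texts` and `PARSERS` are hypothetical, and the routing dict only mimics the idea of mapping a document type to a parser.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs inside slide shapes are stored as <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches slide parts only, not their relationship files under _rels/.
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_texts(pptx_file) -> list[str]:
    """Return one string per slide, joining that slide's text runs with newlines.

    Simplification (assumption): slides are ordered by the number in their
    part name, which is not guaranteed to match the on-screen slide order.
    """
    with zipfile.ZipFile(pptx_file) as archive:
        slides = sorted(
            (int(m.group(1)), name)
            for name in archive.namelist()
            if (m := SLIDE_RE.match(name))
        )
        texts = []
        for _, name in slides:
            root = ET.fromstring(archive.read(name))
            runs = [node.text for node in root.iter(f"{A_NS}t") if node.text]
            texts.append("\n".join(runs))
    return texts


# Hypothetical routing table: document type -> extraction function.
PARSERS = {"pptx": extract_slide_texts}
```

Calling `PARSERS["pptx"]("deck.pptx")` on a real presentation would return a list with one text block per slide. A library such as python-pptx additionally handles slide order, tables, notes, and images, which this sketch ignores.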

@maxpill maxpill requested a review from Copilot July 4, 2025 08:53
@maxpill maxpill linked an issue Jul 4, 2025 that may be closed by this pull request
Copilot

This comment was marked as outdated.

@maxpill maxpill changed the title from "feat: add PPTX document parser and update dependencies" to "feat: add PPTX document parser" Jul 4, 2025
@maxpill maxpill self-assigned this Jul 7, 2025

github-actions bot commented Jul 10, 2025

Trivy scanning results.

Report Summary

┌───────────────────┬──────┬─────────────────┬─────────┐
│ Target            │ Type │ Vulnerabilities │ Secrets │
├───────────────────┼──────┼─────────────────┼─────────┤
│ package-lock.json │ npm  │ 0               │ -       │
├───────────────────┼──────┼─────────────────┼─────────┤
│ uv.lock           │ uv   │ 21              │ -       │
└───────────────────┴──────┴─────────────────┴─────────┘
Legend:

  • '-': Not scanned
  • '0': Clean (no security findings detected)

For OSS Maintainers: VEX Notice

If you're an OSS maintainer and Trivy has detected vulnerabilities in your project that you believe are not actually exploitable, consider issuing a VEX (Vulnerability Exploitability eXchange) statement.
VEX allows you to communicate the actual status of vulnerabilities in your project, improving security transparency and reducing false positives for your users.
Learn more and start using VEX: https://trivy.dev/v0.64/docs/supply-chain/vex/repo#publishing-vex-documents

To disable this notice, set the TRIVY_DISABLE_VEX_NOTICE environment variable.

uv.lock (uv)

Total: 21 (MEDIUM: 15, HIGH: 4, CRITICAL: 2)

Library       Vulnerability   Severity  Status  Installed Version  Fixed Version  Advisory                                    Title
------------  --------------  --------  ------  -----------------  -------------  ------------------------------------------  ------------------------------------------------------------
aiohttp       CVE-2024-52303  MEDIUM    fixed   3.10.8             3.10.11        https://avd.aquasec.com/nvd/cve-2024-52303  aiohttp: aiohttp memory leak when middleware is enabled when requesting a resource...
aiohttp       CVE-2024-52304  MEDIUM    fixed   3.10.8             3.10.11        https://avd.aquasec.com/nvd/cve-2024-52304  aiohttp: aiohttp vulnerable to request smuggling due to incorrect parsing of chunk...
h11           CVE-2025-43859  CRITICAL  fixed   0.14.0             0.16.0         https://avd.aquasec.com/nvd/cve-2025-43859  h11: h11 accepts some malformed Chunked-Encoding bodies
jinja2        CVE-2024-56201  MEDIUM    fixed   3.1.4              3.1.5          https://avd.aquasec.com/nvd/cve-2024-56201  jinja2: Jinja has a sandbox breakout through malicious filenames
jinja2        CVE-2024-56326  MEDIUM    fixed   3.1.4              3.1.5          https://avd.aquasec.com/nvd/cve-2024-56326  jinja2: Jinja has a sandbox breakout through indirect reference to format method...
jinja2        CVE-2025-27516  MEDIUM    fixed   3.1.4              3.1.6          https://avd.aquasec.com/nvd/cve-2025-27516  jinja2: Jinja sandbox breakout through attr filter selecting format method
requests      CVE-2024-47081  MEDIUM    fixed   2.32.3             2.32.4         https://avd.aquasec.com/nvd/cve-2024-47081  requests: Requests vulnerable to .netrc credentials leak via malicious URLs
setuptools    CVE-2025-47273  HIGH      fixed   75.1.0             78.1.1         https://avd.aquasec.com/nvd/cve-2025-47273  setuptools: Path Traversal Vulnerability in setuptools PackageIndex
starlette     CVE-2025-54121  MEDIUM    fixed   0.45.3             0.47.2         https://avd.aquasec.com/nvd/cve-2025-54121  Starlette is a lightweight ASGI (Asynchronous Server Gateway Interface ...
torch         CVE-2025-32434  CRITICAL  fixed   2.2.2              2.6.0          https://avd.aquasec.com/nvd/cve-2025-32434  PyTorch is a Python package that provides tensor computation with stro ......
transformers  CVE-2024-11392  HIGH      fixed   4.44.2             4.48.0         https://avd.aquasec.com/nvd/cve-2024-11392  transformers: Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data Remote Code Execution...
transformers  CVE-2024-11393  HIGH      fixed   4.44.2             4.48.0         https://avd.aquasec.com/nvd/cve-2024-11393  transformers: Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code...
transformers  CVE-2024-11394  HIGH      fixed   4.44.2             4.48.0         https://avd.aquasec.com/nvd/cve-2024-11394  transformers: Hugging Face Transformers Trax Model Deserialization of Untrusted Data Remote Code...
transformers  CVE-2024-12720  MEDIUM    fixed   4.44.2             4.48.0         https://avd.aquasec.com/nvd/cve-2024-12720  Transformers Regular Expression Denial of Service (ReDoS) vulnerability
transformers  CVE-2025-1194   MEDIUM    fixed   4.44.2             4.50.0         https://avd.aquasec.com/nvd/cve-2025-1194   Transformers Regular Expression Denial of Service (ReDoS) vulnerability
transformers  CVE-2025-2099   MEDIUM    fixed   4.44.2             4.50.0         https://avd.aquasec.com/nvd/cve-2025-2099   transformers: Regular Expression Denial of Service (ReDoS) in huggingface/transformers
transformers  CVE-2025-3263   MEDIUM    fixed   4.44.2             4.51.0         https://avd.aquasec.com/nvd/cve-2025-3263   transformers: Regular Expression Denial of Service (ReDoS) in huggingface/transformers
transformers  CVE-2025-3264   MEDIUM    fixed   4.44.2             4.51.0         https://avd.aquasec.com/nvd/cve-2025-3264   transformers: Regular Expression Denial of Service (ReDoS) in huggingface/transformers
transformers  CVE-2025-3933   MEDIUM    fixed   4.44.2             4.52.1         https://avd.aquasec.com/nvd/cve-2025-3933   transformers: Regular Expression Denial of Service (ReDoS) in huggingface/transformers
urllib3       CVE-2025-50181  MEDIUM    fixed   2.2.3              2.5.0          https://avd.aquasec.com/nvd/cve-2025-50181  urllib3: urllib3 redirects are not disabled when retries are disabled on PoolManager...
urllib3       CVE-2025-50182  MEDIUM    fixed   2.2.3              2.5.0          https://avd.aquasec.com/nvd/cve-2025-50182  urllib3: urllib3 does not control redirects in browsers and Node.js
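
The per-target totals above (for example "Total: 21 (MEDIUM: 15, HIGH: 4, CRITICAL: 2)") can be recomputed from a machine-readable report. The following is a rough sketch that assumes the report was exported as JSON and uses the `Results`, `Target`, `Vulnerabilities`, and `Severity` fields as I understand Trivy's JSON output; the helper names `summarize` and `format_total` are hypothetical, and the sample data is a three-entry excerpt, not the full scan.

```python
from collections import Counter

# Severity labels in ascending order, matching the order used in the summary line.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def summarize(report: dict) -> dict[str, Counter]:
    """Count findings per severity for each scanned target."""
    summary = {}
    for result in report.get("Results", []):
        # Assumption: a clean target may have no "Vulnerabilities" key at all.
        vulns = result.get("Vulnerabilities") or []
        summary[result["Target"]] = Counter(v["Severity"] for v in vulns)
    return summary


def format_total(counts: Counter) -> str:
    """Render counts in the same style as the 'Total: ...' line of the report."""
    total = sum(counts.values())
    parts = [f"{sev}: {counts[sev]}" for sev in SEVERITY_ORDER if counts[sev]]
    return f"Total: {total} ({', '.join(parts)})" if parts else f"Total: {total}"


# Illustrative excerpt only; a real report would come from json.load(...).
sample_report = {
    "Results": [
        {"Target": "package-lock.json", "Type": "npm"},
        {
            "Target": "uv.lock",
            "Type": "uv",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-2025-43859", "PkgName": "h11", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-2025-47273", "PkgName": "setuptools", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-2024-47081", "PkgName": "requests", "Severity": "MEDIUM"},
            ],
        },
    ]
}

for target, counts in summarize(sample_report).items():
    print(target, "->", format_total(counts))
```

A CI step could use the same counts to fail the build only when CRITICAL or HIGH findings are present, rather than on any finding.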

@maxpill maxpill force-pushed the 687-feat-pptx-parser branch from 5538040 to db36d39 July 11, 2025 12:12

github-actions bot commented Jul 11, 2025

Code Coverage Summary

Filename                                                                                                         Stmts    Miss  Cover    Missing
-------------------------------------------------------------------------------------------------------------  -------  ------  -------  -------------------------------------------------------------------------------------------------------------------------------------------------------------------
packages/ragbits-agents/src/ragbits/agents/__init__.py                                                               3       0  100.00%
packages/ragbits-agents/src/ragbits/agents/_main.py                                                                337      65  80.71%   41-43, 114, 120, 124-126, 129-130, 133-137, 140-141, 199-200, 476-480, 485, 487, 489, 504-505, 548, 552, 584-589, 633, 648-649, 670-703, 722-755
packages/ragbits-agents/src/ragbits/agents/exceptions.py                                                            44      14  68.18%   40-42, 51-52, 86-89, 100-108
packages/ragbits-agents/src/ragbits/agents/tool.py                                                                  41       1  97.56%   93
packages/ragbits-agents/src/ragbits/agents/types.py                                                                 15       0  100.00%
packages/ragbits-agents/src/ragbits/agents/mcp/__init__.py                                                           2       0  100.00%
packages/ragbits-agents/src/ragbits/agents/mcp/server.py                                                           143      14  90.21%   174, 183-184, 281, 332-335, 349, 361, 417-420, 434, 447
packages/ragbits-agents/src/ragbits/agents/mcp/utils.py                                                             13       0  100.00%
packages/ragbits-agents/src/ragbits/agents/tools/__init__.py                                                         2       0  100.00%
packages/ragbits-agents/src/ragbits/agents/tools/openai.py                                                          42      10  76.19%   21-23, 37-39, 53-55, 78
packages/ragbits-agents/tests/unit/test_agent.py                                                                   339       0  100.00%
packages/ragbits-agents/tests/unit/mcp/helpers.py                                                                   36       3  91.67%   21, 26, 61
packages/ragbits-agents/tests/unit/mcp/test_caching.py                                                              21       0  100.00%
packages/ragbits-agents/tests/unit/mcp/test_connect_disconnect.py                                                   28       0  100.00%
packages/ragbits-agents/tests/unit/mcp/test_exceptions.py                                                           25       1  96.00%   20
packages/ragbits-agents/tests/unit/mcp/test_mcp_utils.py                                                            41       0  100.00%
packages/ragbits-agents/tests/unit/tools/test_openai.py                                                             62       0  100.00%
packages/ragbits-chat/src/ragbits/chat/__init__.py                                                                   4       0  100.00%
packages/ragbits-chat/src/ragbits/chat/_utils.py                                                                    23       5  78.26%   17, 32, 38-40
packages/ragbits-chat/src/ragbits/chat/api.py                                                                      250      60  76.00%   76-77, 100, 141, 148, 154, 193, 205, 264-272, 306, 311-312, 358-377, 383-467, 472, 499-501, 514, 518, 545, 614-634
packages/ragbits-chat/src/ragbits/chat/cli.py                                                                        9       2  77.78%   39-46
packages/ragbits-chat/src/ragbits/chat/metrics.py                                                                   44       0  100.00%
packages/ragbits-chat/src/ragbits/chat/auth/__init__.py                                                              4       0  100.00%
packages/ragbits-chat/src/ragbits/chat/auth/backends.py                                                             66       1  98.48%   104
packages/ragbits-chat/src/ragbits/chat/auth/base.py                                                                 31       4  87.10%   47, 60, 74, 87
packages/ragbits-chat/src/ragbits/chat/auth/types.py                                                                36       0  100.00%
packages/ragbits-chat/src/ragbits/chat/client/__init__.py                                                            4       0  100.00%
packages/ragbits-chat/src/ragbits/chat/client/client.py                                                             46      21  54.35%   29-30, 34, 38, 47-48, 57-59, 63, 72, 90-91, 95, 99, 108-109, 118-119, 123, 132
packages/ragbits-chat/src/ragbits/chat/client/conversation.py                                                      136      13  90.44%   60, 62, 64, 78-79, 87, 90, 93-94, 116, 195, 198, 225
packages/ragbits-chat/src/ragbits/chat/client/exceptions.py                                                          4       0  100.00%
packages/ragbits-chat/src/ragbits/chat/history/__init__.py                                                           0       0  100.00%
packages/ragbits-chat/src/ragbits/chat/history/compressors/__init__.py                                               3       0  100.00%
packages/ragbits-chat/src/ragbits/chat/history/compressors/base.py                                                  10       0  100.00%
packages/ragbits-chat/src/ragbits/chat/history/compressors/llm.py                                                   29       1  96.55%   79
packages/ragbits-chat/src/ragbits/chat/interface/__init__.py                                                         2       0  100.00%
packages/ragbits-chat/src/ragbits/chat/interface/_interface.py                                                     123      17  86.18%   56, 135-144, 158, 214-215, 225, 235, 240, 245, 249, 304, 342-343, 360-368
packages/ragbits-chat/src/ragbits/chat/interface/forms.py                                                           50      13  74.00%   59, 64, 79-98, 117
packages/ragbits-chat/src/ragbits/chat/interface/types.py                                                          112       6  94.64%   90, 174, 184, 190, 196, 202
packages/ragbits-chat/src/ragbits/chat/interface/ui_customization.py                                                19       0  100.00%
packages/ragbits-chat/src/ragbits/chat/persistence/__init__.py                                                       2       0  100.00%
packages/ragbits-chat/src/ragbits/chat/persistence/base.py                                                           6       1  83.33%   28
packages/ragbits-chat/src/ragbits/chat/persistence/sql.py                                                           91       3  96.70%   292-294
packages/ragbits-chat/tests/unit/test_api.py                                                                       183       1  99.45%   258
packages/ragbits-chat/tests/unit/test_chat_client.py                                                               105       2  98.10%   67, 87
packages/ragbits-chat/tests/unit/test_conversation.py                                                              122       1  99.18%   64
packages/ragbits-chat/tests/unit/auth/test_list_auth_backend.py                                                    283       0  100.00%
packages/ragbits-chat/tests/unit/history/test_llm_compressor.py                                                     64       0  100.00%
packages/ragbits-chat/tests/unit/persistence/test_sql.py                                                            74       0  100.00%
packages/ragbits-cli/src/ragbits/cli/__init__.py                                                                    33       4  87.88%   78-79, 86-87
packages/ragbits-cli/src/ragbits/cli/_utils.py                                                                      23       4  82.61%   47, 65-67
packages/ragbits-cli/src/ragbits/cli/state.py                                                                       79       3  96.20%   50-51, 61
packages/ragbits-cli/tests/unit/test_state.py                                                                       72       2  97.22%   103-104
packages/ragbits-cli/tests/unit/test_utils.py                                                                       23       0  100.00%
packages/ragbits-core/src/ragbits/core/__init__.py                                                                  16       4  75.00%   20-21, 25-26
packages/ragbits-core/src/ragbits/core/cli.py                                                                        6       0  100.00%
packages/ragbits-core/src/ragbits/core/options.py                                                                   17       0  100.00%
packages/ragbits-core/src/ragbits/core/types.py                                                                      9       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/__init__.py                                                             5       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/metrics/__init__.py                                                    30      14  53.33%   39-56, 64
packages/ragbits-core/src/ragbits/core/audit/metrics/base.py                                                        49       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/traces/__init__.py                                                     80       9  88.75%   49-52, 55-58, 66-69
packages/ragbits-core/src/ragbits/core/audit/traces/base.py                                                        187      60  67.91%   156-165, 178-179, 200, 215-216, 220, 230, 233-234, 249, 256, 258-260, 266-268, 275-278, 338-349, 356-364, 378, 394-413
packages/ragbits-core/src/ragbits/core/audit/traces/cli.py                                                         133      29  78.20%   89-94, 113-140, 157, 164, 173-174, 177-178
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                        4       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                           32       5  84.38%   20-21, 24, 77, 92
packages/ragbits-core/src/ragbits/core/embeddings/exceptions.py                                                     17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/embeddings/dense/__init__.py                                                  4       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/dense/base.py                                                      9       1  88.89%   44
packages/ragbits-core/src/ragbits/core/embeddings/dense/fastembed.py                                                35       3  91.43%   34, 62-63
packages/ragbits-core/src/ragbits/core/embeddings/dense/litellm.py                                                  56      11  80.36%   131-136, 139, 143-145, 166
packages/ragbits-core/src/ragbits/core/embeddings/dense/local.py                                                    32       5  84.38%   13-14, 52, 68-69
packages/ragbits-core/src/ragbits/core/embeddings/dense/noop.py                                                     32       1  96.88%   99
packages/ragbits-core/src/ragbits/core/embeddings/dense/vertex_multimodal.py                                        60      24  60.00%   13-14, 57, 62, 102-123, 139-148, 175, 194-198
packages/ragbits-core/src/ragbits/core/embeddings/sparse/__init__.py                                                 4       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/sparse/bag_of_tokens.py                                           43       1  97.67%   53
packages/ragbits-core/src/ragbits/core/embeddings/sparse/base.py                                                    12       1  91.67%   48
packages/ragbits-core/src/ragbits/core/embeddings/sparse/fastembed.py                                               31       2  93.55%   25, 52
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                             30       4  86.67%   34, 38-39, 50
packages/ragbits-core/src/ragbits/core/llms/base.py                                                                257      22  91.44%   161-168, 171-179, 186-190, 248-251, 282, 313, 494
packages/ragbits-core/src/ragbits/core/llms/exceptions.py                                                           29       6  79.31%   17, 26-27, 36, 45, 63
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                              12       2  83.33%   30, 51
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                             195      63  67.69%   155, 172-173, 210, 239, 261, 305-415, 466, 470-475, 486, 515
packages/ragbits-core/src/ragbits/core/llms/local.py                                                               111      37  66.67%   14, 69, 79-80, 94-95, 101, 107, 119-120, 212-279, 294-295
packages/ragbits-core/src/ragbits/core/llms/mock.py                                                                 50       2  96.00%   126, 130
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                            2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/_cli.py                                                               53      22  58.49%   37-45, 59-61, 69-80, 88-90, 102-110
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                               45       1  97.78%   26
packages/ragbits-core/src/ragbits/core/prompt/discovery.py                                                          36       2  94.44%   55-56
packages/ragbits-core/src/ragbits/core/prompt/exceptions.py                                                         13       1  92.31%   17
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                            35       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                            189       7  96.30%   105-107, 178, 181, 257, 361
packages/ragbits-core/src/ragbits/core/sources/__init__.py                                                          10       0  100.00%
packages/ragbits-core/src/ragbits/core/sources/azure.py                                                             95      13  86.32%   65-66, 92-102, 189-190
packages/ragbits-core/src/ragbits/core/sources/base.py                                                              74       3  95.95%   46, 185-186
packages/ragbits-core/src/ragbits/core/sources/exceptions.py                                                        16       0  100.00%
packages/ragbits-core/src/ragbits/core/sources/gcs.py                                                               63       0  100.00%
packages/ragbits-core/src/ragbits/core/sources/git.py                                                               94       3  96.81%   188, 195, 211
packages/ragbits-core/src/ragbits/core/sources/google_drive.py                                                     285     104  63.51%   110, 136, 159, 165-177, 187, 198-217, 276-278, 306-321, 364-365, 374-379, 392-404, 434-436, 444-454, 460-473, 490-509, 536, 541, 545-552, 555-558, 575-576, 589-593
packages/ragbits-core/src/ragbits/core/sources/hf.py                                                                73      16  78.08%   57-60, 64-65, 85-88, 108-110, 138, 145-146
packages/ragbits-core/src/ragbits/core/sources/local.py                                                             41       2  95.12%   39, 80
packages/ragbits-core/src/ragbits/core/sources/s3.py                                                               105      17  83.81%   54-57, 75, 88-93, 117, 128-131, 162, 179
packages/ragbits-core/src/ragbits/core/sources/web.py                                                               41       2  95.12%   58, 75
packages/ragbits-core/src/ragbits/core/utils/__init__.py                                                             2       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                          38       1  97.37%   113
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                     79       9  88.61%   17, 55-56, 63-64, 133, 163-165
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                          29       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/dict_transformations.py                                               143      35  75.52%   24, 27, 80, 90, 110-115, 126-133, 147-151, 166-167, 173, 185-191, 195, 254
packages/ragbits-core/src/ragbits/core/utils/function_schema.py                                                     90      19  78.89%   105, 113-127, 134-147, 160, 205, 210, 213-215
packages/ragbits-core/src/ragbits/core/utils/helpers.py                                                             11       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/pydantic.py                                                            13       2  84.62%   13, 16
packages/ragbits-core/src/ragbits/core/utils/secrets.py                                                             18       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/__init__.py                                                     3       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/_cli.py                                                        50       4  92.00%   67, 89, 95, 119
packages/ragbits-core/src/ragbits/core/vector_stores/base.py                                                       103       3  97.09%   53, 214, 286
packages/ragbits-core/src/ragbits/core/vector_stores/chroma.py                                                      91       2  97.80%   74, 112
packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py                                                      34       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py                                           65       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/in_memory.py                                                   59       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py                                                   190      15  92.11%   97, 106-109, 125, 168, 312-313, 338-340, 373-375
packages/ragbits-core/src/ragbits/core/vector_stores/qdrant.py                                                      97       5  94.85%   80-95, 160, 181
packages/ragbits-core/src/ragbits/core/vector_stores/weaviate.py                                                   127       5  96.06%   104-132, 271
packages/ragbits-core/tests/conftest.py                                                                             12       0  100.00%
packages/ragbits-core/tests/cli/__init__.py                                                                          0       0  100.00%
packages/ragbits-core/tests/cli/test_cli_trace_handler.py                                                           47       3  93.62%   29, 42, 55
packages/ragbits-core/tests/cli/test_vector_store.py                                                               115       0  100.00%
packages/ragbits-core/tests/integration/sources/test_git.py                                                         68       6  91.18%   147-156
packages/ragbits-core/tests/integration/sources/test_hf.py                                                          19       9  52.63%   16-21, 32-37
packages/ragbits-core/tests/integration/sources/test_s3.py                                                          42       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/__init__.py                                                    0       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/test_keyword_search.py                                        79       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/test_vector_store.py                                         140       1  99.29%   51
packages/ragbits-core/tests/integration/vector_stores/test_vector_store_sparse.py                                   63       0  100.00%
packages/ragbits-core/tests/unit/__init__.py                                                                         0       0  100.00%
packages/ragbits-core/tests/unit/test_options.py                                                                    21       0  100.00%
packages/ragbits-core/tests/unit/audit/test_cli.py                                                                 107       0  100.00%
packages/ragbits-core/tests/unit/audit/test_metrics.py                                                              35       7  80.00%   14-19, 23
packages/ragbits-core/tests/unit/audit/test_trace.py                                                                98       3  96.94%   17, 20, 23
packages/ragbits-core/tests/unit/embeddings/test_bag_of_tokens.py                                                   52       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_fastembed.py                                                       50       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_from_config.py                                                     39       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_litellm.py                                                         64       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_local.py                                                           42       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_noop.py                                                            26       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_vector_size.py                                                     33       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_vertex_multimodal.py                                               39       0  100.00%
packages/ragbits-core/tests/unit/llms/__init__.py                                                                    0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_base.py                                                                 186       3  98.39%   77-80
packages/ragbits-core/tests/unit/llms/test_from_config.py                                                           27       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                              227       3  98.68%   170-173
packages/ragbits-core/tests/unit/llms/test_local.py                                                                 74       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                            0       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_preferred_llm.py                                             12       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                                 0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                            65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                            334       1  99.70%   777
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                       0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                      30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                         18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                        2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py                3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py           14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py           14       0  100.00%
packages/ragbits-core/tests/unit/sources/test_aws.py                                                                23       0  100.00%
packages/ragbits-core/tests/unit/sources/test_azure.py                                                              70       0  100.00%
packages/ragbits-core/tests/unit/sources/test_exceptions.py                                                         22       0  100.00%
packages/ragbits-core/tests/unit/sources/test_gcs.py                                                                33       6  81.82%   42-47
packages/ragbits-core/tests/unit/sources/test_git.py                                                               110       0  100.00%
packages/ragbits-core/tests/unit/sources/test_google_drive.py                                                      117      25  78.63%   27-32, 50, 62, 69, 84-95, 155, 168-169, 180-191
packages/ragbits-core/tests/unit/sources/test_hf.py                                                                 12       0  100.00%
packages/ragbits-core/tests/unit/sources/test_local.py                                                              13       0  100.00%
packages/ragbits-core/tests/unit/sources/test_source_discriminator.py                                               36       0  100.00%
packages/ragbits-core/tests/unit/sources/test_web.py                                                                43       0  100.00%
packages/ragbits-core/tests/unit/utils/__init__.py                                                                   0       0  100.00%
packages/ragbits-core/tests/unit/utils/test_config_handling.py                                                      76       2  97.37%   27-28
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                           26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/test_dict_transformations.py                                                 98       0  100.00%
packages/ragbits-core/tests/unit/utils/test_function_schema.py                                                      16       2  87.50%   19, 32
packages/ragbits-core/tests/unit/utils/test_helpers.py                                                               6       0  100.00%
packages/ragbits-core/tests/unit/utils/test_secrets.py                                                              24       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                       13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                                  9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                                37       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_base.py                                                          6       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py                                                       81       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_from_config.py                                                  55       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_hybrid.py                                                       74       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_hybrid_strategies.py                                            31       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py                                                   102       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_pgvector.py                                                    262       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py                                                      100       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_weaviate.py                                                    142       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                             2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                               91       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/cli.py                                                 40       2  95.00%   86, 105
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                                   0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                                  78       2  97.44%   49, 93
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                                   86      14  83.72%   97, 115, 179-187, 197, 206-208
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                                   0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/__init__.py                         4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/base.py                            21       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/exceptions.py                      14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/image.py                           30       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/router.py                          25       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/__init__.py                           3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/base.py                              28       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/docling.py                           44       2  95.45%   90, 151
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/exceptions.py                        14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/router.py                            25       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/unstructured.py                      66      24  63.64%   102, 121-123, 135-156, 176-190, 212-213, 233-248
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/__init__.py                      8       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/callbacks.py                    10       1  90.00%   32
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/exceptions.py                   16      10  37.50%   25-33, 49-52
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/hyperlink_callback.py           38      12  68.42%   44-69, 72, 81, 84
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/metadata_callback.py            29       9  68.97%   52-71, 74
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/parser.py                       43       6  86.05%   60-62, 71-73
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/speaker_notes_callback.py       31      13  58.06%   41-68, 71
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/__init__.py                        5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/base.py                          102      18  82.35%   156, 212-242, 284
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/batched.py                        69       8  88.41%   172, 200-215, 255-256
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/ray.py                            32       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/sequential.py                      4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                                   0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                        4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                           14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                            40       5  87.50%   51, 115-118
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                            8       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                         3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/answerai.py                        29       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                            19       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/litellm.py                         27       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/llm.py                             59       1  98.31%   173
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                            10       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/rrf.py                             28       2  92.86%   50, 60
packages/ragbits-document-search/tests/cli/custom_cli_source.py                                                     22       1  95.45%   32
packages/ragbits-document-search/tests/cli/test_ingest.py                                                           56       0  100.00%
packages/ragbits-document-search/tests/cli/test_search.py                                                           71       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                       0       0  100.00%
packages/ragbits-document-search/tests/integration/test_docling.py                                                  10       0  100.00%
packages/ragbits-document-search/tests/integration/test_pptx_parser.py                                              54       9  83.33%   32-34, 52, 71, 74-75, 78-79
packages/ragbits-document-search/tests/integration/test_rerankers.py                                                32       9  71.88%   32-39, 59-64
packages/ragbits-document-search/tests/integration/test_unstructured.py                                             12       4  66.67%   62-67
packages/ragbits-document-search/tests/unit/test_config.py                                                          63       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_parser_router.py                                          24       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_parsers.py                                                47       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                                238       1  99.58%   480
packages/ragbits-document-search/tests/unit/test_document_search_ingest_errors.py                                   38       0  100.00%
packages/ragbits-document-search/tests/unit/test_documents.py                                                       13       0  100.00%
packages/ragbits-document-search/tests/unit/test_element_enricher_router.py                                         23       0  100.00%
packages/ragbits-document-search/tests/unit/test_element_enrichers.py                                               56       1  98.21%   25
packages/ragbits-document-search/tests/unit/test_elements.py                                                        21       0  100.00%
packages/ragbits-document-search/tests/unit/test_ingest_strategies.py                                               43       0  100.00%
packages/ragbits-document-search/tests/unit/test_llm_reranker.py                                                    43       0  100.00%
packages/ragbits-document-search/tests/unit/test_rephrasers.py                                                      26       0  100.00%
packages/ragbits-document-search/tests/unit/test_rerankers.py                                                       80       1  98.75%   25
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/__init__.py                   0       0  100.00%
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/factories.py                 22       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/__init__.py                                                           0       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/cli.py                                                               46       3  93.48%   133, 135, 137
packages/ragbits-evaluate/src/ragbits/evaluate/evaluator.py                                                         92       1  98.91%   221
packages/ragbits-evaluate/src/ragbits/evaluate/optimizer.py                                                         92      18  80.43%   162-168, 187, 190-191, 194, 198-204, 207-210
packages/ragbits-evaluate/src/ragbits/evaluate/utils.py                                                             58      37  36.21%   31-50, 62-69, 98-101, 117-129, 140-149, 159-160
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/__init__.py                                               2       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py                                                  34       4  88.24%   58-60, 79
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/document_search.py                                       13       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/exceptions.py                                            10       5  50.00%   10-12, 21-25
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/__init__.py                                                   2       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/base.py                                                      27       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/document_search.py                                           23       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/__init__.py                                                11       1  90.91%   26
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/base.py                                                    24       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/document_search.py                                         38       6  84.21%   68-71, 80-84
packages/ragbits-evaluate/tests/cli/test_run_evaluation.py                                                          25       0  100.00%
packages/ragbits-evaluate/tests/unit/test_evaluator.py                                                             103       0  100.00%
packages/ragbits-evaluate/tests/unit/test_metrics.py                                                                77       0  100.00%
packages/ragbits-evaluate/tests/unit/test_optimizer.py                                                              68       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/__init__.py                                                       0       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/base.py                                                          15       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/openai_moderation.py                                             19       6  68.42%   27-34
packages/ragbits-guardrails/tests/unit/test_openai_moderation.py                                                    35       0  100.00%
TOTAL                                                                                                            14299    1190  91.68%

Diff against main

Filename                                                                                                         Stmts    Miss  Cover
-------------------------------------------------------------------------------------------------------------  -------  ------  --------
packages/ragbits-agents/src/ragbits/agents/_main.py                                                                +84     +46  -11.78%
packages/ragbits-agents/src/ragbits/agents/tool.py                                                                  +8      +1  -2.44%
packages/ragbits-agents/tests/unit/test_agent.py                                                                   +69       0  +100.00%
packages/ragbits-chat/src/ragbits/chat/__init__.py                                                                  +1       0  +100.00%
packages/ragbits-chat/src/ragbits/chat/api.py                                                                      +73     +26  -4.79%
packages/ragbits-chat/src/ragbits/chat/auth/__init__.py                                                             +4       0  +100.00%
packages/ragbits-chat/src/ragbits/chat/auth/backends.py                                                            +66      +1  +98.48%
packages/ragbits-chat/src/ragbits/chat/auth/base.py                                                                +31      +4  +87.10%
packages/ragbits-chat/src/ragbits/chat/auth/types.py                                                               +36       0  +100.00%
packages/ragbits-chat/src/ragbits/chat/interface/_interface.py                                                      +8      +2  -0.78%
packages/ragbits-chat/src/ragbits/chat/interface/types.py                                                          +51      +3  -0.44%
packages/ragbits-chat/src/ragbits/chat/interface/ui_customization.py                                                +6       0  +100.00%
packages/ragbits-chat/tests/unit/test_api.py                                                                       +37       0  +0.13%
packages/ragbits-chat/tests/unit/auth/test_list_auth_backend.py                                                   +283       0  +100.00%
packages/ragbits-core/src/ragbits/core/__init__.py                                                                  +8      +2  +100.00%
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                            +26      +4  -13.33%
packages/ragbits-core/src/ragbits/core/llms/base.py                                                                 +4       0  +0.14%
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                              +9       0  +1.56%
packages/ragbits-core/src/ragbits/core/llms/local.py                                                               +15      +3  +2.09%
packages/ragbits-core/src/ragbits/core/llms/mock.py                                                                 +3       0  +0.26%
packages/ragbits-core/src/ragbits/core/sources/google_drive.py                                                     +39      +8  +2.53%
packages/ragbits-core/src/ragbits/core/utils/function_schema.py                                                     +9      +5  -3.83%
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py                                                   +12      +7  -3.40%
packages/ragbits-core/tests/unit/llms/test_base.py                                                                 +29       0  +0.30%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                               +2       0  +0.01%
packages/ragbits-core/tests/unit/sources/test_google_drive.py                                                      +36     +10  -2.85%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/router.py                            +1       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/__init__.py                     +8       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/callbacks.py                   +10      +1  +90.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/exceptions.py                  +16     +10  +37.50%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/hyperlink_callback.py          +38     +12  +68.42%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/metadata_callback.py           +29      +9  +68.97%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/parser.py                      +43      +6  +86.05%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/speaker_notes_callback.py      +31     +13  +58.06%
packages/ragbits-document-search/tests/integration/test_pptx_parser.py                                             +54      +9  +83.33%
packages/ragbits-evaluate/src/ragbits/evaluate/evaluator.py                                                        +15       0  +0.21%
packages/ragbits-evaluate/tests/unit/test_evaluator.py                                                             +38       0  +100.00%
TOTAL                                                                                                            +1232    +182  -0.61%

Results for commit: a1ed241

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

@maxpill maxpill force-pushed the 687-feat-pptx-parser branch from 16cd131 to 948f935 Compare July 11, 2025 14:06
- Reformatted authors and dependencies in pyproject.toml for consistency.
- Added PptxDocumentParser to the list of exported components in the ingestion parsers.
- Updated the router to use PptxDocumentParser for PPTX document types.
@maxpill maxpill force-pushed the 687-feat-pptx-parser branch from e0e7537 to 2596a48 Compare July 11, 2025 14:45
maxpill and others added 4 commits July 15, 2025 11:05
- Introduced a new script for creating dummy PPTX files to facilitate testing of the PPTX parser.
- Updated the PptxDocumentParser to utilize DocumentMeta for improved document handling.
- Refactored extractor classes to enhance clarity and maintainability, including renaming to follow a consistent naming convention.
- Improved extraction methods for text, hyperlinks, images, shapes, metadata, and speaker notes.
- Updated the test script to cast shapes to the correct type before accessing text frames, ensuring proper functionality of the PPTX parser during testing.
@maxpill maxpill marked this pull request as ready for review July 15, 2025 12:52
@maxpill maxpill requested a review from Copilot July 15, 2025 12:55
@maxpill maxpill requested a review from Copilot July 15, 2025 12:58
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds support for parsing PPTX files by introducing a new parser and modular extractors, updating routing and dependencies.

  • Introduce PptxDocumentParser and multiple extractors for text, images, hyperlinks, shapes, speaker notes, and metadata.
  • Update the router to use PptxDocumentParser for DocumentType.PPTX.
  • Add the python-pptx dependency and clean up formatting across configuration and exports.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/router.py | Route DocumentType.PPTX to PptxDocumentParser |
| packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/parser.py | New asynchronous PPTX parser implementation |
| packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/extractors/extractors.py | Add modular extractors for various PPTX content types |
| packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/extractors/__init__.py | Export extractor classes |
| packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/__init__.py | Export PptxDocumentParser in the parser module |
| packages/ragbits-document-search/pyproject.toml | Include python-pptx dependency and format cleanup |
| packages/ragbits-document-search/CHANGELOG.md | Document the new PPTX parser feature |
| examples/document-search/test_pptx_parser.py | Temporary example script for manual PPTX parser testing |
Comments suppressed due to low confidence (2)

packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/pptx/parser.py:37

  • Add unit tests for PptxDocumentParser.parse and associated extractors in the test suite to ensure reliable parsing behavior.
    async def parse(self, document: Document) -> list[Element]:

packages/ragbits-document-search/CHANGELOG.md:232

  • Remove the stray - under ### Changed to clean up the changelog formatting.
- Add LiteLLM Reranker (#109).

Contributor

@nikpocuca nikpocuca left a comment

just a few things, overall concept is nice.

I would also suggest using Cursor or Claude 4 GitHub Copilot to write nice docstrings and handle formatting.

maxpill and others added 5 commits July 16, 2025 09:44
- Added custom exceptions for PPTX parsing errors, including PptxExtractionError, PptxExtractorError, PptxParserError, PptxPresentationError, and PptxSlideProcessingError.
- Enhanced the PptxDocumentParser to raise appropriate exceptions during parsing failures and log detailed error messages.
- Improved logging throughout the extraction process to track successful and failed extractions, including shape processing and metadata extraction.
- Updated extractor classes to handle errors gracefully and provide informative logs for debugging.
- Deleted the test script for creating dummy PPTX files, which was used for development and PR testing purposes. This script will not be included in the final merge.
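
A minimal sketch of how the exception classes named in the commits above might fit together. Only the class names `PptxParserError` and `PptxPresentationError` come from the commit message; the inheritance layout, constructor arguments, message text, and the `load_presentation` helper are assumptions for illustration, not the code in this PR.

```python
from typing import Any, Callable


class PptxParserError(Exception):
    """Assumed common base class for PPTX parsing failures."""


class PptxPresentationError(PptxParserError):
    """Assumed: raised when a presentation file cannot be loaded."""

    def __init__(self, file_path: str, original: Exception) -> None:
        super().__init__(f"Failed to load presentation {file_path}: {original}")
        self.file_path = file_path
        self.original = original


def load_presentation(file_path: str, loader: Callable[[str], Any]) -> Any:
    """Hypothetical helper: wrap any loader failure in a parser-specific error."""
    try:
        return loader(file_path)
    except Exception as exc:
        # Chain the original exception so the root cause stays in the traceback.
        raise PptxPresentationError(file_path, exc) from exc
```

With a shared base class, callers can catch `PptxParserError` once instead of listing every specific failure type.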
@maxpill maxpill force-pushed the 687-feat-pptx-parser branch from 4a27750 to ea41ed3 Compare July 21, 2025 09:35
@maxpill
Collaborator Author

maxpill commented Jul 21, 2025

merge #733 first

maxpill and others added 3 commits July 24, 2025 22:10
…gleDriveSource

- Introduced GoogleDriveExportFormat enum to standardize export MIME types.
- Updated _GOOGLE_EXPORT_MIME_MAP and _EXPORT_EXTENSION_MAP to use the new enum.
- Modified fetch method to accept an optional export_format parameter for overriding MIME types.
- Enhanced _determine_file_extension method to support MIME type overrides.
- Added unit test for verifying correct file extension determination with overridden MIME types.
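
A rough sketch of the override behaviour described in the commit above, under stated assumptions: the names `GoogleDriveExportFormat`, `_GOOGLE_EXPORT_MIME_MAP`, and `_EXPORT_EXTENSION_MAP` come from the commit message, but the enum members, the map contents, and the standalone `determine_file_extension` function are illustrative stand-ins rather than the actual `GoogleDriveSource` code.

```python
from enum import Enum
from typing import Optional


class GoogleDriveExportFormat(str, Enum):
    """Export MIME types; the members listed here are chosen for illustration."""

    DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    PPTX = "application/vnd.openxmlformats-officedocument.presentationml.presentation"
    PDF = "application/pdf"


# Native Google MIME type -> default export format (assumed defaults).
_GOOGLE_EXPORT_MIME_MAP = {
    "application/vnd.google-apps.document": GoogleDriveExportFormat.DOCX,
    "application/vnd.google-apps.presentation": GoogleDriveExportFormat.PPTX,
}

# Export format -> file extension.
_EXPORT_EXTENSION_MAP = {
    GoogleDriveExportFormat.DOCX: ".docx",
    GoogleDriveExportFormat.PPTX: ".pptx",
    GoogleDriveExportFormat.PDF: ".pdf",
}


def determine_file_extension(
    mime_type: str, export_format: Optional[GoogleDriveExportFormat] = None
) -> str:
    """Pick the extension, letting an explicit export_format win over the default."""
    chosen = export_format if export_format is not None else _GOOGLE_EXPORT_MIME_MAP.get(mime_type)
    if chosen is None:
        return ""  # not a Google-native type and no override given
    return _EXPORT_EXTENSION_MAP[chosen]
```

Keying both maps on the enum rather than on raw strings means a typo in a MIME type fails at import time instead of silently producing the wrong extension.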
Contributor

@pocucan-ds pocucan-ds left a comment

looks good to me.

Contributor

@nikpocuca nikpocuca left a comment

nice job on the override feature.

maxpill added 2 commits August 1, 2025 14:37
…ation

- Removed unnecessary logging and commented code in the PptxDocumentParser and BasePptxExtractor classes.
- Consolidated element creation methods for text and images, improving clarity and maintainability.
- Updated extraction methods to handle elements more generically, allowing for better extensibility.
Collaborator

@ds-sebastianchwilczynski ds-sebastianchwilczynski left a comment

One big problem I see is that we create Element objects with extractors, and if I understand correctly these Elements are meant to be transformed with an Embedder, as they have to_vector_db_entry. But in this case the objects you extract should first go through a grouping and chunking process. If we take every object from the PPTX, even the smallest, and embed it, we will end up with a ton of useless chunks and our RAG system will fail.
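
To make the chunking concern concrete, here is a small sketch that merges extracted text fragments per slide into chunks up to a size limit, instead of embedding each fragment separately. `Fragment`, `chunk_by_slide`, and the `max_chars` limit are hypothetical names for illustration, not code from this PR or from docling.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Fragment:
    """Hypothetical stand-in for one extracted piece of slide text."""

    slide: int
    text: str


def chunk_by_slide(fragments: list[Fragment], max_chars: int = 200) -> list[str]:
    """Merge fragments from the same slide into chunks of at most max_chars."""
    chunks: list[str] = []
    # sorted() is stable, so fragments keep their original order within a slide.
    ordered = sorted(fragments, key=lambda f: f.slide)
    for _slide, group in groupby(ordered, key=lambda f: f.slide):
        current = ""
        for frag in group:
            candidate = f"{current}\n{frag.text}" if current else frag.text
            if current and len(candidate) > max_chars:
                # Adding this fragment would overflow: close the chunk, start a new one.
                chunks.append(current)
                current = frag.text
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

A single fragment longer than `max_chars` is kept whole in this sketch; a real chunker would also need to split oversized text.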


I tested unstructured and docling for this task, and unstructured's partition_pptx didn't extract images from my slides. However, docling did a great job.

from docling.document_converter import DocumentConverter
from docling_core.types.doc.document import PictureItem

converter = DocumentConverter()
doc = converter.convert(source="../testing_scripts/Slides.pptx").document

# iterate_items yields (item, level) pairs; print everything except pictures
for item, _level in doc.iterate_items(with_groups=True):
    if not isinstance(item, PictureItem):
        print(item)

# save each embedded picture as a PNG, skipping pictures without image data
for idx, picture in enumerate(doc.pictures):
    if picture.image is None:
        continue
    picture.image.pil_image.save(f"Slides_{idx}.png")

It looks like it didn't extract presenter's notes and some other things that our implementation did.

But once the content is extracted in this form, we get docling grouping, serialization, chunking, etc. for free.

Also, we can add more elements to Docling's Document class.

So here is what I'd do:

  1. Utilise everything we can from docling directly: less code, fewer problems.
  2. Then expand the docling document with our custom implementation; our implementations should return docling-compliant formats. What I'd personally do is implement a callback system in the Docling parser, so you can add callbacks that run as we parse a PPTX file. Each callback takes some input and the docling document, extracts a value from the input, and adds it to the docling document.
  3. Then I believe we can utilise the Chunker from docling, or create a custom one.
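
The callback idea in step 2 could look roughly like the sketch below. It uses a plain stand-in class instead of docling's real document type, and every name in it (`SketchDocument`, `PptxCallback`, `speaker_notes_callback`, `run_callbacks`) is hypothetical. It only illustrates the shape "take the source and the document, extract a value, add it to the document".

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchDocument:
    """Stand-in for a docling document; it only collects text items."""

    texts: list[str] = field(default_factory=list)

    def add_text(self, text: str) -> None:
        self.texts.append(text)


# A callback receives the raw slides (plain dicts in this sketch) and the
# document, and returns the enriched document.
Slides = list[dict[str, Any]]
PptxCallback = Callable[[Slides, SketchDocument], SketchDocument]


def speaker_notes_callback(slides: Slides, document: SketchDocument) -> SketchDocument:
    """Append non-empty speaker notes to the document, one item per slide."""
    for idx, slide in enumerate(slides, start=1):
        notes = str(slide.get("notes", "")).strip()
        if notes:
            document.add_text(f"Slide {idx} notes: {notes}")
    return document


def run_callbacks(
    slides: Slides, document: SketchDocument, callbacks: list[PptxCallback]
) -> SketchDocument:
    """Run each callback in order, passing the document along."""
    for callback in callbacks:
        document = callback(slides, document)
    return document
```

Because every callback has the same signature, new extractions (hyperlinks, metadata, and so on) can be added to the list without changing the parser itself.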

But please discuss this with @mhordynski

- Added new callback classes for extracting hyperlinks, metadata, and speaker notes from PPTX files.
- Refactored the PPTX document parser to utilize these callbacks for enhancing the docling document.
- Removed unused extractor classes and related imports to streamline the codebase.
- Introduced a default callback list for easier configuration.
- Introduced a new integration test for the PPTX document parser to validate the extraction of hyperlinks, metadata, and speaker notes from a sample presentation.
- Added a sample PPTX presentation asset for testing purposes.
@mhordynski mhordynski changed the base branch from main to develop September 4, 2025 07:56
@mhordynski mhordynski merged commit 41e9b33 into develop Sep 4, 2025
9 checks passed
@mhordynski mhordynski deleted the 687-feat-pptx-parser branch September 4, 2025 08:02
mhordynski added a commit that referenced this pull request Sep 11, 2025
Co-authored-by: Copilot <[email protected]>
Co-authored-by: pocucan-ds <[email protected]>
Co-authored-by: Mateusz Hordyński <[email protected]>
Development

Successfully merging this pull request may close these issues.

feat: PPTX parser
5 participants