Remove unstructured dependency and legacy-partitioners. #1587

Merged
bsowell merged 1 commit into main from ben/remove_unstructured on Feb 25, 2026

@bsowell bsowell commented Feb 24, 2026

Most of the changes here are fixing up old tests.

blacksmith-sh bot commented Feb 24, 2026

Found 25 test failures on Blacksmith runners:

Failures

Test
test_cached_anthropic_different_models
test_chained_llm_table_structure_extractor
test_openai_table_structure_extractor
test_pdf_to_opensearch_with_llm_caching
test_pinecone_read
test_qdrant
test_qdrant_named_vector
test_sycamore_batched
test_to_neo4j
/home/runner/_work/sycamore/sycamore/notebooks/ArynPartitionerPython/ipynb
/home/runner/_work/sycamore/sycamore/notebooks/docprep/minilm-l6-v2_greedy-section-merger_opensearch/ipynb
test_aryn_reader[ExecMode/RAY]
test_aryn_reader_with_original_elements[ExecMode/LOCAL]
test_aryn_reader_with_original_elements[ExecMode/RAY]
TestOpenSearchRead/test_result_filter_on_property_knn
TestPaddleOcr/test_paddle_ocr_on_pdf
TestSycamoreQuery/test_dry_run[False]
TestSycamoreQuery/test_dry_run[True]
TestSycamoreQuery/test_forked[False]
TestSycamoreQuery/test_forked[True]
TestSycamoreQuery/test_simple[True]
TestSycamoreQuery/test_simple_with_result_filter[False]
TestSycamoreQuery/test_vector_search_2
TestSycamoreQuery/test_vector_search_3
TestSycamoreQuery/test_vector_search_with_result_filter[False]

@bsowell bsowell requested review from HenryL27 and alexaryn February 24, 2026 23:08
@bsowell bsowell marked this pull request as ready for review February 24, 2026 23:31

@alexaryn alexaryn left a comment

Love it.

Comment on lines +19 to +20
ctx = sycamore.init()
return ctx

maybe yield ctx and then sycamore.shutdown()? eh, prob doesn't matter
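
For illustration, the yield-then-teardown shape suggested above might look roughly like the sketch below. It is not code from the PR. FakeSycamore is a hypothetical stand-in so the snippet runs without the library installed, and whether sycamore.shutdown() is the appropriate teardown call is taken from the comment above, not verified here.

```python
class FakeSycamore:
    """Hypothetical stand-in for the sycamore module (not the real library)."""

    def __init__(self):
        self.active = False

    def init(self):
        # Pretend to start a runtime and hand back a context object.
        self.active = True
        return {"name": "ctx"}

    def shutdown(self):
        # Pretend to release whatever init() started.
        self.active = False


sycamore = FakeSycamore()


def context():
    # In a real test module this function would carry @pytest.fixture;
    # pytest then runs the code after the yield as teardown.
    ctx = sycamore.init()
    try:
        yield ctx
    finally:
        sycamore.shutdown()


# Drive the generator by hand to show the setup/teardown order.
gen = context()
ctx = next(gen)   # setup: everything before the yield
gen.close()       # teardown: the finally block
```

Compared with the plain `return ctx` version, the only behavioral difference is that the teardown runs once the test using the fixture finishes.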

  docs = (
      context.read.binary(paths=[str(path)], binary_format="pdf")
-     .partition(partitioner=UnstructuredPdfPartitioner())
+     .partition(partitioner=ArynPartitioner())

might be worth making a FakeArynPartitioner for these tests at some point. idk, the nice thing about the unstructured one was it was all local
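
A rough sketch of what such a test double could look like, assuming a partitioner only needs a partition(document) method that fills in the document's elements. FakeElement, FakeDocument, and FakeArynPartitioner are all hypothetical names for illustration. None of them exist in the PR, and the real Sycamore Document and Partitioner classes may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeElement:
    """Hypothetical minimal element: a type label plus its text."""
    type: str
    text: str


@dataclass
class FakeDocument:
    """Hypothetical minimal document holding raw bytes and parsed elements."""
    binary: bytes
    elements: list = field(default_factory=list)


class FakeArynPartitioner:
    """Hypothetical test double: returns canned elements, makes no network calls."""

    def __init__(self, canned_elements=None):
        if canned_elements is None:
            canned_elements = [
                FakeElement("Title", "Fake title"),
                FakeElement("Text", "Fake body text"),
            ]
        self._canned = canned_elements

    def partition(self, document):
        # Copy the canned elements so tests cannot mutate shared state.
        document.elements = [FakeElement(e.type, e.text) for e in self._canned]
        return document


doc = FakeArynPartitioner().partition(FakeDocument(binary=b"%PDF-"))
print([e.type for e in doc.elements])
```

A double like this would keep the tests fully local, which is the property the comment above misses from the unstructured partitioner.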

@bsowell bsowell merged commit ea32fbf into main Feb 25, 2026
11 of 14 checks passed
@bsowell bsowell deleted the ben/remove_unstructured branch February 25, 2026 17:48

3 participants