docs: populated and validated the concept pages #1533

Open
Olamideod wants to merge 7 commits into xorq-labs:main from Olamideod:docs/concept-pages

@Olamideod
Collaborator

fixes #1532

@codecov
codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 38.24000% with 386 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/xorq/expr/ml/tests/test_structer.py | 21.16% | 231 Missing ⚠️ |
| python/xorq/expr/ml/tests/test_pipeline_lib.py | 13.43% | 58 Missing ⚠️ |
| python/xorq/expr/ml/tests/test_fit_lib.py | 16.94% | 49 Missing ⚠️ |
| python/xorq/expr/ml/structer.py | 70.58% | 30 Missing ⚠️ |
| python/xorq/expr/ml/fit_lib.py | 75.00% | 17 Missing ⚠️ |
| examples/bank_marketing.py | 95.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| python/xorq/expr/ml/pipeline_lib.py | 68.57% <100.00%> (-19.79%) ⬇️ |
| examples/bank_marketing.py | 96.49% <95.00%> (+1.25%) ⬆️ |
| python/xorq/expr/ml/fit_lib.py | 75.97% <75.00%> (-12.23%) ⬇️ |
| python/xorq/expr/ml/structer.py | 67.92% <70.58%> (-19.77%) ⬇️ |
| python/xorq/expr/ml/tests/test_fit_lib.py | 23.47% <16.94%> (-76.53%) ⬇️ |
| python/xorq/expr/ml/tests/test_pipeline_lib.py | 28.94% <13.43%> (-71.06%) ⬇️ |
| python/xorq/expr/ml/tests/test_structer.py | 21.16% <21.16%> (ø) |

... and 211 files with indirect coverage changes

@codspeed-hq
codspeed-hq bot commented Jan 23, 2026

Merging this PR will improve performance by ×2.1

⚡ 1 improved benchmark
✅ 3 untouched benchmarks

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_into_backend_cache | 508.3 ms | 245.7 ms | ×2.1 |

Comparing Olamideod:docs/concept-pages (515dfa7) with main (69131a9)

@letsql
letsql bot commented Jan 23, 2026

@letsql
letsql bot commented Jan 23, 2026

@letsql
letsql bot commented Jan 26, 2026

@letsql
letsql bot commented Jan 26, 2026

@Olamideod
Collaborator Author


Hybrid computation balances the benefits of both batch and on-demand patterns while introducing its own complexity.

**You gain**:
Collaborator

Watch out for bold text masquerading as headings.

Collaborator Author

Fixed.


### Benefits:

- Extensibility: Add any logic you need; no waiting for Xorq to implement it.
Collaborator

Missing bold here on the terms before the colon.

Collaborator Author

Fixed


### Costs:

- Performance: UDFs are slower than built-in operations, typically 2-10x depending on the operation.
Collaborator

Make sure bold is applied consistently throughout.

Collaborator Author

Fixed!

Collaborator

Make sure you set up redirects for all of these.

Collaborator Author

Done


Your data lives in PostgreSQL, but you need DuckDB's analytical performance for aggregations. Moving data manually between engines wastes time and introduces errors. Xorq's multi-engine execution lets you move data between backends within a single expression using `into_backend()`. This lets you use each engine for operations it performs best without manual data transfers.

## What you'll understand
Collaborator

As I said elsewhere, if you need a summary like this, then the concept is too long. Make sure concepts are formatted consistently.

Collaborator Author

Done!

[Build system](../reproducibility/build_system.qmd) discusses how `xorq build` generates manifests. [Content-addressed hashing](../reproducibility/content_addressed_hashing.qmd) explains how manifests get unique hashes. [Compute catalog](../reproducibility/compute_catalog.qmd) details how manifests get registered and discovered.



Collaborator

Lots of empty lines here.

Collaborator Author

Fixed!

description: "Understand how the catalog enables discovery, versioning, and reuse of computations"
---

Three developers independently build customer segmentation features without knowing about each other's work. Each developer builds from scratch because they can't discover what others have already created in the team. Content hashes like `a3f5c9d2` sit in build directories where they remain invisible and unusable to other team members. The compute catalog solves this discovery problem by indexing builds with human-readable names, which enables team-wide discovery and reuse of computational work.
Collaborator

Intros should be simple and direct, not a meandering story.

Collaborator Author

Fixed!

ghoersti and others added 3 commits January 29, 2026 17:46
xorq-labs#1531)

**SUMMARY**
Adds support for sklearn transformers whose output schema isn't known until fit time (OneHotEncoder, TfidfVectorizer, etc.) by using a KV-encoded format (Array[Struct{key, value}]), along with some QOL restructuring.

Key changes:
- Structer: Added needs_target and is_series fields for transformer metadata
- KVEncoder: Encode/decode between packed KV format and expanded columns
- Registered: OneHotEncoder, TfidfVectorizer, SelectKBest
- Removed from_fitted_step - logic now in FittedStep._deferred_fit_other

NOTES:
Sometimes we need to know whether the transformer needs a target; for SelectKBest, this let us deprecate `from_fitted_step`. We also needed to handle a series that is KV-encoded to replicate the behavior of TfidfVectorizer. I think this sets up a cleaner pattern of registering Structers and routing the deferred function.
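
For readers unfamiliar with the packed format, here is a minimal sketch of the KV-encoding idea in plain Python. It is not xorq's `KVEncoder`: the helper names (`kv_encode`, `kv_decode`, `one_hot`) and the `color_` column prefix are hypothetical and only illustrate how a row whose columns are discovered at fit time can travel as a list of key/value structs and be expanded again later.

```python
from typing import Dict, List

# One packed row: a list of {"key": ..., "value": ...} structs,
# mirroring the Array[Struct{key, value}] shape described above.
KVRow = List[Dict[str, object]]


def kv_encode(row: Dict[str, float]) -> KVRow:
    """Pack an expanded row (column name -> value) into key/value structs."""
    return [{"key": name, "value": value} for name, value in row.items()]


def kv_decode(packed: KVRow, columns: List[str], fill: float = 0.0) -> Dict[str, float]:
    """Expand a packed row into fixed columns, filling keys that are absent."""
    lookup = {item["key"]: item["value"] for item in packed}
    return {name: lookup.get(name, fill) for name in columns}


def one_hot(value: str, categories: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a transformer whose columns depend on fitted categories."""
    return {f"color_{c}": 1.0 if c == value else 0.0 for c in categories}


# The categories (and therefore the output columns) are only known after "fit".
categories = sorted({"red", "green", "blue"})
packed = kv_encode(one_hot("green", categories))
expanded = kv_decode(packed, [f"color_{c}" for c in categories])
```

The packed form has a single, fixed type regardless of how many categories were seen, which is why it can be declared before fitting; the expanded columns are recovered only once the keys are known.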

---------

Co-authored-by: George Hoersting <ghoersti@Georges-MacBook-Pro.local>
Co-authored-by: Claude <noreply@anthropic.com>
@letsql
letsql bot commented Jan 29, 2026

Development

Successfully merging this pull request may close these issues.

docs: Validated all 17 concept pages in the documentation for technical correctness

3 participants