docs: populated and validated the concept pages #1533

Olamideod wants to merge 7 commits into xorq-labs:main from
Conversation
> Hybrid computation balances the benefits of both batch and on-demand patterns while introducing its own complexity.
> **You gain**:

Watch out for bold text masquerading as headings.
> ### Benefits:
> - Extensibility: Add any logic you need; no waiting for Xorq to implement it.

Missing bold here on the terms before the colon.
> ### Costs:
> - Performance: UDFs are slower than built-in operations, typically 2-10x depending on the operation.

Make sure bold is applied consistently throughout.

Make sure you set up redirects for all of these.
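The overhead described in the quoted line is easy to demonstrate outside Xorq. Here is a minimal pandas sketch (illustrative only; the `data` frame and the timing loop are assumptions, not Xorq's UDF machinery) comparing a vectorized built-in operation against a row-wise Python function of the kind a UDF typically wraps:

```python
import time

import pandas as pd

# Illustrative data; any numeric column shows the same effect.
data = pd.DataFrame({"x": range(1_000_000)})

# Built-in vectorized operation: runs in optimized native code.
start = time.perf_counter()
builtin = data["x"] * 2
builtin_time = time.perf_counter() - start

# UDF-style operation: calls back into Python once per row.
start = time.perf_counter()
udf = data["x"].apply(lambda v: v * 2)
udf_time = time.perf_counter() - start

assert builtin.equals(udf)  # same result, very different cost
print(f"Row-wise version was ~{udf_time / builtin_time:.0f}x slower")
```

The exact ratio varies by operation and data size, which is consistent with the hedged "typically 2-10x" in the docs text.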
> Your data lives in PostgreSQL, but you need DuckDB's analytical performance for aggregations. Moving data manually between engines wastes time and introduces errors. Xorq's multi-engine execution lets you move data between backends within a single expression using `into_backend()`. This lets you use each engine for operations it performs best without manual data transfers.
> ## What you'll understand

As I said elsewhere, if you need a summary like this, then the concept is too long. Make sure concepts are formatted consistently.

> [Build system](../reproducibility/build_system.qmd) discusses how `xorq build` generates manifests. [Content-addressed hashing](../reproducibility/content_addressed_hashing.qmd) explains how manifests get unique hashes. [Compute catalog](../reproducibility/compute_catalog.qmd) details how manifests get registered and discovered.
Lots of empty lines here.
> description: "Understand how the catalog enables discovery, versioning, and reuse of computations"
> ---
> Three developers independently build customer segmentation features without knowing about each other's work. Each developer builds from scratch because they can't discover what others have already created in the team. Content hashes like `a3f5c9d2` sit in build directories where they remain invisible and unusable to other team members. The compute catalog solves this discovery problem by indexing builds with human-readable names, which enables team-wide discovery and reuse of computational work.

Intros should be simple and direct, not a meandering story.
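The content hashes the quoted intro mentions can be illustrated with a short sketch. This is a generic demonstration of content addressing using `hashlib`, not Xorq's actual manifest format or hashing scheme; the `content_hash` helper and the manifest keys are hypothetical:

```python
import hashlib
import json

def content_hash(manifest: dict) -> str:
    """Hash a canonical serialization of a build manifest (illustrative only)."""
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:8]

# Two identical definitions hash identically...
a = content_hash({"expr": "customers.group_by('segment').count()", "deps": ["postgres"]})
b = content_hash({"deps": ["postgres"], "expr": "customers.group_by('segment').count()"})
assert a == b  # key order doesn't matter after canonicalization

# ...while any change to the computation yields a new address.
c = content_hash({"expr": "customers.group_by('region').count()", "deps": ["postgres"]})
assert a != c
```

A catalog then only needs to map a human-readable name to such a hash for a build to become discoverable by teammates.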
xorq-labs#1531

**SUMMARY**

Adds support for sklearn transformers whose output schema isn't known until fit time (OneHotEncoder, TfidfVectorizer, etc.) by using a KV-encoded format (`Array[Struct{key, value}]`), along with some quality-of-life restructuring.

Key changes:

- Structer: added `needs_target` and `is_series` fields for transformer metadata
- KVEncoder: encode/decode between the packed KV format and expanded columns
- Registered: OneHotEncoder, TfidfVectorizer, SelectKBest
- Removed `from_fitted_step`; the logic now lives in `FittedStep._deferred_fit_other`

NOTES: Sometimes we need to know whether the transformer needs a target (e.g. SelectKBest); this is what let us deprecate `from_fitted_step`. We also needed to handle a series that is KV-encoded to replicate the behavior of TfidfVectorizer. I think this sets up a cleaner pattern of registering Structers and routing the deferred function.

---------

Co-authored-by: George Hoersting <ghoersti@Georges-MacBook-Pro.local>
Co-authored-by: Claude <noreply@anthropic.com>
fixes #1532