
Commit 88b999c

Updated "The Annotated nanoTabPFN"

1 parent: cd5fdb8

File tree

1 file changed: +2 -2 lines


annotated/annotated nanoTabPFN.md

Lines changed: 2 additions & 2 deletions
@@ -164,8 +164,8 @@ Through in-context learning, transformers can:
 
 ### **Scalable Parallelization**
 Unlike sequential models, transformers offer:
-- Parallelization across sequence length during training
-- Efficient batch processing on modern hardware (GPUs)
+- One-step computation across the full sequence: self-attention eliminates sequential dependencies, enabling all token–token interactions to be computed in a single set of matrix multiplications
+- Full exploitation of GPU parallelism: large batched matrix multiplications allow efficient use of modern hardware, with parallelism across batches and attention heads
 
 These advantages make transformers well-suited as foundation models for diverse tabular datasets without task-specific modifications. While challenges remain - particularly the quadratic complexity of attention for large datasets - their flexibility and expressiveness make transformers the architecture of choice for tabular foundation models. It is important to note that TabPFN uses only the transformer encoder, because tabular prediction requires classifying/regressing all test samples simultaneously based on the provided context rather than generating outputs sequentially as in language generation. The "decoder" in TabPFN is simply an MLP that maps the enriched target embeddings from the transformer encoder to final predictions - it is not a transformer decoder at all. This design mirrors architectures where transformer encoders extract rich representations that are then passed through task-specific heads, rather than GPT-style decoders that generate tokens autoregressively.
 

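The two added bullets describe self-attention as a handful of batched matrix multiplications with no sequential dependency. A minimal PyTorch sketch of that claim (the tensor names and shapes below are illustrative assumptions, not taken from the nanoTabPFN code):

```python
import torch

# Illustrative sizes (assumptions): a batch of 4 tasks, 128 rows ("tokens")
# per task, 8 attention heads, 16 dimensions per head.
B, H, T, D = 4, 8, 128, 16

q = torch.randn(B, H, T, D)  # queries
k = torch.randn(B, H, T, D)  # keys
v = torch.randn(B, H, T, D)  # values

# One batched matmul produces every token-token interaction at once: (B, H, T, T).
scores = q @ k.transpose(-2, -1) / D ** 0.5

# Softmax over the key dimension, then a second batched matmul against the values.
attn = scores.softmax(dim=-1)
out = attn @ v  # (B, H, T, D), computed for all positions in parallel

print(out.shape)  # torch.Size([4, 8, 128, 16])
```

There is no loop over positions: the same two matmuls cover the whole sequence, and the batch and head dimensions are handled by the same kernels, which is exactly the kind of workload GPUs execute efficiently.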
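For the note at the end of the hunk about the encoder-only design, here is a rough sketch of the overall shape, using a generic `nn.TransformerEncoder` and made-up dimensions rather than nanoTabPFN's actual modules (the embedding of features/labels and the attention-masking scheme between train and test rows are omitted):

```python
import torch
import torch.nn as nn

d_model, n_classes = 64, 3

# Generic stand-in for the TabPFN-style transformer encoder.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)

# The "decoder" is only an MLP head on top of the enriched embeddings.
head = nn.Sequential(nn.Linear(d_model, 128), nn.GELU(), nn.Linear(128, n_classes))

# One in-context task: 100 labelled train rows plus 20 unlabelled test rows,
# assumed to be already embedded to d_model.
train_tokens = torch.randn(1, 100, d_model)
test_tokens = torch.randn(1, 20, d_model)

tokens = torch.cat([train_tokens, test_tokens], dim=1)  # (1, 120, d_model)
enriched = encoder(tokens)                              # context mixes into the test rows
logits = head(enriched[:, 100:, :])                     # all test rows predicted at once

print(logits.shape)  # torch.Size([1, 20, 3])
```

Every test row receives its prediction in a single forward pass; nothing is generated token by token, so no transformer decoder is involved.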