[Bugfix] Make Gelu Activations consistent across frameworks #753
base: main
Conversation
Okay, so after some more digging, it seems one of the main reasons not to change this would be speed: huggingface/candle#1062
I was able to address the latency issue by raising this PR: huggingface/candle#3168
Wow, awesome work on huggingface/candle#3168 @vrdn-23! I thought the compiler uses constant propagation, so I had roughly assumed that TEI uses the approximate GeLU (~= new GeLU) for faster inference, while there's only a marginal difference between the variants. We could benchmark the speed by updating the implementation and comparing. One minor point for discussion: if the latency stays comparable, it might make sense to keep this implementation. Anyway, looks great to me! I'd appreciate any feedback or thoughts you have!
Sorry for the late response, but I was away for Thanksgiving break, @kozistr!
I think that might have been true previously, but now that huggingface/candle#3168 has been merged, the gelu_erf (old gelu) implementation is actually faster than the new one. It is also the closest functional match to the outputs of the existing models, so since it seems to be a win in terms of both quality and latency, I would argue that we stick to the consistent implementation across frameworks. Would love to hear your thoughts @alvarobartt @Narsil
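For reference, the "old" GeLU is the exact, erf-based form, while the "new" GeLU is the tanh approximation; the standard definitions are:

```latex
% Exact (erf-based) GeLU -- what gelu_erf computes:
\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)

% Tanh approximation ("new" GeLU / gelu_pytorch_tanh):
\mathrm{GELU}_{\mathrm{tanh}}(x) \approx \frac{x}{2}\left(1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)
```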
This commit adapts text-embeddings-inference for NVIDIA Jetson Orin (SM87) and L4 GPU (SM89), and integrates valuable community PRs.

Changes:

1. SM87/SM89 CUDA Support
   - Added compute capability 8.7 and 8.9 support
   - Modified Dockerfile-cuda-all for multi-arch builds
   - Updated compute_cap.rs for SM87/89 detection
   - Files: Dockerfile-cuda-all, cuda-all-entrypoint.sh, compute_cap.rs

2. PR huggingface#730: Qwen3 Reranker Support
   - Added classification head for Qwen3 reranking
   - Implemented template formatting system for chat-based reranking
   - Files: models/qwen3.rs, core/templates.rs, core/lib.rs

3. PR huggingface#787: Batch Notification Performance Optimization (sketched below)
   - Implemented AtomicUsize counter for batch processing
   - Reduced unnecessary notify_one() calls
   - Only the last request in a batch triggers thread notification
   - Files: core/infer.rs, router/http/server.rs, router/grpc/server.rs

4. PR huggingface#753: GeLU Activation Consistency Fix
   - Changed Gelu from approximate (gelu) to exact (gelu_erf)
   - Added NewGelu variant for backward compatibility
   - Files: layers/linear.rs

5. PR huggingface#790: StaticEmbedding Model Support
   - Added support for the 0_StaticEmbedding/ directory structure
   - Implemented fallback loading for model weights and tokenizer
   - Default to Mean pooling for StaticEmbedding models
   - Files: models/static_embedding.rs (new), lib.rs, download.rs, router/lib.rs

6. PR huggingface#746: DebertaV2 Sequence Classification Support
   - Complete DebertaV2 model implementation
   - Support for sequence classification tasks (e.g., Llama Prompt Guard)
   - CPU and CUDA device support
   - Files: models/debertav2.rs (new), lib.rs, models/mod.rs

All changes have been tested and compile successfully with: cargo check --all-targets
Compilation verified with CUDA support: cargo install --path router -F candle-cuda

Target Hardware: NVIDIA Jetson Orin AGX (SM87), L4 GPU (SM89)
Date: January 5, 2026
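As a rough illustration of the batching change in item 3 above (a hypothetical sketch, not the actual core/infer.rs code; all names are illustrative): requests bump an atomic counter and only the last request of a batch wakes the batching task, so notify_one() is called once per batch instead of once per request.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::Notify;

/// Hypothetical queue handle; field and method names are illustrative, not TEI's.
struct BatchQueue {
    pending: AtomicUsize,
    notify: Notify,
}

impl BatchQueue {
    /// Enqueue one request; only the final request of a batch wakes
    /// the batching task, avoiding redundant notify_one() calls.
    fn append(self: &Arc<Self>, is_last_in_batch: bool) {
        self.pending.fetch_add(1, Ordering::SeqCst);
        if is_last_in_batch {
            self.notify.notify_one();
        }
    }

    /// Batching-task side: sleep until at least one request is pending.
    async fn wait_for_work(self: &Arc<Self>) {
        while self.pending.load(Ordering::SeqCst) == 0 {
            self.notify.notified().await;
        }
    }
}
```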
Just wanted to add a note that #784 should probably be merged before this (if accepted), so that the speed-up gained by huggingface/candle#3168 can be utilized.
What does this PR do?
This PR fixes a consistency issue in how TEI handles the GeLU activation compared to the `transformers` library and the `candle` library. It seems that the value `gelu` is meant to serialize to an old, incorrect version of how the GeLU activation was implemented (based on the comment given here), per this code snippet in transformers. This means that any config that uses the value `gelu` for the `hidden_activation` should end up using the `GeluActivation` function, which uses the `torch.erf` function. The new GeLU activation is referenced using `new_gelu` or `gelu_pytorch_tanh`. This behavior is also what the huggingface/candle repository follows here (`gelu` corresponds to `xs.gelu_erf()` and not `xs.gelu()`). This PR brings the TEI implementation in line with how transformers parses the `config.json` values and how `candle` resolves activations.
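A minimal sketch of the mapping this PR argues for (illustrative only, not the actual TEI `layers/linear.rs` or candle source; the enum name and serde attributes are assumptions):

```rust
use candle_core::{Result, Tensor};

/// Hypothetical activation enum, deserialized from the `hidden_act` /
/// `hidden_activation` field of config.json (names are illustrative).
#[derive(Debug, Clone, serde::Deserialize)]
pub enum HiddenAct {
    /// "gelu" resolves to the exact, erf-based GeLU,
    /// matching transformers' GeluActivation (torch.erf).
    #[serde(rename = "gelu")]
    Gelu,
    /// The tanh approximation, only used when explicitly requested.
    #[serde(rename = "gelu_new", alias = "gelu_pytorch_tanh")]
    NewGelu,
    #[serde(rename = "relu")]
    Relu,
}

impl HiddenAct {
    pub fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        match self {
            // Exact GeLU via the error function.
            HiddenAct::Gelu => xs.gelu_erf(),
            // Tanh-based approximation.
            HiddenAct::NewGelu => xs.gelu(),
            HiddenAct::Relu => xs.relu(),
        }
    }
}
```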
I came across this inconsistency while I was reviewing some of the code changes I had in #746, but thought this should be opened as a separate PR, given that it will slightly vary (read: correct) existing model behavior. (h/t to @bbaldino for pointing this out to me)
Please do let me know if I'm missing something obvious here as to why TEI is not in sync with how the activation functions are defined. My understanding is that this is just a bug that got carried over from legacy code that was introduced in #41.
Before submitting
- Did you update the `insta` snapshots, if applicable?

Who can review?
@Narsil OR @alvarobartt OR @kozistr