
Conversation

@ryan-mangeno (Contributor) commented Aug 28, 2025

Adding support to run granite-embedding-small, which primarily pulls in the ModernBERT architecture - https://huggingface.co/ibm-granite/granite-embedding-small-english-r2. I'm still working on it: I haven't figured out the pre-tokenizer type or whether I need to implement it, and for the ubatch size the assert in llama-graph.cpp fails. I hacked it to accept a ubatch size of 1 for testing, but it keeps failing there and I'm not sure why.

If I comment out this line in llama-graph.cpp

assert(!ubatch.equal_seqs());

then it works

@ryan-mangeno marked this pull request as draft August 28, 2025 17:05
@ryan-mangeno (Contributor, Author) commented Aug 28, 2025

@gabe-l-hart thanks in advance :)

@ryan-mangeno (Contributor, Author) commented:

@gabe-l-hart thanks in advance :)

Also realizing this a little late haha, but should I be changing all of the ModernBERT stuff to a granite embedding macro like LLM_ARCH_GRANITE_EMBD, or keep it as is?

@CISC (Collaborator) commented Aug 28, 2025

You may want to check out an earlier attempt at ModernBert in #14014

@gabe-l-hart (Collaborator) commented:

Thanks for getting this together @ryan-mangeno and thanks for pointing out the previous work @CISC. Ryan, let me know if/when you've looked over that PR and found anything to fix and I'll take a pass at review.

@gabe-l-hart (Collaborator) commented:

Also realizing this a little late haha, but should I be changing all of the ModernBERT stuff to a granite embedding macro like LLM_ARCH_GRANITE_EMBD, or keep it as is?

In general, we want to keep things as generic as possible, so since this uses the ModernBertModel architecture from transformers, it's best to keep the implementation here similarly robust unless there's a concrete reason to subset the transformers architecture to just work for granite (e.g. there's some non-trivial code path in the transformers version that would make sense as a separate architecture).

@github-actions bot added the python (python script changes) label Aug 28, 2025
@ryan-mangeno (Contributor, Author) commented:

Thanks for getting this together @ryan-mangeno and thanks for pointing out the previous work @CISC. Ryan, let me know if/when you've looked over that PR and found anything to fix and I'll take a pass at review.

will do

@ryan-mangeno (Contributor, Author) commented Sep 3, 2025

@gabe-l-hart I'm looking into ModernBERT's research paper. I can't find a mention of symmetric sliding window attention, only local sliding window attention, so I'm going to opt for LLAMA_SWA_TYPE_LOCAL over the LLAMA_SWA_TYPE_SYMMETRIC used in the previous attempt. It also uses global attention every third layer, so I'm going to implement that and then it should be ready for a review :)
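For reference, a minimal sketch of the layer pattern described above; the zero-based indexing and the assumption that layer 0 is global follow the transformers ModernBERT convention, and the function name is purely illustrative, not the llama.cpp API:

# Illustrative sketch only: ModernBERT-style attention pattern where every
# third layer is global and the rest use a local sliding window.
# Assumption: zero-based layer indices with layer 0 global (il % 3 == 0 -> global).
def is_global_attention(il: int, global_every_n: int = 3) -> bool:
    return il % global_every_n == 0

# e.g. for a hypothetical 12-layer stack, layers 0, 3, 6, 9 would be global
pattern = ["global" if is_global_attention(il) else "local (SWA)" for il in range(12)]
print(pattern)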

@gabe-l-hart (Collaborator) commented:

@ryan-mangeno That sounds good! I haven't unpacked any of those mechanics myself, but can try to get into it if you get stuck.

… per previous attempt, added local sliding window attention that alternates every third layer
@ryan-mangeno (Contributor, Author) commented:

@ryan-mangeno That sounds good! I haven't unpacked any of those mechanics myself, but can try to get into it if you get stuck.

ok 👍, made some changes but I'm not sure if it's fully ready yet. I will ping you when I think it's ready, if that's ok.

@ryan-mangeno (Contributor, Author) commented Sep 4, 2025

Status update - I found out that ModernBERT uses an alternating RoPE method, per https://arxiv.org/pdf/2412.13663:

In ModernBERT, every third layer employs global attention with a RoPE theta of 160,000 and the remaining layers use a 128 token, local sliding window attention with a RoPE theta of 10,000.

I am currently figuring out how to implement this
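A rough sketch of what that per-layer selection could look like, purely illustrative (the hparam key names mirror transformers' ModernBertConfig and are my assumption here, not the llama.cpp implementation):

# Hedged sketch: choose the RoPE theta per layer following the paper's
# description. Key names are assumptions borrowed from ModernBertConfig.
hparams = {
    "global_attn_every_n_layers": 3,     # global attention every third layer
    "global_rope_theta": 160_000.0,      # theta for global-attention layers
    "local_rope_theta": 10_000.0,        # theta for local sliding-window layers
    "local_attention": 128,              # sliding-window size in tokens
}

def rope_theta_for_layer(il: int) -> float:
    # assumption: layer il is global when il % global_attn_every_n_layers == 0
    if il % hparams["global_attn_every_n_layers"] == 0:
        return hparams["global_rope_theta"]
    return hparams["local_rope_theta"]

print([rope_theta_for_layer(il) for il in range(6)])
# expected: [160000.0, 10000.0, 10000.0, 160000.0, 10000.0, 10000.0]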

ryan-mangeno and others added 19 commits October 10, 2025 11:48
Co-authored-by: Sigbjørn Skjæret <[email protected]>
LLM_KV_ROPE_DIMENSION_SECTIONS,
LLM_KV_ROPE_FREQ_BASE,
LLM_KV_ROPE_SCALE_LINEAR,
LLM_KV_ROPE_FREQ_BASE_SWA,
Review comment (Collaborator):

NIT: Seems like this should be one line up so it's next to LLM_KV_ROPE_FREQ_BASE?

@ryan-mangeno (Contributor, Author) commented:

Thanks for the insight and suggestions! I also added support to convert the ModernBERT base model to GGUF.

ryan-mangeno and others added 2 commits October 10, 2025 15:14
Co-authored-by: Gabe Goodhart <[email protected]>
@gabe-l-hart (Collaborator) commented:

I'm still seeing pretty substantial differences between running this with sentence_transformers and llama-embedding (un-normalized).

Sentence Transformers

from sentence_transformers import SentenceTransformer, util

model_path = "/Users/ghart/models/ibm-granite/granite-embedding-small-english-r2/"
model = SentenceTransformer(model_path)

input_queries = ["hello world"]

embedding = model.encode(input_queries)

print("Embedding shape:", embedding.shape)
print("Embedding vector:", embedding)
output
Embedding shape: (1, 384)
Embedding vector: [[ 4.70217347e-01 -8.18190426e-02 -9.70213771e-01  1.01167999e-01
  -1.64871857e-01 -4.12840217e-01 -2.86905438e-01 -6.37448728e-01
   1.79274410e-01 -1.50260532e+00  3.44578117e-01  7.94308245e-01
  -2.04864680e-03  3.85626018e-01  8.78880858e-01  2.96543926e-01
   4.99059081e-01  7.08115339e-01 -1.19966221e+00  6.35673940e-01
   7.15104043e-01 -9.02864635e-02  9.69102792e-03  3.65380764e-01
  -1.16669536e-01  6.54571116e-01 -1.53889850e-01  4.70741421e-01
  -9.94935691e-01  2.42902517e+00 -5.46438754e-01 -1.15128085e-01
   1.75146405e-02 -1.55478820e-01  2.92682856e-01  6.50800904e-03
   6.00654960e-01  4.67081279e-01  4.28003877e-01  1.84885100e-01
  -7.06958532e-01 -4.94579941e-01 -6.33943200e-01  7.03559741e-02
  -8.40279877e-01 -3.84374112e-01 -3.22316796e-01 -3.20590526e-01
  -1.31958455e-01  4.83759612e-01 -1.66656464e-01 -3.36742699e-01
  -9.10299778e-01  6.32237792e-02  3.42963338e-01 -7.61732638e-01
  -1.11289668e+00  3.88880402e-01 -7.68255070e-02 -6.35363519e-01
  -4.39734221e-01 -8.08298886e-01  5.53129792e-01 -6.34721398e-01
   4.41957325e-01  3.40856850e-01 -1.00600159e+00  9.23246801e-01
  -6.94876462e-02 -1.15989053e+00 -5.65697610e-01  1.13730025e+00
   1.40361202e+00 -3.32728088e-01  1.83640242e-01  9.24801350e-01
  -1.28525987e-01  2.30427593e-01 -7.30663419e-01 -2.51679420e-01
   2.08631605e-01 -3.98196250e-01  6.04916513e-01  6.92479372e-01
  -2.68131286e-01  5.37275791e-01  3.67665470e-01  4.62359250e-01
   5.15259765e-02  2.21476138e-01  1.62627205e-01  7.61412144e-01
   2.02754974e-01  4.61117446e-01 -2.36156415e-02 -2.77169538e-03
   4.10444587e-01 -4.31173712e-01  1.79928131e-02  4.41764712e-01
   8.85414928e-02  4.61687654e-01 -1.28001392e-01  2.03308940e-01
  -3.35421354e-01 -2.92037696e-01 -2.85763294e-01  2.57357419e-01
  -2.23153621e-01 -7.37229213e-02  3.68458658e-01  1.04053891e+00
   3.67554694e-01  3.56946707e-01  2.19433904e-01  5.93946040e-01
  -9.16530907e-01  9.50782597e-02 -2.84147203e-01 -2.66170263e-01
   4.52015027e-02  4.22911465e-01  6.15804315e-01  2.56652296e-01
  -5.34211025e-02 -3.40445042e-01  5.88869929e-01  2.59942144e-01
   5.74475765e-01 -2.14552194e-01  1.62897423e-01 -2.66583890e-01
   4.46479052e-01  8.93478513e-01 -4.89865780e-01 -3.45983088e-01
  -3.96398425e-01  4.21881407e-01  3.36765826e-01 -7.45260060e-01
   4.07101035e-01 -6.36805058e-01 -8.26972246e-01  2.05047861e-01
  -3.50470617e-02 -1.40885189e-01  9.15548578e-02 -1.29825026e-01
   7.14199618e-02  2.89682418e-01 -1.20579541e+00 -2.08854288e-01
   8.75092670e-02  2.79432684e-01  6.04210734e-01 -7.43936479e-01
  -7.73899769e-03  3.66188318e-01 -2.05767155e-01  1.23565328e+00
   5.73606312e-01  1.38267148e-02  2.73462981e-01  2.23422185e-01
   2.74584204e-01  1.02656931e-01  5.79929173e-01  4.17231709e-01
   3.02815944e-01  9.82625663e-01 -4.77541983e-01 -6.20512724e-01
   1.02436340e+00 -2.99752802e-01  1.88212663e-01 -4.94859040e-01
  -7.43159801e-02 -3.79382312e-01 -1.37864441e-01 -6.59950435e-01
   5.92544913e-01  6.23431921e-01 -4.77254480e-01  7.21439600e-01
  -8.56234729e-02 -8.18266928e-01  8.12745035e-01 -6.46160841e-01
   2.37971857e-01 -8.71692747e-02 -6.61815181e-02  8.86008620e-01
   2.15730108e-02  2.54340202e-01  7.17126250e-01  8.28052998e-01
  -1.14735365e-01  6.58829883e-02 -4.53476220e-01  6.95713162e-01
   2.87627801e-02  2.07480982e-01  3.21141034e-01 -5.60628533e-01
  -7.38990128e-01  3.20374638e-01 -3.72619092e-01  6.21285498e-01
  -5.61533749e-01  1.60751015e-01 -1.16645765e+00  2.06903487e-01
   1.02478206e+00  2.59608388e-01  5.12377501e-01 -1.21118881e-01
  -3.32327694e-01  4.41375494e-01  1.50044489e+00  5.56774974e-01
   1.67454869e-01  8.56810272e-01  6.48858249e-01 -8.13213289e-02
   4.57825482e-01  4.47174162e-01  1.95620254e-01  1.33322990e+00
  -2.41843641e-01 -1.16514385e+00 -3.62642229e-01  2.48650134e-01
   2.89881289e-01 -7.06838608e-01  3.76762271e-01 -2.41720259e-01
   1.61170110e-01 -7.89689049e-02 -1.70940578e-01 -2.15672906e-02
  -2.00075358e-01 -3.42099279e-01 -3.21283549e-01 -6.17988467e-01
   3.46758693e-01  9.28846002e-01 -1.29008755e-01  4.24487799e-01
   1.31246936e+00  1.19158238e-01  2.71307439e-01 -3.50256622e-01
   9.45588887e-01  1.54796913e-02  2.11319983e-01 -7.15448141e-01
  -1.15448117e+00  4.09550756e-01  7.33001903e-02 -7.16579318e-01
   6.24063194e-01  1.23233646e-01  2.12979992e-03  1.66028857e-01
   6.60794020e-01 -6.12196214e-02  3.51977050e-01 -1.20021403e+00
   1.56199694e+00  1.00744732e-01 -6.76441312e-01 -2.50055814e+00
   6.25652790e-01 -3.84571552e-01 -7.38006651e-01  9.49253067e-02
  -5.76953769e-01 -1.83812201e-01 -6.17856324e-01 -9.51130509e-01
   1.04269147e+00 -5.99453688e-01 -5.65711677e-01  2.30075777e-01
  -4.75055695e-01 -3.12701970e-01 -7.46210292e-02  7.24491835e-01
  -5.26384830e-01  7.73138165e-01 -1.53636843e-01  4.14805532e-01
   5.38851142e-01 -6.94997847e-01  1.79310352e-01 -9.93503213e-01
   4.46250848e-03 -8.43115628e-01  7.18722701e-01  2.67550111e-01
   1.59590449e-02 -6.12485111e-01  2.08336487e-01  3.70664537e-01
   9.94356155e-01 -4.52617228e-01  4.12353307e-01 -4.99993823e-02
   5.08870780e-01  9.13051367e-01 -3.69917184e-01 -3.08827847e-01
   1.80921048e-01  7.39191949e-01  4.40140873e-01 -6.92108572e-01
   2.20897928e-01  2.01410726e-01  7.72723556e-01 -1.03418577e+00
   4.20970887e-01  5.63116789e-01 -4.27787304e-01  3.84752065e-01
   1.89694867e-01  3.51506561e-01  1.85917273e-01  6.32850826e-01
  -4.85721290e-01 -6.36848271e-01  8.13779607e-02  5.39939217e-02
   8.01548883e-02  9.14924681e-01  7.74841368e-01 -9.94550884e-01
  -1.09117758e+00  2.75209993e-01  6.15678966e-01 -5.23275375e-01
   2.50375420e-01 -7.61688054e-01 -6.32449269e-01  4.58455920e-01
   4.80363891e-02 -8.20960850e-02  6.78908825e-01  3.29921871e-01
   5.47994852e-01  4.07231897e-01  7.14529812e-01 -1.31943989e+01
  -4.01630402e-01 -1.55762151e-01 -4.53522056e-01  2.45740965e-01
  -5.53757846e-01 -5.75390935e-01 -6.75122976e-01  8.75747204e-01
   3.36597413e-01 -3.19431037e-01 -1.75900921e-01 -7.52481461e-01
   1.61719930e+00  2.27893218e-01 -1.76411793e-01 -6.23723745e-01
   2.58633085e-02  5.65889142e-02 -2.84139723e-01  1.31950662e-01
   3.86155665e-01 -2.80776441e-01  2.20678419e-01 -7.70538807e-01
  -4.59486216e-01 -4.71884459e-01  1.45590639e+00  5.20368278e-01
  -1.15137987e-01 -3.55154455e-01 -7.28997350e-01 -2.06172019e-01]]

llama-embedding

./bin/llama-embedding -m ~/models/ibm-granite/granite-embedding-small-english-r2/granite-embedding-small-english-r2-F16.gguf -p "hello world" --temp 0.0 --embd-normalize -1
output
embedding 0:  0.568249 -1.102865 -1.088414  0.628141  0.435673  0.140165 -0.825630  0.569395 -1.355579  1.296733 -0.188991 -1.332968 -1.659819 -0.163685  2.019061 -0.763612  0.316477  0.540092 -0.483941 -0.601037  0.849754 -0.350578  0.117136 -0.163301 -0.740609  0.434784 -0.424452  0.989892 -0.005708  2.659041  0.093947 -0.455799 -1.407754  0.266642  0.340321  0.295892 -2.029016  0.670465 -0.155738 -0.366403  0.396539 -0.446674 -1.563189 -1.351764 -1.468906 -0.305278 -0.160654 -0.739566  0.347783 -0.548403 -0.841011 -0.803285 -1.917000  0.544958 -0.307814 -0.942326  0.318404  1.446683 -1.553440  1.960882 -0.724775 -0.249797 -1.344297  0.500228  2.130406  0.100833  0.866113 -0.956398  0.331393 -1.106812 -0.592947 -0.681050 -0.387320  0.204928  1.380802  1.337411  0.667894 -0.364470 -0.100715  0.372153  1.023228  0.309065 -0.342585  0.983340  0.735965  1.227278 -0.760515 -1.277397 -0.959252  0.862863  0.828302  1.035999 -0.797216  0.544344  0.099130  0.063429  0.195235 -0.050321 -0.196032 -0.551251 -0.446113 -0.089210  0.156701  0.854473 -0.029933  0.820748  1.247807  0.701443  0.538937  0.826270  0.831927 -1.148859 -0.770648  0.093597  0.732745  0.868883  0.898995  0.513416 -2.463722 -0.721097 -0.043873 -0.186638  0.459600  1.057895  0.982080 -0.477963 -0.255301 -1.667569 -0.182434 -1.198569 -1.681092 -1.404461  0.052894  2.658105  0.594473 -1.521481  1.055344 -0.046829  0.815539 -0.846015 -0.080340  0.736601  0.790374 -0.433596 -0.190964 -0.032811  0.721297 -1.105219 -0.403508  0.438565 -0.572751  0.401420  0.260073 -1.272813  0.026109  0.495126  0.082491  0.543874 -1.245073 -1.376684  1.218533  0.222838 -0.080038  1.106308 -1.632745 -0.053139  0.334172  0.876374 -1.147309  0.905257  0.825852  0.510852 -1.356259 -2.032928  0.302841 -2.057141 -0.628268 -0.047453 -0.607855 -0.791247  0.628165  0.597724  0.411670  0.544401 -1.126730 -1.688581 -0.811954  0.754893 -0.634977 -0.573494  0.587990  0.357522  0.296951 -0.683505  0.527630  0.909412 -0.628533 -0.000679 -0.463175 -0.976989 -0.076420  1.076263  0.151658 -1.406827  0.065899 -0.765301 -0.711156  0.164907  0.157327  0.811254 -1.395452 -0.794318 -0.010010  1.250279  0.839933 -2.987955  2.251887 -0.961484  0.349363  0.014319 -0.117970 -0.752395 -0.102783 -0.273292 -0.031080 -0.921506  0.035557  0.368253 -0.923831 -0.013870 -1.010777  0.819993  0.879220 -0.007424  0.963922 -0.585020 -0.945013 -0.724626  0.508109  0.377335 -0.811757 -1.625339  1.935595 -1.635783 -0.197971  0.821073  1.490920  0.165050  1.353740  1.401306  2.303133 -0.531407  1.161280  0.430977 -0.344598 -0.045762 -1.575037 -0.066762  1.723299  0.478923  1.922584  0.701813  0.312897  0.134588  1.031139  0.164750 -1.103339 -1.569718  0.411553 -0.592738 -1.515197 -2.767313 -1.293968 -0.460595 -0.512960 -0.985016 -1.433908 -0.116418  0.518241 -2.162348 -0.056288 -0.732991 -0.148847 -0.553713  1.583351 -2.165025  1.006617 -1.449021 -0.293910  1.405685  1.267641  0.924248  0.550291 -1.313447 -0.503694  0.024458  0.912898  1.298810  0.869915 -0.397105  0.973694 -0.293642 -0.580548  0.932416 -2.108671 -0.143023  1.116756  0.804878 -1.930016  0.921108  0.574931 -0.234913  0.098448 -0.912042  0.411021 -1.173502  1.647756  0.590936  1.181889 -0.105523  0.325184 -0.145099 -0.003637  0.079033 -0.504248 -0.767999  0.323732  1.515833 -2.411234 -1.132179 -1.619978  0.735111  0.644118  1.450703  0.254557 -0.192990 -0.405021  0.531369  0.253399 -0.182248 -0.424743  0.100196  0.103259  0.889191  1.233283  1.204769  3.173801  0.138994 -0.448429  2.016135 -0.395702 -1.671610  0.452191  0.079507  
0.922235  2.345280 -0.273261 -0.374734 -0.072647  0.982375  0.313425  1.451827 -0.048904 -0.729561  3.021303  0.635819  0.298551  0.637383  0.184186 -0.496591 -0.374360  2.396369  0.629135 -0.939463  0.089835 -0.417336 -1.209866  0.800044 -0.207582 -0.733470  0.093954  1.536019 -0.234691  0.988711 
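One quick way to quantify how far apart the two outputs are is cosine similarity between the un-normalized vectors. A small sketch assuming the vectors printed above are pasted into numpy arrays (only the first few values are shown here for brevity):

import numpy as np

# Paste the full 384-dim vectors from the outputs above; truncated here.
st_vec = np.array([0.470217347, -0.0818190426, -0.970213771])   # sentence_transformers
llama_vec = np.array([0.568249, -1.102865, -1.088414])          # llama-embedding

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A value close to 1.0 means the two implementations agree up to scale;
# a noticeably lower value points at a real graph / tokenizer difference.
print(cosine_similarity(st_vec, llama_vec))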

Comment on lines +27 to +34
LLM_TYPE_47M,
LLM_TYPE_60M,
LLM_TYPE_70M,
LLM_TYPE_80M,
LLM_TYPE_109M,
LLM_TYPE_137M,
LLM_TYPE_140M,
LLM_TYPE_149M,
Review comment (Collaborator):

Add the descriptions too:

case LLM_TYPE_47M: return "47M";
case LLM_TYPE_60M: return "60M";
case LLM_TYPE_70M: return "70M";
case LLM_TYPE_80M: return "80M";
case LLM_TYPE_109M: return "109M";
case LLM_TYPE_137M: return "137M";
case LLM_TYPE_140M: return "140M";
case LLM_TYPE_149M: return "149M";

Comment on lines +8979 to +8987
# rename custom "head" layers to standard bert "cls.predictions" names for compatibility
if name == "head.norm.weight":
    name = "cls.predictions.transform.LayerNorm.weight"
elif name == "head.norm.bias":
    name = "cls.predictions.transform.LayerNorm.bias"
elif name == "head.dense.weight":
    name = "cls.predictions.transform.dense.weight"
elif name == "head.dense.bias":
    name = "cls.predictions.transform.dense.bias"
Review comment (Collaborator):

You forgot to commit the mapping?

Reply (Contributor, Author):

Originally I had added support for granite small embedding and it was using the ModernBERT arch under the hood.

@ryan-mangeno (Contributor, Author) commented:

> I'm still seeing pretty substantial differences between running this with sentence_transformers and llama-embedding (un-normalized). […]

Yeah, I was getting differences too, but I wasn't sure if they can be attributed to this line in the graph build:

 auto * inp_attn = build_attn_inp_kv_iswa(); // TODO: support cacheless iSWA embeddings [TAG_NO_CACHE_ISWA]
