OneZero-Y
Contributor

What type of PR is this?
Feature: adds support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M)

What this PR does / why we need it:

1. Qwen3-Embedding-0.6B

Model Specifications:

  • Architecture: 28 layers, 1024 hidden size, 16 attention heads, 8 KV heads (GQA)
  • Context Length: 32,768 tokens
  • Embedding Dimensions: [1024, 512, 256, 128] (Matryoshka)
  • Pooling Strategy: Last-token pooling with L2 normalization
  • Activation: SwiGLU
  • Position Encoding: RoPE (θ=1,000,000)

Key Features:

  • Grouped Query Attention (GQA) for efficient long-context processing
  • Left-padding strategy optimized for last-token pooling
  • High-quality embeddings for retrieval and semantic search
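
Last-token pooling combined with left padding can be illustrated with a small sketch. This is not the PR's Rust code: `last_token_pool` and `l2_normalize` are hypothetical helper names, and plain Python lists stand in for tensors. With left padding the final position always holds a real token, so the pooled vector is simply the last hidden state.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def last_token_pool(hidden_states, attention_mask):
    """Pool a [seq_len][hidden] sequence into one unit-length vector.

    attention_mask is a list of 0/1 flags, 1 marking real tokens.
    """
    # Left-padded input: the last position is guaranteed to be a real token.
    if attention_mask[-1] == 1:
        return l2_normalize(hidden_states[-1])
    # Fallback for right-padded input: find the last real token explicitly.
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return l2_normalize(hidden_states[last])


# Left-padded toy sequence: one pad position, then two real tokens.
pooled = last_token_pool([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]], [0, 1, 1])
```

The fallback branch shows why left padding is convenient: without it, every sequence in a batch needs its own index lookup.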

2. EmbeddingGemma-300M

Model Specifications:

  • Architecture: 14 layers, 768 hidden size, 3 query heads, 1 KV head (MQA)
  • Context Length: 8,192 tokens
  • Embedding Dimensions: [768, 512, 256, 128] (Matryoshka)
  • Pooling Strategy: Mean pooling with L2 normalization
  • Dense Bottleneck: 768 → 3072 → 768 for quality enhancement
  • Activation: GeGLU
  • Attention: Hybrid sliding window (4K) + full attention
  • Position Encoding: Global RoPE (θ=1,000,000) + Local RoPE (θ=10,000)

Key Features:

  • Multi-Query Attention (MQA) for low-latency inference
  • Dense bottleneck layer for improved embedding quality
  • Right-padding strategy optimized for mean pooling
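
Mean pooling has to ignore padded positions, which is where the attention mask comes in. The sketch below is illustrative only: `mean_pool` is a hypothetical name, plain Python lists stand in for tensors, and the dense bottleneck is left out.

```python
import math


def mean_pool(hidden_states, attention_mask):
    """Average the hidden states of real tokens, then L2-normalize.

    hidden_states: [seq_len][hidden]; attention_mask: 0/1 flags per position.
    """
    hidden = len(hidden_states[0])
    sums = [0.0] * hidden
    count = 0
    for vec, flag in zip(hidden_states, attention_mask):
        if flag:  # skip padded positions so they do not dilute the mean
            count += 1
            for j, x in enumerate(vec):
                sums[j] += x
    if count == 0:
        raise ValueError("attention mask contains no real tokens")
    mean = [s / count for s in sums]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Right-padded toy sequence: two real tokens followed by one pad position.
pooled = mean_pool([[2.0, 0.0], [4.0, 8.0], [100.0, 100.0]], [1, 1, 0])
```

The pad position carries large values on purpose: the result is unaffected because the mask excludes it.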

3. Intelligent Routing System

Routing Logic:

| Condition | Selected Model | Rationale |
|---|---|---|
| quality_priority > 0.7 | Qwen3 | Higher quality (1024-dim, last-token pooling) |
| latency_priority > 0.7 | Gemma | Lower latency |
| Balanced (latency ≤ 0.7) | Qwen3 | Default to quality |
| 512 < seq_len ≤ 4096 + Balanced | Gemma | Optimized for medium sequences |
| seq_len > 4096 | Qwen3 | Long-context advantage (32K max) |

Features:

  • Sequence length awareness (0-512, 513-4096, 4097+)
  • Priority-based selection (quality vs latency trade-off)
  • Target dimension validation (supports Matryoshka truncation)
  • Fallback to quality-optimized model by default
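
The routing rules can be read as a small decision function. The sketch below is one possible interpretation, not the PR's implementation: the precedence order (long sequences first, then explicit priorities, then the balanced cases) is an assumption, because the table does not say which rule wins when several apply. The function name `select_model` and the 0.5 default priorities are also hypothetical.

```python
def select_model(seq_len, quality_priority=0.5, latency_priority=0.5):
    """Return "qwen3" or "gemma" for a request.

    Assumed precedence (not confirmed by the PR):
    long input > quality priority > latency priority > balanced rules.
    """
    if seq_len > 4096:
        return "qwen3"  # long-context advantage (32K max)
    if quality_priority > 0.7:
        return "qwen3"  # caller explicitly asks for quality
    if latency_priority > 0.7:
        return "gemma"  # caller explicitly asks for low latency
    if 512 < seq_len <= 4096:
        return "gemma"  # balanced request, medium-length sequence
    return "qwen3"      # balanced request, short sequence: default to quality


choice = select_model(seq_len=1000)
```

Under a different precedence the same inputs could route differently, so the ordering should be checked against `routing.rs` before relying on it.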

4. Enhanced API Endpoints

4.1 Embedding Generation API

Endpoint: POST /api/v1/embeddings
Request:

curl -X POST http://localhost:8080/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Hello world", "How are you?"],
    "model": "auto",
    "dimension": 768,
    "quality_priority": 0.5,
    "latency_priority": 0.3
  }'

Response:

{"embeddings":[
  {"text":"Hello world","embedding":[-0.0011776701,0.007934736,-0.011614905, ..., 0.02845551,-0.05291444],"dimension":768,"model_used":"qwen3","processing_time_ms":314},
  {"text":"How are you?","embedding":[-0.004742121,-0.028285975,-0.0072721327, ..., 0.0362488,-0.032540787],"dimension":768,"model_used":"qwen3","processing_time_ms":262}
],"total_count":2,"total_processing_time_ms":576,"avg_processing_time_ms":288}
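
Matryoshka-style embeddings are trained so that a leading slice of the vector stays usable on its own, which is what makes a `dimension` request parameter possible. A minimal sketch of that truncation, assuming the slice is re-normalized to unit length afterward (whether the server does exactly this is not shown in this PR description; `truncate_embedding` is a hypothetical name):

```python
import math


def truncate_embedding(embedding, target_dim):
    """Keep the first target_dim components and restore unit L2 norm."""
    if not 0 < target_dim <= len(embedding):
        raise ValueError("target_dim must be between 1 and len(embedding)")
    head = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Re-normalize so cosine similarity on the shorter vector stays well-scaled.
    return [x / norm for x in head] if norm > 0 else head


# Toy 3-dimensional vector truncated to 2 dimensions.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
```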

4.2 Cosine Similarity Calculation API

Endpoint: POST /api/v1/similarity

Request:

curl -X POST http://localhost:8080/api/v1/similarity \
  -H "Content-Type: application/json" \
  -d '{
    "text1": "Hello world",
    "text2": "Hi there",
    "model": "auto",
    "dimension": 768
  }'

Response:

{"model_used":"qwen3","similarity":0.74183154,"processing_time_ms":30.0131}
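
For reference, the `similarity` field is a cosine similarity between the two embeddings. A minimal sketch of that formula on toy vectors (the function name `cosine_similarity` is illustrative and the vectors are made up, not taken from the models):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

When both embeddings are already L2-normalized, as both models' pooling strategies specify, the denominator is 1 and the similarity reduces to a dot product.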

4.3 Batch Similarity Matching API

Endpoint: POST /api/v1/similarity/batch

Request:

curl -X POST http://localhost:8080/api/v1/similarity/batch \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "candidates": ["artificial intelligence", "cooking recipes", "deep learning", "gardening tips"],
    "top_k": 2,
    "model": "auto",
    "dimension": 768
  }'

Response:

{"matches":[{"index":2,"similarity":0.9605684,"text":"deep learning"},{"index":0,"similarity":0.9055326,"text":"artificial intelligence"}],"total_candidates":4,"model_used":"gemma","processing_time_ms":356.3315}
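
The batch endpoint's response shape (index, similarity, and text for the best `top_k` candidates) can be mimicked with a short ranking routine. This is a sketch under assumptions: the embeddings are toy 2-dimensional vectors invented for illustration, and `top_k_matches` is a hypothetical name rather than a function from the PR.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_matches(query_vec, candidates, top_k):
    """Rank (text, vector) candidates by similarity to the query.

    Returns up to top_k dicts shaped like the entries in "matches",
    keeping each candidate's original position as "index".
    """
    scored = [
        {"index": i, "similarity": cosine(query_vec, vec), "text": text}
        for i, (text, vec) in enumerate(candidates)
    ]
    scored.sort(key=lambda match: match["similarity"], reverse=True)
    return scored[:top_k]


# Toy vectors, chosen only so that the ranking is easy to follow.
candidates = [
    ("artificial intelligence", [0.8, 0.6]),
    ("cooking recipes", [-1.0, 0.0]),
    ("deep learning", [1.0, 0.1]),
    ("gardening tips", [0.0, -1.0]),
]
matches = top_k_matches([1.0, 0.0], candidates, top_k=2)
```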

4.4 Embedding Models Information API

Endpoint: GET /api/v1/embeddings/models

Request:

curl -X GET http://localhost:8080/api/v1/embeddings/models

Response:

{"count":2,"models":[{"name":"qwen3_embedding_model","type":"embedding","loaded":true,"model_path":"models/Qwen3-Embedding-0.6B","metadata":{"default_dimension":"1024","matryoshka_supported":"true","max_sequence_length":"32768","model_type":"qwen3"}},{"name":"gemma_embedding_model","type":"embedding","loaded":true,"model_path":"models/embeddinggemma-300m","metadata":{"default_dimension":"768","matryoshka_supported":"true","max_sequence_length":"8192","model_type":"gemma"}}]}
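
Note that the `metadata` values in this response are strings, including the numeric and boolean ones. A client may want to convert them before use; the sketch below does that with the standard `json` module on the response shown above (`summarize_models` is a hypothetical helper name).

```python
import json

# The response body from the example above, copied verbatim.
RESPONSE = '{"count":2,"models":[{"name":"qwen3_embedding_model","type":"embedding","loaded":true,"model_path":"models/Qwen3-Embedding-0.6B","metadata":{"default_dimension":"1024","matryoshka_supported":"true","max_sequence_length":"32768","model_type":"qwen3"}},{"name":"gemma_embedding_model","type":"embedding","loaded":true,"model_path":"models/embeddinggemma-300m","metadata":{"default_dimension":"768","matryoshka_supported":"true","max_sequence_length":"8192","model_type":"gemma"}}]}'


def summarize_models(raw):
    """Map model_type to typed metadata parsed from the string-valued fields."""
    payload = json.loads(raw)
    summary = {}
    for model in payload["models"]:
        meta = model["metadata"]
        summary[meta["model_type"]] = {
            "loaded": model["loaded"],
            "default_dimension": int(meta["default_dimension"]),
            "max_sequence_length": int(meta["max_sequence_length"]),
            "matryoshka_supported": meta["matryoshka_supported"] == "true",
        }
    return summary


models = summarize_models(RESPONSE)
```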

Configuration

config.yaml

semantic_router:
  models:
    qwen3_embedding:
      path: "models/Qwen3-Embedding-0.6B"
    gemma_embedding:
      path: "models/embeddinggemma-300m"

Which issue(s) this PR fixes:

part of #266

Release Notes: Yes/No

feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M)

Signed-off-by: OneZero-Y <[email protected]>

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/ffi/embedding.rs
  • candle-binding/src/ffi/embedding_test.rs
  • candle-binding/src/model_architectures/embedding/dense_layers.rs
  • candle-binding/src/model_architectures/embedding/dense_layers_test.rs
  • candle-binding/src/model_architectures/embedding/gemma3_model.rs
  • candle-binding/src/model_architectures/embedding/gemma3_model_test.rs
  • candle-binding/src/model_architectures/embedding/gemma_embedding.rs
  • candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs
  • candle-binding/src/model_architectures/embedding/mod.rs
  • candle-binding/src/model_architectures/embedding/pooling.rs
  • candle-binding/src/model_architectures/embedding/pooling_test.rs
  • candle-binding/src/model_architectures/embedding/qwen3_embedding.rs
  • candle-binding/src/model_architectures/embedding/qwen3_embedding_test.rs
  • candle-binding/test_data/gemma_reference_outputs.json
  • candle-binding/test_data/qwen3_reference_outputs.json
  • candle-binding/Cargo.toml
  • candle-binding/semantic-router.go
  • candle-binding/semantic-router_test.go
  • candle-binding/src/classifiers/lora/mod.rs
  • candle-binding/src/classifiers/mod.rs
  • candle-binding/src/classifiers/traditional/mod.rs
  • candle-binding/src/classifiers/unified.rs
  • candle-binding/src/classifiers/unified_test.rs
  • candle-binding/src/core/config_loader.rs
  • candle-binding/src/core/mod.rs
  • candle-binding/src/core/tokenization.rs
  • candle-binding/src/core/unified_error.rs
  • candle-binding/src/ffi/mod.rs
  • candle-binding/src/ffi/similarity.rs
  • candle-binding/src/ffi/types.rs
  • candle-binding/src/model_architectures/config.rs
  • candle-binding/src/model_architectures/mod.rs
  • candle-binding/src/model_architectures/model_factory.rs
  • candle-binding/src/model_architectures/routing.rs
  • candle-binding/src/model_architectures/traditional/modernbert.rs
  • candle-binding/src/model_architectures/traits.rs
  • candle-binding/src/test_fixtures.rs
  • candle-binding/src/utils/memory.rs

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • scripts/generate_gemma_reference.py
  • scripts/generate_qwen3_reference.py

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/cmd/main.go
  • src/semantic-router/pkg/api/server.go
  • src/semantic-router/pkg/config/config.go

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/models.mk
  • tools/make/rust.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator

rootfs commented Oct 16, 2025

Test failure is fixed in main branch, merging it for testing.

@rootfs rootfs merged commit fa4f5c7 into vllm-project:feat-candle-refactoring Oct 16, 2025
3 of 4 checks passed
@OneZero-Y OneZero-Y deleted the feat/support-embedding-models-1 branch October 18, 2025 06:56