feat:support for two long-context embedding models (Qwen3 and Gemma) #453

OneZero-Y · 2025-10-16T12:25:52Z

What type of PR is this?
Feature: adds support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M).

What this PR does / why we need it:

1. Qwen3-Embedding-0.6B

Model Specifications:

Architecture: 28 layers, 1024 hidden size, 16 attention heads, 8 KV heads (GQA)
Context Length: 32,768 tokens
Embedding Dimensions: [1024, 512, 256, 128] (Matryoshka)
Pooling Strategy: Last-token pooling with L2 normalization
Activation: SwiGLU
Position Encoding: RoPE (θ=1,000,000)

Key Features:

Grouped Query Attention (GQA) for efficient long-context processing
Left-padding strategy optimized for last-token pooling
High-quality embeddings for retrieval and semantic search
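
To illustrate why left padding pairs well with last-token pooling, here is a minimal pure-Python sketch (not the candle-binding implementation). The function names and the toy tensors are hypothetical; only the pooling strategy, the L2 normalization, and the Matryoshka truncation idea come from the spec above.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token.

    hidden_states: per-token vectors, shape [seq_len][hidden].
    attention_mask: 0/1 flags, 1 marks a real token.
    With left padding the last real token always sits at index seq_len - 1.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]

def truncate_matryoshka(vec, dim):
    """Keep the first `dim` components, then re-normalize."""
    return l2_normalize(vec[:dim])

# Toy input: two left-padding tokens, two real tokens, hidden size 4.
hidden = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
          [1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 0.0, 0.0]]
mask = [0, 0, 1, 1]

embedding = l2_normalize(last_token_pool(hidden, mask))
short = truncate_matryoshka(embedding, 2)
```

Because the padding is on the left, every sequence in a batch has its final real token in the same position, so the pooling step reduces to reading the last row.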

2. EmbeddingGemma-300M

Model Specifications:

Architecture: 14 layers, 768 hidden size, 3 query heads, 1 KV head (MQA)
Context Length: 8,192 tokens
Embedding Dimensions: [768, 512, 256, 128] (Matryoshka)
Pooling Strategy: Mean pooling with L2 normalization
Dense Bottleneck: 768 → 3072 → 768 for quality enhancement
Activation: GeGLU
Attention: Hybrid sliding window (4K) + full attention
Position Encoding: Global RoPE (θ=1,000,000) + Local RoPE (θ=10,000)

Key Features:

Multi-Query Attention (MQA) for low-latency inference
Dense bottleneck layer for improved embedding quality
Right-padding strategy optimized for mean pooling
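
For comparison, a minimal sketch of masked mean pooling with right padding, again in pure Python and not taken from the PR's Rust code. The names and toy values are hypothetical, and the dense bottleneck (768 → 3072 → 768) is deliberately left out.

```python
import math

def mean_pool(hidden_states, attention_mask):
    """Average the hidden states of real tokens, ignoring padded positions."""
    real = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    if not real:
        raise ValueError("attention_mask marks no real tokens")
    hidden_size = len(real[0])
    return [sum(h[j] for h in real) / len(real) for j in range(hidden_size)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy input: two real tokens followed by one right-padding token, hidden size 2.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(hidden, mask)    # the padded row [9.0, 9.0] is excluded
embedding = l2_normalize(pooled)
```

The mask is what keeps padded positions from diluting the average; the padding side matters less here than it does for last-token pooling.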

3. Intelligent Routing System

Routing Logic:

| Condition | Selected Model | Rationale |
|---|---|---|
| `quality_priority > 0.7` | Qwen3 | Higher quality (1024-dim, last-token pooling) |
| `latency_priority > 0.7` | Gemma | Lower latency |
| Balanced (`latency ≤ 0.7`) | Qwen3 | Default to quality |
| `512 < seq_len ≤ 4096` + Balanced | Gemma | Optimized for medium sequences |
| `seq_len > 4096` | Qwen3 | Long-context advantage (32K max) |

Features:

Sequence length awareness (0-512, 513-4096, 4097+)
Priority-based selection (quality vs latency trade-off)
Target dimension validation (supports Matryoshka truncation)
Fallback to quality-optimized model by default
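
The routing rules can be sketched roughly as follows. This is an illustration, not the code in `routing.rs`. The table does not state which rule wins when several match, so the precedence below is an assumption (long sequences first, then quality, then latency, then the medium-length rule). The default priorities of 0.5 and the dimension check (`0 < dimension <= native size`) are also assumptions. The check is kept loose because the example response in section 4.1 returns 768-dim vectors from `qwen3`, so the listed Matryoshka sizes are apparently not the only accepted values.

```python
# Native embedding size per model, from the spec sections of this PR.
NATIVE_DIM = {"qwen3": 1024, "gemma": 768}

def select_model(seq_len, quality_priority=0.5, latency_priority=0.5):
    """Pick a model name using an ASSUMED precedence over the routing table."""
    if seq_len > 4096:
        return "qwen3"      # long-context advantage (32K max)
    if quality_priority > 0.7:
        return "qwen3"      # quality wins over latency if both are high (assumption)
    if latency_priority > 0.7:
        return "gemma"      # lower latency
    if 512 < seq_len <= 4096:
        return "gemma"      # balanced priorities, medium sequence
    return "qwen3"          # fallback to the quality-optimized model

def is_valid_dimension(model, dimension):
    """Hypothetical check: accept any target size up to the model's native size."""
    return 0 < dimension <= NATIVE_DIM[model]

choice = select_model(seq_len=12, quality_priority=0.5, latency_priority=0.3)
```

With the request values from section 4.1 (short texts, `quality_priority` 0.5, `latency_priority` 0.3) this sketch falls through to the default and selects `qwen3`, which matches the `model_used` field in that response.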

4. Enhanced API Endpoints

4.1 Embedding Generation API

Endpoint: POST /api/v1/embeddings
Request:

curl -X POST http://localhost:8080/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Hello world", "How are you?"],
    "model": "auto",
    "dimension": 768,
    "quality_priority": 0.5,
    "latency_priority": 0.3
  }'

Response:

{"embeddings":[
  {"text":"Hello world","embedding":[-0.0011776701,0.007934736,-0.011614905, ... ,-0.03838268,0.02845551,-0.05291444],"dimension":768,"model_used":"qwen3","processing_time_ms":314},
  {"text":"How are you?","embedding":[-0.004742121,-0.028285975,-0.0072721327, ... ,-0.0142641915,0.0362488,-0.032540787],"dimension":768,"model_used":"qwen3","processing_time_ms":262}
],"total_count":2,"total_processing_time_ms":576,"avg_processing_time_ms":288}

(Each `embedding` array holds 768 values; the middle values are elided here.)
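
A small client-side sketch for this endpoint, using only the Python standard library. The field names come from the request and response above; the helper names are hypothetical, and the `sample` body is a made-up 3-dimensional stand-in that only mirrors the response shape.

```python
import json
import urllib.request

def build_embedding_request(texts, model="auto", dimension=768,
                            quality_priority=0.5, latency_priority=0.3):
    """Serialize a request body using the field names from the curl example."""
    return json.dumps({
        "texts": texts,
        "model": model,
        "dimension": dimension,
        "quality_priority": quality_priority,
        "latency_priority": latency_priority,
    })

def post_embeddings(base_url, payload):
    """POST the payload to /api/v1/embeddings and return the decoded JSON body."""
    req = urllib.request.Request(
        base_url + "/api/v1/embeddings",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def check_embedding_response(body):
    """Check vector lengths and the item count; return (text, model_used) pairs."""
    for item in body["embeddings"]:
        if len(item["embedding"]) != item["dimension"]:
            raise ValueError("embedding length does not match 'dimension'")
    if body["total_count"] != len(body["embeddings"]):
        raise ValueError("'total_count' does not match the number of embeddings")
    return [(item["text"], item["model_used"]) for item in body["embeddings"]]

payload = build_embedding_request(["Hello world", "How are you?"])

# Made-up response with the same shape as the real one, shortened to 3 dimensions.
sample = {
    "embeddings": [
        {"text": "Hello world", "embedding": [0.1, 0.2, 0.3],
         "dimension": 3, "model_used": "qwen3", "processing_time_ms": 1},
    ],
    "total_count": 1,
}
summary = check_embedding_response(sample)
```

Against a running server, `check_embedding_response(post_embeddings("http://localhost:8080", payload))` would apply the same checks to a live response.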

4.2 Cosine Similarity Calculation API

Endpoint: POST /api/v1/similarity

Request:

curl -X POST http://localhost:8080/api/v1/similarity \
  -H "Content-Type: application/json" \
  -d '{
    "text1": "Hello world",
    "text2": "Hi there",
    "model": "auto",
    "dimension": 768
  }'

Response:

{"model_used":"qwen3","similarity":0.74183154,"processing_time_ms":30.0131}
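
For reference, the `similarity` value is a cosine similarity between the two embeddings. A minimal sketch of that formula in pure Python follows; the function name and the zero-vector handling are assumptions, not the PR's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

When both vectors are already L2-normalized, as the pooling strategies in this PR produce, the cosine similarity reduces to a plain dot product.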

4.3 Batch Similarity Matching API

Endpoint: POST /api/v1/similarity/batch

Request:

curl -X POST http://localhost:8080/api/v1/similarity/batch \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "candidates": ["artificial intelligence", "cooking recipes", "deep learning", "gardening tips"],
    "top_k": 2,
    "model": "auto",
    "dimension": 768
  }'

Response:

{"matches":[{"index":2,"similarity":0.9605684,"text":"deep learning"},{"index":0,"similarity":0.9055326,"text":"artificial intelligence"}],"total_candidates":4,"model_used":"gemma","processing_time_ms":356.3315}
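
The ranking step behind this endpoint can be sketched as: score every candidate against the query, sort in descending order, keep `top_k`. The output keys mirror the `matches` entries above; the function names and the 2-dimensional toy vectors are made up for illustration and are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_matches(query_vec, candidates, top_k):
    """Rank (text, vector) candidates by similarity to the query, best first."""
    scored = [
        {"index": i, "similarity": cosine_similarity(query_vec, vec), "text": text}
        for i, (text, vec) in enumerate(candidates)
    ]
    scored.sort(key=lambda match: match["similarity"], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings.
query = [1.0, 0.0]
candidates = [
    ("artificial intelligence", [0.9, 0.1]),
    ("cooking recipes", [0.0, 1.0]),
    ("deep learning", [1.0, 0.05]),
    ("gardening tips", [-0.2, 1.0]),
]
matches = top_k_matches(query, candidates, top_k=2)
```

Each match keeps the candidate's original `index`, so the caller can map results back to the input list even after sorting.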

4.4 Embedding Models Information API

Endpoint: GET /api/v1/embeddings/models

Request:

curl -X GET http://localhost:8080/api/v1/embeddings/models

Response:

{"count":2,"models":[{"name":"qwen3_embedding_model","type":"embedding","loaded":true,"model_path":"models/Qwen3-Embedding-0.6B","metadata":{"default_dimension":"1024","matryoshka_supported":"true","max_sequence_length":"32768","model_type":"qwen3"}},{"name":"gemma_embedding_model","type":"embedding","loaded":true,"model_path":"models/embeddinggemma-300m","metadata":{"default_dimension":"768","matryoshka_supported":"true","max_sequence_length":"8192","model_type":"gemma"}}]}
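
Note that the `metadata` values in this response are JSON strings (`"1024"`, `"true"`), not numbers or booleans. A minimal sketch of converting them on the client side, using the response above verbatim; the `parse_models` helper and its output layout are hypothetical.

```python
import json

RESPONSE = (
    '{"count":2,"models":[{"name":"qwen3_embedding_model","type":"embedding",'
    '"loaded":true,"model_path":"models/Qwen3-Embedding-0.6B","metadata":'
    '{"default_dimension":"1024","matryoshka_supported":"true",'
    '"max_sequence_length":"32768","model_type":"qwen3"}},'
    '{"name":"gemma_embedding_model","type":"embedding","loaded":true,'
    '"model_path":"models/embeddinggemma-300m","metadata":'
    '{"default_dimension":"768","matryoshka_supported":"true",'
    '"max_sequence_length":"8192","model_type":"gemma"}}]}'
)

def parse_models(body):
    """Index models by model_type and convert string metadata to typed values."""
    data = json.loads(body)
    models = {}
    for model in data["models"]:
        meta = model["metadata"]
        models[meta["model_type"]] = {
            "name": model["name"],
            "loaded": model["loaded"],
            "default_dimension": int(meta["default_dimension"]),
            "max_sequence_length": int(meta["max_sequence_length"]),
            "matryoshka_supported": meta["matryoshka_supported"] == "true",
        }
    return models

models = parse_models(RESPONSE)
```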

Configuration

config.yaml

semantic_router:
  models:
    qwen3_embedding:
      path: "models/Qwen3-Embedding-0.6B"
    gemma_embedding:
      path: "models/embeddinggemma-300m"

Which issue(s) this PR fixes:

part of #266

Release Notes: Yes/No

….6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]> feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]>

github-actions · 2025-10-16T12:26:10Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `candle-binding`

Owners: @rootfs
Files changed:

candle-binding/src/ffi/embedding.rs
candle-binding/src/ffi/embedding_test.rs
candle-binding/src/model_architectures/embedding/dense_layers.rs
candle-binding/src/model_architectures/embedding/dense_layers_test.rs
candle-binding/src/model_architectures/embedding/gemma3_model.rs
candle-binding/src/model_architectures/embedding/gemma3_model_test.rs
candle-binding/src/model_architectures/embedding/gemma_embedding.rs
candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs
candle-binding/src/model_architectures/embedding/mod.rs
candle-binding/src/model_architectures/embedding/pooling.rs
candle-binding/src/model_architectures/embedding/pooling_test.rs
candle-binding/src/model_architectures/embedding/qwen3_embedding.rs
candle-binding/src/model_architectures/embedding/qwen3_embedding_test.rs
candle-binding/test_data/gemma_reference_outputs.json
candle-binding/test_data/qwen3_reference_outputs.json
candle-binding/Cargo.toml
candle-binding/semantic-router.go
candle-binding/semantic-router_test.go
candle-binding/src/classifiers/lora/mod.rs
candle-binding/src/classifiers/mod.rs
candle-binding/src/classifiers/traditional/mod.rs
candle-binding/src/classifiers/unified.rs
candle-binding/src/classifiers/unified_test.rs
candle-binding/src/core/config_loader.rs
candle-binding/src/core/mod.rs
candle-binding/src/core/tokenization.rs
candle-binding/src/core/unified_error.rs
candle-binding/src/ffi/mod.rs
candle-binding/src/ffi/similarity.rs
candle-binding/src/ffi/types.rs
candle-binding/src/model_architectures/config.rs
candle-binding/src/model_architectures/mod.rs
candle-binding/src/model_architectures/model_factory.rs
candle-binding/src/model_architectures/routing.rs
candle-binding/src/model_architectures/traditional/modernbert.rs
candle-binding/src/model_architectures/traits.rs
candle-binding/src/test_fixtures.rs
candle-binding/src/utils/memory.rs

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

scripts/generate_gemma_reference.py
scripts/generate_qwen3_reference.py

📁 `config`

Owners: @rootfs
Files changed:

config/config.yaml

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/cmd/main.go
src/semantic-router/pkg/api/server.go
src/semantic-router/pkg/config/config.go

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/make/models.mk
tools/make/rust.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-10-16T12:30:35Z

The test failure is fixed on the main branch; merging this for testing.

….6B and EmbeddingGemma-300M) (vllm-project#453) feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]>

….6B and EmbeddingGemma-300M) (vllm-project#453) feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

* refactor: Implement modular candle-binding architecture (#254) - Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/) - Add unified error handling and configuration loading systems - Implement dual-path architecture for traditional and LoRA models - Add comprehensive FFI layer with memory safety Maintains backward compatibility while enabling future model integrations. refactor: Implement modular candle-binding architecture - Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/) - Add unified error handling and configuration loading systems - Implement dual-path architecture for traditional and LoRA models - Add comprehensive FFI layer with memory safety Maintains backward compatibility while enabling future model integrations. Signed-off-by: OneZero-Y <[email protected]> * feat:unit tests for candle refactoring (#296) feat:unit tests for candle refactoring feat:unit tests for candle refactoring Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (#453) feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix:Improve rust unit test and optimize concurrent tests with rayon (#471) - Add 6 new unit test files - Replace std::thread::spawn with rayon::par_iter Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix: resolve syntax errors after rebase Signed-off-by: Huamin Chen <[email protected]> * add additional update Signed-off-by: Huamin Chen <[email protected]> * Change label count params to c_int (#494) 
Signed-off-by: carlory <[email protected]>

* update embedding setting in config (#489)

  Signed-off-by: Huamin Chen <[email protected]>

* make CUDA and Flash Attention 2 optional features (#511)

  Signed-off-by: OneZero-Y <[email protected]>

* fix: Fix duplicate UNIFIED_CLASSIFIER definition and optimize lock contention (#516)

  - Remove duplicate UNIFIED_CLASSIFIER global state
  - Optimize PARALLEL_LORA_ENGINE lock contention by using Arc clone

  Signed-off-by: OneZero-Y <[email protected]>

* Merge main to candle refactoring (#523)

  * Update test description from Math to General (#483)

    Signed-off-by: carlory <[email protected]>

  * feat: add HuggingChat support (#477)

    * add chat ui to dashboard and docker compose & refactor dashboard/backend/

      Signed-off-by: JaredforReal <[email protected]>

    * try fix network error

      Signed-off-by: JaredforReal <[email protected]>

    * more

    ---------

    Signed-off-by: JaredforReal <[email protected]>
    Co-authored-by: bitliu <[email protected]>

  * project: 2025 Q4 roadmap (#487)

    * project: q4 roadmap
    * project: q4 roadmap
    * project: q4 roadmap
    * more
    * more
    * more
    * more

  * feat: add shelleck precommit hook (#488)

    * feat: add shelleck precommit hook

      Signed-off-by: yuluo-yx <[email protected]>

    * feat: add shelleck precommit hook

      Signed-off-by: yuluo-yx <[email protected]>

    * feat: add shelleck precommit hook

      Signed-off-by: yuluo-yx <[email protected]>

    ---------

    Signed-off-by: yuluo-yx <[email protected]>

  * project: add q4 roadmap news (#495)

  * fix missing shellcheck in pre-commit image (#497)

    Signed-off-by: carlory <[email protected]>

  * infra: update tools (#501)

    Signed-off-by: yuluo-yx <[email protected]>

  * feat(demo): enhance OpenShift demo scripts with improved UX (#478)

    - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
    - Add new "Classification Examples" option calling curl-examples.sh
    - Update reasoning examples to avoid cache hits from previous tests
    - Remove benign examples from PII and Jailbreak tests (show only attacks)
    - Enhance live-semantic-router-logs.sh with better color visibility:
      - Fix duplicate "WITH SCORE" text in classification output
      - Fix CACHE HIT background color extending over timestamp
      - Distinguish reasoning enabled vs disabled messages
      - Remove redundant "(standard routing)" text
      - Add background colors for Model-A/Model-B routing display

    These improvements make the live demo clearer and more impactful for presentations and demonstrations.

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Signed-off-by: Yossi Ovadia <[email protected]>
    Co-authored-by: Claude <[email protected]>

  * fix: fix precommit Argument list too long error (#502)

    Signed-off-by: yuluo-yx <[email protected]>

  * feat: enforce milvus dial timeout if set (#503)

    Signed-off-by: cryo <[email protected]>

  * Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

    * Initial plan

    * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

      Co-authored-by: rootfs <[email protected]>

    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: rootfs <[email protected]>

  * Allow semantic cache similarity threshold to be set at the category level (#493)

    * Initial plan

    * Add category-level cache settings: enabled and similarity_threshold

      Co-authored-by: rootfs <[email protected]>

    * Add comprehensive tests for category-level cache settings

      Co-authored-by: rootfs <[email protected]>

    * Update config files and documentation for category-level cache settings

      - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
      - Added comprehensive documentation section explaining category-level cache configuration
      - Updated semantic cache overview and in-memory cache docs with category-level examples
      - Added best practices for threshold selection and privacy considerations

      Co-authored-by: rootfs <[email protected]>

    * Remove duplicate code in FindSimilar functions

      Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

      Co-authored-by: rootfs <[email protected]>

    * Update src/semantic-router/pkg/extproc/request_handler.go

      Co-authored-by: Copilot <[email protected]>

    * Revert changes from unsigned commit ae39fe2

      Restored the classificationText empty check that was removed in the previous commit.

      Co-authored-by: rootfs <[email protected]>

    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: rootfs <[email protected]>
    Co-authored-by: Huamin Chen <[email protected]>
    Co-authored-by: Copilot <[email protected]>

  * Allow jailbreak detection and threshold to be configured at the category level (#508)

    * Initial plan

    * Add category-level jailbreak detection configuration

      Co-authored-by: Xunzhuo <[email protected]>

    * Add documentation for category-level jailbreak settings

      Co-authored-by: Xunzhuo <[email protected]>

    * Update documentation for category-level jailbreak detection

      - Add category-level jailbreak configuration to jailbreak-protection.md
      - Update category configuration docs with jailbreak_enabled parameter
      - Add security-focused configuration example
      - Update global configuration docs with category override notes
      - Update README to mention fine-grained security control

      Co-authored-by: Xunzhuo <[email protected]>

    * Add category-level jailbreak threshold configuration

      - Add JailbreakThreshold field to Category struct
      - Add GetJailbreakThresholdForCategory helper method
      - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
      - Update performSecurityChecks to use category-specific threshold
      - Add 5 comprehensive tests for threshold configuration
      - Update example configs with threshold tuning examples
      - Update documentation with threshold configuration and tuning guidelines
      - Add threshold tuning guide with recommendations for different category types

      Co-authored-by: Xunzhuo <[email protected]>

    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: Xunzhuo <[email protected]>

  * Allow PII detection threshold to be set at the category level (#510)

    * Initial plan

    * Add category-level PII threshold support

      Co-authored-by: Xunzhuo <[email protected]>

    * Update documentation with API integration notes

      Co-authored-by: Xunzhuo <[email protected]>

    * Fix markdown linting issues

      Co-authored-by: Xunzhuo <[email protected]>

    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: Xunzhuo <[email protected]>

  * Fix: The caller information points to the wrapper function instead of the actual call location (#518)

    Signed-off-by: carlory <[email protected]>

  * feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

    * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

      Signed-off-by: Huamin Chen <[email protected]>

    * chore: run go mod tidy to clean up module dependencies

      Signed-off-by: Huamin Chen <[email protected]>

    * conditionally build candle cuda support

      Signed-off-by: Huamin Chen <[email protected]>

    * rebuild index upon restart

      Signed-off-by: Huamin Chen <[email protected]>

    * precommit fix

      Signed-off-by: Huamin Chen <[email protected]>

    * fix precommit

      Signed-off-by: Huamin Chen <[email protected]>

    * fix precommit

      Signed-off-by: Huamin Chen <[email protected]>

    * fix precommit

      Signed-off-by: Huamin Chen <[email protected]>

    * disable cuda build on ci

      Signed-off-by: Huamin Chen <[email protected]>

    * review feedback

      Signed-off-by: Huamin Chen <[email protected]>

    * review feedback

      Signed-off-by: Huamin Chen <[email protected]>

    * review feedback

      Signed-off-by: Huamin Chen <[email protected]>

    * review feedback

      Signed-off-by: Huamin Chen <[email protected]>

    ---------

    Signed-off-by: Huamin Chen <[email protected]>

  ---------

  Signed-off-by: carlory <[email protected]>
  Signed-off-by: JaredforReal <[email protected]>
  Signed-off-by: yuluo-yx <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>
  Signed-off-by: cryo <[email protected]>
  Signed-off-by: Huamin Chen <[email protected]>
  Co-authored-by: 杨朱 · Kiki <[email protected]>
  Co-authored-by: Jared <[email protected]>
  Co-authored-by: bitliu <[email protected]>
  Co-authored-by: shown <[email protected]>
  Co-authored-by: Yossi Ovadia <[email protected]>
  Co-authored-by: Claude <[email protected]>
  Co-authored-by: cryo <[email protected]>
  Co-authored-by: Copilot <[email protected]>
  Co-authored-by: rootfs <[email protected]>
  Co-authored-by: Copilot <[email protected]>
  Co-authored-by: Xunzhuo <[email protected]>

* Candle refactoring to main (#524)

  (Same squashed commit list and trailers as #523: #483, #477, #487, #488, #495, #497, #501, #478, #502, #503, #506, #493, #508, #510, #518, #504.)

* Merge candle refactoring 3 (#525)

  (Same squashed commit list as #523, followed by one additional commit and the same trailers.)

  * merge main to feat branch

    Signed-off-by: Huamin Chen <[email protected]>

* chore: fix unit test (#527)

  * chore: fix unit test

    Signed-off-by: Huamin Chen <[email protected]>

  * fix go vet

    Signed-off-by: Huamin Chen <[email protected]>

  * fix ci

    Signed-off-by: Huamin Chen <[email protected]>

  * fix ci

    Signed-off-by: Huamin Chen <[email protected]>

  * split test-binding to two stages on ci

    Signed-off-by: Huamin Chen <[email protected]>

  * ignore test failure due to embeddinggemma restriction

    Signed-off-by: Huamin Chen <[email protected]>

  * reorder ci test sequences to avoid missing models

    Signed-off-by: Huamin Chen <[email protected]>

  ---------

  Signed-off-by: Huamin Chen <[email protected]>

* refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review (#528)

  * refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review #266 (comment)

    Signed-off-by: Huamin Chen <[email protected]>

  * update tests

    Signed-off-by: Huamin Chen <[email protected]>

  ---------

  Signed-off-by: Huamin Chen <[email protected]>

* chore: fix lint error (#530)

  Signed-off-by: Huamin Chen <[email protected]>

* Fix lint error2 (#531)

  * chore: fix lint error

    Signed-off-by: Huamin Chen <[email protected]>

  * chore: fix lint error

    Signed-off-by: Huamin Chen <[email protected]>

  ---------

  Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Co-authored-by: OneZero-Y <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

OneZero-Y requested review from Xunzhuo, rootfs and wangchen615 as code owners October 16, 2025 12:25

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Oct 16, 2025

rootfs merged commit fa4f5c7 into vllm-project:feat-candle-refactoring Oct 16, 2025
3 of 4 checks passed

OneZero-Y deleted the feat/support-embedding-models-1 branch October 18, 2025 06:56
