feat(backend): locate-anything-cpp (open-vocabulary object detection via ggml)#10264
Merged
…via la_capi) A Go/purego backend wrapping locate-anything.cpp's la_capi C ABI, implementing the gRPC Detect RPC: image + open-vocabulary text prompt -> labeled boxes. Mirrors backend/go/rfdetr-cpp; static-links ggml into a per-CPU-variant .so. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
detections := make([]*pb.Detection, 0, n)
for i := int32(0); i < n; i++ {
	var xyxy [4]float32 // x1, y1, x2, y2
	if CapiGetDetectionBox(r.handle, i, uintptr(unsafe.Pointer(&xyxy[0]))) != 0 {

need := CapiGetDetectionLabel(r.handle, i, 0, 0)
if need > 0 {
	buf := make([]byte, need)
	CapiGetDetectionLabel(r.handle, i, uintptr(unsafe.Pointer(&buf[0])), need)
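The two `CapiGetDetectionLabel` calls in the excerpt look like the common C convention of querying the required size first and then filling a caller-owned buffer. A minimal, self-contained sketch of that pattern follows. `getLabel` is a hypothetical stand-in written in Go, not the real `la_capi` function, and the assumption that the reported size includes a trailing NUL byte is mine rather than something the PR states.

```go
package main

import (
	"bytes"
	"fmt"
)

// getLabel is a hypothetical stand-in for a C-style getter. With a nil
// buffer it only reports how many bytes are needed (assumed here to include
// a trailing NUL); otherwise it copies as much of the label as fits.
func getLabel(label string, buf []byte) int32 {
	need := int32(len(label) + 1)
	if buf == nil {
		return need
	}
	n := copy(buf, label)
	if n < len(buf) {
		buf[n] = 0 // NUL-terminate when there is room
	}
	return need
}

// readLabel performs the two-call pattern: query the size, allocate,
// fill, then trim at the first NUL before converting to a Go string.
func readLabel(label string) string {
	need := getLabel(label, nil)
	if need <= 0 {
		return ""
	}
	buf := make([]byte, need)
	getLabel(label, buf)
	if i := bytes.IndexByte(buf, 0); i >= 0 {
		buf = buf[:i]
	}
	return string(buf)
}

func main() {
	fmt.Println(readLabel("person"))
}
```

Trimming at the first NUL keeps the resulting Go string clean whether or not the size returned by the C side counts the terminator.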
733b646 to 188cad1
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io>
…3B + image, runs Detect) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io>
What
Adds a new LocalAI backend `locate-anything-cpp` for open-vocabulary object detection (visual grounding): given an image and a free-text prompt, it returns labeled boxes. It wraps locate-anything.cpp - a C++/ggml port of NVIDIA's `LocateAnything-3B` - via its flat C ABI.

Same family as the existing detection backends: it mirrors `backend/go/rfdetr-cpp/` almost exactly (Go + purego `Dlopen`, no cgo; a CMake `MODULE` lib static-links ggml into a per-CPU-variant `.so`) and implements the gRPC `Detect` RPC. Open-vocab means it consumes `DetectOptions.prompt` (the same field SAM3 uses).

How it maps
`Detect(DetectOptions{Src, Prompt})` -> base64 image to temp file -> `la_capi_locate_path(ctx, img, prompt, hybrid)` -> `DetectResponse{ Detection{X, Y, Width, Height, Confidence=1.0, ClassName=label} }`.

- A prompt is required (open-vocabulary).
- `la_capi` exposes a label per box (no class-id/score/mask), so `Confidence` is reported as `1.0` (greedy detection is deterministic).
- `.so` only depends on libc/libstdc++/libgomp (bundled by the package step), like rfdetr-cpp.

Changes
- `backend/go/locate-anything-cpp/` - the backend (main.go, golocateanythingcpp.go, CMakeLists.txt, Makefile, run.sh, package.sh, + Load/Detect wire test).
- `.github/backend-matrix.yml` - 9 build entries mirroring rfdetr-cpp (cuda-12/13, l4t-arm64, cpu, intel-sycl f16/f32, vulkan).
- `backend/index.yaml` + `core/gallery/importers/locate-anything.go` (+ test) - gallery entry + model importer (routes locate-anything GGUFs to this backend).
- `Makefile` - wired into `test-extra`.

Validation
Built locally (4 `.so` variants + the Go binary, `CGO_ENABLED=0`, `go vet`/`gofmt` clean). The `Detect` wire test passes end-to-end on the real model: gRPC `LoadModel` (q8_0 GGUF) -> `Detect` on a COCO street image with prompt `person</c>car` -> labeled boxes (`TestDetect` PASS, ~50s on CPU). Importer tests pass (`go test ./core/gallery/importers/...`).

Follow-ups
- `bump_deps.yaml`: the daily auto-bump registration for `LOCATEANYTHING_VERSION` is not in this PR - the pushing token lacked `workflow` scope. It should be added to `.github/workflows/bump_deps.yaml`.
- Commits carry an `Assisted-by:` trailer per the AI-assistance policy; they need the human submitter's `Signed-off-by` to pass the DCO check.

AI-assisted (Claude Code, Opus 4.8); the human submitter owns, reviews, and is responsible for the change.
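As a rough illustration of the last step under "How it maps", the sketch below turns one corner-format box plus its label into a detection record with a fixed confidence. The `Detection` struct is a local stand-in for the generated `pb.Detection` type, `fromXYXY` is a hypothetical helper name, and treating `Width`/`Height` as `x2 - x1` and `y2 - y1` is an assumption based on the `x1, y1, x2, y2` comment in the reviewed code, not a confirmed detail of the backend.

```go
package main

import "fmt"

// Detection mirrors the fields listed in the PR description. It is a local
// stand-in for the generated pb.Detection type, not the real one.
type Detection struct {
	X, Y, Width, Height float32
	Confidence          float32
	ClassName           string
}

// fromXYXY converts a corner-format box (x1, y1, x2, y2) into origin plus
// size form and attaches the label. Confidence is fixed at 1.0 because the
// C API, per the description, exposes no per-box score.
func fromXYXY(xyxy [4]float32, label string) Detection {
	return Detection{
		X:          xyxy[0],
		Y:          xyxy[1],
		Width:      xyxy[2] - xyxy[0],
		Height:     xyxy[3] - xyxy[1],
		Confidence: 1.0,
		ClassName:  label,
	}
}

func main() {
	d := fromXYXY([4]float32{10, 20, 110, 70}, "person")
	fmt.Printf("%+v\n", d)
}
```

Because every box reports the same confidence, callers that filter or sort detections by score would see no effect with this backend.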