
feat: Replace mock LLM with local SmolLM2-135M inference#3014

Draft
ps48 wants to merge 2 commits into open-telemetry:main from ps48:feature/local-llm

Conversation


ps48 (Contributor) commented on Feb 19, 2026

Changes

Summary

  • Replace the mock LLM service with SmolLM2-135M running locally via llama-cpp-python, removing the need for an external API while still exposing an OpenAI-compatible /v1/chat/completions endpoint (a rough sketch of the new service follows this list)
  • Simplify product_reviews_server.py by dropping the custom tool-calling flow in favor of a direct chat completion with review context in the prompt
  • Add manual OpenTelemetry GenAI semantic-convention spans and metrics to the LLM service (token usage histogram, operation duration)
  • Pin OTel library versions in src/llm/requirements.txt to 1.39.1 / 0.60b1
  • Update comments in .env, .env.override, and src/product-reviews/README.md to reflect the new default model
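
For reviewers skimming without the diff, the new service looks roughly like the sketch below. This is a minimal illustration only, not the actual src/llm/app.py: the model path, span name, and metric/attribute names are my shorthand for the GenAI semantic conventions and may differ from what the service actually emits.

import time

from flask import Flask, jsonify, request
from llama_cpp import Llama
from opentelemetry import metrics, trace

app = Flask(__name__)
# Hypothetical GGUF path; the real service ships/downloads its own model file.
llm = Llama(model_path="models/smollm2-135m-instruct-q4_k_m.gguf", n_ctx=2048)

tracer = trace.get_tracer("llm")
meter = metrics.get_meter("llm")
token_usage = meter.create_histogram(
    "gen_ai.client.token.usage", unit="{token}",
    description="Input and output tokens per chat completion")
operation_duration = meter.create_histogram(
    "gen_ai.client.operation.duration", unit="s",
    description="Duration of the chat completion operation")

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    attrs = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": body.get("model", "SmolLM2-135M-Instruct"),
    }
    start = time.monotonic()
    # Manual GenAI span around the local inference call.
    with tracer.start_as_current_span("chat SmolLM2-135M-Instruct", attributes=attrs) as span:
        result = llm.create_chat_completion(
            messages=body["messages"],
            max_tokens=body.get("max_tokens", 256),
            temperature=body.get("temperature", 0.7),
        )
        usage = result.get("usage", {})
        span.set_attribute("gen_ai.usage.input_tokens", usage.get("prompt_tokens", 0))
        span.set_attribute("gen_ai.usage.output_tokens", usage.get("completion_tokens", 0))
        token_usage.record(usage.get("prompt_tokens", 0), {**attrs, "gen_ai.token.type": "input"})
        token_usage.record(usage.get("completion_tokens", 0), {**attrs, "gen_ai.token.type": "output"})
        operation_duration.record(time.monotonic() - start, attrs)
    # llama-cpp-python already returns an OpenAI-shaped response dict.
    return jsonify(result)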

Merge Requirements

For new feature contributions, please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed, ping @open-telemetry/demo-approvers.

Signed-off-by: ps48 <pshenoy36@gmail.com>
github-actions bot added the helm-update-required (Requires an update to the Helm chart when released) label on Feb 19, 2026
ps48 changed the title from "Replace mock llm service with a smollm" to "feat: Replace mock LLM with local SmolLM2-135M inference" on Feb 19, 2026
Signed-off-by: ps48 <pshenoy36@gmail.com>
kylehounslow commented

Real inference, nice! One thought on the instrumentation side:

The product_reviews_server.py already uses the openai Python client to talk to this service. Instead of hand-rolling the GenAI spans and metrics in the LLM server, we could instrument the client side with opentelemetry-instrumentation-openai-v2. That would:

  1. Dogfood OTel's own instrumentation library in OTel's own demo, which feels like the right place to showcase it
  2. Stay in sync with semconv changes as the library evolves, vs maintaining attribute names by hand
  3. Drop ~60 lines of manual OTel setup from src/llm/app.py (TracerProvider, MeterProvider, manual span creation, histogram recording, the resource attribute parsing, etc.)

The LLM service can stay simple: Flask + llama-cpp-python + Flask auto-instrumentation. The GenAI telemetry makes more sense on the client side where the openai SDK lives.
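
The Flask side of that is just the standard auto-instrumentation call (assuming opentelemetry-instrumentation-flask stays in src/llm/requirements.txt):

from opentelemetry.instrumentation.flask import FlaskInstrumentor
FlaskInstrumentor().instrument_app(app)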

The client-side setup would just be:

from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
OpenAIInstrumentor().instrument()
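
The call sites in product_reviews_server.py wouldn't need to change. A rough sketch (the service URL, model name, and review text here are placeholders, not necessarily what this PR uses):

from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Instrument once at startup; every chat.completions.create call below then
# gets GenAI spans automatically.
OpenAIInstrumentor().instrument()

# Placeholder endpoint for the local LLM service; api_key is unused locally.
client = OpenAI(base_url="http://llm:8080/v1", api_key="unused")

review_context = "Review 1: Great battery life. Review 2: Strap broke after a week."
response = client.chat.completions.create(
    model="SmolLM2-135M-Instruct",  # placeholder model name
    messages=[
        {"role": "system", "content": "Summarize the product reviews."},
        {"role": "user", "content": review_context},
    ],
)
summary = response.choices[0].message.content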

What do you think?

