
feat: Replace mock LLM with local SmolLM2-135M inference#3014

Draft
ps48 wants to merge 2 commits into open-telemetry:main from ps48:feature/local-llm

Conversation


ps48 (Contributor) commented on Feb 19, 2026

Changes

Summary

  • Replace the mock LLM service with SmolLM2-135M running locally via llama-cpp-python, removing the need for an external API while still exposing an OpenAI-compatible /v1/chat/completions endpoint (a rough sketch of the new service follows this list)
  • Simplify product_reviews_server.py by dropping the custom tool-calling flow in favor of a direct chat completion with review context in the prompt
  • Add manual OpenTelemetry GenAI semantic-convention spans and metrics to the LLM service (token usage histogram, operation duration)
  • Pin OTel library versions in src/llm/requirements.txt to 1.39.1 / 0.60b1
  • Update comments in .env, .env.override, and src/product-reviews/README.md to reflect the new default model
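
For reviewers skimming without the diff, the new service looks roughly like the sketch below. This is a minimal illustration only, not the actual src/llm/app.py: the model path, span name, and metric/attribute names are my shorthand for the GenAI semantic conventions and may differ from what the service actually emits.

import time

from flask import Flask, jsonify, request
from llama_cpp import Llama
from opentelemetry import metrics, trace

app = Flask(__name__)
# Hypothetical GGUF path; the real service ships/downloads its own model file.
llm = Llama(model_path="models/smollm2-135m-instruct-q4_k_m.gguf", n_ctx=2048)

tracer = trace.get_tracer("llm")
meter = metrics.get_meter("llm")
token_usage = meter.create_histogram(
    "gen_ai.client.token.usage", unit="{token}",
    description="Input and output tokens per chat completion")
operation_duration = meter.create_histogram(
    "gen_ai.client.operation.duration", unit="s",
    description="Duration of the chat completion operation")

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    attrs = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": body.get("model", "SmolLM2-135M-Instruct"),
    }
    start = time.monotonic()
    # Manual GenAI span around the local inference call.
    with tracer.start_as_current_span("chat SmolLM2-135M-Instruct", attributes=attrs) as span:
        result = llm.create_chat_completion(
            messages=body["messages"],
            max_tokens=body.get("max_tokens", 256),
            temperature=body.get("temperature", 0.7),
        )
        usage = result.get("usage", {})
        span.set_attribute("gen_ai.usage.input_tokens", usage.get("prompt_tokens", 0))
        span.set_attribute("gen_ai.usage.output_tokens", usage.get("completion_tokens", 0))
        token_usage.record(usage.get("prompt_tokens", 0), {**attrs, "gen_ai.token.type": "input"})
        token_usage.record(usage.get("completion_tokens", 0), {**attrs, "gen_ai.token.type": "output"})
        operation_duration.record(time.monotonic() - start, attrs)
    # llama-cpp-python already returns an OpenAI-shaped response dict.
    return jsonify(result)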

Merge Requirements

For new feature contributions, please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed, ping @open-telemetry/demo-approvers.

Signed-off-by: ps48 <pshenoy36@gmail.com>
github-actions bot added the helm-update-required (Requires an update to the Helm chart when released) label on Feb 19, 2026
ps48 changed the title from "Replace mock llm service with a smollm" to "feat: Replace mock LLM with local SmolLM2-135M inference" on Feb 19, 2026
Signed-off-by: ps48 <pshenoy36@gmail.com>
kylehounslow commented

Real inference, nice! One thought on the instrumentation side:

The product_reviews_server.py already uses the openai Python client to talk to this service. Instead of hand-rolling the GenAI spans and metrics in the LLM server, we could instrument the client side with opentelemetry-instrumentation-openai-v2. That would:

  1. Dogfood OTel's own instrumentation library in OTel's own demo, which feels like the right place to showcase it
  2. Stay in sync with semconv changes as the library evolves, vs maintaining attribute names by hand
  3. Drop ~60 lines of manual OTel setup from src/llm/app.py (TracerProvider, MeterProvider, manual span creation, histogram recording, the resource attribute parsing, etc.)

The LLM service can stay simple: Flask + llama-cpp-python + Flask auto-instrumentation. The GenAI telemetry makes more sense on the client side where the openai SDK lives.
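
The Flask side of that is just the standard auto-instrumentation call (assuming opentelemetry-instrumentation-flask stays in src/llm/requirements.txt):

from opentelemetry.instrumentation.flask import FlaskInstrumentor
FlaskInstrumentor().instrument_app(app)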

The client-side setup would just be:

from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
OpenAIInstrumentor().instrument()
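
The call sites in product_reviews_server.py wouldn't need to change. A rough sketch (the service URL, model name, and review text here are placeholders, not necessarily what this PR uses):

from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Instrument once at startup; every chat.completions.create call below then
# gets GenAI spans automatically.
OpenAIInstrumentor().instrument()

# Placeholder endpoint for the local LLM service; api_key is unused locally.
client = OpenAI(base_url="http://llm:8080/v1", api_key="unused")

review_context = "Review 1: Great battery life. Review 2: Strap broke after a week."
response = client.chat.completions.create(
    model="SmolLM2-135M-Instruct",  # placeholder model name
    messages=[
        {"role": "system", "content": "Summarize the product reviews."},
        {"role": "user", "content": review_context},
    ],
)
summary = response.choices[0].message.content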

What do you think?

