
Conversation

@MaxiLein (Contributor) commented Jul 29, 2025

Matching Micro-Service

Overview
This PR introduces the first end‑to‑end version of our "Competence Matcher" microservice. It provides a REST API that lets clients:

  1. Create and store "competence lists" (each a snapshot of resources + associated competencies)
  2. Embed all competency descriptions into a vector database (SQLite + sqlite‑vec)
  3. Match arbitrary task descriptions against those stored competencies, returning the nearest neighbors
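
To make the three steps concrete, here is a minimal client-side sketch. The route names and payload shapes are illustrative assumptions, not the service's confirmed API:

```typescript
// Hypothetical client flow; endpoints and field names are assumptions.
const BASE = "http://localhost:3000";

// 1. + 2. Create a competence list; the service embeds all competency
// descriptions into the vector database as an asynchronous job.
const created = await fetch(`${BASE}/competence-lists`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "team-a",
    resources: [
      { id: "alice", competencies: ["Statistical analysis of survey data"] },
    ],
  }),
}).then((res) => res.json());

// 3. Match a task description against the stored competencies.
const matches = await fetch(`${BASE}/competence-lists/${created.listId}/match`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ task: "Evaluate the results of our user survey" }),
}).then((res) => res.json());

console.log(matches); // nearest neighbours, ranked by similarity
```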

Behind the scenes, we:

  • Persist vectors in SQLite via the vec0 extension. Each competence embedding is stored alongside its metadata; at query time, we issue a k‑NN search (cosine / L2) to find the closest matches (a minimal query sketch follows after this list).
  • Semantically split long competency descriptions with a local LLM (via Ollama). This improves coverage by breaking large text blobs into coherent chunks before embedding.
  • Zero‑shot classify each candidate match to detect "semantic opposites" or contradictions (e.g. the task asks for "good in X" while the competence states "not good in X"). We down‑weight or filter out matches whose zero‑shot label ("contradicting" vs. "neutral" vs. "aligning") indicates low relevance (see the classification sketch below).
  • Offload heavy work (embedding and matching) into a pool of worker threads, coordinated by a simple WorkerManager with configurable concurrency. Each job spins up a worker, updates its status in the DB (pending → preprocessing → pending → running → completed/failed), and closes when done (a condensed sketch follows below).
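
For the vector persistence, a minimal sketch of the vec0 usage, assuming better-sqlite3 and the sqlite-vec npm package; the table name, embedding dimension (384), and the cosine metric option are illustrative, and metadata would be joined in via rowid:

```typescript
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("competences.db");
sqliteVec.load(db); // registers the vec0 virtual table

db.exec(`CREATE VIRTUAL TABLE IF NOT EXISTS competence_vectors
         USING vec0(embedding float[384] distance_metric=cosine)`);

// Store one embedding (Float32Array serialised to a raw byte buffer).
function insertEmbedding(rowid: number, vector: Float32Array): void {
  db.prepare("INSERT INTO competence_vectors(rowid, embedding) VALUES (?, ?)")
    .run(rowid, Buffer.from(vector.buffer));
}

// k-NN search: vec0 exposes MATCH + ORDER BY distance for nearest neighbours.
function nearest(query: Float32Array, k: number) {
  return db
    .prepare(`SELECT rowid, distance FROM competence_vectors
              WHERE embedding MATCH ? ORDER BY distance LIMIT ?`)
    .all(Buffer.from(query.buffer), k);
}
```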
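The contradiction filter can be sketched with the transformers.js zero-shot pipeline; the model here is a hypothetical pick (any NLI-capable model would do) and the keep/drop logic is simplified:

```typescript
import { pipeline } from "@huggingface/transformers";

// Hypothetical model choice; the labels mirror the ones described above.
const classify = await pipeline(
  "zero-shot-classification",
  "Xenova/nli-deberta-v3-small"
);

const LABELS = ["contradicting", "neutral", "aligning"];

async function keepMatch(task: string, competence: string): Promise<boolean> {
  const result: any = await classify(
    `Task: ${task}\nCompetence: ${competence}`,
    LABELS
  );
  // result.labels is sorted by descending score; drop candidates whose
  // top label marks them as a semantic opposite.
  return result.labels[0] !== "contradicting";
}

// keepMatch("Needs someone good in X", "Is not good in X") → false (ideally)
```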

matching-workflow.pdf
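
Finally, a condensed sketch of the WorkerManager idea; the status lifecycle follows the description above, and `updateJobStatus` is a hypothetical stand-in for the actual DB status helper:

```typescript
import { Worker } from "node:worker_threads";

type JobStatus = "pending" | "preprocessing" | "running" | "completed" | "failed";

// Hypothetical stand-in for the DB helper that persists job status.
declare function updateJobStatus(jobId: string, status: JobStatus): void;

class WorkerManager {
  private active = 0;
  private queue: Array<{ jobId: string; payload: unknown }> = [];

  constructor(private script: string, private concurrency = 2) {}

  submit(jobId: string, payload: unknown): void {
    updateJobStatus(jobId, "pending");
    this.queue.push({ jobId, payload });
    this.drain();
  }

  private drain(): void {
    while (this.active < this.concurrency && this.queue.length > 0) {
      const { jobId, payload } = this.queue.shift()!;
      this.active++;
      const worker = new Worker(this.script, { workerData: { jobId, payload } });
      // Workers report intermediate transitions (preprocessing → pending → running).
      worker.on("message", (status: JobStatus) => updateJobStatus(jobId, status));
      worker.on("exit", (code) => {
        updateJobStatus(jobId, code === 0 ? "completed" : "failed");
        this.active--;
        this.drain(); // pick up the next queued job
      });
    }
  }
}
```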


TODOs:

  • Currently, each worker thread loads local models into RAM (CPU/GPU) separately, which could lead to the process running out of memory. Hence, I plan on writing a wrapper for the transformers.js pipeline. The WorkerManager in the main thread should manage not only a worker pool but also a (configurable) set of models. Most likely, SharedArrayBuffer can be used. This not only avoids holding redundant models in memory, but also ensures the models are available straight away, as the main thread loads them into memory. → opted for dedicated threads (one or multiple) per model
    • Make the matching worker use the embedding worker for embedding, rather than running model inference itself
  • While the current workflow checks for contradictions between task and competence, the creation of a competence list should also trigger a pairwise check for contradictions between the competences themselves. If a contradiction is found, a warning should be sent back to the client.
  • Proper Error responses
  • Before the server starts, availability checks for Ollama models as well as models used via Hugging Face are run, which trigger a model pull if a model is not already available. Since the Ollama instance sits behind an nginx proxy and pulling may take quite a long time, asking Ollama to pull a model can end in a timeout error (nginx assumes the upstream server, here Ollama, never answered and returns a 504 Gateway Timeout). This should be caught by the server instead of exiting with an error, as it does now (see the sketch after this list).
  • Quantise a bigger model for embedding (maybe this one)
  • Make semantic splitting deactivatable via env; the string size should be settable as well
  • Logging:
    • Track the client IP via the X-Real-IP header
    • Add time-based log deletion, settable via env
  • Create a figure showing the workflow of the worker threads
  • Simplify Ranking to give one clear result
  • Create small benchmark to compare different models
  • Worker lifecycle: kill-and-respawn after x (time/loads/...)? Or maybe alive checks/pings to monitor worker status
  • Add a holistic and comprehensive README with all instructions needed to run the service, including flow charts
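
Regarding the 504 issue above, a sketch of a tolerant startup check; the retry count and backoff are illustrative assumptions, while the endpoint and body follow Ollama's documented /api/pull API:

```typescript
// Treat a proxy timeout during a model pull as "still in progress"
// instead of crashing the server at startup.
async function ensureOllamaModel(baseUrl: string, model: string): Promise<void> {
  for (let attempt = 1; attempt <= 5; attempt++) {
    try {
      const res = await fetch(`${baseUrl}/api/pull`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, stream: false }),
      });
      if (res.ok) return; // pull finished or model already present
      // nginx returns 504 when the pull outlives the proxy timeout, even
      // though Ollama keeps pulling in the background: wait and re-check.
      console.warn(`pull of ${model} returned ${res.status}, retry ${attempt}/5`);
    } catch (err) {
      console.warn(`pull of ${model} failed (${err}), retry ${attempt}/5`);
    }
    await new Promise((resolve) => setTimeout(resolve, 30_000 * attempt));
  }
  throw new Error(`model ${model} is still not available after retries`);
}
```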

MaxiLein added 15 commits July 29, 2025 15:37
- Update dev script in package.json to watch for .env changes
- Add dotenv dependency for environment variable management
- Modify config to include ollamaBearerToken from environment variables
- Ensure asynchronous model loading in ensureAllHuggingfaceModelsAreAvailable
- Include ollamaBearerToken in Ollama instance headers

@MaxiLein MaxiLein marked this pull request as ready for review August 19, 2025 11:56
…r service

- Introduced custom error classes for better error context and handling.
- Updated middleware to handle database errors and validation errors gracefully.
- Improved logging for worker management, model initialization, and semantic splitting tasks.
- Added verbose logging options to provide detailed runtime information.
- Refactored resource retrieval functions to throw specific errors for better debugging.
- Enhanced the reasoning and semantic splitting tasks with detailed error logging.
- Implemented error handling in worker management to capture and log worker failures.
- Updated the server initialisation process to handle model availability checks with error handling.

…atching tasks

- Updated default batch size for Ollama from 5 to 20.
- Introduced new configuration options for embedding and matching workers.
- Improved error handling and logging in worker processes.
- Refactored worker manager to support static worker pools for embedding and matching tasks.
- Added health check mechanism for worker responsiveness.
- Implemented job processing logic to handle multiple tasks efficiently.
- Enhanced logging for better traceability of worker actions and statuses.
- Replaced console logging with a centralized logger in model, ollama, worker, and embedder modules.
- Introduced structured logging with log levels (DEBUG, INFO, WARN, ERROR) and log types (server, request, worker, etc.).
- Enhanced worker context management to propagate request IDs and log worker activities.
- Removed verbose flag usage and replaced it with appropriate logging levels.
- Improved error handling and logging in worker pools and job processing.
- Cleaned up deprecated log structures and ensured consistent logging practices throughout the codebase.

@MaxiLein MaxiLein marked this pull request as ready for review January 20, 2026 10:51
@github-actions

CLOUDRUN ACTIONS

✅ Successfully created Preview Deployment.

https://pr-627---ms-server-staging-c4f6qdpj7q-ew.a.run.app
