MCP server for the SEMIC Data Modelling Chatbot. It exposes a set of tools and resources that allow an AI agent to validate, analyse, and improve semantic data models against the SEMIC Style Guide.
Resources are static files and directories the agent can read and reference.
| Resource | Path | Description |
|---|---|---|
| SEMIC Style Guide (Excel) | resources/semantic_conventions/style_guide/SEMIC_Style_Guide.xlsx | Machine-readable conventions and rules of the SEMIC style guide |
| SEMIC Style Guide (Text) | resources/semantic_conventions/style_guide/style_guide.txt | Human-readable description of the SEMIC style guide |
| User Models | resources/semantic_model/models/<user>/<session>/ | Uploaded data models; a folder is created per user and session |
Tools are callable functions the agent can invoke to perform analysis and modifications on data models.
Retrieves available resources and model files for a given user and session.
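As a rough illustration of what such a lookup might involve, the sketch below lists the files in the per-user, per-session folder from the resources table. The function name `list_model_files` and the `base_dir` parameter are hypothetical and are not taken from the server code.

```python
from pathlib import Path


def list_model_files(base_dir: str, user: str, session: str) -> list[str]:
    """Return the sorted file names uploaded for one user and session.

    Assumes the layout <base_dir>/<user>/<session>/ (hypothetical helper);
    returns an empty list when the folder does not exist yet.
    """
    session_dir = Path(base_dir) / user / session
    if not session_dir.is_dir():
        return []
    return sorted(p.name for p in session_dir.iterdir() if p.is_file())
```

Returning an empty list for an unknown user or session, rather than raising, keeps the caller's logic simple when no model has been uploaded yet.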
Searches the Azure AI Search index for relevant concepts from standard vocabularies (e.g. FOAF, Schema.org, DCAT). Used to find candidate standard concepts for reuse recommendations.
Validates metadata completeness and terminology consistency of a semantic model against the SEMIC style guide general conventions.
Checks the following rules:
- GC-R3 — Metadata completeness: every class, attribute and association must have a URI, label, definition, and usage note
- GC-R4 — Consistent terminology style across the vocabulary
- GC-R5 — Consistent definition elaboration across the vocabulary
- GC-R7 — Deontic modality indicators must not be used as semantic or normative values
Supports two modes:
- Full model check — inspects all classes, attributes and associations in the model
- Targeted check — inspects only a specified subset of classes/associations using an LLM
Supported model formats: UML XMI JSON and JSON-LD (Turtle).
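The GC-R3 completeness rule above can be sketched in a few lines. This is a minimal illustration, assuming each model element has already been parsed into a dictionary; the key names (`name`, `uri`, `label`, `definition`, `usage_note`) and the function `find_missing_metadata` are assumptions made for the example, not the tool's actual data model.

```python
# Required metadata per GC-R3; the key spellings are assumed for this sketch.
REQUIRED_FIELDS = ("uri", "label", "definition", "usage_note")


def find_missing_metadata(elements: list[dict]) -> dict[str, list[str]]:
    """Map each element name to the required fields it lacks.

    Absent keys, None values, and empty or whitespace-only strings
    all count as missing.
    """
    report = {}
    for element in elements:
        missing = [
            field
            for field in REQUIRED_FIELDS
            if not str(element.get(field) or "").strip()
        ]
        if missing:
            report[element["name"]] = missing
    return report
```

Elements that satisfy the rule are left out of the result, so an empty dictionary means the whole model passes this particular check.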
Generates a step-by-step plan for answering a user's question using the available tools. Acts as a planner agent that produces a structured plan for a separate executor agent.
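One possible shape for such a structured plan is sketched below. The `PlanStep` fields and the `validate_plan` helper are hypothetical; the real planner's output format is not documented here. The idea is simply that an executor can check a plan before running it.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    tool: str                                   # tool the executor should call
    arguments: dict = field(default_factory=dict)
    depends_on: list[int] = field(default_factory=list)  # indices of earlier steps


def validate_plan(steps: list[PlanStep], known_tools: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan is executable."""
    problems = []
    for index, step in enumerate(steps):
        if step.tool not in known_tools:
            problems.append(f"step {index}: unknown tool '{step.tool}'")
        for dep in step.depends_on:
            # A step may only depend on steps that come before it.
            if not 0 <= dep < index:
                problems.append(f"step {index}: invalid dependency {dep}")
    return problems
```

Restricting dependencies to earlier steps means the list order is already a valid execution order, so the executor does not need a separate scheduling pass.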
Reads and modifies the user's uploaded data model. Used to apply corrections or improvements suggested by other tools.
Checks whether classes in a data model properly reuse standard concepts from well-known vocabularies, according to the SEMIC style guide reuse conventions.
For each class, searches the index for relevant standard concepts and evaluates reuse quality:
- Reuse as-is — adopt original URI, label, and definition with no changes
- Terminological adaptation — create a subclass, indicate adaptation, do not change semantics
- Semantic adaptation — create a subclass with new label/definition, document the reuse chain
- Mandatory properties from the original should be included; optional ones only if relevant
- Reuse must be made explicit with notes, hyperlinks, or dereferenceable URIs
Supports full model and targeted (class-level) checks. If no relevant standard is found, suggests domain vocabularies to consider.
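The three reuse modes listed above can be told apart with a simple decision rule. The sketch below is one interpretation only: the `Concept` type and `classify_reuse` function are hypothetical, and comparing definitions by exact string equality is a crude stand-in for the judgement the tool delegates to the index search and an LLM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    uri: str
    label: str
    definition: str
    subclass_of: Optional[str] = None  # URI of the parent concept, if any


def classify_reuse(model_class: Concept, standard: Concept) -> str:
    """Classify how model_class relates to a candidate standard concept."""
    if model_class.uri == standard.uri:
        unchanged = (
            model_class.label == standard.label
            and model_class.definition == standard.definition
        )
        # Reuse as-is requires the original URI, label, and definition.
        return "reuse as-is" if unchanged else "invalid: altered label or definition"
    if model_class.subclass_of == standard.uri:
        # Same definition: semantics unchanged, only the terminology adapted.
        if model_class.definition == standard.definition:
            return "terminological adaptation"
        return "semantic adaptation"
    return "no explicit reuse"
```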
Produces a structured Markdown assessment report summarising the results of a full style guide evaluation, combining SHACL validation results, metadata quality checks, and reuse of standards checks into a single human-readable report.
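A minimal sketch of how three sets of findings could be combined into one Markdown report follows. The function name `build_report`, the section titles, and the representation of findings as plain strings are assumptions made for illustration, not the tool's actual report layout.

```python
def build_report(shacl: list[str], metadata: list[str], reuse: list[str]) -> str:
    """Combine three lists of findings into a single Markdown document."""
    sections = [
        ("SHACL validation", shacl),
        ("Metadata quality", metadata),
        ("Reuse of standards", reuse),
    ]
    lines = ["# Style guide assessment", ""]
    for title, findings in sections:
        lines.append(f"## {title}")
        if findings:
            lines.extend(f"- {finding}" for finding in findings)
        else:
            lines.append("No issues found.")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

Emitting every section even when it has no findings makes it clear to the reader that the check ran and passed, rather than being skipped.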
Validates a data model against the SEMIC style guide using the ITB SHACL validator and generates LLM-based explanations for each violation.
- Extracts the Turtle serialization of the model
- Posts it to the ITB validation API
- For each SHACL violation, calls an LLM to produce a human-readable explanation and actionable resolution suggestion
Supported validation types: owl, shacl, uml.
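The flow described above (send the Turtle to the validator, then explain each violation) can be sketched with the two external calls injected as plain functions. Everything named here is hypothetical: `validate_with_explanations`, the `post_to_validator` and `explain` callables, and the dictionary shape of a violation. The real ITB request format and LLM prompt are not shown in this document.

```python
from typing import Callable


def validate_with_explanations(
    turtle: str,
    post_to_validator: Callable[[str], list[dict]],
    explain: Callable[[dict], str],
) -> list[dict]:
    """Validate a Turtle document and attach an explanation to each violation.

    post_to_validator stands in for the call to the validation API and
    explain stands in for the LLM call; both are supplied by the caller.
    """
    violations = post_to_validator(turtle)
    return [
        {**violation, "explanation": explain(violation)}
        for violation in violations
    ]
```

Passing the two calls in as parameters keeps the pipeline testable without network access: in tests they can be replaced by small fakes that return canned violations and explanations.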
ai4semantics_mcp_server/
│
├── server.py                # MCP server entry point
│
├── resources/
│   ├── semantic_conventions/
│   │   └── style_guide/
│   │       ├── SEMIC_Style_Guide.xlsx   # Machine-readable SEMIC style guide rules and conventions
│   │       └── style_guide.txt          # Human-readable description of the SEMIC style guide
│   └── semantic_model/
│       └── models/
│           └── <user>/<session>/        # Uploaded data models, organised by user and session
│
├── tools/
│   ├── get_resources/                                # Lists available resources and models
│   ├── index_search/                                 # Azure AI Search integration for standard vocabulary lookup
│   ├── model_metadata_checks/                        # Metadata completeness and terminology checks (GC-R3/4/5/7)
│   ├── planning_orchestrator/                        # Plans and orchestrates multi-step tool execution
│   ├── semantic_model/                               # Reads and modifies the user's data model
│   ├── semantic_reuse_of_existing_concepts_checks/   # Reuse of standard concepts checks
│   ├── style_guide_checks/                           # Aggregates all checks into a Markdown assessment report
│   └── style_guide_validator/                        # SHACL validation via ITB API with LLM explanations
│
├── config.py                # Loads and exposes configuration to the application
├── config.yaml              # Application configuration (paths, settings)
├── docker-compose.yml       # Runs Qdrant vector store locally
├── .env                     # API keys and environment variables (not committed)
└── .env.sample              # Sample environment variable file
git clone <repository-url>
cd ai4semantics_mcp_server

python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

pip install -r requirements.txt

Copy .env.sample to .env and fill in your values:

cp .env.sample .env

The index search tool uses a Qdrant vector database to search for relevant concepts from standard vocabularies. We have pre-indexed a set of vocabularies, including the Core Vocabularies, Schema.org, and LOV; these are not committed to the repository because of their size. You can either request access to the pre-built index or build your own from your own documents.
a) Place your documents in the documents folder:
mkdir -p tools/index_search/load_documents/documents

Add your files to the documents/ folder. These can be any vocabulary or ontology documents you want the agent to search over.
b) Start the Qdrant database:
docker-compose up -d

This starts Qdrant on localhost:6333 with persistent storage in ./data/qdrant.
c) Run the indexing script:
python tools/index_search/load_documents/load.py

This loads all documents, creates embeddings, and stores them in Qdrant.
d) (Optional) Test the index:
python tools/index_search/load_documents/retrieve.py

Use retrieve_documents(query) to quickly test the index and verify that the top results are relevant.
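A quick relevance check of this kind could be automated along the following lines. This is a sketch under stated assumptions: it assumes the retrieval function returns a list of text snippets ordered by relevance, which may not match the actual return type of `retrieve_documents`, and the helper `top_results_mention` is hypothetical.

```python
from typing import Callable


def top_results_mention(
    retrieve: Callable[[str], list[str]],
    query: str,
    expected_term: str,
    top_k: int = 3,
) -> bool:
    """Return True if expected_term appears in any of the first top_k results.

    retrieve is any function mapping a query to a ranked list of text
    snippets (an assumption about the retrieval interface).
    """
    results = retrieve(query)[:top_k]
    return any(expected_term.lower() in text.lower() for text in results)
```

For example, a query such as "person with a name and birth date" might be expected to surface a snippet mentioning a person-related class among the first few results; if it does not, the index or the embeddings are worth inspecting.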
python -m server