This repository is dedicated to exploring the application of semantic models in Materials Science & Engineering.

submerged-in-matrix/Semantic_models_for-MSE

Mini Materials Knowledge Graph (LLM-assisted Ontology)

A short, beginner-friendly project for getting started with semantic modeling for materials knowledge graphs (KGs).

🎯 Goal

Build a tiny, end-to-end pipeline that:

  1. defines a small ontology for a narrow materials sub-domain,
  2. ingests a tiny dataset (abstracts or tabular snippets),
  3. uses an LLM-assisted extractor to propose entities/relations,
  4. converts outputs into RDF triples (Linked Data),
  5. loads into a lightweight triplestore and runs a few SPARQL queries.

Target skills: ontology design, semantic consistency, FAIR/linked data, and LLM support.

🧭 Roadmap (minimal scope)

  • Ontology: start with 6–10 classes (e.g., Material, SynthesisMethod, Property, Publication) and 10–20 relations (e.g., hasProperty, synthesizedBy, measuredIn).
  • Sample data: a handful of abstracts or 1–2 CSV tables.
  • LLM-assisted parsing: prototype prompts/rules to extract (subject, predicate, object) candidates.
  • RDF conversion: map to a namespace, produce .ttl or .rdf.
  • Queries: run 3–5 SPARQL queries that demonstrate retrieval & simple reasoning (e.g., “materials synthesized via solvothermal with band gap > X”).
  • (Optional) Basic visualization (screenshot or simple graph view).

Keep it tiny and clean. Depth over breadth.

🛠️ Stack (flexible)

  • Python: rdflib for RDF generation.
  • SPARQL: rdflib SPARQL in-notebook or a local triplestore (e.g., GraphDB Free, QLever Docker, or Apache Jena/Fuseki).
  • LLM parsing: ChatGPT and Ollama were both tried. Ollama is used, because a ChatGPT Plus subscription (as in my case) does not include API access.
  • Validation: simple checks for semantic consistency (domain/range sanity).

📦 Minimal Setup

# after cloning
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install rdflib pandas
# (Optional for local store) GraphDB/Jena/QLever via Docker if desired

🔍 Planned Sub-domain Ideas

  • MOFs 101 (very small set)
  • Common semiconductors (Si, GaAs, perovskites)
  • Alloys (Fe-based) with 2–3 properties (e.g., density, hardness, band gap if relevant)

🧪 Notebook

All code will live in: notebooks/build_mini_mkg.ipynb

📚 Useful References (for later reading)

Author: Md. Saidul Islam
