Commit 8adc08e

Updated readme and documentation
1 parent 0a7e894 commit 8adc08e

6 files changed: 59 additions, 25 deletions

README.md

Lines changed: 22 additions & 14 deletions
@@ -1,5 +1,5 @@
11
<p align="center">
2-
<img width="450" src="assets/schema-miner-pro-logo.jpg" alt="schema miner pro logo" />
2+
<img width="450" src="https://github.com/sciknoworg/schema-miner/blob/main/assets/schema-miner-pro-logo.jpg?raw=true" alt="schema miner pro logo" />
33
</p>
44

55
<div align="center">
@@ -21,14 +21,14 @@ This is an open-source implementation of Schema-Miner<sup>pro</sup>.
2121

2222
## 📋 Schema-miner<sup>pro</sup> Overview
2323

24-
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner<sup>pro</sup> extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., QUDT). Comprehensive documentation for Schema-Miner Pro, including detailed guides and examples, is available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
24+
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner<sup>pro</sup> extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., [QUDT](https://www.qudt.org/pages/HomePage.html)). Comprehensive documentation for Schema-Miner Pro, including detailed guides and examples, is available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
2525

2626
<p align="center">
2727
<img src="https://raw.githubusercontent.com/sciknoworg/schema-miner/refs/heads/main/assets/LLM4SchemaMining%20-%20Workflow%20design.svg" height="300">
2828
</p>
2929

3030
<p align="center">
31-
Figure 1: Overview of the LLMs4SchemaDiscovery workflow.
31+
Figure 1: Overview of the LLMs4SchemaDiscovery workflow implemented in the SCHEMA-MINER tool. Stage 1 generates an initial process schema using domain specifications, while Stage 2, refines this schema using a small, curated scientific corpus. In Stage 3, schema is further enriched using a larger, non-curated corpus. The final stage involves grounding the properties in formal ontologies.
3232
</p>
3333

3434
## ⚙️ System Requirements
@@ -52,7 +52,7 @@ For our experiments, we used the following hardware setup:
5252

5353
## 🧪 Installation
5454

55-
Install the package directly from PyPI:
55+
Install the package directly from PyPI using ``pip``:
5656

5757
```bash
5858
pip install schema-miner
@@ -75,22 +75,30 @@ For a quick start, see the provided example notebooks highlighting the overall w
7575

7676
| | Notebook |
7777
| --- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
78-
| 1 | [Schema Mining With LLMs and expert Example](tutorials/notebooks/schema_mining_with_LLMs_and_expert_example.ipynb) |
79-
| 2 | [Schema Ontology Grounding Example](tutorials/notebooks/schema_mining_ontology_grounding_example.ipynb) |
78+
| 1 | [Schema Mining With LLMs and expert Example](https://github.com/sciknoworg/schema-miner/blob/main/tutorials/notebooks/schema_mining_with_LLMs_and_expert_example.ipynb) |
79+
| 2 | [Schema Ontology Grounding Example](https://github.com/sciknoworg/schema-miner/blob/main/tutorials/notebooks/schema_mining_ontology_grounding_example.ipynb) |
8080

8181
</div>
8282

8383
## 🧑‍💻 Schema-miner<sup>pro</sup> Tool Usage
8484

85-
Schema_Miner enables schema discovery and refinement through a 3-stage pipeline (Stage 1 to 3) powered by LLMs, domain expertise, and scientific literature. Schema-Miner<sup>pro</sup> extends this pipeline with an automated ontology-grounding component (Stage 4), performing multi-step reasoning and semantic alignment to formal ontologies, while preserving human-in-the-loop validation.
85+
Schema-Miner enables schema discovery and refinement through a 3-stage pipeline (Stage 1 to 3) powered by LLMs, domain expertise, and scientific literature. Schema-Miner<sup>pro</sup> extends this pipeline with an automated ontology-grounding component (Stage 4), performing multi-step reasoning and semantic alignment to formal ontologies, while preserving human-in-the-loop validation.
8686

8787
### 🛠️ Configuration
88-
Before running schema-miner, configure your environment. For example:
88+
Before running schema-miner, configure your environment:
8989

9090
```python
9191
from schema_miner.config.envConfig import EnvConfig
92+
93+
# OpenAI Keys
9294
EnvConfig.OPENAI_api_key = '<insert-your-openai-key>'
9395
EnvConfig.OPENAI_organization_id = '<insert-your-openi-organization-id>'
96+
97+
# Ollama
98+
EnvConfig.OLLAMA_base_url = '<Ollama Base URL or empty if Ollama running locally>'
99+
100+
# HuggingFace
101+
EnvConfig.HUGGINGFACE_access_token = '<Your huggingface access token>'
94102
```
95103

96104
### 📂 Data Setup
@@ -139,7 +147,7 @@ llm_model_name = 'gpt-4o'
139147
process_specification = pdf_text_extractor(process_specification_filepath, process_specification_filename, return_text = True)
140148

141149
# Extract schema
142-
results_file_path = Path("./results/stage-1/Atomic-Layer-Deposition/experimental-schema")
150+
results_file_path = "./results/stage-1/Atomic-Layer-Deposition/experimental-schema"
143151
schema = extract_schema_stage1(llm_model_name, process_specification, results_file_path, save_schema = True)
144152
```
145153

@@ -156,7 +164,7 @@ expert_review = Path("./data/stage-2/Atomic-Layer-Deposition/domain-expert-revie
156164
scientific_paper = pdf_text_extractor(scientific_paper_stage2_dir, '1 Groner et al.pdf', return_text = True)
157165

158166
# Refine schema
159-
results_file_path = Path("./results/stage-2/Atomic-Layer-Deposition/experimental-schema")
167+
results_file_path = "./results/stage-2/Atomic-Layer-Deposition/experimental-schema"
160168
schema = extract_schema_stage2(llm_model_name, schema, expert_review, scientific_paper, results_file_path, save_schema = True)
161169
```
162170

@@ -173,7 +181,7 @@ expert_review = Path("./data/stage-3/Atomic-Layer-Deposition/domain-expert-revie
173181
scientific_paper = pdf_text_extractor(scientific_paper_stage3_dir, '1-Mattinen et al.pdf', return_text = True)
174182

175183
# Finalize schema
176-
results_file_path = Path("./results/stage-3/Atomic-Layer-Deposition/experimental-schema")
184+
results_file_path = "./results/stage-3/Atomic-Layer-Deposition/experimental-schema"
177185
schema = extract_schema_stage3(llm_model_name, schema, expert_review, scientific_paper, results_file_path, save_schema = True)
178186

179187
# View Final Schema
@@ -182,7 +190,7 @@ logging.info(f"{ProcessConfig.Process_name} Schema:\n{json.dumps(schema, indent=
182190

183191
### 🌐 Stage 4 – Ontology Grounding with QUDT
184192

185-
Once a process schema is extracted, it can be semantically grounded using the [QUDT Ontologies](https://www.qudt.org/pages/HomePage.html) (Quantities, Units, Dimensions, and Data Types).
193+
Once a process schema is extracted, it can be semantically grounded using the [QUDT](https://www.qudt.org/pages/HomePage.html) (Quantities, Units, Dimensions, and Data Types) Ontology.
186194

187195
The grounding workflow uses either LLM prompting or an agentic LLM approach to align schema fields with QUDT concepts. Following is an example of an agent based qudt grounding.
188196

@@ -194,7 +202,7 @@ llm_model_name = 'gpt-4o'
194202

195203
# Ground the schema with QUDT Ontology
196204
process_schema = Path('./results/Ideal Schema/Atomic-Layer-Deposition/experimental-ideal-schema.json')
197-
results_file_path = Path("./results/qudt-grounded/Atomic-Layer-Deposition/experimental-schema")
205+
results_file_path = "./results/qudt-grounded/Atomic-Layer-Deposition/experimental-schema"
198206
schema = agentic_qudt_grounding(llm_model_name, process_schema, results_file_path, save_schema = True)
199207

200208
# Display grounded schema
@@ -244,7 +252,7 @@ If you use this repository in your research or applications, please cite the app
244252
## 👥 Contact & Contributions
245253

246254
We’d love to hear from you!
247-
Whether you're interested in collaborating on `schema miner pro` or have ideas to extend its capabilities, feel free to reach out:
255+
Whether you're interested in collaborating on `Schema-MinerPro` or have ideas to extend its capabilities, feel free to reach out:
248256

249257
- **Collaboration inquiries:** Contact Jennifer D'Souza at jennifer.dsouza [at] tib.eu
250258
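Note on the `results_file_path` changes in this file: every hunk above replaces `Path("...")` with a plain string, and the commit message does not say why. The following is a minimal sketch, using only the standard library, of how a function can accept either form. `normalize_results_path` is a hypothetical helper written for illustration and is not part of schema-miner.

```python
import os
from pathlib import Path


def normalize_results_path(results_file_path):
    """Return a POSIX-style string for a str or pathlib.Path argument."""
    # os.fspath returns a str unchanged and unwraps pathlib.Path objects.
    return Path(os.fspath(results_file_path)).as_posix()


as_text = normalize_results_path(
    "./results/stage-1/Atomic-Layer-Deposition/experimental-schema"
)
as_path = normalize_results_path(
    Path("./results/stage-1/Atomic-Layer-Deposition/experimental-schema")
)
```

Both calls yield the same string, so under this assumption callers could pass either type without changing the examples in the README.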

README_PYPI.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
11
<p align="center">
2-
<img width="450" src="assets/schema-miner-pro-logo.jpg" alt="schema-miner logo" />
2+
<img width="450" src="https://github.com/sciknoworg/schema-miner/blob/main/assets/schema-miner-pro-logo.jpg?raw=true" alt="schema-miner pro logo" />
33
</p>
44

55
<div align="center">
@@ -17,11 +17,11 @@
1717

1818
<h3 align="center">SCHEMA-MINER<sup>pro</sup>: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow</h3>
1919

20-
Schema-Miner is an open-source framework for scientific schema mining. It combines Large Language Models (LLMs) with human-in-the-loop refinement to extract, and semantically ground schema properties from unstructured text. Schema-Miner Pro extends this framework with an automated ontology-grounding component, aligning the schema with formal ontologies (e.g., QUDT). Documentation and usage guides are available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
20+
Schema-Miner is an open-source framework for scientific schema mining. It combines Large Language Models (LLMs) with human-in-the-loop refinement to extract, and semantically ground schema properties from unstructured text. Schema-Miner Pro extends this framework with an automated ontology-grounding component, aligning the schema with formal ontologies (e.g., [QUDT](https://www.qudt.org/pages/HomePage.html)). Documentation and usage guides are available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
2121

2222
## 🧪 Installation
2323

24-
Install the package directly from PyPI:
24+
Install the package directly from PyPI using ``pip``:
2525

2626
```bash
2727
pip install schema-miner
@@ -39,7 +39,7 @@ pip install -r requirements.txt
3939
## ⚙️ System Requirements
4040
Running with OpenAI models (e.g., [**GPT-4o**](https://platform.openai.com/docs/models#gpt-4o), [**GPT-4-turbo**](https://platform.openai.com/docs/models#gpt-4-turbo-and-gpt-4)) requires no special hardware beyond a basic system with internet access, since inference is API-based. For **open-source models** (e.g., [**Llama 3.1 8B**](https://ai.meta.com/blog/meta-llama-3-1/)), local execution is possible on CPU but slow; for practical performance, a GPU with sufficient VRAM (per model specifications) is strongly recommended.
4141

42-
For more details, please check the documentation [here](https://schema-miner.readthedocs.io/en/latest/).
42+
For more details, please check the documentation: [https://schema-miner.readthedocs.io/en/latest/](https://schema-miner.readthedocs.io/en/latest/).
4343

4444
## 🚀 Quick Start
4545

@@ -97,7 +97,7 @@ If you use this repository in your research or applications, please cite the app
9797
## 👥 Contact & Contributions
9898

9999
We’d love to hear from you!
100-
Whether you're interested in collaborating on `schema miner pro` or have ideas to extend its capabilities, feel free to reach out:
100+
Whether you're interested in collaborating on `Schema-MinerPro` or have ideas to extend its capabilities, feel free to reach out:
101101

102102
- **Collaboration inquiries:** Contact Jennifer D'Souza at jennifer.dsouza [at] tib.eu
103103

docs/source/gettingstarted/installation.rst

Lines changed: 22 additions & 4 deletions
@@ -24,7 +24,7 @@ Schema miner pro is published on PyPI, you can install it directly:
2424

2525
.. code-block:: bash
2626
27-
pip install -i schema-miner
27+
pip install schema-miner
2828
2929
This will install the latest stable release along with its dependencies.
3030

@@ -39,12 +39,13 @@ To work with the development version or contribute to the project, clone the Git
3939
cd schema-miner
4040
pip install -r requirements.txt
4141
42-
.. hint:: This installs the package in editable mode, so changes to the source code are reflected immediately without reinstallation.
43-
4442
Configuration of API keys
4543
*************************
4644

47-
Schema-miner uses large language models (LLMs) that require API access (e.g., OpenAI). API keys and other secrets are managed via a .env file at the project root.
45+
Schema-miner pro uses large language models (LLMs) that require API access (e.g., OpenAI). API keys and other secrets are managed either via a .env file at the project root or with the EnvConfig Class.
46+
47+
Configuration Using ``.env``
48+
----------------------------
4849

4950
1. Copy the example configuration file:
5051

@@ -61,6 +62,23 @@ Schema-miner uses large language models (LLMs) that require API access (e.g., Op
6162
6263
3. Schema-miner automatically loads these values at runtime using the provided configuration utilities.
6364

65+
Configuration Using ``EnvConfig``
66+
---------------------------------
67+
68+
.. code-block:: python
69+
70+
from schema_miner.config.envConfig import EnvConfig
71+
72+
# OpenAI Keys
73+
EnvConfig.OPENAI_api_key = '<insert-your-openai-key>'
74+
EnvConfig.OPENAI_organization_id = '<insert-your-openi-organization-id>'
75+
76+
# Ollama
77+
EnvConfig.OLLAMA_base_url = '<Ollama Base URL or empty if Ollama running locally>'
78+
79+
# HuggingFace
80+
EnvConfig.HUGGINGFACE_access_token = '<Your huggingface access token>'
81+
6482
Next steps
6583
**********
6684
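Note on the new ``EnvConfig`` section: the added snippet assigns the keys as string literals. As an alternative, here is a minimal sketch of filling the same four attributes from environment variables so that secrets stay out of scripts and notebooks. `EnvConfigStandIn` is a hypothetical stand-in class defined here so the snippet runs without the package installed, and the environment variable names are assumptions, not names documented by the project. With the real package, the target would be the `EnvConfig` class imported in the diff above.

```python
import os


class EnvConfigStandIn:
    """Hypothetical stand-in mirroring the four attributes set in the diff."""
    OPENAI_api_key = ''
    OPENAI_organization_id = ''
    OLLAMA_base_url = ''
    HUGGINGFACE_access_token = ''


# Assumed environment variable names (not taken from the project docs).
ENV_TO_ATTR = {
    'OPENAI_API_KEY': 'OPENAI_api_key',
    'OPENAI_ORGANIZATION_ID': 'OPENAI_organization_id',
    'OLLAMA_BASE_URL': 'OLLAMA_base_url',
    'HUGGINGFACE_ACCESS_TOKEN': 'HUGGINGFACE_access_token',
}


def apply_env(config, environ=None):
    """Copy each set, non-empty variable onto config; return the attributes set."""
    environ = os.environ if environ is None else environ
    applied = []
    for var, attr in ENV_TO_ATTR.items():
        value = environ.get(var)
        if value:  # skip variables that are missing or empty
            setattr(config, attr, value)
            applied.append(attr)
    return applied
```

Calling `apply_env(EnvConfigStandIn)` reads from `os.environ`; passing a dictionary as the second argument makes the behaviour easy to check without touching the real environment.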

docs/source/gettingstarted/quickstart.rst

Lines changed: 8 additions & 0 deletions
@@ -43,9 +43,17 @@ Also make sure your LLM keys/config are set:
4343
.. code-block:: python
4444
4545
from schema_miner.config.envConfig import EnvConfig
46+
47+
# OpenAI Keys
4648
EnvConfig.OPENAI_api_key = '<insert-your-openai-key>'
4749
EnvConfig.OPENAI_organization_id = '<insert-your-openi-organization-id>'
4850
51+
# Ollama
52+
EnvConfig.OLLAMA_base_url = '<Ollama Base URL or empty if Ollama running locally>'
53+
54+
# HuggingFace
55+
EnvConfig.HUGGINGFACE_access_token = '<Your huggingface access token>'
56+
4957
Step 3: Stage 1 - Initial Schema Mining
5058
***************************************
5159

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@
4141
SCHEMA-MINER :sup:`Pro`: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow
4242
************************************************************************************************************************************
4343

44-
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner :sup:`pro` extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., QUDT).
44+
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner :sup:`pro` extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., `QUDT <https://www.qudt.org/pages/HomePage.html>`_).
4545

4646
Below is the workflow diagram of Schema-Miner.
4747

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
44

55
[project]
66
name = "schema_miner"
7-
version = "2.0.0"
7+
version = "2.0.1"
88
description = "A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models"
99
authors = [
1010
{name = "Sameer Sadruddin", email = "[email protected]"},

0 commit comments

Comments
 (0)