File: README.md
Lines changed: 22 additions & 14 deletions
@@ -1,5 +1,5 @@
1
1
<p align="center">
2
-
<img width="450" src="assets/schema-miner-pro-logo.jpg" alt="schema miner pro logo" />
2
+
<img width="450" src="https://github.com/sciknoworg/schema-miner/blob/main/assets/schema-miner-pro-logo.jpg?raw=true" alt="schema miner pro logo" />
3
3
</p>
4
4
5
5
<div align="center">
@@ -21,14 +21,14 @@ This is an open-source implementation of Schema-Miner<sup>pro</sup>.
21
21
22
22
## 📋 Schema-miner<sup>pro</sup> Overview
23
23
24
-
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner<sup>pro</sup> extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., QUDT). Comprehensive documentation for Schema-Miner Pro, including detailed guides and examples, is available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
24
+
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner<sup>pro</sup> extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., [QUDT](https://www.qudt.org/pages/HomePage.html)). Comprehensive documentation for Schema-Miner Pro, including detailed guides and examples, is available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
Figure 1: Overview of the LLMs4SchemaDiscovery workflow.
31
+
Figure 1: Overview of the LLMs4SchemaDiscovery workflow implemented in the SCHEMA-MINER tool. Stage 1 generates an initial process schema using domain specifications, while Stage 2 refines this schema using a small, curated scientific corpus. In Stage 3, the schema is further enriched using a larger, non-curated corpus. The final stage grounds the properties in formal ontologies.
32
32
</p>
33
33
34
34
## ⚙️ System Requirements
@@ -52,7 +52,7 @@ For our experiments, we used the following hardware setup:
52
52
53
53
## 🧪 Installation
54
54
55
-
Install the package directly from PyPI:
55
+
Install the package directly from PyPI using ``pip``:
56
56
57
57
```bash
58
58
pip install schema-miner
@@ -75,22 +75,30 @@ For a quick start, see the provided example notebooks highlighting the overall w
| 1 |[Schema Mining With LLMs and expert Example](https://github.com/sciknoworg/schema-miner/blob/main/tutorials/notebooks/schema_mining_with_LLMs_and_expert_example.ipynb)|
Schema_Miner enables schema discovery and refinement through a 3-stage pipeline (Stage 1 to 3) powered by LLMs, domain expertise, and scientific literature. Schema-Miner<sup>pro</sup> extends this pipeline with an automated ontology-grounding component (Stage 4), performing multi-step reasoning and semantic alignment to formal ontologies, while preserving human-in-the-loop validation.
85
+
Schema-Miner enables schema discovery and refinement through a 3-stage pipeline (Stage 1 to 3) powered by LLMs, domain expertise, and scientific literature. Schema-Miner<sup>pro</sup> extends this pipeline with an automated ontology-grounding component (Stage 4), performing multi-step reasoning and semantic alignment to formal ontologies, while preserving human-in-the-loop validation.
86
86
87
87
### 🛠️ Configuration
88
-
Before running schema-miner, configure your environment. For example:
88
+
Before running schema-miner, configure your environment:
89
89
90
90
```python
91
91
from schema_miner.config.envConfig import EnvConfig
Once a process schema is extracted, it can be semantically grounded using the [QUDT Ontologies](https://www.qudt.org/pages/HomePage.html) (Quantities, Units, Dimensions, and Data Types).
193
+
Once a process schema is extracted, it can be semantically grounded using the [QUDT](https://www.qudt.org/pages/HomePage.html) (Quantities, Units, Dimensions, and Data Types) Ontology.
186
194
187
195
The grounding workflow uses either LLM prompting or an agentic LLM approach to align schema fields with QUDT concepts. The following is an example of agent-based QUDT grounding.
<img width="450" src="https://github.com/sciknoworg/schema-miner/blob/main/assets/schema-miner-pro-logo.jpg?raw=true" alt="schema-miner pro logo" />
3
3
</p>
4
4
5
5
<div align="center">
@@ -17,11 +17,11 @@
17
17
18
18
<h3 align="center">SCHEMA-MINER<sup>pro</sup>: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow</h3>
19
19
20
-
Schema-Miner is an open-source framework for scientific schema mining. It combines Large Language Models (LLMs) with human-in-the-loop refinement to extract, and semantically ground schema properties from unstructured text. Schema-Miner Pro extends this framework with an automated ontology-grounding component, aligning the schema with formal ontologies (e.g., QUDT). Documentation and usage guides are available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
20
+
Schema-Miner is an open-source framework for scientific schema mining. It combines Large Language Models (LLMs) with human-in-the-loop refinement to extract and semantically ground schema properties from unstructured text. Schema-Miner Pro extends this framework with an automated ontology-grounding component, aligning the schema with formal ontologies (e.g., [QUDT](https://www.qudt.org/pages/HomePage.html)). Documentation and usage guides are available at [schema-miner.readthedocs.io](https://schema-miner.readthedocs.io/en/latest/).
21
21
22
22
## 🧪 Installation
23
23
24
-
Install the package directly from PyPI:
24
+
Install the package directly from PyPI using ``pip``:
25
25
26
26
```bash
27
27
pip install schema-miner
@@ -39,7 +39,7 @@ pip install -r requirements.txt
39
39
## ⚙️ System Requirements
40
40
Running with OpenAI models (e.g., [**GPT-4o**](https://platform.openai.com/docs/models#gpt-4o), [**GPT-4-turbo**](https://platform.openai.com/docs/models#gpt-4-turbo-and-gpt-4)) requires no special hardware beyond a basic system with internet access, since inference is API-based. For **open-source models** (e.g., [**Llama 3.1 8B**](https://ai.meta.com/blog/meta-llama-3-1/)), local execution is possible on CPU but slow; for practical performance, a GPU with sufficient VRAM (per model specifications) is strongly recommended.
41
41
42
-
For more details, please check the documentation[here](https://schema-miner.readthedocs.io/en/latest/).
42
+
For more details, please check the documentation: [https://schema-miner.readthedocs.io/en/latest/](https://schema-miner.readthedocs.io/en/latest/).
43
43
44
44
## 🚀 Quick Start
45
45
@@ -97,7 +97,7 @@ If you use this repository in your research or applications, please cite the app
97
97
## 👥 Contact & Contributions
98
98
99
99
We’d love to hear from you!
100
-
Whether you're interested in collaborating on `schema miner pro` or have ideas to extend its capabilities, feel free to reach out:
100
+
Whether you're interested in collaborating on `Schema-MinerPro` or have ideas to extend its capabilities, feel free to reach out:
101
101
102
102
- **Collaboration inquiries:** Contact Jennifer D'Souza at jennifer.dsouza [at] tib.eu
File: docs/source/gettingstarted/installation.rst
Lines changed: 22 additions & 4 deletions
@@ -24,7 +24,7 @@ Schema miner pro is published on PyPI, you can install it directly:
24
24
25
25
.. code-block:: bash
26
26
27
-
pip install -i schema-miner
27
+
pip install schema-miner
28
28
29
29
This will install the latest stable release along with its dependencies.
30
30
@@ -39,12 +39,13 @@ To work with the development version or contribute to the project, clone the Git
39
39
cd schema-miner
40
40
pip install -r requirements.txt
41
41
42
-
.. hint:: This installs the package in editable mode, so changes to the source code are reflected immediately without reinstallation.
43
-
44
42
Configuration of API keys
45
43
*************************
46
44
47
-
Schema-miner uses large language models (LLMs) that require API access (e.g., OpenAI). API keys and other secrets are managed via a .env file at the project root.
45
+
Schema-miner pro uses large language models (LLMs) that require API access (e.g., OpenAI). API keys and other secrets are managed either via a ``.env`` file at the project root or with the ``EnvConfig`` class.
46
+
47
+
Configuration Using ``.env``
48
+
----------------------------
48
49
49
50
1. Copy the example configuration file:
50
51
@@ -61,6 +62,23 @@ Schema-miner uses large language models (LLMs) that require API access (e.g., Op
61
62
62
63
3. Schema-miner automatically loads these values at runtime using the provided configuration utilities.
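The steps above can be pictured with a minimal sketch of how ``KEY=VALUE`` pairs from a ``.env`` file typically reach the process environment. This is not Schema-miner's actual loader: the helper ``load_env_file`` and the variable name ``SCHEMA_MINER_DEMO_KEY`` are hypothetical and exist only for illustration.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Hypothetical helper (not part of schema-miner): parse KEY=VALUE
    lines from a .env file and export them to os.environ."""
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip('"').strip("'")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


# Demo with a throwaway file; the variable name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        '# secrets\nSCHEMA_MINER_DEMO_KEY="sk-demo"\n', encoding="utf-8"
    )
    values = load_env_file(env_path)
```

Using ``setdefault`` means a value exported in the shell wins over the file, which is a common convention for ``.env`` handling. Whether Schema-miner follows the same precedence is not stated here.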
63
64
65
+
Configuration Using ``EnvConfig``
66
+
---------------------------------
67
+
68
+
.. code-block:: python
69
+
70
+
from schema_miner.config.envConfig import EnvConfig
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner :sup:`pro` extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., QUDT).
44
+
Schema-Miner is a novel framework that leverages Large Language Models (LLMs) and continuous human feedback to automate and enhance the schema mining task. Through an iterative process, the framework uses LLMs to extract and organize properties from unstructured text and refines schemas with expert input. Schema-Miner :sup:`pro` extends Schema-Miner with an ontology grounding component powered by agentic AI. It performs multi-step reasoning using lexical heuristics and semantic similarity search, and grounds schema elements in formal ontologies (e.g., `QUDT <https://www.qudt.org/pages/HomePage.html>`_).