11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,16 @@
## Changelog

### v1.4.7 (October 1, 2025)
- add custom LLM-based learner
- add Falcon-H and Mistral-Small custom AutoLLMs.
- add custom LLM documentation.
- minor bug fixes and improvements in documentation and code.

### v1.4.6 (September 22, 2025)
- add type annotation to metrics
- add minor fix to retriever taxonomy discovery
- add count metrics in evaluation.

### v1.4.5 (September 16, 2025)
- add batch retriever feature to `AutoRetrieverLearner`

2 changes: 1 addition & 1 deletion CITATION.cff
@@ -31,5 +31,5 @@ keywords:
- Large Language Models
- Text-to-ontology
license: MIT
version: 1.4.5
version: 1.4.7
date-released: '2025'
150 changes: 150 additions & 0 deletions docs/source/learners/llm.rst
@@ -135,5 +135,155 @@ The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that
# Print all returned outputs (include predictions)
print(outputs)


Custom AutoLLM
-----------------

OntoLearner provides a default ``AutoLLM`` wrapper for handling popular model families (Mistral, Llama, Qwen, etc.) through HuggingFace or external providers. However, in some cases you may want to integrate a model family that is not natively supported (e.g., Falcon, DeepSeek, or a proprietary LLM).

For this, you can extend the ``AutoLLM`` class and implement the required
``load`` and ``generate`` methods. Basic requirements are:

1. Inherit from ``AutoLLM``
2. Implement ``load(model_id)`` if your model requires a different loading procedure (for example, `mistralai/Mistral-Small-3.2-24B-Instruct-2506 <https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506>`_ is loaded differently).
3. Implement ``generate(inputs, max_new_tokens)`` to encode prompts, run generation, decode outputs, and map them to labels.

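Putting these requirements together, a minimal skeleton might look like the following. This is an illustrative sketch only: it assumes the attributes used by the bundled integrations (``self.tokenizer``, ``self.model``, ``self.device``, ``self.token``, ``self.label_mapper``) and loads a standard HuggingFace causal LM. The Falcon-H and Mistral-Small tabs below show complete, concrete implementations.

.. code-block:: python

    from typing import List

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    from ontolearner import AutoLLM


    class MyCustomLLM(AutoLLM):
        """Skeleton for integrating a model family that is not natively supported."""

        def load(self, model_id: str) -> None:
            # Load tokenizer and model from the HuggingFace Hub.
            self.tokenizer = AutoTokenizer.from_pretrained(model_id, token=self.token)
            self.model = AutoModelForCausalLM.from_pretrained(
                model_id,
                device_map="cpu" if self.device == "cpu" else "auto",
                torch_dtype=torch.bfloat16,
                token=self.token,
            )
            # Make sure padding is defined so batched prompts can be encoded.
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token
            # Fit the label mapper so generated text can be mapped to labels.
            self.label_mapper.fit()

        def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
            # Encode prompts, generate continuations, decode them, and map to labels.
            encoded = self.tokenizer(
                inputs, return_tensors="pt", padding=True, truncation=True
            ).to(self.model.device)
            outputs = self.model.generate(
                **encoded,
                max_new_tokens=max_new_tokens,
                pad_token_id=self.tokenizer.pad_token_id,
            )
            # Keep only the newly generated tokens, not the prompt.
            generated = outputs[:, encoded["input_ids"].shape[1]:]
            decoded = [
                self.tokenizer.decode(g, skip_special_tokens=True).strip()
                for g in generated
            ]
            return self.label_mapper.predict(decoded)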

.. tab:: Falcon-H

The following example shows how to build a Falcon integration:

::

from ontolearner import AutoLLM
from typing import List
import torch

class FalconLLM(AutoLLM):

def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
encoded_inputs = self.tokenizer(
inputs,
return_tensors="pt",
padding=True,
truncation=True
).to(self.model.device)

input_ids = encoded_inputs["input_ids"]
input_length = input_ids.shape[1]

outputs = self.model.generate(
input_ids,
max_new_tokens=max_new_tokens,
pad_token_id=self.tokenizer.eos_token_id
)

generated_tokens = outputs[:, input_length:]
decoded_outputs = [
self.tokenizer.decode(g, skip_special_tokens=True).strip()
for g in generated_tokens
]

return self.label_mapper.predict(decoded_outputs)

.. tab:: Mistral-Small

For Mistral, you can integrate the official ``mistral-common`` tokenizer and chat completion interface:

::

from ontolearner import AutoLLM
from typing import List
import torch

class MistralLLM(AutoLLM):

def load(self, model_id: str) -> None:
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.models.modeling_mistral import Mistral3ForConditionalGeneration

self.tokenizer = MistralTokenizer.from_hf_hub(model_id)

device_map = "cpu" if self.device == "cpu" else "balanced"
self.model = Mistral3ForConditionalGeneration.from_pretrained(
model_id,
device_map=device_map,
torch_dtype=torch.bfloat16,
token=self.token
)

if not hasattr(self.tokenizer, "pad_token_id") or self.tokenizer.pad_token_id is None:
self.tokenizer.pad_token_id = self.model.generation_config.eos_token_id

self.label_mapper.fit()

def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
from mistral_common.protocol.instruct.messages import ChatCompletionRequest

tokenized_list = []
for prompt in inputs:
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
tokenized = self.tokenizer.encode_chat_completion(ChatCompletionRequest(messages=messages))
tokenized_list.append(tokenized.tokens)

# Pad inputs and create attention masks
max_len = max(len(tokens) for tokens in tokenized_list)
input_ids, attention_masks = [], []
for tokens in tokenized_list:
pad_length = max_len - len(tokens)
input_ids.append(tokens + [self.tokenizer.pad_token_id] * pad_length)
attention_masks.append([1] * len(tokens) + [0] * pad_length)

input_ids = torch.tensor(input_ids).to(self.model.device)
attention_masks = torch.tensor(attention_masks).to(self.model.device)

outputs = self.model.generate(
input_ids=input_ids,
attention_mask=attention_masks,
eos_token_id=self.model.generation_config.eos_token_id,
pad_token_id=self.tokenizer.pad_token_id,
max_new_tokens=max_new_tokens,
)

decoded_outputs = []
for i, tokens in enumerate(outputs):
output_text = self.tokenizer.decode(tokens[len(tokenized_list[i]):])
decoded_outputs.append(output_text)

return self.label_mapper.predict(decoded_outputs)


Once your custom class is defined, you can pass it into ``AutoLLMLearner``:

.. code-block:: python

from ontolearner import AutoLLMLearner, LabelMapper, StandardizedPrompting

falcon_learner = AutoLLMLearner(
prompting=StandardizedPrompting,
label_mapper=LabelMapper(),
llm=FalconLLM, # 👈 plug in custom Falcon
token="...",
device="cuda"
)

falcon_learner.llm.load(model_id="tiiuae/Falcon-H1-1.5B-Deep-Instruct")

# Train and evaluate
falcon_learner.fit(train_data, task="term-typing")
predictions = falcon_learner.predict(test_data, task="term-typing")

print(predictions)

The following models have specialized implementations within OntoLearner:

- To use `mistralai/Mistral-Small-3.2-24B-Instruct-2506 <https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506>`_, use ``MistralLLM`` instead of ``AutoLLM``.
- To use the `Falcon-H` series of LLMs (e.g., `tiiuae/Falcon-H1-1.5B-Deep-Instruct <https://huggingface.co/tiiuae/Falcon-H1-1.5B-Deep-Instruct>`_), use ``FalconLLM`` instead of ``AutoLLM``. A usage sketch follows this list.

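For example, a minimal sketch of plugging the bundled Mistral wrapper into ``AutoLLMLearner``, mirroring the Falcon example above (the exact import path of ``MistralLLM`` is an assumption; check the package for the correct module):

.. code-block:: python

    from ontolearner import AutoLLMLearner, LabelMapper, StandardizedPrompting
    # Assumption: MistralLLM is exported from the top-level package; adjust if needed.
    from ontolearner import MistralLLM

    mistral_learner = AutoLLMLearner(
        prompting=StandardizedPrompting,
        label_mapper=LabelMapper(),
        llm=MistralLLM,  # plug in the bundled Mistral wrapper
        token="...",
        device="cuda"
    )

    mistral_learner.llm.load(model_id="mistralai/Mistral-Small-3.2-24B-Instruct-2506")

    mistral_learner.fit(train_data, task="term-typing")
    predictions = mistral_learner.predict(test_data, task="term-typing")
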
.. note::

You can implement as many custom AutoLLM classes as needed (e.g., for proprietary APIs, local models, or new HF releases). As long as they subclass ``AutoLLM`` and implement ``load`` + ``generate``, they will work seamlessly with ``AutoLLMLearner``.


.. hint::
See `Learning Tasks <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_ for possible tasks within Learners.