MLCIL · j-adamczyk · Dec 27, 2025 · Dec 20, 2025
@@ -12,17 +12,17 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
 
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.10"
 
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
 
       # set version (e.g. 1.2.3) from the latest Git tag on the master branch
       - name: Set package release version

@@ -22,17 +22,17 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
 
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.10"
 
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
 
       # set version (e.g. 1.2.3) from the latest Git tag on the master branch
       - name: Set package release version

@@ -14,29 +14,28 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # specific version in 3.13 due to bug https://github.com/python/cpython/issues/138031
-        python-version: ["3.10", "3.11", "3.12", "3.13.6"]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
         os: [macos-latest, ubuntu-latest, windows-latest]
     runs-on: ${{ matrix.os }}
     steps:
       - name: Checkout repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
 
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: ${{ matrix.python-version }}
 
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
 
-      - uses: actions/cache@v4
+      - uses: actions/cache@v5
         name: Cache venv
         with:
           path: ./.venv
           key: ${{ matrix.os }}-venv-${{ matrix.python-version }}-${{ hashFiles('**/uv.lock') }}
 
-      - uses: actions/cache@v4
+      - uses: actions/cache@v5
         name: Cache datasets
         with:
           path: ~/scikit_learn_data

@@ -9,22 +9,9 @@ repos:
         language: system
         pass_filenames: false
 
-  - repo: https://github.com/pypa/pip-audit
-    rev: v2.9.0
-    hooks:
-      - id: pip-audit
-        args: [
-          --vulnerability-service, "pypi",
-          --cache-dir, ".pip_audit_cache",
-          # false alert for setuptools, we have a much newer version
-          --ignore-vuln, "GHSA-5rjg-fvgr-3xxf",
-          # false alert for pip
-          --ignore-vuln, "GHSA-4xh5-x5gv-qwph"
-        ]
-
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.13.1
     hooks:
       - id: ruff-check  # linter
-        args: [ --fix ]
+        args: [ --fix, --exit-zero ]
       - id: ruff-format  # formatter
@@ -53,12 +53,14 @@ Main features:
 
 |             | `python3.10` | `python3.11` | `python3.12` | `python3.13` |
 |:-----------:|:------------:|:------------:|:------------:|:------------:|
-|  **Linux**  |       ✅      |       ✅      |       ✅      |       ✅      |
-| **Windows** |       ✅      |       ✅      |       ✅      |       ✅      |
-|  **macOS**  |       ✅      |       ✅      |       ✅      |       ✅      |
+|  **Linux**  |       ✅      |       ✅      |       ✅      |      ✅       |
+| **Windows** |       ✅      |       ✅      |       ✅      |      ✅       |
+|  **macOS**  |       ✅      |       ✅      |       ✅      |      ✅       |
 
 Python 3.9 was supported up to scikit-fingerprints 1.13.0.
 
+Python 3.13 is officially supported, but underlying libraries may not be fully compatible yet.
+
 ## Installation
 
 You can install the library using pip:
@@ -159,7 +161,7 @@ Examples and tutorials:
 
 ## Project overview
 
-`scikit-fingerprint` brings molecular fingerprints and related functionalities into
+`scikit-fingerprints` brings molecular fingerprints and related functionalities into
 the scikit-learn ecosystem. With familiar class-based design and `.transform()` method,
 fingerprints can be computed from SMILES strings or RDKit `Mol` objects. Resulting NumPy
 arrays or SciPy sparse arrays can be directly used in ML pipelines.
@@ -216,13 +218,16 @@ Publications using scikit-fingerprints:
 1. [J. Adamczyk, W. Czech "Molecular Topological Profile (MOLTOP) - Simple and Strong Baseline for Molecular Graph Classification" ECAI 2024](https://ebooks.iospress.nl/doi/10.3233/FAIA240663)
 2. [J. Adamczyk, P. Ludynia "Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python" SoftwareX](https://www.sciencedirect.com/science/article/pii/S2352711024003145)
 3. [J. Adamczyk, P. Ludynia, W. Czech "Molecular Fingerprints Are Strong Models for Peptide Function Prediction" ArXiv preprint](https://arxiv.org/abs/2501.17901)
-4. [M. Fitzner et al. "BayBE: a Bayesian Back End for experimental planning in the low-to-no-data regime" RSC Digital Discovery](https://pubs.rsc.org/en/content/articlehtml/2025/dd/d5dd00050e)
-5. [J. Xiong "Bridging 3D Molecular Structures and Artificial Intelligence by a Conformation Description Language"](https://www.biorxiv.org/content/10.1101/2025.05.07.652440v1.abstract)
+4. [J. Adamczyk "Towards Rational Pesticide Design with Graph Machine Learning Models for Ecotoxicology" CIKM 2025](https://dl.acm.org/doi/abs/10.1145/3746252.3761660)
+5. [J. Adamczyk, J. Poziemski, F. Job, M. Król, M. Makowski "MolPILE - large-scale, diverse dataset for molecular representation learning" ArXiv preprint](https://arxiv.org/abs/2509.18353)
+6. [M. Fitzner et al. "BayBE: a Bayesian Back End for experimental planning in the low-to-no-data regime" RSC Digital Discovery](https://pubs.rsc.org/en/content/articlehtml/2025/dd/d5dd00050e)
+7. [J. Xiong et al. "Bridging 3D Molecular Structures and Artificial Intelligence by a Conformation Description Language"](https://www.biorxiv.org/content/10.1101/2025.05.07.652440v1.abstract)
+8. [S. Mavlonazarova et al. "Untargeted Metabolomics Reveals Organ-Specific and Extraction-Dependent Metabolite Profiles in Endemic Tajik Species Ferula violacea Korovin" bioRxiv preprint](https://www.biorxiv.org/content/10.1101/2025.08.24.671964v1)
 
 ## Contributing
 
 Please read [CONTRIBUTING.md](CONTRIBUTING.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for details on our code of
-conduct, and the process for submitting pull requests to us.
+conduct and the process for submitting pull requests.
 
 ## License
 

@@ -32,7 +32,7 @@ dependencies = [
     "numba<1",
     "numpy>=1.20.0,<3",
     "pandas<3",
-    "rdkit<=2025.3.6",
+    "rdkit<=2025.9.3",
     "scikit-learn>=1.0.0,<2",
     "scipy>=1.0.0,<2",
     "tqdm>=4.0.0,<5"
@@ -49,21 +49,18 @@ dev = [
     "coverage",
     "jupyter",
     "mypy",
-    "pip-audit",
     "pre-commit",
     "pytest",
     "pytest-cov",
     "pytest-rerunfailures",
     "ruff",
     "setuptools>=80",
-    "xenon"
+    "scipy-stubs",
 ]
 
 test = [
     "mypy",
     "ruff",
-    "xenon",
-    "pip-audit",
     "pre-commit",
     "pytest",
     "pytest-rerunfailures"
@@ -73,7 +70,7 @@ docs = [
     "ipython",
     "nbsphinx",
     "pydata-sphinx-theme",
-    "scikit-learn!=1.7.1",  # due to scikit-learn docs issue: https://github.com/microsoft/lightgbm/issues/6978
+    "scikit-learn!=1.7.1", # due to scikit-learn docs issue: https://github.com/microsoft/lightgbm/issues/6978
     "sphinx",
     "sphinx-copybutton"
 ]
@@ -95,6 +92,15 @@ filterwarnings = [
     "ignore:Function auroc_score.*:FutureWarning"
 ]
 
+[tool.mypy]
+python_version = "3.10"
+check_untyped_defs = true  # check all functions, this fixes some tests
+allow_redefinition = true  # we redefine variables a lot for efficiency
+# most libraries used are not properly typed in Python, particularly RDKit
+ignore_missing_imports = true
+disable_error_code = ["import-untyped"]
+no_site_packages = true
+
 [tool.uv.build-backend]
 module-name = "skfp"
 module-root = ""
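
The `allow_redefinition = true` setting in the new `[tool.mypy]` block is easiest to see with a small example. The sketch below is illustrative only: `token_lengths` is a hypothetical function, not code from this repository. As I understand the option, mypy accepts rebinding a name to a value of a different type when the redefinition happens in the same block and nesting level as the original definition, and the name has been used before the redefinition.

```python
def token_lengths(items: list[str]) -> list[int]:
    """Return the length of each string (hypothetical helper)."""
    # `items` starts as list[str]. Rebinding it to a list[int] in the same
    # block is the pattern that allow_redefinition is meant to permit;
    # without the option, mypy would report an incompatible assignment.
    items = [len(item) for item in items]
    return items


print(token_lengths(["CCO", "c1ccccc1"]))
```

The code runs the same either way; the option only changes what the type checker accepts.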

@@ -15,18 +15,18 @@ class BoundingBoxADChecker(BaseADChecker):
     This creates a "bounding box" using their extreme values, and new molecules
     should lie in this distribution, i.e. have properties in the same ranges [1]_.
 
-    Typically, physicochemical properties (continous features) are used as inputs.
+    Typically, physicochemical properties (continuous features) are used as inputs.
     Consider scaling, normalizing, or transforming them before computing AD to lessen
     effects of outliers, e.g. with ``PowerTransformer`` or ``RobustScaler``. This is
-    particularly important if ``"three_sigma"`` is used as percentile bound, as it
+    particularly important if ``"three_sigma"`` is used as the percentile bound, as it
     assumes normal distribution.
 
     By default, the full range of training descriptors is allowed as AD. For a stricter
     check, use the ``percentile_lower`` and ``percentile_upper`` arguments to disallow
     extremely low or high values, respectively. For a looser check, use ``num_allowed_violations``
     to allow a number of descriptors to lie outside the given ranges.
 
-    This method scales very well with both number of samples and features.
+    This method scales very well with both the number of samples and features.
 
     Parameters
     ----------
@@ -42,7 +42,7 @@ class BoundingBoxADChecker(BaseADChecker):
         uses 3 standard deviations from the mean, a common rule-of-thumb for outliers
         assuming the normal distribution.
 
-    num_allowed_violations : bool, default=0
+    num_allowed_violations : int, default=0
         Number of allowed violations of feature ranges. By default, all descriptors
         must lie inside the bounding box.
 
@@ -85,16 +85,16 @@ class BoundingBoxADChecker(BaseADChecker):
 
     _parameter_constraints: dict = {
         **BaseADChecker._parameter_constraints,
-        "percentile_lower": [Interval(Real, 0, 100, closed="both")],
-        "percentile_upper": [Interval(Real, 0, 100, closed="both")],
+        "percentile_lower": [Interval(Real, 0, 100, closed="both"), "three_sigma"],
+        "percentile_upper": [Interval(Real, 0, 100, closed="both"), "three_sigma"],
         "num_allowed_violations": [Interval(Integral, 0, None, closed="left")],
     }
 
     def __init__(
         self,
         percentile_lower: float | str = 0,
         percentile_upper: float | str = 100,
-        num_allowed_violations: int | None = 0,
+        num_allowed_violations: int = 0,
         n_jobs: int | None = None,
         verbose: int | dict = 0,
     ):
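
For readers of this hunk, the bounding box idea described in the docstring can be sketched in plain Python. This is not the `BoundingBoxADChecker` implementation: `fit_bounds` and `in_domain` are hypothetical helpers, the percentile options are left out, and the use of the population standard deviation for the three-sigma rule is an assumption.

```python
from statistics import mean, pstdev


def fit_bounds(X_train, three_sigma=False):
    """Return per-feature (lower, upper) bounds from training rows."""
    columns = list(zip(*X_train))  # transpose rows into feature columns
    if three_sigma:
        # mean +/- 3 standard deviations; assumes roughly normal features
        return [
            (mean(c) - 3 * pstdev(c), mean(c) + 3 * pstdev(c)) for c in columns
        ]
    # default: the full observed range of each feature
    return [(min(c), max(c)) for c in columns]


def in_domain(x, bounds, num_allowed_violations=0):
    """True if at most `num_allowed_violations` features fall outside bounds."""
    violations = sum(not (lo <= v <= hi) for v, (lo, hi) in zip(x, bounds))
    return violations <= num_allowed_violations


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
bounds = fit_bounds(X_train)
print(in_domain([2.5, 15.0], bounds))  # both features inside their ranges
print(in_domain([2.5, 99.0], bounds))  # second feature outside its range
print(in_domain([2.5, 99.0], bounds, num_allowed_violations=1))
```

Here `num_allowed_violations` is a count of features, which is why the docstring type is corrected from `bool` to `int` in this change.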

@@ -35,11 +35,11 @@ class ConvexHullADChecker(BaseADChecker):
         & 1^T \lambda = 1,\\
         & \lambda_i \geq 0 \text{  for all  } i=1,...,n
 
-    Typically, physicochemical properties (continous features) are used as inputs.
+    Typically, physicochemical properties (continuous features) are used as inputs.
     Consider scaling, normalizing, or transforming them before computing AD to lessen
     effects of outliers, e.g. with ``PowerTransformer`` or ``RobustScaler``.
 
-    This method scales very badly with both number of samples and features. It has
+    This method scales very badly with both the number of samples and features. It has
     quadratic scaling :math:`O(n^2)` in number of samples, and can be realistically run
     on at most 1000-3000 molecules. Its geometry also breaks down above ~10 features,
     marking everything as outside AD.

@@ -63,7 +63,7 @@ class DistanceToCentroidADChecker(BaseADChecker):
     data centroid, i.e. the average (middle) point [1]_. New molecules should lie
     inside the hypersphere of a given radius (distance) from that centroid.
 
-    Typically, physicochemical properties (continous features) are used as inputs.
+    Typically, physicochemical properties (continuous features) are used as inputs.
     Consider scaling, normalizing, or transforming them before computing AD to lessen
     effects of outliers, e.g. with ``PowerTransformer`` or ``RobustScaler``.
 
@@ -129,7 +129,7 @@ class DistanceToCentroidADChecker(BaseADChecker):
     _parameter_constraints: dict = {
         **BaseADChecker._parameter_constraints,
         "threshold": [Interval(Real, 0, None, closed="neither"), StrOptions({"auto"})],
-        "distance": [
+        "metric": [
             callable,
             StrOptions(SCIPY_METRIC_NAMES | SKFP_METRIC_NAMES | SKFP_BULK_METRIC_NAMES),
         ],
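
The constraint key renamed here (`"distance"` to `"metric"`) accepts either a callable or a metric name. A minimal sketch of the distance-to-centroid check with a callable metric, in plain Python, might look as follows. `fit_centroid`, `in_domain`, and `manhattan` are hypothetical names and not the library's API, and the handling of `threshold="auto"` is not shown.

```python
import math


def fit_centroid(X_train):
    """Return the per-feature mean of the training rows."""
    return [sum(c) / len(c) for c in zip(*X_train)]


def in_domain(x, centroid, threshold, metric=math.dist):
    """True if `x` lies within `threshold` of the centroid under `metric`."""
    return metric(x, centroid) <= threshold


def manhattan(a, b):
    """Example of a user-supplied callable metric (L1 distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))


X_train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
centroid = fit_centroid(X_train)
print(centroid)
print(in_domain([1.5, 1.0], centroid, threshold=1.0))
print(in_domain([1.5, 1.5], centroid, threshold=0.9, metric=manhattan))
```

The same point can be inside the domain under one metric and outside under another, so the threshold only has meaning together with the chosen metric.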

@@ -16,7 +16,7 @@ class HotellingT2TestADChecker(BaseADChecker):
     Mahalanobis distance of a new sample from the mean of the training data, scaled
     by the covariance structure of the training data.
 
-    Typically, physicochemical properties (continous features) are used as inputs.
+    Typically, physicochemical properties (continuous features) are used as inputs.
     Consider scaling, normalizing, or transforming them before computing AD to lessen
     effects of outliers, e.g. with ``PowerTransformer`` or ``RobustScaler``. In case
     of Hotelling's T^2 test, using PCA beforehand to obtain orthogonal features is