yurijmikhalevich · Kaos599 · Oct 14, 2025 · Oct 25, 2025 · Oct 25, 2025 · Oct 25, 2025
diff --git a/.all-contributorsrc b/.all-contributorsrc
@@ -49,6 +49,15 @@
       "contributions": [
         "code"
       ]
+    },
+    {
+      "login": "leoauri",
+      "name": "Leo Auri",
+      "avatar_url": "https://avatars.githubusercontent.com/u/10868855?v=4",
+      "profile": "http://leoauri.com",
+      "contributions": [
+        "code"
+      ]
     }
   ],
   "contributorsPerLine": 7,

diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # rclip - AI-Powered Command-Line Photo Search Tool
 <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
-[![All Contributors](https://img.shields.io/badge/all_contributors-5-orange.svg?style=flat-square)](#contributors-)
+[![All Contributors](https://img.shields.io/badge/all_contributors-6-orange.svg?style=flat-square)](#contributors-)
 <!-- ALL-CONTRIBUTORS-BADGE:END -->
 
 [[Blog]](https://mikhalevi.ch/rclip-an-ai-powered-command-line-photo-search-tool/) [[Demo on YouTube]](https://www.youtube.com/watch?v=tAJHXOkHidw) [[Paper]](https://www.thinkmind.org/index.php?view=article&articleid=content_2023_1_20_60011)
@@ -83,6 +83,8 @@ cd photos && rclip "search query"
 
 When you run **rclip** for the first time in a particular directory, it will extract features from the photos, which takes time. How long it will take depends on your CPU and the number of pictures you will search through. It took about a day to process 73 thousand photos on my NAS, which runs an old-ish Intel Celeron J3455, 7 minutes to index 50 thousand images on my MacBook with an M1 Max CPU, and three hours to process 1.28 million images on the same MacBook.
 
+When you run **rclip** in a directory that has already been processed, it will only index the new images added since the last run and remove the deleted images from its index. Renamed images are also detected automatically using perceptual hashing, so you don't need to perform a full re-index when files are renamed. This makes consecutive runs much faster.
+
 For a detailed demonstration, watch the video: https://www.youtube.com/watch?v=tAJHXOkHidw.
 
 ### Similar image search
@@ -138,6 +140,20 @@ rclip -p kitty
   ```
 </details>
 
+### How does **rclip** update the index?
+
+When you run **rclip** in a directory that has already been processed, it will
+only index the new images added since the last run and remove the deleted images
+from its index. Renamed images are also detected automatically, so you don't need
+to perform a full re-index when files are renamed. This makes consecutive runs much faster.
+
+If you know that no new images were added or deleted since the last run, you can
+use the `--no-indexing` (or `-n`) argument to skip the indexing step altogether
+and speed up the search even more.
+
+```bash
+rclip -n cat
+```
+
 ## Get help
 
 https://github.com/yurijmikhalevich/rclip/discussions/new/choose
@@ -180,6 +196,7 @@ Thanks go to these wonderful people and organizations ([emoji key](https://allco
       <td align="center" valign="top" width="14.28%"><a href="http://abidkhan484.github.io"><img src="https://avatars.githubusercontent.com/u/15053047?v=4?s=100" width="100px;" alt="AbId KhAn"/><br /><sub><b>AbId KhAn</b></sub></a><br /><a href="https://github.com/yurijmikhalevich/rclip/commits?author=abidkhan484" title="Code">💻</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://cl4r1ty.dev"><img src="https://avatars.githubusercontent.com/u/136800640?v=4?s=100" width="100px;" alt="Ben"/><br /><sub><b>Ben</b></sub></a><br /><a href="https://github.com/yurijmikhalevich/rclip/commits?author=Cl4r1ty-1" title="Code">💻</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://techtracer.pages.dev"><img src="https://avatars.githubusercontent.com/u/48885301?v=4?s=100" width="100px;" alt="Tanmay Chaudhari"/><br /><sub><b>Tanmay Chaudhari</b></sub></a><br /><a href="https://github.com/yurijmikhalevich/rclip/commits?author=tanmayc07" title="Code">💻</a></td>
+      <td align="center" valign="top" width="14.28%"><a href="http://leoauri.com"><img src="https://avatars.githubusercontent.com/u/10868855?v=4?s=100" width="100px;" alt="Leo Auri"/><br /><sub><b>Leo Auri</b></sub></a><br /><a href="https://github.com/yurijmikhalevich/rclip/commits?author=leoauri" title="Code">💻</a></td>
     </tr>
   </tbody>
 </table>

diff --git a/poetry.lock b/poetry.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "rclip"
-version = "2.0.11"
+version = "2.1.0"
 description = "AI-Powered Command-Line Photo Search Tool"
 authors = ["Yurij Mikhalevich <yurij@mikhalevi.ch>"]
 license = "MIT"
@@ -21,7 +21,8 @@ classifiers = [
 python = ">=3.10 <3.13"
 numpy = "~2.1.3"
 open_clip_torch = "^3.1.0"
-pillow = "^10.3.0"
+pillow = "^11.1.0"
+pillow-heif = "^1.1.1"
 requests = "~=2.32"
 torch = [
   { version = "==2.5.1", source = "pypi", markers = "sys_platform != 'linux' or platform_machine == 'aarch64'" },
@@ -33,6 +34,7 @@ torchvision = [
 ]
 tqdm = "^4.65.0"
 rawpy = "^0.24.0"
+imagehash = "^4.3.1"
 
 [tool.poetry.group.dev.dependencies]
 pyright = {extras = ["nodejs"], version = "^1.1.394"}

diff --git a/rclip/const.py b/rclip/const.py
@@ -6,6 +6,6 @@
 IS_WINDOWS = sys.platform == "win32" or sys.platform == "cygwin"
 
 # these images are always processed
-IMAGE_EXT = ["jpg", "jpeg", "png", "webp"]
+IMAGE_EXT = ["jpg", "jpeg", "png", "webp", "heic", "heif"]
 # RAW images are processed only if there is no processed image alongside it
 IMAGE_RAW_EXT = ["arw", "cr2"]
diff --git a/rclip/db.py b/rclip/db.py
@@ -13,14 +13,15 @@ class NewImage(ImageOmittable):
   modified_at: float
   size: int
   vector: bytes
+  hash: Optional[str] = None
 
 
 class Image(NewImage):
   id: int
 
 
 class DB:
-  VERSION = 2
+  VERSION = 3
 
   def __init__(self, filename: Union[str, pathlib.Path]):
     self._con = sqlite3.connect(filename)
@@ -61,6 +62,15 @@ def ensure_version(self):
     if db_version < 2:
       self._con.execute("ALTER TABLE images ADD COLUMN indexing BOOLEAN")
       db_version = 2
+    if db_version < 3:
+      # Check if hash column already exists (it might from old/experimental code)
+      # so we don't attempt to add it twice on databases from earlier dev builds.
+      columns = self._con.execute("PRAGMA table_info(images)").fetchall()
+      column_names = [col["name"] for col in columns]
+      if "hash" not in column_names:
+        self._con.execute("ALTER TABLE images ADD COLUMN hash TEXT")
+      # Always ensure the index exists: on fresh migrations this creates it,
+      # and on partially migrated/experimental databases it is a no-op if the
+      # index was already created by older code.
+      self._con.execute("CREATE INDEX IF NOT EXISTS hash_index ON images(hash) WHERE deleted IS NULL")
+      db_version = 3
     if db_version < self.VERSION:
       raise Exception("migration to a newer index version isn't implemented")
     if db_version_entry:
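
The version 3 migration in the hunk above can be tried in isolation. The following is a minimal sketch, assuming a simplified `images` table and a hypothetical `migrate_to_v3` helper (neither is the project's actual schema or API); it only illustrates that checking `PRAGMA table_info` before `ALTER TABLE`, combined with `CREATE INDEX IF NOT EXISTS`, makes the step safe to run more than once.

```python
import sqlite3


def migrate_to_v3(con: sqlite3.Connection) -> None:
  """Hypothetical helper: add the hash column and its partial index idempotently."""
  # PRAGMA table_info returns one row per column; "name" holds the column name.
  columns = con.execute("PRAGMA table_info(images)").fetchall()
  column_names = [col["name"] for col in columns]
  if "hash" not in column_names:
    con.execute("ALTER TABLE images ADD COLUMN hash TEXT")
  # Partial index: only rows that are not soft-deleted are indexed.
  con.execute("CREATE INDEX IF NOT EXISTS hash_index ON images(hash) WHERE deleted IS NULL")
  con.commit()


con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # needed so that col["name"] works
# Simplified stand-in for the real table (assumption for this sketch).
con.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, deleted BOOLEAN, filepath TEXT UNIQUE)")
migrate_to_v3(con)
migrate_to_v3(con)  # running it again changes nothing and raises no error
```

Note that `col["name"]` relies on the connection using `sqlite3.Row` (or an equivalent mapping) as its row factory; with the default tuple rows the column name would be at index 1 instead.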
@@ -69,27 +79,35 @@ def ensure_version(self):
       self._con.execute("INSERT INTO db_version(version) VALUES (?)", (self.VERSION,))
     self._con.commit()
 
+
   def commit(self):
     self._con.commit()
 
   def upsert_image(self, image: NewImage, commit: bool = True):
     self._con.execute(
       """
-      INSERT INTO images(deleted, indexing, filepath, modified_at, size, vector)
-      VALUES (:deleted, :indexing, :filepath, :modified_at, :size, :vector)
+      INSERT INTO images(deleted, indexing, filepath, modified_at, size, vector, hash)
+      VALUES (:deleted, :indexing, :filepath, :modified_at, :size, :vector, :hash)
       ON CONFLICT(filepath) DO UPDATE SET
-        deleted=:deleted, indexing=:indexing, modified_at=:modified_at, size=:size, vector=:vector
+        deleted=:deleted, indexing=:indexing, modified_at=:modified_at, size=:size, vector=:vector, hash=:hash
     """,
       {"deleted": None, "indexing": None, **image},
     )
     if commit:
       self._con.commit()
 
+
   def remove_indexing_flag_from_all_images(self, commit: bool = True):
     self._con.execute("UPDATE images SET indexing = NULL")
     if commit:
       self._con.commit()
 
+  def remove_indexing_flag_from_dir(self, path: str, commit: bool = True):
+    """Remove indexing flag only from images within a specific directory."""
+    self._con.execute("UPDATE images SET indexing = NULL WHERE filepath LIKE ?", (path + f"{os.path.sep}%",))
+    if commit:
+      self._con.commit()
+
   def flag_images_in_a_dir_as_indexing(self, path: str, commit: bool = True):
     self._con.execute("UPDATE images SET indexing = 1 WHERE filepath LIKE ?", (path + f"{os.path.sep}%",))
     if commit:
@@ -108,10 +126,31 @@ def remove_indexing_flag(self, filepath: str, commit: bool = True):
       self._con.commit()
 
   def get_image(self, **kwargs: Any) -> Optional[Image]:
-    query = " AND ".join(f"{key}=:{key}" for key in kwargs)
+    query_parts = [f"{key}=:{key}" for key in kwargs]
+    query_parts.append("deleted IS NULL")
+    query = " AND ".join(query_parts)
     cur = self._con.execute(f"SELECT * FROM images WHERE {query} LIMIT 1", kwargs)
     return cur.fetchone()
 
+  def get_images_by_hash(self, hash_value: str) -> list[Image]:
+    cur = self._con.execute(
+      "SELECT * FROM images WHERE hash = ? AND deleted IS NULL",
+      (hash_value,),
+    )
+    return [dict(row) for row in cur.fetchall()]
+
+  def has_indexing_images_in_dir(self, path: str) -> bool:
+    """Check if there are any images with indexing=1 flag in this directory.
+
+    Used to optimize rename detection: only compute hashes when there are
+    potential deletions to match against.
+    """
+    cur = self._con.execute(
+      "SELECT 1 FROM images WHERE filepath LIKE ? AND indexing = 1 LIMIT 1",
+      (path + f"{os.path.sep}%",)
+    )
+    return cur.fetchone() is not None
+
   def get_image_vectors_by_dir_path(self, path: str) -> sqlite3.Cursor:
     return self._con.execute(
       "SELECT filepath, vector FROM images WHERE filepath LIKE ? AND deleted IS NULL", (path + f"{os.path.sep}%",)

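The `indexing` flag helpers in this file support a mark-and-sweep pass over one directory. Below is a minimal sketch of that flow, assuming a reduced table layout and a hypothetical `dir_pattern` helper; the final `UPDATE` is an assumption about what `flag_indexing_images_in_a_dir_as_deleted` does, since its SQL is not part of this diff.

```python
import os
import sqlite3


def dir_pattern(path: str) -> str:
  # Matches every filepath below `path`. LIKE wildcards (% and _) that occur
  # in `path` itself are not escaped in this sketch.
  return path + os.path.sep + "%"


con = sqlite3.connect(":memory:")
# Reduced stand-in for the real table (assumption for this sketch).
con.execute("CREATE TABLE images (filepath TEXT UNIQUE, indexing BOOLEAN, deleted BOOLEAN)")
root = os.path.join(os.path.sep, "photos")
kept = os.path.join(root, "kept.jpg")
gone = os.path.join(root, "gone.jpg")
con.executemany("INSERT INTO images(filepath) VALUES (?)", [(kept,), (gone,)])

# 1. Mark: flag every known image under the directory.
con.execute("UPDATE images SET indexing = 1 WHERE filepath LIKE ?", (dir_pattern(root),))
# 2. Walk: unflag images that are still found on disk (only `kept` here).
con.execute("UPDATE images SET indexing = NULL WHERE filepath = ?", (kept,))
# 3. Sweep: anything still flagged was not seen, so mark it as deleted.
con.execute(
  "UPDATE images SET deleted = 1, indexing = NULL WHERE filepath LIKE ? AND indexing = 1",
  (dir_pattern(root),),
)
con.commit()

live = [row[0] for row in con.execute("SELECT filepath FROM images WHERE deleted IS NULL")]
```

Scoping steps 1 and 3 to the directory prefix is what lets separate directories be indexed without touching each other's flags.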
diff --git a/rclip/main.py b/rclip/main.py
@@ -4,11 +4,13 @@
 import sys
 import threading
 from typing import Iterable, List, NamedTuple, Optional, Tuple, TypedDict, cast
+from typing import TYPE_CHECKING
 
 import numpy as np
 from tqdm import tqdm
 import PIL
 from PIL import Image, ImageFile
+from pillow_heif import register_heif_opener
 
 from rclip import db, fs, model
 from rclip.const import IMAGE_EXT, IMAGE_RAW_EXT
@@ -18,6 +20,7 @@
 
 
 ImageFile.LOAD_TRUNCATED_IMAGES = True
+register_heif_opener()
 
 
 class ImageMeta(TypedDict):
@@ -77,17 +80,21 @@ def _index_files(self, filepaths: List[str], metas: List[ImageMeta]):
         filtered_paths.append(path)
       except PIL.UnidentifiedImageError:
         pass
       except Exception as ex:
         print(f"error loading image {path}:", ex, file=sys.stderr)
 
     try:
       features = self._model.compute_image_features(images)
     except Exception as ex:
       print("error computing features:", ex, file=sys.stderr)
       return
-    for path, meta, vector in cast(Iterable[PathMetaVector], zip(filtered_paths, metas, features)):
+    for path, meta, vector, image in cast(
+        Iterable[Tuple[str, ImageMeta, 'FeatureVector', Image.Image]],
+        zip(filtered_paths, metas, features, images)
+    ):
+      hash_value = helpers.compute_image_hash(image)
       self._db.upsert_image(
-        db.NewImage(filepath=path, modified_at=meta["modified_at"], size=meta["size"], vector=vector.tobytes()),
+        db.NewImage(filepath=path, modified_at=meta["modified_at"], size=meta["size"], vector=vector.tobytes(), hash=hash_value),
         commit=False,
       )
 
@@ -110,8 +117,9 @@ def ensure_index(self, directory: str):
       file=sys.stderr,
     )
 
-    self._db.remove_indexing_flag_from_all_images(commit=False)
-    self._db.flag_images_in_a_dir_as_indexing(directory, commit=True)
+    # Initialize indexing workflow: reset flags for this directory, then mark it for reindexing
+    self._db.remove_indexing_flag_from_dir(directory)
+    self._db.flag_images_in_a_dir_as_indexing(directory)
 
     with tqdm(total=None, unit="images") as pbar:
 
@@ -151,9 +159,49 @@ def update_total_images(count: int):
 
         image = self._db.get_image(filepath=filepath)
         if image and is_image_meta_equal(image, meta):
+          # Image hasn't changed, remove indexing flag to mark it as still present
           self._db.remove_indexing_flag(filepath, commit=False)
           continue
 
+        # Check if this might be a renamed image
+        # Only attempt rename detection if there are potential deletions to match against
+        has_potential_deletions = self._db.has_indexing_images_in_dir(directory)
+
+        if not image and has_potential_deletions:
+          # Read the image to compute its hash
+          try:
+            img = helpers.read_image(filepath)
+            current_hash = helpers.compute_image_hash(img)
+            # Look for ALL existing images with the same hash
+            existing_images_with_hash = self._db.get_images_by_hash(current_hash)
+
+            # Find an entry where the file no longer exists (true rename, not a copy)
+            existing_image_vector = None
+            for img_entry in existing_images_with_hash:
+              # Only consider images that are part of the current indexing pass (indexing == 1)
+              if img_entry.get("indexing") == 1 and not os.path.exists(img_entry["filepath"]):
+                existing_image_vector = img_entry["vector"]
+                break
+
+            if existing_image_vector:
+              # This is a renamed file - reuse the existing vector
+              # DON'T remove the indexing flag from the old filepath - we want it to be marked as deleted
+              # Create a new entry for the new filepath
+              self._db.upsert_image(
+                db.NewImage(
+                  filepath=filepath,
+                  modified_at=meta["modified_at"],
+                  size=meta["size"],
+                  vector=existing_image_vector,
+                  hash=current_hash
+                ),
+                commit=False,
+              )
+              self._db.remove_indexing_flag(filepath, commit=False)
+              continue
+          except (PIL.UnidentifiedImageError, OSError, ValueError):
+            # If we can't read the image, fall through to normal indexing
+            pass
+
         batch.append(filepath)
         metas.append(meta)
 
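The candidate selection in the rename check above can be expressed as a small pure function, which makes the rule easier to test. This is a sketch only: `find_rename_source` and the injected `path_exists` callable are hypothetical names, and the entries are plain dictionaries standing in for database rows.

```python
from typing import Callable, Iterable, Mapping, Optional


def find_rename_source(
  candidates: Iterable[Mapping[str, object]],
  path_exists: Callable[[str], bool],
) -> Optional[Mapping[str, object]]:
  """Return the first same-hash entry that is still flagged and missing on disk."""
  for entry in candidates:
    # indexing == 1 means the entry has not been seen in the current pass;
    # a missing file means it is a rename source rather than a copy.
    if entry.get("indexing") == 1 and not path_exists(str(entry["filepath"])):
      return entry
  return None


entries = [
  {"filepath": "/photos/copy.jpg", "indexing": None, "vector": b"a"},  # already visited
  {"filepath": "/photos/still-here.jpg", "indexing": 1, "vector": b"b"},  # not visited yet, but on disk
  {"filepath": "/photos/old-name.jpg", "indexing": 1, "vector": b"c"},  # flagged and gone
]
on_disk = {"/photos/copy.jpg", "/photos/still-here.jpg", "/photos/new-name.jpg"}
source = find_rename_source(entries, on_disk.__contains__)
```

Passing `os.path.exists` as `path_exists` would give the behavior of the loop in the diff; injecting it keeps the sketch independent of the filesystem.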
@@ -165,10 +213,13 @@ def update_total_images(count: int):
       if len(batch) != 0:
         self._index_files(batch, metas)
 
+      # Finalize indexing workflow: mark any remaining indexing=1 entries as deleted
+      # These are files that no longer exist (e.g., old paths of renamed files)
+      self._db.flag_indexing_images_in_a_dir_as_deleted(directory)
+
       self._db.commit()
       counter_thread.join()
 
-    self._db.flag_indexing_images_in_a_dir_as_deleted(directory)
     print("", file=sys.stderr)
 
   def search(
@@ -245,6 +296,7 @@ def init_rclip(
 
   return rclip, model_instance, database
 
+
 def print_results(result: List[RClip.SearchResult], args: helpers.argparse.Namespace):
   # if we are not outputting to console on windows, ensure unicode encoding is correct
   if not sys.stdout.isatty() and os.name == "nt":

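The rename lookup in this PR matches hashes by string equality (`WHERE hash = ?`), not by distance. For contrast, the sketch below shows what each comparison means for hex-encoded hashes. It assumes 64-bit hashes rendered as 16 hex characters; the hash values are made up for illustration and `hamming_distance` is a hypothetical helper, not part of rclip.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
  """Count the differing bits between two equal-length hex-encoded hashes."""
  if len(hex_a) != len(hex_b):
    raise ValueError("hashes must have the same length")
  return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Made-up hash values, for illustration only.
original = "c3d1e0f0a5b49687"
renamed = "c3d1e0f0a5b49687"  # same content, so the same hash
recompressed = "c3d1e0f0a5b49686"  # differs from the original in the lowest bit

exact_match = original == renamed  # what an equality lookup accepts
near_match = original == recompressed  # rejected by equality
distance = hamming_distance(original, recompressed)  # what a threshold check would look at
```

With equality matching, a file that was renamed and also re-encoded would fall through to normal indexing, which is the conservative outcome.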
diff --git a/rclip/utils/helpers.py b/rclip/utils/helpers.py
@@ -10,6 +10,7 @@
 import requests
 import sys
 from importlib.metadata import version
+import imagehash
 
 from rclip.const import IMAGE_RAW_EXT, IS_LINUX, IS_MACOS, IS_WINDOWS
 
@@ -101,7 +102,10 @@ def init_arg_parser() -> argparse.ArgumentParser:
     "get help:\n"
     "  https://github.com/yurijmikhalevich/rclip/discussions/new/choose\n\n",
   )
-  version_str = f"rclip {version('rclip')}"
+  try:
+    version_str = f"rclip {version('rclip')}"
+  except Exception:  # PackageNotFoundError when not installed via package manager
+    version_str = "rclip (development)"
   parser.add_argument("--version", "-v", action="version", version=version_str, help=f'prints "{version_str}"')
   parser.add_argument("query", help="a text query or a path/URL to an image file")
   parser.add_argument(
@@ -230,6 +234,18 @@ def read_raw_image_file(path: str):
   return Image.fromarray(np.array(rgb))
 
 
+def compute_image_hash(image: Image.Image) -> str:
+  """Compute a perceptual hash (pHash) for an image.
+
+  The pHash algorithm generates a compact, deterministic fingerprint of an
+  image's visual content. Identical visual content will produce identical
+  hashes, and near-duplicate content (e.g. resized or recompressed versions
+  of the same picture) will typically produce very similar hashes.
+
+  In this project, the returned hash is used for exact duplicate detection
+  via equality comparison (for example, to detect renamed files that contain
+  the same underlying image data), not for general similarity matching with
+  a distance threshold.
+  """
+  return str(imagehash.phash(image))
+
+
 def read_image(query: str) -> Image.Image:
   path = str.removeprefix(query, "file://")
   try: