Conversation
lorossi commented Dec 24, 2025
- Added image caching
- Images are now downloaded by the Reddit instance instead of simply being loaded from their URL
Pull request overview
This PR implements local image caching for the Telegram bot, transitioning from serving images directly via URLs to downloading and caching them locally. The main changes include downloading Reddit images to a local cache folder, serving images from file paths instead of URLs, and renaming the getUrl() method to getPhoto() to reflect this architectural change.
Key Changes:
- Implemented image downloading and local caching in the Reddit module with new async file operations
- Added cache folder configuration and management (creation/deletion)
- Updated the Telegram module to use getPhoto() instead of getUrl()
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| corgos_telegram_bot/modules/reddit.py | Core implementation: added image download logic, cache folder management, and file-based queue system |
| corgos_telegram_bot/modules/telegram.py | Updated to call getPhoto() instead of getUrl() and renamed variables from url to photo |
| tests/test_reddit.py | Updated test to use new getPhoto() method name |
| settings.json.example | Added reddit_cache_folder configuration parameter and normalized golden corgo URL placeholder |
| tests/data/reddit_unit_test_settings.json | Added reddit_cache_folder test configuration |
| tests/data/telegram_unit_test_settings.json | Added reddit_cache_folder test configuration |
| corgos_telegram_bot/main.py | Changed default logging level from INFO to WARNING |
| .gitignore | Added .reddit_cache/ to ignore cached images |
Note: Several critical issues were identified related to concurrency, error handling, and resource management that should be addressed before merging. See detailed comments below.
```python
    def __del__(self) -> None:
        """Destructor to clean up resources."""
```
The destructor is defined but empty. If resources need cleanup on deletion, this should either be removed or properly implemented. Given that async cleanup is needed for the temp folder and Reddit connection, destructors are not appropriate for async cleanup - consider relying on explicit stop() calls instead.
Suggested change:

```diff
-    def __del__(self) -> None:
-        """Destructor to clean up resources."""
```
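A minimal sketch of the explicit-cleanup pattern this comment recommends, assuming a hypothetical `stop()` coroutine that owns the async teardown (the `_deleteTempFolder` name is taken from this PR; the rest is illustrative):

```python
import asyncio
import logging


class Reddit:
    """Sketch: rely on an explicit async stop() instead of __del__."""

    def __init__(self) -> None:
        self._stopped = False

    async def _deleteTempFolder(self) -> None:
        # Placeholder for the cache-folder cleanup implemented in this PR.
        logging.info("Deleting temporary folder for caching images")

    async def stop(self) -> None:
        # Idempotent: safe to call more than once, e.g. from a finally block
        # and again from a signal handler.
        if self._stopped:
            return
        await self._deleteTempFolder()
        self._stopped = True


async def main() -> None:
    reddit = Reddit()
    try:
        pass  # ... run the bot ...
    finally:
        await reddit.stop()  # cleanup is explicit, not left to __del__


asyncio.run(main())
```

Because `__del__` cannot await anything, an explicit `stop()` called from the shutdown path is the only reliable place for async cleanup.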
```diff
     async def _downloadImage(self, url: str, filepath: str) -> None:
         """Download an image from the specified url to the specified filepath.

         Args:
             url (str): url of the image
             filepath (str): filepath to save the image

         """
-        async with self._http_requests_semaphore:
-            logging.debug(f"Requesting url {url}")
+        logging.debug(f"Downloading image from url {url} to filepath {filepath}")
         try:
             async with aiohttp.ClientSession() as session:
-                async with session.get(url) as response:
-                    logging.debug(f"Request to url {url} completed")
-                    return response
+                async with session.get(url) as resp:
+                    if resp.headers["content-type"] not in self._image_formats:
+                        logging.debug(f"Url {url} is not an image, skipping download")
+                        return
+
+                    async with aiofiles.open(filepath, "wb") as f:
+                        async for chunk in resp.content.iter_chunked(1024):
+                            await f.write(chunk)
         except Exception as e:
             logging.error(f"Cannot download image from url {url}, error {e}")
```
The _http_requests_semaphore that was previously used to limit concurrent HTTP requests is no longer being used in _downloadImage. This could lead to too many concurrent downloads. Consider wrapping the download logic with async with self._http_requests_semaphore: to maintain the rate limiting.
```python
                logging.debug("Downloading image %d/%d from post", x + 1, len(urls))
                filepath = await self._extractFilenameFromUrl(x, url)
                await self._downloadImage(url, filepath)
                self._temp_queue.add(filepath)
```
The _temp_queue.add(filepath) operation is not protected by the _temp_queue_lock. This creates a race condition where multiple concurrent tasks could modify the set simultaneously, potentially leading to data corruption or lost entries. Wrap this operation with async with self._temp_queue_lock:.
Suggested change:

```diff
-                self._temp_queue.add(filepath)
+                async with self._temp_queue_lock:
+                    self._temp_queue.add(filepath)
```
```diff
         if randint(1, 1000) == 1:
             # if we are lucky enough, we get a golden corgo!
-            url = await self._settings.get("telegram_golden_corgo_url")
+            photo = await self._settings.get("telegram_golden_corgo_url")
```
The variable name was changed from url to photo, but telegram_golden_corgo_url still contains a URL (as evidenced by the _botCheckCommand which manipulates it as a URL with .replace(".jpg", "s.jpg")). The Telegram send_photo API can accept both URLs and file paths, but the inconsistent naming (calling it photo when it's still a URL) may be confusing. Consider either: 1) keeping the variable name as url for clarity, or 2) if golden corgo images should also be cached, update the logic to download and cache the golden corgo as well.
Suggested change:

```diff
-            photo = await self._settings.get("telegram_golden_corgo_url")
+            golden_corgo_url = await self._settings.get("telegram_golden_corgo_url")
+            photo = golden_corgo_url
```
```python
    _settings_path: str
    _image_formats: tuple[str, ...] = (
        "image/png",
        "image/jpeg",
```
The _image_formats tuple was updated to include "image/jpg", but "image/jpg" is not a standard MIME type. The correct MIME type for JPEG images is "image/jpeg". While some servers might return "image/jpg", it's non-standard. Consider removing "image/jpg" or adding a comment explaining why it's needed if you've encountered servers that use this non-standard type.
Suggested change:

```diff
         "image/jpeg",
+        # "image/jpg" is non-standard but included for compatibility with servers
+        # that incorrectly use this MIME type for JPEG images.
```
```python
    async def _deleteTempFolder(self) -> None:
        """Delete the temporary folder for caching images."""
        logging.info("Deleting temporary folder for caching images")
        cache_folder = await self._settings.get("reddit_cache_folder")
        for filename in await aiofiles.os.listdir(cache_folder):
            filepath = path.join(cache_folder, filename)
            await aiofiles.os.remove(filepath)
        await aiofiles.os.rmdir(cache_folder)
        logging.info(f"Temporary folder {cache_folder} deleted")
```
The _deleteTempFolder method will fail if any files remain in the cache folder (e.g., if a download is in progress or if the folder contains unexpected files). The cleanup loop at lines 74-76 only removes files, not subdirectories. If the cache folder somehow contains subdirectories, rmdir will fail. Consider adding error handling or using rmtree for more robust cleanup, or document that the cache folder must not contain subdirectories.
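A sketch of the more robust cleanup suggested here: `shutil.rmtree` removes nested subdirectories and (with `ignore_errors=True`) tolerates a missing folder, while `asyncio.to_thread` keeps the blocking call off the event loop. The settings lookup from the PR is replaced by a plain argument for illustration:

```python
import asyncio
import logging
import shutil
import tempfile
from pathlib import Path


async def delete_cache_folder(cache_folder: str) -> None:
    def _rmtree() -> None:
        # Removes files AND subdirectories; no error if the folder is gone.
        shutil.rmtree(cache_folder, ignore_errors=True)

    await asyncio.to_thread(_rmtree)
    logging.info("Cache folder %s deleted", cache_folder)


# Usage: works even when the cache folder contains a subdirectory.
folder = tempfile.mkdtemp()
(Path(folder) / "sub").mkdir()
(Path(folder) / "sub" / "a.jpg").write_bytes(b"\xff\xd8")
asyncio.run(delete_cache_folder(folder))
print(Path(folder).exists())  # False
```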
```diff
     async def _extractFilenameFromUrl(self, x: int, url: str) -> str:
         """Extract the filename from the specified url.

         Args:
-            url (str): url to be requested
+            x (int): index of the image
+            url (str): url to extract the filename from

         Returns:
-            Response: response of the request
+            str: filename extracted from the url
         """
         logging.debug(f"Extracting filename from url {url}")
         folder = await self._settings.get("reddit_cache_folder")
         filename = url.split("/")[-1].split("?")[0]
         filepath = path.join(folder, f"{x}_{filename}")
```
The filename extraction uses url.split("/")[-1].split("?")[0] which may not handle all edge cases. URLs without a path component or with unusual structures could result in problematic filenames. Additionally, the index x is prefixed to make filenames unique, but if loadPostsAsync is called multiple times with the same URLs, this could lead to filename collisions across loads. Consider using a more robust filename generation strategy, such as hashing the URL or using a timestamp-based prefix.
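The hash-based naming proposed here could look like the following sketch: deriving the cache filename from a digest of the full URL makes names stable across reloads and collision-free for distinct URLs. The function name and the `.jpg` fallback are assumptions for illustration:

```python
import hashlib
from os import path


def filename_from_url(folder: str, url: str) -> str:
    # Digest of the whole URL (including the query string) gives a unique,
    # reload-stable name; 16 hex chars is plenty for a small cache.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    # Keep the original extension when one is present, default to .jpg.
    tail = url.split("/")[-1].split("?")[0]
    ext = path.splitext(tail)[1] or ".jpg"
    return path.join(folder, f"{digest}{ext}")


a = filename_from_url(".reddit_cache", "https://i.redd.it/abc123.png?width=640")
b = filename_from_url(".reddit_cache", "https://i.redd.it/abc123.png?width=640")
c = filename_from_url(".reddit_cache", "https://i.redd.it/other.png")
print(a == b, a == c)  # True False: same URL -> same name, different URL -> different
```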
```diff
         try:
             async with aiohttp.ClientSession() as session:
-                async with session.get(url) as response:
-                    logging.debug(f"Request to url {url} completed")
-                    return response
+                async with session.get(url) as resp:
+                    if resp.headers["content-type"] not in self._image_formats:
+                        logging.debug(f"Url {url} is not an image, skipping download")
+                        return
+
+                    async with aiofiles.open(filepath, "wb") as f:
+                        async for chunk in resp.content.iter_chunked(1024):
+                            await f.write(chunk)
         except Exception as e:
             logging.error(f"Cannot download image from url {url}, error {e}")
```
When the response content-type is not an image format, the method returns early, but the file might have been partially created or opened. While aiofiles.open hasn't been called yet, consider checking the content-type before attempting to write to the file. Additionally, if an exception occurs during download, the partially written file will remain in the cache. Consider wrapping the file writing in a try-finally block that removes the file on failure.
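The partial-file cleanup recommended here can be sketched as follows; the failing "download" is a stand-in for the aiohttp/aiofiles code in the PR, and the function name is hypothetical:

```python
import asyncio
import logging
import os
import tempfile


async def download_image(url: str, filepath: str) -> None:
    try:
        with open(filepath, "wb") as f:
            f.write(b"\xff\xd8")              # partial write...
            raise ConnectionError("dropped")  # ...then the transfer fails
    except Exception as e:
        logging.error("Cannot download image from url %s, error %s", url, e)
        # Do not leave a truncated file in the cache: it would later be
        # rotated through the queue and sent to users as a broken image.
        if os.path.exists(filepath):
            os.remove(filepath)


target = os.path.join(tempfile.mkdtemp(), "img.jpg")
asyncio.run(download_image("https://example.com/img.jpg", target))
print(os.path.exists(target))  # False
```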
```python
        if self._queue.empty():
            logging.warning("Reddit queue is empty")
            raise EmptyQueueException("Reddit queue is empty")
        # rotate the queue
        filepath = self._queue.get()
        self._queue.put(filepath)

        return filepath
```
The queue rotation logic (get then put) means the same files will be served repeatedly in the same order. While this is likely intentional for cycling through images, it also means that if a file is deleted from the filesystem (e.g., by an external process or disk cleanup), the filepath will remain in the queue and cause repeated errors when trying to send it. Consider adding file existence validation in getPhoto or implementing a mechanism to remove invalid filepaths from the queue when errors occur.
Suggested change:

```diff
-        if self._queue.empty():
-            logging.warning("Reddit queue is empty")
-            raise EmptyQueueException("Reddit queue is empty")
-        # rotate the queue
-        filepath = self._queue.get()
-        self._queue.put(filepath)
-        return filepath
+        # Iterate through the queue to find a valid, existing file.
+        while not self._queue.empty():
+            # rotate the queue
+            filepath = self._queue.get()
+            if path.exists(filepath):
+                # Re-queue valid filepath to maintain rotation behavior.
+                self._queue.put(filepath)
+                return filepath
+            # Do not re-queue missing files; effectively remove them.
+            logging.warning(
+                "File '%s' no longer exists on disk; removing from Reddit queue",
+                filepath,
+            )
+        logging.warning("Reddit queue is empty")
+        raise EmptyQueueException("Reddit queue is empty")
```
```python
        Args:
            media_metadata (dict): media metadata of the gallery


```
There's an unnecessary extra blank line in the docstring. The standard Google-style docstring format uses a single blank line between sections. Remove the extra blank line.