Dev/cache pictures#9

Draft
lorossi wants to merge 5 commits into main from dev/cache-pictures

Conversation


@lorossi lorossi commented Dec 24, 2025

  • Added image caching
    • Images are now downloaded by the Reddit instance instead of being loaded directly from their URLs


Copilot AI left a comment


Pull request overview

This PR implements local image caching for the Telegram bot, transitioning from serving images directly via URLs to downloading and caching them locally. The main changes include downloading Reddit images to a local cache folder, serving images from file paths instead of URLs, and renaming the getUrl() method to getPhoto() to reflect this architectural change.

Key Changes:

  • Implemented image downloading and local caching in the Reddit module with new async file operations
  • Added cache folder configuration and management (creation/deletion)
  • Updated the Telegram module to use getPhoto() instead of getUrl()

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
corgos_telegram_bot/modules/reddit.py Core implementation: added image download logic, cache folder management, and file-based queue system
corgos_telegram_bot/modules/telegram.py Updated to call getPhoto() instead of getUrl() and renamed variables from url to photo
tests/test_reddit.py Updated test to use new getPhoto() method name
settings.json.example Added reddit_cache_folder configuration parameter and normalized golden corgo URL placeholder
tests/data/reddit_unit_test_settings.json Added reddit_cache_folder test configuration
tests/data/telegram_unit_test_settings.json Added reddit_cache_folder test configuration
corgos_telegram_bot/main.py Changed default logging level from INFO to WARNING
.gitignore Added .reddit_cache/ to ignore cached images

Note: Several critical issues were identified related to concurrency, error handling, and resource management that should be addressed before merging. See detailed comments below.



Comment on lines +59 to 61
def __del__(self) -> None:
    """Destructor to clean up resources."""

Copilot AI Dec 24, 2025


The destructor is defined but empty. If resources need cleanup on deletion, this should either be removed or properly implemented. Given that async cleanup is needed for the temp folder and Reddit connection, destructors are not appropriate for async cleanup - consider relying on explicit stop() calls instead.

Suggested change
def __del__(self) -> None:
"""Destructor to clean up resources."""

Copilot uses AI. Check for mistakes.
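A minimal, self-contained sketch of the explicit-cleanup pattern this comment suggests (the `Resource` class and its method names are illustrative placeholders, not the project's actual API):

```python
import asyncio


class Resource:
    """Toy stand-in for the Reddit wrapper, showing explicit async cleanup."""

    def __init__(self) -> None:
        self.closed = False

    async def _delete_temp_folder(self) -> None:
        await asyncio.sleep(0)  # stand-in for the aiofiles-based cleanup

    async def stop(self) -> None:
        """Explicit cleanup: __del__ cannot await coroutines, a caller can."""
        await self._delete_temp_folder()
        self.closed = True


async def main() -> bool:
    resource = Resource()
    try:
        pass  # ... use the resource ...
    finally:
        await resource.stop()
    return resource.closed


print(asyncio.run(main()))  # → True
```

The `try/finally` ensures `stop()` runs even if using the resource raises, which is the guarantee a destructor cannot provide for async teardown.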
Comment on lines +97 to +117
async def _downloadImage(self, url: str, filepath: str) -> None:
    """Download an image from the specified url to the specified filepath.

    Args:
        url (str): url of the image
        filepath (str): filepath to save the image

    """
    logging.debug(f"Downloading image from url {url} to filepath {filepath}")
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                if resp.headers["content-type"] not in self._image_formats:
                    logging.debug(f"Url {url} is not an image, skipping download")
                    return

                async with aiofiles.open(filepath, "wb") as f:
                    async for chunk in resp.content.iter_chunked(1024):
                        await f.write(chunk)
    except Exception as e:
        logging.error(f"Cannot download image from url {url}, error {e}")

Copilot AI Dec 24, 2025


The _http_requests_semaphore that was previously used to limit concurrent HTTP requests is no longer being used in _downloadImage. This could lead to too many concurrent downloads. Consider wrapping the download logic with async with self._http_requests_semaphore: to maintain the rate limiting.

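The rate-limiting pattern this comment describes can be sketched in isolation; the limit of 3 below is an arbitrary placeholder for whatever the bot's settings actually specify:

```python
import asyncio


async def main(n_tasks: int = 10, limit: int = 3) -> int:
    # Hypothetical limit; the real value would come from the bot's settings.
    semaphore = asyncio.Semaphore(limit)
    peak = 0
    active = 0

    async def download(i: int) -> None:
        nonlocal peak, active
        async with semaphore:  # at most `limit` downloads in flight
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the HTTP request
            active -= 1

    await asyncio.gather(*(download(i) for i in range(n_tasks)))
    return peak


print(asyncio.run(main()))  # → 3
```

Even with ten tasks scheduled at once, concurrency never exceeds the semaphore's limit, which is exactly the guarantee `_downloadImage` loses without the `async with self._http_requests_semaphore:` wrapper.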
logging.debug("Downloading image %d/%d from post", x + 1, len(urls))
filepath = await self._extractFilenameFromUrl(x, url)
await self._downloadImage(url, filepath)
self._temp_queue.add(filepath)

Copilot AI Dec 24, 2025


The _temp_queue.add(filepath) operation is not protected by the _temp_queue_lock. This creates a race condition where multiple concurrent tasks could modify the set simultaneously, potentially leading to data corruption or lost entries. Wrap this operation with async with self._temp_queue_lock:.

Suggested change
self._temp_queue.add(filepath)
async with self._temp_queue_lock:
    self._temp_queue.add(filepath)

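A standalone sketch of the lock-guarded update this comment proposes (in CPython a single `set.add` is effectively atomic under the GIL, but the lock becomes necessary once the critical section grows beyond one operation):

```python
import asyncio


async def main(n_tasks: int = 20) -> int:
    temp_queue: set[str] = set()
    temp_queue_lock = asyncio.Lock()

    async def record(filepath: str) -> None:
        # Serialise mutations so concurrent tasks cannot interleave updates.
        async with temp_queue_lock:
            temp_queue.add(filepath)

    await asyncio.gather(*(record(f"img_{i}.jpg") for i in range(n_tasks)))
    return len(temp_queue)


print(asyncio.run(main()))  # → 20
```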
if randint(1, 1000) == 1:
    # if we are lucky enough, we get a golden corgo!
    photo = await self._settings.get("telegram_golden_corgo_url")

Copilot AI Dec 24, 2025


The variable name was changed from url to photo, but telegram_golden_corgo_url still contains a URL (as evidenced by the _botCheckCommand which manipulates it as a URL with .replace(".jpg", "s.jpg")). The Telegram send_photo API can accept both URLs and file paths, but the inconsistent naming (calling it photo when it's still a URL) may be confusing. Consider either: 1) keeping the variable name as url for clarity, or 2) if golden corgo images should also be cached, update the logic to download and cache the golden corgo as well.

Suggested change
photo = await self._settings.get("telegram_golden_corgo_url")
golden_corgo_url = await self._settings.get("telegram_golden_corgo_url")
photo = golden_corgo_url

_settings_path: str
_image_formats: tuple[str, ...] = (
    "image/png",
    "image/jpeg",

Copilot AI Dec 24, 2025


The _image_formats tuple was updated to include "image/jpg", but "image/jpg" is not a standard MIME type. The correct MIME type for JPEG images is "image/jpeg". While some servers might return "image/jpg", it's non-standard. Consider removing "image/jpg" or adding a comment explaining why it's needed if you've encountered servers that use this non-standard type.

Suggested change
"image/jpeg",
"image/jpeg",
# "image/jpg" is non-standard but included for compatibility with servers
# that incorrectly use this MIME type for JPEG images.

Comment on lines +70 to +78
async def _deleteTempFolder(self) -> None:
    """Delete the temporary folder for caching images."""
    logging.info("Deleting temporary folder for caching images")
    cache_folder = await self._settings.get("reddit_cache_folder")
    for filename in await aiofiles.os.listdir(cache_folder):
        filepath = path.join(cache_folder, filename)
        await aiofiles.os.remove(filepath)
    await aiofiles.os.rmdir(cache_folder)
    logging.info(f"Temporary folder {cache_folder} deleted")

Copilot AI Dec 24, 2025


The _deleteTempFolder method will fail if any files remain in the cache folder (e.g., if a download is in progress or if the folder contains unexpected files). The cleanup loop at lines 74-76 only removes files, not subdirectories. If the cache folder somehow contains subdirectories, rmdir will fail. Consider adding error handling or using rmtree for more robust cleanup, or document that the cache folder must not contain subdirectories.

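One possible robust-cleanup variant, sketched with `shutil.rmtree` run in a worker thread so the event loop is not blocked (the function name and the ignore-errors policy are illustrative assumptions, not part of the PR):

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def delete_cache_folder(cache_folder: str) -> None:
    # rmtree removes nested subdirectories too; running it in a thread keeps
    # the event loop free, and ignore_errors tolerates races with in-progress
    # downloads or unexpected files.
    await asyncio.to_thread(shutil.rmtree, cache_folder, ignore_errors=True)


async def main() -> bool:
    folder = tempfile.mkdtemp()
    (Path(folder) / "sub").mkdir()
    (Path(folder) / "sub" / "a.jpg").write_bytes(b"\xff\xd8")
    await delete_cache_folder(folder)
    return Path(folder).exists()


print(asyncio.run(main()))  # → False
```

Unlike the `remove`-then-`rmdir` loop, this succeeds even when the cache folder contains subdirectories.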
Comment on lines +80 to +93
async def _extractFilenameFromUrl(self, x: int, url: str) -> str:
    """Extract the filename from the specified url.

    Args:
        x (int): index of the image
        url (str): url to extract the filename from

    Returns:
        str: filename extracted from the url
    """
    logging.debug(f"Extracting filename from url {url}")
    folder = await self._settings.get("reddit_cache_folder")
    filename = url.split("/")[-1].split("?")[0]
    filepath = path.join(folder, f"{x}_{filename}")

Copilot AI Dec 24, 2025


The filename extraction uses url.split("/")[-1].split("?")[0] which may not handle all edge cases. URLs without a path component or with unusual structures could result in problematic filenames. Additionally, the index x is prefixed to make filenames unique, but if loadPostsAsync is called multiple times with the same URLs, this could lead to filename collisions across loads. Consider using a more robust filename generation strategy, such as hashing the URL or using a timestamp-based prefix.

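A sketch of the hash-based naming strategy this comment suggests; the helper name and the `.jpg` extension fallback are assumptions for illustration, not part of the PR:

```python
import hashlib
from os import path


def cache_filepath(folder: str, url: str) -> str:
    # Hash the full URL: stable across repeated loads, immune to odd URL
    # shapes, and collision-resistant without relying on an index prefix.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    # Keep a recognisable extension when the URL path has one.
    ext = path.splitext(url.split("?")[0])[1] or ".jpg"
    return path.join(folder, f"{digest}{ext}")


same = cache_filepath(".reddit_cache", "https://i.redd.it/abc.png?width=640")
again = cache_filepath(".reddit_cache", "https://i.redd.it/abc.png?width=640")
print(same == again, same.endswith(".png"))  # → True True
```

Because the name depends only on the URL, re-downloading the same image overwrites its own cache entry instead of colliding with a different image that happens to share an index.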
Comment on lines +106 to +117
try:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            if resp.headers["content-type"] not in self._image_formats:
                logging.debug(f"Url {url} is not an image, skipping download")
                return

            async with aiofiles.open(filepath, "wb") as f:
                async for chunk in resp.content.iter_chunked(1024):
                    await f.write(chunk)
except Exception as e:
    logging.error(f"Cannot download image from url {url}, error {e}")

Copilot AI Dec 24, 2025


When the response content-type is not an image format, the method returns early, but the file might have been partially created or opened. While aiofiles.open hasn't been called yet, consider checking the content-type before attempting to write to the file. Additionally, if an exception occurs during download, the partially written file will remain in the cache. Consider wrapping the file writing in a try-finally block that removes the file on failure.

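The cleanup-on-failure idea can be sketched as follows; a synchronous `open` stands in for `aiofiles.open` to keep the example dependency-free, and all names here are illustrative:

```python
import asyncio
import os
import tempfile
from pathlib import Path


async def safe_write(filepath: str, chunks) -> None:
    """Write chunks to filepath, removing the partial file on failure."""
    try:
        with open(filepath, "wb") as f:
            async for chunk in chunks:
                f.write(chunk)
    except Exception:
        Path(filepath).unlink(missing_ok=True)  # drop the partial download
        raise


async def failing_chunks():
    # Simulates a download that dies mid-transfer.
    yield b"partial-data"
    raise IOError("connection dropped")


async def main() -> bool:
    filepath = os.path.join(tempfile.mkdtemp(), "img.jpg")
    try:
        await safe_write(filepath, failing_chunks())
    except IOError:
        pass
    return Path(filepath).exists()


print(asyncio.run(main()))  # → False
```

Re-raising after the unlink lets the caller keep its existing logging while guaranteeing no truncated image is left in the cache.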
Comment on lines +348 to 356
if self._queue.empty():
    logging.warning("Reddit queue is empty")
    raise EmptyQueueException("Reddit queue is empty")
# rotate the queue
filepath = self._queue.get()
self._queue.put(filepath)

return filepath


Copilot AI Dec 24, 2025


The queue rotation logic (get then put) means the same files will be served repeatedly in the same order. While this is likely intentional for cycling through images, it also means that if a file is deleted from the filesystem (e.g., by an external process or disk cleanup), the filepath will remain in the queue and cause repeated errors when trying to send it. Consider adding file existence validation in getPhoto or implementing a mechanism to remove invalid filepaths from the queue when errors occur.

Suggested change
if self._queue.empty():
    logging.warning("Reddit queue is empty")
    raise EmptyQueueException("Reddit queue is empty")
# rotate the queue
filepath = self._queue.get()
self._queue.put(filepath)

return filepath
# Iterate through the queue to find a valid, existing file.
while not self._queue.empty():
    # rotate the queue
    filepath = self._queue.get()
    if path.exists(filepath):
        # Re-queue valid filepath to maintain rotation behavior.
        self._queue.put(filepath)
        return filepath
    # Do not re-queue missing files; effectively remove them.
    logging.warning(
        "File '%s' no longer exists on disk; removing from Reddit queue",
        filepath,
    )
logging.warning("Reddit queue is empty")
raise EmptyQueueException("Reddit queue is empty")

Args:
media_metadata (dict): media metadata of the gallery



Copilot AI Dec 24, 2025


There's an unnecessary extra blank line in the docstring. The standard Google-style docstring format uses a single blank line between sections. Remove the extra blank line.

Suggested change
