-
Notifications
You must be signed in to change notification settings - Fork 1
Feat: Pre-check for existing images using in_img_dir column (Closes #34) #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…directory cases, update test_download_images for new logic flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…mageomics/cautious-robot into feature/issue-34/check-existing-images
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed one update to the description (beginning of the README) to reflect the updated functionality. A few other suggestions for clarity are noted below.
A bigger issue I noticed is that the check_existing_images function will have a conflict if the --starting-idx parameter is passed and I don't believe that's addressed.
Suggested method: run the existing image check for all filenames starting at a passed starting index and then have the download start from only the images that aren't already in the output folder following the starting index. This should maintain the current, expected behavior.
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
egrace479
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more items. I am also concerned that process_checksums needs an edit here.
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
egrace479
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing I noticed while running through the README examples. I made the suggestion on the line I could and just pointed to the line above. It's the new in_img_dir column that now shows up in the missing CSV.
egrace479
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Summary
Implements a feature to check for and skip downloading images that already exist in the target image directory (
img_dir), addressing Issue #34.Changes Implemented
check_exisiting_images, utilizessum-buddy.gather_file_pathsto collect file names from the existingimg_dir.in_img_dir, is added to thesource_dfto track which images are already present.download_imagesfunction is now passed a filtered dataframe containing only images that need to be downloaded (in_img_dir == False).'{img_dir}' already contains all images. Exited without executing.There are {checksum_df.shape[0]} files in {img_dir}. Based on {csv_path}, there should be {expected_num_imgs} images.CLI Examplessection regarding new outputs and cases where relevant to these changesCloses #34