WikiArt scraper only scraping <3000 images #27

@fk798

Description

Hi! When I scrape and download images to train the DCGAN, the scraper cannot reach the full dataset. For example, when I run python art.py --genre=landscape --num_pages=250 --output_dir=landscape_scraped, the program exits after downloading only about 2400 images, even though the WikiArt website lists about 22000 landscape images.

Here's what I think the issue is: the landscape page says that only 3600 images in total can be viewed. I scrolled all the way to the bottom looking for further pages with different images, but there are no buttons to reach any other pages (if any exist). It looks like WikiArt only lets you browse those 3600 images rather than the whole collection, which is a problem because it leaves less data to train the network on. I may be wrong, since I don't really know how WikiArt works, but how can I get more than these 2400 images?
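To illustrate the suspected cap, here is a minimal Python sketch. It is not the actual art.py code: `scrape_listing`, `make_capped_listing`, and the page size of 60 are hypothetical, and the fake listing simply assumes the site serves only the first 3600 of roughly 22000 images. A page-by-page scraper that stops once a page adds no new URLs ends early no matter how large --num_pages is.

```python
from typing import Callable, List


def scrape_listing(fetch_page: Callable[[int], List[str]], num_pages: int) -> List[str]:
    """Collect unique image URLs page by page (hypothetical helper, not art.py)."""
    seen = set()
    urls: List[str] = []
    for page in range(1, num_pages + 1):
        batch = fetch_page(page)
        new = [u for u in batch if u not in seen]
        if not new:
            # Empty or fully repeated page: the listing is exhausted (or capped).
            break
        seen.update(new)
        urls.extend(new)
    return urls


def make_capped_listing(total: int, cap: int, page_size: int) -> Callable[[int], List[str]]:
    """Hypothetical stand-in for the site: `total` images exist,
    but only the first `cap` are ever served, `page_size` per page."""
    visible = [f"img_{i}.jpg" for i in range(min(total, cap))]

    def fetch_page(page: int) -> List[str]:
        start = (page - 1) * page_size
        return visible[start:start + page_size]

    return fetch_page


if __name__ == "__main__":
    fetch = make_capped_listing(total=22000, cap=3600, page_size=60)
    urls = scrape_listing(fetch, num_pages=250)
    print(len(urls))  # 3600, not 22000
```

Under these assumptions the loop stops at page 61, so asking for 250 pages changes nothing; the remaining gap between 3600 listed and about 2400 downloaded would have to come from something else (for example failed or skipped downloads).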

Thanks in advance!

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
