
Chapter 5 - webCrawler.py not working properly #6

@xemage


I think this code does not work properly.
The result depends heavily on the number of threads you start: the more threads, the more pages get crawled.
My guess is that each crawler thread exits as soon as it finds the queue empty, and does not come back to work when new links are added to the queue later.
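A minimal sketch of a worker loop that should avoid the behaviour described above, assuming the crawler keeps its URLs in a `queue.Queue`. This is not the book's code: `LINKS`, `crawl`, `enqueue` and `worker` are hypothetical names, and the link graph is an in-memory stand-in for real page fetches. The idea is that workers block on `get()` instead of quitting on an empty queue, `task_done()`/`join()` tracks when all discovered work is finished, and a `None` sentinel per thread shuts the workers down.

```python
import queue
import threading

# Hypothetical in-memory link graph standing in for real HTTP fetches and
# HTML parsing, so the example runs without network access.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c", "/d"],
    "/c": ["/e"],
    "/d": [],
    "/e": ["/a"],
}


def crawl(start, num_threads):
    """Crawl from `start` with `num_threads` workers; return the visited pages."""
    todo = queue.Queue()
    visited = set()
    lock = threading.Lock()  # guards `visited`

    def enqueue(url):
        # Check-and-add under the lock so each URL is queued exactly once.
        with lock:
            if url in visited:
                return
            visited.add(url)
        todo.put(url)

    def worker():
        while True:
            # Block until work arrives instead of exiting on an empty queue.
            url = todo.get()
            try:
                if url is None:  # sentinel: no more work will come
                    return
                for link in LINKS.get(url, []):
                    enqueue(link)  # new links are queued before this task is marked done
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    enqueue(start)
    todo.join()  # returns only after every queued URL, including ones found later, is processed
    for _ in threads:
        todo.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return visited


for n in (1, 5, 10, 50):
    print(n, len(crawl("/", n)))  # prints 6 for every thread count
```

Because a parent page only calls `task_done()` after its links have been queued, the queue's unfinished-task counter cannot reach zero while work is still being discovered, so the page count no longer depends on how many threads are started.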

Results reported by the crawler when crawling https://tutorialedge.net with different numbers of threads:

1 Thread: Total Number of Pages Visited 35
5 Threads: Total Number of Pages Visited 35
10 Threads: Total Number of Pages Visited 36
50 Threads: Total Number of Pages Visited 67
100 Threads: Total Number of Pages Visited 78
