Duplicates or 403 are not taken into account by the maxUrlPerSchemeAuthority limit #4

@guillaumepitel

On several occasions, I've seen a lot of URLs requested on a single host, even though the maxURLPerSchemeAuthority was low (maybe 50-100).

It seems that duplicates and other non-content responses (401, 403) are not counted against the limit. This behaviour makes sense for a lot of sites, but I think there should also be a "maxRequestPerSchemeAuthority" limit that counts every request, to avoid wasting time on sites with a lot of inlinks that lead to nothing (for instance, there are a lot of links pointing toward stumbleupon.com/submit?...... which produce an error).
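
A minimal sketch of how the two limits could coexist, under stated assumptions: the class `SchemeAuthorityBudget` and its methods `mayFetch` and `recordResponse` are hypothetical names for illustration and are not taken from the crawler's code. It also assumes that only non-duplicate 2xx responses count toward the existing limit, which is a guess based on the behaviour described above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-scheme+authority budget; an illustration, not the crawler's actual code. */
class SchemeAuthorityBudget {
    private final int maxUrlPerSchemeAuthority;     // existing cap: content responses only (assumed)
    private final int maxRequestPerSchemeAuthority; // proposed cap: every request, whatever the outcome
    // Per scheme+authority counters: [0] = content responses, [1] = all requests.
    private final Map<String, int[]> counters = new HashMap<>();

    SchemeAuthorityBudget(int maxUrlPerSchemeAuthority, int maxRequestPerSchemeAuthority) {
        this.maxUrlPerSchemeAuthority = maxUrlPerSchemeAuthority;
        this.maxRequestPerSchemeAuthority = maxRequestPerSchemeAuthority;
    }

    private static String key(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** True while neither limit has been reached for this scheme+authority. */
    boolean mayFetch(URI uri) {
        int[] c = counters.get(key(uri));
        if (c == null) return true;
        return c[0] < maxUrlPerSchemeAuthority && c[1] < maxRequestPerSchemeAuthority;
    }

    /** Records one completed request; duplicates and errors only consume the request budget. */
    void recordResponse(URI uri, int status, boolean duplicate) {
        int[] c = counters.computeIfAbsent(key(uri), k -> new int[2]);
        c[1]++;
        if (status >= 200 && status < 300 && !duplicate) {
            c[0]++;
        }
    }
}
```

With this shape, a host that only ever answers 403 stops being fetched once the request budget is spent, while the content budget keeps its current meaning for hosts that respond normally.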
