Questions about TransclusionDecideRule #496

@cgr71ii

Hi! I have some questions about TransclusionDecideRule that I'd appreciate someone answering:

  • Does setting maxTransHops to 0 mean that only text or similar content will be downloaded? If so, I understand that a value of 0 might be counterproductive, since we might lose some URIs. Is that right?
  • Does setting maxSpeculativeHops to 0 mean that only URIs from the same authority will be downloaded (ignoring, for the purpose of this question, the other decide rules)?
  • With maxSpeculativeHops set to a value greater than 0, URIs will be fetched as long as the number of hops doesn't exceed that value, but will the URIs found in the downloaded documents also be included in the crawl?
  • Would setting both maxTransHops and maxSpeculativeHops to 0 lead to a crawl where only text or similar content, but no rich-media content, is downloaded from pages of the same authority? If not, how can this be achieved? If I'm not wrong, HopCrossesAssignmentLevelDomainDecideRule can be used to download data only from the same authority, but I'm not sure about that, nor about how to download only text-based data ("Exclude PDF-Files" #453 might be the solution, but again I'm not sure).

I think the problem is that I don't understand the "trans" and "speculative" hops very well, even though I've read the wiki post about them.
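
To make the questions above concrete, here is a minimal sketch of one possible reading of the two options. It is a simplified model, not the Heritrix source: it assumes the hop path is a string of hop letters (L = navigational link, E = embed, R = redirect, X = speculative), that "trans" hops are the trailing non-link hops since the last L, and that the rule only rules a URI in when both counts are within their limits. The class and method names (TransclusionSketch, accepts) are made up for illustration.

```java
// Simplified model for discussion only; NOT the Heritrix implementation.
// Assumed hop letters: L = link, E = embed, R = redirect, X = speculative.
final class TransclusionSketch {

    /** Returns true if this simplified rule would accept a URI with the given hop path. */
    static boolean accepts(String hopPath, int maxTransHops, int maxSpeculativeHops) {
        int transHops = 0;
        int speculativeHops = 0;
        // Walk backwards from the most recent hop until the last navigational link.
        for (int i = hopPath.length() - 1; i >= 0; i--) {
            char hop = hopPath.charAt(i);
            if (hop == 'L') {
                break;
            }
            transHops++;
            if (hop == 'X') {
                speculativeHops++;
            }
        }
        if (transHops == 0) {
            return false; // reached by a plain link: the rule has nothing to say
        }
        return transHops <= maxTransHops && speculativeHops <= maxSpeculativeHops;
    }

    public static void main(String[] args) {
        System.out.println(accepts("LLE", 2, 1));   // one embed after a link
        System.out.println(accepts("LLEEE", 2, 1)); // three embeds in a row
        System.out.println(accepts("LLX", 2, 0));   // one speculative hop, none allowed
        System.out.println(accepts("LE", 0, 0));    // maxTransHops = 0
    }
}
```

If this reading is right, maxTransHops = 0 would mean the rule never accepts anything, and neither option would say anything about the authority or the content type of the URI. Whether that matches the real behaviour is part of what I'm asking.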

Thank you!
