Questions about TransclusionDecideRule #496
Closed
Hi! I have some questions about TransclusionDecideRule, and I'd appreciate it if someone could answer them:
- Does setting `maxTransHops` to 0 mean that only text or similar content will be downloaded? If so, I understand that a value of 0 might be counterproductive, since we might lose some URIs. Is that right?
- Does setting `maxSpeculativeHops` to 0 mean that only URIs from the same authority will be downloaded (ignoring, for the purpose of this question, the rest of the decide rules)?
- With `maxSpeculativeHops` set to a value greater than 0, URIs will be fetched as long as the number of hops doesn't exceed that value, but will the URIs found in those downloaded documents also be included in the crawl?
- Would setting both `maxTransHops` and `maxSpeculativeHops` to 0 lead to a crawl where only text or similar content, but not rich-media content, is downloaded from pages of the same authority? If not, how is it possible to achieve this? If I'm not wrong, `HopCrossesAssignmentLevelDomainDecideRule` can be used to download data only from the same authority, but I'm not sure about that, nor about how to download only text-based data (Exclude PDF-Files #453 might be the solution, but again I'm not sure).
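For reference, the settings these questions are about would look roughly like the fragment below in `crawler-beans.cxml`. This is only a sketch: it assumes the Heritrix 3 Spring configuration, where the rule is declared as a bean inside the scope's rule list, and the values are simply the ones from the questions above, not a recommended configuration.

```xml
<!-- Sketch only: one entry from the scope rule list in crawler-beans.cxml -->
<bean class="org.archive.modules.deciderules.TransclusionDecideRule">
  <!-- Illustrative values taken from the questions above, not a recommendation -->
  <property name="maxTransHops" value="0" />
  <property name="maxSpeculativeHops" value="0" />
</bean>
```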
I think the problem is that I don't understand the "trans" and "speculative" hops very well, even though I've read the wiki post about them.
Thank you!