This repository was archived by the owner on Sep 15, 2025. It is now read-only.

@wofferl wofferl commented Jan 20, 2025

Feeds may, for whatever reason, contain much older articles, which disproportionately shift the nextUpdateTime when it is derived from the average interval. This happens when the last update time plus the median interval does not yield a date in the future, so the average interval is used instead.
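The fallback described above can be sketched roughly as follows. This is a minimal Python illustration of the behaviour as described in this comment, not the library's actual code; the function name `next_update` and the sample numbers are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, median


def next_update(last_update, intervals, now):
    """Try last_update + median first, then last_update + average.

    Returns the first candidate that lies in the future, or None if
    neither does. `intervals` are seconds between consecutive items.
    """
    for estimate in (median(intervals), mean(intervals)):
        candidate = last_update + timedelta(seconds=estimate)
        if candidate > now:
            return candidate
    return None


# Hypothetical feed: 15 short intervals and 4 very long ones (30 days each).
intervals = [300] * 15 + [2_592_000] * 4
now = datetime(2025, 1, 20, 20, 0, tzinfo=timezone.utc)
last_update = now - timedelta(hours=1)

# The median (5 minutes) lands in the past, so the inflated average is used
# and the next update ends up several days away.
result = next_update(last_update, intervals, now)
```

With these made-up numbers the median-based candidate is already in the past, so the average, which the few old items dominate, decides the next update time.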

For example, take a feed with 20 items, where 15 items are from the last 2 hours and 5 are from a month ago. The median will then be very low compared to the average. To prevent this, outliers are removed using the interquartile range.
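As a rough illustration of interquartile-range filtering, here is a small Python sketch. The quartile method and the conventional 1.5 multiplier are assumptions chosen for the example and may differ from what the patch does; `remove_outliers` and the sample intervals are hypothetical.

```python
from statistics import mean, quantiles


def remove_outliers(intervals, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common textbook choice, assumed here for illustration.
    """
    if len(intervals) < 4:
        # Too few points to estimate quartiles meaningfully.
        return list(intervals)
    q1, _, q3 = quantiles(intervals, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in intervals if low <= x <= high]


# Hypothetical intervals in seconds: nine short gaps and one 30-day gap.
intervals = [240, 300, 330, 357, 360, 400, 420, 480, 600, 2_592_000]
filtered = remove_outliers(intervals)

average_before = mean(intervals)
average_after = mean(filtered)
```

In this example the single 30-day gap is discarded, so the average of the remaining intervals stays in the range of minutes instead of days.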

Another problem is feeds that are generated at download time, so that their last-modified time corresponds to the time of the download. These feeds are not recognized as sleepy, and their next update time is therefore calculated incorrectly.
It is therefore better to use the time of the most recent article as the basis.
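The idea of basing the calculation on the newest article can be sketched as follows. This is only an illustration in Python: `reference_time`, `is_sleepy`, and the 7-day threshold are hypothetical names and values, not the library's API or configuration.

```python
from datetime import datetime, timedelta, timezone


def reference_time(item_dates, last_modified):
    """Prefer the newest item date; fall back to the feed's last-modified time."""
    return max(item_dates) if item_dates else last_modified


def is_sleepy(reference, now, threshold=timedelta(days=7)):
    """Treat a feed as sleepy when nothing is newer than `threshold` (assumed value)."""
    return now - reference > threshold


now = datetime(2025, 1, 20, 20, 0, tzinfo=timezone.utc)

# A feed generated at download time: last-modified is "now",
# although the newest article is a month old.
last_modified = now
item_dates = [now - timedelta(days=30), now - timedelta(days=45)]

sleepy_by_last_modified = is_sleepy(last_modified, now)
sleepy_by_newest_item = is_sleepy(reference_time(item_dates, last_modified), now)
```

Under these assumptions the feed looks active when judged by its last-modified time, but is correctly seen as inactive when judged by its newest article.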

Here are two examples for the first problem:

This feed is very active during the day and has 20 items, most of them written within the last 1.5 to 2 hours. Towards the evening the intervals increase, and from time to time there are items that are months old. It can therefore happen that the next update time is postponed by a week.

bin/feedio read https://newsfeed.kicker.de/news/aktuell

kicker.xml.txt

Currently:

Next time a new item may be published : 2025-01-27T02:45:31+00:00
Minimum interval between items : 0 days, 0 hours, 5 minutes, 57 seconds
Median interval : 0 days, 0 hours, 5 minutes, 57 seconds
Average interval : 8 days, 11 hours, 46 minutes, 8 seconds
Maximum interval : 161 days, 5 hours, 21 minutes, 7 seconds

Patched version:

Next time a new item may be published : 2025-01-20T20:09:18+00:00
Minimum interval between items : 0 days, 0 hours, 0 minutes, 4 seconds
Median interval : 0 days, 0 hours, 5 minutes, 57 seconds
Average interval : 0 days, 0 hours, 7 minutes, 8 seconds
Maximum interval : 161 days, 5 hours, 21 minutes, 7 seconds

The same applies to this feed, which always contains old items.

sportschau.xml.txt

bin/feedio read "https://www.sportschau.de/fussball/index~rss2.xml"

Currently:

Next time a new item may be published : 2025-01-22T19:45:04+00:00
Minimum interval between items : 0 days, 0 hours, 55 minutes, 53 seconds
Median interval : 0 days, 0 hours, 55 minutes, 53 seconds
Average interval : 2 days, 16 hours, 49 minutes, 14 seconds
Maximum interval : 48 days, 2 hours, 46 minutes, 50 seconds

Patched version:

Next time a new item may be published : 2025-01-20T20:13:59+00:00
Minimum interval between items : 0 days, 0 hours, 1 minutes, 28 seconds
Median interval : 0 days, 0 hours, 55 minutes, 53 seconds
Average interval : 0 days, 2 hours, 22 minutes, 10 seconds
Maximum interval : 48 days, 2 hours, 46 minutes, 50 seconds

@wofferl wofferl requested a review from alexdebril as a code owner January 20, 2025 19:28