Make nextupdatetime more reliable #14
Pull Request Overview
This PR improves the reliability of next update time calculations for feeds by addressing two key issues: feeds containing old articles that skew average calculations, and feeds generated during download time rather than reflecting actual publication patterns.
- Implements outlier removal using interquartile range (IQR) to filter out old articles that disproportionately affect average intervals
- Changes the base timestamp from feed's last modified time to the newest item's publication date for more accurate scheduling
- Improves median calculation to handle even-numbered datasets correctly
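The median point above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PHP code from `UpdateStats.php`; the function name `median_interval` is hypothetical. For an even number of values it averages the two middle values instead of picking one of them.

```python
def median_interval(values):
    """Median of a non-empty list of intervals (hypothetical helper).

    For an even count, the two middle values are averaged.
    """
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_interval([300, 100, 200]))       # 200
print(median_interval([400, 100, 300, 200]))  # 250.0
```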
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/FeedIo/Reader/Result/UpdateStats.php | Core logic changes including outlier removal, timestamp base switching, and median calculation improvements |
| tests/FeedIo/Reader/Result/UpdateStatsTest.php | Updated test expectations to reflect the new calculation methodology |
Since I can't make any changes here, I thumbed up the Copilot changes :)
d1dad98 to 9e751a1
Pull Request Overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
Pull Request Overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
ae12ff9 to 6f1fdf6
- Use IQR method for outlier detection in average calculation
- Calculate median using middle two values for even counts
- Use newest item date for sleep detection and next update
- Prevent future dates from affecting calculations
- Add comprehensive edge case tests

Signed-off-by: Wolfgang <[email protected]>
…completeness
Small code fixes
Co-authored-by: Copilot <[email protected]>
6f1fdf6 to 796e684
Imported: alexdebril#434
Feeds may, for whatever reason, contain older articles, which disproportionately shift the nextUpdateTime when it is based on the average. This happens when the last update time plus the median does not yield a date in the future, so the average is used instead.
For example, take a feed with 20 items, where 15 items are from the last 2 hours and 5 are a month old. The median will then be very low compared to the average. To prevent this, outliers are removed using the interquartile range.
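To make the effect concrete, here is a rough Python sketch of IQR-based filtering. It is not the PHP implementation from this PR. The interval values are hypothetical numbers loosely modelled on the example (14 gaps of 8 minutes between recent items, one 30-day gap, 4 gaps of 10 minutes between the old items), and the conventional fence factor of 1.5 is an assumption.

```python
from statistics import mean, median, quantiles


def remove_outliers(intervals, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    if len(intervals) < 4:
        # Too few values for meaningful quartiles; leave the data untouched.
        return list(intervals)
    q1, _, q3 = quantiles(intervals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in intervals if low <= x <= high]


# Hypothetical gaps between consecutive items, in seconds.
intervals = [480] * 14 + [30 * 24 * 3600] + [600] * 4

filtered = remove_outliers(intervals)

print(median(intervals))       # 480
print(round(mean(intervals)))  # 136901 (about 38 hours)
print(round(mean(filtered)))   # 507 (about 8.5 minutes)
```

With these numbers a single month-long gap pulls the plain average up to roughly 38 hours, while the filtered average stays close to the median.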
Another problem is feeds that are generated at download time, so that the last modified time corresponds to the time of the download. These feeds are not recognized as sleepy, and their next update time is therefore calculated incorrectly.
It is therefore better to use the time of the most recent article as the basis.
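The difference between the two base timestamps can be sketched as follows. This is an illustrative Python example, not the library's PHP code. The 7-day sleepy threshold, the function names, and the way future-dated items are skipped are all assumptions made for the sketch.

```python
from datetime import datetime, timedelta, timezone

SLEEPY_AFTER = timedelta(days=7)  # hypothetical threshold, not taken from the library


def newest_item_date(item_dates, now):
    """Newest item date that is not in the future, or None if there is none."""
    past = [d for d in item_dates if d <= now]
    return max(past) if past else None


def is_sleepy(base, now, threshold=SLEEPY_AFTER):
    """A feed counts as sleepy when nothing happened since `base` for `threshold`."""
    return now - base >= threshold


now = datetime(2025, 1, 31, 12, 0, tzinfo=timezone.utc)

# A feed generated on download reports "now" as its last modified time.
last_modified = now

# Hypothetical item dates: two old items and one bogus future date.
item_dates = [
    now - timedelta(days=25),
    now - timedelta(days=20),
    now + timedelta(days=2),
]

base = newest_item_date(item_dates, now)

print(is_sleepy(last_modified, now))  # False: the feed always looks fresh
print(is_sleepy(base, now))           # True: newest real item is 20 days old
```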
Here are two examples for the first problem:
This feed is very active during the day and has 20 items, most of them written in the last 1.5-2 hours. Towards the evening the intervals increase, and from time to time there are items that are months old. It can therefore happen that the next update time is postponed by a week.
bin/feedio read https://newsfeed.kicker.de/news/aktuell
kicker.xml.txt
Currently:
Patched version:
Same for this feed, here there are always old items.
sportschau.xml.txt
bin/feedio read "https://www.sportschau.de/fussball/index~rss2.xml"
Currently:
Patched version: