Skip to content

Conversation

@OleVik
Copy link

@OleVik OleVik commented Jul 30, 2025

Following the case from alexdebril/feed-io#415, the current method-signature of Reader->read() invalidates Adapter/Http/Client->getResponse() by defining a value for $modifiedSince other than its default, null.

When querying a feed that does not support HEAD-requests, it will fail. This is the current case for BlueSky. Eg., a GET-request -- curl https://bsky.app/profile/did:plc:eclio37ymobqex2ncko63h4r/rss works fine, adding --head gives 404-response. This should of course be a 405-response.

This is the case for both the 5- and 6-major version range: When calling getResponse(), the fallback GET-request is not reached if the prior requests throws an error.

@OleVik OleVik changed the title Do not invalidated ResponseInterface Do not invalidate ResponseInterface Jul 30, 2025
@Grotax Grotax requested a review from Copilot August 2, 2025 18:15
Copilot

This comment was marked as outdated.

@Grotax
Copy link

Grotax commented Aug 2, 2025

Hi, thanks for this.

I think in principle I like this, the date way in the past was set to fetch all items no matter what in context of nextcloud/news. I'm not sure how it was before that.

But just removing the lines won't do it, the date set there is used in the Result and in filtering the items and Null will not be accepted to further handling of null is needed.

@SMillerDev what do you think?

@OleVik
Copy link
Author

OleVik commented Aug 2, 2025

It was rather more elaborate back then, see upstream#340. The caller, most often FeedIo-read() already passes along the $modifiedSince-parameter to the client, whose ClientInterface requires it. It is what should handle the null rather than further up the chain.

@Grotax
Copy link

Grotax commented Aug 3, 2025

I'm not sure if I understand what you mean but anyway if you are motivated to keep working on this feel free, otherwise we can close this PR.

@OleVik
Copy link
Author

OleVik commented Aug 3, 2025

The case for removing

if (is_null($modifiedSince)) {
    $modifiedSince = new DateTime('1800-01-01');
}

is that the method - public function read(string $url, FeedInterface $feed, DateTime $modifiedSince = null): Result of Reader - allows null in the last parameter, but these 3 lines invalidate the method-signature.

When it then calls the client through $response = $this->client->getResponse($url, $modifiedSince);, it disallows ClientInterface's method's signature as well: public function getResponse(string $url, DateTime $modifiedSince = null): ResponseInterface;.

If any other value of $modifiedSince than the default was desired, it should be added when calling FeedIo's read-method, which shares the same signature. As it stands Client's return of a GET-request can never return because it's $modifiedSince is not nullable, as it's method signature defines it should be.

@SMillerDev
Copy link

Result of Reader - allows null in the last parameter, but these 3 lines invalidate the method-signature.

How does a completely independent piece of code invalidate a signature? As far as I know invalidating a method signature can only happen if a method is somehow overloaded, which AFAIK isn't even possible in PHP.

Do you mean to say that always setting a date makes the date parameter on the ClientInterface useless?

@OleVik
Copy link
Author

OleVik commented Aug 4, 2025

With the code the property is not nullable, hence it does not follow the method as defined in the class or interface-signature.

@SMillerDev
Copy link

By that definition any if statement that sets a variable based on an earlier method parameter invalidates the method. In an academic sense that might be true, but in the real world this happens a lot.

Unless this fixes a real bug I'm inclined to decline this.

@Grotax
Copy link

Grotax commented Aug 5, 2025

I think the core issue is that some feeds do not support headers.

And I think a better solution to this would be to check if the remote actually supports it or not.

Like in nextcloud/news#3243

@wofferl
Copy link

wofferl commented Aug 5, 2025

I think the core issue is that some feeds do not support headers.

That too, but setting an invalid header in general is not exactly useful either. The RFC states that the server should ignore invalid headers, but as you can see from the reddit example, you are quickly flagged as a ‘bad bot’.

I also think that the workaround would no longer be necessary.

However, I believe that this would not have been necessary in general, as the problem was homemade in my opinion, due to the original forcing to “1970-01-01”. modifiedSince is used here in two places, once to set the If-Modified-Since header to prevent unnecessary downloads, but also for Result->getItemsSince(DateTime $since = null), so that you can only request the items that would be new. I think the workaround was only needed for the latter, so that this could have been done independently of the header. You can also pass a different date to getItemsSince, so this could have been done on the client side, if I understood the problem correctly.

So I would remove the forced setting of the header completely, as both the invalid header “1800-01-01” and “1970-01-01” mean the same as not setting a header. For this Result.php needs to be modified, so that $modifiedSince = null is also allowed here and in this case getItemsSince should return all items, which would also solve the problem with older items.

The question would be what getModifiedSince should return here, I don't see this being changed during processing the request. So it returns the value that you have entered, unless I am missing something here.

And I think a better solution to this would be to check if the remote actually supports it or not.

Like in nextcloud/news#3243

I would leave the responsibility for whether a feed supports modifiedSince to the calling app, which also has control over the status information of the feeds.

@wofferl
Copy link

wofferl commented Aug 5, 2025

This is how I would do it:
wofferl@4a7bfd5

@OleVik
Copy link
Author

OleVik commented Aug 8, 2025

Integrated changes from @wofferl's commit.

@Grotax Grotax requested a review from Copilot August 11, 2025 11:47
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where the ResponseInterface is invalidated when reading feeds that don't support HEAD requests by removing the automatic creation of a default DateTime for $modifiedSince and making it properly nullable throughout the codebase.

  • Removes automatic creation of default DateTime('1800-01-01') when modifiedSince is null
  • Makes Result constructor's modifiedSince parameter properly nullable with default null
  • Adds null-safe handling in getItemsSince() method to return all items when no start date is provided

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/FeedIo/Reader.php Removes forced default DateTime creation for null modifiedSince parameter
src/FeedIo/Reader/Result.php Makes modifiedSince parameter nullable and adds null-safe filtering logic
tests/FeedIo/ReaderTest.php Updates test expectation to match new null behavior

@Grotax Grotax changed the title Do not invalidate ResponseInterface Change: do no longer default to 1800-01-01 as date for fetching feeds Aug 13, 2025
@Grotax
Copy link

Grotax commented Aug 13, 2025

Thanks for your contributions and inputs that is how you evolve open source projects.

@Grotax Grotax merged commit b079f6b into php-feed-io:main Aug 13, 2025
Grotax added a commit that referenced this pull request Aug 13, 2025
- Add logic to set the node's link from the <guid> element when isPermaLink="true" and no link is present. (#12)
- Do no longer default to 1800-01-01 as date for fetching feeds (#15)
- Don't set if-modified-since header when discover feeds (#19)

- Analysis of relative links for the Atom feed (#10)
- Implicit null deprecations fixed (#17)
Grotax added a commit that referenced this pull request Aug 13, 2025
Changed
- Add logic to set the node's link from the <guid> element when isPermaLink="true" and no link is present. (#12)
- Do no longer default to 1800-01-01 as date for fetching feeds (#15)
- Don't set if-modified-since header when discover feeds (#19)

Fixed
- Analysis of relative links for the Atom feed (#10)
- Implicit null deprecations fixed (#17)
@Grotax Grotax mentioned this pull request Aug 13, 2025
Grotax added a commit that referenced this pull request Aug 13, 2025
Changed
- Add logic to set the node's link from the <guid> element when isPermaLink="true" and no link is present. (#12)
- Do no longer default to 1800-01-01 as date for fetching feeds (#15)
- Don't set if-modified-since header when discover feeds (#19)

Fixed
- Analysis of relative links for the Atom feed (#10)
- Implicit null deprecations fixed (#17)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants