[IDEV-2771] RTUF stream responses from the server rather than collecting it in memory; Update CLI #180

briluza · 2025-10-22T10:58:19Z

Changes

Use httpx streaming for RTTF responses
Optimize RTTF functions to yield results line by line instead of storing them in memory
Update test cases

jbabac · 2025-10-22T15:33:54Z

LGTM! Good job @briluza !

kelvinatorr · 2025-10-23T13:43:40Z

domaintools/results.py

+            )
+            if should_wait:
+                log.info(f"Sleeping for {wait_for}s.")
+                time.sleep(wait_for)


This sleep is going to happen after a subsequent GET request has already been made. This will likely cause a timeout.

Ohh, nice catch here! The sleep should happen before making the request so every connection is fresh.
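
The ordering discussed here can be sketched in isolation. This is a minimal illustration, not the code in domaintools/results.py: `poll` and `fetch_page` are hypothetical names, the use of 206 to mean "more data remains" is taken from later in this review, and the sleep function is injected only so the order of calls can be observed.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical fetcher: performs one request and returns (status_code, lines).
Fetch = Callable[[], Tuple[int, List[str]]]


def poll(
    fetch_page: Fetch,
    wait_for: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield lines from repeated requests, pausing *before* each follow-up request."""
    first = True
    while True:
        if not first and wait_for:
            # Wait while no connection is open, so the next request starts fresh.
            sleep(wait_for)
        first = False
        status, lines = fetch_page()
        yield from lines
        if status != 206:  # assumption: 206 means more data remains
            break
```

With the sleep placed ahead of the request, no open connection sits idle while the client waits.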

kelvinatorr · 2025-10-23T13:50:37Z

domaintools/results.py

+
+            should_wait = (
+                wait_for
+                and wait_for > 0


Why are you checking if it is greater than 0? If wait_for is 0 then it will be False and caught by the check on the previous line. I don't think wait_for can be negative given that it is the difference between safe_after and now.
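
The redundancy can be shown with a small sketch. It assumes the wait time is clamped at zero where it is computed; `seconds_until` and `should_wait` are hypothetical helpers for illustration, not the library's `_wait_time()`.

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_until(safe_after: Optional[datetime], now: datetime) -> float:
    """Hypothetical helper: seconds left until safe_after, never negative."""
    if safe_after is None:
        return 0.0
    return max(0.0, (safe_after - now).total_seconds())


def should_wait(wait_for: Optional[float]) -> bool:
    # None and 0 are both falsy, so a separate `> 0` test adds nothing
    # once negative values are ruled out where the value is computed.
    return bool(wait_for)
```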

kelvinatorr · 2025-10-23T13:51:44Z

domaintools/results.py

+            should_wait = (
+                wait_for
+                and wait_for > 0
+                and not (self.api.rate_limit and (self.product == "account-information"))


The not self.api.rate_limit is already checked in the _wait_time() call. Why check it here again? And how can the product be "account-information" in a class specifically only used for the Feeds products?

Checking this again, it should be removed: it came from the base class, and this class is specific to feeds.

kelvinatorr · 2025-10-23T13:52:30Z

domaintools/results.py

+                        yield line
+        except Exception as e:
+            self.latest_feeds_status_code = 500
+            yield {"status_ready": True, "error": str(e)}


If an exception occurs, this line will be yielded to the user. Is this what is desired?

The answer is related to my comment here: #180 (comment)

kelvinatorr · 2025-10-23T13:55:56Z

domaintools/results.py

+                for line in response.iter_lines():
+                    if line:
+                        yield line
+        except Exception as e:


Rather than a blanket Exception, don't you think it would be more useful to the user to catch specific httpx exceptions (https://www.python-httpx.org/exceptions/) so the error messages are more specific? If the exception happens during iter_lines() above, the "FATAL: Failed to start the feed generator in _make_request. Reason:" message will be shown, but it will be inaccurate because the iteration has already started and the user already has jsonlines yielded to them.

Also if you catch HTTPStatusError you can use raise_for_status() so then you don't have to do the unusual thing of setting self.latest_feeds_status_code, yielding a throwaway line, and then having self.setStatus(self.latest_feeds_status_code) in _get_results() check the status.

Thanks, this answers the question about yielding on exception. Rather than doing that, I'll set the status and raise the exception to the user as well.

Also, when calling setStatus, if it's not a success status code it will throw the exception and stop execution before yielding back to the user.

kelvinatorr · 2025-10-23T14:00:18Z

domaintools/results.py

+                yield {"status_ready": True}
+
+                for line in response.iter_lines():
+                    if line:


You don't need this check. There should never be a double \n in the RTTF responses. And if there were, it shouldn't be hidden from the user.

ahh okay, thanks for the info!

kelvinatorr · 2025-10-23T15:38:16Z

tests/fixtures/vcr/test_newly_observed_domains_feed.yaml

+      user-agent:
+      - python-httpx/0.28.1
+      x-api-key:
+      - 4b02d-a4719-e33e7-93128-5a5ff


You should remove this api_key from this repo.
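
One way to scrub an already-recorded cassette is a small text substitution. This is a sketch under assumptions: `redact_api_key` is a hypothetical helper, and it expects the header to be laid out as in the snippet above (a `x-api-key:` line followed by a single `- value` line). vcrpy also documents a `filter_headers` option that keeps such headers out of cassettes at record time.

```python
import re

# Matches a YAML header entry of the form:
#   x-api-key:
#   - <value>
_API_KEY_ENTRY = re.compile(r"(x-api-key:\s*\n\s*-\s*)(\S+)", re.IGNORECASE)


def redact_api_key(cassette_text: str, placeholder: str = "REDACTED") -> str:
    """Replace every recorded x-api-key value in cassette text with a placeholder."""
    return _API_KEY_ENTRY.sub(lambda m: m.group(1) + placeholder, cassette_text)
```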

kelvinatorr · 2025-10-23T15:43:47Z

domaintools/results.py

 """

-from itertools import chain
+import json


This import is unused

kelvinatorr · 2025-10-23T15:43:58Z

domaintools/results.py

-from itertools import zip_longest
+from itertools import zip_longest, chain
 from typing import Generator
+from httpx import Client


This one too

kelvinatorr · 2025-10-23T15:54:15Z

domaintools/results.py

+
+    Highlevel process:
+
+    httpx stream -> yield each json line -> check status code -> yield back data to client


I don't think this sequence is right. Isn't it actually:

httpx stream -> check status code -> yield back data to client -> repeat if 206

🤦 Thanks for spotting this! Redundant 'yield'.
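
The corrected sequence can be sketched as a generator. This is an illustration only: `stream_feed`, `open_stream`, and `FeedError` are hypothetical names, and treating 200 and 206 as the only success codes is an assumption.

```python
from typing import Callable, Iterator, Tuple


class FeedError(Exception):
    """Hypothetical error type for a non-success feed response."""


def stream_feed(open_stream: Callable[[], Tuple[int, Iterator[str]]]) -> Iterator[str]:
    """Stream -> check status code -> yield data to the client -> repeat if 206."""
    while True:
        status, lines = open_stream()   # 1. open the stream
        if status not in (200, 206):    # 2. check the status before yielding anything
            raise FeedError(f"unexpected status {status}")
        yield from lines                # 3. yield data back to the client
        if status != 206:               # 4. repeat only while the server returns 206
            return
```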

briluza added 2 commits October 22, 2025 18:14

Implement streaming request for RTTF endpoints

17dcbf9

update test cases

bc187cc

jbabac approved these changes Oct 22, 2025

briluza merged commit 6ffe955 into release-2.6.0 Oct 22, 2025
12 checks passed

4 participants