Skip to content

fix: Persist rendering type detection results in AdaptivePlaywrightCrawler#2987

Merged
janbuchar merged 14 commits intomasterfrom
recoverable-rendering-type-detection-results
Jun 11, 2025
Merged

fix: Persist rendering type detection results in AdaptivePlaywrightCrawler#2987
janbuchar merged 14 commits intomasterfrom
recoverable-rendering-type-detection-results

Conversation

@janbuchar
Copy link
Copy Markdown
Contributor

  • closes Persist RenderingTypePredictor state #2899
  • The PR introduces a RecoverableState class with the hopes of using it for all persistent components (Statistics, SessionPool, ...), similarly to the Python port, in the future
  • RecoverableState is utilized in RenderingTypePredictor to make its state persistent by default for better detection reliability after pauses and migrations

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label May 27, 2025
@janbuchar janbuchar requested review from B4nan and barjin May 27, 2025 15:19
@github-actions github-actions bot added this to the 115th sprint - Tooling team milestone May 27, 2025
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label May 27, 2025
@janbuchar janbuchar force-pushed the recoverable-rendering-type-detection-results branch from 6271943 to 5f580ac Compare May 28, 2025 10:21
Copy link
Copy Markdown
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some issues (not calling initialize), but it looks solid otherwise.

We might have different ideas about what this is for. I was imagining a fully generic "trait", that helps users with implementing serialize and deserialize methods for anything imaginable. The current implementation decisions seem to point more towards "simple" POJOs only.

This is fine by me, I'm just curious - have you tried (thought about) the former thing? Do you see any reasons why that wouldn't work?

"compilerOptions": {
"outDir": "./dist"
"outDir": "./dist",
"rootDir": "./src"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need this (and the changes in the other tsconfig)? feels wrong to change it just for two packages

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janbuchar janbuchar requested review from B4nan and barjin June 8, 2025 11:34
@janbuchar
Copy link
Copy Markdown
Contributor Author

I spent quite a bit of time trying to make serialize and deserialize non-optional if the TStateModel is not directly JSON-serializable, but I had to give up. I guess we can either keep it like it is now or make those callbacks required.

Copy link
Copy Markdown
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright to me (aside from the TS edits, but I'll trust y'all on this). Again - thanks for the tests 🚀

* across migrations or restarts. It manages the loading, saving, and resetting of state data,
* with optional persistence capabilities.
*
* The state is represented by a plain JavaScript object that can be serialized to and deserialized from JSON.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is no longer completely true with custom serialize and deserialize methods, but there is IMO nothing wrong with not telling the users the whole truth, keeping them on the safe side.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, "plain JavaScript object" is not exactly a legal term and we do accept stuff that can be (de)serialized from/to JSON, if only with the caveat that sometimes you have to write the serialization logic by hand.

image

@janbuchar janbuchar merged commit 76431ba into master Jun 11, 2025
9 checks passed
@janbuchar janbuchar deleted the recoverable-rendering-type-detection-results branch June 11, 2025 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persist RenderingTypePredictor state

3 participants