@freylily

Implements the T-Online publisher with body content extraction.

Changes:

  • Created TOnlineParser class with V1 parser version
  • Implemented body extraction using CSS selectors for paragraphs, summary, and
    subheadings
  • Registered T-Online in the DE publisher group
  • Tested selectors on multiple articles to ensure correct extraction

The parser extracts:

  • Title from JSON-LD headline
  • Authors from JSON-LD author field
  • Publishing date from JSON-LD datePublished
  • Body content (paragraphs and subheadings)
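The JSON-LD lookups listed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this PR and not Fundus's own helpers: `JsonLdCollector`, `extract_metadata`, and the sample HTML (including its headline, author, and date) are hypothetical names and data made up for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed content of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_metadata(html):
    """Return title, authors and publishing date from the first JSON-LD block with a headline."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for doc in collector.documents:
        if isinstance(doc, dict) and "headline" in doc:
            authors = doc.get("author", [])
            if isinstance(authors, dict):  # a single author is often a bare object
                authors = [authors]
            return {
                "title": doc.get("headline"),
                "authors": [a.get("name") for a in authors if isinstance(a, dict)],
                "publishing_date": doc.get("datePublished"),
            }
    return None


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example headline",
 "author": [{"@type": "Person", "name": "Jane Doe"}],
 "datePublished": "2025-01-01T08:00:00+01:00"}
</script>
</head><body></body></html>
"""

metadata = extract_metadata(SAMPLE_HTML)
print(metadata)
```

Real pages may nest the article object inside an `@graph` array or list authors as plain strings; the sketch ignores both cases.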

@addie9800 (Collaborator) left a comment

Thanks for adding T-Online. Unfortunately, it did not run on my device. I left a couple of comments pointing out the issues.

    domain="https://www.t-online.de/",
    parser=TOnlineParser,
    sources=[
        Sitemap("https://www.t-online.de/sitemap.xml"),
Collaborator


class TOnlineParser(ParserProxy):
    class V1(BaseParser):
Collaborator

You are missing the images and topics attributes.


class TOnlineParser(ParserProxy):
    class V1(BaseParser):
        _paragraph_selector = CSSSelector("div[class*='px-24'] > p.text-18")
Collaborator

You need to rework all the selectors; they do not seem to work.

    ],
)

TOnline = Publisher(
Collaborator

As soon as you're done implementing this parser, please follow these steps to generate the test files. Make sure that the test article includes topics, images, and subheadlines.

@addie9800 self-assigned this on Oct 26, 2025