Conversation

@nancyboukamel-ds
No description provided.

@addie9800 addie9800 self-assigned this Oct 26, 2025
Collaborator

@addie9800 addie9800 left a comment


Thanks for adding the first Lebanese publisher. Unfortunately, the code you provided does not run. I have commented on the relevant section, so that you can go ahead and fix it.

domain = "https://www.lbcgroup.tv",
parser = LBCGroupParser,
sources=[
RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/en"),

Since the default language is Arabic, please add the parameter `languages={"en"}` to each English source. Also, since there is an Arabic RSSFeed, please add that as well: `RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/ar")`
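Applied to the snippet above, the sources list would look roughly like this (a sketch; the `languages` keyword on the source constructors is taken from this review's wording, not verified against the Fundus API):

```python
sources=[
    # English feed: override the publisher-wide default language (Arabic)
    RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/en", languages={"en"}),
    # Arabic feed matches the default, so no override is needed
    RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/ar"),
],
```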


I think we can add all RSSFeeds from https://www.lbcgroup.tv/rss/ar and https://www.lbcgroup.tv/rss/en

parser = LBCGroupParser,
sources=[
RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/en"),
NewsMap("https://www.lbcgroup.tv/newssitemap.xml"),

Since this source is multi-lingual, please override the language with `languages={"en", "ar"}`.
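Following that suggestion, the multi-lingual NewsMap entry would look roughly like this (again a sketch; the `languages` keyword is assumed from this review's wording):

```python
sources=[
    RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/en", languages={"en"}),
    # This news sitemap contains both English and Arabic articles,
    # so both languages are listed explicitly
    NewsMap("https://www.lbcgroup.tv/newssitemap.xml", languages={"en", "ar"}),
],
```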

sources=[
RSSFeed("https://www.lbcgroup.tv/Rss/latest-news/en"),
NewsMap("https://www.lbcgroup.tv/newssitemap.xml"),
Sitemap("https://www.lbcgroup.tv/sitemap.xml"),

I think this sitemap can safely be removed; at least I didn't find any articles that weren't also part of the NewsMap. More problematically, it contained numerous pages that are not relevant to Fundus.

# Use the defined content_selector to locate the block of text.
return extract_article_body_with_selector(
self.precomputed.doc,
content_selector=self._content_container_selector,

This line does not make sense as written and causes the program to crash. You have defined the selector as `content_container_selector` but try to access it as `_content_container_selector`. Also, `extract_article_body_with_selector` has no parameter `content_selector`.
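A fixed version might look like the following sketch: the class attribute is named consistently with a leading underscore, and the selector is passed as `paragraph_selector`, the parameter name used by other Fundus parsers (the concrete CSS selector here is a hypothetical placeholder, not taken from this PR):

```python
_paragraph_selector = CSSSelector("div.article-body > p")  # hypothetical selector

@attribute
def body(self) -> ArticleBody:
    # Access the class attribute under the same name it was defined with,
    # and pass it via paragraph_selector instead of the non-existent
    # content_selector parameter.
    return extract_article_body_with_selector(
        self.precomputed.doc,
        paragraph_selector=self._paragraph_selector,
    )
```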

