vrdhn91 commented on Oct 22, 2025

To add the Media Indonesia publisher, a new Python file (media_indonesia.py) containing the parser class was created. It extracts the title, author, publishing date, and article body.

vrdhn91 changed the title from "Task 3 - Add Media Indonesia Publisher" to "Add Media Indonesia Publisher" on Oct 22, 2025

addie9800 left a comment

Thanks for adding our first Indonesian publisher! I do have a couple of comments before we can continue with the next step.



class MediaIndonesiaParser(ParserProxy):
    class V1(BaseParser):

The images attribute seems to be missing.
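As a rough illustration of what image collection involves, the stdlib-only sketch below walks the article container and records each `<img>` with an absolute URL and its alt text. The container class `article`, the sample markup, the base URL, and the name `ArticleImageCollector` are all assumptions for illustration; the real parser should rely on whatever image helpers fundus provides rather than on this code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleImageCollector(HTMLParser):
    """Hypothetical helper: collects <img> tags inside <div class="article">."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.depth = 0  # > 0 while inside the (assumed) article container
        self.images: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            classes = (attributes.get("class") or "").split()
            # Count nested divs once we are inside the container.
            if self.depth or "article" in classes:
                self.depth += 1
        elif tag == "img" and self.depth and attributes.get("src"):
            self.images.append(
                {
                    # Resolve relative src values against the page URL.
                    "url": urljoin(self.base_url, attributes["src"]),
                    "alt": attributes.get("alt") or "",
                }
            )

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


# Hypothetical page: the logo outside the article must be ignored.
html = """
<html><body>
  <img src="/logo.png" alt="Site logo">
  <div class="article">
    <p>Paragraph one.</p>
    <img src="/images/photo.jpg" alt="Example photo">
  </div>
</body></html>
"""
collector = ArticleImageCollector("https://example.com/")
collector.feed(html)
print(collector.images)
# [{'url': 'https://example.com/images/photo.jpg', 'alt': 'Example photo'}]
```

Restricting the search to the article container keeps site-wide images such as logos and ads out of the result.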

Topics also seem to be missing. They can be accessed using the keywords meta attribute.
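A minimal sketch of the underlying logic, assuming the page carries a `<meta name="keywords">` tag whose content is a comma-separated list. The sample keywords and the names `MetaCollector` and `parse_topics` are hypothetical; in the actual parser, fundus's own meta data handling and topic helpers should be preferred over this stdlib version.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: maps <meta name=...>/<meta property=...> keys to content."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        if key and attributes.get("content") is not None:
            self.meta[key.lower()] = attributes["content"]


def parse_topics(html: str) -> list[str]:
    """Split the keywords meta value into a clean, ordered topic list."""
    collector = MetaCollector()
    collector.feed(html)
    raw = collector.meta.get("keywords", "")
    topics = [topic.strip() for topic in raw.split(",")]
    # Drop empty entries (e.g. from trailing commas) and duplicates, keeping order.
    return list(dict.fromkeys(topic for topic in topics if topic))


sample = '<head><meta name="keywords" content="Ekonomi, Inflasi, , Bank Indonesia,"></head>'
topics = parse_topics(sample)
print(topics)  # ['Ekonomi', 'Inflasi', 'Bank Indonesia']
```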


class MediaIndonesiaParser(ParserProxy):
    class V1(BaseParser):
        _paragraph_selector = CSSSelector("div.article")

This paragraph selector selects the entire article as one big paragraph. Consider something like div.article > p instead, and make sure it is consistent with the subheadlines of this article. I would recommend switching to XPath; I personally prefer it for these more subtle cases.
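To make the difference concrete, the sketch below runs both selections on hypothetical, well-formed markup using the limited XPath subset in Python's xml.etree.ElementTree. It is only an approximation of the lxml selectors used in the parser, chosen so the example runs without dependencies; note that the `[@class='article']` predicate matches the class attribute exactly, unlike the CSS `.article` selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup (well-formed so ElementTree can parse it).
PAGE = """
<html><body>
  <div class="article">
    <p>First paragraph.</p>
    <h2>A subheadline</h2>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""
page = ET.fromstring(PAGE.strip())


def text_of(element: ET.Element) -> str:
    """Concatenate all text inside an element, normalising whitespace."""
    return " ".join("".join(element.itertext()).split())


# Selecting the container itself (like div.article) yields ONE node, so the
# whole body, subheadline included, becomes a single "paragraph".
whole = [text_of(e) for e in page.findall(".//div[@class='article']")]

# Selecting its direct <p> children (like div.article > p) yields one node
# per paragraph and leaves the <h2> to the subheadline selector.
paragraphs = [text_of(e) for e in page.findall(".//div[@class='article']/p")]

print(whole)       # ['First paragraph. A subheadline Second paragraph.']
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```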

class MediaIndonesiaParser(ParserProxy):
    class V1(BaseParser):
        _paragraph_selector = CSSSelector("div.article")
        _subheadline_selector = CSSSelector("div.article > h2")

This article uses different formatting for the subheadlines.
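The comment does not say what the other subheadline format is, so the sketch below assumes, purely hypothetically, that such subheadlines are `<p>` elements wrapping a single `<strong>`. It illustrates the consistency point made about the paragraph selector: every direct `<p>` child lands either in the paragraphs or in the subheadlines, never in both. The markup and the names `is_bold_subheadline` and `split_body` are illustrative only, and ElementTree is used just to keep the example dependency-free.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the subheadline is a <p> wrapping one <strong>,
# so a selector like "div.article > h2" would find nothing here.
ARTICLE = """
<div class="article">
  <p>Intro paragraph.</p>
  <p><strong>A bold subheadline</strong></p>
  <p>Body text with <strong>inline emphasis</strong> inside.</p>
</div>
"""


def is_bold_subheadline(p: ET.Element) -> bool:
    """True if <p> has exactly one <strong> child and no text outside it."""
    children = list(p)
    outside_text = (p.text or "") + "".join(child.tail or "" for child in children)
    return len(children) == 1 and children[0].tag == "strong" and not outside_text.strip()


def split_body(container: ET.Element) -> tuple[list[str], list[str]]:
    """Separate direct <p> children into paragraphs and subheadlines."""
    paragraphs: list[str] = []
    subheadlines: list[str] = []
    for p in container.findall("./p"):
        text = " ".join("".join(p.itertext()).split())
        if is_bold_subheadline(p):
            subheadlines.append(text)
        else:
            paragraphs.append(text)
    return paragraphs, subheadlines


paragraphs, subheadlines = split_body(ET.fromstring(ARTICLE.strip()))
print(paragraphs)    # ['Intro paragraph.', 'Body text with inline emphasis inside.']
print(subheadlines)  # ['A bold subheadline']
```

Paragraphs with inline emphasis stay paragraphs because they contain text outside the `<strong>`.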

addie9800 self-assigned this on Oct 26, 2025