This is a simple library to interface with the HN Search API (provided by Algolia).
Install | Basic Usage | Development | Roadmap
👉 Note: As an example, I used this library to download ALL Hacker News posts and made them available as a public dataset on Kaggle.
$ pip install python-hn
from hn import search_by_date
# Search everything (stories, comments, etc) containing the keyword 'python'
search_by_date('python')
# Search everything (stories, comments, etc) by author 'pg' containing the keyword 'lisp', created before 2018-01-01
search_by_date('lisp', author='pg', created_at__lt='2018-01-01')
# Search only stories
search_by_date('lisp', author='pg', stories=True, created_at__lt='2018-01-01')
# Search stories *or* comments
search_by_date(q='lisp', author='pg', stories=True, comments=True, created_at__lt='2018-01-01')
Tags are part of the HN Search API provided by Algolia. You can read more in their docs. They can form complex queries, for example:
# All the comments in the story `6902129`
tags = PostType('comment') & StoryID('6902129')
The available tags are:
- PostType: with options story, comment, poll, pollopt, show_hn, ask_hn, front_page.
- Author: receives the username as param (Author('pg')).
- StoryID: receives the story id (StoryID('6902129')).
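To illustrate how tags like these can combine with & and |, here is a minimal sketch. It is not python-hn's actual implementation: the Tag class and the post_type, author and story_id helpers are hypothetical names invented for this example. It relies only on the tag syntax documented for the HN Search API, where comma-separated tags are ANDed and a parenthesized group is ORed.

```python
class Tag:
    """Hypothetical tag wrapper (an assumption, not python-hn's real class)."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # The HN Search API treats comma-separated tags as AND.
        return Tag(f"{self.expr},{other.expr}")

    def __or__(self, other):
        # A parenthesized, comma-separated group is treated as OR.
        return Tag(f"({self.expr},{other.expr})")

    def __str__(self):
        return self.expr


# Hypothetical helpers mirroring PostType, Author and StoryID.
def post_type(kind):
    return Tag(kind)


def author(username):
    return Tag(f"author_{username}")


def story_id(sid):
    return Tag(f"story_{sid}")


# All the comments in the story 6902129
comments_in_story = post_type("comment") & story_id("6902129")
print(comments_in_story)  # comment,story_6902129

# Stories or comments written by 'pg'
by_pg = (post_type("story") | post_type("comment")) & author("pg")
print(by_pg)  # (story,comment),author_pg
```

The resulting string is what would be sent as the tags query parameter. Chaining several | operators produces nested parentheses in this sketch, which the API may not accept, so a real implementation would need to flatten OR groups.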
Filters can be applied to restrict the search by:
- Creation Date: created_at
- Points: points
- Number of comments: num_comments
They can accept >, <, >=, <= operators with a syntax similar to Django's.
- lt (<): Lower than. Example: points__lt=100
- lte (<=): Lower than or equal to. Example: points__lte=100
- gt (>): Greater than. Example: created_at__gt='2018' (created after 2018-01-01).
- gte (>=): Greater than or equal to. Example: num_comments__gte=50
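As a rough illustration of how Django-style keyword filters could map onto the numericFilters parameter of the HN Search API, here is a minimal sketch. It is not the library's actual code: build_numeric_filters and to_timestamp are hypothetical helpers, and interpreting partial dates as midnight UTC is an assumption made for this example.

```python
from datetime import datetime, timezone

# Django-style suffixes and the comparison symbols they stand for.
OPERATORS = {"lt": "<", "lte": "<=", "gt": ">", "gte": ">="}


def to_timestamp(value):
    """Pad a partial date ('2018', '2018-10') and return a Unix timestamp.

    Assumption: dates are interpreted as midnight UTC.
    """
    parts = [int(p) for p in value.split("-")]
    parts += [1] * (3 - len(parts))  # missing month/day default to 1
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp())


def build_numeric_filters(**filters):
    """Hypothetical helper: turn field__op=value kwargs into a filter string."""
    clauses = []
    for key, value in filters.items():
        field, _, op = key.partition("__")
        symbol = OPERATORS[op] if op else "="  # no suffix means equality
        if field == "created_at":
            # The API exposes the creation date as an integer field.
            field, value = "created_at_i", to_timestamp(value)
        clauses.append(f"{field}{symbol}{value}")
    return ",".join(clauses)


print(build_numeric_filters(points__gt=50))
# points>50
print(build_numeric_filters(created_at__gt="2018", num_comments__gte=50))
# created_at_i>1514764800,num_comments>=50
```

An unknown suffix raises a KeyError here, which keeps the sketch short; a real implementation would likely report a clearer error.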
Examples (See Algolia docs for more info):
# Created after October 1st, 2018
search_by_date(created_at__gt='2018-10')
# Created after October 1st, 2017 and before January 1st 2018
search_by_date(created_at__gt='2017-10', created_at__lt='2018')
# Stories with *exactly* 1000 points
search_by_date(tags=PostType('story'), points=1000)
# Comments with more than 50 points
search_by_date(tags=PostType('comment'), points__gt=50)
# Stories with 100 comments or more
search_by_date(tags=PostType('story'), num_comments__gte=100)
[TODO]
I'm in the process of updating this project and migrating to uv. You should be able to just do:
$ uv run py.test