- Workflow
- The user enters a prompt describing the news they want to follow.
- Each day, the agent searches the web for news that fits the user's prompt.
- The agent classifies the news into categories based on relevance to the prompt.
- It then generates a podcast-style audio file for the listener.
- analyze (prompting.py):
- Input: User prompt
- Output: list[interest] List of user interests
- crawl (crawl.py):
- Input: news website, number of articles to crawl
- Output: list[article] List of article objects
- gen_embed (crawl.py):
- Input: list[article]
- Update embedding library
- Output: Dict{article, embedding}
- rank (crawl.py):
- Input: Dict{article, embedding}, list[interest]
- Output: Dict{interest, list[article]} Lists of articles grouped by interest
- generate_podcast_script (generation.py):
- Input: Dict{interest, list[article]}
- Output: list[Line] List of script lines
- synthesize_audio (generation.py):
- Input: list[Line]
- Output: mp3
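The six steps above can be chained as in the sketch below. This is only an illustration of how the listed inputs and outputs fit together, not the project's actual code: the `Article` and `Line` field names, the `run_pipeline` helper, and the choice to pass each step in as a callable are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)  # frozen so an Article can be used as a dict key
class Article:
    # Field names are assumptions; the notes only say "article object".
    title: str
    url: str
    summary: str


@dataclass
class Line:
    # Assumed shape of one script line: who speaks and what they say.
    speaker: str
    text: str


def run_pipeline(
    user_prompt: str,
    site: str,
    n_articles: int,
    *,
    analyze: Callable[[str], list[str]],
    crawl: Callable[[str, int], list[Article]],
    gen_embed: Callable[[list[Article]], dict[Article, list[float]]],
    rank: Callable[[dict[Article, list[float]], list[str]], dict[str, list[Article]]],
    generate_podcast_script: Callable[[dict[str, list[Article]]], list[Line]],
    synthesize_audio: Callable[[list[Line]], bytes],
) -> bytes:
    """Hypothetical helper: run the steps in the listed order, return mp3 bytes."""
    interests = analyze(user_prompt)          # list[interest]
    articles = crawl(site, n_articles)        # list[article]
    embeddings = gen_embed(articles)          # Dict{article, embedding}
    ranked = rank(embeddings, interests)      # Dict{interest, list[article]}
    script = generate_podcast_script(ranked)  # list[Line]
    return synthesize_audio(script)           # mp3
```

Passing the steps as arguments keeps the sketch runnable with stand-in functions, so the data flow can be checked before the real `prompting.py`, `crawl.py`, and `generation.py` implementations exist.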
V 0.2
- Crawl Pipeline:
- crawl (crawl.py):
- Input: news website, number of articles to crawl
- Output: list[title, url, summary]
- gen_embed (crawl.py):
- Input: list[title, url, summary]
- Update embedding library (.pkl): Dict{url, embedding}
- Output: None
- crawl (crawl.py):
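A minimal sketch of the `gen_embed` step in the crawl pipeline, assuming the embedding library is a pickled `dict` keyed by url. The `embed_fn` and `library_path` parameters are hypothetical, as are the choices to embed the title together with the summary and to skip urls that are already in the library; the notes do not specify any of these.

```python
import pickle
from pathlib import Path
from typing import Callable, Iterable


def gen_embed(
    articles: Iterable[tuple[str, str, str]],   # (title, url, summary)
    embed_fn: Callable[[str], list[float]],     # hypothetical embedding function
    library_path: Path,                         # hypothetical .pkl location
) -> None:
    """Embed new articles and merge them into the pickled {url: embedding} library."""
    library: dict[str, list[float]] = {}
    if library_path.exists():
        with library_path.open("rb") as f:
            library = pickle.load(f)

    for title, url, summary in articles:
        if url in library:
            continue  # assumption: an article already in the library is not re-embedded
        library[url] = embed_fn(f"{title}\n{summary}")

    with library_path.open("wb") as f:
        pickle.dump(library, f)
```

Keying the library by url rather than by the article object keeps the `.pkl` file small and lets the recommendation pipeline decide later which articles are worth downloading.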
- Recommendation Pipeline
- analyze (prompting.py):
- Input: User prompt
- Output: list[interest] List of user interests
- rank (crawl.py):
- Input: Dict{url, embedding} (read from the .pkl file), list[interest]
- Download each selected article from its url
- Output: Dict{interest, list[article]} Lists of articles grouped by interest
- generate_podcast_script (generation.py):
- Input: Dict{interest, list[article]}
- Output: list[Line] List of script lines
- synthesize_audio (generation.py):
- Input: list[Line]
- Output: mp3
- analyze (prompting.py):
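The `rank` step of the recommendation pipeline could be sketched as follows. The notes do not say how articles are matched to interests, so cosine similarity is an assumption here, as are the `rank_urls` name, the `embed_fn` used to embed each interest, and the `top_k` cutoff. The sketch stops at selecting urls; downloading the articles is left out.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_urls(
    library: dict[str, list[float]],         # Dict{url, embedding} from the .pkl file
    interests: list[str],
    embed_fn: Callable[[str], list[float]],  # hypothetical: embeds an interest string
    top_k: int = 3,                          # hypothetical cutoff per interest
) -> dict[str, list[str]]:
    """Return, for each interest, the urls of the top_k most similar articles."""
    ranked: dict[str, list[str]] = {}
    for interest in interests:
        query = embed_fn(interest)
        by_score = sorted(
            library,
            key=lambda url: cosine(query, library[url]),
            reverse=True,
        )
        ranked[interest] = by_score[:top_k]
    return ranked
```

For example, with a toy two-dimensional library:

```python
library = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
vectors = {"sports": [1.0, 0.0], "tech": [0.0, 1.0]}
print(rank_urls(library, ["sports", "tech"], vectors.__getitem__, top_k=2))
# {'sports': ['a', 'c'], 'tech': ['b', 'c']}
```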