learncodesmart/FinNLP

Natural Language Dataset for Finance

Demos are available in ChatGPT for FinTech.

Disclaimer: We share this code for academic purposes under the MIT education license. Nothing herein is financial advice, and it is NOT a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.

Ⅰ. Data sources

1. News

| Platform | Data Type | Related Market | Data Source | Specified Company | Range Type | Source Type | Limits |
|---|---|---|---|---|---|---|---|
| Yahoo | Financial News | US Stocks | Finnhub News |  | Date Range | Third party | Account-specific (free) |
| Sina | Financial News | CN Stocks | Sina Finance | × | Date Range | Official | Not too much |
| CCTV | Government News | CN Stocks | Akshare cctv | × | Date Range | Third party | N/A |
| US Mainstream Media | Financial News | US Stocks | Finnhub News |  | Date Range | Third party | Account-specific (free) |
| CN Mainstream Media | Financial News | CN Stocks | Tushare Major News | × | Date Range | Third party | Account-specific (about ¥500 per year) |
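For the "Date Range" sources, a long period is usually downloaded as a series of smaller requests. The minimal sketch below assumes Finnhub's REST `company-news` endpoint with `symbol`, `from`, `to` and `token` query parameters (the token being the account's free API key). The 7-day window and the helper names `date_windows` and `company_news_urls` are illustrative choices, not code from this repository. It only builds the request URLs; fetching them and staying within the account's rate limit is left to the caller.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Finnhub company-news endpoint (assumption: REST API v1); needs an account API key.
FINNHUB_NEWS_URL = "https://finnhub.io/api/v1/company-news"


def date_windows(start: date, end: date, days: int = 7):
    """Split the inclusive range [start, end] into consecutive windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be >= 1")
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def company_news_urls(symbol: str, start: date, end: date, token: str, days: int = 7) -> list[str]:
    """Build one company-news request URL per date window (no network access happens here)."""
    urls = []
    for win_start, win_end in date_windows(start, end, days):
        query = urlencode({
            "symbol": symbol,
            "from": win_start.isoformat(),  # dates as YYYY-MM-DD
            "to": win_end.isoformat(),
            "token": token,
        })
        urls.append(f"{FINNHUB_NEWS_URL}?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical example: three weekly windows covering 2021-01-01 to 2021-01-20.
    for url in company_news_urls("AAPL", date(2021, 1, 1), date(2021, 1, 20), token="YOUR_API_KEY"):
        print(url)
```

Keeping each request to a short window makes a failed request cheap to retry and keeps individual responses small.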

2. Social Media

| Platform | Data Type | Related Market | Data Source | Specified Company | Range Type | Source Type | Limits |
|---|---|---|---|---|---|---|---|
| Twitter | Tweets | US Stocks | Twitter Downloader |  | Date Range | Official | N/A |
| Twitter | Sentiment | US Stocks | Finnhub Sentiment |  | Date Range | Third party | N/A |
| StockTwits | Tweets | US Stocks | Stocktwits Downloader |  | Latest | Official | N/A |
| Reddit (wallstreetbets) | Threads | US Stocks | Reddit Downloader | × | Latest | Official | N/A |
| Reddit | Sentiment | US Stocks | Finnhub Sentiment |  | Date Range | Third party | N/A |
| Weibo | Tweets | CN Stocks | Coming soon | - | - | - | - |
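Where a source cannot be filtered by company (such as the wallstreetbets threads marked ×), posts still need to be linked to tickers. A common heuristic is to look for cashtags like `$AAPL`. The sketch below is an assumption, not this repository's method: the regular expression (1 to 5 letters, with an optional one-letter class suffix such as `$BRK.B`) will miss tickers written without `$` and may match non-stock tokens like `$USD`.

```python
import re
from collections import Counter

# Heuristic cashtag pattern: "$" + 1-5 letters, optional ".X" share-class suffix.
# The lookbehind rejects matches glued to a preceding word character or another "$".
CASHTAG_RE = re.compile(r"(?<![\w$])\$([A-Za-z]{1,5}(?:\.[A-Za-z])?)\b")


def extract_cashtags(text: str) -> list[str]:
    """Return upper-cased tickers written as cashtags, in order of appearance."""
    return [m.group(1).upper() for m in CASHTAG_RE.finditer(text)]


def ticker_counts(posts: list[str]) -> Counter:
    """Count how many posts mention each ticker (a ticker counts once per post)."""
    counts = Counter()
    for post in posts:
        counts.update(set(extract_cashtags(post)))
    return counts
```

Counting each ticker once per post keeps a single spammy post from dominating the mention counts.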

3. Company Announcement

| Platform | Data Type | Related Market | Data Source | Specified Company | Range Type | Source Type | Limits |
|---|---|---|---|---|---|---|---|
| Juchao (official website) | Text | CN Stocks | Juchao Announcement Downloader |  | Date Range | Official | Not too much |
| SEC (official website) | Text | US Stocks | Coming soon |  | Date Range | Official | Not too much |
| Sina | Text | CN Stocks | Sina Announcement Downloader |  | Latest | Third party | Not too much |
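Sources whose range type is "Latest" (Sina announcements, StockTwits, Reddit) only return the most recent items, so a history has to be built by polling periodically and keeping only what has not been seen before. A minimal sketch, assuming each record is a dict with a stable unique identifier; the `id` field name and the `merge_latest` helper are hypothetical.

```python
from typing import Iterable


def merge_latest(seen_ids: set, batch: Iterable[dict], key: str = "id") -> list[dict]:
    """Return the records in `batch` not seen before, and record their ids in `seen_ids`.

    Assumption: `key` names a stable unique identifier (for announcements this
    could be the announcement URL or ID). Duplicates inside one batch are also dropped.
    """
    new_items = []
    for item in batch:
        item_id = item[key]
        if item_id not in seen_ids:
            seen_ids.add(item_id)
            new_items.append(item)
    return new_items
```

To survive restarts, `seen_ids` would need to be persisted between polls (for example, in a file or database).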

4. Trends

| Platform | Data Type | Related Market | Data Source | Specified Company | Range Type | Source Type | Limits |
|---|---|---|---|---|---|---|---|
| Google Trends | Index | US Stocks | Google Trends |  | Date Range | Official | N/A |
| Baidu Index | Index | CN Stocks | Coming soon | - | - | - | - |
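Google Trends values are a relative index rather than raw search counts: 100 marks peak interest within the queried region and period, and 50 means half as popular. The sketch below illustrates only that final scaling step, assuming you already hold per-period relative volumes. Google's own sampling and normalization by total search volume are not reproduced, and `to_relative_index` is a hypothetical helper.

```python
def to_relative_index(values: list[float]) -> list[int]:
    """Rescale non-negative values so the peak maps to 100, rounded to integers."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    peak = max(values, default=0)
    if peak == 0:
        # No signal at all: report zeros rather than dividing by zero.
        return [0 for _ in values]
    return [round(100 * v / peak) for v in values]
```

Because each query is scaled to its own peak, index values from separately downloaded date ranges are not directly comparable; stitching them together requires overlapping windows to rescale against.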

5. Data Sets

| Data Source | Type | Number of Stocks | Dates Available |
|---|---|---|---|
| AShare | News | 3680 | 2018-07-01 to 2021-11-30 |
| stocknet-dataset | Tweets | 87 | 2014-01-02 to 2015-12-30 |
| CHRNN | Tweets | 38 | 2017-01-03 to 2017-12-28 |
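When choosing a dataset for a study or backtest, the first check is whether its coverage contains the period of interest. A small sketch using the date ranges from the Data Sets table, assuming both endpoints are inclusive; `DATASETS` and `datasets_covering` are illustrative names.

```python
from datetime import date

# Coverage taken from the Data Sets table (assumption: inclusive endpoints).
DATASETS = {
    "AShare": ("News", 3680, date(2018, 7, 1), date(2021, 11, 30)),
    "stocknet-dataset": ("Tweets", 87, date(2014, 1, 2), date(2015, 12, 30)),
    "CHRNN": ("Tweets", 38, date(2017, 1, 3), date(2017, 12, 28)),
}


def datasets_covering(start: date, end: date) -> list[str]:
    """Names of datasets whose available dates fully contain [start, end]."""
    return [
        name
        for name, (_kind, _n_stocks, first, last) in DATASETS.items()
        if first <= start and end <= last
    ]
```

Note that the ranges do not overlap, so no single dataset here pairs news with tweets for the same period.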

Ⅱ. Large Language Models (LLM)

1. Chat

2. Sentiment/Embedding

About

Alternative data (NLP) in finance.

Languages

  • Jupyter Notebook 90.1%
  • Python 9.9%