Fix Static Crawling Issue Due to Newly Implemented Anti-Scraping Mechanism #109
Open
JunTingLin wants to merge 3 commits into mlouielu:master from
…ing of TWSE_EQUITIES and TPEX_EQUITIES
Contributor
Hello JunTingLin! I think I ran into the same problem as you: the update function fails.
mitchhuang777 left a comment
- Consider adding try-except blocks to handle potential exceptions.
- Use WebDriverWait(driver, 10).until(...) rather than time.sleep.
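The two suggestions above could be combined roughly as follows. This is only a minimal sketch, not the code in this PR: the helper name `get_rendered_source`, the constant `PAGE_LOAD_TIMEOUT_S`, and the choice of "a `<table>` element is present" as the readiness condition are all assumptions made for illustration, and the right condition depends on what the target page actually renders.

```python
PAGE_LOAD_TIMEOUT_S = 10  # named constant instead of a bare number (assumed value)


def get_rendered_source(driver, url, timeout=PAGE_LOAD_TIMEOUT_S):
    """Load `url` and return the page source once a <table> is present.

    Returns None if the element does not appear in time or the driver fails.
    The function name and the <table> condition are hypothetical.
    """
    # Imported here so this module can still be imported without Selenium.
    from selenium.common.exceptions import TimeoutException, WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        driver.get(url)
        # Poll until at least one <table> exists, for at most `timeout` seconds,
        # instead of always sleeping for a fixed duration.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    except TimeoutException:
        # Must be caught before WebDriverException, which is its base class.
        print(f"Timed out after {timeout}s waiting for {url}")
    except WebDriverException as exc:
        print(f"WebDriver error while loading {url}: {exc}")
    return None
```

Compared with `time.sleep(5)`, an explicit wait returns as soon as the condition holds and fails with a clear timeout when it does not. Whether the same condition is suitable for the preliminary visit to the main page is not something this sketch settles.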
driver.get(main_page_url)
time.sleep(5)  # wait for JavaScript rendering to finish
driver.get(url)
time.sleep(5)  # wait for JavaScript rendering to finish
A magic number is not a good approach :(
# Use WebDriver to visit the main page first, then the specified URL
main_page_url = "https://isin.twse.com.tw"
driver.get(main_page_url)
time.sleep(5)  # wait for JavaScript rendering to finish
A magic number is not a good approach :(
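Hello,
First of all, thank you for developing and sharing such a useful project. While using it, I found that since the Lunar New Year, the original approach of fetching all stock symbols from http://isin.twse.com.tw/isin/C_public.jsp?strMode=2 with a static, requests-based crawler no longer works. I suspect this is because the site has strengthened its anti-scraping measures.
To fix this, I reworked the fetch_data function in fetch.py to use Selenium for dynamic crawling. Since some users may run this project in an environment without a GUI, I enabled headless mode. However, once headless mode was enabled, I frequently ran into connection failures. After some experimentation, I found a workable solution: visit the main page https://isin.twse.com.tw first, pause for a few seconds, and then visit the target URL. With that, the required data can be fetched successfully.
If there are any problems with my changes, or if there is a better solution, please feel free to contact me.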