PythonProgramMyself

An introduction to my own small Python programs

Scraping data with Selenium + BeautifulSoup

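from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

ccbsite = []  # branch names
adr = []      # branch addresses
url = "http://tool.ccb.com/outlet/frontOprNodeQuery.gsp"
browser = webdriver.Chrome()
browser.get(url)
# Open the province drop-down, pick its 25th option, then submit the query
browser.find_element(By.XPATH, '//*[@id="province"]').click()
browser.find_element(By.XPATH, '//*[@id="province"]/option[25]').click()
browser.find_element(By.XPATH, '//*[@id="button"]').click()
for p in range(1, 20):  # visit at most 19 result pages
    curr_html = browser.page_source
    bs = BeautifulSoup(curr_html, "html.parser")
    tables = bs.select('table tbody tr')
    # Skip the first three rows (presumably headers), then read each data row
    for row in tables[3:]:
        cells = row.text.split('\n')
        ccbsite.append(cells[1])
        adr.append(cells[3].split('\xa0')[0].strip())
    try:
        # This step is critical: the "next page" link cannot be located by XPath
        # (it took many attempts to realize this), so locate it by its link text
        browser.find_element(By.LINK_TEXT, '下一页').click()
    except NoSuchElementException:
        print("Finished")
        break  # no next page: stop instead of re-reading the last page

browser.close()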
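
The row-parsing step in the script above can be tried without a browser. The following is a minimal sketch, not part of the original script: the helper names `parse_row_text` and `rows_to_csv` are hypothetical, the sample rows are made up, and the field positions (name in field 1, address at the start of field 3, before a non-breaking space) are assumed from the indexing used above.

```python
import csv
import io


def parse_row_text(row_text):
    """Return (branch_name, address) for one table row's text, or None.

    Assumes the layout the script above relies on: after splitting on
    newlines, field 1 is the branch name and field 3 starts with the
    address, followed by a non-breaking space and other details.
    """
    fields = row_text.split("\n")
    if len(fields) < 4:
        return None  # header, spacer, or otherwise malformed row
    name = fields[1].strip()
    address = fields[3].split("\xa0")[0].strip()
    if not name:
        return None
    return name, address


def rows_to_csv(row_texts):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["branch", "address"])
    for text in row_texts:
        parsed = parse_row_text(text)
        if parsed is not None:
            writer.writerow(parsed)
    return buffer.getvalue()


# Made-up sample rows for illustration only, not real scraped data.
sample_rows = [
    "\nExample Branch A\n001\n1 Example Road\xa0Tel: 000\n",
    "\n\n",
    "\nExample Branch B\n002\n  2 Sample Street \xa0Tel: 111\n",
]
print(rows_to_csv(sample_rows))
```

Keeping the parsing in a pure function means a short or empty row is skipped rather than raising an IndexError, and the logic can be checked against saved row text before running the full Selenium session.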

