BeautifulSoup Not Scraping Tags #35

@sandersrc

I have been following your BeautifulSoup tutorial (https://www.youtube.com/watch?v=87Gx3U0BDlo). When I run the code, I get an AttributeError saying that a 'NoneType' object has no attribute 'attrs'. Please take a quick look at my problem below. Any push in the right direction would be much appreciated.

CODE

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for h2_tag in soup.find_all('h2'):
    a_tag = h2_tag.find('a')
    urls.append(a_tag.attrs['href'])

print(urls)
#####################

RESULT:
Traceback (most recent call last):
  File "C:\Users\Rafiki\PycharmProjects\HelloWorld\WHExample.py", line 22, in <module>
    urls.append(a_tag.attrs['href'])
AttributeError: 'NoneType' object has no attribute 'attrs'

Process finished with exit code 1

I changed the code to the following to see what the h2 tags contained, and found that their attrs did not include the nested information.

CODE

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefing-room/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

for h2_tag in soup.find_all('h2'):
    print(h2_tag.attrs)
##############

RESULT:
{'id': 'dialog2Title'}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['h4alt', 'form-headline']}

Process finished with exit code 0

Please help me understand how to drill down through tags to find the information within.

Thank you,
Ryan
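
One possible reading of the output above, offered as a sketch and not a confirmed diagnosis: `find('a')` returns `None` for any h2 that has no `<a>` inside it (the `dialog2Title` and `form-headline` headings look like candidates), and `.attrs` lists only a tag's own attributes, never those of its children. The markup in `SAMPLE_HTML`, the `example.com` URL, and the helper name `collect_h2_links` are hypothetical stand-ins, since the real page's structure is not shown here; the built-in `html.parser` is used instead of `lxml` only to keep the sketch dependency-light.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<h2 id="dialog2Title">Share</h2>
<h2 class="news-item__title-container">
  <a href="https://example.com/briefing-1">Briefing 1</a>
</h2>
<h2 class="h4alt form-headline">Sign up</h2>
"""


def collect_h2_links(html):
    """Return the href of the first <a> inside each <h2>, skipping h2 tags without a link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")  # None when the h2 has no <a> descendant
        if a_tag is None:
            continue  # skip headings that are not links instead of crashing
        href = a_tag.get("href")  # .get() returns None rather than raising KeyError
        if href:
            urls.append(href)
    return urls


print(collect_h2_links(SAMPLE_HTML))
```

If every h2 on the live page turns out to have no `<a>` descendant at all, the links may sit next to the headings instead of inside them, or be added by JavaScript after the page loads; printing `h2_tag.prettify()` for one heading is a quick way to check what the parser actually received.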
