Skip to content

Extract Tripadvisor reviews from a specific page with Google Colab #17

@biagioscalingipsy

Description

@biagioscalingipsy

Hi Giuseppe!
I premise that I am a very novice user of Python, and for the moment, I am using Google Colab to perform some operations. In particular, I am trying to extract the reviews on TripAdvisor at this link:(https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html).

I tried several attempts using BeautifulSoup:
import requests
from bs4 import BeautifulSoup as soup

import requests
from bs4 import BeautifulSoup as soup

import requests
from bs4 import BeautifulSoup as soup

URL della pagina di TripAdvisor

url = 'https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html'

Effettua la richiesta HTTP per ottenere il contenuto della pagina

html = requests.get(url)
bsobj = soup(html.content, 'html.parser')

Trova tutti i tag 'q' che contengono le recensioni

reviews = []
for r in bsobj.findAll('q'):
reviews.append(r.span.text.strip())
print(r.span.text.strip())

Stampa le recensioni estratte

for review in reviews:
print(review)`

The code seems to work, but the runtime is too long and eventually crashes because of a large idle time on Colab (I even tried inserting an automatic click to avoid the timeout, but it doesn't work).

After that, I tried following your script but when I run:
driver = webdriver.Safari()
I get this error:
"Exception: SafariDriver was not found; are you using Safari 10 or later? You can download Safari from https://developer.apple.com/safari/download/".

The point is that I have the latest version of Safari (version 16.5.1), and I also checked the Safari Development section "Allow remote automation". How do you think I can download the reviews into a txt file or put them into a dataframe?

Thank you in advance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions