Coccodrillo is an intelligent natural language assistant. It is capable of interpreting complex requests regarding travel, weather, restaurants, events, safety, and much more. It uses advanced NLP models to understand user intents, extract relevant entities, correct errors, and generate structured responses.
The system recognizes 8 main intents:

- 🌍 Safety updates for a location → "Is it safe?"
  - Sources: Bing News + Viaggiare Sicuri
- 🌪️ Weather alerts and extreme weather conditions
  - Sources: Bing News
- 🗺️ Recommended places to visit
  - Data: dataset downloaded from Lonely Planet with Python scripts based on Selenium and BeautifulSoup.
  - Integration: uses Google Maps to check the availability of locations (for example, whether a place is temporarily closed).
- 🎶 Information on concerts and events
  - Sources: Bandsintown
- 🍽️ Best restaurants
  - Sources: Yelp
- 🍝 Recommendations on typical dishes and foods
  - Data: dataset downloaded from TasteAtlas with Python scripts based on Selenium and BeautifulSoup.
- 🌤️ Weather forecasts
  - Sources: Il Meteo
- 🚆 Information on trains, flights, and buses
  - Sources: TheTrainLine
This project requires PyTorch 2.6.0.
✅ PyTorch 2.6.0 is compatible with the following Python versions:
- Python 3.9
- Python 3.10
- Python 3.11
- Python 3.13
⚠️ Note: the file `./setup/requirements.txt` lists the currently installed libraries (generated with `pip freeze > requirements.txt`), but all necessary libraries, with the correct versions, can also be installed via the Python script `to_install.py`.
To install all the necessary libraries and download the required BERT models and spaCy language model, run the following commands:
```shell
python3 ./setup/to_install.py
```
This project uses Chrome for browser automation, which requires ChromeDriver to be installed.
You can download the appropriate version of ChromeDriver for your operating system (Windows, macOS, or Linux) from the following link:
🔗 ChromeDriver Downloads
Note: for convenience, ChromeDriver executables for Linux and Windows are already included in the `./setup/driver` directory.
- Make sure to download the ChromeDriver version that matches your installed version of Google Chrome.
- If you don't have Google Chrome installed, you can download it here:
🔗 Download Google Chrome
- Place the `chromedriver` executable inside the `bin/` directory.
- Place the `chromedriver_win32` directory in the root of this project.
- Ensure that the ChromeDriver file has a `.exe` extension (`chromedriver.exe`).
To run the project, navigate to the `src` directory and execute the `run.py` script:

```shell
cd src
python3 run.py
```
1. The user enters a natural language phrase in `run.py`.
2. The `all-MiniLM-L6-v2` sentence-embedding model (MiniLM) is used to classify the intent of the request.
3. The text is analyzed to extract cities, dates, and other entities through:
   - 🧠 Question Answering with:

     ```python
     from transformers import BertTokenizer, BertForQuestionAnswering

     tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
     model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
     ```

   - If information is missing (e.g., a date), the system detects this and asks the user for clarification.
   - The system handles spelling errors and formatting issues through semantic checks.
4. Downloaded datasets and information available on the web (retrieved with Python libraries such as Selenium and BeautifulSoup) are used to search for all the necessary data.
5. The output is presented to the user in a well-structured format.
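The intent-classification step can be pictured as a nearest-prototype search over sentence embeddings. The sketch below is a minimal stdlib illustration only: `embed` is a toy bag-of-words stand-in for the real `all-MiniLM-L6-v2` encoder, and the example phrases per intent are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the real system would instead call
    # SentenceTransformer('all-MiniLM-L6-v2').encode(text).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical example phrase per intent (prototype).
PROTOTYPES = {
    "safety": "is it safe to travel there",
    "weather": "what is the weather forecast",
    "restaurants": "where can I eat the best food",
}

def classify_intent(query):
    # Pick the intent whose prototype is most similar to the query.
    q = embed(query)
    return max(PROTOTYPES, key=lambda intent: cosine(q, embed(PROTOTYPES[intent])))

print(classify_intent("tell me the weather for Friday"))  # weather
```

With a real sentence encoder, only `embed` changes; the nearest-prototype logic stays the same.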
- **First response:** an introductory sentence generated with a bigram model trained on a simple dataset, explaining that the system is searching for information.
- **Second response:** a structured output of all the information found.
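A bigram-based intro generator like the one described can be sketched as follows; the tiny training corpus here is a hypothetical stand-in for the project's dataset of intro phrases.

```python
import random
from collections import defaultdict

# Hypothetical mini-corpus of intro sentences; the real system trains
# its bigram model on a simple dataset of such phrases.
corpus = [
    "let me search for that information",
    "let me look for the details",
    "one moment while I search for results",
]

# Bigram table: word -> list of observed next words (duplicates keep frequencies).
bigrams = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams[a].append(b)

def generate(start="let", max_len=8):
    # Walk the bigram chain, sampling each next word, until a dead end
    # or the length limit is reached.
    out = [start]
    while len(out) < max_len and bigrams[out[-1]]:
        out.append(random.choice(bigrams[out[-1]]))
    return " ".join(out)

print(generate())
```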
The system automatically corrects small errors in city names or dates, thanks to spelling checks and semantic similarity logic.
The user can make complex requests, including multiple destinations and time periods within the same sentence.
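The spelling-correction step can be illustrated with stdlib fuzzy matching; the city list below is a hypothetical stand-in for the system's real gazetteer, and the cutoff value is an assumption.

```python
from difflib import get_close_matches

# Hypothetical subset of known city names.
KNOWN_CITIES = ["Rome", "Prague", "Valencia", "Ljubljana", "Vienna"]

def correct_city(name, cutoff=0.7):
    # Return the closest known city name, or None if nothing is similar enough.
    matches = get_close_matches(name.title(), KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct_city("Pargue"))   # Prague
print(correct_city("valencai")) # Valencia
```

The real system combines this kind of check with semantic similarity, so corrections are not purely orthographic.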
For safety requests regarding foreign cities or countries:
- News is searched in the local language to maximize accuracy.
- Texts are summarized and translated using the following tools:
```python
from transformers import pipeline, BartTokenizer

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
```
To filter articles and return only the most relevant ones to the query, the similarity between the user query and found articles is calculated:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# `corpus` holds the article texts with the user query appended as the last element.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)

query_vector = tfidf_matrix[-1]       # the query is the last row
document_vectors = tfidf_matrix[:-1]  # all article rows
similarity = cosine_similarity(query_vector, document_vectors).flatten()
```
Only articles with similarity above a predefined threshold are included in the response.
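End to end, the filtering works as sketched below. This is a dependency-free illustration (a toy TF-IDF over whitespace tokens) of the same logic as the scikit-learn pipeline above; the articles, query, and threshold value are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Build simple TF-IDF vectors: term count times smoothed inverse document frequency.
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(tok for d in docs for tok in d)  # document frequency per token
    idf = {tok: math.log(n / df[tok]) + 1.0 for tok in df}
    return [{tok: cnt * idf[tok] for tok, cnt in d.items()} for d in docs]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

articles = [
    "flood warning issued for Valencia this weekend",
    "new tapas restaurant opens in the city centre",
]
query = "weather alert Valencia"

vecs = tfidf_vectors(articles + [query])  # query appended last, as in the real pipeline
query_vec, doc_vecs = vecs[-1], vecs[:-1]

THRESHOLD = 0.05  # assumed value; the real system uses its own predefined threshold
relevant = [a for a, v in zip(articles, doc_vecs) if cosine(query_vec, v) >= THRESHOLD]
print(relevant)
```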
Safety information sources:

- Viaggiare Sicuri (Farnesina)
- Reliable international news websites
- Automatic summaries via `BART`
The system constructs a graph of places to visit:
- 🏛️ Nodes = points of interest (museums, landmarks, squares, etc.)
- 🚶 Edges = walking distances between locations
- Each node carries two weights:
  - 🎨 Beauty: a value representing how iconic or scenic the place is
  - ⏱️ Visit time: the estimated duration needed to explore the location
This project implements a minimum-cost algorithm for optimizing sightseeing routes, with the goal of providing an efficient and enjoyable itinerary. It balances factors such as the beauty of the place, walking distance, and visit duration to create an ideal sightseeing plan.
The optimization takes into account:

- 🎨 Beauty of the place (highly weighted)
- 🚶 Walking distance (moderately weighted)
- ⏱️ Visit duration (low or no weight)
- ✅ Google Maps availability data: only locations open on the selected days are included; temporarily closed places and places under renovation are excluded.
- ⏳ 8 hours of sightseeing per day, so the itinerary respects the total time available (e.g., 3 days × 8 hours = 24 hours).
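A cost-based selection of this kind can be sketched with a greedy score over beauty and walking distance under the daily time budget. The attraction data and the score formula below are hypothetical simplifications for illustration, not the project's actual minimum-cost algorithm.

```python
# Each attraction: (name, beauty 0-1, visit_hours, walk_hours_from_previous).
# Hypothetical data; the real system builds a full weighted graph of the city.
ATTRACTIONS = [
    ("Colosseum", 1.0, 2.0, 0.3),
    ("Roman Forum", 0.9, 1.5, 0.1),
    ("Pantheon", 0.9, 1.0, 0.4),
    ("Trevi Fountain", 0.8, 0.5, 0.2),
    ("Minor Museum", 0.3, 2.0, 0.8),
]

IMPORTANCE_BEAUTY = 0.7  # highly weighted
IMPORTANCE_EDGE = 0.3    # walking distance, moderately weighted
IMPORTANCE_TIME = 0.0    # visit duration, no weight (as in the example config)

def score(beauty, walk, visit):
    # Higher beauty earns points; longer walks and visits cost points.
    return IMPORTANCE_BEAUTY * beauty - IMPORTANCE_EDGE * walk - IMPORTANCE_TIME * visit

def plan(attractions, days, hours_per_day=8):
    # Greedily pick the best-scoring attractions that still fit the time budget.
    budget = days * hours_per_day
    chosen, used = [], 0.0
    ranked = sorted(attractions, key=lambda a: score(a[1], a[3], a[2]), reverse=True)
    for name, beauty, visit, walk in ranked:
        if used + visit + walk <= budget:
            chosen.append(name)
            used += visit + walk
    return chosen, used

route, hours = plan(ATTRACTIONS, days=1)
print(route, hours)
```

With one 8-hour day, the low-beauty, far-away "Minor Museum" is the one left out.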
Currently, the route optimization feature supports the following cities:
- Rome
- Ljubljana
- Prague
- Vienna
- Florence
- Naples
- Maribor
- Paris
- Valencia
- Barcelona
- Madrid
We plan to extend the list of supported cities in the future. The dataset for additional cities has already been downloaded but requires manual formatting to meet the model's specifications.
"I would like to go to Rome for 3 days. Can you recommend the best things to visit? Start from Termini Station."
With the following configuration:
```python
importance_time_visit = 0.0
importance_beauty = 0.7
importance_edge = 0.3
```
The output is:
```python
{
    'Rome': ([
        ('Museo Nazionale Romano: Palazzo Massimo alle Terme', '1'),
        ('Basilica di Santa Maria Maggiore', '1'),
        ('Colosseum', '1'),
        ('Roman Forum', '1'),
        ('Pantheon', '1'),
        ('Piazza Navona', '1'),
        ('Villa Farnesina', '1'),
        ("Castel Sant'Angelo", '1'),
        ("St Peter's Basilica", '1'),
        ('Vatican Gardens', '1'),
        ('Sistine Chapel', '0'),  # ← marked as closed
        ('Vatican Museums', '1'),
        ('Gianicolo', '1'),
        ('Museo della Repubblica Romana e della Memoria Garibaldina', '1'),
        ('Basilica di Santa Maria in Trastevere', '1'),
        ('Jewish Ghetto', '1'),
        ("Campo de' Fiori", '1'),
        ('Trevi Fountain', '1'),
        ('Galleria Doria Pamphilj', '1'),
        ('Piazza di Spagna', '1'),
        ('Pincio Hill Gardens', '1'),
        ('Museo e Galleria Borghese', '1')
    ], 23)
}
```
- Total time used: ~23 hours
- 🚫 Closed location excluded: Sistine Chapel (temporarily closed)
Below is the optimized travel route drawn on a map.
Each segment is color-coded based on the order of visitation (earliest to latest).
The user provides:

- 🏛️ City: Rome
- 📅 Number of days
- 📍 Starting location (e.g., Termini Station)

The system then:

- Constructs a weighted graph of attractions
- Filters out temporarily closed or inaccessible sites
- Runs a route-optimization algorithm with custom weights:
  - 🎨 Beauty of each place
  - ⏱️ Visit time
  - 🚶 Walking distance
- Computes the total estimated time
- Generates a visual map of the itinerary

The output includes:

- ✅ A sorted list of recommended places to visit
- ⏳ The estimated total visit time
- 🗺️ A visual path connecting all selected locations
The user can search for events specifying:
- City
- Dates
- Artist
- Music genre
The system connects to public websites (e.g., Bandsintown) to show up-to-date events.
- Suggests the best restaurants based on the area
- Recommends typical dishes based on location
The weather module provides:
- Detailed forecasts for cities and dates
- Automatic detection of extreme events or abnormal conditions
The system provides up-to-date information on:
- Trains
- Buses
While the system is robust and flexible, it has some technical limitations that are currently being improved:
- The transport website, if queried repeatedly, may detect the automated traffic as suspicious activity and block it.
- In these cases, the search may fail or return incomplete results.
- Long articles may cause errors in the translation phase.
- The system splits texts into individual sentences, but sometimes even a single sentence is too long to be translated correctly.
- In these cases, the result is provided in the original language.
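The fallback behaviour described above can be sketched as follows; `translate` is a hypothetical stand-in for the real BART-based pipeline, and the per-sentence length limit is an assumed value.

```python
import re

MAX_LEN = 200  # assumed per-sentence character limit for the translator

def translate(sentence):
    # Hypothetical stand-in for the real BART translation call.
    return f"<translated: {sentence}>"

def safe_translate(text):
    # Split the text into sentences and translate each one individually,
    # keeping any sentence that is still too long in its original language.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    out = []
    for s in sentences:
        out.append(translate(s) if len(s) <= MAX_LEN else s)  # fallback: original text
    return " ".join(out)

print(safe_translate("Breve frase. " + "parola " * 60))
```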
Tell me the best typical food in Rome.
Output
Can you recommend some places to visit in Valencia for 3 days?
Output
I would like to go in Rome for 3 days, can you reccomend for me the best things to visit? Start to Termini Station
Output
Hello, can you say me the current situation about the security in France, is safe?
Output
Tell me the last news about the warning weather alert in Valencia.
Output
Some concerts in Ljubljana for tomorrow.
Output
I am going in Milan in the 1 June, there are concert by Jerry Cantrell?
Output
Can you write for me the best places where i can eat in Prague
Output
What are the typical food in Naples? and in Paris?
Output
I am going in Berlin, tell me the temperature for friday.
Output
- Language: Python 3.10+
- NLP Models:
  - `BERT` (for Question Answering and Named Entity Recognition)
  - `MiniLM` (for intent classification)
  - `BART` (for summarization and translation)
- Main Libraries:
  - `transformers`, `sentence-transformers`
  - `scikit-learn`, `networkx`, `geopy`
  - `nltk`, `spacy`, `pandas`, `requests`
  - `beautifulsoup4`, `selenium`
The system has been tested using real-world natural language phrases to verify its reliability and effectiveness in real scenarios. Three main types of tests were conducted to assess various aspects of the system:
1. **Test on ambiguous and incomplete requests**
   Examples with incomplete or ambiguous phrases were used to verify how the system handles request interpretation and the processing of missing information.
   The results are available in the `testing/query_with_error_testing` folder.
2. **Intent classification test**
   This test verified whether the system correctly classifies the user intent, even for complex phrases or requests containing multiple intents.
   The results are available in the `testing/intent_testing` folder.
3. **Real request and output test**
   Real examples of requests were used, executing a complete simulation from query formulation to the system-generated output, in order to evaluate the quality of the responses and the overall system reliability.
   The results are available in the `testing/final_output_testing` folder.
- In some cases, spelling errors were deliberately introduced in the requests to test the system's ability to correct them automatically.
- Multiple cities or dates were also included in a single request to verify how the system handles complex scenarios.
These tests helped identify and fix any weaknesses, improving the overall system reliability.
Do you have suggestions, bugs to report, or want to contribute?
👉 Open an issue or contact us directly on Davide's GitHub or Ondrej's GitHub.