Doctolib Scraper

Doctolib Scraper is a python program that scraps doctors informations into a yaml file.

The program is meant to be run locally.

Warning

Don't over use this project. Doctolib has a antiscrap policy, meaning that you could get banned from the website if you scrap it too much.

But don't worry, I have personnaly run this script (and previous more aggressive ones) like 40 times in a short period of time, and didn't have any problem. Only a cloudflare at one point, but it quickly disapeared when I came back to a normal navigation on the website for some days.

Still, even if there's no consequences for you, please remain respectful of their website by making a reasonable amount of requests to it.

Motivation

The initial goal of this project goes beyond simply scraping data from Doctolib. The ultimate objective is to classify practitioners in a spreadsheet based on certain criteria. The most important one for me is accessibility from my home. Since I don’t own a car, some practitioners are difficult to reach via public transportation.

To avoid the tedious task of manually checking each practitioner's page, copying their address, going to the public transport website, simulating a route from my home, and retrieving the total travel time and walking time, I decided to create this project.

The first step is to collect essential information from Doctolib, that is what this program does.

Next, the Google Maps API is used to simulate a route from my home to the practitioner's location, allowing me to calculate the travel time.

The final step is to format the gathered and computed data in a Google Sheet using its API.

I haven’t included the last two parts because they rely on GCP APIs, which would require explaining how to set them up. I’d also need to refactor my code to better integrate someone else’s credentials. Given the complexity of using these APIs, I believe very few people (if any) would be interested in this functionality.

However, if you're interested in this additional feature, feel free to reach out to me on Reddit (you’ll find my profile link on my GitHub profile).

Installation

If you don't have python on your computer, install it.

Clone the project

  git clone https://github.com/gagota/doctolib-scrapper.git

Go to the project directory

  cd doctolib-scrapper

Create and activate a virtual environment (Optional, recommended)

  python -m venv doctolib_env

  . .\doctolib_env\Scripts\activate

Install dependencies

  pip install -r requirements.txt

Usage

First, you need to make your search on doctolib.fr.

Then, when on the search page, copy the url of the page.

Open '_main.py' (in the 'src' folder) with your prefered IDE (like VS Code).

Paste the url in the doctolib_search_url variable.

Enter a search name in the search_name variable. It will be the name of the folder containing the gathered data.

You can tweak the parameters as you like in the parameters object.

Play the script, and watch the magic happen before your eyes ;)

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
screenshots		screenshots
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_fr.md		README_fr.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Doctolib Scraper

Warning

Motivation

Installation

Usage

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Doctolib Scraper

Warning

Motivation

Installation

Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages