
Aremaki/edstuto_2025


eds-tutorial

About

In this tutorial, we introduce some issues related to the analysis of real-world data made available for research in clinical data warehouses. It is aimed at data scientists who have mastered the basics of Python programming and data analysis. The tutorial is divided into a series of small exercises and a final project. While each small exercise illustrates a specific issue, the final project mimics an end-to-end research study of the kind that may be reported in a scientific article.

The data is fake, so this project can be freely shared without compromising patients’ privacy. A fake data generator is provided and can be tuned to illustrate various use cases. Its development was loosely inspired by the characteristics and issues observed while analyzing data from the Greater Paris University Hospitals.

Getting started

Environment and kernel creation

Python, JupyterLab and an environment manager are recommended; you may choose, for instance, Anaconda.

We also recommend using Visual Studio Code.

Please follow these instructions:

  1. Open a terminal.
  2. Go to your local directory for the 2025_EI project.
  3. Clone the project locally: git clone {URL}
  4. Move into the cloned folder: cd edstuto
  5. Install the required packages with uv:
      • pip install uv==0.7.8
      • uv venv --python 3.11.9
      • source .venv/bin/activate (on Windows: .venv\Scripts\activate)
      • uv sync
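To confirm that the steps above worked, you might run a short check from the activated environment. The sketch below is not part of the repository: the file name check_env.py and the helper names are hypothetical, and it assumes the target version is Python 3.11, as pinned by uv venv --python 3.11.9. It relies only on the standard library: inside a virtual environment, sys.prefix points to the environment while sys.base_prefix points to the base installation.

```python
import sys

# Assumption: the project targets Python 3.11, as pinned by `uv venv --python 3.11.9`.
EXPECTED_VERSION = (3, 11)


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment such as .venv."""
    # In a venv, sys.prefix is the venv directory; sys.base_prefix is the base Python install.
    return sys.prefix != sys.base_prefix


def python_matches(expected=EXPECTED_VERSION) -> bool:
    """Return True when the running major.minor version equals `expected`."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    print("Virtual environment active:", in_virtualenv())
    print("Python version matches 3.11:", python_matches())
```

Saved as check_env.py (hypothetical name) and run with python check_env.py after source .venv/bin/activate, both checks should report True if the environment was set up as described.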

NB: For VS Code users, to display plots legibly, it is recommended to enable the Theme Matplotlib Plots option under Settings > Extensions > Jupyter.

Scientific libraries installation

The following scientific libraries, developed for Paris’ clinical data warehouse, may also help with some of the exercises:

  • eds-scikit: a set of tools to assist data scientists working on a clinical data warehouse (structured data).
  • edsnlp: a set of spaCy components that are used to extract information from clinical notes written in French (unstructured data).
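Since these libraries are optional, a quick way to see whether they are already available in your environment is to look up their top-level modules without importing them. The sketch below is a hypothetical helper, not part of the repository, and assumes the import names eds_scikit (for eds-scikit) and edsnlp (for edsnlp). It uses only the standard library's importlib.util.find_spec, which returns None when a top-level module cannot be found.

```python
from importlib.util import find_spec

# Assumed mapping from distribution name to top-level import name.
OPTIONAL_LIBRARIES = {"eds-scikit": "eds_scikit", "edsnlp": "edsnlp"}


def available_libraries(libraries=OPTIONAL_LIBRARIES) -> dict:
    """Map each distribution name to True if its top-level module can be found."""
    # find_spec locates the module without executing it, so the check stays cheap.
    return {name: find_spec(module) is not None for name, module in libraries.items()}


if __name__ == "__main__":
    for name, installed in available_libraries().items():
        print(f"{name}: {'installed' if installed else 'not installed'}")
```

Looking up the module spec instead of importing it avoids the cost and side effects of loading large packages just to check that they are present.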

Acknowledgement

We would like to thank Assistance Publique – Hôpitaux de Paris and the AP-HP Foundation for funding this project.
