Medical Knowledge Harmonization: A Graph-based, Entity-Selective Approach to Multi-source Diagnoses

Description

This project builds an integrated knowledge graph from diagnostic data produced by multiple healthcare centers, giving a comprehensive view of an individual's health trajectory. It focuses on entities related to Genes, Diseases, Chemicals, Species, Variants, and Cell Types (DNA or RNA), which are especially significant in the context of rare and/or chronic diseases. Named Entity Recognition (NER), Entity Normalization, and Relationship Extraction (RE) techniques are applied to raw medical texts to create individual knowledge graphs, which are then merged into a unified graph. The resulting visualization supports healthcare professionals in making well-informed decisions, so that no detail from any diagnostic source is overlooked, especially details pivotal to understanding and managing genetic information and rare diseases.
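
To make the merge step concrete, here is a minimal sketch rather than the project's actual implementation. It assumes each individual graph comes from one source and can be represented as a set of (subject, relation, object) triples whose entities have already been normalized. The function name `merge_graphs`, the identifier scheme, and the example triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# A triple is (subject, relation, object). The identifier scheme is made up.
Triple = tuple[str, str, str]


def merge_graphs(graphs: dict[str, set[Triple]]) -> dict[Triple, set[str]]:
    """Union per-source triple sets, recording which sources support each triple."""
    merged: dict[Triple, set[str]] = defaultdict(set)
    for source, triples in graphs.items():
        for triple in triples:
            merged[triple].add(source)
    return dict(merged)


# Two hypothetical centers reporting on the same patient, after normalization.
center_a = {("GENE:CFTR", "associated_with", "DISEASE:cystic_fibrosis")}
center_b = {
    ("GENE:CFTR", "associated_with", "DISEASE:cystic_fibrosis"),
    ("CHEMICAL:ivacaftor", "treats", "DISEASE:cystic_fibrosis"),
}

merged = merge_graphs({"center_a": center_a, "center_b": center_b})
for triple, sources in sorted(merged.items()):
    print(triple, "<-", sorted(sources))
```

Because normalization maps different surface forms to the same identifier, a fact reported by two centers collapses into a single edge, while the set of supporting sources keeps its provenance visible.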

Installation

Prerequisites

Python (>= 3.x)
Conda

Setup

Clone the repository:

git clone https://github.com/anbianchi/knowledge_frombio
cd knowledge_frombio

Create a Conda environment:
```
conda env create -f environment.yml
```
Activate the Conda environment:

conda activate [Your Environment Name]

Usage

You can use the tool in two ways: by processing the dataset used in the experiments, or by manually inserting and processing diagnostic reports. The steps for both approaches are detailed below:

1. Processing the Experiment Dataset

To process the dataset utilized in the experiments, use the following command:

python main.py --dataset "dataset.csv"

Replace "dataset.csv" with your dataset filename. The script processes the dataset and generates knowledge graphs accordingly.

2. Manually Inserting and Processing Diagnostic Reports

If you prefer to input diagnostic reports manually, place your report files in the diagnostic_reports folder. Make sure that all reports in the folder belong to the same patient, since they are merged into a single knowledge graph. Then run:

python main.py --manual

This command instructs the tool to process the reports present within the diagnostic_reports folder.
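
As a rough illustration of what reading that folder can look like, the sketch below collects every report in it. It is not taken from main.py: the helper name `load_reports` and the assumption that reports are UTF-8 plain-text files with a .txt extension are hypothetical.

```python
from pathlib import Path


def load_reports(folder: str = "diagnostic_reports") -> dict[str, str]:
    """Return {filename: text} for every .txt file in the folder, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Report folder not found: {root}")
    # Assumption: one report per .txt file, all belonging to the same patient.
    reports = {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(root.glob("*.txt"))
    }
    if not reports:
        raise ValueError(f"No .txt reports found in {root}")
    return reports
```

Calling `load_reports()` before running the tool is a quick way to confirm that the folder exists and contains only the reports you intend to merge.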

Dataset Information

The experiments utilize the MIMIC-IV-Note: Deidentified free-text clinical notes dataset, a freely accessible critical care database that holds de-identified health-related data associated with over one thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2008 and 2019.

Key Features:

De-identification: Adheres to stringent data security and privacy protocols, ensuring that all patient records are thoroughly de-identified, maintaining the privacy and anonymity of the individuals involved.
Accessibility: The dataset is available to researchers across the world once access has been granted (see Accessing the Dataset), fostering a collaborative and open research environment.

Usage in this Project:

In this project, the "discharge.csv" file in the notes folder is used to extract and analyze diagnostic texts. The raw text data from patient reports is processed through our system to generate individual and merged knowledge graphs, which then offer a panoramic view of a patient's medical history and interactions.

Accessing the Dataset:

To access and use the MIMIC-IV dataset for replicating our experiments or for your research, please follow the steps below:

Requesting Access: Visit the MIMIC website and follow their guidelines for requesting access to the dataset.
Downloading the Data: Once approved, download the dataset, specifically the "discharge.csv" file found in the notes folder.
Data Processing: Use the script generate.py from our repository to preprocess the data, converting the notes into a format suitable for our system.
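
The sketch below shows one way to group discharge notes by patient before further processing. It is not a description of what generate.py does. It assumes that "discharge.csv" has `subject_id` and `text` columns (check the header of your download), and the helper name `notes_by_patient` and the `min_notes` threshold are hypothetical.

```python
import csv
from collections import defaultdict

# Discharge summaries can be long, so raise the csv module's default field limit.
csv.field_size_limit(10**7)


def notes_by_patient(csv_path: str, min_notes: int = 2) -> dict[str, list[str]]:
    """Group note texts by subject_id, keeping patients with at least min_notes notes."""
    grouped: dict[str, list[str]] = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Column names are assumptions about the file layout.
            grouped[row["subject_id"]].append(row["text"])
    return {pid: notes for pid, notes in grouped.items() if len(notes) >= min_notes}
```

Keeping only patients with several notes reflects the multi-source setting: a merged graph is only informative when there is more than one report to combine.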

For comprehensive details about the dataset and how to use it, kindly refer to the official documentation.

Note: Even though the dataset is publicly available, we strictly adhere to the usage guidelines provided by MIMIC-IV, ensuring ethical use of the data in our research.

Code Structure

demo_example/: Folder containing a subset of results.
modules/: Folder containing the main script and utility functions.
merged_outputs/ and temp_outputs/: Folders where the output graphs and results will be saved.
environment.yml: Conda environment file listing all necessary Python packages.
generate.py: Script used to preprocess the dataset.
main.py: Main script to run the program.
