Integrating DBRetina with an LLM-Powered Chatbot for Genomic Data Exploration #250

@MoHelmy

Introduction to DBRetina

DBRetina is a high-performance bioinformatics tool that uses an efficient linear algorithm to calculate pairwise distances among large collections of gene sets. The algorithm makes it easy to build a comprehensive pairwise molecular similarity network within and across several molecular databases. To support efficient search and visualization of this large similarity network, DBRetina can transform its final output into a format compatible with the Neo4j graph database.
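
To make "pairwise similarity among gene sets" concrete, here is a minimal Python sketch. It is not DBRetina's algorithm, which this description does not detail. It assumes Jaccard similarity as the metric and uses a simple inverted index so that only pairs sharing at least one gene are visited. The function name `pairwise_jaccard` and the example gene sets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_jaccard(gene_sets):
    """Return {(name_a, name_b): jaccard} for every pair sharing at least one gene.

    Illustration only: the metric (Jaccard) and this indexing strategy are
    assumptions, not DBRetina's actual implementation.
    """
    # Inverted index: gene -> names of the gene sets that contain it.
    index = defaultdict(set)
    for name, genes in gene_sets.items():
        for gene in set(genes):
            index[gene].add(name)

    # Count shared genes per pair by walking the index once.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    # Jaccard = |A and B| / |A or B|, with |A or B| = |A| + |B| - |A and B|.
    sizes = {name: len(set(genes)) for name, genes in gene_sets.items()}
    return {
        (a, b): n / (sizes[a] + sizes[b] - n)
        for (a, b), n in shared.items()
    }


# Hypothetical gene sets, for illustration only.
example = {
    "pathway_A": ["TP53", "BRCA1", "EGFR"],
    "pathway_B": ["TP53", "BRCA1", "MYC"],
    "pathway_C": ["KRAS"],
}
similarities = pairwise_jaccard(example)
```

Each resulting pair corresponds to one candidate edge in a similarity network. Pairs with no shared genes never appear, which keeps the output sparse.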

Challenge:

While DBRetina bridges genomic analytics and graph databases, querying Neo4j requires expertise in the Cypher query language, which limits accessibility for non-technical researchers.

Goal and Aims

To develop an LLM-driven chatbot that translates natural language questions into Cypher queries, enabling intuitive interaction with DBRetina-generated Neo4j graphs.
This chatbot aims to:
Increase accessibility: enable non-technical users to query complex genomic networks.
Improve efficiency: reduce query-writing time by ~70% (based on LLM benchmarks).
Support scalability: adapt to evolving graph schemas and support multi-database integration.
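
As a rough sketch of the translation step described above, the Python below builds a schema-aware prompt for an LLM and applies a conservative read-only check to whatever Cypher comes back. Everything here is an assumption made for illustration: the `GeneSet` label, the `SIMILAR_TO` relationship, the property names, and the helpers `build_prompt` and `is_read_only` are hypothetical, and the actual schema depends on how DBRetina exports its network to Neo4j. No LLM is called, so the candidate query is hand-written.

```python
import re

# Hypothetical schema; the real labels and properties depend on the
# DBRetina export and are NOT taken from its documentation.
SCHEMA = {
    "node_label": "GeneSet",
    "node_properties": ["name", "database"],
    "relationship_type": "SIMILAR_TO",
    "relationship_properties": ["similarity"],
}

# Keywords that modify the graph. The check is deliberately conservative:
# it also rejects queries whose string literals contain these words.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def build_prompt(question, schema=SCHEMA):
    """Combine the graph schema and the user's question into one LLM prompt."""
    return (
        "You translate questions into read-only Cypher queries.\n"
        f"Node label: {schema['node_label']} "
        f"(properties: {', '.join(schema['node_properties'])})\n"
        f"Relationship: {schema['relationship_type']} "
        f"(properties: {', '.join(schema['relationship_properties'])})\n"
        f"Question: {question}\n"
        "Answer with a single Cypher query and nothing else."
    )


def is_read_only(cypher):
    """Return True if the query contains none of the write keywords above."""
    return WRITE_CLAUSES.search(cypher) is None


question = "Which gene sets are most similar to pathway_A?"
prompt = build_prompt(question)

# Hand-written stand-in for what a model might return for the question above.
candidate = (
    "MATCH (a:GeneSet {name: $name})-[r:SIMILAR_TO]-(b:GeneSet) "
    "RETURN b.name AS neighbor, r.similarity AS similarity "
    "ORDER BY similarity DESC LIMIT 10"
)
safe_to_run = is_read_only(candidate)
```

Passing the schema in the prompt is one way to let the chatbot adapt when the graph schema changes without retraining. The keyword check is only a first line of defense; running generated queries under a read-only database role would likely be a more robust safeguard.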

Difficulty Level: Medium/Hard

Size and Length of Project

  • Medium: 175 hours
  • 12-16 weeks

Skills

Essential skills: LLM fine-tuning, experience with graph databases, HTML, CSS, JS
Nice to have skills: C++

Public Repository

DBRetina Documentation
Neo4j Cypher Manual
LLM Fine-Tuning for KBQA

Potential Mentors

Mohamed Helmy
Tamer Mansour
