Description
Introduction to DBRetina
DBRetina is a high-performance bioinformatics tool with an efficient linear algorithm for calculating pairwise distances among large collections of gene sets. This algorithm enables easy construction of a comprehensive pairwise molecular similarity network within and across several molecular databases. To enable efficient search and visualization of this huge similarity network, DBRetina can transform the final output into a format compatible with the Neo4j graph database.
Challenge:
While DBRetina bridges genomic analytics and graph databases, querying Neo4j requires Cypher query language expertise, limiting accessibility for non-technical researchers.
Goal and Aims
To develop an LLM-driven chatbot that translates natural language questions into Cypher queries, enabling intuitive interaction with DBRetina-generated Neo4j graphs.
This chatbot aims to:
Increase accessibility: enable non-technical users to query complex genomic networks.
Improve efficiency: reduce query-writing time by ~70% (based on LLM benchmarks).
Ensure scalability: adapt to evolving graph schemas and support multi-database integration.
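To make the translation step concrete, here is a minimal Python sketch of how a question could be turned into a Cypher query. Everything in it is an assumption made for illustration: the `GeneSet` label, the `SIMILAR_TO` relationship, and their properties are a hypothetical schema (not DBRetina's actual Neo4j export), and `fake_llm` is a stand-in for whatever model the project ends up using. The keyword check is deliberately crude and would not catch writes performed through procedure calls.

```python
import re
from typing import Callable

# Hypothetical schema: labels, properties, and relationship names are
# assumptions, not DBRetina's actual Neo4j export format.
SCHEMA = """\
(:GeneSet {name: STRING, source: STRING})
(:GeneSet)-[:SIMILAR_TO {score: FLOAT}]->(:GeneSet)"""

# Clauses that modify the graph; a chatbot for researchers should only read.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Combine the graph schema and the user's question into one LLM prompt."""
    return (
        "Translate the question into a single read-only Cypher query.\n"
        f"Graph schema:\n{schema}\n"
        f"Question: {question}\n"
        "Cypher:"
    )


def is_read_only(cypher: str) -> bool:
    """Return False if the query contains a clause that modifies the graph."""
    return WRITE_CLAUSES.search(cypher) is None


def question_to_cypher(question: str, llm: Callable[[str], str]) -> str:
    """Ask the model for a query and refuse anything that is not read-only."""
    cypher = llm(build_prompt(question)).strip()
    if not is_read_only(cypher):
        raise ValueError("generated query is not read-only")
    return cypher


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns one canned query."""
    return (
        "MATCH (a:GeneSet {name: $name})-[r:SIMILAR_TO]->(b:GeneSet) "
        "RETURN b.name, r.score ORDER BY r.score DESC LIMIT 10"
    )


if __name__ == "__main__":
    print(question_to_cypher("Which gene sets are most similar to mine?", fake_llm))
```

Passing the schema in the prompt is one way the chatbot might adapt to evolving graph schemas without retraining; a fine-tuned model could take the same input. The generated query keeps `$name` as a parameter so the user's value would be supplied separately when the query is run.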
Difficulty Level: Medium/Hard
Size and Length of Project
- medium: 175 hours
- 12-16 weeks
Skills
Essential skills: LLM fine-tuning, experience with graph databases, HTML, CSS, JavaScript
Nice to have skills: C++
Public Repository
DBRetina Documentation
Neo4j Cypher Manual
LLM Fine-Tuning for KBQA