biocypher/biochatter

Milestones

Bioconductor support chatbot
We aim to build a helpful chat solution for users of the Bioconductor ecosystem using current Large Language Model (LLM) technology. To answer questions about the Bioconductor ecosystem, the LLM needs good knowledge of Bioconductor packages, structure, and coding etiquette, which cannot be assumed to be present in any LLM. To counter this problem, we aim to implement a custom solution with concrete knowledge from the Bioconductor ecosystem, including a Retrieval-Augmented Generation (RAG) process that can inject context into the LLM prompt for answering specific questions. To achieve this, we will use and extend the [BioChatter](https://github.com/biocypher/biochatter) library, an open source library for the biomedical application of LLMs. The desired outcome is a chatbot instance that can inform Bioconductor users about the ecosystem (which packages to use for which purpose, where to get more information), the usage of specific libraries (including their vignettes and idiomatic programming style), and the troubleshooting of common problems (by integrating the Bioconductor support forum). To manage the complex, multi-layered knowledge required for this assistant, we will use the [BioCypher](https://github.com/biocypher/biocypher) library, which is designed to facilitate knowledge management and which natively interacts with BioChatter. Building specific knowledge graphs for the layers of information in the Bioconductor ecosystem (from meta-information about the ecosystem down to the occurrence of errors in a specific package) will allow the RAG mechanism to give context-specific answers to the users’ questions.
No due date
• 0/7 issues closed
0% complete, 7 open, 0 closed
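The RAG step described in this milestone (retrieve relevant passages, then inject them into the LLM prompt) can be illustrated with a minimal sketch. This is not BioChatter's API: the function names `retrieve` and `build_prompt`, the example passages, and the word-overlap scoring are all assumptions made for illustration. A real setup would rank passages by embedding similarity in a vector database rather than by shared words.

```python
import re
from typing import List


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the question.

    Word overlap is a toy stand-in for embedding similarity (assumption).
    """
    q = _tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Inject the retrieved passages into the prompt ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical knowledge snippets, for illustration only.
    passages = [
        "DESeq2 performs differential expression analysis",
        "GenomicRanges represents genomic intervals",
    ]
    question = "Which package does differential expression analysis?"
    print(build_prompt(question, retrieve(question, passages, k=1)))
```

The resulting prompt string would then be sent to the LLM, so the answer can draw on the injected Bioconductor-specific context rather than on the model's general knowledge alone.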
Podcasting feature
Issues related to the podcasting feature. Low priority, since Google has been addressing this specifically.
No due date
• 0/3 issues closed
0% complete, 3 open, 0 closed
Improve user-friendliness
Includes all issues concerned with streamlining the user experience.
No due date
• 5/14 issues closed
35% complete, 9 open, 5 closed
Improve Benchmarking
The BioChatter benchmark should be continuously expanded and made more robust, both to keep up with developments in the field and to allow BioChatter to be used in more contexts.
No due date
• 15/25 issues closed
60% complete, 10 open, 15 closed
BioHackathon2
Issues to be addressed at the de.NBI BioHackathon2.
No due date
• 3/6 issues closed
50% complete, 3 open, 3 closed
Semi-automated curation
LLMs can assist in curating large amounts of data, which until now has been a task reserved for human domain experts. Curation efforts can include:
- cell type annotation based on free text, marker genes, foundation model embeddings, or other features
- curation of interactions (e.g. metabolites) based on a combination of database and literature information
No due date
• 0/3 issues closed
0% complete, 3 open, 0 closed
Improve vector database handling
Vector database handling is rudimentary at the moment, to the extent that a new vector DB instance is created for each run of the class. There is no way of managing the vector DB from BioChatter, and the only vector DB vendor available is Milvus, which needs to be available locally through the official Docker container. We should add a more sophisticated vector DB handling system that allows the user to connect to an existing instance, locally or in the cloud, and to manage this DB, i.e., report on its contents, wipe the database, access collections, etc. We should be able to add metadata to DB entries, carrying information such as which document or other source the embedded information comes from. We can also add further DB vendors, such as Pinecone.
No due date
• 3/4 issues closed
75% complete, 1 open, 3 closed
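The management operations listed in this milestone (inspect contents, wipe the database, access collections, attach source metadata to entries) could be captured by a small vendor-neutral interface. The sketch below is an assumption about what such an interface might look like, not the BioChatter implementation: the class `InMemoryVectorStore`, its method names, and the example file name are hypothetical, and an in-memory dictionary stands in for a real Milvus or Pinecone connection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Entry:
    """One embedded chunk plus metadata naming its source."""
    vector: List[float]
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryVectorStore:
    """Hypothetical stand-in for a connection to an existing vector DB."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Entry]] = {}

    def add(
        self,
        collection: str,
        vector: Sequence[float],
        text: str,
        metadata: Optional[Dict[str, str]] = None,
    ) -> None:
        """Store an embedding together with metadata about its origin."""
        entry = Entry(list(vector), text, dict(metadata or {}))
        self._collections.setdefault(collection, []).append(entry)

    def describe(self) -> Dict[str, int]:
        """Report contents: number of entries per collection."""
        return {name: len(items) for name, items in self._collections.items()}

    def get_collection(self, name: str) -> List[Entry]:
        """Access the entries of one collection (empty list if unknown)."""
        return list(self._collections.get(name, []))

    def wipe(self) -> None:
        """Remove all collections and their entries."""
        self._collections.clear()


if __name__ == "__main__":
    store = InMemoryVectorStore()
    # "vignette.pdf" is a made-up source name for illustration.
    store.add("docs", [0.1, 0.2], "example chunk", {"source": "vignette.pdf"})
    print(store.describe())
    store.wipe()
```

Keeping the interface independent of the backend would let vendor-specific classes implement the same methods, so the rest of the code does not need to know which vector DB is in use.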
