55[![Docker](https://img.shields.io/badge/Docker-20.0+-blue.svg)](https://docker.com)
66[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
77
8- A comprehensive AI-powered backend system for analyzing and Identifying scientific papers that contain curatable microbiome Signatures (curation readiness assessment.)
8+ Backend system for analyzing scientific papers to identify curatable microbiome signatures. Extracts essential BugSigDB fields and retrieves full text from PubMed/PMC.
99
10- > **✅ Tested Setup**: This project has been successfully built and tested on Ubuntu Linux with Docker. See [SETUP_GUIDE.md](SETUP_GUIDE.md) for verified setup steps.
10+ Tested on Ubuntu Linux with Docker. See [SETUP_GUIDE.md](SETUP_GUIDE.md) for setup steps.
1111
12- ## 🧬 Overview
12+ ## Overview
1313
14- BioAnalyzer Backend is a specialized system that combines advanced AI analysis with comprehensive PubMed data retrieval to evaluate scientific papers for BugSigDB curation readiness. The system extracts 6 essential fields required for microbial signature curation and provides full text retrieval capabilities.
14+ BioAnalyzer extracts 6 essential fields from papers for BugSigDB curation. Uses AI analysis with PubMed data retrieval to evaluate papers.
1515
16- ### Key Capabilities
16+ ### Features
1717
18- - **🔬 Paper Analysis**: Extract 6 essential BugSigDB fields using AI
19- - **🤖 Multi-Provider LLM Support**: LiteLLM integration for OpenAI, Anthropic, Gemini, Ollama, and Llamafile
20- - **🧠 Advanced RAG**: Contextual summarization and chunk re-ranking for improved accuracy
21- - **📥 Full Text Retrieval**: Comprehensive PubMed and PMC data retrieval
22- - **🌐 REST API**: Versioned API endpoints (v1 and v2) with RAG support
23- - **💻 CLI Tool**: User-friendly command-line interface
24- - **📊 Multiple Formats**: JSON, CSV, XML and table output formats
25- - **⚡ Batch Processing**: Analyze multiple papers simultaneously
26- - **🔧 Docker Support**: Containerized deployment
27- - **📈 Monitoring**: Health checks and performance metrics
18+ - Paper analysis: Extract 6 BugSigDB fields using AI
19+ - Multi-provider LLM support: Works with OpenAI, Anthropic, Gemini, Ollama, and Llamafile via LiteLLM
20+ - RAG support: Contextual summarization and chunk re-ranking for better accuracy
21+ - Full text retrieval: Gets metadata and full text from PubMed/PMC
22+ - REST API: Versioned endpoints (v1 and v2) with RAG support
23+ - CLI tool: Command-line interface for analysis
24+ - Multiple output formats: JSON, CSV, XML, and table formats
25+ - Batch processing: Analyze multiple papers at once
26+ - Docker support: Containerized deployment
27+ - Monitoring: Health checks and performance metrics
2828
29- ## 🏗️ Architecture
29+ ## Architecture
3030
3131### System Components
3232
@@ -133,7 +133,7 @@ The system supports multiple LLM providers through LiteLLM:
133133
134134Auto-detection: If `LLM_PROVIDER` is not set, the system auto-detects from available API keys.
135135
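The auto-detection rule can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual implementation: the environment variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`), the priority order, and the `detect_provider` helper are all hypothetical. Only `LLM_PROVIDER` comes from the README.

```python
import os

# Assumed mapping of provider name -> environment variable holding its API key.
# The variable names and the priority order (dict order) are illustrative only.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def detect_provider(env=None):
    """Return the explicit LLM_PROVIDER if set, else the first provider with a key."""
    env = os.environ if env is None else env
    explicit = env.get("LLM_PROVIDER", "").strip()
    if explicit:
        return explicit.lower()
    for provider, key_var in PROVIDER_KEY_VARS.items():
        if env.get(key_var):
            return provider
    return None  # no provider configured and no API key found
```

Passing a plain dictionary instead of `os.environ` makes the rule easy to check without touching the real environment.
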
136- ## 🚀 Quick Start
136+ ## Quick Start
137137
138138### Prerequisites
139139
@@ -148,37 +148,26 @@ Auto-detection: If `LLM_PROVIDER` is not set, the system auto-detects from avail
148148
149149### Installation & Setup
150150
151- #### ✅ **Method 1: Docker Installation (Recommended & Tested)**
151+ #### Docker Installation (Recommended)
152152
153- This is the **recommended approach** as it avoids Python environment conflicts and provides a clean, isolated setup.
153+ Docker avoids Python environment conflicts and provides a clean setup.
154154
155155``` bash
156- # 1. Navigate to the project directory
157156cd /path/to/bioanalyzer-backend
158157
159- # 2. Install CLI commands system-wide
160158chmod +x install.sh
161159./install.sh
162160
163- # 3. Build Docker image
164161docker compose build
165-
166- # 4. Start the application
167162docker compose up -d
168163
169- # 5. Verify installation
170164docker compose ps
171165curl http://localhost:8000/health
172166```
173167
174- **Expected Output:**
175- ``` json
176- {"status":"healthy","timestamp":"2025-10-23T17:52:40.249451+00:00","version":"1.0.0"}
177- ```
178-
179- #### **Method 2: Local Python Installation**
168+ #### Local Python Installation
180169
181- ⚠️ **Note**: This method may encounter issues with externally managed Python environments on modern Linux distributions.
170+ Note: This may encounter issues with externally managed Python environments on modern Linux distributions.
182171
183172``` bash
184173# Clone and setup
@@ -198,27 +187,22 @@ cp .env.example .env
198187# Edit .env with your API keys
199188```
200189
201- ### 🧪 **Verification Steps**
190+ ### Verification
202191
203- After installation, verify the system is working:
192+ After installation, verify everything works:
204193
205194``` bash
206- # 1. Check Docker container status
207195docker compose ps
208-
209- # 2. Test API health
210196curl http://localhost:8000/health
211197
212- # 3. Test CLI commands (add to PATH first)
213198export PATH="$PATH:/home/ronald/.local/bin"
214199BioAnalyzer fields
215200BioAnalyzer status
216-
217- # 4. View API documentation
218- # Open browser: http://localhost:8000/docs
219201```
220202
221- ## 📖 Usage
203+ Open http://localhost:8000/docs for API documentation.
204+
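For scripted verification, the health check can be polled from Python instead of running `curl` by hand. The sketch below is hedged: it assumes the `/health` endpoint still returns a JSON object with a `"status": "healthy"` field, as in the expected output shown in the earlier version of this README. The helper names, retry count, and delay are hypothetical choices.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when a /health response body reports status 'healthy'."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_api(url="http://localhost:8000/health", attempts=10, delay=2.0):
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be starting; retry after the delay
        time.sleep(delay)
    return False
```

Calling `wait_for_api()` right after `docker compose up -d` gives the container time to start before the CLI commands are tried.
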
205+ ## Usage
222206
223207### CLI Commands
224208
@@ -303,12 +287,12 @@ GET /api/v1/config # Configuration info
303287
304288### Web Interface
305289
306- Once started, access :
307- - **Main Interface**: http://localhost:3000
308- - **API Documentation**: http://localhost:8000/docs
309- - **Health Check**: http://localhost:8000/health
290+ Once started:
291+ - Main Interface: http://localhost:3000
292+ - API Documentation: http://localhost:8000/docs
293+ - Health Check: http://localhost:8000/health
310294
311- ## 🔧 Configuration
295+ ## Configuration
312296
313297### Environment Variables
314298
@@ -379,24 +363,24 @@ export RAG_TOP_K_CHUNKS="10"
379363- `app/utils/config.py`: Application configuration
380364- `docker-compose.yml`: Docker services configuration
381365
382- ## 📊 The 6 Essential BugSigDB Fields
366+ ## The 6 Essential BugSigDB Fields
383367
384- The system analyzes papers for these critical fields:
368+ The system analyzes papers for these fields:
385369
386- 1. **🧬 Host Species**: The organism being studied (Human, Mouse, Rat, etc.)
387- 2. **📍 Body Site**: Sample collection location (Gut, Oral, Skin, etc.)
388- 3. **🏥 Condition**: Disease/treatment/exposure being studied
389- 4. **🔬 Sequencing Type**: Molecular method used (16S, metagenomics, etc.)
390- 5. **🌳 Taxa Level**: Taxonomic level analyzed (phylum, genus, species, etc.)
391- 6. **👥 Sample Size**: Number of samples or participants
370+ 1. Host Species: The organism being studied (Human, Mouse, Rat, etc.)
371+ 2. Body Site: Sample collection location (Gut, Oral, Skin, etc.)
372+ 3. Condition: Disease/treatment/exposure being studied
373+ 4. Sequencing Type: Molecular method used (16S, metagenomics, etc.)
374+ 5. Taxa Level: Taxonomic level analyzed (phylum, genus, species, etc.)
375+ 6. Sample Size: Number of samples or participants
392376
393377### Field Status Values
394378
395- - **✅ PRESENT**: Information about the microbiom signtaure is complete and clear
396- - **⚠️ PARTIALLY_PRESENT**: Some information available but incomplete
397- - **❌ ABSENT**: Information is missing
379+ - PRESENT: Information about the microbiome signature is complete and clear
380+ - PARTIALLY_PRESENT: Some information available but incomplete
381+ - ABSENT: Information is missing
398382
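The six fields and three status values can be modeled as follows. This is a sketch for illustration, not the project's data model: the snake_case field keys, the `FieldStatus` and `summarize` names, and the choice to count a field missing from the input as `ABSENT` are all assumptions.

```python
from collections import Counter
from enum import Enum


class FieldStatus(str, Enum):
    PRESENT = "PRESENT"
    PARTIALLY_PRESENT = "PARTIALLY_PRESENT"
    ABSENT = "ABSENT"


# Hypothetical keys for the six essential BugSigDB fields listed above.
ESSENTIAL_FIELDS = (
    "host_species",
    "body_site",
    "condition",
    "sequencing_type",
    "taxa_level",
    "sample_size",
)


def summarize(statuses):
    """Count the essential fields per status; fields not in the input count as ABSENT."""
    counts = Counter(
        FieldStatus(statuses.get(name, "ABSENT")) for name in ESSENTIAL_FIELDS
    )
    return {status.value: counts.get(status, 0) for status in FieldStatus}
```

A summary such as `summarize({"host_species": "PRESENT"})` shows at a glance how many of the six fields an analysis actually found.
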
399- ## 🏛️ Architecture Details
383+ ## Architecture Details
400384
401385### Service Layer Architecture
402386
@@ -518,7 +502,7 @@ Aggregate Results + RAG Stats → Cache → JSON/CSV/Table Output
5185023. **Parsing Errors**: Error reporting with context
5195034. **Missing Data**: Clear indication of unavailable information
520504
521- ## 🔍 API Examples
505+ ## API Examples
522506
523507### v1 API - Simple Analysis
524508``` bash
@@ -612,7 +596,7 @@ curl -X POST "http://localhost:8000/api/v1/retrieve/batch" \
612596}
613597```
614598
615- ## 🧪 Testing
599+ ## Testing
616600
617601### Run Tests
618602``` bash
@@ -635,7 +619,7 @@ docker exec -it bioanalyzer-api pytest
635619- CLI command testing
636620- Error handling validation
637621
638- ## 📁 Project Structure
622+ ## Project Structure
639623
640624```
641625bioanalyzer-backend/
@@ -686,7 +670,7 @@ bioanalyzer-backend/
686670└── README.md # This file
687671```
688672
689- ## 🚀 Deployment
673+ ## Deployment
690674
691675### Docker Deployment
692676
@@ -732,7 +716,7 @@ python cli.py analyze 12345678
732716python cli.py retrieve 12345678 --save
733717```
734718
735- ## 📈 Performance
719+ ## Performance
736720
737721### Optimization Features
738722
@@ -753,7 +737,7 @@ python cli.py retrieve 12345678 --save
753737- **Memory Usage**: ~100-200MB base + 50MB per concurrent request
754738- **Cache Hit Rate**: ~60-80% (for frequently analyzed papers)
755739
756- ## 🔧 Development
740+ ## Development
757741
758742### Setting Up Development Environment
759743
@@ -797,11 +781,11 @@ pytest
7977813. **CLI Commands**: Extend `cli.py` with new commands
7987824. **Models**: Add Pydantic models in `app/api/models/`
799783
800- ## 🐛 Troubleshooting
784+ ## Troubleshooting
801785
802- ### Common Issues & Solutions
786+ ### Common Issues
803787
804- #### **Python Environment Issues**
788+ #### Python Environment Issues
805789``` bash
806790# Error: externally-managed-environment
807791# Solution: Use Docker (recommended) or install python3-venv
@@ -810,23 +794,23 @@ python3 -m venv .venv
810794source .venv/bin/activate
811795```
812796
813- #### **Docker Compose Issues**
797+ #### Docker Compose Issues
814798``` bash
815799# Error: docker-compose command not found
816800# Solution: Use newer Docker Compose syntax
817801docker compose build # Instead of docker-compose build
818802docker compose up -d # Instead of docker-compose up -d
819803```
820804
821- #### **CLI Command Not Found**
805+ #### CLI Command Not Found
822806``` bash
823807# Error: BioAnalyzer command not found
824808# Solution: Add to PATH
825809export PATH="$PATH:/home/<computer_name>/.local/bin"
826810# Or restart terminal after running ./install.sh
827811```
828812
829- #### **API Not Responding**
813+ #### API Not Responding
830814``` bash
831815# Check container status
832816docker compose ps
@@ -838,7 +822,7 @@ docker compose logs
838822docker compose restart
839823```
840824
841- #### **Missing API Keys**
825+ #### Missing API Keys
842826``` bash
843827# Warning: GeminiQA not initialized
844828# This is normal - system works without API keys
@@ -855,17 +839,17 @@ export LOG_LEVEL=DEBUG
855839python main.py
856840```
857841
858- ## 📚 Documentation
842+ ## Documentation
859843
860- - **🚀 Quick Start**: [QUICKSTART.md](docs/QUICKSTART.md) - Get running in 5 minutes
861- - **📖 Complete Setup Guide**: [SETUP_GUIDE.md](SETUP_GUIDE.md) - Detailed setup steps (tested & verified)
862- - **🏗️ Architecture Guide**: [ARCHITECTURE.md](docs/ARCHITECTURE.md) - System architecture and design
863- - **🧠 RAG Guide**: [RAG_GUIDE.md](docs/RAG_GUIDE.md) - **NEW!** Comprehensive RAG features documentation
864- - **⚙️ Settings Guide**: [SETTINGS.md](docs/SETTINGS.md) - Configuration system documentation
865- - **🐳 Docker Guide**: [DOCKER_DEPLOYMENT.md](docs/DOCKER_DEPLOYMENT.md) - Docker deployment guide
866- - **🔧 API Documentation**: http://localhost:8000/docs (when running) - Interactive API documentation
844+ - [QUICKSTART.md](docs/QUICKSTART.md) - Get running in 5 minutes
845+ - [SETUP_GUIDE.md](SETUP_GUIDE.md) - Detailed setup steps
846+ - [ARCHITECTURE.md](docs/ARCHITECTURE.md) - System architecture
847+ - [RAG_GUIDE.md](docs/RAG_GUIDE.md) - RAG features documentation
848+ - [SETTINGS.md](docs/SETTINGS.md) - Configuration system
849+ - [DOCKER_DEPLOYMENT.md](docs/DOCKER_DEPLOYMENT.md) - Docker deployment
850+ - API Documentation: http://localhost:8000/docs (when running)
867851
868- ## 🤝 Contributing
852+ ## Contributing
869853
8708541. Fork the repository
8718552. Create a feature branch (`git checkout -b feature/amazing-feature`)
@@ -881,11 +865,11 @@ python main.py
881865- Use type hints for all functions
882866- Write comprehensive docstrings
883867
884- ## 📄 License
868+ ## License
885869
886870This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
887871
888- ## 🙏 Acknowledgments
872+ ## Acknowledgments
889873
890874- **BugSigDB Team**: For the microbial signatures database
891875- **NCBI**: For PubMed data access and E-utilities API
@@ -894,12 +878,12 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
894878- **FastAPI**: For the excellent web framework
895879- **Docker**: For containerization technology
896880
897- ## 📞 Support
881+ ## Support
898882
899883- **Issues**: [GitHub Issues](https://github.com/waldronlab/bioanalyzer-backend/issues)
900884- **Discussions**: [GitHub Discussions](https://github.com/waldronlab/bioanalyzer-backend/discussions)
901885- **Documentation**: [Project Wiki](https://github.com/waldronlab/bioanalyzer-backend/wiki)
902886
903887---
904888
905- **Happy analyzing! 🧬🔬**
889+ Happy analyzing!