Commit e7a3fd6

Merge pull request #41 from waldronlab/refactor/code-quality-cleanup
refactor: improve code quality and remove AI-generated patterns
2 parents eb962f0 + b4577b1 commit e7a3fd6

38 files changed: +416 -1504 lines changed

README.md

Lines changed: 69 additions & 85 deletions
@@ -5,28 +5,28 @@
55
[![Docker](https://img.shields.io/badge/Docker-20.0+-blue.svg)](https://docker.com)
66
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
77

8-
A comprehensive AI-powered backend system for analyzing and Identifying scientific papers that contain curatable microbiome Signatures (curation readiness assessment.)
8+
Backend system for analyzing scientific papers to identify curatable microbiome signatures. Extracts essential BugSigDB fields and retrieves full text from PubMed/PMC.
99

10-
> **Tested Setup**: This project has been successfully built and tested on Ubuntu Linux with Docker. See [SETUP_GUIDE.md](SETUP_GUIDE.md) for verified setup steps.
10+
Tested on Ubuntu Linux with Docker. See [SETUP_GUIDE.md](SETUP_GUIDE.md) for setup steps.
1111

12-
## 🧬 Overview
12+
## Overview
1313

14-
BioAnalyzer Backend is a specialized system that combines advanced AI analysis with comprehensive PubMed data retrieval to evaluate scientific papers for BugSigDB curation readiness. The system extracts 6 essential fields required for microbial signature curation and provides full text retrieval capabilities.
14+
BioAnalyzer extracts 6 essential fields from papers for BugSigDB curation. It combines AI analysis with PubMed data retrieval to evaluate each paper's curation readiness.
1515

16-
### Key Capabilities
16+
### Features
1717

18-
- **🔬 Paper Analysis**: Extract 6 essential BugSigDB fields using AI
19-
- **🤖 Multi-Provider LLM Support**: LiteLLM integration for OpenAI, Anthropic, Gemini, Ollama, and Llamafile
20-
- **🧠 Advanced RAG**: Contextual summarization and chunk re-ranking for improved accuracy
21-
- **📥 Full Text Retrieval**: Comprehensive PubMed and PMC data retrieval
22-
- **🌐 REST API**: Versioned API endpoints (v1 and v2) with RAG support
23-
- **💻 CLI Tool**: User-friendly command-line interface
24-
- **📊 Multiple Formats**: JSON, CSV, XML and table output formats
25-
- **Batch Processing**: Analyze multiple papers simultaneously
26-
- **🔧 Docker Support**: Containerized deployment
27-
- **📈 Monitoring**: Health checks and performance metrics
18+
- Paper analysis: Extract 6 BugSigDB fields using AI
19+
- Multi-provider LLM support: Works with OpenAI, Anthropic, Gemini, Ollama, and Llamafile via LiteLLM
20+
- RAG support: Contextual summarization and chunk re-ranking for better accuracy
21+
- Full text retrieval: Gets metadata and full text from PubMed/PMC
22+
- REST API: Versioned endpoints (v1 and v2) with RAG support
23+
- CLI tool: Command-line interface for analysis
24+
- Multiple output formats: JSON, CSV, XML, and table formats
25+
- Batch processing: Analyze multiple papers at once
26+
- Docker support: Containerized deployment
27+
- Monitoring: Health checks and performance metrics
2828

29-
## 🏗️ Architecture
29+
## Architecture
3030

3131
### System Components
3232

@@ -133,7 +133,7 @@ The system supports multiple LLM providers through LiteLLM:
133133

134134
Auto-detection: If `LLM_PROVIDER` is not set, the system auto-detects from available API keys.
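
The auto-detection rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `PROVIDER_KEYS` mapping, the environment variable names for the keys, and the precedence order are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from provider name to the environment variable that
# holds its API key. These names follow common convention; the project's
# real variable names and precedence order may differ.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the explicit LLM_PROVIDER if set, else the first provider with a key."""
    explicit = env.get("LLM_PROVIDER", "").strip()
    if explicit:
        return explicit.lower()
    # Fall back to the first provider whose API key is present and non-empty.
    for provider, key_var in PROVIDER_KEYS.items():
        if env.get(key_var, "").strip():
            return provider
    return None  # no provider configured
```

Passing the environment as a parameter keeps the function easy to test: `detect_provider({"ANTHROPIC_API_KEY": "example"})` returns `"anthropic"` without touching the real environment.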
135135

136-
## 🚀 Quick Start
136+
## Quick Start
137137

138138
### Prerequisites
139139

@@ -148,37 +148,26 @@ Auto-detection: If `LLM_PROVIDER` is not set, the system auto-detects from avail
148148

149149
### Installation & Setup
150150

151-
#### **Method 1: Docker Installation (Recommended & Tested)**
151+
#### Docker Installation (Recommended)
152152

153-
This is the **recommended approach** as it avoids Python environment conflicts and provides a clean, isolated setup.
153+
Docker avoids Python environment conflicts and provides a clean setup.
154154

155155
```bash
156-
# 1. Navigate to the project directory
157156
cd /path/to/bioanalyzer-backend
158157

159-
# 2. Install CLI commands system-wide
160158
chmod +x install.sh
161159
./install.sh
162160

163-
# 3. Build Docker image
164161
docker compose build
165-
166-
# 4. Start the application
167162
docker compose up -d
168163

169-
# 5. Verify installation
170164
docker compose ps
171165
curl http://localhost:8000/health
172166
```
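
The final `curl` check can also be scripted, which helps when the container needs a few seconds to start. The sketch below assumes the health endpoint returns a JSON object with a `status` field equal to `"healthy"`, as in the sample output from the previous README version. The function names, retry count, and delay are illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_health(url: str = "http://localhost:8000/health",
                    attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(response.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be starting; retry after the delay
        time.sleep(delay)
    return False
```

Calling `wait_for_health()` after `docker compose up -d` returns `True` once the API responds, or `False` after roughly 20 seconds with the default settings.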
173167

174-
**Expected Output:**
175-
```json
176-
{"status":"healthy","timestamp":"2025-10-23T17:52:40.249451+00:00","version":"1.0.0"}
177-
```
178-
179-
#### **Method 2: Local Python Installation**
168+
#### Local Python Installation
180169

181-
⚠️ **Note**: This method may encounter issues with externally managed Python environments on modern Linux distributions.
170+
Note: Local installation may fail on modern Linux distributions that mark the system Python as externally managed (PEP 668). Use a virtual environment in that case.
182171

183172
```bash
184173
# Clone and setup
@@ -198,27 +187,22 @@ cp .env.example .env
198187
# Edit .env with your API keys
199188
```
200189

201-
### 🧪 **Verification Steps**
190+
### Verification
202191

203-
After installation, verify the system is working:
192+
After installation, verify everything works:
204193

205194
```bash
206-
# 1. Check Docker container status
207195
docker compose ps
208-
209-
# 2. Test API health
210196
curl http://localhost:8000/health
211197

212-
# 3. Test CLI commands (add to PATH first)
213198
export PATH="$PATH:$HOME/.local/bin"
214199
BioAnalyzer fields
215200
BioAnalyzer status
216-
217-
# 4. View API documentation
218-
# Open browser: http://localhost:8000/docs
219201
```
220202

221-
## 📖 Usage
203+
Open http://localhost:8000/docs for API documentation.
204+
205+
## Usage
222206

223207
### CLI Commands
224208

@@ -303,12 +287,12 @@ GET /api/v1/config # Configuration info
303287

304288
### Web Interface
305289

306-
Once started, access:
307-
- **Main Interface**: http://localhost:3000
308-
- **API Documentation**: http://localhost:8000/docs
309-
- **Health Check**: http://localhost:8000/health
290+
Once started:
291+
- Main Interface: http://localhost:3000
292+
- API Documentation: http://localhost:8000/docs
293+
- Health Check: http://localhost:8000/health
310294

311-
## 🔧 Configuration
295+
## Configuration
312296

313297
### Environment Variables
314298

@@ -379,24 +363,24 @@ export RAG_TOP_K_CHUNKS="10"
379363
- `app/utils/config.py`: Application configuration
380364
- `docker-compose.yml`: Docker services configuration
381365

382-
## 📊 The 6 Essential BugSigDB Fields
366+
## The 6 Essential BugSigDB Fields
383367

384-
The system analyzes papers for these critical fields:
368+
The system analyzes papers for these fields:
385369

386-
1. **🧬 Host Species**: The organism being studied (Human, Mouse, Rat, etc.)
387-
2. **📍 Body Site**: Sample collection location (Gut, Oral, Skin, etc.)
388-
3. **🏥 Condition**: Disease/treatment/exposure being studied
389-
4. **🔬 Sequencing Type**: Molecular method used (16S, metagenomics, etc.)
390-
5. **🌳 Taxa Level**: Taxonomic level analyzed (phylum, genus, species, etc.)
391-
6. **👥 Sample Size**: Number of samples or participants
370+
1. Host Species: The organism being studied (Human, Mouse, Rat, etc.)
371+
2. Body Site: Sample collection location (Gut, Oral, Skin, etc.)
372+
3. Condition: Disease/treatment/exposure being studied
373+
4. Sequencing Type: Molecular method used (16S, metagenomics, etc.)
374+
5. Taxa Level: Taxonomic level analyzed (phylum, genus, species, etc.)
375+
6. Sample Size: Number of samples or participants
392376

393377
### Field Status Values
394378

395-
- **PRESENT**: Information about the microbiom signtaure is complete and clear
396-
- **⚠️ PARTIALLY_PRESENT**: Some information available but incomplete
397-
- **ABSENT**: Information is missing
379+
- PRESENT: Information about the microbiome signature is complete and clear
380+
- PARTIALLY_PRESENT: Some information available but incomplete
381+
- ABSENT: Information is missing
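
For client code that consumes analysis results, the six fields and three status values could be modeled as below. This is a sketch under assumptions: the snake_case field keys, the `FieldStatus` enum, and the `summarize` helper are hypothetical names, not the project's actual models.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class FieldStatus(str, Enum):
    """The three status values listed above."""
    PRESENT = "PRESENT"
    PARTIALLY_PRESENT = "PARTIALLY_PRESENT"
    ABSENT = "ABSENT"


# Hypothetical snake_case keys for the six essential BugSigDB fields.
ESSENTIAL_FIELDS = (
    "host_species",
    "body_site",
    "condition",
    "sequencing_type",
    "taxa_level",
    "sample_size",
)


def summarize(statuses: Dict[str, FieldStatus]) -> Dict[str, int]:
    """Count fields per status; a field missing from the input counts as ABSENT."""
    counts = Counter(
        statuses.get(name, FieldStatus.ABSENT) for name in ESSENTIAL_FIELDS
    )
    return {status.value: counts.get(status, 0) for status in FieldStatus}
```

A summary like this gives a quick view of how complete a paper is across the six fields. How the counts translate into a curation decision is left to the caller.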
398382

399-
## 🏛️ Architecture Details
383+
## Architecture Details
400384

401385
### Service Layer Architecture
402386

@@ -518,7 +502,7 @@ Aggregate Results + RAG Stats → Cache → JSON/CSV/Table Output
518502
3. **Parsing Errors**: Error reporting with context
519503
4. **Missing Data**: Clear indication of unavailable information
520504

521-
## 🔍 API Examples
505+
## API Examples
522506

523507
### v1 API - Simple Analysis
524508
```bash
@@ -612,7 +596,7 @@ curl -X POST "http://localhost:8000/api/v1/retrieve/batch" \
612596
}
613597
```
614598

615-
## 🧪 Testing
599+
## Testing
616600

617601
### Run Tests
618602
```bash
@@ -635,7 +619,7 @@ docker exec -it bioanalyzer-api pytest
635619
- CLI command testing
636620
- Error handling validation
637621

638-
## 📁 Project Structure
622+
## Project Structure
639623

640624
```
641625
bioanalyzer-backend/
@@ -686,7 +670,7 @@ bioanalyzer-backend/
686670
└── README.md # This file
687671
```
688672

689-
## 🚀 Deployment
673+
## Deployment
690674

691675
### Docker Deployment
692676

@@ -732,7 +716,7 @@ python cli.py analyze 12345678
732716
python cli.py retrieve 12345678 --save
733717
```
734718

735-
## 📈 Performance
719+
## Performance
736720

737721
### Optimization Features
738722

@@ -753,7 +737,7 @@ python cli.py retrieve 12345678 --save
753737
- **Memory Usage**: ~100-200MB base + 50MB per concurrent request
754738
- **Cache Hit Rate**: ~60-80% (for frequently analyzed papers)
755739

756-
## 🔧 Development
740+
## Development
757741

758742
### Setting Up Development Environment
759743

@@ -797,11 +781,11 @@ pytest
797781
3. **CLI Commands**: Extend `cli.py` with new commands
798782
4. **Models**: Add Pydantic models in `app/api/models/`
799783

800-
## 🐛 Troubleshooting
784+
## Troubleshooting
801785

802-
### Common Issues & Solutions
786+
### Common Issues
803787

804-
#### **Python Environment Issues**
788+
#### Python Environment Issues
805789
```bash
806790
# Error: externally-managed-environment
807791
# Solution: Use Docker (recommended) or install python3-venv
@@ -810,23 +794,23 @@ python3 -m venv .venv
810794
source .venv/bin/activate
811795
```
812796

813-
#### **Docker Compose Issues**
797+
#### Docker Compose Issues
814798
```bash
815799
# Error: docker-compose command not found
816800
# Solution: Use newer Docker Compose syntax
817801
docker compose build # Instead of docker-compose build
818802
docker compose up -d # Instead of docker-compose up -d
819803
```
820804

821-
#### **CLI Command Not Found**
805+
#### CLI Command Not Found
822806
```bash
823807
# Error: BioAnalyzer command not found
824808
# Solution: Add to PATH
825809
export PATH="$PATH:/home/<username>/.local/bin"
826810
# Or restart terminal after running ./install.sh
827811
```
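
To diagnose the PATH problem before editing shell configuration, a small check like the following may help. It assumes, as the `export` line above suggests, that `install.sh` places the `BioAnalyzer` command in `~/.local/bin`. The function name and messages are illustrative.

```python
import os
import shutil
from pathlib import Path


def cli_path_hint(command: str = "BioAnalyzer") -> str:
    """Report where the command resolves, or suggest a PATH fix if it does not."""
    found = shutil.which(command)
    if found:
        return f"{command} found at {found}"
    # Assumed install location; adjust if install.sh uses a different directory.
    local_bin = Path.home() / ".local" / "bin"
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if str(local_bin) not in entries:
        return f'{command} not found; add to PATH: export PATH="$PATH:{local_bin}"'
    return f"{command} not found even though {local_bin} is on PATH; re-run ./install.sh"
```

Running `print(cli_path_hint())` prints either the resolved location of the command or the suggested fix.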
828812

829-
#### **API Not Responding**
813+
#### API Not Responding
830814
```bash
831815
# Check container status
832816
docker compose ps
@@ -838,7 +822,7 @@ docker compose logs
838822
docker compose restart
839823
```
840824

841-
#### **Missing API Keys**
825+
#### Missing API Keys
842826
```bash
843827
# Warning: GeminiQA not initialized
844828
# This is normal - system works without API keys
@@ -855,17 +839,17 @@ export LOG_LEVEL=DEBUG
855839
python main.py
856840
```
857841

858-
## 📚 Documentation
842+
## Documentation
859843

860-
- **🚀 Quick Start**: [QUICKSTART.md](docs/QUICKSTART.md) - Get running in 5 minutes
861-
- **📖 Complete Setup Guide**: [SETUP_GUIDE.md](SETUP_GUIDE.md) - Detailed setup steps (tested & verified)
862-
- **🏗️ Architecture Guide**: [ARCHITECTURE.md](docs/ARCHITECTURE.md) - System architecture and design
863-
- **🧠 RAG Guide**: [RAG_GUIDE.md](docs/RAG_GUIDE.md) - **NEW!** Comprehensive RAG features documentation
864-
- **⚙️ Settings Guide**: [SETTINGS.md](docs/SETTINGS.md) - Configuration system documentation
865-
- **🐳 Docker Guide**: [DOCKER_DEPLOYMENT.md](docs/DOCKER_DEPLOYMENT.md) - Docker deployment guide
866-
- **🔧 API Documentation**: http://localhost:8000/docs (when running) - Interactive API documentation
844+
- [QUICKSTART.md](docs/QUICKSTART.md) - Get running in 5 minutes
845+
- [SETUP_GUIDE.md](SETUP_GUIDE.md) - Detailed setup steps
846+
- [ARCHITECTURE.md](docs/ARCHITECTURE.md) - System architecture
847+
- [RAG_GUIDE.md](docs/RAG_GUIDE.md) - RAG features documentation
848+
- [SETTINGS.md](docs/SETTINGS.md) - Configuration system
849+
- [DOCKER_DEPLOYMENT.md](docs/DOCKER_DEPLOYMENT.md) - Docker deployment
850+
- API Documentation: http://localhost:8000/docs (when running)
867851

868-
## 🤝 Contributing
852+
## Contributing
869853

870854
1. Fork the repository
871855
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
@@ -881,11 +865,11 @@ python main.py
881865
- Use type hints for all functions
882866
- Write comprehensive docstrings
883867

884-
## 📄 License
868+
## License
885869

886870
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
887871

888-
## 🙏 Acknowledgments
872+
## Acknowledgments
889873

890874
- **BugSigDB Team**: For the microbial signatures database
891875
- **NCBI**: For PubMed data access and E-utilities API
@@ -894,12 +878,12 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
894878
- **FastAPI**: For the excellent web framework
895879
- **Docker**: For containerization technology
896880

897-
## 📞 Support
881+
## Support
898882

899883
- **Issues**: [GitHub Issues](https://github.com/waldronlab/bioanalyzer-backend/issues)
900884
- **Discussions**: [GitHub Discussions](https://github.com/waldronlab/bioanalyzer-backend/discussions)
901885
- **Documentation**: [Project Wiki](https://github.com/waldronlab/bioanalyzer-backend/wiki)
902886

903887
---
904888

905-
**Happy analyzing! 🧬🔬**
889+
Happy analyzing!
