Skip to content

Latest commit

 

History

History
223 lines (171 loc) · 5.78 KB

File metadata and controls

223 lines (171 loc) · 5.78 KB

Fixes Applied - Multi-Source API Issues

Issues Identified from Logs

❌ Clinical Trials APIs

  1. ClinicalTrials.gov - 403 Forbidden (blocked)
  2. EU CTR - 404 Not Found (incorrect URL or access restricted)
  3. WHO ICTRP - 404 Not Found (API endpoint changed or restricted)

❌ Patent APIs

  1. EPO OPS - 403 Forbidden (requires API key registration)
  2. Lens.org - 401 Unauthorized (requires API key)

❌ PDF Generation

  1. xhtml2pdf - ImportError: cannot import name 'ShowBoundaryValue' from 'reportlab.platypus.frames'
    • Compatibility issue between xhtml2pdf 0.2.13 and newer ReportLab versions

✅ Solutions Implemented

1. Clinical Trials - Added FREE Alternative

✅ PubMed Clinical Trials Search

  • API: NCBI E-utilities (completely free, no auth required)
  • URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
  • Access: Public, unlimited for reasonable use
  • Data: Clinical trial publications indexed in PubMed
  • Format: JSON

Implementation:

async def _search_pubmed_clinical_trials(...)
    # Search PubMed for clinical trial publications
    # Fetch: esearch.fcgi -> esummary.fcgi
    # Returns: PMID, title, publication date, journal, etc.

Benefits:

  • ✅ Always available
  • ✅ No authentication needed
  • ✅ Rich metadata
  • ✅ Global coverage
  • ✅ Updated daily

2. Patents - Added FREE Alternative

✅ FreePatentsOnline.com

Implementation:

async def _search_free_patents_online(...)
    # Search FreePatentsOnline
    # Parse HTML for patent numbers and titles
    # Returns: Patent ID, title, URL

Benefits:

  • ✅ No API key needed
  • ✅ Simple HTTP requests
  • ✅ Large US patent database
  • ✅ Always accessible

3. PDF Generation - Fixed Compatibility

Updated requirements.txt:

reportlab==3.6.13        # Stable version
xhtml2pdf==0.2.15        # Latest compatible version
html5lib==1.1            # Required dependency

Enhanced error handling:

try:
    # Try xhtml2pdf first
    from xhtml2pdf import pisa
    pisa.CreatePDF(...)
except ImportError:
    print("Install with: pip install xhtml2pdf")
    return text_fallback
except Exception:
    # Try WeasyPrint as backup
    try:
        import weasyprint
        weasyprint.HTML(...).write_pdf(...)
    except:
        return text_fallback

🔄 Current Data Sources

Clinical Trials (2 sources)

  1. ClinicalTrials.gov - Primary (may be blocked)
  2. PubMed - Secondary (always works) ⭐ NEW

Patents (2 sources)

  1. Curated Dataset - 20+ real pharma patents (always works)
  2. FreePatentsOnline - Web scraping (public access) ⭐ NEW

Literature (1 source)

  1. Europe PMC - Working perfectly

🚀 Next Steps to Test

1. Update Dependencies

cd c:\Users\abhis\Desktop\Projects\moleculeX\backend
venv\Scripts\pip.exe install -r requirements.txt --upgrade

2. Restart Backend

venv\Scripts\python.exe main.py

3. Test Query

Run any query and expect to see:

🔬 Clinical Trials Agent: Starting multi-source search...
🌐 Querying ClinicalTrials.gov...
🌐 Querying PubMed for clinical trial publications...
✅ PubMed Clinical Trials: 8 publications
⚠️ ClinicalTrials.gov blocked (403)
✅ Clinical Trials Agent: Found 8 unique trials from all sources

📄 Patent Agent: Starting multi-source patent search...
📚 Searching curated patent database...
🌐 Querying FreePatentsOnline.com...
✅ Curated dataset: 6 patents
✅ FreePatentsOnline: 5 patents
✅ Patent Agent: Found 11 unique patents from all sources

📊 Generating report...
✅ Report generated with xhtml2pdf: job_xxx.pdf

📊 Expected Improvements

Before:

  • Clinical Trials: 0 results (all APIs blocked)
  • Patents: 6 results (only curated dataset)
  • PDF: Failed (compatibility error)

After:

  • Clinical Trials: 5-15 results (PubMed works)
  • Patents: 10-20 results (curated + FreePatentsOnline)
  • PDF: ✅ Works (fixed dependencies)

🛠️ Fallback Strategy

If all APIs fail:

  1. Clinical Trials: Returns 0 results (graceful handling)
  2. Patents: Always returns curated dataset (6+ patents)
  3. Literature: Europe PMC (almost always works)
  4. Reports: Text file generated (.txt format)

The system NEVER crashes, always provides some results!


🔮 Future Enhancements

More Free APIs to Add:

Clinical Trials:

  • ✅ PubMed (implemented)
  • 🔄 ClinicalTrials.gov RSS feeds
  • 🔄 OpenTrials.net
  • 🔄 ANZCTR (Australia/NZ registry)

Patents:

  • ✅ FreePatentsOnline (implemented)
  • 🔄 Google Patents Public Datasets (BigQuery free tier)
  • 🔄 WIPO PatentScope (requires registration)
  • 🔄 J-PlatPat (Japan patents)

Literature:

  • ✅ Europe PMC (working)
  • 🔄 PubMed Central (free full-text)
  • 🔄 CORE (academic papers)
  • 🔄 Semantic Scholar (AI-powered)

⚠️ Known Limitations

  1. ClinicalTrials.gov: May remain blocked (403) - this is their anti-bot protection
  2. FreePatentsOnline: HTML parsing may break if they change their website
  3. PubMed: Rate limited to 3 requests/second (we're well under this)
  4. PDF Generation: Requires proper HTML/CSS (complex layouts may fail)

📝 Testing Checklist

  • Install updated dependencies
  • Restart backend server
  • Run test query
  • Check console logs for API responses
  • Verify PubMed returns results
  • Verify FreePatentsOnline returns results
  • Check if PDF is generated (not .txt)
  • View report on frontend
  • Test with different queries

Last Updated: November 9, 2025
Status: ✅ Ready for testing