🧪 XAScribe

AI-powered research assistant for X-ray Absorption Spectroscopy (XAS) analysis and manuscript generation, built on Google Gemini 2.5 Flash and retrieval-augmented generation (RAG).

Features

XAS Data Analysis: Upload and analyze experimental data with machine learning models
Literature Processing: Process research papers for context-aware manuscript generation with intelligent caching
AI Manuscript Generation: Create academic content with proper citations using Gemini 2.5 Flash
Smart Caching: Automatically reuses processed papers from previous sessions, so there is no need to re-upload them
Optional Uploads: Add new papers to existing collections or work with cached data only
Interactive Workflow: Streamlined interface with automatic detection of existing data

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Get Google Gemini API Key

Visit Google AI Studio
Create an API key for Gemini
Keep it ready for the app interface

3. Run the Application

streamlit run app.py

4. Access the App

Open your browser to the URL shown in the terminal (typically http://localhost:8501)

Usage

1. Enter API Key in the sidebar
2. Check Existing Data: App automatically detects papers from previous sessions
3. Upload Data (optional):
   - XAS data files (JSON, CSV, XLSX, TXT)
   - Additional research papers (individual TXT files or ZIP archive)
4. Process Data: Initialize with existing papers or add new ones to your collection
5. Generate Manuscript: Enter your research question and generate academic content
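
The upload step accepts papers either as individual TXT files or as a ZIP archive. As a rough illustration of how a ZIP upload might be unpacked into paper texts, here is a minimal sketch using only the standard library. The function name `read_papers_from_zip` and the UTF-8 assumption are hypothetical and are not taken from app.py.

```python
import io
import zipfile


def read_papers_from_zip(zip_bytes: bytes) -> dict:
    """Return {member name: text} for every .txt member of a ZIP archive."""
    papers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not a plain-text paper.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            raw = archive.read(info)
            # Assumption: papers are UTF-8; undecodable bytes are replaced.
            papers[info.filename] = raw.decode("utf-8", errors="replace")
    return papers
```

Non-TXT members are ignored rather than rejected, so an archive that also contains figures or notes would still load.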

Smart Caching System

First Run: Upload research papers and XAS data; the system then creates the cache files
Subsequent Runs: Papers are automatically detected and loaded from cache
Adding Papers: Upload additional papers to expand your existing collection
Cache Files: rag_cache.pkl (metadata) and embeddings_cache.pkl (vector embeddings)
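
The load-or-create behaviour described above can be sketched with the standard pickle module. This is an illustration only: the internal layout of rag_cache.pkl is not documented here, so the sketch assumes a plain dict keyed by file name, and the helper names `load_cache` and `add_papers` are hypothetical.

```python
import pickle
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Load a pickled cache, or return an empty one on the first run."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}


def add_papers(path: Path, new_papers: dict) -> dict:
    """Merge new papers into the cache, keeping entries already present."""
    cache = load_cache(path)
    for name, text in new_papers.items():
        # setdefault leaves previously processed papers untouched.
        cache.setdefault(name, text)
    with path.open("wb") as fh:
        pickle.dump(cache, fh)
    return cache
```

On a first run `load_cache` returns an empty dict, and later calls to `add_papers` only extend the collection, which mirrors the "Adding Papers" behaviour.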

Example Data

The project includes:

example_xas_data/nmc_exp_xas.json - Sample XAS experimental data
example_papers/ - 24 research papers about XAS studies on battery cathodes
paper_metadata.csv - Citation metadata for proper referencing
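
Before uploading a data file, it can help to look at its top-level structure. The schema of nmc_exp_xas.json is not described in this README, so the sketch below deliberately assumes nothing about its keys; `summarize_json` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        item_type = type(data[0]).__name__ if data else None
        return {"length": len(data), "item_type": item_type}
    return {"type": type(data).__name__}
```

Calling `summarize_json("example_xas_data/nmc_exp_xas.json")` from the project root would list whatever top-level keys the sample file actually contains.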

File Structure

├── app.py                    # Main Streamlit application
├── rag.py                    # RAG system with FAISS vector search
├── Backend/Modeling.py       # XAS data analysis pipeline
├── requirements.txt          # Python dependencies
├── paper_metadata.csv        # Citation metadata
├── rag_cache.pkl             # Cached paper metadata (auto-generated)
├── embeddings_cache.pkl      # Cached vector embeddings (auto-generated)
├── converted_papers/         # Processed paper text files (auto-generated)
├── example_xas_data/         # Sample experimental data
└── example_papers/           # Sample research papers

Configuration

Set your API key via:

Streamlit sidebar interface (recommended)
Environment variable: export GOOGLE_API_KEY="your-key"
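
One way the two options could be combined is sketched below. The precedence (sidebar value first, then GOOGLE_API_KEY) is an assumption for illustration, and `resolve_api_key` is a hypothetical helper rather than a function from app.py.

```python
import os


def resolve_api_key(sidebar_value=""):
    """Prefer the key typed into the sidebar, else fall back to GOOGLE_API_KEY."""
    key = (sidebar_value or "").strip()
    if not key:
        # Assumption: the environment variable is only a fallback.
        key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "No Gemini API key: enter one in the sidebar or set GOOGLE_API_KEY."
        )
    return key
```

Failing early with a clear message avoids a less readable error later, when the first Gemini request is made.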

Technologies

Frontend: Streamlit
AI: Google Gemini 2.5 Flash
Vector Search: FAISS (Facebook AI Similarity Search)
Embeddings: BGE-large-en (sentence-transformers)
ML: Scikit-learn, SHAP
Materials Science: Pymatgen
Caching: Python pickle for persistent storage
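
To make the retrieval step concrete, the sketch below ranks text chunks by cosine similarity to a query vector. It is a brute-force, pure-Python stand-in for what a FAISS index does at scale, not the code in rag.py; the tiny two-dimensional vectors and the names `cosine` and `top_k` are illustrative assumptions (real BGE embeddings have many more dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = sorted(
        chunks.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]
```

The retrieved chunks would then be passed to the language model as context for the generated text.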

Troubleshooting

API Key Error: Verify your Gemini API key is valid and has access to Gemini 2.5 Flash
File Format: Use supported formats (JSON, CSV, XLSX, TXT, ZIP)
Memory Issues: Process smaller batches for large datasets
Cache Issues: Delete rag_cache.pkl and embeddings_cache.pkl to force re-processing
Missing Papers: Check converted_papers/ folder for processed text files
Model Not Found: Ensure your API key has access to the latest Gemini models
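
For the memory tip above, processing in smaller batches can be as simple as slicing the list of papers before handing each slice to the processing step. A minimal sketch, with `batched` as a hypothetical helper name:

```python
def batched(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, `list(batched(["p1", "p2", "p3"], 2))` gives `[["p1", "p2"], ["p3"]]`.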

Cache Management

Cache Location: All cache files are stored in the project root directory
Force Refresh: Delete cache files to re-process papers with updated parameters
Backup: Cache files can be backed up to preserve processed literature databases
Sharing: Share rag_cache.pkl and embeddings_cache.pkl to distribute processed datasets. Only load cache files from sources you trust, since unpickling a file can execute arbitrary code.
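
The backup and force-refresh steps can also be scripted. The sketch below uses only the two cache file names given in this README; the helper names `backup_cache` and `clear_cache` and the backup directory are hypothetical.

```python
import shutil
from pathlib import Path

CACHE_FILES = ("rag_cache.pkl", "embeddings_cache.pkl")


def backup_cache(project_root, backup_dir):
    """Copy any existing cache files into backup_dir and return the copied names."""
    root, dest = Path(project_root), Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in CACHE_FILES:
        source = root / name
        if source.exists():
            shutil.copy2(source, dest / name)
            copied.append(name)
    return copied


def clear_cache(project_root):
    """Delete the cache files so the next run re-processes the papers."""
    for name in CACHE_FILES:
        # missing_ok avoids an error when a cache file was never created.
        (Path(project_root) / name).unlink(missing_ok=True)
```

Running `backup_cache` before `clear_cache` keeps a copy of the processed literature database in case the refresh needs to be undone.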

XAScribe v2.0 - Advanced AI Research Platform for Materials Science with Smart Caching

Oscuro-Phoenix/xascribe
