Commit 922e4b7

adds Google Drive backend for storing datasets as sheets (#2138)

This pull request introduces a new Google Drive backend for the Ragas framework, enabling cloud-based storage and collaboration for datasets and experiments. It includes documentation, examples, and integration into the package's backend registry.

Signed-off-by: Derek Anderson <[email protected]>

1 parent 677808e commit 922e4b7

File tree

9 files changed: +1359 -1 lines changed

Lines changed: 235 additions & 0 deletions

@@ -0,0 +1,235 @@
# Google Drive Backend for Ragas

The Google Drive backend allows you to store Ragas datasets and experiments in Google Sheets within your Google Drive. This provides a cloud-based, collaborative storage solution that is familiar to many users.

## Features

- **Cloud Storage**: Store your datasets and experiments in Google Drive
- **Collaborative**: Share and collaborate on datasets using Google Drive's sharing features
- **Google Sheets Format**: Data is stored in Google Sheets for easy viewing and editing
- **Automatic Structure**: Creates an organized folder structure (`datasets/` and `experiments/`)
- **Type Preservation**: Attempts to preserve basic data types (strings, numbers)
- **Multiple Authentication**: Supports both OAuth and Service Account authentication
## Installation

```bash
# Install with Google Drive dependencies
pip install "ragas_experimental[gdrive]"
```
## Setup

### 1. Google Cloud Project Setup

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the following APIs:
   - Google Drive API
   - Google Sheets API
### 2. Authentication Setup

Choose one of two authentication methods:

#### Option A: Service Account (Recommended)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "Service account"
3. Create the service account and download the JSON key file
4. Share your Google Drive folder with the service account email (the snippet below shows how to read that email from the key file)

*This is the preferred method, as it works well for both scripts and production environments without requiring user interaction.*
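For step 4 you need the service account's email address. Rather than hunting for it in the console, you can read it straight out of the downloaded key file, which contains a `client_email` field (the file path here is an example):

```python
import json

# Print the service account email so you can share your Drive folder with it.
with open("service_account.json") as f:
    key = json.load(f)

print(key["client_email"])  # looks like <name>@<project>.iam.gserviceaccount.com
```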
#### Option B: OAuth 2.0 (Alternative for Interactive Use)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "OAuth client ID"
3. Choose "Desktop application"
4. Download the JSON file (save as `credentials.json`)
### 3. Google Drive Folder Setup

1. Create a folder in Google Drive for your Ragas data
2. Get the folder ID from the URL: `https://drive.google.com/drive/folders/FOLDER_ID_HERE` (a helper for extracting it is sketched below)
3. If using a Service Account, share the folder with the service account email
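If you would rather not copy the ID by hand, a helper like the one below can pull it out of the folder URL. This is a standard-library sketch; `extract_folder_id` is not part of the package:

```python
import re


def extract_folder_id(url: str) -> str:
    """Extract the folder ID from a URL like
    https://drive.google.com/drive/folders/FOLDER_ID_HERE."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No folder ID found in URL: {url}")
    return match.group(1)


folder_id = extract_folder_id(
    "https://drive.google.com/drive/folders/1abc123XYZ"
)
print(folder_id)  # 1abc123XYZ
```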
## Usage

### Basic Usage

```python
from ragas_experimental.dataset import Dataset
from pydantic import BaseModel


# Define your data model
class EvaluationRecord(BaseModel):
    question: str
    answer: str
    score: float


# Create dataset with Google Drive backend
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_google_drive_folder_id",
    credentials_path="path/to/credentials.json"
)

# Add data
record = EvaluationRecord(
    question="What is AI?",
    answer="Artificial Intelligence",
    score=0.95
)
dataset.append(record)

# Save to Google Drive
dataset.save()

# Load from Google Drive
dataset.load()
```
### Authentication Options

#### Using Environment Variables

```bash
export GDRIVE_FOLDER_ID="your_folder_id"
export GDRIVE_CREDENTIALS_PATH="path/to/credentials.json"
# OR for service account:
export GDRIVE_SERVICE_ACCOUNT_PATH="path/to/service_account.json"
```

```python
import os

# Environment variables will be used automatically
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id=os.getenv("GDRIVE_FOLDER_ID")
)
```
#### Using Service Account

```python
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_folder_id",
    service_account_path="path/to/service_account.json"
)
```

#### Custom Token Path

```python
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_folder_id",
    credentials_path="path/to/credentials.json",
    token_path="custom_token.json"
)
```
## File Structure

The backend creates the following structure in your Google Drive folder:

```text
Your Google Drive Folder/
├── datasets/
│   ├── dataset1.gsheet
│   ├── dataset2.gsheet
│   └── ...
└── experiments/
    ├── experiment1.gsheet
    ├── experiment2.gsheet
    └── ...
```

Each dataset/experiment is stored as a separate Google Sheet with:

- Column headers matching your data model fields
- Automatic type conversion for basic types (int, float, string)
- JSON serialization for complex objects (illustrated below)
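As a rough illustration of that serialization (a sketch of the data shapes, not of backend internals; `RecordWithMetadata` is an invented example model), a non-scalar field would land in its cell as a JSON string:

```python
import json

from pydantic import BaseModel


class RecordWithMetadata(BaseModel):
    question: str
    score: float
    metadata: dict  # complex field: stored as a JSON string in the sheet


record = RecordWithMetadata(
    question="What is AI?",
    score=0.95,
    metadata={"model": "gpt-4", "run": 3},
)

# "question" and "score" map to cells directly; "metadata" would be stored
# as the JSON string printed below.
print(json.dumps(record.metadata))  # {"model": "gpt-4", "run": 3}
```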
## Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `GDRIVE_FOLDER_ID` | Google Drive folder ID | `1abc123...` |
| `GDRIVE_CREDENTIALS_PATH` | Path to OAuth credentials JSON | `./credentials.json` |
| `GDRIVE_SERVICE_ACCOUNT_PATH` | Path to service account JSON | `./service_account.json` |
| `GDRIVE_TOKEN_PATH` | Path to store OAuth token | `./token.json` |
## Best Practices

### Security

- Never commit credential files to version control
- Use environment variables for sensitive information
- Regularly rotate service account keys
- Use OAuth for development, service accounts for production

### Performance

- The Google Sheets API is rate-limited, so avoid frequent saves with large datasets
- Consider batching operations when possible (see the sketch after this list)
- Use appropriate folder organization for large numbers of datasets
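Because `dataset.save()` rewrites the whole sheet, the simplest batching pattern is to append records in memory and make a single save at the end. A minimal sketch, reusing the `EvaluationRecord` model and `dataset` from the Basic Usage example, and assuming `append` only stages records locally (as the append example in this PR suggests):

```python
# Batch pattern: only the final save() touches the Sheets API.
# Calling save() inside the loop would make one API write per record.
records = [
    EvaluationRecord(question=f"Q{i}?", answer=f"A{i}", score=1.0)
    for i in range(100)
]

for record in records:
    dataset.append(record)  # staged locally

dataset.save()  # single write to Google Sheets
```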
### Collaboration

- Share folders with appropriate permissions (view/edit)
- Use descriptive dataset names
- Document your data models clearly
## Troubleshooting

### Common Issues

1. **"Folder not found" error**
   - Verify the folder ID is correct
   - Ensure the folder is shared with your service account (if using one)
   - Check that the folder exists and is accessible

2. **Authentication errors**
   - Verify credential file paths are correct
   - Check that required APIs are enabled in Google Cloud Console
   - For OAuth: delete the token file and re-authenticate (see the snippet after this list)
   - For Service Account: verify the JSON file is valid

3. **Permission errors**
   - Ensure your account has edit access to the folder
   - For service accounts: share the folder with the service account email
   - Check Google Drive sharing settings

4. **Import errors**
   - Install dependencies: `pip install "ragas_experimental[gdrive]"`
   - Verify all required packages are installed
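For the OAuth case above, clearing the cached token to force re-authentication is a one-liner (assuming the default `token.json` path):

```python
from pathlib import Path

# Delete the cached OAuth token; the next run will open the consent flow again.
Path("token.json").unlink(missing_ok=True)
```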
### Getting Help

If you encounter issues:

1. Check error messages carefully for specific details
2. Verify your Google Cloud project setup
3. Test with a simple example first
4. Check the Google Drive API documentation for rate limits
## Limitations

- Google Sheets has a limit of 10 million cells per spreadsheet
- Complex nested objects are JSON-serialized as strings
- API rate limits may affect performance with large datasets
- Requires an internet connection for all operations
## Examples

See `examples/gdrive_backend_example.py` for a complete working example.
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
"""Example showing how to append data to an existing Google Drive dataset.

This demonstrates the proper pattern for adding data to existing datasets
while preserving the existing records.
"""

from pydantic import BaseModel

from ragas_experimental.dataset import Dataset


# Example data model
class EvaluationRecord(BaseModel):
    question: str
    answer: str
    context: str
    score: float
    feedback: str


def append_to_existing_dataset():
    """Example of appending to an existing dataset."""

    folder_id = "folder_id_here"  # Replace with your actual Google Drive folder ID

    # Option 1: Load existing dataset and add more data
    print("=== Appending to Existing Dataset ===")

    try:
        # Try to load the existing dataset
        dataset = Dataset.load(
            name="evaluation_results",
            backend="gdrive",
            data_model=EvaluationRecord,
            folder_id=folder_id,
            credentials_path="credentials.json",
            token_path="token.json",
        )
        print(f"Loaded existing dataset with {len(dataset)} records")

    except FileNotFoundError:
        # Dataset doesn't exist, create a new one
        print("Dataset doesn't exist, creating new one")
        dataset = Dataset(
            name="evaluation_results",
            backend="gdrive",
            data_model=EvaluationRecord,
            folder_id=folder_id,
            credentials_path="credentials.json",
            token_path="token.json",
        )

    # Show existing records
    print("Existing records:")
    for i, record in enumerate(dataset):
        print(f"  {i + 1}. {record.question}")

    # Add new records
    new_records = [
        EvaluationRecord(
            question="What is the largest planet in our solar system?",
            answer="Jupiter",
            context="Solar system knowledge question.",
            score=0.9,
            feedback="Correct answer",
        ),
        EvaluationRecord(
            question="Who painted the Mona Lisa?",
            answer="Leonardo da Vinci",
            context="Art history question.",
            score=1.0,
            feedback="Perfect answer",
        ),
    ]

    # Append new records
    for record in new_records:
        dataset.append(record)

    print(f"\nAdded {len(new_records)} new records")

    # Save the updated dataset (this replaces the sheet with all records)
    dataset.save()
    print(f"Saved updated dataset with {len(dataset)} total records")

    # Verify by listing all records
    print("\nAll records in dataset:")
    for i, record in enumerate(dataset):
        print(f"  {i + 1}. {record.question} -> {record.answer}")

    return dataset


def create_multiple_datasets():
    """Example of creating separate datasets instead of appending."""

    folder_id = "folder_id_here"  # Replace with your actual Google Drive folder ID

    print("\n=== Creating Multiple Datasets ===")

    # Create different datasets for different evaluation runs
    datasets = {}

    for run_name, data in [
        (
            "basic_qa",
            [
                EvaluationRecord(
                    question="What is 1+1?",
                    answer="Two",
                    context="Basic math",
                    score=1.0,
                    feedback="Correct",
                )
            ],
        ),
        (
            "advanced_qa",
            [
                EvaluationRecord(
                    question="Explain quantum entanglement",
                    answer="Quantum entanglement is a phenomenon...",
                    context="Advanced physics",
                    score=0.8,
                    feedback="Good explanation",
                )
            ],
        ),
    ]:
        dataset = Dataset(
            name=f"evaluation_{run_name}",
            backend="gdrive",
            data_model=EvaluationRecord,
            folder_id=folder_id,
            credentials_path="credentials.json",
            token_path="token.json",
        )

        for record in data:
            dataset.append(record)

        dataset.save()
        datasets[run_name] = dataset
        print(f"Created dataset '{run_name}' with {len(dataset)} records")

    # List all datasets via the backend of any dataset instance
    available_datasets = list(datasets.values())[0].backend.list_datasets()
    print(f"\nAll available datasets: {available_datasets}")

    return datasets


if __name__ == "__main__":
    try:
        # Method 1: Append to an existing dataset
        dataset = append_to_existing_dataset()

        # Method 2: Create separate datasets
        datasets = create_multiple_datasets()

        print("\n✅ Append operations completed successfully!")
        print("\nKey points:")
        print("- dataset.save() replaces the entire sheet (this is the intended behavior)")
        print("- To append: load existing data, add new records, then save")
        print("- For different evaluation runs, consider separate datasets")

    except Exception as e:
        print(f"Error: {e}")
        import traceback

        traceback.print_exc()
