# Google Drive Backend for Ragas

The Google Drive backend stores Ragas datasets and experiments as Google Sheets in your Google Drive. This gives you cloud-based, collaborative storage in a format that's familiar to many users.

## Features

- **Cloud Storage**: Store your datasets and experiments in Google Drive
- **Collaborative**: Share and collaborate on datasets using Google Drive's sharing features
- **Google Sheets Format**: Data is stored in Google Sheets for easy viewing and editing
- **Automatic Structure**: Creates an organized folder structure (`datasets/` and `experiments/`)
- **Type Preservation**: Attempts to preserve basic data types (strings, numbers)
- **Multiple Authentication Methods**: Supports both OAuth and Service Account authentication

## Installation

```bash
# Install with Google Drive dependencies
pip install "ragas_experimental[gdrive]"
```

## Setup

### 1. Google Cloud Project Setup

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the following APIs:
   - Google Drive API
   - Google Sheets API

### 2. Authentication Setup

Choose one of two authentication methods:

#### Option A: Service Account (Recommended)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "Service account"
3. Create the service account and download the JSON key file
4. Share your Google Drive folder with the service account email

*This is the preferred method as it works well for both scripts and production environments without requiring user interaction.*
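
The email address to share the folder with is stored in the downloaded key file itself, in its `client_email` field. A minimal sketch for reading it, assuming a standard service account JSON key; the helper name `service_account_email` is hypothetical and not part of Ragas:

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Return the client_email stored in a service account JSON key file."""
    key = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Service account keys carry "type": "service_account"; OAuth client
    # files do not, so this catches mixing up the two kinds of JSON file.
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")
    return key["client_email"]
```

Calling `service_account_email("path/to/service_account.json")` gives the address to paste into the Drive sharing dialog.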

#### Option B: OAuth 2.0 (Alternative for Interactive Use)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "OAuth client ID"
3. Choose "Desktop application"
4. Download the JSON file (save as `credentials.json`)

### 3. Google Drive Folder Setup

1. Create a folder in Google Drive for your Ragas data
2. Get the folder ID from the URL: `https://drive.google.com/drive/folders/FOLDER_ID_HERE`
3. If using Service Account, share the folder with the service account email
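
If you would rather not copy the ID by hand, the path segment after `folders` can be extracted programmatically. The sketch below uses only the standard library; `folder_id_from_url` is a hypothetical helper name, and it assumes the URL follows the `.../folders/<ID>` pattern shown above:

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Google Drive folder URL."""
    # urlparse drops query strings such as "?usp=sharing" from the path.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts:
        raise ValueError(f"not a Drive folder URL: {url}")
    idx = parts.index("folders")
    if idx + 1 >= len(parts):
        raise ValueError(f"no folder ID found in: {url}")
    return parts[idx + 1]
```

The returned string is the value to pass as `folder_id`.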

## Usage

### Basic Usage

```python
from ragas_experimental.dataset import Dataset
from pydantic import BaseModel

# Define your data model
class EvaluationRecord(BaseModel):
    question: str
    answer: str
    score: float

# Create dataset with Google Drive backend
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_google_drive_folder_id",
    credentials_path="path/to/credentials.json"
)

# Add data
record = EvaluationRecord(
    question="What is AI?",
    answer="Artificial Intelligence",
    score=0.95
)
dataset.append(record)

# Save to Google Drive
dataset.save()

# Load from Google Drive
dataset.load()
```

### Authentication Options

#### Using Environment Variables

```bash
export GDRIVE_FOLDER_ID="your_folder_id"
export GDRIVE_CREDENTIALS_PATH="path/to/credentials.json"
# OR for service account:
export GDRIVE_SERVICE_ACCOUNT_PATH="path/to/service_account.json"
```

```python
import os

# Dataset and EvaluationRecord are defined as in Basic Usage.
# Environment variables will be used automatically
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id=os.getenv("GDRIVE_FOLDER_ID")
)
```

#### Using Service Account

```python
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_folder_id",
    service_account_path="path/to/service_account.json"
)
```

#### Custom Token Path

```python
dataset = Dataset(
    name="my_evaluation",
    backend="gdrive",
    data_model=EvaluationRecord,
    folder_id="your_folder_id",
    credentials_path="path/to/credentials.json",
    token_path="custom_token.json"
)
```

## File Structure

The backend creates the following structure in your Google Drive folder:

```text
Your Google Drive Folder/
├── datasets/
│   ├── dataset1.gsheet
│   ├── dataset2.gsheet
│   └── ...
└── experiments/
    ├── experiment1.gsheet
    ├── experiment2.gsheet
    └── ...
```

Each dataset/experiment is stored as a separate Google Sheet with:

- Column headers matching your data model fields
- Automatic type conversion for basic types (int, float, string)
- JSON serialization for complex objects
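
To make the layout concrete, here is an illustrative sketch of how records could be flattened into sheet rows along those lines. It is not the backend's actual implementation: `to_cell` and `to_sheet_rows` are hypothetical names, and writing `None` as an empty cell is an assumption made for the example.

```python
import json
from typing import Any, Dict, List


def to_cell(value: Any) -> Any:
    """Keep basic types as-is; JSON-serialize anything else into a string."""
    if value is None:
        return ""  # assumption: missing values become empty cells
    if isinstance(value, (str, int, float, bool)):
        return value
    return json.dumps(value)


def to_sheet_rows(records: List[Dict[str, Any]]) -> List[List[Any]]:
    """Build a header row from the first record, then one row per record."""
    if not records:
        return []
    headers = list(records[0].keys())
    return [headers] + [[to_cell(r.get(h)) for h in headers] for r in records]
```

With a Pydantic model, each record's `model_dump()` (or `dict()` on Pydantic v1) would supply the dictionaries. A list-valued field such as `["a", "b"]` ends up in its cell as the string `["a", "b"]`.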

## Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `GDRIVE_FOLDER_ID` | Google Drive folder ID | `1abc123...` |
| `GDRIVE_CREDENTIALS_PATH` | Path to OAuth credentials JSON | `./credentials.json` |
| `GDRIVE_SERVICE_ACCOUNT_PATH` | Path to service account JSON | `./service_account.json` |
| `GDRIVE_TOKEN_PATH` | Path to store OAuth token | `./token.json` |
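
If you prefer to fail early when configuration is missing, you can resolve these variables yourself and pass the result as keyword arguments. The sketch below is one way to do that; `gdrive_kwargs_from_env` is a hypothetical helper, and the pairing of `GDRIVE_TOKEN_PATH` with the `token_path` argument is an assumption based on the parameter names used in this document.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable -> Dataset keyword argument (names from this document).
ENV_TO_KWARG = {
    "GDRIVE_FOLDER_ID": "folder_id",
    "GDRIVE_CREDENTIALS_PATH": "credentials_path",
    "GDRIVE_SERVICE_ACCOUNT_PATH": "service_account_path",
    "GDRIVE_TOKEN_PATH": "token_path",
}


def gdrive_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the GDRIVE_* settings that are set and non-empty."""
    env = os.environ if env is None else env
    kwargs = {kw: env[name] for name, kw in ENV_TO_KWARG.items() if env.get(name)}
    if "folder_id" not in kwargs:
        raise ValueError("GDRIVE_FOLDER_ID is not set")
    if "credentials_path" not in kwargs and "service_account_path" not in kwargs:
        raise ValueError(
            "set GDRIVE_CREDENTIALS_PATH or GDRIVE_SERVICE_ACCOUNT_PATH"
        )
    return kwargs
```

The result can then be unpacked into the constructor, for example `Dataset(name="my_evaluation", backend="gdrive", data_model=EvaluationRecord, **gdrive_kwargs_from_env())`.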

## Best Practices

### Security

- Never commit credential files to version control
- Use environment variables for sensitive information
- Regularly rotate service account keys
- Use OAuth for development, service accounts for production

### Performance

- The Google Sheets API has rate limits, so avoid frequent saves with large datasets
- Consider batching operations when possible
- Use appropriate folder organization for large numbers of datasets
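
One way to apply the batching and rate-limit advice is to group records before saving and to retry a failed call with exponentially growing delays. This is a generic sketch, not something the backend is known to provide: `chunked` and `call_with_backoff` are hypothetical helpers, and the broad `except Exception` should be narrowed to the error your client raises on rate limiting.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with delays of base_delay * 1, 2, 4, ... seconds."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries - 1):
        try:
            return fn()
        except Exception:  # narrow this to rate-limit errors in real use
            sleep(base_delay * 2 ** attempt)
    return fn()  # final attempt: let any exception propagate
```

A typical pattern would be to append every record in a batch and then issue a single `call_with_backoff(dataset.save)` per batch, rather than saving after each record.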

### Collaboration

- Share folders with appropriate permissions (view/edit)
- Use descriptive dataset names
- Document your data models clearly

## Troubleshooting

### Common Issues

1. **"Folder not found" error**
   - Verify the folder ID is correct
   - Ensure the folder is shared with your service account (if using one)
   - Check that the folder exists and is accessible

2. **Authentication errors**
   - Verify credential file paths are correct
   - Check that required APIs are enabled in Google Cloud Console
   - For OAuth: delete token file and re-authenticate
   - For Service Account: verify the JSON file is valid

3. **Permission errors**
   - Ensure your account has edit access to the folder
   - For service accounts: share the folder with the service account email
   - Check Google Drive sharing settings

4. **Import errors**
   - Install dependencies: `pip install "ragas_experimental[gdrive]"`
   - Verify all required packages are installed
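
To check for missing packages without triggering the import error itself, you can ask Python whether each module can be located. The helper below is a sketch using only the standard library. The module names in `ASSUMED_GDRIVE_MODULES` are an assumption (the usual import names of the Google client libraries), not a confirmed list of what the `gdrive` extra installs.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names when a parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names; adjust to whatever the `gdrive` extra really installs.
ASSUMED_GDRIVE_MODULES = ["googleapiclient", "google.auth", "google_auth_oauthlib"]
```

Running `missing_modules(ASSUMED_GDRIVE_MODULES)` returns an empty list when everything on the list is importable.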

### Getting Help

If you encounter issues:

1. Check error messages carefully for specific details
2. Verify your Google Cloud project setup
3. Test with a simple example first
4. Check the Google Drive API documentation for rate limits

## Limitations

- Google Sheets has a limit of 10 million cells per spreadsheet
- Complex nested objects are JSON-serialized as strings
- API rate limits may affect performance with large datasets
- Requires an internet connection for all operations
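
As a rough guide to what the cell limit means in practice, the arithmetic can be sketched as follows. It assumes one header row, one row per record, and one column per model field, and it ignores any empty cells a sheet may already contain; the function names are hypothetical.

```python
SHEETS_CELL_LIMIT = 10_000_000  # per-spreadsheet limit listed above


def cells_needed(num_records: int, num_fields: int) -> int:
    """Cells used by a header row plus one row per record."""
    return (num_records + 1) * num_fields


def fits_in_one_sheet(num_records: int, num_fields: int) -> bool:
    """True if the dataset stays within the per-spreadsheet cell limit."""
    return cells_needed(num_records, num_fields) <= SHEETS_CELL_LIMIT
```

For the three-field `EvaluationRecord` model used earlier, this works out to at most 3,333,332 records in a single sheet under these assumptions.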

## Examples

See `examples/gdrive_backend_example.py` for a complete working example.