Skip to content

Latest commit

 

History

History
229 lines (172 loc) · 5.46 KB

File metadata and controls

229 lines (172 loc) · 5.46 KB

Agent 0.0 Google Docs Content Integration

Date: 2025-02-08
Status: ✅ COMPLETE


✅ What Was Added

1. Google Docs Content Fetching

  • ✅ Added _fetch_google_docs_content() method to Agent 0.0
  • ✅ Uses GoogleDriveClient to fetch actual document content
  • ✅ Extracts file IDs from document metadata
  • ✅ Fetches content for top 10 most recent documents
  • ✅ Truncates content to 5000 chars to avoid token limits

2. Enhanced Analysis Prompt

  • ✅ Google Docs content included in Gemini analysis when available
  • ✅ Content analysis provides deeper insights than metadata alone:
    • Actual topics and concepts user works on
    • Technical depth and complexity
    • Learning patterns from document content
    • Project identification from content

3. Token Management

  • ✅ Optional google_access_token parameter
  • ✅ Placeholder for automatic token retrieval from database
  • ✅ Graceful fallback to metadata-only if token not available

How It Works

Data Flow

  1. Agent 0.0 CalledPOST /api/agents/persona-architect
  2. Fetch Docs Metadata → From database
  3. Get Access Token → From request or database (future)
  4. Fetch Docs Content → Using Google Drive API
  5. Build Analysis Prompt → Includes content if available
  6. Gemini Analysis → Uses content for deeper insights
  7. Build Persona Card → Enhanced with content insights

Content Fetching Process

For each document (top 10):
    ↓
Extract Google Doc file ID from metadata
    ↓
Call Google Drive API export_media()
    ↓
Get document content as text
    ↓
Truncate to 5000 chars (if needed)
    ↓
Include in analysis prompt

Usage

API Call with Access Token

POST /api/agents/persona-architect
{
    "include_docs": true,
    "google_access_token": "ya29.a0AfH6SMB..."  # Optional
}

Response Includes Content Analysis

When content is fetched, the analysis prompt includes:

  • Actual document text (up to 5000 chars per doc)
  • Content length information
  • File IDs for reference

Without Access Token

If no token provided:

  • ✅ Still works with metadata only
  • ✅ Uses document titles and metadata
  • ⚠️ Less accurate persona analysis

How Content Enhances Persona

1. Expertise Levels

With Content:

  • Analyze technical depth in documents
  • Identify specific technologies/concepts used
  • Determine actual skill level from code/examples

Without Content:

  • Only titles and metadata
  • Less accurate expertise assessment

2. Active Projects

With Content:

  • Identify actual project topics from content
  • See project structure and goals
  • Understand project complexity

Without Content:

  • Only document titles
  • Less project detail

3. Preferred Topics

With Content:

  • Extract actual topics from document text
  • Identify recurring themes
  • Understand learning focus areas

Without Content:

  • Infer from titles only
  • Less topic accuracy

4. Knowledge Gaps

With Content:

  • Identify learning areas from tutorial content
  • See questions/problems being solved
  • Understand confusion patterns

Without Content:

  • Infer from document types
  • Less gap identification

Privacy & Security

What's Used

  • ✅ Only documents user has access to
  • ✅ Content truncated to 5000 chars per doc
  • ✅ Top 10 documents only
  • ✅ Requires explicit access token

What's NOT Used

  • ❌ Private documents without permission
  • ❌ Full document content (truncated)
  • ❌ Documents outside user's access

Files Modified

  • backend/agents/persona_architect.py

    • Added _fetch_google_docs_content() method
    • Added _get_user_google_token() placeholder
    • Updated process() to fetch content
    • Enhanced _build_analysis_prompt() with content
  • backend/routes/agents.py

    • Added google_access_token parameter

Token Management

Current Implementation

  • Token must be passed explicitly in request
  • Placeholder for database token storage

Future Enhancement

# TODO: When TOKENS table is created
async def _get_user_google_token(self, user_id: str):
    # Query TOKENS table for user's Google access token
    # Refresh token if expired
    # Return valid access token

Testing

Test with Access Token

# 1. Get Google OAuth access token (from frontend)
# 2. Call Agent 0.0
curl -X POST http://localhost:8000/api/agents/persona-architect \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "google_access_token": "ya29.a0AfH6SMB...",
    "include_docs": true
  }'

# 3. Check response - should include content analysis

Test without Access Token

# Should still work with metadata only
curl -X POST http://localhost:8000/api/agents/persona-architect \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "include_docs": true
  }'

Integration Status

Agent 0.0 - Can fetch Google Docs content
Google Drive Client - Ready to use
Content Analysis - Included in persona building
⚠️ Token Storage - Needs TOKENS table (future)


Next Steps

  1. Create TOKENS Table - Store access tokens securely
  2. Auto Token Retrieval - Get token from database automatically
  3. Token Refresh - Handle expired tokens
  4. Content Caching - Cache document content to reduce API calls

Status: ✅ READY - Agent 0.0 can analyze Google Docs content when access token is provided