
Infra/gcp storage #328

Open
gagan-a11y wants to merge 79 commits into Zackriya-Solutions:main from gagan-a11y:infra/gcp-storage

Conversation

@gagan-a11y

Description

[Provide a detailed description of your changes]

Related Issue

[Link to the issue this PR addresses (e.g., "Fixes #123")]

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please describe)

Testing

  • Unit tests added/updated
  • Manual testing performed
  • All tests pass

Documentation

  • Documentation updated
  • No documentation needed

Checklist

  • Code follows project style
  • Self-reviewed the code
  • Added comments for complex code
  • Updated README if needed
  • Branch is up to date with devtest
  • No merge conflicts

Screenshots (if applicable)

[Add screenshots here if your changes affect the UI]

Additional Notes

[Add any additional information that might be helpful for reviewers]

- Add PRD with timeline and architecture diagrams
- Add progress report tracking Phase 0 completion
- Add tech stack guide explaining 24 technologies
- Add Phase 1 implementation plan
- Add documentation index and navigation
- Browser audio capture via MediaRecorder API
- WebSocket streaming to backend
- ffmpeg WebM→WAV conversion
- Whisper integration with multilingual support
- Test UI at /test-audio
- Real-time transcription (~2-3s latency)
- Remove Tauri configuration and build files
- Delete Tauri-specific dependencies
- Clean up related scripts and references
- Add Groq Whisper API integration for low-latency transcription (~1-2s)
- Implement AudioWorklet for real-time 48kHz to 16kHz downsampling
- Add StreamingTranscriptionManager with VAD and rolling buffer
- Replace batch processing with continuous PCM streaming
- Remove old batch audio processing code and test pages
- Update documentation for Phase 1.5 completion
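Going from 48 kHz to 16 kHz, as in the downsampling item above, is an integer 3:1 decimation. The PR does this in a browser AudioWorklet and its actual filter is not shown here, so the following is only a minimal Python sketch of the idea. It assumes plain block averaging as the low-pass step, and `downsample_48k_to_16k` is a hypothetical name.

```python
from typing import List, Sequence


def downsample_48k_to_16k(samples: Sequence[float]) -> List[float]:
    """Decimate by 3 (48000 / 16000), averaging each group of three samples.

    Averaging is a crude low-pass filter (an assumption for this sketch);
    a production resampler would normally use a proper FIR filter.
    """
    factor = 3
    # Drop a trailing partial group so every output sample averages 3 inputs.
    usable = len(samples) - (len(samples) % factor)
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]
```

One second of 48 kHz input (48000 samples) yields 16000 output samples.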
FEATURES:
- Ask AI: Chat about meetings with cross-meeting context search
- Catch Me Up: Quick summary for late joiners
- Context Linking: Link meetings to share context across sessions
- Vector Store: ChromaDB-based semantic search across all meetings

DOCS:
- Add meeting-copilot-docs/ with architecture and optimization guides
- Document 15 future optimizations (P0/P1/P2 prioritized)

COMPONENTS:
- ChatInterface: AI chat with streaming responses
- MeetingSelector: Link meetings for shared context
- Improved VAD and rolling buffer for transcription
This commit introduces Google Gemini as a primary LLM provider across the application and significantly enhances the 'Ask AI' capabilities.

Key Changes:

Backend:
- feat(db): Add `geminiApiKey` to the settings table with a simple migration.
- feat(api): Integrate Gemini for streaming responses in the 'Ask AI' chat endpoint.
- feat(api): Add Gemini as a supported provider for the 'Catch Up' summary feature.
- feat(ai): Implement intelligent context-linking and web search capabilities using Gemini Flash to determine when external information is needed.

Frontend:
- feat(settings): Add Gemini to the list of available providers in the Model Settings modal.
- feat(ui): The 'Ask AI' chat now dynamically uses the user's configured LLM, defaulting to Gemini.
- feat(ui): The 'Catch Up' feature now uses the configured model, with a fallback to Gemini for unsupported providers.

Docs:
- docs(optimizations): Add a proposal for a 'Custom Dictionary' feature to `FUTURE_OPTIMIZATIONS.md` to improve transcription accuracy for domain-specific terms.
- Add user context input for notes generation (fed to AI prompt)
- Implement AI-powered web search with Gemini (replaces DuckDuckGo)
- Add markdown rendering in chat interface
- Make chat panel resizable (350-800px drag handle)
- Fix /save-summary endpoint (was missing, causing saves to fail)
- Fix Sidebar delete/rename to use correct endpoints
- Fix get_transcript_data to query summary_processes directly
- Use gemini-2.0-flash model consistently across features
- Replace DuckDuckGo with SerpAPI Google Search (free tier)
- Crawl top results with httpx and extract content with trafilatura
- Gemini synthesizes findings with inline citations
- Filter non-English domains for better results
- Increase context prompt from 100 to 300 chars for better entity/term consistency
- Add prompt parameter to Groq translation mode (was missing)
- Add debug logging for context usage tracking

Benefits: names stay consistent (John not Jon), technical terms preserved (Kubernetes not Cube Netties)
- Revert context prompt from 300 to 100 chars
- Remove prompt from translation mode (caused garbled output)
- Keep original working configuration
…peline and remove Whisper dependency and related build steps
… context

- Add grounded system prompt to prevent hallucinations while being helpful
- Change context strategy: full transcripts for current + linked meetings
- Linked meetings: fetch full DB transcripts when triggered by keywords
- Global search: use vector search with 20 chunks when explicitly triggered
- Remove context truncation limits (Gemini 2.0 Flash has 1M token limit)
- Add keyword-only detection for linked meetings (no LLM classifier)
- Improve history handling: first 2 + last 8 messages for continuity
- Add debug logging for Gemini streaming
- Document new context flow with full transcript approach
- Add trigger keywords for each context type
- Include token estimation for extreme cases
- Document grounded prompt strategy
- Add usage examples
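The "first 2 + last 8 messages" history rule above can be sketched as follows. This is an illustrative sketch, not the PR's code: `trim_history` and the message shape (a dict with `role` and `content`) are assumptions.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def trim_history(messages: List[Message], head: int = 2, tail: int = 8) -> List[Message]:
    """Keep the first `head` and last `tail` messages of a chat history.

    Short histories are returned unchanged so no message is duplicated.
    """
    if len(messages) <= head + tail:
        return list(messages)
    return messages[:head] + messages[-tail:]
```

Keeping the opening messages preserves how the conversation was framed, while the most recent ones carry the immediate context.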
- Added explicit instructions for handling speech-to-text transcription errors
- Gemini now uses context to infer correct spellings for technical terms, names, acronyms
- Added 'Transcription Corrections' section at end of generated notes
- Shows what corrections were made from original transcript for transparency
- Replaced fixed 8s timeout with multi-condition triggers
- Silence threshold: 1.2s (balances speed vs clean output)
- Punctuation trigger: sentence + 3s speech duration
- Max timeout: 12s (safety net)
- Improved deduplication: 3-grams, 0.35 threshold, 50-word window
- Speech duration tracking for intelligent finalization
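One way to read the multi-condition triggers above is as a single predicate evaluated on every audio tick. The sketch below uses the thresholds listed in the commit (1.2s silence, 3s speech, 12s maximum). The `SegmentState` fields, the function name, and the interpretation of "sentence" as text ending in `.`, `?` or `!` are assumptions.

```python
from dataclasses import dataclass

SILENCE_THRESHOLD_S = 1.2           # finalize after this much trailing silence
MIN_SPEECH_FOR_PUNCTUATION_S = 3.0  # sentence end only counts after this much speech
MAX_SEGMENT_S = 12.0                # safety net: never hold a segment longer


@dataclass
class SegmentState:
    """Hypothetical snapshot of the in-progress transcript segment."""
    text: str
    silence_s: float   # trailing silence so far
    speech_s: float    # accumulated speech duration
    elapsed_s: float   # wall-clock time since the segment started


def should_finalize(state: SegmentState) -> bool:
    """Return True if any of the three triggers fires."""
    if state.elapsed_s >= MAX_SEGMENT_S:
        return True
    if state.silence_s >= SILENCE_THRESHOLD_S:
        return True
    ends_sentence = state.text.rstrip().endswith((".", "?", "!"))
    return ends_sentence and state.speech_s >= MIN_SPEECH_FOR_PUNCTUATION_S
```

Checking the maximum timeout first means a segment is always flushed, even when the speaker never pauses.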
- Migrate documentation from 'meeting-copilot-docs' to 'pnyx-docs'
- Remove deprecated 'whisper-custom' server and 'whisper.cpp' submodule
- Add Neon DB migration and vector table setup scripts
- Add legacy table cleanup script
- Remove stale 'run' script
- Standardize database column names to lowercase for PostgreSQL compatibility
- Fix JSON serialization error for datetime objects in summary endpoints
- Ensure 'save_transcript' populates both segments and full transcripts
- Enforce ON DELETE CASCADE for meeting-related data
- Clean up temporary debug and test scripts
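The datetime serialization error mentioned above typically occurs because `json.dumps` cannot encode `datetime` objects by default. A common fix, shown here as a sketch and not necessarily what the PR does, is a `default` hook that converts them to ISO 8601 strings. `json_default` and `dumps_summary` are hypothetical names.

```python
import json
from datetime import date, datetime
from typing import Any


def json_default(value: Any) -> str:
    """Fallback encoder: turn dates and datetimes into ISO 8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    # Keep the standard behaviour for anything else we cannot encode.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def dumps_summary(payload: dict) -> str:
    """Serialize a summary payload that may contain datetime values."""
    return json.dumps(payload, default=json_default)
```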
- Integrate 'store_meeting_embeddings' into the background transcript processing flow
- Ensure consistency between real-time recordings and file upload flows for vector store population
- Remove About page component and references
- Update Logo to link to home page instead of opening About dialog
- Remove About tab from settings
- Hide Info/About button in sidebar
- Add WebSocket heartbeat and force-flush to prevent data loss on disconnect
- Implement transcript versioning (backend schema + API)
- Add TranscriptVersionSelector UI with authenticated fetching
- Refactor TranscriptView:
  - Visualize alignment states (Confident, Uncertain, Overlap)
  - Hide 'Speaker 0' in live mode, show on every line in diarized mode
  - Fix duplicate version creation and auto-trigger issues
  - Fix runtime crashes with robust null checks
- Update database schema to support transcript source and alignment states
- Add Badge component to UI library
- Add 'Live Transcript' option to version selector to allow switching back from diarized views
- Update TranscriptPanel to fetch live data when 'Live Transcript' is selected
- Fix backend JSON serialization for transcript versions
- Ensure aligned transcript segments have UUIDs
- Apply code formatting improvements to diarization module
- Add POST /upload-meeting-recording endpoint
- Implement file processing pipeline (conversion -> transcribe -> diarize)
- Add ImportModal component and Sidebar integration
- Support importing audio/video files as new meetings
- Fix AttributeError in upload_meeting_recording: 'User' object has no 'id' or 'workspace_id'
- Use current_user.email as owner_id
- Default workspace_id to 'default'
- Import Path from pathlib in main.py
- Fix 500 error caused by missing import during file upload
- Import aiofiles for async file writing
- Ensure Path is imported from pathlib
- Resolves NameError and potential ImportError in upload_meeting_recording
- Update diarize_meeting endpoint to check for 'merged_recording.pcm' or 'merged_recording.wav'
- Fixes 'No audio recording found' error for imported meetings which don't have chunks
- Update run_diarization_job to support imported 'merged_recording' files
- Implement proper versioning strategy:
  1. Archive current 'live' transcript as Version 1 (if no history exists)
  2. Save new diarized transcript as next available Version
  3. Update main transcript view with diarized content
- Fix issue where diarization would overwrite history without saving versions
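The three-step versioning strategy above can be sketched with an in-memory list standing in for the versions table. The function name and the record fields (`version`, `source`, `content`) are assumptions for illustration; the PR's actual schema is not shown here.

```python
from typing import Dict, List


def save_diarized_version(
    versions: List[Dict],
    live_transcript: str,
    diarized_transcript: str,
) -> str:
    """Apply the three steps and return the new main transcript content.

    `versions` is mutated in place, mimicking inserts into a versions table.
    """
    # 1. Archive the live transcript as Version 1 only if no history exists.
    if not versions:
        versions.append({"version": 1, "source": "live", "content": live_transcript})

    # 2. Save the diarized transcript as the next available version.
    next_version = max(v["version"] for v in versions) + 1
    versions.append(
        {"version": next_version, "source": "diarized", "content": diarized_transcript}
    )

    # 3. The main transcript view now shows the diarized content.
    return diarized_transcript
```

Re-running diarization appends a new version and leaves the earlier ones in place, which is the overwrite problem the last bullet describes.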
- Update get_diarization_status endpoint to check for 'merged_recording' files
- Ensures imported meetings correctly report as having audio available
- Add in-memory cache (1h TTL) for Google public keys to prevent rate limits and flakiness
- Add retry logic (3 attempts) for fetching certs
- Fix 'All connection attempts failed' error in auth verification
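A cache with a 1-hour TTL combined with up to 3 fetch attempts, as described above, might look like the sketch below. The class name, the injected `fetch` callable, and the linear backoff between attempts are assumptions. The clock and sleep functions are injectable only so the behaviour is easy to test.

```python
import time
from typing import Callable, Dict, Optional

CACHE_TTL_S = 3600   # 1 hour, as stated in the commit
MAX_ATTEMPTS = 3     # retry count, as stated in the commit


class CertCache:
    """In-memory cache for public keys, refreshed at most once per TTL."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._sleep = sleep
        self._certs: Optional[Dict[str, str]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, str]:
        if self._certs is not None and self._clock() < self._expires_at:
            return self._certs  # still fresh: no network call

        last_error: Optional[Exception] = None
        for attempt in range(MAX_ATTEMPTS):
            try:
                certs = self._fetch()
            except Exception as exc:  # e.g. a transient connection failure
                last_error = exc
                if attempt < MAX_ATTEMPTS - 1:
                    self._sleep(0.5 * (attempt + 1))  # assumed linear backoff
                continue
            self._certs = certs
            self._expires_at = self._clock() + CACHE_TTL_S
            return certs
        raise RuntimeError("could not fetch certificates") from last_error
```

With this shape, a request that arrives during a brief outage is served from the cache as long as the entry is under an hour old.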
…andling

- Fix SQL parameter mismatch in diarization job
- Add missing aiofiles import in diarization service
- Update audio recorder to detect existing merged files
- Improve RBAC error handling to expose system errors
- Enable re-diarization in frontend UI
…ery reformulation

- Add Context Router to switch between Web Search and Internal Knowledge
- Implement Google Search integration via SerpAPI and Gemini
- Add Query Reformulation to resolve pronouns and meta-references
- Organize documentation into architecture/features/roadmap structure
- Update Phase 7 status to completed
- Add overlay loader when switching transcription versions in TranscriptPanel
- Add 'Thinking...' inline indicators for ChatInterface and RefineNotesSidebar
- Improve visual feedback during async operations
- Add Recovery Mode Action Bar with 'Save Meeting' and 'Discard' buttons
- Show meeting title in recovery bar for context
- Hide standard recording controls when in recovery mode
- Implement handleDiscardRecovery to clear unsaved data
…e actions

- Prevents overlapping of RecordingControls and Recovery Mode Actions banner
- Ensures 'Save Meeting' and 'Discard' buttons are visible when recovering unsaved meetings
- Display a clear toast message when diarization fails due to missing audio (e.g. recovered meetings)
- Catch and display other diarization errors in the UI
- Add StorageService to handle GCP/Local file operations
- Update AudioRecorder to merge and upload recordings to GCS
- Add GET /meetings/{id}/recording-url endpoint for secure playback
- Add AudioPlayer frontend component
- Add migration scripts for legacy local recordings
- Fix build error in page-content.tsx
- Add check_file_exists to StorageService to verify files in GCS before generating signed URLs
- Return proper 404 in recording-url endpoint if file is missing
- Update container_migrate.py to force GCP env vars before imports for reliability
- Fix NameError by importing StorageService in main.py
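The existence check before signing, and the 404 when the file is missing, can be sketched independently of any web framework or cloud SDK. Only `check_file_exists` is named in the commits above; the `generate_signed_url` method, the object path layout, the 15-minute expiry, and the in-memory `FakeStorage` are assumptions made for illustration.

```python
from typing import Dict, Protocol, Set, Tuple


class Storage(Protocol):
    """Minimal interface assumed for the storage backend (GCP or local)."""

    def check_file_exists(self, path: str) -> bool: ...
    def generate_signed_url(self, path: str, expires_s: int) -> str: ...


class FakeStorage:
    """In-memory stand-in so the sketch runs without cloud credentials."""

    def __init__(self, files: Set[str]) -> None:
        self._files = files

    def check_file_exists(self, path: str) -> bool:
        return path in self._files

    def generate_signed_url(self, path: str, expires_s: int) -> str:
        return f"https://storage.example/{path}?expires={expires_s}"


def recording_url_response(
    storage: Storage, meeting_id: str, expires_s: int = 900
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body) for a recording-url style request."""
    path = f"{meeting_id}/recording.wav"  # hypothetical object layout
    if not storage.check_file_exists(path):
        # Avoid handing out a signed URL that would fail at playback time.
        return 404, {"detail": "Recording not found"}
    return 200, {"url": storage.generate_signed_url(path, expires_s)}
```

Checking first matters because a signed URL can usually be generated for an object that does not exist, so the error would otherwise only surface in the audio player.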
- Replace deprecated 'Open Meeting Folder' button with 'Download Audio'
- Integrate secure download logic using signed URLs from GCP/Local storage
- Update TranscriptButtonGroup and useMeetingOperations hook


Development

Successfully merging this pull request may close these issues.

where id database data?why setting table has not data?

2 participants