
Email Integration #3

Open
kamchettysadhika wants to merge 7 commits into KellisLab:main from kamchettysadhika:feature/meeting-scheduler

Conversation

@kamchettysadhika (Contributor)

No description provided.

@gemini-code-assist bot left a comment

Hello @kamchettysadhika, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a comprehensive suite of tools and scripts aimed at automating and enriching the process of handling meeting transcripts. The core functionality revolves around processing Panopto video transcripts, extracting key information and action items using AI (GPT), analyzing related codebases and GitHub activity, generating tailored recommendations for AI coding assistants (like Cursor and Copilot), and automating the scheduling of follow-up collaboration meetings via Google Calendar, complete with email reminders. The changes span multiple new Python files implementing distinct stages of this pipeline and modifications to existing files to integrate these new components and update dependencies (like the OpenAI SDK).

Highlights

  • Transcript Processing & AI Enrichment: New scripts (minutesToJson.py, gpt_recommendations.py, contextual_enrichment.py) are added/modified to parse meeting transcripts, extract topics, action items, and generate code/research recommendations using OpenAI and external sources (arXiv).
  • Codebase Analysis & Matching: A new Transcript_to_code.py script is introduced to analyze local or GitHub code repositories, extract functions, and semantically match them to topics discussed in the meeting transcript using TF-IDF and AI-generated reasoning.
  • AI Assistant Integration: The new cursor.py script provides functionality to analyze a codebase for potential improvements (complexity, type hints, docs, etc.) and generate structured recommendations formatted for AI assistants like Cursor and GitHub Copilot.
  • GitHub Collaboration Analysis: A new github_Auth.py script is added to interact with the GitHub API, analyze commit history, map file contributions to authors, detect potential conflicts, and provide insights into collaboration patterns.
  • Automated Calendar Scheduling: Scripts (parser.py, generate_meeting_payloads.py, send_payloads.py) are added/modified to extract structured tasks from meeting summaries, use AI to identify necessary coordination meetings between individuals, generate Google Calendar event payloads, schedule events via the Google Calendar API (handling conflicts), and send email reminders.
  • Utility Scripts & Data: Several utility scripts (jsontoexcel.py, match_names_to_emails.py, send_test_email.py, run_pipeline.py, run_pipeline_and_schedule.py, sample.py) and data files (name_email_map.json, calendar_payloads.json, github_collaboration_recommendations.json, github_commit_summary.json, output/sent_cache.json) are added or updated to support the new pipeline steps, data handling, and testing.

Changelog

  • Email Integration/Email.py
    • Added a new main script to sequentially run other Python scripts in the pipeline.
  • Email Integration/Transcript_to_code.py
    • Added a new script for matching transcript topics to code functions using AI (TF-IDF, ChatGPT) and extracting code from local or GitHub repositories.
    • Includes classes for TranscriptTopic, CodeBlock, CodeMatch, TranscriptCodeMatcher, GitHubCodeExtractor, and LocalCodeExtractor.
    • Implements logic for extracting topics, computing similarity, generating match reasoning, and creating a formatted report.
  • Email Integration/code_specific_recommendations.py
    • Added a new script to generate code library and optimization recommendations based on meeting summaries using GPT-4.
  • Email Integration/contextual_enrichment.py
    • Added a new script to enrich meeting topics by fetching relevant arXiv papers and generating implementation suggestions using GPT-4.
    • Includes functionality to fetch GitHub tutorial links.
  • Email Integration/cursor.py
    • Added a new script to analyze a Python codebase for issues and opportunities (complexity, type hints, docs, async, tests).
    • Generates structured recommendations for AI assistants like Cursor and GitHub Copilot.
    • Includes functions to export recommendations as Cursor rules, Copilot prompts, and VS Code tasks.
  • Email Integration/github_Auth.py
    • Added a new script to interact with the GitHub API.
    • Analyzes commit history to map file contributions to authors.
    • Detects potential conflicts and computes collaboration overlaps.
    • Provides role-based collaboration analysis and prints insights.
  • Email Integration/jsontoexcel.py
    • Added a new script to parse the markdown report generated by Transcript_to_code.py and convert it into an Excel file using pandas.
  • Email Integration/minutesToJson.py
    • Modified to orchestrate the execution of fullpipeline.py and subsequent enrichment steps.
    • Parses the output markdown, groups data by speaker, and calls functions from gpt_recommendations.py, contextual_enrichment.py, and code_specific_recommendations.py for enrichment.
    • Saves normalized, grouped, and enriched data to JSON files, including a final combined payload.
  • Email Integration/sample.py
    • Added a simple script to list Google Generative AI models, likely for testing API access.
  • calendarAuth.py
    • Updated the redirect_uris in the Google OAuth flow configuration to http://localhost:8081.
  • calendar_payloads.json
    • File content reset to an empty JSON array [].
  • fullpipeline.py
    • Added a check (lines 679-682) to ensure a Panopto URL is provided as a command-line argument and exits if missing.
  • generate_meeting_payloads.py
    • Modified to use the new OpenAI SDK (client.chat.completions.create).
    • Loads the name-email map from name_email_map.json.
    • Reads the latest meeting summaries markdown and uses GPT-4 to identify potential collaboration meetings and generate calendar event payloads.
    • Saves the generated payloads to output/calendar_payloads.json.
  • github_collaboration_recommendations.json
    • File content reset to an empty JSON object {}.
  • github_commit_summary.json
    • File content updated with sample commit summary data for 'Sihcaep'.
  • match_names_to_emails.py
    • Added a new script to read meeting summaries markdown, extract names using regex, and match them against the name_email_map.json.
  • name_email_map.json
    • File content updated with a large JSON object mapping numerous names to email addresses.
  • output/calendar_payloads.json
    • File content updated with a sample calendar event payload for a collaboration meeting.
  • output/sent_cache.json
    • File content reset to an empty JSON array [].
  • parser.py
    • Modified to read meeting summaries markdown (from argument or latest output file).
    • Extracts structured tasks starting with '- [ ]'.
    • De-duplicates tasks based on name and task description.
    • Uses GPT-4 to determine if pairs of individuals with tasks should coordinate.
    • Adds a check to prevent self-pairing.
    • Generates calendar event payloads for recommended coordination meetings and saves them to output/calendar_payloads.json.
  • parser12.py
    • Modified (similar to parser.py) to read markdown, extract tasks, and use GPT-4 for coordination pairing and payload generation.
    • Includes debug print statements.
  • run_pipeline.py
    • Added a new script to orchestrate the full pipeline execution: runs fullpipeline.py, waits for markdown output, runs parser.py, and then runs send_payloads.py.
  • run_pipeline_and_schedule.py
    • Added a new script providing two modes for running the pipeline: using a local test markdown file or running the full Panopto pipeline (fullpipeline.py -> generate_meeting_payloads.py).
    • Runs send_payloads.py after payload generation.
  • send_payloads.py
    • Modified to load calendar event payloads from output/calendar_payloads.json.
    • Uses Google Calendar API (via OAuth) to schedule events, checking attendee free/busy status.
    • Implements a fallback mechanism to reschedule if conflicts are found.
    • Includes RSVP polling and sends email reminders using SMTP for attendees who haven't responded.
    • Normalizes 'title' key to 'summary' in input event data.
    • Requires sender email and app password for reminders.
  • send_test_email.py
    • Added a simple script to send a test email using SMTP, likely for verifying email sending functionality.
  • speaker_summary_utils.py
    • Minor updates to compute_text_similarity, enhance_speaker_tracking, generate_enhanced_speaker_summary_html, generate_enhanced_speaker_summary_markdown, and generate_speaker_summaries_data.
    • Removed openai.api_key = api_key calls, likely adapting to the new OpenAI SDK usage.
    • Minor whitespace adjustments.
  • xlsx2html.py
    • Removed openai.api_key = api_key from summarize_batch function, likely adapting to the new OpenAI SDK usage.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant new functionality for email integration, including transcript-to-code matching, GitHub analysis, and calendar scheduling based on meeting summaries. The code is ambitious and attempts to integrate several complex components.

However, the review identified several critical and high-severity issues, particularly concerning security (hardcoded credentials), correctness (logic errors, fragile parsing, hardcoded paths), and maintainability (hardcoded configuration). These issues need to be addressed before the code can be considered for merging.

The code also contains medium-severity issues related to robustness, flexibility, and adherence to standard practices (like using async correctly). Addressing these would significantly improve the quality and reliability of the system.

For Python code, I've referenced common practices, aligning with principles found in PEP 8.

Summary of Findings

  • Hardcoded Sensitive Credentials: Email sender credentials (email address and app password) are hardcoded in send_payloads.py and send_test_email.py. This is a critical security vulnerability and should be addressed immediately by loading these from environment variables or a secure configuration source.
  • Hardcoded Local File Path: The input file path in jsontoexcel.py is hardcoded to a specific path on a local machine, making the script unusable elsewhere. This should be provided via command-line arguments.
  • Critical Logic Error in parser12.py: The parsing loop in parser12.py is incorrectly structured, causing it to only process the last action item found in the markdown file instead of all of them.
  • Fragile Regex Parsing: Several scripts (Transcript_to_code.py, cursor.py, jsontoexcel.py, parser.py, parser12.py) rely on regular expressions to parse structured or semi-structured text (code syntax, markdown reports, action item lines). This approach is fragile and highly susceptible to breaking if the input format changes slightly. Using dedicated parsing libraries or processing structured data directly would be more robust.
  • Synchronous I/O in Async Context: The TranscriptCodeMatcher class in Transcript_to_code.py is async but uses the synchronous requests library for GitHub API calls, which blocks the event loop and negates the benefits of async programming. Use an async-compatible HTTP library like aiohttp or httpx.
  • Hardcoded Configuration Values: Several scripts contain hardcoded configuration values such as GitHub repository details, author roles, API models/temperatures, preferred meeting times, redirect URIs, time zones, and directory paths. These should be made configurable via environment variables or a dedicated configuration file to improve maintainability and reusability.
  • Logic Error in fullpipeline.py: Redundant and conflicting logic for parsing the Panopto URL argument exists in fullpipeline.py.
  • Limited Name Parsing: The regex for extracting names in match_names_to_emails.py and the name splitting logic in parser.py/parser12.py are limited and may not handle all name formats or lists of names correctly.
  • Basic Rate Limiting: The GitHub API rate limiting implementation in Transcript_to_code.py is basic and could be improved for better robustness and efficiency.
  • Missing OpenAI API Error Handling: Some functions calling the OpenAI API (code_specific_recommendations.py, contextual_enrichment.py) lack explicit error handling for API failures or invalid responses.
  • Simplified Dependency Tracking: The function dependency tracking in cursor.py is a simplified string-based approach that is not fully accurate.
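On the synchronous-I/O finding: even without switching to aiohttp or httpx, a blocking HTTP call can be kept off the event loop by delegating it to the default thread pool with asyncio.to_thread. The sketch below is illustrative (blocking_get stands in for a synchronous requests.get and simulates I/O with a sleep); it is not the PR's code.

```python
import asyncio
import time

def blocking_get(url: str) -> str:
    # Stand-in for a synchronous requests.get() call; the sleep simulates
    # network latency that would otherwise stall the event loop.
    time.sleep(0.1)
    return f"payload from {url}"

async def fetch_all(urls: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in the default thread pool,
    # so the coroutines overlap instead of serializing on the event loop.
    return await asyncio.gather(*(asyncio.to_thread(blocking_get, u) for u in urls))
```

With three simulated 0.1 s requests, the overlapped version completes in roughly 0.1 s rather than 0.3 s, which is the behavior an async-native HTTP client would give directly.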

Merge Readiness

This pull request introduces valuable features but contains critical and high-severity issues, particularly regarding security and correctness. The hardcoded credentials, local file paths, and logic errors must be fixed before merging. The reliance on fragile regex parsing and the incorrect use of synchronous I/O in async contexts also pose significant risks to the system's reliability and performance. I am unable to approve this pull request. Please address the identified issues and request reviews from other maintainers.

Comment on lines +5 to +6
from_email = "kamchettysadhika10@gmail.com"
app_password = "dewh lzfu ztee uoum" # Replace with your 16-digit app password

critical

Hardcoding sensitive credentials like email addresses and app passwords directly in the source code is a critical security vulnerability. These credentials should be loaded from environment variables or a secure configuration management system and never committed to version control.

Suggested change
from_email = "kamchettysadhika10@gmail.com"
app_password = "dewh lzfu ztee uoum" # Replace with your 16-digit app password
from_email = os.getenv("TEST_SENDER_EMAIL")
app_password = os.getenv("TEST_APP_PASSWORD")  # 16-digit app password, loaded from the environment
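A fail-fast wrapper makes the env-var approach safer than bare os.getenv, which silently returns None when a variable is unset. This is a minimal sketch; require_env and the variable names are hypothetical, matching the suggestion above.

```python
import os

def require_env(name: str) -> str:
    """Fetch a required setting from the environment, failing fast if absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Hypothetical usage mirroring the suggested change:
# from_email = require_env("TEST_SENDER_EMAIL")
# app_password = require_env("TEST_APP_PASSWORD")
```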

Comment on lines +135 to +136
SENDER_EMAIL = "your_email@gmail.com" # <--- CHANGE THIS
APP_PASSWORD = "your_app_password" # <--- AND THIS

critical

Hardcoding sensitive credentials like email addresses and app passwords directly in the source code is a critical security vulnerability. These credentials should be loaded from environment variables or a secure configuration management system and never committed to version control.

Suggested change
SENDER_EMAIL = "your_email@gmail.com" # <--- CHANGE THIS
APP_PASSWORD = "your_app_password" # <--- AND THIS
SENDER_EMAIL = os.getenv("SENDER_EMAIL") # <--- Load from environment variable
APP_PASSWORD = os.getenv("APP_PASSWORD") # <--- Load from environment variable

Comment on lines +38 to +42
for line in f:
    if line.strip().startswith("- [ ]"):
        match = re.match(r"- \[ \] ([\w\s@.]+?)\s+to\s+(.*)", line.strip())
        print("[DEBUG] Line:", line.strip())


critical

The if match: block is outside the for line in f: loop. This means that the code to extract names and tasks (lines 43-56) will only execute after the loop finishes, and it will only process the match object from the last line in the file that matched the - [ ] pattern. This is a critical logic error that prevents the script from processing all the action items in the markdown file.

with open(latest_md, encoding="utf-8") as f:
    for line in f:
        if line.strip().startswith("- [ ]"):
            match = re.match(r"- \[ \] ([\w\s@.]+?)\s+to\s+(.*)", line.strip())
            print("[DEBUG] Line:", line.strip())

            if match:
                names = match.group(1).strip().split(" and ")
                task = match.group(2).strip().rstrip(".")
                for name in names:
                    name = name.strip()
                    email = name_to_email.get(name)
                    if email:
                        structured.append({
                            "name": name,
                            "email": email,
                            "task": task
                        })
                    else:
                        print(f"[SKIP] Name '{name}' not found in name_email_map")

# Use OpenAI to group or pair tasks

Comment on lines +679 to +682
    if len(sys.argv) < 2:
        print("❌ Error: Panopto URL not provided.")
        sys.exit(1)
    panopto_url = sys.argv[1]

high

These lines check sys.argv directly to see if a URL was provided and exit if not. This duplicates and conflicts with the argparse logic (lines 676-678) which already handles the optional url argument and prompts the user if it's missing. The argparse approach is preferred, so these lines should be removed.

    url = args.url
    if not url:
        url = input("Enter Panopto video URL: ")
    
    run_pipeline_from_url(
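One way to collapse the duplicated checks into a single argparse-driven path is to make the URL an optional positional with an interactive fallback. The function name and the injectable prompt parameter below are illustrative, not taken from fullpipeline.py.

```python
import argparse

def get_panopto_url(argv=None, prompt=input):
    # Single source of truth for the URL argument: argparse handles the
    # optional positional, and the prompt fallback covers interactive use,
    # so no manual sys.argv check is needed.
    parser = argparse.ArgumentParser(description="Run the Panopto pipeline")
    parser.add_argument("url", nargs="?", help="Panopto video URL")
    args = parser.parse_args(argv)
    return args.url or prompt("Enter Panopto video URL: ")
```

Passing prompt as a parameter also makes the fallback testable without patching builtins.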

Comment on lines +16 to +37
def generate_code_recommendations(summary_text):
    prompt = f"""Given the following contributions and action items, provide:
6. Libraries that can be used to implement the new features
7. Libraries that can be used to implement the new optimizations
8. Libraries that can be used to implement the new libraries
9. Libraries that can be used to implement the new research papers and articles

Summary:
\"\"\"
{summary_text}
\"\"\"

Respond in JSON format with keys: "libraries", "refactoring", "implementation".
"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )

    return response.choices[0].message.content.strip()

medium

The generate_code_recommendations function calls the OpenAI API but lacks explicit error handling for potential API failures (network issues, invalid API key, rate limits, etc.) or cases where the API returns non-JSON content. While the calling code might have some handling, it's good practice to include try...except blocks around external API calls within the function itself to make it more robust.
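A generic sketch of the suggested try...except pattern: safe_json_completion is a hypothetical helper that wraps whatever client call the function makes, retries on non-JSON output, and degrades gracefully on API errors. It deliberately takes a plain callable so it is independent of any particular SDK.

```python
import json

def safe_json_completion(call, retries=2):
    """Invoke `call` (expected to return a JSON string) and return the parsed
    object, or None after exhausting retries on failure."""
    for attempt in range(retries + 1):
        try:
            raw = call()
            return json.loads(raw)
        except json.JSONDecodeError:
            # Model returned non-JSON content; retrying may help.
            continue
        except Exception as exc:  # network errors, rate limits, auth failures
            print(f"[WARN] API call failed (attempt {attempt + 1}): {exc}")
    return None
```

In generate_code_recommendations this would wrap the client.chat.completions.create call, and the caller would then check for None instead of crashing on a malformed response.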

Comment on lines +35 to +36
"redirect_uris": ["http://localhost:8081"]
}

medium

The redirect URI (http://localhost:8081) and port (8080) for the OAuth flow are hardcoded. These values might need to change depending on the environment or deployment setup. Consider making them configurable via environment variables or a configuration file.

fbq = {
    "timeMin": start,
    "timeMax": end,
    "timeZone": "America/New_York",

medium

The time zone ("America/New_York") for the FreeBusy query is hardcoded. This should ideally be configurable, perhaps loaded from environment variables or determined based on the user's system or preferences, to ensure accurate scheduling for attendees in different time zones.
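A small sketch of making the zone configurable with validation against the IANA database via the standard-library zoneinfo module. The SCHEDULER_TZ variable name and function are hypothetical.

```python
import os
from zoneinfo import ZoneInfo

def get_scheduler_timezone(default="America/New_York"):
    # SCHEDULER_TZ is a hypothetical environment variable; validate the
    # value before handing it to the Calendar API's FreeBusy query.
    name = os.getenv("SCHEDULER_TZ", default)
    try:
        ZoneInfo(name)
    except Exception:
        print(f"[WARN] Unknown time zone '{name}', falling back to {default}")
        name = default
    return name
```

The FreeBusy payload would then use "timeZone": get_scheduler_timezone() instead of the literal string.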

Comment on lines +531 to +543
def _find_function_dependencies(self, func: Dict) -> List[str]:
    """Find dependencies for a function"""
    # This is a simplified version - in practice, you'd do AST analysis
    dependencies = []
    content = func["content"].lower()

    # Look for function calls in the same file
    for other_func in self.code_analysis["functions"]:
        if other_func["file_path"] == func["file_path"] and other_func["name"] != func["name"]:
            if other_func["name"].lower() in content:
                dependencies.append(other_func["name"])

    return dependencies

medium

The _find_function_dependencies method is noted as a simplified version. Relying on simple string searching (other_func["name"].lower() in content) is not accurate for identifying code dependencies. It can produce false positives (e.g., a function name appearing in a comment or string literal) and miss dependencies (e.g., indirect calls, method calls on objects). As mentioned in the comment, a proper AST analysis is needed for accurate dependency tracking.

"metrics": {}
}

exclude_dirs = {"venv", ".venv", "__pycache__", "node_modules", "site-packages", ".git", ".local"}

medium

The list of excluded directories (exclude_dirs) is hardcoded. It would be more flexible and maintainable to make this list configurable, perhaps via the AIAssistantConfig dataclass or environment variables, allowing users to customize which directories are scanned.
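A minimal sketch of the dataclass route, assuming a config object along the lines of the AIAssistantConfig mentioned above (ScanConfig and the EXTRA_EXCLUDE_DIRS variable are hypothetical names).

```python
import os
from dataclasses import dataclass, field

@dataclass
class ScanConfig:
    # Default exclusions match the hardcoded set; callers can override or
    # extend them instead of editing the module.
    exclude_dirs: set = field(default_factory=lambda: {
        "venv", ".venv", "__pycache__", "node_modules",
        "site-packages", ".git", ".local",
    })

    @classmethod
    def from_env(cls) -> "ScanConfig":
        # EXTRA_EXCLUDE_DIRS is a hypothetical comma-separated override.
        extra = os.getenv("EXTRA_EXCLUDE_DIRS", "")
        cfg = cls()
        cfg.exclude_dirs |= {d.strip() for d in extra.split(",") if d.strip()}
        return cfg
```

The scanner would then take a ScanConfig (or the real AIAssistantConfig) and read cfg.exclude_dirs rather than a module-level constant.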

