Author: Yuvraj Singh
A Python automation script that integrates with the Gmail API and the Google Sheets API to read unread emails from a Gmail inbox and mark them as read. It extracts the sender, subject, date, and content of each email and appends them as a row to a Google Sheet.
```
gmail-to-sheets/
│
├── src/
│   ├── gmail_service.py
│   ├── sheets_service.py
│   ├── email_parser.py
│   └── main.py
│
├── credentials/
│   ├── credentials.json
│   └── token.json
│
├── proof/
│   ├── 01_Unread_gmail_before.png
│   ├── 02_gmail_authentication.png
│   ├── 02_2_authentication.png
│   ├── 02_3_allow.png
│   ├── 02_4_oauth_finished.png
│   ├── 03_mainpy_logs.png
│   ├── 04_sheet_after.png
│   ├── 05_Read_mails_after.png
│   ├── test_01.png
│   ├── test_02.png
│   └── high-level-diagram.png
│
├── .gitignore
├── config.py
├── requirements.txt
├── README.md
└── state.json
```
- Python 3.8 or higher
- A Google Cloud Project with Gmail API and Google Sheets API enabled.
- Clone the repository:

  ```bash
  git clone https://github.com/uv-goswami/gmail-to-sheet
  cd gmail-to-sheet
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Google Cloud Configuration:
  - Go to the Google Cloud Console
  - Create a new project: "Gmail-Sheets-Automation"
  - Enable APIs:
    - Gmail API
    - Google Sheets API
  - Create OAuth 2.0 credentials:
    - Go to Credentials -> Create Credentials -> OAuth Client ID
    - Application type: Desktop app
    - Download `credentials.json`
- Place `credentials.json` in the `credentials/` folder
- Create a new Google Sheet
- Create a `.env` file in the root directory:

  ```
  SPREADSHEET_ID=your_spreadsheet_id_here
  SHEET_NAME=Sheet1
  ```
- Run the main entry point:

  ```bash
  python3 src/main.py
  ```

  On the first run, a browser window will open asking you to log in and authorize the app.
- OAuth 2.0 Authentication: OAuth 2.0 grants third-party applications limited access to a user's data hosted by a service provider (in our case, Google) without requiring the user to share their password. I used the `google-auth-oauthlib` library to handle the OAuth flow because it is secure and simplifies the authentication and authorization process. It generates a `token.json` after the first login, which is then refreshed automatically on subsequent runs.
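A minimal sketch of the first-run consent flow with `google-auth-oauthlib`; the scopes and file paths shown here are illustrative stand-ins for whatever the project keeps in `config.py`:

```python
from google_auth_oauthlib.flow import InstalledAppFlow

# Illustrative scopes/paths; the real values belong in config.py / .env.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/spreadsheets",
]

flow = InstalledAppFlow.from_client_secrets_file(
    "credentials/credentials.json", SCOPES
)
creds = flow.run_local_server(port=0)  # opens a browser for user consent

# Persist the token so later runs can skip the browser entirely.
with open("credentials/token.json", "w") as fh:
    fh.write(creds.to_json())
```

This only runs interactively on the very first launch; afterwards the cached `token.json` is loaded and refreshed silently.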
- Duplicate Prevention: To prevent duplicates I implemented two layers of protection.
  - The API request only asks for UNREAD emails. Since the script marks each email as read after processing it, requesting only UNREAD emails prevents reprocessing.
  - I also implemented ID-based filtering. Every Gmail message has a unique ID. Before processing, the script checks whether the ID already exists in state.json; if it does, the email is skipped, otherwise it is processed.
- State Persistence: I chose a simple JSON file, state.json, to store the IDs of processed emails rather than a full database, since a JSON file is lightweight. Loading the IDs into a set gives O(1) membership checks regardless of how many emails have been processed.
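The state file can be handled with nothing but the standard library. A minimal sketch (the function names and JSON layout are illustrative, not necessarily what the project uses):

```python
import json
import os

STATE_FILE = "state.json"  # illustrative path


def load_processed_ids(path: str = STATE_FILE) -> set:
    """Load previously processed Gmail message IDs into a set for O(1) lookups."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh).get("processed_ids", []))


def save_processed_id(msg_id: str, path: str = STATE_FILE) -> None:
    """Add one ID and persist immediately, so a crash loses at most one email."""
    ids = load_processed_ids(path)
    ids.add(msg_id)
    with open(path, "w") as fh:
        json.dump({"processed_ids": sorted(ids)}, fh)
```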
- Local State: If state.json is deleted, the script falls back on the UNREAD label alone. If you have old unread emails, they may be duplicated.
- Attachment Handling: Currently the script ignores attachments and images, extracting only text content.
- Content Truncation: To keep the sheet readable, the email content is truncated to the first 500 characters.
- Rate Limiting: The script fetches a maximum of 50 emails per run to preserve API quota.
- OAuth Token Expiration: The OAuth access token expires after about an hour, so running the script after expiration used to cause an authentication failure and force manual reauthentication.
  Solution: I implemented automatic token refresh logic using a `google.auth.transport.requests.Request` object to check for token expiration. If the token has expired, the script automatically uses the refresh token to request a new access token from Google, without requiring a manual browser login.
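A sketch of the refresh check, assuming a `token.json` written by an earlier run (the scope and paths are illustrative):

```python
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]  # illustrative scope

creds = Credentials.from_authorized_user_file("credentials/token.json", SCOPES)

if creds.expired and creds.refresh_token:
    # Exchanges the long-lived refresh token for a new access token.
    creds.refresh(Request())
    with open("credentials/token.json", "w") as fh:
        fh.write(creds.to_json())  # persist the refreshed token for next time
```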
- Secure Configuration: Hardcoding sensitive data like SPREADSHEET_ID or file paths into the code is a major security risk and makes the code hard to deploy.
  Solution: I adopted the Twelve-Factor App principle for configuration:
  - I used python-dotenv to load configuration variables.
  - I stored sensitive IDs in a .env file.
  - I added .env and credentials.json to .gitignore.
  - I used os.path.join(BASE_DIR, ...) to ensure paths work on any machine.
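A sketch of what a `config.py` following this scheme might look like; the variable names beyond SPREADSHEET_ID and SHEET_NAME are assumptions:

```python
import os

from dotenv import load_dotenv

# Resolve paths relative to this file so the script works from any CWD.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

load_dotenv(os.path.join(BASE_DIR, ".env"))  # reads .env into the environment

SPREADSHEET_ID = os.getenv("SPREADSHEET_ID")    # secret, kept out of source control
SHEET_NAME = os.getenv("SHEET_NAME", "Sheet1")  # sensible default

CREDENTIALS_FILE = os.path.join(BASE_DIR, "credentials", "credentials.json")
STATE_FILE = os.path.join(BASE_DIR, "state.json")
```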
- Rate Limiting: The Google APIs enforce per-user quotas. Fetching every single email in a large inbox one by one would hit those limits quickly, causing the script to fail with `429 Too Many Requests`.
  Solution: I implemented three strategies:
  - In `gmail_service.py`, the `maxResults` parameter is set to 50. This prevents the script from trying to pull all emails at once.
  - State-based filtering: by checking `state.json` before calling `get_email_details()`, the script avoids redundant API calls.
  - The loop processes emails individually and saves the state immediately. If we hit a rate limit (or lose the network), the next run resumes exactly where it left off without refetching already finished emails.
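The combined effect of these strategies can be sketched with stand-in callables; `fetch_unread_ids`, `get_email_details`, and `append_row` below are hypothetical placeholders for the real Gmail/Sheets service calls:

```python
import json
import os

MAX_RESULTS = 50  # mirrors the maxResults cap used in gmail_service.py


def process_inbox(fetch_unread_ids, get_email_details, append_row,
                  state_file="state.json"):
    """Fetch at most MAX_RESULTS unread emails, skip already-processed IDs,
    and persist state after every email so a failed run resumes cleanly."""
    if os.path.exists(state_file):
        with open(state_file) as fh:
            processed = set(json.load(fh))
    else:
        processed = set()

    for msg_id in fetch_unread_ids(max_results=MAX_RESULTS):
        if msg_id in processed:
            continue  # avoid a redundant get_email_details() API call
        append_row(get_email_details(msg_id))
        processed.add(msg_id)
        with open(state_file, "w") as fh:
            json.dump(sorted(processed), fh)  # saved immediately, not at the end
```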
- Emails with HTML: Emails are often multipart (text + HTML). Initially the parser would fail on emails that were HTML-only.
  Solution: I implemented a recursive parsing strategy using `BeautifulSoup`.
  - The `EmailParser` class inspects the payload.
  - It prioritizes `text/plain` for accuracy.
  - If `text/plain` is missing, it falls back to `text/html`, decodes the Base64 data, and uses `BeautifulSoup` to strip tags and extract readable text.
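The recursion over a Gmail API payload can be sketched as follows. This is an illustration, not the project's `EmailParser`: for a dependency-free, testable example it strips tags with the stdlib `html.parser` instead of BeautifulSoup, but the search order (recursive, `text/plain` first, `text/html` fallback, base64url decode, 500-character truncation) is the same idea:

```python
import base64
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Stdlib stand-in for BeautifulSoup's get_text()."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _decode(part):
    # Gmail API message bodies are base64url-encoded.
    return base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", "replace")


def _find(payload, mime):
    """Depth-first search of the (possibly nested) multipart tree."""
    if payload.get("mimeType") == mime and payload.get("body", {}).get("data"):
        return payload
    for part in payload.get("parts", []):
        found = _find(part, mime)
        if found:
            return found
    return None


def extract_text(payload, limit=500):
    """Prefer text/plain; fall back to tag-stripped text/html. Truncate to limit."""
    part = _find(payload, "text/plain")
    if part:
        return _decode(part)[:limit]
    part = _find(payload, "text/html")
    if part:
        stripper = _TagStripper()
        stripper.feed(_decode(part))
        return "".join(stripper.chunks).strip()[:limit]
    return ""
```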
- Duplicate Prevention: Run the script twice consecutively:

  ```bash
  python3 src/main.py
  ```

- New Emails After Processing: Send yourself a new email and re-run the script:

  ```bash
  python3 src/main.py
  ```
This project is created for educational purposes as part of an internship assignment.










