Commit 64852ce

theletterf, v1v, and claude authored
Add sheet2docs automation script and workflow (#4729)
## Summary

Fixes elastic/docs-content-internal#654

## Generative AI disclosure

1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
   - [X] Yes
   - [ ] No
2. If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini, OpenAI ChatGPT-4, etc.).

   Tool(s) and model(s) used: Claude Opus 4.5 in Cursor

---------

Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 3b1b10d commit 64852ce

5 files changed: +748 -0 lines changed

.github/workflows/sync-sheets-keyless.yml

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
name: Sync Google Sheets to CSV (Keyless Auth)

on:
  # Scheduled trigger - daily at 2 AM UTC
  schedule:
    - cron: '0 2 * * *'

  # TODO: Remove after testing
  push:
    branches: ['add-sheet2csv-automation']

  # Manual trigger
  workflow_dispatch:
    inputs:
      dry_run:
        description: 'Dry run (skip PR creation)'
        required: false
        default: false
        type: boolean

# Required permissions
permissions:
  contents: write
  pull-requests: write
  id-token: write # Required for OIDC token authentication

jobs:
  sync-sheet:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: '3.13'
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r scripts/sheet2docs/requirements.txt

      # Keyless authentication using Workload Identity Federation
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093 # v3.0.0
        with:
          workload_identity_provider: ${{ vars.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ vars.GCP_SERVICE_ACCOUNT_EMAIL }}
          project_id: ${{ vars.GCP_PROJECT_ID }}
          access_token_scopes: 'https://www.googleapis.com/auth/spreadsheets.readonly,https://www.googleapis.com/auth/drive.readonly'

      # The auth action sets GOOGLE_APPLICATION_CREDENTIALS automatically
      - name: Run sync script
        env:
          GOOGLE_SHEET_URL: ${{ secrets.GOOGLE_SHEET_URL }}
        run: |
          python scripts/sheet2docs/sync_sheet.py --config scripts/sheet2docs/config.yml --verbose

      - name: Create Pull Request
        if: github.event.inputs.dry_run != 'true'
        uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
        with:
          token: ${{ github.token }}
          branch: automated/sheets-sync
          delete-branch: true
          title: "Update CSV from Google Sheets"
          commit-message: |
            Update CSV from Google Sheets

            Generated: ${{ github.run_id }}
            Workflow run: ${{ github.run_number }}
          body: |
            ## Summary

            This PR updates the CSV file with the latest data from Google Sheets.

            ### Changes

            Please review the changes in the Files tab to ensure the data looks correct.

            ### Next Steps

            - [ ] Review the CSV changes
            - [ ] Verify data accuracy
            - [ ] Merge when ready

            ---

            🤖 Automated update from [sync-sheets workflow](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
          add-paths: |
            explore-analyze/elastic-inference/models.csv

      - name: Upload CSV artifact
        if: always()
        uses: actions/upload-artifact@v6
        with:
          name: generated-csv-${{ github.run_number }}
          path: explore-analyze/elastic-inference/models.csv
          retention-days: 30
          if-no-files-found: warn

      - name: Generate job summary
        if: always()
        run: |
          CSV_PATH="explore-analyze/elastic-inference/models.csv"
          echo "## Sync Summary" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "**Status:** ${{ job.status }}" >> $GITHUB_STEP_SUMMARY
          echo "**Dry Run:** ${{ github.event.inputs.dry_run }}" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY

          if [ -f "$CSV_PATH" ]; then
            echo "### Generated Files" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY
            echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
            ls -lh "$CSV_PATH" >> $GITHUB_STEP_SUMMARY
            echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY

            # Show first few rows as preview
            echo "### Preview (first 5 rows)" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY
            echo "\`\`\`csv" >> $GITHUB_STEP_SUMMARY
            head -n 6 "$CSV_PATH" >> $GITHUB_STEP_SUMMARY
            echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
          else
            echo "⚠️ No CSV file found" >> $GITHUB_STEP_SUMMARY
          fi

scripts/sheet2docs/config.yml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Google Sheets to CSV Configuration
#
# This file defines which Google Sheet to sync and how to transform the data

# Google Sheet source
source:
  # Google Sheets URL or spreadsheet ID
  # SECURITY: For public repos, use environment variable substitution
  # to avoid exposing the sheet URL
  #
  # Option 1 (active, recommended for public repos): environment variable.
  #   Set GOOGLE_SHEET_URL as a GitHub Secret; it is substituted below.
  #
  # Option 2 (testing/private repos): replace the value below with the
  #   sheet URL or spreadsheet ID directly.
  sheet_url: "${GOOGLE_SHEET_URL}"

  # Name of the tab/sheet within the spreadsheet
  # Can also use environment variable if needed
  tab_name: "Models"

# Column configuration
# Define which columns to include in the CSV and optionally rename them
columns:
  - source: "Type"
  - source: "Author"
  - source: "Name"
  - source: "ID"

# Output configuration
output:
  # Output CSV filename
  filename: "models.csv"

  # Output directory (relative to repo root)
  directory: "explore-analyze/elastic-inference"

  # CSV delimiter (default: comma)
  delimiter: ","
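
The `sheet_url: "${GOOGLE_SHEET_URL}"` value relies on environment variable substitution. The script that performs it (`sync_sheet.py`) is not shown in this diff, so the following is only a minimal sketch of how such substitution could work. The helper name `resolve_env_placeholders` and its error behavior are assumptions for illustration, not the script's actual implementation.

```python
import os
import re

# Matches ${NAME} placeholders such as ${GOOGLE_SHEET_URL}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_placeholders(value, env=None):
    """Hypothetical helper: replace each ${NAME} in `value` with env[NAME].

    Raises KeyError when a referenced variable is unset or empty, so a
    missing secret fails early instead of producing a broken URL.
    """
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if not env.get(name):
            raise KeyError(f"Environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, value)
```

Passing `env` explicitly keeps the helper easy to test; in the workflow, the value would come from the `GOOGLE_SHEET_URL` secret exported in the "Run sync script" step.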

scripts/sheet2docs/readme.txt

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
sheet2docs - Google Sheets to CSV Sync
======================================

This automation syncs data from a Google Sheet to a CSV file in the docs-content
repository. It runs daily at 2 AM UTC and creates/updates a pull request when
the sheet data changes.


How it works
------------

1. GitHub Actions runs the sync workflow daily (or manually).
2. The Python script fetches data from the configured Google Sheet.
3. If the CSV has changed, a PR is created or updated on the `automated/sheets-sync` branch.
4. A team member reviews and merges the PR.
5. The updated CSV is available in the repository.
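
Steps 2 and 3 can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the rows are assumed to be already fetched from the sheet, and in the real workflow the change detection for the PR is handled by the create-pull-request action rather than by code like this.

```python
import csv
import io
from pathlib import Path


def render_csv(header, rows, delimiter=","):
    """Hypothetical helper: serialize a header and rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def write_if_changed(path, content):
    """Write `content` to `path` only when it differs from what is on disk.

    Returns True when the file was written, False when it was already
    up to date.
    """
    data = content.encode("utf-8")
    if path.exists() and path.read_bytes() == data:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```

Comparing bytes before writing keeps the working tree untouched when the sheet has not changed, which is what lets a later step decide that no PR is needed.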


Configuration
-------------

Edit `scripts/sheet2docs/config.yml` to configure:

- source.sheet_url: The Google Sheet URL (uses GOOGLE_SHEET_URL secret)
- source.tab_name: The tab/sheet name within the spreadsheet
- columns: Which columns to include and optionally rename
- output.filename: The output CSV filename
- output.directory: Where to save the CSV (relative to repo root)
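
The config comments say `source.sheet_url` accepts either a full URL or a bare spreadsheet ID. A sketch of how both forms could be normalized is shown below. The helper `spreadsheet_id` is hypothetical, and it assumes the common `https://docs.google.com/spreadsheets/d/<ID>/...` URL shape.

```python
import re

# The ID is the path segment after /spreadsheets/d/ in a Sheets URL.
_SHEET_URL = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")
_BARE_ID = re.compile(r"[A-Za-z0-9_-]+")


def spreadsheet_id(url_or_id):
    """Hypothetical helper: return the spreadsheet ID from a URL or bare ID."""
    value = url_or_id.strip()
    match = _SHEET_URL.search(value)
    if match:
        return match.group(1)
    if _BARE_ID.fullmatch(value):
        return value
    raise ValueError("Not a Google Sheets URL or spreadsheet ID")
```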


Adding or removing columns
--------------------------

Edit the `columns` section in config.yml:

  columns:
    - source: "Column Name"    # Keep original name
    - source: "Old Name"
      target: "New Name"       # Rename in CSV
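
To make the `source`/`target` semantics concrete, here is a minimal sketch of column selection and renaming over rows that have already been read from the sheet. The function `select_columns` is hypothetical; the error text mirrors the "Column 'X' not found" message from the troubleshooting section, but the actual script may behave differently, for example on short rows.

```python
def select_columns(header, rows, columns):
    """Hypothetical helper: keep only configured columns, renaming if asked.

    `columns` is a list of dicts with a required "source" key and an
    optional "target" key, as in config.yml.
    """
    indexes = []
    out_header = []
    for col in columns:
        source = col["source"]
        if source not in header:
            # Matching is exact and case-sensitive.
            raise ValueError(f"Column '{source}' not found")
        indexes.append(header.index(source))
        out_header.append(col.get("target", source))

    # Assumption: rows shorter than the header are padded with empty strings.
    out_rows = [
        [row[i] if i < len(row) else "" for i in indexes]
        for row in rows
    ]
    return out_header, out_rows
```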


Changing the output location
----------------------------

Edit the `output` section in config.yml:

  output:
    filename: "models.csv"
    directory: "path/to/output"

Note: You must also update the matching paths in the workflow file
(.github/workflows/sync-sheets-keyless.yml): `add-paths`, the artifact
`path`, and `CSV_PATH`.
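
Because the output path is repeated in the workflow file, a small consistency check can catch a mismatch before it causes an empty PR. The sketch below assumes the config has already been parsed into a dict; both function names are hypothetical and nothing like this is part of the commit.

```python
from pathlib import PurePosixPath


def output_path(config):
    """Hypothetical helper: join output.directory and output.filename."""
    out = config["output"]
    return str(PurePosixPath(out["directory"]) / out["filename"])


def workflow_mentions_output(workflow_text, config):
    """Return True when the workflow text contains the configured CSV path."""
    return output_path(config) in workflow_text
```

A plain substring check is deliberately simple; it only confirms the path appears somewhere in the workflow, not that every occurrence was updated.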


Running manually
----------------

1. Go to the Actions tab in GitHub.
2. Select "Sync Google Sheets to CSV (Keyless Auth)".
3. Click "Run workflow".
4. Optionally enable "Dry run" to test without creating a PR.


Required GitHub configuration
-----------------------------

Secrets:
- GOOGLE_SHEET_URL: Full URL of the Google Sheet

Variables:
- GCP_WORKLOAD_IDENTITY_PROVIDER: Workload Identity Provider resource name
- GCP_SERVICE_ACCOUNT_EMAIL: Service account email
- GCP_PROJECT_ID: GCP project ID


Google Sheet setup
------------------

The Google Sheet must be shared with the service account email
(Viewer permission). The service account email is the value of
GCP_SERVICE_ACCOUNT_EMAIL.


Troubleshooting
---------------

"Spreadsheet not found"
- Ensure the sheet is shared with the service account email.
- Check that GOOGLE_SHEET_URL secret is correct.

"Tab 'X' not found"
- Verify the tab name matches exactly (case-sensitive).

"Column 'X' not found"
- Column names must match the sheet headers exactly (case-sensitive).

"Google Sheets API has not been enabled"
- Enable Google Sheets API and Drive API in the GCP project.

For detailed GCP setup, see SETUP-KEYLESS.md.

scripts/sheet2docs/requirements.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# Google Sheets API
gspread==6.1.2
google-auth==2.35.0
requests>=2.31.0

# Configuration parsing
PyYAML==6.0.2
