This scraper watches a public Google Sheets file, crawls the webpages listed in it, and writes freshly extracted text back to the sheet. It's built for teams that want automated content tracking without manually rechecking URLs or copying data back into spreadsheets.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
The Google Sheets Monitoring Scraper reads a Google Sheets file, fetches each URL listed in it, extracts text using the CSS selectors defined as column headers, and updates the sheet when the extracted values change. It's suited to content monitoring, competitive tracking, structured data extraction, or any workflow driven from a spreadsheet.
- Automatically crawl URLs listed in a Google Sheet.
- Extract text using CSS selectors defined directly as column headers.
- Track changes over time with automatic timestamps.
- Build lightweight monitoring systems without backend infrastructure.
| Feature | Description |
|---|---|
| Sheet-Driven Automation | Reads URLs and selectors directly from a public Google Sheets file. |
| Web Content Crawling | Uses axios + cheerio to fetch and parse webpage content. |
| Selector-Based Extraction | Extracts text based on CSS selectors defined in sheet columns. |
| Change Detection | Updates the sheet when extracted values differ from previous ones. |
| Timestamp Logging | Automatically fills an UPDATED column with the latest change time. |
| Dataset Export | Saves updated rows into an exported dataset for external use. |
| Field Name | Field Description |
|---|---|
| url | The webpage URL to be crawled. |
| selectorFields | Any number of columns representing CSS selectors used to extract content. |
| extractedText | The text extracted from each selector. |
| updatedAt | Timestamp of the latest run in which the extracted values differed from the previous extraction. |
| rowData | Complete row snapshot from the sheet. |
[
{
"url": "https://example.com",
"title": "Example Domain",
"description": "This domain is for use in illustrative examples.",
"updatedAt": "2024-04-20T10:45:00Z",
"rowData": {
"URL": "https://example.com",
"title": "Example Domain",
"description": "This domain is for use in illustrative examples."
}
}
]
Google Sheets Monitoring Scraper/
├── src/
│   ├── main.js
│   ├── sheet/
│   │   ├── sheet_reader.js
│   │   └── sheet_writer.js
│   ├── crawler/
│   │   ├── page_fetcher.js
│   │   └── content_extractor.js
│   ├── utils/
│   │   ├── diff_checker.js
│   │   └── time_formatter.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── package.json
└── README.md
- Content Monitoring: track changes to website headlines, descriptions, or structured fields.
- Competitive Research: follow updates on competitor pages automatically.
- SEO Tracking: observe changes to meta descriptions or H1 tags.
- Data Collection: use a spreadsheet as a lightweight crawler configuration panel.
- Automation Workflows: trigger downstream actions when sheet data changes.
Does it work with private Google Sheets?
No. Only public Google Sheets URLs are supported.
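The README does not say how the public sheet is read. One common way to read a publicly shared sheet without credentials is Google's CSV export endpoint, and the sketch below assumes that approach. `toCsvExportUrl` is a hypothetical helper, not part of this project; it targets modern Node.js (14+), where `URL` and `URLSearchParams` are globals.

```javascript
// Hypothetical helper (not from this project): turns a public Google Sheets URL
// into its CSV export URL. Assumes the sheet is shared as "anyone with the link
// can view"; a private sheet would need authenticated access instead.
function toCsvExportUrl(sheetUrl) {
  const url = new URL(sheetUrl);
  // Sheet URLs look like /spreadsheets/d/<spreadsheetId>/edit...
  const match = url.pathname.match(/\/spreadsheets\/d\/([\w-]+)/);
  if (!match) {
    throw new Error(`Not a Google Sheets URL: ${sheetUrl}`);
  }
  const spreadsheetId = match[1];
  // The tab id ("gid") may appear in the hash (#gid=123) or the query (?gid=123);
  // fall back to the first tab ("0") when neither is present.
  const hashGid = new URLSearchParams(url.hash.slice(1)).get("gid");
  const gid = hashGid ?? url.searchParams.get("gid") ?? "0";
  return `https://docs.google.com/spreadsheets/d/${spreadsheetId}/export?format=csv&gid=${gid}`;
}

console.log(toCsvExportUrl("https://docs.google.com/spreadsheets/d/abc123/edit#gid=456"));
// prints https://docs.google.com/spreadsheets/d/abc123/export?format=csv&gid=456
```

Because the export URL only works for public sheets, it fits the restriction above: a private sheet returns a sign-in page rather than CSV.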
How do selectors work?
Each column header (except URL and UPDATED) is treated as a CSS selector used to extract text.
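The header rule can be sketched as a small classification step over the first row of the sheet. `classifyHeaders` is a hypothetical name; matching `URL` and `UPDATED` case-insensitively and ignoring blank headers are assumptions, not documented behavior.

```javascript
// Hypothetical sketch: split the sheet's header row into the reserved URL and
// UPDATED columns and the remaining CSS-selector columns. Case-insensitive
// matching and skipping blank headers are assumptions made for this example.
function classifyHeaders(headers) {
  const selectorColumns = [];
  let urlIndex = -1;
  let updatedIndex = -1;

  headers.forEach((raw, index) => {
    const header = String(raw).trim();
    const upper = header.toUpperCase();
    if (upper === "URL") urlIndex = index;
    else if (upper === "UPDATED") updatedIndex = index;
    else if (header !== "") selectorColumns.push({ index, selector: header });
  });

  // Without a URL column there is nothing to crawl.
  if (urlIndex === -1) throw new Error('The sheet needs a "URL" column');
  return { urlIndex, updatedIndex, selectorColumns };
}

const layout = classifyHeaders(["URL", "title", "h1", "UPDATED"]);
console.log(layout.selectorColumns.map((c) => c.selector)); // [ 'title', 'h1' ]
```

Each selector column would then be passed to the HTML parser (cheerio, per the feature table) to read the matching element's text for that row's URL.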
Can it detect changes automatically?
Yes. If newly extracted text differs from the value already in the sheet, the UPDATED column is refreshed with the current timestamp.
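A minimal sketch of that per-row comparison, assuming each row is an object keyed by header and the scraped values are keyed by selector. `applyExtraction` is hypothetical, and collapsing whitespace before comparing is an assumption so that formatting-only differences do not count as changes.

```javascript
// Hypothetical sketch of per-row change detection (not the project's diff_checker).
// Whitespace normalization is an assumption: "Hello  world" and "Hello world"
// are treated as the same value.
function normalize(text) {
  return String(text ?? "").replace(/\s+/g, " ").trim();
}

function applyExtraction(previousRow, extracted, now = new Date()) {
  const nextRow = { ...previousRow };
  let changed = false;

  for (const [selector, text] of Object.entries(extracted)) {
    const value = normalize(text);
    if (normalize(previousRow[selector]) !== value) {
      nextRow[selector] = value;
      changed = true;
    }
  }

  // Touch UPDATED only when at least one selector value actually changed.
  if (changed) nextRow.UPDATED = now.toISOString();
  return { changed, row: nextRow };
}

const before = { URL: "https://example.com", h1: "Example Domain", UPDATED: "" };
const result = applyExtraction(before, { h1: "Example Domain v2" }, new Date("2024-04-20T10:45:00Z"));
console.log(result.changed, result.row.UPDATED); // true 2024-04-20T10:45:00.000Z
```

Returning the `changed` flag alongside the row lets the caller skip the sheet write entirely for unchanged rows.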
What happens to updated rows?
They are stored in a dataset for export or external processing.
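One way to prepare those records is to keep only the changed rows and shape them like the example output above (`url`, one key per selector, `updatedAt`, `rowData`). The sketch assumes rows are objects keyed by header with a `changed` flag from the comparison step; `toDatasetRecords` is hypothetical, and the actual dataset storage call is deliberately left out.

```javascript
// Hypothetical sketch: filter to changed rows and shape them like the example
// output. Pushing the records to dataset storage is omitted because the README
// does not specify that call.
function toDatasetRecords(results, selectors) {
  return results
    .filter(({ changed }) => changed)
    .map(({ row }) => {
      const record = { url: row.URL };
      for (const selector of selectors) record[selector] = row[selector] ?? "";
      record.updatedAt = row.UPDATED;
      // The example output's rowData omits the UPDATED column, so drop it here.
      const { UPDATED: _updated, ...rowData } = row;
      record.rowData = rowData;
      return record;
    });
}

const records = toDatasetRecords(
  [
    { changed: true, row: { URL: "https://example.com", title: "Example Domain", UPDATED: "2024-04-20T10:45:00Z" } },
    { changed: false, row: { URL: "https://example.org", title: "Unchanged", UPDATED: "" } },
  ],
  ["title"],
);
console.log(records.length); // 1
```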
Primary Metric: Processes hundreds of rows in minutes with lightweight network usage.
Reliability Metric: Maintains >97% extraction accuracy across well-structured pages.
Efficiency Metric: Minimizes redundant writes by updating only changed rows.
Quality Metric: Outputs normalized row data with consistent formatting and timestamps.
