CorieW
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 3 additions & 0 deletions
diff --git a/POSTINSTALL.md b/POSTINSTALL.md
Lines changed: 105 additions & 0 deletions
diff --git a/PREINSTALL.md b/PREINSTALL.md
Lines changed: 40 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 22 additions & 0 deletions
diff --git a/_emulator/.firebaserc b/_emulator/.firebaserc
Lines changed: 6 additions & 0 deletions
diff --git a/_emulator/.gitignore b/_emulator/.gitignore
Lines changed: 66 additions & 0 deletions
diff --git a/_emulator/extensions/firestore-send-email.env.local b/_emulator/extensions/firestore-send-email.env.local
Lines changed: 1 addition & 0 deletions
diff --git a/_emulator/firebase.json b/_emulator/firebase.json
Lines changed: 40 additions & 0 deletions
diff --git a/_emulator/firestore.indexes.json b/_emulator/firestore.indexes.json
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,2 @@
+.gcloudignore
+!emulator-params.env
@@ -0,0 +1,3 @@
+## Version 0.1.0
+
+Initial release of the extension.
@@ -0,0 +1,105 @@
+# Post-Installation Guide
+
+After installing the extension, follow this guide to configure scraping tasks and manage extracted data. Below you'll find detailed instructions, document structures, and examples.
+
+---
+
+## **Setting Up a Task**
+Create a document in your tasks collection **`${param:SCRAPE_COLLECTION}`** to define a scraping task. 
+
+### **Task Document Structure**
+| Field       | Type             | Description                                                                 |
+|-------------|------------------|-----------------------------------------------------------------------------|
+| `url`       | string           | **Required.** Target URL to scrape (e.g., `"https://example.com"`).        |
+| `queries`   | array of objects | **Required.** List of queries to extract data from the HTML content.       |
+
+### **1. `queries` Configuration**
+Each query in the `queries` array narrows down elements from the HTML. Queries execute **in sequence**, with each subsequent query applied to the results of the previous one.
+
+#### **1.1. Query Object**
+| Field    | Type   | Description                                                                                   |
+|----------|--------|-----------------------------------------------------------------------------------------------|
+| `id`     | string | **Required.** Unique identifier for the query.                                                              |
+| `type`   | string | **Required.** Selector type. Supported values: `id`, `class`, `tag`, `attribute`, `text`, `xpath`.          |
+| `value`  | string | **Required.** Value for the selector (see examples below).                                                  |
+| `target` | string (optional) | What to extract from the selected elements. Supported values: `html`, `text`, `attribute`. Defaults to `html`. |
+| `attr`   | string (optional) | Attribute name to extract when `target` is set to `attribute`.                     |
+
+#### **1.2. Examples by Query Type**
+| Type         | `value` Example               | Description                                      |
+|--------------|-------------------------------|--------------------------------------------------|
+| **`id`**     | `"header"`                    | Select element with ID `#header`.                |
+| **`class`**  | `"menu-item"`                 | Select elements with class `.menu-item`.         |
+| **`tag`**    | `"a"`                         | Select all `<a>` tags.                           |
+| **`attribute`** | `"href"` or `"[data-role='button']"` | Select elements with the `href` attribute or matching `data-role="button"`. |
+| **`xpath`**  | `"//div[@class='content']"`   | Select elements using an XPath expression.       |
+| **`selector`** | `"#header > h1"`              | Select elements using a CSS selector.            |
+
+#### **1.3. Examples by Target Type**
+| Target          | Description                                                                                   |
+|-----------------|-----------------------------------------------------------------------------------------------|
+| **`html`**      | Extracts the HTML content of the selected elements.                                           |
+| **`inner`**     | Extracts the inner HTML content of the selected elements.                                      |
+| **`text`**      | Extracts the text content of the selected elements.                                           |
+| **`attribute`** | Extracts the value of the specified attribute from the selected elements.                     |
+
+
+### **Example Task Document (Before Processing)**
+```json
+{
+  "url": "https://example.com",
+  "queries": [
+    {
+      "id": "title",
+      "type": "xpath",
+      "value": "//title",
+      "target": "text"
+    },
+    {
+      "id": "description",
+      "type": "class",
+      "value": "description"
+    },
+    {
+      "id": "links",
+      "type": "tag",
+      "value": "a",
+      "target": "attribute",
+      "attr": "href"
+    }
+  ]
+}
+```
+
+### **Example Data Document (After Processing)**
+```json
+{
+  "url": "https://example.com",
+  "queries": [
+    {
+      "id": "title",
+      "type": "xpath",
+      "value": "//title",
+      "target": "text"
+    },
+    {
+      "id": "description",
+      "type": "class",
+      "value": "description"
+    },
+    {
+      "id": "links",
+      "type": "tag",
+      "value": "a",
+      "target": "attribute",
+      "attr": "href"
+    }
+  ],
+  "data": {
+    "title": "Example Domain",
+    "description": "<p>This domain is for use in illustrative examples...</p>",
+    "links": ["https://www.iana.org/domains/example", "https://www.iana.org/domains/reserved"]
+  },
+  "timestamp": "2023-01-01T00:00:00Z"
+}
+```
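The task document structure described in POSTINSTALL.md above can be sketched as a small validator that checks a document before it is written to the tasks collection. This is an illustrative sketch, not code from the extension: `validate_task` is a hypothetical helper, and the accepted `type` and `target` sets are an assumption (the union of every value listed in the tables above).

```python
# Sketch only: field names come from the POSTINSTALL.md tables. The accepted
# `type` and `target` sets are an assumption (union of all values listed there).
QUERY_TYPES = {"id", "class", "tag", "attribute", "text", "xpath", "selector"}
TARGETS = {"html", "inner", "text", "attribute"}


def validate_task(task):
    """Return a list of problems found in a task document (empty if valid)."""
    errors = []
    if not isinstance(task.get("url"), str) or not task["url"]:
        errors.append("url: required string")
    queries = task.get("queries")
    if not isinstance(queries, list):
        errors.append("queries: required array of objects")
        return errors
    for i, query in enumerate(queries):
        prefix = f"queries[{i}]"
        if not isinstance(query, dict):
            errors.append(f"{prefix}: must be an object")
            continue
        # `id`, `type`, and `value` are documented as required strings.
        for field in ("id", "type", "value"):
            if not isinstance(query.get(field), str) or not query[field]:
                errors.append(f"{prefix}.{field}: required string")
        if isinstance(query.get("type"), str) and query["type"] not in QUERY_TYPES:
            errors.append(f"{prefix}.type: unsupported value {query['type']!r}")
        # `target` is optional; `html` is the documented default.
        target = query.get("target", "html")
        if not isinstance(target, str) or target not in TARGETS:
            errors.append(f"{prefix}.target: unsupported value {target!r}")
        elif target == "attribute" and not isinstance(query.get("attr"), str):
            errors.append(f"{prefix}.attr: required when target is 'attribute'")
    return errors


if __name__ == "__main__":
    task = {
        "url": "https://example.com",
        "queries": [
            {"id": "title", "type": "xpath", "value": "//title", "target": "text"},
            {"id": "links", "type": "tag", "value": "a", "target": "attribute"},
        ],
    }
    for problem in validate_task(task):
        print(problem)
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a task document at once.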
@@ -0,0 +1,40 @@
+# Pre-Installation Guide
+
+This guide will help you install and configure this extension in your Firebase project.
+
+## Billing
+To install an extension, your project must be on the [Blaze (pay as you go) plan](https://firebase.google.com/pricing).
+
+## Setup
+
+### **Step 1: Install the Extension**
+
+You can install this extension locally by running the following commands:
+
+   ```bash
+   git clone https://github.com/CorieW/firestore-web-scraper.git
+   firebase ext:install ./firestore-web-scraper
+   ```
+
+In the future, this extension could be published to the Firebase Extensions registry for easier installation.
+
+### **Step 2: Configure the Extension**
+
+After installing the extension, you need to set up its configuration in your Firebase project. The configuration includes the following parameter:
+
+| Parameter       | Description             |
+|-----------------|-------------------------|
+| `scrapeCollection` | The collection in which scraping tasks are stored and processed. Each document in this collection should contain the details of the task to be performed. The same document will be updated with the results of the scraping task. |
+
+
+**Example Configuration:**
+
+```json
+{
+  "scrapeCollection": "tasks"
+}
+```
+
+## GitHub Repository
+
+The source code for this extension is available on GitHub: [firestore-web-scraper](https://github.com/CorieW/firestore-web-scraper).
@@ -0,0 +1,22 @@
+# Firestore Web Scraper
+
+This is a web scraper that is configured via Firestore. You can create scraping tasks in the form of documents in a Firestore collection. The scraper will then scrape the website and extract the data based on the queries you define in the task document.
+
+## Usage
+
+You can read PREINSTALL.md and POSTINSTALL.md for more detailed instructions on how to use this extension.
+
+## Installation
+
+You can install this extension locally by running the following commands:
+
+   ```bash
+   git clone https://github.com/CorieW/firestore-web-scraper.git
+   firebase ext:install ./firestore-web-scraper
+   ```
+
+In the future, this extension could be published to the Firebase Extensions registry for easier installation.
+
+## Contributing
+
+Contributions are always welcome! If you have an idea for a new feature or a bug fix, please open an issue first to discuss the changes.
@@ -0,0 +1,6 @@
+{
+  "projects": {
+    "default": "demo-test"
+  },
+  "targets": {}
+}
@@ -0,0 +1,66 @@
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+firebase-debug.log*
+firebase-debug.*.log*
+
+# Firebase cache
+.firebase/
+
+# Firebase config
+
+# Uncomment this if you'd like others to create their own Firebase project.
+# For a team working on the same Firebase project(s), it is recommended to leave
+# it commented so all members can deploy to the same project(s) in .firebaserc.
+# .firebaserc
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+
+# nyc test coverage
+.nyc_output
+
+# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
+.grunt
+
+# Bower dependency directory (https://bower.io/)
+bower_components
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (http://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directories
+node_modules/
+
+# Optional npm cache directory
+.npm
+
+# Optional eslint cache
+.eslintcache
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# dotenv environment variables file
+.env
@@ -0,0 +1 @@
+SCRAPE_COLLECTION=tasks
@@ -0,0 +1,40 @@
+{
+  "extensions": {
+    "firestore-web-scraper": "../"
+  },
+  "storage": {
+    "rules": "storage.rules"
+  },
+  "emulators": {
+    "hub": {
+      "port": 4000
+    },
+    "storage": {
+      "port": 9199
+    },
+    "auth": {
+      "port": 9099
+    },
+    "pubsub": {
+      "port": 8085
+    },
+    "functions": {
+      "port": 5001
+    },
+    "ui": {
+      "enabled": true
+    },
+    "firestore": {
+      "host": "127.0.0.1",
+      "port": 8080
+    }
+  },
+  "functions": {
+    "port": 5002,
+    "source": "functions"
+  },
+  "firestore": {
+    "rules": "firestore.rules",
+    "indexes": "firestore.indexes.json"
+  }
+}
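When running test scripts against the emulator suite configured in `_emulator/firebase.json`, client SDKs are usually pointed at the emulators through environment variables. The sketch below derives those variables from the `emulators` section of the config. The helper `emulator_env` is hypothetical, and falling back to `127.0.0.1` when a service omits `host` is an assumption, not something this config states.

```python
# Sketch only: maps emulator entries in firebase.json to the environment
# variables that the Firebase Admin SDK and Google Cloud clients read.
ENV_VARS = {
    "firestore": "FIRESTORE_EMULATOR_HOST",
    "auth": "FIREBASE_AUTH_EMULATOR_HOST",
    "storage": "FIREBASE_STORAGE_EMULATOR_HOST",
    "pubsub": "PUBSUB_EMULATOR_HOST",
}


def emulator_env(config, default_host="127.0.0.1"):
    """Build {ENV_VAR: "host:port"} for each emulator that declares a port."""
    emulators = config.get("emulators", {})
    env = {}
    for name, var in ENV_VARS.items():
        settings = emulators.get(name)
        if isinstance(settings, dict) and "port" in settings:
            # Assumption: services without an explicit host use default_host.
            host = settings.get("host", default_host)
            env[var] = f"{host}:{settings['port']}"
    return env


if __name__ == "__main__":
    # Emulator ports copied from _emulator/firebase.json in this change.
    config = {
        "emulators": {
            "storage": {"port": 9199},
            "auth": {"port": 9099},
            "pubsub": {"port": 8085},
            "firestore": {"host": "127.0.0.1", "port": 8080},
        }
    }
    for var, value in emulator_env(config).items():
        print(f"export {var}={value}")
```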
@@ -0,0 +1,4 @@
+{
+  "indexes": [],
+  "fieldOverrides": []
+}