Merged
Changes from 4 commits
46 changes: 46 additions & 0 deletions components/spider/actions/scrape-new-page/scrape-new-page.mjs
@@ -0,0 +1,46 @@
import spider from "../../spider.app.mjs";

export default {
  key: "spider-scrape-new-page",
  name: "Scrape New Page",
  description: "Initiates a new page scrape (crawl). [See the documentation](https://spider.cloud/docs/api#crawl-website)",
  version: "0.0.{{ts}}",
Collaborator

Suggested change
-  version: "0.0.{{ts}}",
+  version: "0.0.1",

  type: "action",
  props: {
    spider,
    infoBox: {
      type: "alert",
      alertType: "info",
      content: "See [the Spider documentation](https://spider.cloud/docs/api#crawl-website) for information on limits and best practices.",
    },
    url: {
      type: "string",
      label: "URL",
      description: "The URI resource to crawl, e.g. `https://spider.cloud`. This can be a comma-separated list for multiple URLs.",
    },
    limit: {
      type: "integer",
      label: "Limit",
      description: "The maximum amount of pages allowed to crawl per website. Default is 0, which crawls all pages.",
      optional: true,
    },
Comment on lines +21 to +26
Contributor

⚠️ Potential issue

Add a default value to the `limit` prop to ensure it defaults to `0`.

The `limit` prop is optional, but according to the description, it defaults to `0`, which crawls all pages. Without specifying a default value in the prop definition, `this.limit` may be `undefined` when the action runs. Adding a `default` property will ensure it defaults to `0` when not specified by the user.

Apply this diff to set the default value for `limit`:

      limit: {
        type: "integer",
        label: "Limit",
        description: "The maximum amount of pages allowed to crawl per website. Default is 0, which crawls all pages.",
        optional: true,
+       default: 0,
      },
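Whether the missing default changes the request depends on how the body is serialized. A minimal sketch, assuming the body is turned into JSON with `JSON.stringify` (which is what axios does by default for plain objects): a property whose value is `undefined` is dropped, so an unset `limit` is omitted from the body rather than sent as `0`. The prop values below are illustrative.

```javascript
// Illustrative prop values when the user leaves the optional `limit` empty.
const withoutDefault = { url: "https://spider.cloud", limit: undefined };

// With `default: 0` on the prop, the value is filled in before run() executes.
const withDefault = { url: "https://spider.cloud", limit: 0 };

// JSON.stringify skips properties whose value is undefined.
const bodyWithoutDefault = JSON.stringify(withoutDefault);
const bodyWithDefault = JSON.stringify(withDefault);

console.log(bodyWithoutDefault); // {"url":"https://spider.cloud"}
console.log(bodyWithDefault); // {"url":"https://spider.cloud","limit":0}
```

In the first case the API applies its own default (described in the prop as `0`); in the second the client states it explicitly.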

    storeData: {
      type: "boolean",
      label: "Store Data",
      description: "Decide whether to store data. Default is `false`.",
      optional: true,
    },
Comment on lines +27 to +32
Contributor

⚠️ Potential issue

Add a default value to the `storeData` prop to ensure it defaults to `false`.

The `storeData` prop is optional, but the description indicates that the default value is `false`. Without a default value in the prop definition, `this.storeData` may be `undefined`, which could lead to unexpected behavior when passed to the API. Adding a `default` property will ensure it defaults to `false` when not specified.

Apply this diff to set the default value for `storeData`:

      storeData: {
        type: "boolean",
        label: "Store Data",
        description: "Decide whether to store data. Default is `false`.",
        optional: true,
+       default: false,
      },
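Defaults can also be enforced in `run()` rather than on the prop definitions. A minimal sketch, assuming a hypothetical `buildCrawlBody` helper (not part of this PR) that mirrors the body built in `run()` and applies the defaults named in the prop descriptions:

```javascript
// Hypothetical helper: builds the /crawl request body from the action's props,
// applying the defaults stated in the prop descriptions when a prop is unset.
// `??` only replaces null/undefined, so an explicit user value is kept as-is.
function buildCrawlBody({ url, limit, storeData }) {
  return {
    url,
    limit: limit ?? 0,
    store_data: storeData ?? false,
  };
}

console.log(JSON.stringify(buildCrawlBody({ url: "https://spider.cloud" })));
// {"url":"https://spider.cloud","limit":0,"store_data":false}
console.log(JSON.stringify(buildCrawlBody({ url: "https://spider.cloud", limit: 10, storeData: true })));
// {"url":"https://spider.cloud","limit":10,"store_data":true}
```

`??` is used instead of `||` so that only a missing value triggers the fallback; with `||`, any falsy user input would be replaced, which is harmless for these two defaults but not in general.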

  },
  async run({ $ }) {
    const content = await this.spider.initiateCrawl({
      $,
      data: {
        url: this.url,
        limit: this.limit,
        store_data: this.storeData,
      },
    });
    $.export("$summary", `Successfully scraped URL ${this.url}`);
    return content;
  },
};
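The `$summary` above interpolates `this.url` verbatim, which reads oddly when the prop holds several comma-separated URLs. A minimal sketch of one alternative, assuming a hypothetical `crawlSummary` helper that is not part of this PR:

```javascript
// Hypothetical helper: the `url` prop may hold a comma-separated list, so this
// splits it, trims whitespace, drops empty entries, and builds a summary line.
function crawlSummary(url) {
  const urls = url
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
  return urls.length === 1
    ? `Successfully scraped URL ${urls[0]}`
    : `Successfully scraped ${urls.length} URLs`;
}

console.log(crawlSummary("https://spider.cloud"));
// Successfully scraped URL https://spider.cloud
console.log(crawlSummary("https://spider.cloud, https://example.com"));
// Successfully scraped 2 URLs
```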
7 changes: 5 additions & 2 deletions components/spider/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@pipedream/spider",
-  "version": "0.0.1",
+  "version": "0.1.0",
   "description": "Pipedream Spider Components",
   "main": "spider.app.mjs",
   "keywords": [
@@ -11,5 +11,8 @@
   "author": "Pipedream <[email protected]> (https://pipedream.com/)",
   "publishConfig": {
     "access": "public"
+  },
+  "dependencies": {
+    "@pipedream/platform": "^3.0.3"
   }
-}
+}
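The new `dependencies` entry uses a caret range. For a major version of 1 or higher, npm reads `^3.0.3` as "at least 3.0.3, below 4.0.0", so compatible minor and patch releases of `@pipedream/platform` are accepted. A simplified sketch of that rule (not the `semver` package's implementation; pre-release tags and the special 0.x caret rules are deliberately left out):

```javascript
// Simplified caret-range check for ranges whose major version is >= 1:
// "^3.0.3" accepts >=3.0.3 and <4.0.0. Pre-release tags are not handled.
function parse(version) {
  return version.split(".").map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ""));
  if (base[0] === 0) throw new Error("0.x caret ranges are not handled in this sketch");
  const v = parse(version);
  const upper = [base[0] + 1, 0, 0];
  return compare(v, base) >= 0 && compare(v, upper) < 0;
}

console.log(satisfiesCaret("3.0.3", "^3.0.3")); // true
console.log(satisfiesCaret("3.4.1", "^3.0.3")); // true
console.log(satisfiesCaret("3.0.2", "^3.0.3")); // false
console.log(satisfiesCaret("4.0.0", "^3.0.3")); // false
```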
29 changes: 25 additions & 4 deletions components/spider/spider.app.mjs
@@ -1,11 +1,32 @@
+import { axios } from "@pipedream/platform";
+
 export default {
   type: "app",
   app: "spider",
   propDefinitions: {},
   methods: {
-    // this.$auth contains connected account data
-    authKeys() {
-      console.log(Object.keys(this.$auth));
+    _baseUrl() {
+      return "https://api.spider.cloud";
     },
Comment on lines +8 to +10
Contributor

🛠️ Refactor suggestion

Consider using an environment variable for the base URL.

While the implementation is correct, hardcoding the base URL might make it difficult to change in the future, especially if there are different environments (e.g., staging, production).

Consider using an environment variable:

 _baseUrl() {
-  return "https://api.spider.cloud";
+  return process.env.SPIDER_API_BASE_URL || "https://api.spider.cloud";
 },

This change would allow for easier configuration across different environments while maintaining the current URL as a default.
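A minimal sketch of the suggested lookup, assuming a hypothetical `resolveBaseUrl` helper that takes the environment as an argument (in the component this would be `process.env`), which keeps the fallback logic testable. `SPIDER_API_BASE_URL` is the variable name proposed in the comment; the localhost URL is only an illustration.

```javascript
const DEFAULT_BASE_URL = "https://api.spider.cloud";

// Hypothetical helper: returns the override if set, else the production URL.
// `||` also falls back when the variable is set to an empty string.
function resolveBaseUrl(env) {
  return env.SPIDER_API_BASE_URL || DEFAULT_BASE_URL;
}

console.log(resolveBaseUrl({})); // https://api.spider.cloud
console.log(resolveBaseUrl({ SPIDER_API_BASE_URL: "" })); // https://api.spider.cloud
// Illustrative override, e.g. a local mock server.
console.log(resolveBaseUrl({ SPIDER_API_BASE_URL: "http://localhost:8080" })); // http://localhost:8080
```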

+    async _makeRequest({
+      $ = this, path = "/", headers, ...otherOpts
+    } = {}) {
+      return axios($, {
+        ...otherOpts,
+        url: this._baseUrl() + path,
+        headers: {
+          ...headers,
+          "Authorization": `Bearer ${this.$auth.api_key}`,
+          "Content-Type": "application/json",
+        },
+      });
+    },
+    async initiateCrawl(args) {
+      return this._makeRequest({
+        method: "POST",
+        path: "/crawl",
+        ...args,
+      });
+    },
   },
-};
+};
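The request plumbing in `_makeRequest` and `initiateCrawl` relies on spread order. A pure-function sketch that mirrors it without calling axios; `buildRequestConfig`, `buildCrawlConfig`, and the `apiKey` parameter are hypothetical stand-ins for the two methods and `this.$auth.api_key`, and the `$` context argument is left out:

```javascript
const BASE_URL = "https://api.spider.cloud";

// Mirrors _makeRequest: caller options first, then url and headers on top.
function buildRequestConfig(apiKey, { path = "/", headers, ...otherOpts } = {}) {
  return {
    ...otherOpts,
    url: BASE_URL + path,
    headers: {
      ...headers,
      // Listed after the spread, so these always win over caller headers.
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
  };
}

// Mirrors initiateCrawl: `...args` comes last, so a caller-supplied
// method or path overrides the POST /crawl defaults.
function buildCrawlConfig(apiKey, args) {
  return buildRequestConfig(apiKey, { method: "POST", path: "/crawl", ...args });
}

const config = buildCrawlConfig("test-key", {
  data: { url: "https://spider.cloud", limit: 5 },
  headers: { "Content-Type": "text/plain", "X-Trace": "abc" },
});
console.log(config.method); // POST
console.log(config.url); // https://api.spider.cloud/crawl
console.log(config.headers["Content-Type"]); // application/json
console.log(config.headers["X-Trace"]); // abc
```

Because the auth and content-type headers are spread last, a caller cannot accidentally replace them; because `...args` follows `method` and `path`, a caller can redirect the request to another path, which may or may not be intended.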
5 changes: 4 additions & 1 deletion pnpm-lock.yaml

Some generated files are not rendered by default.