Adding cloro.dev as LLM scraping tool #4038
Open
rbatista191 wants to merge 8 commits into crewAIInc:main from cloro-dev:main
Changes from 4 commits
Commits (8)
8322291 feat: added cloro API as new tool (rbatista191)
1e4d2b4 chore: more optimizations to cloro tool (rbatista191)
194c3b5 Merge branch 'main' into main (rbatista191)
d64b6be fix: bugs identified by Codex (rbatista191)
8867738 fix: bugs identified by Cursor bot (rbatista191)
b901240 feat: added docs for cloro, renamed CloroDev (rbatista191)
6fa2ba1 fix: bugs identified by Cursor (rbatista191)
035f1be chore: few more missing additions (rbatista191)
75 changes: 75 additions & 0 deletions
lib/crewai-tools/src/crewai_tools/tools/cloro_dev_tool/README.md
@@ -0,0 +1,75 @@
````markdown
# CloroDevTool

Use the `CloroDevTool` to search the web or query AI models using the cloro API.

## Installation

```shell
pip install 'crewai[tools]'
```

## Example

```python
from crewai_tools import CloroDevTool

# make sure CLORO_API_KEY variable is set
tool = CloroDevTool()

result = tool.run(search_query="latest news about AI agents")

print(result)
```

## Arguments

- `api_key` (str, optional): cloro API key.
- `engine` (str, optional): The engine to use for the query. Options are `google`, `chatgpt`, `gemini`, `copilot`, `perplexity`, `aimode`. Defaults to `google`.
- `country` (str, optional): The ISO 3166-1 alpha-2 country code for localized results (e.g., "US", "BR"). For a full list of supported country codes, refer to the [cloro API /v1/countries endpoint](https://docs.cloro.dev/api-reference/endpoint/countries). Defaults to "US".
- `device` (str, optional): The device type for Google search results (`desktop` or `mobile`). Defaults to "desktop".
- `pages` (int, optional): The number of pages to retrieve for Google search results. Defaults to 1.
- `save_file` (bool, optional): Whether to save the search results to a file. Defaults to `False`.

Get the credentials by creating a [cloro account](https://dashboard.cloro.dev).

## Response Format

The tool returns a structured dictionary containing different fields depending on the selected engine.

### Google Engine

- `organic`: List of organic search results with title, link, snippet, etc.
- `peopleAlsoAsk`: List of related questions.
- `relatedSearches`: List of related search queries.
- `ai_overview`: Google AI Overview data (if available).

### LLM Engines (ChatGPT, Perplexity, Gemini, etc.)

- `text`: The main response text from the model.
- `sources`: List of sources cited by the model (if available).
- `shopping_cards`: List of product/shopping cards with prices and offers (if available).
- `hotels`: List of hotel results (if available).
- `places`: List of places/locations (if available).
- `videos`: List of video results (if available).
- `images`: List of image results (if available).
- `related_queries`: List of related follow-up queries (if available).
- `entities`: List of extracted entities (if available).

## Advanced example

Check out the cloro [documentation](https://docs.cloro.dev/api-reference/introduction) to get the full list of parameters.

```python
from crewai_tools import CloroDevTool

# make sure CLORO_API_KEY variable is set
tool = CloroDevTool(
    engine="chatgpt",
    country="BR",
    save_file=True
)

result = tool.run(search_query="Say 'Hello, Brazil!'")

print(result)
```
````
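The README's "Response Format" section says the returned dictionary has a different shape for the Google engine than for the LLM engines. The following is a minimal sketch of how a caller might handle both shapes without knowing which engine produced the result. `summarize_results` is a hypothetical helper that is not part of this PR, the `title` and `link` item keys are taken from the README's description of organic results, and the sample dictionaries are hand-written stand-ins rather than real API output.

```python
from typing import Any


def summarize_results(result: dict[str, Any]) -> list[str]:
    """Return short, printable lines describing a CloroDevTool-style result."""
    lines: list[str] = []

    # Google-shaped results: one line per organic hit.
    for hit in result.get("organic", []):
        title = hit.get("title", "<no title>")
        link = hit.get("link", "<no link>")
        lines.append(f"{title} ({link})")

    # LLM-shaped results: the answer text plus a count of cited sources.
    if "text" in result:
        lines.append(result["text"])
        lines.append(f"sources cited: {len(result.get('sources', []))}")

    return lines


# Hand-written sample data standing in for real tool output (assumption).
sample_google = {"organic": [{"title": "AI agents", "link": "https://example.com"}]}
sample_llm = {"text": "Hello, Brazil!", "sources": []}

print(summarize_results(sample_google))
print(summarize_results(sample_llm))
```

Because every field is optional, the sketch reads keys with `.get()` and a default instead of indexing directly.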
Empty file.
204 changes: 204 additions & 0 deletions
lib/crewai-tools/src/crewai_tools/tools/cloro_dev_tool/cloro_dev_tool.py
@@ -0,0 +1,204 @@
```python
import datetime
import json
import logging
import os
from typing import Any, Literal, TypedDict

import requests
from crewai.tools import BaseTool, EnvVar
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


class FormattedResults(TypedDict, total=False):
    """Formatted search results from Cloro API."""

    # Google / Search
    organic: list[dict[str, Any]]
    peopleAlsoAsk: list[dict[str, Any]]
    relatedSearches: list[dict[str, Any]]
    ai_overview: dict[str, Any]

    # LLM / Common
    text: str
    sources: list[dict[str, Any]]

    # Rich Content (Perplexity / ChatGPT)
    shopping_cards: list[dict[str, Any]]
    hotels: list[dict[str, Any]]
    places: list[dict[str, Any]]
    videos: list[dict[str, Any]]
    images: list[dict[str, Any]]
    related_queries: list[str]
    entities: list[dict[str, Any]]

    credits: int


def _save_results_to_file(content: str) -> None:
    """Saves the search results to a file."""
    try:
        filename = f"cloro_results_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.json"
        with open(filename, "w") as file:
            file.write(content)
        logger.info(f"Results saved to {filename}")
    except IOError as e:
        logger.error(f"Failed to save results to file: {e}")
        raise


class CloroDevToolSchema(BaseModel):
    """Input for CloroDevTool."""

    search_query: str = Field(
        ..., description="Mandatory query/prompt you want to use to search/query the model"
    )


class CloroDevTool(BaseTool):
    name: str = "Search/Query with Cloro"
    description: str = (
        "A tool that can be used to search the internet or query LLMs using cloro API. "
        "Supports engines: google, chatgpt, gemini, copilot, perplexity, aimode."
    )
    args_schema: type[BaseModel] = CloroDevToolSchema
    base_url: str = "https://api.cloro.dev/v1/monitor"
    engine: Literal[
        "google",
        "chatgpt",
        "gemini",
        "copilot",
        "perplexity",
        "aimode",
    ] = "google"
    country: str = "US"
    device: str = "desktop"
    pages: int = 1
    save_file: bool = False
    api_key: str | None = Field(None, description="cloro API key")
    env_vars: list[EnvVar] = Field(
        default_factory=lambda: [
            EnvVar(
                name="CLORO_API_KEY", description="API key for cloro", required=True
            ),
        ]
    )

    def __init__(self, api_key: str | None = None, **kwargs):
        super().__init__(**kwargs)
        if api_key:
            self.api_key = api_key

    def _get_api_key(self) -> str:
        if self.api_key:
            return self.api_key
        env_key = os.environ.get("CLORO_API_KEY")
        if env_key:
            return env_key
        raise ValueError("cloro API key not found. Set CLORO_API_KEY environment variable or pass 'api_key' to constructor.")

    def _get_endpoint(self) -> str:
        return f"{self.base_url}/{self.engine}"

    def _make_api_request(self, query: str) -> dict[str, Any]:
        endpoint = self._get_endpoint()

        payload: dict[str, Any] = {
            "country": self.country,
        }

        if self.engine == "google":
            payload["query"] = query
            payload["device"] = self.device
            payload["pages"] = self.pages
            payload["include"] = {
                "html": False,
                "aioverview": {"markdown": True}
            }
        else:
            payload["prompt"] = query

            if self.engine in ["chatgpt", "gemini", "copilot", "perplexity", "aimode"]:
                payload["include"] = {"markdown": True}

        headers = {
            "Authorization": f"Bearer {self._get_api_key()}",
            "Content-Type": "application/json",
        }

        response = None
        try:
            response = requests.post(
                endpoint, headers=headers, json=payload, timeout=60
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            error_msg = f"Error making request to cloro API ({self.engine}): {e}"
            if response is not None and hasattr(response, "content"):
                error_msg += f"\nResponse content: {response.content.decode('utf-8', errors='replace')}"
            logger.error(error_msg)
            raise

    def _run(self, **kwargs: Any) -> FormattedResults:
        """Execute the search/query operation."""
        search_query: str | None = kwargs.get("search_query") or kwargs.get("query")
        save_file = kwargs.get("save_file", self.save_file)

        if not search_query:
            raise ValueError("search_query is required")

        api_response = self._make_api_request(search_query)

        if not api_response.get("success"):
            raise ValueError(f"cloro API returned unsuccessful response: {api_response}")

        result = api_response.get("result", {})
        formatted_results: FormattedResults = {}  # type: ignore

        # Process Google Search Results
        if self.engine == "google":
            if "organicResults" in result:
                formatted_results["organic"] = result["organicResults"]
            if "peopleAlsoAsk" in result:
                formatted_results["peopleAlsoAsk"] = result["peopleAlsoAsk"]
            if "relatedSearches" in result:
                formatted_results["relatedSearches"] = result["relatedSearches"]
            if "aioverview" in result:
                formatted_results["ai_overview"] = result["aioverview"]

        # Process LLM Results
        else:
            if "text" in result:
                formatted_results["text"] = result["text"]
            if "sources" in result:
                formatted_results["sources"] = result["sources"]

            # Map rich content if available
            if "shopping_cards" in result:
                formatted_results["shopping_cards"] = result["shopping_cards"]
            elif "shoppingCards" in result:
                formatted_results["shopping_cards"] = result["shoppingCards"]

            if "hotels" in result:
                formatted_results["hotels"] = result["hotels"]
            if "places" in result:
                formatted_results["places"] = result["places"]
            if "videos" in result:
                formatted_results["videos"] = result["videos"]
            if "images" in result:
                formatted_results["images"] = result["images"]

            if "related_queries" in result:
                formatted_results["related_queries"] = result["related_queries"]
            elif "relatedQueries" in result:
                formatted_results["related_queries"] = result["relatedQueries"]

            if "entities" in result:
                formatted_results["entities"] = result["entities"]

        if save_file:
            _save_results_to_file(json.dumps(formatted_results, indent=2))

        return formatted_results
```
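The request body built in `_make_api_request` depends on the engine: Google requests send `query`, `device`, `pages` and an `include` block with `aioverview`, while every other engine sends `prompt` and `include: {"markdown": True}`. The following is a minimal sketch that mirrors that branching as a pure function, so it can be checked without a network call or an API key. `build_payload` is a hypothetical name used for illustration only; it does not exist in the PR.

```python
from typing import Any


def build_payload(
    engine: str,
    query: str,
    country: str = "US",
    device: str = "desktop",
    pages: int = 1,
) -> dict[str, Any]:
    """Build a request body shaped like the one in the diff above."""
    # The country code is sent for every engine.
    payload: dict[str, Any] = {"country": country}

    if engine == "google":
        # Search requests use "query" and the Google-only options.
        payload["query"] = query
        payload["device"] = device
        payload["pages"] = pages
        payload["include"] = {"html": False, "aioverview": {"markdown": True}}
    else:
        # LLM engines use "prompt" and ask for a markdown rendering.
        payload["prompt"] = query
        payload["include"] = {"markdown": True}

    return payload


google_payload = build_payload("google", "latest news about AI agents")
chatgpt_payload = build_payload("chatgpt", "Say 'Hello, Brazil!'", country="BR")

print(google_payload)
print(chatgpt_payload)
```

Keeping the payload logic in a function with no side effects is one possible way to unit test the engine branching separately from the HTTP call.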