A Java-based web crawler API built with Spring Boot that fetches domain pricing data from Namecheap's domain search page and saves it to both CSV and JSON formats. The API exposes RESTful endpoints to trigger crawling, retrieve crawled data, and download the generated files.
- Web Crawling: Extracts domain TLDs, free domain privacy status, price per year, and renewal price.
- Data Storage: Saves crawled data to domains.csv and domains.json.
- REST API:
  - POST /api/v1/crawl: Triggers a crawl and saves data to files.
  - GET /api/v1/domains: Retrieves the most recent crawled data.
  - GET /api/v1/files/{type}: Downloads the CSV or JSON file.
- Modular Design: Follows OOP principles with separation of concerns.
- Spring Boot: Leverages dependency injection and robust configuration options.
- Java 17 or higher
- Maven 3.6+
- Internet connection (to fetch pages and download dependencies)
git clone <repository-url>
cd domain-crawler
Make sure Maven is installed, then run:
mvn clean install
- This will download dependencies (Spring Boot, Jsoup, Jackson, etc.).
The crawler uses placeholder selectors in DomainParser.java.
- Open Namecheap's domain search page.
- Inspect the HTML (right-click → Inspect).
- Identify correct classes/IDs for:
- TLDs
- Prices
- Privacy status
- Renewal prices
Update the parse() method in:
src/main/java/com/crawler/parser/DomainParser.java
- Example:
String tld = row.select("div.domain-name").text().trim(); // Replace with actual selector
Run the application:
mvn spring-boot:run
- The API will be available at http://localhost:8080.
curl -X POST http://localhost:8080/api/v1/crawl
- Response: JSON array of crawled domain data.
[
  {
    "TLD": "sale.com",
    "Free Domain Privacy": true,
    "Price / Year": "$11.28",
    "Renewal Price": "$16.98"
  },
  ...
]
- Side effect: Creates domains.csv and domains.json in the project root by default. The output location can be changed, for example to the resources directory.
curl http://localhost:8080/api/v1/domains
- Response: Latest crawled data (or empty array if none).
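The crawl and retrieve calls can also be issued from Java instead of curl. The following is a minimal sketch using the JDK's built-in `java.net.http` client. `CrawlerApiClient` is a hypothetical helper that is not part of this project, and the check for status 200 is an assumption about how the API responds on success.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper (not part of the project) that calls the crawler API. */
class CrawlerApiClient {
    private final String baseUrl;

    CrawlerApiClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Builds the POST request that triggers a crawl. */
    HttpRequest crawlRequest() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/crawl"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Builds the GET request that returns the most recent crawl result. */
    HttpRequest domainsRequest() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/domains"))
                .GET()
                .build();
    }

    /** Sends a request and returns the response body (a JSON array as text). */
    String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Assumption: the API answers successful calls with HTTP 200.
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

With the application running locally, `client.send(client.crawlRequest())` on a `new CrawlerApiClient("http://localhost:8080")` would trigger a crawl, and `client.send(client.domainsRequest())` would fetch the stored result.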
- CSV:
curl http://localhost:8080/api/v1/files/csv -o domains.csv
- JSON:
curl http://localhost:8080/api/v1/files/json -o domains.json
- Example domains.csv
TLD,Free Domain Privacy,Price / Year,Renewal Price
sale.com,true,$11.28,$16.98
sale.org,true,$7.48,$14.98
sale.eu,false,$4.48,$8.98
- Example domains.json
[
  {
    "TLD": "sale.com",
    "Free Domain Privacy": true,
    "Price / Year": "$11.28",
    "Renewal Price": "$16.98"
  }
]
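Both example files carry the same four fields per domain. As a rough illustration of how one crawled row could map to a CSV line, here is a small sketch. `DomainRecord` and its method names are hypothetical, and the project's own model and saver classes may look different.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of one crawled row; the project's own class may differ. */
record DomainRecord(String tld, boolean freePrivacy, String pricePerYear, String renewalPrice) {

    /** Header line as shown in the example domains.csv. */
    static final String CSV_HEADER = "TLD,Free Domain Privacy,Price / Year,Renewal Price";

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Formats this record as one CSV line (without a trailing newline). */
    String toCsvRow() {
        return String.join(",",
                escape(tld), Boolean.toString(freePrivacy),
                escape(pricePerYear), escape(renewalPrice));
    }

    /** Formats the header plus all rows, separated by newlines. */
    static String toCsv(List<DomainRecord> rows) {
        return CSV_HEADER + "\n"
                + rows.stream().map(DomainRecord::toCsvRow).collect(Collectors.joining("\n"));
    }
}
```

The quoting step matters once a price contains a thousands separator such as `$1,200.00`, which would otherwise split into two columns.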
CSS Selectors
- The default ones are placeholders.
- Always inspect Namecheap's current HTML structure to update them.
- If the page changes, repeat the inspection.
Dynamic Content (If Jsoup Doesn't Work)
- If the data is loaded via JavaScript, use Selenium instead:
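import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Loads the page in a real browser so JavaScript-rendered content is present,
// then hands the rendered HTML to Jsoup for parsing.
public Document fetchPage() throws IOException {
    WebDriver driver = new ChromeDriver();
    try {
        driver.get(url); // url: the search page URL held by the enclosing class
        return Jsoup.parse(driver.getPageSource());
    } finally {
        driver.quit(); // always close the browser, even if loading fails
    }
}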
- Add Selenium to pom.xml:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.15.0</version>
</dependency>
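- Ensure ChromeDriver is installed (Selenium 4.6 and later can usually resolve a matching driver automatically through Selenium Manager).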
Rate Limiting
- Add delay to avoid being blocked:
Thread.sleep(1000);
- Respect Namecheap’s robots.txt.
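A bare `Thread.sleep(1000)` always waits the full second, even if the previous request already took a while. One possible refinement, sketched below, is to enforce a minimum gap between consecutive requests. `RequestThrottle` is a hypothetical class that is not part of this project, and the interval value is up to you.

```java
/** Hypothetical throttle: enforces a minimum gap between consecutive requests. */
class RequestThrottle {
    private final long minIntervalNanos;
    private long lastRequestNanos;
    private boolean hasPrevious = false;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = minIntervalMillis * 1_000_000L;
    }

    /** Blocks until the configured interval has passed since the previous call returned. */
    synchronized void await() {
        if (hasPrevious) {
            long remaining;
            while ((remaining = minIntervalNanos - (System.nanoTime() - lastRequestNanos)) > 0) {
                try {
                    // Sleep at least 1 ms; the loop re-checks the clock afterwards.
                    Thread.sleep(Math.max(1L, remaining / 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while throttling", e);
                }
            }
        }
        hasPrevious = true;
        lastRequestNanos = System.nanoTime();
    }
}
```

A crawler would create one instance, for example `new RequestThrottle(1000)`, and call `await()` immediately before each page fetch. The first call returns without waiting.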
- Default: Root of the project (domains.csv, domains.json)
- You can configure a custom path in application.properties.
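To illustrate how a configurable output path with a project-root default might be resolved, here is a dependency-free sketch based on `java.util.Properties`. The property key `crawler.output.dir` and the class `OutputLocation` are assumptions made for this example, so check the project's application.properties for the real key. In the Spring Boot application itself, such a value would more typically be bound with `@Value` or `@ConfigurationProperties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical resolver for the output directory; the key name is an assumption. */
class OutputLocation {
    // Assumed property name, not taken from the project.
    static final String KEY = "crawler.output.dir";

    /** Returns the target path for a file, falling back to the working directory. */
    static Path resolve(Properties props, String fileName) {
        String dir = props.getProperty(KEY, "."); // "." is the project root when run from there
        return Path.of(dir).resolve(fileName).normalize();
    }

    /** Parses properties-file text, e.g. the contents of application.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With an empty configuration, `resolve(props, "domains.csv")` yields `domains.csv` in the working directory. With `crawler.output.dir=src/main/resources` it yields a path inside the resources directory.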
- Basic error handling is present.
- Improve with custom exceptions, logging, and retry mechanisms for production.
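As one possible shape for the retry mechanism mentioned above, the sketch below retries a task with exponential backoff. `Retry` and `withRetries` are hypothetical names that are not part of this project. A production version would likely retry only on specific exception types and log each failure.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper with exponential backoff between attempts. */
class Retry {
    /**
     * Runs the task up to maxAttempts times, doubling the delay after each failure.
     * Throws IllegalStateException (with the last failure as cause) if all attempts fail.
     */
    static <T> T withRetries(Callable<T> task, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point in waiting after the final attempt
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted during retry wait", ie);
                }
                delay *= 2; // exponential backoff
            }
        }
        throw new IllegalStateException("Failed after " + maxAttempts + " attempts", last);
    }
}
```

A page fetch could then be wrapped as `Retry.withRetries(() -> fetchPage(), 3, 500)`, which waits 500 ms after the first failure and 1000 ms after the second.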
- The current version is unauthenticated.
- Add Spring Security for production use (JWT, API keys, etc.).
📦 Dependencies
- Spring Boot 3.2.5: API framework
- Jsoup 1.17.2: HTML parsing
- Jackson 2.17.2: JSON handling
Full list in pom.xml.
- New Output Formats: Add an XmlSaver or other DataSaver implementations.
- Database Storage: Use Spring Data JPA for persistence.
- Async Crawling: Annotate the crawl method with @Async (and enable async support with @EnableAsync on a configuration class).
- Authentication: Add JWT or OAuth2 security with Spring Security.
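To make the new-output-format idea concrete, here is a rough sketch of an XML saver. The `DataSaver` interface shown is an assumed shape, since the project's real signature is not documented here. `DomainRow` and the XML element names are likewise hypothetical choices made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical row model used only for this sketch. */
record DomainRow(String tld, boolean freePrivacy, String pricePerYear, String renewalPrice) {}

/** Assumed shape of the project's saver abstraction; the real signature may differ. */
interface DataSaver {
    void save(List<DomainRow> rows, Path target) throws IOException;
}

class XmlSaver implements DataSaver {
    /** Escapes the characters that are special in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders all rows as a small XML document. */
    static String toXml(List<DomainRow> rows) {
        StringBuilder xml = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<domains>\n");
        for (DomainRow row : rows) {
            xml.append("  <domain>\n")
               .append("    <tld>").append(escape(row.tld())).append("</tld>\n")
               .append("    <freeDomainPrivacy>").append(row.freePrivacy())
               .append("</freeDomainPrivacy>\n")
               .append("    <pricePerYear>").append(escape(row.pricePerYear()))
               .append("</pricePerYear>\n")
               .append("    <renewalPrice>").append(escape(row.renewalPrice()))
               .append("</renewalPrice>\n")
               .append("  </domain>\n");
        }
        return xml.append("</domains>\n").toString();
    }

    @Override
    public void save(List<DomainRow> rows, Path target) throws IOException {
        Files.writeString(target, toXml(rows), StandardCharsets.UTF_8);
    }
}
```

Building the XML by hand keeps the sketch free of extra dependencies. For anything beyond flat rows, a library such as Jackson's XML module or JAXB would probably be a safer choice.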
- No Data Crawled: Check and update the selectors, verify that the URL is reachable, and switch to Selenium if the content is rendered by JavaScript.
- API Errors: Check the console logs printed by mvn spring-boot:run.
- Dependency Issues: Ensure Java and Maven are up to date, then re-run mvn clean install.
- This project is licensed under the MIT License. See LICENSE for details.
Contributions are welcome!
- Fork the repo
- Create a new branch
git checkout -b feature/your-feature
- Commit changes
git commit -m "Add your feature"
- Push your branch
git push origin feature/your-feature
- Open a pull request 🚀