A simple Python script for web scraping the AMGR Directory website.
This script allows you to search for breeder information in the AMGR Directory using filters such as state, member, and breed type. Results are displayed in JSON format.
- Python 3.6+
- Python modules:
  - requests
  - beautifulsoup4
  - openai (for the Natural Language feature)
- Make sure Python is installed on your system
- Install the required modules:
pip install -r requirements.txt
# For Natural Language feature (optional)
pip install openai

If you want to use the Natural Language feature, you need to set up the OpenAI API key using one of the following methods:
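- Using a .env file: create a file named .env in the same directory as the script, then add:
OPENAI_API_KEY=your-api-key-here
Make sure you have installed python-dotenv:
pip install python-dotenv
- Using an environment variable on Windows (Command Prompt):
set OPENAI_API_KEY=your-api-key-here
- Using an environment variable on macOS/Linux:
export OPENAI_API_KEY=your-api-key-here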
The Natural Language feature will not work if the API key is not available through one of the methods above.
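A minimal sketch of how a script might look up the key, assuming this precedence: a key passed in directly, then a .env file (only if python-dotenv is installed), then the OPENAI_API_KEY environment variable. The helper name `resolve_api_key` and the ordering are assumptions for illustration, not necessarily what mrscraper.py does.

```python
import os
from typing import Optional

try:
    # python-dotenv is optional; without it only real environment variables are used
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def resolve_api_key(explicit_key: Optional[str] = None) -> Optional[str]:
    """Return an OpenAI API key, or None when none is configured (hypothetical helper)."""
    # 1. A key passed in directly (for example from a command-line flag) takes priority
    if explicit_key:
        return explicit_key
    # 2. Copy variables from a .env file into os.environ when python-dotenv is installed;
    #    by default load_dotenv() leaves already-set variables untouched
    if load_dotenv is not None:
        load_dotenv()
    # 3. Read the environment variable, treating an empty value as missing
    return os.environ.get("OPENAI_API_KEY") or None
```

A caller could then disable the Natural Language feature whenever `resolve_api_key()` returns `None`.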
To use interactive mode (easier for beginners), run the script without arguments:
python mrscraper.py

In this mode, the script will:
- Ask if you want to enable debug mode
- Ask if you want to use Natural Language
  - If yes, you can enter commands in natural language
  - The script will analyze your command and translate it into search parameters
- If not using Natural Language, the script will:
  - Ask you to select a state from the list
  - Ask you to select a member from the list
  - Ask you to select a breed from the list
- Perform the search and display the results
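The list-selection steps in interactive mode could be implemented with a small numbered-menu helper like the sketch below. The name `choose_option`, the "blank input means no filter" behavior, and the `input_fn` parameter (added so the prompt can be exercised without a keyboard) are assumptions, not taken from mrscraper.py.

```python
from typing import Callable, Optional, Sequence


def choose_option(
    label: str,
    options: Sequence[str],
    input_fn: Callable[[str], str] = input,
) -> Optional[str]:
    """Show a numbered menu and return the chosen option, or None to skip this filter."""
    print(f"Select a {label}:")
    for number, option in enumerate(options, start=1):
        print(f"  {number}. {option}")
    while True:
        answer = input_fn(f"{label} number (blank for all): ").strip()
        if not answer:
            return None  # no filter for this field
        try:
            index = int(answer)
        except ValueError:
            index = 0  # not a number; treated as out of range
        if 1 <= index <= len(options):
            return options[index - 1]
        print(f"Please enter a number between 1 and {len(options)}.")
```

For example, `choose_option("state", ["Alabama", "Iowa", "Kansas"])` returns "Kansas" when the user types 3, and keeps prompting on invalid input.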
For more experienced users, the script can be run with parameters:
python mrscraper.py [OPTIONS]

Options:
- --state: Filter by state (e.g., "Kansas")
- --member: Filter by member (e.g., "Dwight Elmore")
- --breed: Filter by breed (e.g., "(AR) - American Red")
- --debug: Enable debug mode (saves HTML files in the debug/ folder)

Example:
python mrscraper.py --state "Kansas" --member "Dwight Elmore"

You can also use natural language commands to perform searches:
python mrscraper.py --nl "Find breeders named Elmore in Kansas"

To use this feature, you need to:
- Provide an OpenAI API key:
  - Via parameter: --api-key "your-api-key-here"
  - Or via the environment variable: OPENAI_API_KEY
  - Or enter it interactively when prompted

Example natural language commands:
- "Find breeders in Kansas"
- "Show all breeders with American Red breed"
- "Who is the breeder named Dwight Elmore?"
- "Find breeders in Alabama who have American Black"
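The command-line options described above (--state, --member, --breed, --debug, --nl, --api-key) map naturally onto the standard library's argparse. This is a sketch of one way to declare them; the help texts and the function name `build_parser` are illustrative, and the actual script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line options listed in this README (sketch)."""
    parser = argparse.ArgumentParser(
        prog="mrscraper.py",
        description="Search the AMGR Directory and print the results as JSON.",
    )
    # Filters; each one is optional and defaults to None
    parser.add_argument("--state", help='Filter by state, e.g. "Kansas"')
    parser.add_argument("--member", help='Filter by member, e.g. "Dwight Elmore"')
    parser.add_argument("--breed", help='Filter by breed, e.g. "(AR) - American Red"')
    # Debug flag: False unless given
    parser.add_argument("--debug", action="store_true",
                        help="Save fetched HTML pages in the debug/ folder")
    # Natural Language options
    parser.add_argument("--nl", metavar="QUERY",
                        help="Natural language search command")
    parser.add_argument("--api-key",
                        help="OpenAI API key (stored as args.api_key)")
    return parser
```

argparse turns `--api-key` into the attribute `api_key`. Since the README starts interactive mode when no arguments are given, a script could check `len(sys.argv) == 1` before parsing.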
Output is displayed in JSON format with the following structure:
{
"header": ["Action", "State", "Name", "Farm", "Phone", "Website"],
"data": [
[
"navigate_pagination",
"KS",
"Dwight Elmore",
"3TAC Ranch Genetics - 3TR",
"(620) 899-0770",
""
],
[
"navigate_pagination",
"KS",
"Mary Powell",
"Barnyard Weed Warriors - BWW",
"(785) 420-0472",
""
]
]
}

If you encounter problems, enable debug mode:
python mrscraper.py --debug

Debug files will be saved in the debug/ folder:
- main_page.html: Main page HTML
- response.html: Search results HTML
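Debug mode writes the fetched pages to disk. A minimal sketch of that step, assuming the HTML is already available as a string (for example `response.text` from a requests response). The folder and file names come from this README; the helper name `save_debug_html`, its `enabled` switch, and the UTF-8 encoding are assumptions.

```python
from pathlib import Path
from typing import Optional


def save_debug_html(filename: str, html: str, enabled: bool,
                    debug_dir: str = "debug") -> Optional[Path]:
    """Save one fetched page for inspection when debug mode is on (sketch)."""
    if not enabled:
        return None  # debug mode off: write nothing
    folder = Path(debug_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create debug/ on first use
    path = folder / filename
    path.write_text(html, encoding="utf-8")
    return path
```

A call such as `save_debug_html("main_page.html", response.text, enabled=debug_enabled)` would produce debug/main_page.html, where `debug_enabled` is whatever flag the script uses for debug mode.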
- The script submits the form using the field names stateID, memberID, and breedID
- The script has flexible form element detection mechanisms to handle page structure changes
- The Natural Language feature uses OpenAI's gpt-4o-mini model
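One way to combine the form field names with flexible element detection is sketched below. It assumes the directory page exposes each filter as a `<select>` element identified by either its `name` or its `id` attribute, reads the options, and translates a label such as "Kansas" into the option value to submit. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; `SelectOptionParser` and `build_payload` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Field names from this README; the surrounding HTML structure is an assumption.
FIELD_NAMES = ("stateID", "memberID", "breedID")


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs for each <select>, keyed by its name or id."""

    def __init__(self) -> None:
        super().__init__()
        self.selects: Dict[str, List[Tuple[str, str]]] = {}
        self._select: Optional[str] = None
        self._in_option = False
        self._value: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "select":
            if self._in_option:  # previous <select> was never closed
                self._finish_option()
            # Fall back to the id attribute when name is missing
            self._select = attr.get("name") or attr.get("id")
            if self._select:
                self.selects.setdefault(self._select, [])
        elif tag == "option" and self._select:
            if self._in_option:  # previous <option> was never closed
                self._finish_option()
            self._in_option = True
            self._value = attr.get("value")
            self._text = []

    def handle_data(self, data):
        if self._in_option:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            self._finish_option()
        elif tag == "select":
            if self._in_option:
                self._finish_option()
            self._select = None

    def _finish_option(self) -> None:
        label = " ".join("".join(self._text).split())  # collapse whitespace
        # Browsers submit the option text when there is no value attribute
        value = self._value if self._value is not None else label
        self.selects[self._select].append((value, label))
        self._in_option = False


def build_payload(html: str, state: Optional[str] = None,
                  member: Optional[str] = None,
                  breed: Optional[str] = None) -> Dict[str, str]:
    """Translate labels such as "Kansas" into the option values to submit."""
    parser = SelectOptionParser()
    parser.feed(html)
    parser.close()
    payload: Dict[str, str] = {}
    for field, wanted in zip(FIELD_NAMES, (state, member, breed)):
        if wanted is None:
            continue  # filter not requested
        for value, label in parser.selects.get(field, []):
            if label.casefold() == wanted.strip().casefold():
                payload[field] = value
                break
        else:
            raise ValueError(f"{wanted!r} is not an option of {field!r}")
    return payload
```

The resulting dictionary could be sent with `requests.post(url, data=payload)`; the form's actual URL and method are not documented here. Raising `ValueError` for an unknown label is one design choice. A scraper aiming to tolerate invalid parameters might log the problem and skip that filter instead.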
The test_scraper.py script provides automated testing to verify scraper accuracy. This feature allows you to ensure that the scraper works correctly and produces expected output.
python test_scraper.py

The test script includes 8 different test cases:
- Search by state - Tests search capability by state (Kansas)
- Search by member - Tests search capability by breeder name (Dwight Elmore)
- Search by breed - Tests search capability by livestock type
- Combined parameter search - Tests search capability with combination of state and breed (Iowa and Savanna)
- Natural Language search - Tests ability to convert natural language queries to search parameters
- Complex NL search - Tests ability to convert complex queries like "Find breeders in IOWA with American Savanna type"
- Invalid parameter search - Tests resilience against invalid parameters
- Error handling - Tests error handling when connection problems occur
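The last two test cases could look something like the following sketch. Because the scraper's internal function names are not documented here, `search_directory` and its injectable `fetch` callable are hypothetical. Injecting the fetcher lets a test simulate a connection failure or an empty result without touching the network.

```python
import unittest
from typing import Callable, Dict, List

HEADER = ["Action", "State", "Name", "Farm", "Phone", "Website"]


def search_directory(fetch: Callable[[Dict[str, str]], List[list]],
                     params: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical wrapper: run a search, reporting connection failures instead of crashing."""
    try:
        rows = fetch(params)
    except ConnectionError as exc:
        # Keep the usual output shape so callers can still print JSON
        return {"header": HEADER, "data": [], "error": str(exc)}
    return {"header": HEADER, "data": rows}


class ResilienceTests(unittest.TestCase):
    def test_connection_problem_is_reported(self):
        def failing_fetch(params):
            raise ConnectionError("connection refused")

        result = search_directory(failing_fetch, {"state": "Kansas"})
        self.assertEqual(result["data"], [])
        self.assertIn("connection refused", result["error"])

    def test_invalid_state_returns_no_rows(self):
        # A fetcher that finds nothing for an unknown state
        result = search_directory(lambda params: [], {"state": "Atlantis"})
        self.assertEqual(result["header"], HEADER)
        self.assertEqual(result["data"], [])
        self.assertNotIn("error", result)
```

If the fetcher is built on requests, note that `requests.exceptions.ConnectionError` derives from `requests.exceptions.RequestException` (an `IOError` subclass), not from the built-in `ConnectionError`, so a real wrapper would catch `requests.exceptions.RequestException` instead.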
Test results are saved in the test_results/ folder:
- Individual files for each test case (e.g., test_01_search_by_state.json)
- Summary file with overall statistics (summary_TIMESTAMP.json)
Each test result file contains:
- Search parameters used
- Execution time
- Sample result data
- Expectations vs actual results
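A minimal sketch of how a test runner might time one search and write a result file with most of the fields used in the example test output. It assumes the search function returns a dict with "header" and "data" keys, like the script's JSON output. `run_and_record`, its parameters, and the success rule (expected header plus at least one row) are assumptions, and the expected_output section is left out for brevity.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, List


def run_and_record(test_name: str, query_params: Dict[str, str],
                   search: Callable[..., Dict[str, list]],
                   expected_header: List[str],
                   out_dir: str = "test_results",
                   sample_size: int = 1) -> Dict[str, object]:
    """Time one search and save a JSON result file for it (sketch)."""
    started = time.perf_counter()
    result = search(**query_params)
    elapsed = round(time.perf_counter() - started, 2)  # seconds

    rows = result.get("data", [])
    record = {
        "test_name": test_name,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "query_params": query_params,
        "execution_time": elapsed,
        "result_count": len(rows),
        "header": result.get("header", []),
        "sample_data": rows[:sample_size],
        # Hypothetical success rule: expected header and at least one row
        "test_success": result.get("header") == expected_header and len(rows) > 0,
    }
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{test_name}.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```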
Example test output:
{
"test_name": "test_04_combined_search",
"timestamp": "2025-05-17 05:07:32",
"query_params": {
"state": "Iowa",
"breed": "(SA) - Savanna"
},
"execution_time": 0.87,
"result_count": 6,
"header": ["Action", "State", "Name", "Farm", "Phone", "Website"],
"sample_data": [
[
"navigate_pagination",
"IA",
"Steve & Syrie Vicary",
"Vicary Savanna Goats - VSG",
"(402) 203-2165",
""
]
],
"test_success": true,
"expected_output": {
"header": ["Action", "State", "Name", "Farm", "Phone", "Website"],
"data": [
[
"navigate_pagination",
"IA",
"Dennis & Stacy Ratashak",
"Ratashak Harvest Hills - RHH",
"(703) 850-4113",
""
]
]
}
}

- This script is designed for educational purposes only
- Use wisely and respect the policies of the accessed website
- Using the Natural Language feature requires a valid OpenAI API key