Commit 2f3a4fa

Provided GitHub Actions and readme.md. Created tests folder to provide tests in the future
1 parent 781f5db commit 2f3a4fa

5 files changed: +237 -62 lines changed

.github/workflows/build_binary.yml

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+---
+name: Build Binaries (Win + macOS)
+
+on:
+  push: { branches: [main] }
+  pull_request: { branches: [main] }
+  workflow_dispatch:
+  release: { types: [created] }
+
+jobs:
+  build-windows:
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install deps
+        run: |
+          python -m pip install --upgrade pip
+          if (Test-Path requirements.txt) { pip install -r requirements.txt }
+          pip install pyinstaller
+      - name: Build EXE
+        run: |
+          if (Test-Path pyinstaller.spec) {
+            pyinstaller pyinstaller.spec
+          } else {
+            pyinstaller --onefile --name GinioCrawler app.py
+          }
+      - name: Upload artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: GinioCrawler-windows-exe
+          path: dist/*.exe
+
+  build-macos:
+    runs-on: macos-13 # macos-14 is Apple Silicon; 13 is Intel
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install deps
+        run: |
+          python -m pip install --upgrade pip
+          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+          pip install pyinstaller
+      - name: Build .app
+        run: pyinstaller --windowed --name GinioCrawler app.py
+      - name: Create DMG
+        run: hdiutil create -volname GinioCrawler -srcfolder "dist/GinioCrawler.app" -ov -format UDZO "dist/GinioCrawler.dmg"
+      - name: Upload artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: GinioCrawler-macOS
+          path: dist/GinioCrawler.dmg
+
+  attach-on-release:
+    if: github.event_name == 'release'
+    runs-on: ubuntu-latest
+    needs: [build-windows, build-macos]
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          name: GinioCrawler-windows-exe
+          path: dist
+      - uses: actions/download-artifact@v4
+        with:
+          name: GinioCrawler-macOS
+          path: dist
+      - name: Upload to Release
+        uses: softprops/action-gh-release@v2
+        with:
+          files: dist/*

.github/workflows/license.yml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
---
2+
name: Python CI
3+
4+
on:
5+
push: { branches: [ main ] }
6+
pull_request: { branches: [ main ] }
7+
8+
jobs:
9+
ci:
10+
runs-on: ubuntu-latest
11+
steps:
12+
- uses: actions/checkout@v4
13+
14+
- name: Set up Python
15+
uses: actions/setup-python@v5
16+
with:
17+
python-version: '3.11'
18+
19+
- name: Install deps
20+
run: |
21+
python -m pip install --upgrade pip
22+
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
23+
pip install pytest ruff black
24+
25+
- name: Lint (ruff)
26+
run: ruff check .
27+
28+
- name: Format check (black)
29+
run: black --check .
30+
31+
- name: Tests
32+
run: pytest -q

readme.md

Lines changed: 118 additions & 61 deletions
@@ -1,97 +1,154 @@
-# GinioCrawler — documentation
+# GinioCrawler
 
-## What is this about?
+## DESCRIPTION
 
-A small app for finding companies by search phrase (e.g. *“granulate manufacturers Poland”*), fetching their pages and extracting contacts (emails, phone numbers). It saves the results to **CSV** and **XLSX**.
+Small, pragmatic lead-gen helper: type a query → get company contacts → export to Excel/CSV. Built to help small businesses assemble contact lists without manual copy-paste.
 
-## Requirements
+– Uses SerpAPI (Google results)
 
-* Python 3.10+ (dev) / Windows 10+ (EXE)
-* Search API key: **SERPAPI\_KEY** [SERPAPI LINK](https://serpapi.com)
-* Internet access 😅
+– Extracts emails and phone numbers from target pages
 
-## Installation (dev)
+– Exports to .xlsx and .csv
 
-```bash
-python -m venv .venv
-# Win PowerShell:
-.venv\Scripts\Activate.ps1
-# macOS/Linux:
-# source .venv/bin/activate
+– If SERPAPI\_KEY is missing, the app will prompt for it on first run
 
-pip install -r requirements.txt
-# if you use the GUI and key saving:
-pip install python-dotenv pandas openpyxl
-```
+## DEMO
+
+// TODO: provide sample
+
+## FEATURES
+
+– Targeted search via SerpAPI (country/language aware)
+
+– Email and phone extraction from result pages
 
-## Configuring the SERPAPI key
+– Clean Excel/CSV export with consistent columns
 
-You have two options:
+– Simple GUI flow (and basic CLI)
 
-1. **Environment variable**
-Windows (PowerShell):
+– Safety knobs: polite delays and rate limits
 
-```powershell
-setx SERPAPI_KEY "YOUR_KEY"
-```
+## ARCHITECTURE (HIGH LEVEL)
 
-Then restart the terminal/application.
-2. **The GUI saves the key itself** (if you have `ensure_api_key()`):
-On the first run of **app\_gui.py** / the EXE a window pops up → you paste the key → it is saved to
-`%APPDATA%\GinioCrawler\.env`.
+**Query → SerpAPI (Google) → result URLs → fetch and parse → extract contacts → dedupe → export (xlsx/csv)**
 
-## Running — console (CLI)
+## REQUIREMENTS
+
+– Python 3.9+
+
+– SerpAPI account (free tier works): [CLICK](https://serpapi.com)
+
+*Note: no manual env setup required; the app will ask for the key if it’s missing.*
+
+## QUICKSTART — GUI
 
 ```bash
+# 1.
+git clone https://github.com/SculptTechProject/GinioCrawler.git
+# 2.
+cd GinioCrawler
+# 3.
+pip install -r requirements.txt
+# 4. Run your entry script, for example:
+python app_gui.py
+# If SERPAPI_KEY is not set, the app will prompt for it and continue.
+```
+
+## QUICKSTART — CLI
+
+```bash
+# 1.
+git clone https://github.com/SculptTechProject/GinioCrawler.git
+# 2.
+cd GinioCrawler
+# 3.
+pip install -r requirements.txt
+# 4. Make sure SERPAPI_KEY is set, then:
 python main.py
-# enter a phrase, e.g. "SoftwareHouse Warszawa"
 ```
 
-Results are written to:
+## OUTPUT SCHEMA (TYPICAL COLUMNS)
+
+// TODO: provide sample
+
+## GOOD CITIZEN (ETHICS AND LIMITS)
+
+– Respect websites’ robots.txt and Terms of Service
+
+– Keep reasonable rate limits; do not hammer the same domain
+
+– SerpAPI has quotas; heavy usage may require a paid plan
+
+– Use responsibly; this tool is for legitimate contact discovery (no spam)
+
+## TROUBLESHOOTING
 
-* `wyniki/csv/wyniki_YYYYMMDD_HHMMSS.csv`
-* `wyniki/excel/wyniki_YYYYMMDD_HHMMSS.xlsx`
+– Empty results: make the query more specific; check SerpAPI quota; set proper country/lang
 
-Columns: `url, title, emails, phones, contact_url`.
-In `emails` and `phones`, values are separated by a **space**.
+– Slow or blocked: increase delays, lower concurrency, fetch fewer pages
 
-## Running — GUI
+– Excel won’t open: try CSV, or ensure .xlsx is written with a supported library
+
+– Key prompt loops: verify your SerpAPI key and remaining credits
+
+## PACKAGING (DISTRIBUTABLES)
+
+**Windows (.exe):**
 
 ```bash
-python app_gui.py
-```
+pip install pyinstaller
 
-* Enter a phrase.
-* (Optional) click **Wybierz…** ("Choose…") and pick an output folder (`csv/` and `excel/` are created inside it).
-* Click **Start**. When the run finishes, the folder containing the Excel file opens.
+pyinstaller --onefile --name GinioCrawler app.py
 
-## Building the EXE (Windows)
+# Output: dist/GinioCrawler.exe
+```
+
+**macOS (.app / .dmg):**
 
 ```bash
 pip install pyinstaller
-pyinstaller --onefile --windowed --name "GinioCrawler" app_gui.py
-# optional: --icon icon.ico
+
+pyinstaller --windowed --name GinioCrawler app.py
+
+hdiutil create -volname GinioCrawler -srcfolder dist/GinioCrawler.app -ov -format UDZO dist/GinioCrawler.dmg
 ```
 
-You will find the file in `dist/GinioCrawler.exe`. Create a desktop shortcut.
+*Note: unsigned app; users can open via Right-click → Open. (Signing/notarization can be added later in CI.)*
+
+## TESTS (WHAT TO COVER + QUICK START)
+
+Install and run:
+
+pip install pytest
+
+pytest -q
+
+Recommended coverage:
+
+– search/SerpAPI: correct request, pagination, error handling and rate/limit behavior
+
+– fetch: retries with backoff, timeouts, robots.txt respected
+
+– extract: email/phone patterns (various formats), duplicates handling, URL normalization
+
+– export: column order and names, files openable in Excel and CSV
+
+– CLI/UX: missing SERPAPI\_KEY triggers prompt; flag parsing; happy path without real network calls (mocked)
+
+## ROADMAP (SUGGESTED)
+
+– Saved queries and recent exports
+
+– De-duplication across sessions
+
+– Fallback engines and smarter retry strategy
 
-## How it works (technical summary)
+– Better parsing and validation for contacts
 
-* **SerpAPI** returns a list of URLs for the phrase.
-* **httpx + BeautifulSoup** fetches each page and looks for emails, phone numbers and a **Kontakt** (contact) link (depth 1).
-* Respects `robots.txt`.
-* Output: **CSV (UTF-8-SIG)** + **XLSX** (auto column widths, headers, hyperlinks).
-* Separator for multiple emails/phones: **space**.
+– Dockerfile for one-command runs
 
-## Common problems
+## LICENSE
 
-* **“Missing SERPAPI\_KEY”**: set the environment variable, or use the GUI, which saves the key to `.env`.
-* **“ModuleNotFoundError: pandas/openpyxl”**: run `pip install pandas openpyxl`.
-* **Empty results**: the phrase is too general / sites block bots / no contact details on the [www](http://www/) site.
-* **Excel merges the numbers**: in the XLSX the “phones” column is text; if it is not, set the cell format to “Text”.
+MIT 👀️
 
-## Good practices / ethics
 
-* Respect **`robots.txt`** and service limits.
-* Do not flood sites with parallel requests (you can add `httpx.Limits` and `asyncio.Semaphore`).
-* Check sites' terms of service; use the official search engine APIs.
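
The tail of the README's architecture line (extract contacts → dedupe → export) can be sketched in a few lines of standard-library Python. This is only an illustration, not the project's actual code: the function names (`extract_contacts`, `dedupe`, `rows_to_csv`) and both regular expressions are assumptions. The column order and the space separator follow the removed README text (`url, title, emails, phones, contact_url`).

```python
import csv
import io
import re

# Deliberately simple, hypothetical patterns; real contact data needs stricter validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")

# Column order taken from the removed README section.
COLUMNS = ["url", "title", "emails", "phones", "contact_url"]


def dedupe(values):
    """Drop duplicates case-insensitively while keeping first-seen order."""
    seen = set()
    unique = []
    for value in values:
        key = value.lower()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


def extract_contacts(text):
    """Return (emails, phones) found in already-fetched page text."""
    emails = dedupe(EMAIL_RE.findall(text))
    # Strip spaces and hyphens so a phone number never contains the cell separator.
    phones = dedupe(re.sub(r"[ -]", "", match) for match in PHONE_RE.findall(text))
    return emails, phones


def rows_to_csv(rows):
    """Serialize rows to CSV text; multiple emails/phones are joined with a space."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {**row, "emails": " ".join(row["emails"]), "phones": " ".join(row["phones"])}
        )
    return buffer.getvalue()
```

Keeping extraction and export as pure functions over text means they can be tested without network access, which matches the "happy path without real network calls" item in the README's test list.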

requirements.txt

Lines changed: 12 additions & 1 deletion
@@ -1,4 +1,5 @@
 httpx>=0.27
+
 beautifulsoup4>=4.12
 
 lxml>=5.2
@@ -13,4 +14,14 @@ openpyxl
 
 black
 
-pyinstaller
+pyinstaller
+
+pytest-mock
+
+respx
+
+freezegun
+
+dirty-equals
+
+pytest-cov

tests/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+todo: tests
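
The tests folder holds only a placeholder so far. pytest exits with code 5 when it collects no tests, so if the repository has no other tests, the `pytest -q` step in the Python CI workflow would fail until a first test exists. Below is a minimal sketch of what such a test could look like for the README's "fetch: retries with backoff" item. `fetch_with_retry` is a hypothetical helper, not a function from this repository. The fetcher and the sleep function are injected so the test needs neither the network nor real waiting.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url); on ConnectionError/TimeoutError retry with exponential backoff.

    Hypothetical helper: `fetch` and `sleep` are parameters so tests can pass fakes.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)  # 0.5 s, 1 s, 2 s, ...


def test_retries_then_succeeds():
    calls, delays = [], []

    def flaky_fetch(url):
        # Fails twice, then returns a page.
        calls.append(url)
        if len(calls) < 3:
            raise ConnectionError("temporary failure")
        return "<html>ok</html>"

    result = fetch_with_retry(flaky_fetch, "https://example.com", sleep=delays.append)
    assert result == "<html>ok</html>"
    assert len(calls) == 3
    assert delays == [0.5, 1.0]


def test_gives_up_after_max_attempts():
    delays = []

    def always_failing(url):
        raise TimeoutError("no response")

    try:
        fetch_with_retry(always_failing, "https://example.com", attempts=2, sleep=delays.append)
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected TimeoutError")
    assert delays == [0.5]
```

Saved as, for example, `tests/test_fetch.py` (the file name is an assumption), these functions would be picked up by pytest's default discovery rules.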
