Commit 2949510

Merge pull request #1286 from benbusby/updates
Updates, Features, and Bugfixes; Oh My!
2 parents b3c09ad + 255f1a2 commit 2949510

15 files changed: +1150 -108 lines changed

README.md

Lines changed: 104 additions & 7 deletions
@@ -40,8 +40,9 @@ Contents
     1. [Arch/AUR](#arch-linux--arch-based-distributions)
     1. [Helm/Kubernetes](#helm-chart-for-kubernetes)
 4. [Environment Variables and Configuration](#environment-variables)
-5. [Usage](#usage)
-6. [Extra Steps](#extra-steps)
+5. [Google Custom Search (BYOK)](#google-custom-search-byok)
+6. [Usage](#usage)
+7. [Extra Steps](#extra-steps)
     1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
     2. [Custom Redirecting](#custom-redirecting)
     2. [Custom Bangs](#custom-bangs)
@@ -50,10 +51,10 @@ Contents
     5. [Using with Firefox Containers](#using-with-firefox-containers)
     6. [Reverse Proxying](#reverse-proxying)
         1. [Nginx](#nginx)
-7. [Contributing](#contributing)
-8. [FAQ](#faq)
-9. [Public Instances](#public-instances)
-10. [Screenshots](#screenshots)
+8. [Contributing](#contributing)
+9. [FAQ](#faq)
+10. [Public Instances](#public-instances)
+11. [Screenshots](#screenshots)
 
 ## Features
 - No ads or sponsored content
@@ -475,7 +476,6 @@ There are a few optional environment variables available for customizing a Whoog
 | WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
 | WHOOGLE_MINIMAL | Remove everything except basic result cards from all search queries. |
 | WHOOGLE_CSP | Sets a default set of 'Content-Security-Policy' headers |
-| WHOOGLE_RESULTS_PER_PAGE | Set the number of results per page |
 | WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
 | WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
 | WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf. WHOOGLE_TOR_USE_PASS must be 1 for this to work. |
@@ -512,6 +512,103 @@ These environment variables allow setting default config values, but can be over
 | WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
 | WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
 
+### Google Custom Search (BYOK) Environment Variables
+
+These environment variables configure the "Bring Your Own Key" feature for Google Custom Search API:
+
+| Variable | Description |
+| -------------------- | ----------------------------------------------------------------------------------------- |
+| WHOOGLE_CSE_API_KEY | Your Google API key with Custom Search API enabled |
+| WHOOGLE_CSE_ID | Your Custom Search Engine ID (cx parameter) |
+| WHOOGLE_USE_CSE | Enable Custom Search API by default (set to '1' to enable) |
+
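As a rough illustration of how the three variables in the table above could be consumed, here is a minimal Python sketch. The `CseSettings` dataclass and the `load_cse_settings` helper are hypothetical names invented for this example and are not taken from this commit; only the variable names and the '1' convention come from the README text.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CseSettings:
    """Hypothetical container for the BYOK settings (not part of Whoogle)."""
    api_key: str
    cse_id: str
    enabled: bool


def load_cse_settings(env: Mapping[str, str] = os.environ) -> CseSettings:
    """Read the BYOK variables from a mapping such as os.environ."""
    api_key = env.get('WHOOGLE_CSE_API_KEY', '').strip()
    cse_id = env.get('WHOOGLE_CSE_ID', '').strip()
    # Assumption: the feature is only usable when the flag is '1'
    # and both credentials are present.
    flag_set = env.get('WHOOGLE_USE_CSE', '') == '1'
    enabled = flag_set and bool(api_key) and bool(cse_id)
    return CseSettings(api_key=api_key, cse_id=cse_id, enabled=enabled)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.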
+## Google Custom Search (BYOK)
+
+If Google blocks traditional search scraping (captchas, IP bans), you can use your own Google Custom Search Engine credentials as a fallback. This uses Google's official API with your own quota.
+
+### Why Use This?
+
+- **Reliability**: The official API is not subject to captchas or IP bans, and is not rate-limited while you stay within quota
+- **Speed**: Direct JSON responses are faster than HTML scraping
+- **Fallback**: Works when all scraping workarounds fail
+- **Privacy**: Your searches still don't go through third parties; they go directly to Google with your own API key
+
+### Limitations vs Standard Whoogle
+
+| Feature | Standard Scraping | CSE API |
+|------------------|--------------------------|---------------------|
+| Daily limit | None (until blocked) | 100 free, then paid |
+| Image search | ✅ Full support | ✅ Supported |
+| News/Videos tabs | ✅ | ❌ Web results only |
+| Speed | Slower (HTML parsing) | Faster (JSON) |
+| Reliability | Can be blocked | Not blocked (within quota) |
+
+### Setup Steps
+
+#### 1. Create a Custom Search Engine
+1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all)
+2. Click **"Add"** to create a new search engine
+3. Under "What to search?", select **"Search the entire web"**
+4. Give it a name (e.g., "My Whoogle CSE")
+5. Click **"Create"**
+6. Copy your **Search Engine ID**
+
+#### 2. Get an API Key
+1. Go to [Google Cloud Console](https://console.cloud.google.com/)
+2. Create a new project or select an existing one
+3. Go to **APIs & Services** → **Library**
+4. Search for **"Custom Search API"** and click **Enable**
+5. Go to **APIs & Services** → **Credentials**
+6. Click **"Create Credentials"** → **"API Key"**
+7. Copy your API key (looks like `AIza...`)
+
+#### 3. (Recommended) Restrict Your API Key
+To prevent misuse if your key is exposed:
+1. Click on your API key in Credentials
+2. Under **"API restrictions"**, select **"Restrict key"**
+3. Choose only **"Custom Search API"**
+4. Under **"Application restrictions"**, consider adding IP restrictions if using on a server
+5. Click **Save**
+
+#### 4. Configure Whoogle
+
+**Option A: Via Settings UI**
+1. Open your Whoogle instance
+2. Click the **Config** button
+3. Scroll to "Google Custom Search (BYOK)" section
+4. Enter your API Key and CSE ID
+5. Check "Use Custom Search API"
+6. Click **Apply**
+
+**Option B: Via Environment Variables**
+```bash
+WHOOGLE_CSE_API_KEY=AIza...
+WHOOGLE_CSE_ID=23f...
+WHOOGLE_USE_CSE=1
+```
+
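Once the key and engine ID are configured, it can help to check them outside Whoogle. The sketch below only builds a request URL for Google's Custom Search JSON API, using its `key`, `cx`, and `q` query parameters; the helper name `build_cse_url` is hypothetical and this is not how the commit itself issues requests.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
CSE_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'


def build_cse_url(api_key: str, cse_id: str, query: str) -> str:
    """Build a Custom Search JSON API request URL for a single query."""
    if not api_key or not cse_id:
        raise ValueError('Both an API key and a CSE ID are required')
    # 'key' is the API key, 'cx' the search engine ID, 'q' the search terms.
    params = {'key': api_key, 'cx': cse_id, 'q': query}
    return f'{CSE_ENDPOINT}?{urlencode(params)}'
```

Fetching the resulting URL with real credentials should return a JSON document whose `items` list holds the results. Each such request counts against the daily quota.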
+### Pricing & Avoiding Charges
+
+| Tier | Queries | Cost |
+|------|------------------|-----------------------|
+| Free | 100/day | $0 |
+| Paid | Up to 10,000/day | $5 per 1,000 queries |
+
+**⚠️ To avoid unexpected charges:**
+
+1. **Don't add a payment method** to Google Cloud (safest option: the API stops at 100/day)
+2. **Set a billing budget alert**: [Billing → Budgets & Alerts](https://console.cloud.google.com/billing/budgets)
+3. **Cap API usage**: APIs & Services → Custom Search API → Quotas → Set "Queries per day" to 100
+4. **Monitor usage**: APIs & Services → Custom Search API → Metrics
+
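To make the pricing table concrete, here is a small back-of-the-envelope estimator. It is a sketch under stated assumptions, not Google's billing logic: it assumes the first 100 queries each day stay free when billing is enabled, that the 10,000/day figure is a cap on total queries, and that cost is prorated per query. The function name `estimate_daily_cost` is made up for this example.

```python
FREE_QUERIES_PER_DAY = 100      # free tier from the table above
PAID_CAP_PER_DAY = 10_000       # assumed to cap total daily queries
USD_PER_1000_QUERIES = 5.0      # paid tier rate from the table above


def estimate_daily_cost(queries: int, billing_enabled: bool = True) -> float:
    """Estimate one day's cost in USD for a given number of queries."""
    if queries < 0:
        raise ValueError('queries must be non-negative')
    if not billing_enabled:
        # Without a payment method the API stops at the free limit.
        return 0.0
    served = min(queries, PAID_CAP_PER_DAY)
    billable = max(0, served - FREE_QUERIES_PER_DAY)
    return billable * USD_PER_1000_QUERIES / 1000
```

Under these assumptions, 1,100 queries in a day would cost about $5, and staying at or below 100 queries costs nothing.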
+### Troubleshooting
+
+| Error | Cause | Solution |
+|---------------------|---------------------------|-----------------------------------------------------------------|
+| "API key not valid" | Invalid or restricted key | Check key in Cloud Console, ensure Custom Search API is enabled |
+| "Quota exceeded" | Hit 100/day limit | Wait until midnight PT, or enable billing |
+| "Invalid CSE ID" | Wrong cx parameter | Copy ID from Programmable Search Engine control panel |
+
 ## Usage
 Same as most search engines, with the exception of filtering by time range.
 
app/__init__.py

Lines changed: 53 additions & 15 deletions
@@ -12,19 +12,19 @@
 import json
 import logging.config
 import os
+import sys
 from stem import Signal
 import threading
 import warnings
 
 from werkzeug.middleware.proxy_fix import ProxyFix
 
-from app.utils.misc import read_config_bool
 from app.services.http_client import HttpxClient
 from app.services.provider import close_all_clients
 from app.version import __version__
 
-app = Flask(__name__, static_folder=os.path.dirname(
-    os.path.abspath(__file__)) + '/static')
+app = Flask(__name__, static_folder=os.path.join(
+    os.path.dirname(os.path.abspath(__file__)), 'static'))
 
 app.wsgi_app = ProxyFix(app.wsgi_app)
 
@@ -76,7 +76,10 @@
 app.config['SESSION_FILE_DIR'] = os.path.join(
     app.config['CONFIG_PATH'],
     'session')
-app.config['MAX_SESSION_SIZE'] = 4000  # Sessions won't exceed 4KB
+# Maximum session file size in bytes (4KB limit to prevent abuse and disk exhaustion)
+# Session files larger than this are ignored during cleanup to avoid processing
+# potentially malicious or corrupted files
+app.config['MAX_SESSION_SIZE'] = 4000
 app.config['BANG_PATH'] = os.getenv(
     'CONFIG_VOLUME',
     os.path.join(app.config['STATIC_FOLDER'], 'bangs'))
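The new comment on `MAX_SESSION_SIZE` says oversized session files are ignored during cleanup. The cleanup code is not part of this hunk, so the following is only a sketch of what such a size filter might look like; `cleanup_candidates` and `demo` are hypothetical names, and the real routine may differ.

```python
import os
import tempfile

MAX_SESSION_SIZE = 4000  # bytes, same value as in the hunk above


def cleanup_candidates(session_dir: str, max_size: int = MAX_SESSION_SIZE) -> list[str]:
    """Return names of regular files in session_dir that are at most max_size bytes."""
    names = []
    for name in sorted(os.listdir(session_dir)):
        path = os.path.join(session_dir, name)
        # Oversized files are skipped rather than opened and parsed.
        if os.path.isfile(path) and os.path.getsize(path) <= max_size:
            names.append(name)
    return names


def demo() -> list[str]:
    """Create one small and one oversized file, then list the candidates."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'small.session'), 'wb') as f:
            f.write(b'x' * 100)
        with open(os.path.join(tmp, 'large.session'), 'wb') as f:
            f.write(b'x' * (MAX_SESSION_SIZE + 1))
        return cleanup_candidates(tmp)
```

Checking the size with `os.path.getsize` before opening a file means a large or corrupted file is never read into memory.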
@@ -118,18 +121,53 @@ def _teardown_clients(exception):
     print(f"Warning: Could not initialize UA pool: {e}")
     app.config['UA_POOL'] = []
 
-# Session values
-app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
-if os.path.exists(app_key_path):
+# Session values - Secret key management
+# Priority: environment variable → file → generate new
+def get_secret_key():
+    """Load or generate secret key with validation.
+
+    Priority order:
+    1. WHOOGLE_SECRET_KEY environment variable
+    2. Existing key file
+    3. Generate new key and save to file
+
+    Returns:
+        str: Valid secret key for Flask sessions
+    """
+    # Check environment variable first
+    env_key = os.getenv('WHOOGLE_SECRET_KEY', '').strip()
+    if env_key:
+        # Validate env key has minimum length
+        if len(env_key) >= 32:
+            return env_key
+        else:
+            print(f"Warning: WHOOGLE_SECRET_KEY too short ({len(env_key)} chars, need 32+). Using file/generated key instead.", file=sys.stderr)
+
+    # Check file-based key
+    app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
+    if os.path.exists(app_key_path):
+        try:
+            with open(app_key_path, 'r', encoding='utf-8') as f:
+                key = f.read().strip()
+                # Validate file key
+                if len(key) >= 32:
+                    return key
+                else:
+                    print(f"Warning: Key file too short, regenerating", file=sys.stderr)
+        except (PermissionError, IOError) as e:
+            print(f"Warning: Could not read key file: {e}", file=sys.stderr)
+
+    # Generate new key
+    new_key = str(b64encode(os.urandom(32)))
     try:
-        with open(app_key_path, 'r', encoding='utf-8') as f:
-            app.config['SECRET_KEY'] = f.read()
-    except PermissionError:
-        app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
-else:
-    app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
-    with open(app_key_path, 'w', encoding='utf-8') as key_file:
-        key_file.write(app.config['SECRET_KEY'])
+        with open(app_key_path, 'w', encoding='utf-8') as key_file:
+            key_file.write(new_key)
+    except (PermissionError, IOError) as e:
+        print(f"Warning: Could not save key file: {e}. Key will not persist across restarts.", file=sys.stderr)
+
+    return new_key
+
+app.config['SECRET_KEY'] = get_secret_key()
 app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(days=365)
 
 # NOTE: SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
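Since `get_secret_key()` rejects a `WHOOGLE_SECRET_KEY` shorter than 32 characters after stripping whitespace, an operator needs a way to produce a suitable value. The sketch below is one possible approach using Python's standard `secrets` module; the helper names `is_acceptable_key` and `generate_env_key` are hypothetical and do not appear in the commit.

```python
import secrets

MIN_KEY_LENGTH = 32  # same threshold as the len(...) >= 32 checks in the hunk above


def is_acceptable_key(candidate: str) -> bool:
    """Mirror the length check: strip whitespace, then require 32+ characters."""
    return len(candidate.strip()) >= MIN_KEY_LENGTH


def generate_env_key() -> str:
    """Generate a random URL-safe string suitable for WHOOGLE_SECRET_KEY."""
    # token_urlsafe(32) encodes 32 random bytes as 43 Base64 characters.
    return secrets.token_urlsafe(32)
```

A value from `generate_env_key()` could then be exported as `WHOOGLE_SECRET_KEY` so that sessions survive restarts even when the key file cannot be written.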
