This project presents a high-performance, multi-threaded HTTP/1.1 server built entirely from scratch using Python's low-level socket and threading modules. It serves as a comprehensive demonstration of concurrent programming, network protocol handling, and fundamental security practices.
This server requires Python 3.x and relies only on standard library modules.
- Repository Structure: Ensure the following directory structure is present. The `resources` directory serves as the web root for static content.

      project/
      ├── server.py
      └── resources/
          ├── index.html
          ├── about.html
          ├── contact.html
          ├── sample.txt
          ├── large_image.png (> 1MB)
          ├── photo.jpg
          └── uploads/ (Directory for POST results - **must be created and writable**)
- Test Files: The `resources/` directory must be populated with the required test content: three HTML files, two text files, two PNG images (one exceeding 1MB), and two JPEG images.
The server accepts up to three positional command-line arguments, which override the defaults: `<Port> <Host Address> <Max Thread Pool Size>`.
| Argument | Default Value | Description |
|---|---|---|
| Port | 8080 | The TCP port for the server to bind and listen on. |
| Host Address | 127.0.0.1 | The interface address (e.g., 0.0.0.0 for all interfaces). |
| Pool Size | 10 | The maximum number of concurrent threads available to handle clients. |
- Default Run: `python3 server.py` (starts the server on 127.0.0.1:8080 with 10 worker threads)
- Custom Run Example: `python3 server.py 8000 0.0.0.0 20` (starts the server on 0.0.0.0:8000 with 20 worker threads)
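The positional arguments and their defaults could be handled along the following lines. This is a minimal sketch, not the code in `server.py`: the function name `parse_args`, the constant names, and the range checks are assumptions made for illustration.

```python
import sys

# Defaults taken from the argument table; the constant names are hypothetical.
DEFAULT_PORT = 8080
DEFAULT_HOST = "127.0.0.1"
DEFAULT_POOL_SIZE = 10


def parse_args(argv):
    """Return (port, host, pool_size) from up to three positional arguments.

    argv excludes the program name, e.g. sys.argv[1:].
    """
    if len(argv) > 3:
        raise ValueError(
            "expected at most 3 arguments: "
            "<Port> <Host Address> <Max Thread Pool Size>"
        )
    port = int(argv[0]) if len(argv) > 0 else DEFAULT_PORT
    host = argv[1] if len(argv) > 1 else DEFAULT_HOST
    pool_size = int(argv[2]) if len(argv) > 2 else DEFAULT_POOL_SIZE
    # Range checks are an assumption; the README does not specify them.
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    if pool_size < 1:
        raise ValueError("pool size must be at least 1")
    return port, host, pool_size
```

Calling `parse_args(sys.argv[1:])` at startup would yield the values to pass to the socket bind call and the thread pool.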
The server implements the Producer-Consumer pattern using a fixed-size thread pool for concurrent request processing.
- Core Components:
  - `ThreadPool`: The manager responsible for spawning and maintaining a fixed count of `WorkerThread` instances.
  - `WorkerThread` (Consumers): Threads that block on the connection queue, dequeue client sockets, and execute the `RequestHandler` logic.
  - `connection_queue` (Shared Buffer): A global list where the main thread (Producer) places accepted client sockets.
- Synchronization:
  - `threading.Lock` (`queue_lock`): A mutex protecting `connection_queue` during read and write operations, preventing race conditions.
  - `threading.Condition` (`pool_condition`): Used for inter-thread communication. Worker threads call `wait()` when idle. The main thread calls `notify()` when a new client arrives, allowing threads to wake up without busy-waiting.
- Saturation Handling: If the queue exceeds capacity while all threads are busy, the server returns a 503 Service Unavailable response with a `Retry-After` header directly to the client and logs the saturation event.
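The producer-consumer handoff described above can be sketched as follows. The names `connection_queue`, `queue_lock`, and `pool_condition` come from this README; everything else (`MAX_QUEUE_SIZE`, `enqueue_connection`, `worker_loop`, `start_pool`, `build_503`, the `None` shutdown sentinel, the `Retry-After` value of 5, and building the condition on top of `queue_lock`) is an assumption made for illustration and may differ from `server.py`.

```python
import threading

MAX_QUEUE_SIZE = 50  # hypothetical capacity; the README does not state one

connection_queue = []
queue_lock = threading.Lock()
# Assumption: the condition shares queue_lock, so one lock guards the queue.
pool_condition = threading.Condition(queue_lock)


def enqueue_connection(conn):
    """Producer side. Returns False when the queue is full (caller sends 503)."""
    with pool_condition:
        if len(connection_queue) >= MAX_QUEUE_SIZE:
            return False
        connection_queue.append(conn)
        pool_condition.notify()  # wake one idle worker
        return True


def worker_loop(handle):
    """Consumer side. Blocks until a connection is queued, then handles it."""
    while True:
        with pool_condition:
            while not connection_queue:  # loop guards against spurious wakeups
                pool_condition.wait()
            conn = connection_queue.pop(0)
        if conn is None:  # hypothetical shutdown sentinel
            break
        handle(conn)  # run the handler outside the lock


def start_pool(size, handle):
    """Spawn a fixed number of worker threads."""
    workers = [
        threading.Thread(target=worker_loop, args=(handle,), daemon=True)
        for _ in range(size)
    ]
    for worker in workers:
        worker.start()
    return workers


def build_503(retry_after=5):
    """Build a 503 response carrying a Retry-After header."""
    body = b"Service Unavailable"
    head = (
        "HTTP/1.1 503 Service Unavailable\r\n"
        f"Retry-After: {retry_after}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

In an accept loop, the main thread would call `enqueue_connection(client_socket)` and, on a `False` result, send `build_503()` to the client and close the socket.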
The server handles both static HTML rendering and robust binary data streaming with strict adherence to HTTP headers.
- Content Identification and MIME Types:
  - `.html` files are served as `text/html; charset=utf-8`.
  - Files with extensions `.png`, `.jpg`, `.jpeg`, and `.txt` are treated as binary downloads and served with `Content-Type: application/octet-stream`.
- Forced Download Headers:
  - `Content-Disposition: attachment; filename="[filename]"` is explicitly included in binary responses. This instructs the client (browser) to download the file rather than attempting to display or render the content inline.
- Efficient Data Streaming:
  - File I/O uses Python's binary read mode (`'rb'`) to ensure raw byte data integrity.
  - Files are read and sent over the socket in 8192-byte chunks (`BUFFER_SIZE`). Chunking keeps memory usage bounded for large files, because only one chunk is held in memory at a time.
  - `Content-Length` is always set to the exact file size in bytes for accurate transmission and client verification.
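The binary download path described above can be sketched as follows. `BUFFER_SIZE` and the header values come from this README; the function names `build_download_headers` and `stream_file` are hypothetical, and the sketch assumes a blocking socket and a filename that needs no escaping inside the quoted `filename` parameter.

```python
import os

BUFFER_SIZE = 8192  # chunk size named in the README


def build_download_headers(path):
    """Build response headers that force a download of the file at `path`."""
    filename = os.path.basename(path)
    size = os.path.getsize(path)  # exact size in bytes for Content-Length
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {size}\r\n"
        # Assumption: filename contains no quotes or control characters.
        f'Content-Disposition: attachment; filename="{filename}"\r\n'
        "\r\n"
    ).encode("utf-8")


def stream_file(sock, path):
    """Send headers, then the file body in BUFFER_SIZE chunks.

    Returns the number of body bytes sent. Only one chunk is held in
    memory at a time.
    """
    sock.sendall(build_download_headers(path))
    sent = 0
    with open(path, "rb") as f:  # binary mode preserves raw bytes
        while True:
            chunk = f.read(BUFFER_SIZE)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent
```

`sendall()` is used rather than `send()` because `send()` may transmit only part of the buffer, which would leave the body shorter than the advertised `Content-Length`.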
The RequestHandler incorporates two mandatory security validation steps to prevent common network attacks.
- Function: `_safe_path_resolve(request_path)`
- Mechanism: The function uses `os.path.abspath()` to resolve the requested path into its canonical form.
- Strict Validation: The canonicalized path is checked to ensure it begins with the absolute path of the designated `resources` directory. Requests containing malicious sequences like `..` or `./` that attempt to access files outside the server root are blocked, resulting in a 403 Forbidden response.
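The containment check described above can be sketched as follows. This is an illustrative version under stated assumptions, not the body of `_safe_path_resolve` itself: the name `safe_path_resolve`, the `root` parameter, and the `None` return value (which a caller would map to 403 Forbidden) are hypothetical.

```python
import os

# Assumption: the web root is a "resources" directory under the working directory.
RESOURCES_DIR = os.path.abspath("resources")


def safe_path_resolve(request_path, root=RESOURCES_DIR):
    """Return the absolute path for request_path, or None if it escapes root."""
    # Strip leading slashes so os.path.join does not discard the root.
    relative = request_path.lstrip("/")
    candidate = os.path.abspath(os.path.join(root, relative))
    # Compare against root plus a separator so that a sibling directory
    # such as "resources_private" does not pass a bare prefix check.
    if candidate != root and not candidate.startswith(root + os.sep):
        return None
    return candidate
```

Note that `os.path.abspath()` collapses `..` and `.` segments but does not resolve symbolic links; `os.path.realpath()` would be needed if symlinks inside the web root are a concern.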
- Function: `_validate_host()`
- Mechanism: The mandatory `Host` header is checked against the server's configured host/port pair (`[Host:Port]`).
- Validation Rules:
  - Missing Host: Returns 400 Bad Request.
  - Mismatch: Returns 403 Forbidden if the header value does not match the expected server address (e.g., `127.0.0.1:8080`, `localhost:8080`, or the server's configured IP).
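The validation rules above can be sketched as follows. The function name `validate_host`, its parameters, the lowercase-keyed `headers` dictionary, and returning a bare status code are assumptions made for illustration; the actual `_validate_host()` may be structured differently.

```python
def validate_host(headers, server_host, server_port):
    """Return 200 if the Host header is acceptable, 400 if missing, 403 on mismatch.

    Assumes `headers` is a dict whose keys were lowercased during parsing.
    """
    host_value = headers.get("host")
    if not host_value:
        return 400  # Missing Host: Bad Request
    # Accepted values follow the examples given in the README.
    allowed = {
        f"{server_host}:{server_port}",
        f"localhost:{server_port}",
        f"127.0.0.1:{server_port}",
    }
    if host_value.strip().lower() not in allowed:
        return 403  # Mismatch: Forbidden
    return 200
```

For example, with the server configured on `127.0.0.1:8080`, a request carrying `Host: localhost:8080` would pass, while one carrying `Host: evil.example:8080` would be rejected with 403.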
While the server meets all specified requirements, it has inherent limitations due to its low-level, educational implementation:
- Brittle HTTP Parsing: The parser relies on simple string splitting for the request line and headers. It is not resilient against complex, non-standard, or deliberately malformed HTTP payloads, unlike a production-grade library.
- Sequential Keep-Alive Processing: HTTP/1.1 connection persistence is implemented, but all requests within a single persistent connection are handled sequentially by the same thread. The concurrency benefits apply only to simultaneous connections from different clients.
- Basic Timeout Handling: The socket timeout mechanism is limited to detecting connection idleness (no data received). It lacks advanced features for monitoring slow data transmission rates to protect against Denial of Service (DoS) attacks like Slowloris.
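To make the first limitation concrete, split-based parsing of the kind described might look like the sketch below. The function name `parse_request` and its return shape are hypothetical and are not taken from `server.py`.

```python
def parse_request(raw):
    """Parse a raw request head into (method, path, version, headers).

    Raises ValueError when the request line does not have three parts.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line")
    method, path, version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            # Lines without a colon are silently skipped, and duplicate
            # header names overwrite each other: two sources of brittleness.
            continue
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```

A parser like this accepts well-formed requests but performs no validation of the method, the version string, header syntax, or size limits, which is the gap a production-grade library would close.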