🛠️ Implementation Notes

Yohay Ohayon edited this page Apr 8, 2025 · 2 revisions

Environment and Tools

The following technologies and tools were utilized throughout the development of the project:

  • Operating System: Initially developed on Windows. Later stages involved running the project on Ubuntu 24.04.1 LTS via WSL, in accordance with the project requirements. The project is compatible with both operating systems.
  • Programming Language: Python 3.12.3
  • IDE: Development was primarily conducted in PyCharm Community Edition. Later, PyCharm Professional was used to enable seamless integration with WSL.
  • Version Control: GitHub was used for version control. You are in the right place!
  • Dependency Management: All project dependencies are listed in the requirements.txt file, located in the main code directory, following standard Python practices.
  • Packaging: The entire project was bundled into a single executable file using PyInstaller.

🧑‍💻 Coding Standards

  • The codebase follows the PEP 8 Python coding convention.
  • Code style enforcement was supported by the IDE.
  • Docstrings were added to every class and method. Inline comments are minimal or absent; instead, emphasis was placed on writing clean and readable code.

🔐 Concurrency Safety

The application involves multi-threaded execution. To ensure thread safety, the following shared data structures were managed as described:

  • urls_to_visit (Queue):
    This queue is accessed and modified by all threads. The queue.Queue class from Python’s standard library is used, which provides built-in thread safety.

  • visited_urls (Set):
    Used by all crawling threads. Access to this set is synchronized using a lock to ensure that only one thread can read or modify it at a time.

  • broken_urls (List):
    This list is used concurrently by the crawling threads (writers) and the main thread (reader). A lock is used to protect write operations.
    Once the crawling phase is complete, the list becomes read-only and is accessed exclusively by the main thread to generate reports.
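
The arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names crawl, worker, mark_visited, and is_broken are hypothetical, and is_broken is a stand-in for a real HTTP check. To keep the sketch short, workers do not discover new links, so an empty queue is treated as the end of the crawl.

```python
import queue
import threading


def is_broken(url):
    """Hypothetical placeholder for a real HTTP check."""
    return "broken" in url


def crawl(seed_urls, num_threads=4):
    """Process seed_urls with several threads and return the broken ones."""
    urls_to_visit = queue.Queue()      # thread-safe on its own, no extra lock
    visited_urls = set()
    visited_lock = threading.Lock()    # guards every access to visited_urls
    broken_urls = []
    broken_lock = threading.Lock()     # guards writes to broken_urls

    def mark_visited(url):
        # The membership test and the add happen under one lock, so two
        # threads can never both claim the same URL.
        with visited_lock:
            if url in visited_urls:
                return False
            visited_urls.add(url)
            return True

    def worker():
        while True:
            try:
                url = urls_to_visit.get_nowait()
            except queue.Empty:
                return  # assumption: no new URLs are added during the crawl
            if mark_visited(url) and is_broken(url):
                with broken_lock:
                    broken_urls.append(url)

    for url in seed_urls:
        urls_to_visit.put(url)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All writers have finished, so the main thread can read without a lock.
    return sorted(broken_urls)
```

Holding the lock across both the membership check and the add is the important detail for visited_urls: checking and adding in two separately locked steps would still let two threads process the same URL.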
