|
| 1 | +--- |
| 2 | +title: Week 6 |
| 3 | +author: Vaibhav Sahu |
| 4 | +tags: [gsoc25, OSSelot] |
| 5 | +--- |
| 6 | + |
| 7 | +<!-- |
| 8 | +SPDX-License-Identifier: CC-BY-SA-4.0 |
| 9 | +SPDX-FileCopyrightText: 2025 Vaibhav Sahu <sahusv4527@gmail.com> |
| 10 | +--> |
| 11 | + |
| 12 | +# Week 6 |
| 13 | + |
| 14 | +*(July 8, 2025 – July 14, 2025)* |
| 15 | + |
| 16 | +## Meeting 6 |
| 17 | + |
| 18 | +*(July 11, 2025)* |
| 19 | + |
| 20 | +### Attendees |
| 21 | + |
| 22 | +* [Vaibhav Sahu](https://github.com/Vaibhavsahu2810) |
| 23 | +* [Jan Altenberg](https://github.com/JanAltenberg) |
| 24 | + |
| 25 | +### Discussions |
| 26 | + |
| 27 | +* Demonstrated the working prototype of parallelized `ScanCode` scanning within FOSSology. |
| 28 | +* Discussed runtime behavior with multiple workers and how resource constraints are enforced. |
| 29 | +* Reviewed performance benchmarks from busybox test uploads. |
| 30 | +* Collected feedback on the user-configurable CLI arguments and their integration, and on several bugs found during testing. |
| 31 | + |
| 32 | +## Updates |
| 33 | + |
| 34 | +* **Parallel ScanCode implementation completed** |
| 35 | + |
| 36 | + * Fully implemented multiprocessing support in `runscanonfiles.py`. |
| 37 | + * Introduced CLI parameters for customization: |
| 38 | + parser.add_argument("--parallel", type=int, default=1, help="Number of parallel processes (will be adjusted based on available memory)") |
| 39 | + parser.add_argument("--nice-level", type=int, default=10, help="Process nice level (0-19)") |
| 40 | + parser.add_argument("--max-tasks", type=int, default=1000, help="Max tasks per worker process") |
| 41 | + parser.add_argument("--heartbeat-interval", type=int, default=60, help="Heartbeat interval in seconds") |
| 42 | + * Enforced memory and CPU resource limits for each worker process using OS-level controls. |
| 43 | + |
| 44 | +* **Performance testing** |
| 45 | + |
| 46 | + * Tested using `busybox-1.36.1.tar.bz2` uploads. |
| 47 | + * Scanning time dropped from **\~12 minutes to \~4 minutes** with 4 parallel processes, roughly a 3× speedup. |
| 48 | + * Below are side-by-side comparisons: |
| 49 | + |
| 50 | + **Single-process run** |
| 51 | +  |
| 52 | + |
| 53 | + **Parallelized run (4 processes)** |
| 54 | +  |
| 55 | + |
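The CLI options and per-worker limits described under "Parallel ScanCode implementation completed" could be wired together roughly as follows. This is only a sketch, not the actual code in `runscanonfiles.py`: the report does not say which OS-level controls are used, so `os.nice` and `resource.setrlimit(RLIMIT_AS)` are assumptions, and `init_worker`, `scan_file`, `run_scans`, and the 4 GiB default are hypothetical names and values chosen for illustration.

```python
import argparse
import multiprocessing
import os
import resource


def build_parser():
    # The four options mirror the ones listed in this report.
    parser = argparse.ArgumentParser(description="Parallel scan sketch")
    parser.add_argument("--parallel", type=int, default=1,
                        help="Number of parallel processes")
    parser.add_argument("--nice-level", type=int, default=10,
                        help="Process nice level (0-19)")
    parser.add_argument("--max-tasks", type=int, default=1000,
                        help="Max tasks per worker process")
    parser.add_argument("--heartbeat-interval", type=int, default=60,
                        help="Heartbeat interval in seconds")
    return parser


def init_worker(nice_level, mem_limit_bytes):
    # Assumption: niceness plus an address-space rlimit stand in for the
    # "OS-level controls" mentioned in the report.
    os.nice(nice_level)  # adds nice_level to the inherited niceness
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        mem_limit_bytes = min(mem_limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (mem_limit_bytes, hard))


def scan_file(path):
    # Hypothetical placeholder for the real per-file scan.
    return path, len(path)


def run_scans(paths, args, mem_limit_bytes=4 * 1024**3):
    # "fork" is requested explicitly; this sketch is Unix-only anyway
    # because it relies on the resource module.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=args.parallel,
                  initializer=init_worker,
                  initargs=(args.nice_level, mem_limit_bytes),
                  maxtasksperchild=args.max_tasks) as pool:
        return pool.map(scan_file, paths)
```

Passing `maxtasksperchild` makes the pool replace a worker after it has handled that many tasks, which is one plausible reading of `--max-tasks`. The heartbeat interval is parsed here but not used, since the report does not describe how heartbeats are sent.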
| 56 | +## Plan for Next Week |
| 57 | + |
| 58 | +1. Avoid starting worker processes if available memory is too low. Ensure workers shut down gracefully if they cannot obtain sufficient memory (without hard memory limits). |
| 59 | +2. Implement proper cleanup to ensure all worker processes terminate when the user cancels the agent. |
| 60 | +3. Reduce the default number of parallel jobs to a safer baseline. |
| 61 | +4. Finalize and verify heartbeat handling for worker monitoring. |
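Items 1 and 3 of the plan amount to capping the requested worker count by the memory that is actually available. A minimal sketch of that idea is shown below, assuming a Linux host where `/proc/meminfo` exposes `MemAvailable`. The function names and the per-worker budget are hypothetical and are not taken from the agent's code.

```python
def read_mem_available(path="/proc/meminfo"):
    # Linux-specific: MemAvailable is reported in kB (1024-byte units).
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024
    return None  # field not present on this system


def cap_workers(requested, available_bytes, per_worker_bytes):
    # Unknown memory: fall back to a single process rather than guess.
    if available_bytes is None:
        return 1
    affordable = available_bytes // per_worker_bytes
    # A result of 0 means no worker should be started at all.
    return max(0, min(requested, affordable))
```

For example, `cap_workers(4, read_mem_available(), 1024**3)` would start at most four workers and fewer when less than 4 GiB is available, given an assumed budget of 1 GiB per worker.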
| 62 | + |
| 63 | + |