Commit dc6a35f

Add GSoC 2025 report
Signed-off-by: Varsha U N <[email protected]>
1 parent 9bd7f14 commit dc6a35f

3 files changed

+95
-0
lines changed


docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ GSoC 2025
 .. toctree::
    :maxdepth: 2

+   gsoc/reports/2025/scancodeio_varsha
    gsoc/reports/2025/scancodeio_aayush
    gsoc/reports/2025/scancodeio_manit
    gsoc/reports/2025/scancode_toolkit_alok
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
=====================================================
Adding ability to store/query downloaded packages
=====================================================

**Organization:** `AboutCode <https://aboutcode.org>`_

**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`_


| **Varsha U N**
| GitHub: `VarshaUN <https://github.com/VarshaUN>`_
| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_


**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`_
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_

Overview
--------

Prior to this project, ScanCode.io scanned packages but did not store them.
This made it difficult for users to keep a reference copy of the packages used in their projects,
meet source redistribution obligations, or revisit scanned packages later.

This project enhanced ScanCode.io by adding the ability to store and query downloaded packages locally
and to reuse packages that were already scanned.

--------------------------------------------------------------------------------

Implementation
--------------

Previously, ScanCode.io downloaded packages but did not store them. The new archiving system stores downloaded packages on the local filesystem and allows them to be queried.

.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
   :alt: Project Flow Diagram
   :align: center
   :width: 70%

The project involved the following key components and steps:

Storage System Development:

- Created a ``DownloadStore`` abstract base class in ``archiving.py`` to define the interface for managing package content and metadata storage.

- Built the ``LocalFilesystemProvider`` class to store downloads on the local filesystem, using a SHA256-based nested directory structure.

- Implemented methods for storing (``put``), retrieving (``get``), listing (``list``), and searching (``find``) downloads, with metadata saved in ``origin-<hash>.json`` files.

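The storage design above can be illustrated with a minimal sketch. It is not the code from the pull request: the method signatures, the ``content`` file name, the two-level directory split, and the choice to derive ``<hash>`` in ``origin-<hash>.json`` from the download URL and filename are all assumptions made for illustration. Duplicate-origin handling is left out.

```python
import hashlib
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional


class DownloadStore(ABC):
    """Interface sketch; the real signatures in archiving.py may differ."""

    @abstractmethod
    def put(self, content: bytes, download_url: str,
            download_date: str, filename: str) -> str:
        """Store content plus origin metadata; return the content SHA256."""

    @abstractmethod
    def get(self, sha256: str) -> Optional[bytes]:
        """Return the stored content, or None if the hash is unknown."""

    @abstractmethod
    def list(self) -> List[str]:
        """Return the SHA256 of every stored download."""

    @abstractmethod
    def find(self, download_url: str) -> List[Dict[str, str]]:
        """Return origin records whose download_url matches."""


class LocalFilesystemProvider(DownloadStore):
    def __init__(self, root: Path):
        self.root = Path(root)

    def _dir_for(self, sha256: str) -> Path:
        # Assumed nesting: <root>/<first 2 hex chars>/<next 2>/<remainder>
        return self.root / sha256[:2] / sha256[2:4] / sha256[4:]

    def put(self, content, download_url, download_date, filename):
        sha256 = hashlib.sha256(content).hexdigest()
        target = self._dir_for(sha256)
        target.mkdir(parents=True, exist_ok=True)
        content_path = target / "content"
        if not content_path.exists():  # identical content is written once
            content_path.write_bytes(content)
        # Assumption: <hash> in origin-<hash>.json identifies the origin.
        origin_key = f"{download_url}|{filename}".encode()
        origin_hash = hashlib.sha256(origin_key).hexdigest()
        origin = {
            "sha256": sha256,
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        }
        (target / f"origin-{origin_hash}.json").write_text(json.dumps(origin))
        return sha256

    def get(self, sha256):
        path = self._dir_for(sha256) / "content"
        return path.read_bytes() if path.exists() else None

    def _origins(self):
        # Walk the three nesting levels and load every origin record.
        for path in sorted(self.root.glob("*/*/*/origin-*.json")):
            yield json.loads(path.read_text())

    def list(self):
        return sorted({origin["sha256"] for origin in self._origins()})

    def find(self, download_url):
        return [o for o in self._origins() if o["download_url"] == download_url]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalFilesystemProvider(Path(tmp))
        digest = store.put(b"example", "https://example.com/pkg.tar.gz",
                           "2025-08-01T00:00:00+00:00", "pkg.tar.gz")
        print(digest, store.list(), store.find("https://example.com/pkg.tar.gz"))
```

Keying the directory on the content hash means the same archive downloaded from two URLs is stored once, while each origin keeps its own metadata file next to it.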

Integration with ScanCode.io:

- Updated ``pipelines/init.py`` to incorporate the archiving system into ScanCode.io’s pipeline workflow, ensuring downloaded packages are stored during execution.

- Revised ``input.py`` to process package download inputs, passing content, ``download_url``, ``download_date``, and ``filename`` to the archiving system.

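The hand-off from input processing to the archiving system might look roughly like the sketch below. ``archive_download`` and ``RecordingStore`` are hypothetical names invented here, and deriving the filename from the URL path is an assumption; only the four values passed to ``put`` come from the description above.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlparse


class RecordingStore:
    """Stand-in for the archiving system; it only records what it receives."""

    def __init__(self):
        self.calls = []

    def put(self, content, download_url, download_date, filename):
        self.calls.append({
            "size": len(content),
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        })


def archive_download(store, content, download_url, now=None):
    """Hypothetical helper: derive filename and date, then call the store."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the last path segment of the URL serves as the filename.
    filename = PurePosixPath(urlparse(download_url).path).name
    store.put(
        content=content,
        download_url=download_url,
        download_date=now.isoformat(),
        filename=filename,
    )
    return filename
```

Passing ``now`` explicitly keeps the recorded ``download_date`` deterministic in tests.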


User Interface Enhancements:

- Modified the project resource view to display stored package information, including download URLs and dates.


Validation and Testing:

- Wrote unit tests in ``test_archiving.py`` to verify ``LocalFilesystemProvider`` functionality (``put``, ``get``, ``list``, ``find``), testing normal cases, edge cases (e.g., empty files), and errors (e.g., duplicate origins).

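The kinds of cases listed above could be exercised along these lines. This is a sketch, not the contents of ``test_archiving.py``: ``InMemoryStore`` is a hypothetical stand-in used so the tests run on their own, and raising ``ValueError`` for a duplicate origin is an assumption about the error behavior.

```python
import hashlib
import unittest


class InMemoryStore:
    """Tiny stand-in for the real provider, only to make the tests runnable."""

    def __init__(self):
        self.contents = {}
        self.origins = set()

    def put(self, content, download_url, filename):
        origin = (download_url, filename)
        if origin in self.origins:
            # Assumption: a duplicate origin is rejected with ValueError.
            raise ValueError(f"duplicate origin: {origin}")
        self.origins.add(origin)
        sha256 = hashlib.sha256(content).hexdigest()
        self.contents[sha256] = content
        return sha256

    def get(self, sha256):
        return self.contents.get(sha256)


class ArchivingTests(unittest.TestCase):
    def setUp(self):
        self.store = InMemoryStore()

    def test_put_then_get_round_trips(self):
        # Normal case: stored bytes come back unchanged.
        sha256 = self.store.put(b"data", "https://example.com/a.zip", "a.zip")
        self.assertEqual(self.store.get(sha256), b"data")

    def test_empty_file_is_stored(self):
        # Edge case: zero-byte content still gets a valid hash and entry.
        sha256 = self.store.put(b"", "https://example.com/empty", "empty")
        self.assertEqual(sha256, hashlib.sha256(b"").hexdigest())
        self.assertEqual(self.store.get(sha256), b"")

    def test_duplicate_origin_raises(self):
        # Error case: the same origin cannot be recorded twice.
        self.store.put(b"data", "https://example.com/a.zip", "a.zip")
        with self.assertRaises(ValueError):
            self.store.put(b"data", "https://example.com/a.zip", "a.zip")
```

The tests can be run with ``python -m unittest`` from the directory containing the file.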

Linked Pull Request
-------------------

`Add download archiving system with local filesystem provider <https://github.com/aboutcode-org/scancode.io/pull/1815>`_

Related Issue
-------------

`Store and retrieve on demand scanned packages/archives <https://github.com/aboutcode-org/scancode.io/issues/1063>`_

Links
-----

| Project Idea: `Idea Link <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_
| GSoC Project Page: `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_
| Proposal: `Proposal Link <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_

Closing Notes
-------------

During the GSoC coding period, my mentors and I had weekly meetings to discuss progress, challenges, and next steps.
Thank you so much to my mentors for being there every step of the way during GSoC 2025. Your encouragement and insights made a huge difference in my learning journey.

