Commit 56bef77

add the final report of gsoc
Signed-off-by: Varsha U N <[email protected]>
1 parent 9bd7f14 commit 56bef77

3 files changed: +141 -0 lines changed

docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@ GSoC 2025
 .. toctree::
    :maxdepth: 2

+   gsoc/reports/2025/scancodeio_varsha
    gsoc/reports/2025/scancodeio_aayush
    gsoc/reports/2025/scancodeio_manit
    gsoc/reports/2025/scancode_toolkit_alok

Lines changed: 140 additions & 0 deletions

@@ -0,0 +1,140 @@

=====================================================
Adding Ability to Store and Query Downloaded Packages
=====================================================

**Organization:** `AboutCode <https://aboutcode.org>`__

**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`__

| **Contributor:** Varsha U N
| **GitHub:** `VarshaUN <https://github.com/VarshaUN>`__
| **LinkedIn:** `Varsha U N <https://www.linkedin.com/in/varsha-un/>`__

**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`__
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`__

Overview
--------

ScanCode.io currently stores scanned packages on disk without a centralized index.
As a result, the same package can be stored more than once, stored data stays tied
to individual projects, and data can be lost when inputs are deleted. This project
adds structured package storage and querying to ScanCode.io, enabling indexing,
reuse across projects, and reliable preservation.

Implementation
--------------

.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
   :alt: Project Flow Diagram
   :align: center
   :width: 70%

This project addresses the limitations of ScanCode.io's unstructured package
storage by adding a system to index, reuse, and preserve packages reliably.

Storage System Development:

- Created a ``DownloadStore`` abstract base class in ``archiving.py`` to
  define the interface for managing package content and metadata
  storage.

- Built the ``LocalFilesystemProvider`` class to store downloads on the
  local filesystem, using a SHA256-based nested directory structure.

- Implemented methods for storing (``put``), retrieving (``get``), listing
  (``list``), and searching (``find``) downloads, with metadata saved in
  ``origin-<hash>.json`` files.

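
The storage design described above can be illustrated with a minimal sketch. Only the class names, the four method names, the SHA256-based nesting, and the ``origin-<hash>.json`` file name come from this report. The method signatures, the two-level directory layout, the ``content`` file name, and the way the origin hash is derived are assumptions made for illustration, and may differ from the code in the linked pull requests.

```python
import hashlib
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class DownloadStore(ABC):
    """Interface sketch: method names are from the report, signatures are assumed."""

    @abstractmethod
    def put(self, content: bytes, download_url: str, download_date: str, filename: str) -> str:
        """Store content plus origin metadata and return the content SHA256."""

    @abstractmethod
    def get(self, sha256: str) -> Optional[bytes]:
        """Return stored content for a SHA256, or None if it is unknown."""

    @abstractmethod
    def list(self) -> List[str]:
        """Return the SHA256 of every stored download."""

    @abstractmethod
    def find(self, download_url: str) -> List[str]:
        """Return the SHA256 of every download recorded for a URL."""


class LocalFilesystemProvider(DownloadStore):
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _dir_for(self, sha256: str) -> Path:
        # Assumed layout: <root>/<first 2 hex chars>/<next 2 hex chars>/<full hash>/
        return self.root / sha256[:2] / sha256[2:4] / sha256

    def put(self, content, download_url, download_date, filename):
        sha256 = hashlib.sha256(content).hexdigest()
        target = self._dir_for(sha256)
        target.mkdir(parents=True, exist_ok=True)
        content_path = target / "content"  # hypothetical file name
        if not content_path.exists():
            # Identical content is written only once, whatever its origin.
            content_path.write_bytes(content)
        origin = {
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        }
        # Assumption: the origin hash is derived from the URL and the filename.
        origin_hash = hashlib.sha256(f"{download_url}|{filename}".encode()).hexdigest()
        (target / f"origin-{origin_hash}.json").write_text(json.dumps(origin))
        return sha256

    def get(self, sha256):
        path = self._dir_for(sha256) / "content"
        return path.read_bytes() if path.exists() else None

    def list(self):
        return sorted(p.parent.name for p in self.root.glob("*/*/*/content"))

    def find(self, download_url):
        matches = set()
        for origin_file in self.root.glob("*/*/*/origin-*.json"):
            if json.loads(origin_file.read_text())["download_url"] == download_url:
                matches.add(origin_file.parent.name)
        return sorted(matches)


if __name__ == "__main__":
    store = LocalFilesystemProvider(tempfile.mkdtemp())
    key = store.put(b"hello", "https://example.com/a.tar.gz", "2025-08-01", "a.tar.gz")
    print(key, store.list(), store.find("https://example.com/a.tar.gz"))
```

Keying the directory by the content hash means that two projects downloading the same archive share one stored copy, while each distinct origin gets its own small metadata file next to it.
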
49+
50+
Integration with ScanCode.io:
51+
52+
- Updated `pipelines/init.py` to incorporate the archiving system into
53+
ScanCode.io’s pipeline workflow, ensuring downloaded packages are
54+
stored during execution.
55+
56+
- Revised `input.py` to process package download inputs, passing
57+
content, `download_url`, `download_date`, and `filename` to the
58+
archiving system.
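
As a rough illustration of the hand-off described above, the sketch below passes the four values named in this report to a store. The helper ``archive_download``, the ``RecordingStore`` stand-in, and the choice of an ISO 8601 UTC timestamp for ``download_date`` are hypothetical and are not taken from ScanCode.io's ``input.py``.

```python
import hashlib
from datetime import datetime, timezone


def archive_download(store, content, download_url, filename, download_date=None):
    """Hypothetical helper: forward one downloaded input to the archiving system."""
    if download_date is None:
        # Assumption: the date is recorded as an ISO 8601 timestamp in UTC.
        download_date = datetime.now(timezone.utc).isoformat()
    return store.put(
        content=content,
        download_url=download_url,
        download_date=download_date,
        filename=filename,
    )


class RecordingStore:
    """Stand-in store that only records calls, so the sketch runs on its own."""

    def __init__(self):
        self.calls = []

    def put(self, content, download_url, download_date, filename):
        self.calls.append(
            {
                "download_url": download_url,
                "download_date": download_date,
                "filename": filename,
            }
        )
        return hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    store = RecordingStore()
    key = archive_download(store, b"data", "https://example.com/pkg.whl", "pkg.whl")
    print(key, store.calls[0]["download_date"])
```

Any object with a compatible ``put`` method could take the place of the stand-in here, which is the kind of decoupling an abstract store interface is meant to allow.
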

User Interface Enhancements:

- Modified the project resource view to display stored package
  information, including download URLs and dates.

Validation and Testing:

- Wrote unit tests in ``test_archiving.py`` to verify
  ``LocalFilesystemProvider`` functionality (``put``, ``get``, ``list``,
  ``find``), testing normal cases, edge cases (e.g., empty files), and
  errors (e.g., duplicate origins).

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 40 20
   :header-rows: 1

   * - Sr. No
     - Name
     - Link
   * - 1
     - Add download archiving system
     - `scancode.io#1815 <https://github.com/aboutcode-org/scancode.io/pull/1815>`__
   * - 2
     - Support local package storage
     - `scancode.io#1685 <https://github.com/aboutcode-org/scancode.io/pull/1685>`__

Related Issues
--------------

.. list-table::
   :widths: 10 40 20
   :header-rows: 1

   * - Sr. No
     - Name
     - Link
   * - 1
     - Store and retrieve scanned packages
     - `#1063 <https://github.com/aboutcode-org/scancode.io/issues/1063>`__
   * - 2
     - Support local package storage
     - `#1683 <https://github.com/aboutcode-org/scancode.io/issues/1683>`__

Pre-GSoC Work
-------------

Here are some PRs submitted before GSoC:

- `Add bluefin-container image support <https://github.com/aboutcode-org/scancode.io/pull/1620>`__
- `Tag whitedout files <https://github.com/aboutcode-org/scancode.io/pull/1529>`__
- `Support python-private-classifier <https://github.com/aboutcode-org/scancode-toolkit/pull/4075>`__
- `Parse labels in Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
- `Add OCI labels to Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
- `Extract LibreOffice documents <https://github.com/aboutcode-org/extractcode/pull/67>`__

Links
-----

- **Project Idea:** `GSoC 2025 Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`__
- **GSoC Project Page:** `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`__
- **Proposal:** `Project Proposal <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`__

Future Work
-----------

Future enhancements include a web UI for the ``LocalFilesystemProvider`` that
supports package uploads, searches, listings, and retrievals in ScanCode.io, built
with Django views, templates, and URL routes and backed by comprehensive testing.
Additionally, integrating an external cloud storage option (e.g., AWS S3) alongside
the local filesystem would extend the ``DownloadStore`` interface to scalable,
remote storage.

Closing Note
------------

During GSoC 2025, my mentors and I held weekly meetings to discuss progress,
challenges, and next steps. I am deeply grateful to my mentors for their guidance
and support, which greatly enriched my learning experience.
