Commit 20a26a3

fix CI issues
Signed-off-by: Varsha U N <[email protected]>
1 parent dc6a35f commit 20a26a3

1 file changed: 115 additions, 94 deletions

@@ -1,94 +1,115 @@
1-
=====================================================
2-
Adding ability to store/query downloaded packages
3-
=====================================================
4-
5-
**Organization:** `AboutCode <https://aboutcode.org>`_
6-
7-
**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`_
8-
9-
10-
| **Varsha U N**
11-
| GitHub: `VarshaUN <https://github.com/VarshaUN>`_
12-
| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_
13-
14-
15-
**Mentors:**
16-
17-
- `Philippe Ombredanne <https://github.com/pombredanne>`_
18-
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_
19-
20-
Overview
21-
--------
22-
23-
Currently ScanCode.io scans packages but doesn’t store them.
24-
This makes it difficult for users to maintain a reference of packages used in their projects,
25-
meet source redistribution obligations, or revisit scanned packages in the future.
26-
27-
This project enhanced ScanCode.io by adding the ability to store and query downloaded packages locally
28-
and re-use packages that were already scanned.
29-
30-
--------------------------------------------------------------------------------
31-
32-
Implementation
33-
--------------
34-
35-
The project involved the following key components and steps:
36-
37-
.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
38-
:alt: Project Flow Diagram
39-
:align: center
40-
:width: 70%
41-
42-
Currently ScanCode.io downloads packages but does not store them. The new archiving system stores downloaded packages on the local filesystem and allows querying them.
43-
44-
Storage System Development:
45-
46-
- Created a `DownloadStore` abstract base class in `archiving.py` to define the interface for managing package content and metadata storage.
47-
48-
- Built the `LocalFilesystemProvider` class to store downloads on the local filesystem, using a SHA256-based nested directory structure.
49-
50-
- Implemented methods for storing (`put`), retrieving (`get`), listing (`list`), and searching (`find`) downloads, with metadata saved in `origin-<hash>.json` files.
51-
52-
53-
Integration with ScanCode.io:
54-
55-
- Updated `pipelines/init.py` to incorporate the archiving system into ScanCode.io’s pipeline workflow, ensuring downloaded packages are stored during execution.
56-
57-
- Revised `input.py` to process package download inputs, passing content, `download_url`, `download_date`, and `filename` to the archiving system.
58-
59-
60-
User Interface Enhancements:
61-
62-
- Modified the project resource view to display stored package information, including download URLs and dates.
63-
64-
65-
Validation and Testing:
66-
67-
- Wrote unit tests in `test_archiving.py` to verify `LocalFilesystemProvider` functionality (`put`, `get`, `list`, `find`), testing normal cases, edge cases (e.g., empty files), and errors (e.g., duplicate origins).
68-
69-
Linked Pull Request:
70-
----------------------------------------
71-
72-
Add download archiving system with local filesystem provider -
73-
(https://github.com/aboutcode-org/scancode.io/pull/1815)
74-
75-
Related Issue:
76-
----------------------------------------
77-
78-
Store and retrieve on demand scanned packages/archives -
79-
(https://github.com/aboutcode-org/scancode.io/issues/1063)
80-
81-
Links:
82-
----------------------------------------
83-
84-
| Project Idea: `Idea Link <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_
85-
| GSoC Project Page: `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_
86-
| Proposal: `Proposal Link <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_
87-
88-
Closing Notes
89-
-------------
90-
91-
During the GSoC coding period, my mentors and I had weekly meetings to discuss progress, challenges, and next steps.
92-
Thank you so much to my mentors for being there every step of the way during GSoC 2025. Your encouragement and insights made a huge difference in my learning journey.
93-
94-
1+
###################################################
2+
Adding ability to store/query downloaded packages
3+
###################################################
4+
5+
**Organization:** `AboutCode <https://aboutcode.org>`_
6+
7+
**Project:** `ScanCode.io
8+
<https://github.com/aboutcode-org/scancode.io>`_
9+
10+
| **Varsha U N**
11+
| GitHub: `VarshaUN <https://github.com/VarshaUN>`_
12+
| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_
13+
14+
**Mentors:**
15+
16+
- `Philippe Ombredanne <https://github.com/pombredanne>`_
17+
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_
18+
19+
**********
20+
Overview
21+
**********
22+
23+
Currently ScanCode.io scans packages but doesn’t store them. This
24+
makes it difficult for users to maintain a reference of packages used in
25+
their projects, meet source redistribution obligations, or revisit
26+
scanned packages in the future.
27+
28+
This project enhanced ScanCode.io by adding the ability to store and
29+
query downloaded packages locally and re-use packages that were already
30+
scanned.
31+
32+
----
33+
34+
****************
35+
Implementation
36+
****************
37+
38+
The project involved the following key components and steps:
39+
40+
.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
41+
:alt: Project Flow Diagram
42+
:align: center
43+
:width: 70%
44+
45+
Currently ScanCode.io downloads packages but does not store them. The new archiving system stores downloaded packages on the local filesystem and allows querying them.
46+
47+
Storage System Development:
48+
49+
- Created a `DownloadStore` abstract base class in `archiving.py` to
50+
define the interface for managing package content and metadata
51+
storage.
52+
53+
- Built the `LocalFilesystemProvider` class to store downloads on the
54+
local filesystem, using a SHA256-based nested directory structure.
55+
56+
- Implemented methods for storing (`put`), retrieving (`get`), listing
57+
(`list`), and searching (`find`) downloads, with metadata saved in
58+
`origin-<hash>.json` files.
59+
60+
Integration with ScanCode.io:
61+
62+
- Updated `pipelines/init.py` to incorporate the archiving system into
63+
ScanCode.io’s pipeline workflow, ensuring downloaded packages are
64+
stored during execution.
65+
66+
- Revised `input.py` to process package download inputs, passing
67+
content, `download_url`, `download_date`, and `filename` to the
68+
archiving system.
69+
70+
User Interface Enhancements:
71+
72+
- Modified the project resource view to display stored package
73+
information, including download URLs and dates.
74+
75+
Validation and Testing:
76+
77+
- Wrote unit tests in `test_archiving.py` to verify
78+
`LocalFilesystemProvider` functionality (`put`, `get`, `list`,
79+
`find`), testing normal cases, edge cases (e.g., empty files), and
80+
errors (e.g., duplicate origins).
81+
82+
**********************
83+
Linked Pull Request:
84+
**********************
85+
86+
Add download archiving system with local filesystem provider -
87+
(https://github.com/aboutcode-org/scancode.io/pull/1815)
88+
89+
****************
90+
Related Issue:
91+
****************
92+
93+
Store and retrieve on demand scanned packages/archives -
94+
(https://github.com/aboutcode-org/scancode.io/issues/1063)
95+
96+
********
97+
Links:
98+
********
99+
100+
| Project Idea: `Idea Link
101+
<https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_
102+
| GSoC Project Page: `GSoC 2025
103+
<https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_
104+
| Proposal: `Proposal Link
105+
<https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_
106+
107+
***************
108+
Closing Notes
109+
***************
110+
111+
During the GSoC coding period, my mentors and I had weekly meetings to
112+
discuss progress, challenges, and next steps. Thank you so much to my
113+
mentors for being there every step of the way during GSoC 2025. Your
114+
encouragement and insights made a huge difference in my learning
115+
journey.
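
As an illustration of the storage design described under "Storage System Development" in the diff above, here is a minimal Python sketch of a content-addressed store with ``put``, ``get``, ``list``, and ``find``. It is not the code from the linked pull request. The method signatures, the two-level directory nesting, the ``content`` file name, the way the origin hash is derived, and the ``FileExistsError`` raised for duplicate origins are all assumptions made for this example. Only the class names, the four method names, the SHA256-based layout, and the ``origin-<hash>.json`` naming come from the report.

```python
import hashlib
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List  # typing.List avoids a clash with the `list` method below


class DownloadStore(ABC):
    """Hypothetical interface; the real class in archiving.py may differ."""

    @abstractmethod
    def put(self, content: bytes, download_url: str,
            download_date: str, filename: str) -> str: ...

    @abstractmethod
    def get(self, sha256: str) -> bytes: ...

    @abstractmethod
    def list(self) -> List[str]: ...

    @abstractmethod
    def find(self, download_url: str) -> List[str]: ...


class LocalFilesystemProvider(DownloadStore):
    def __init__(self, root):
        self.root = Path(root)

    def _dir_for(self, sha256: str) -> Path:
        # Assumed nesting: <root>/<first 2 hex>/<next 2 hex>/<remaining hex>
        return self.root / sha256[:2] / sha256[2:4] / sha256[4:]

    def put(self, content, download_url, download_date, filename):
        sha256 = hashlib.sha256(content).hexdigest()
        target = self._dir_for(sha256)
        target.mkdir(parents=True, exist_ok=True)
        content_path = target / "content"
        if not content_path.exists():
            content_path.write_bytes(content)
        # Assumed: one origin file per (URL, filename) pair, named by a short hash.
        key = f"{download_url}|{filename}".encode()
        origin_path = target / f"origin-{hashlib.sha256(key).hexdigest()[:16]}.json"
        if origin_path.exists():
            raise FileExistsError(f"duplicate origin for {download_url}")
        origin_path.write_text(json.dumps({
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        }))
        return sha256

    def get(self, sha256):
        # Raises FileNotFoundError if nothing was stored under this digest.
        return (self._dir_for(sha256) / "content").read_bytes()

    def _sha_from(self, directory: Path) -> str:
        # Rebuild the digest from the three nested directory names.
        return "".join(directory.relative_to(self.root).parts)

    def list(self):
        return sorted(self._sha_from(p.parent)
                      for p in self.root.glob("*/*/*/content"))

    def find(self, download_url):
        matches = set()
        for origin in self.root.glob("*/*/*/origin-*.json"):
            if json.loads(origin.read_text())["download_url"] == download_url:
                matches.add(self._sha_from(origin.parent))
        return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalFilesystemProvider(tmp)
        digest = store.put(b"example bytes", "https://example.com/pkg.tar.gz",
                           "2025-01-01", "pkg.tar.gz")
        print(digest, store.list(), store.find("https://example.com/pkg.tar.gz"))
```

Keying the directory on the content digest means identical downloads from different URLs share one stored copy, while each origin keeps its own small JSON record next to it.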
