`docs/2025/data-pipeline/updates/2025-07-16.md`
---
title: Week 7
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 7
*(July 16, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
* Last week I received many reviews, comments, and corrections on the pull request I opened for the current scripts.
* This week I therefore focused mostly on making adjustments in line with that feedback, while also ensuring code quality.
* Below are the corrections I made:
  - Added individual copyright and license information to all the files I created.
  - Made the output path for fetched copyright content configurable via an argument.
  - Renamed the output files to include the date (DD/MM/YY format) of the fetched content.
  - Re-adjusted initializations to avoid exceptions.
  - Added the ability to fetch contents from the server in batches.
  - Made the fetched-content input limit configurable through arguments.
  - Added checks to validate that the DB environment variables are set.
  - Removed the unconditional call to `fetch_copyright_data()`.
  - Introduced a base path for all scripts used in the `pipeline.yml` file.
  - Combined the preprocess, declutter, and split scripts into a single utility file.
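A minimal sketch of how a few of these corrections could fit together, namely the configurable output path, the date-stamped file names, the batched fetching, and the DB environment-variable checks. The function names, argument names, and environment-variable names here are illustrative, not the actual script's API:

```python
import argparse
import os
from datetime import datetime

def fetch_in_batches(total, batch_size):
    """Yield (offset, limit) pairs so the server is queried in batches."""
    for offset in range(0, total, batch_size):
        yield offset, min(batch_size, total - offset)

def build_output_name(output_dir):
    # File name includes the fetch date, as in the renaming correction
    # (DD-MM-YY used here since slashes are not valid in file names).
    stamp = datetime.now().strftime("%d-%m-%y")
    return f"{output_dir}/copyrights_{stamp}.csv"

def require_db_env():
    """Fail early if any DB environment variable is missing."""
    required = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")
    missing = [v for v in required if v not in os.environ]
    if missing:
        raise RuntimeError(f"missing DB environment variables: {missing}")

parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", default="output",
                    help="where fetched copyright content is written")
parser.add_argument("--limit", type=int, default=1000,
                    help="configurable input limit for fetched content")
parser.add_argument("--batch-size", type=int, default=200)
```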


## Meeting Discussion:
* I walked my mentors through what I did this week, showing how I went about resolving each issue and comment while maintaining code quality.

## Subsequent Steps
* I will proceed with addressing the remaining review comments and other areas that need improvement.
`docs/2025/data-pipeline/updates/2025-07-23.md`
---
title: Week 8
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 8
*(July 23, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
* This week I started by continuing with the corrections from the reviews and comments on our current scripts.
* Below are the adjustments I made:
  - Updated `pipeline.yml` to accept environment variables for our DB.
  - Used workflow secrets to inject the `.env` values into the pipeline.
  - Renamed the pipeline workflow to `safaa-model-retraining` to reflect its functionality.
  - Removed the artifact-upload step, since we aren't passing artifacts between workflows.
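The secrets-based injection described above could look roughly like this in `pipeline.yml` (the secret names and script path are illustrative, not the repository's actual configuration):

```yaml
# Sketch: DB credentials exposed to the job via workflow secrets.
jobs:
  fetch-data:
    runs-on: ubuntu-latest
    env:
      DB_HOST: ${{ secrets.DB_HOST }}
      DB_NAME: ${{ secrets.DB_NAME }}
      DB_USER: ${{ secrets.DB_USER }}
      DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/fetch_copyright_data.py --limit 1000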

* Also, this week I created a new branch named `testing` on my fork to try out the changes we have made so far.
  - The first round of testing was manual, triggering the pipeline by hand through GitHub Actions.
  - The second round was automated by adding `on: push: branches: [testing]`, so the workflow triggers on GitHub Actions whenever the `testing` branch is updated.
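Both trigger styles can coexist in one workflow header, for example:

```yaml
name: safaa-model-retraining
on:
  workflow_dispatch:      # manual trigger, used for the first round of testing
  push:
    branches: [testing]   # automated trigger on every update to the testing branch
```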

* Lastly, since we are looking for a better way to install our dependencies, I introduced a shell script in our pipeline instead of a `requirements.txt` file.
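As a rough sketch, such an install script could look like the following (the script name and package list are illustrative, not the pipeline's actual dependencies):

```shell
#!/bin/sh
# install_deps.sh -- illustrative name; installs pipeline dependencies directly.
set -e  # abort on the first failed install
for pkg in pandas scikit-learn psycopg2-binary; do
    pip install --quiet "$pkg"
done
```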


## Meeting Discussion:
* This week, I discussed with my mentors what I have done so far regarding the review comments. I was asked to make some small improvements:
  - Install dependencies directly in the pipeline script.
  - Remove the `requirements.txt` and shell-script installation approaches.
  - Split each processing stage into its own command using argument flags, so the steps in the process are easy to follow.
  - I was also tasked with introducing global variables for some repetitive paths.
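The flag-per-stage idea above could be sketched as follows. The flag names and stage bodies are placeholders, not the actual utility file's implementation:

```python
import argparse

def preprocess(rows):
    # Placeholder stage: normalize whitespace and case.
    return [row.strip().lower() for row in rows]

def declutter(rows):
    # Placeholder stage: drop empty rows.
    return [row for row in rows if row]

def run(args, rows):
    """Run only the stages selected by flags, in a fixed order."""
    if args.preprocess:
        rows = preprocess(rows)
    if args.declutter:
        rows = declutter(rows)
    return rows

parser = argparse.ArgumentParser()
parser.add_argument("--preprocess", action="store_true",
                    help="run only the preprocessing stage")
parser.add_argument("--declutter", action="store_true",
                    help="run only the decluttering stage")
```

Each pipeline step can then invoke the utility with exactly one flag, which makes the workflow log show one named stage per command.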

* And lastly, I was also asked to introduce a pickle file into the system using the data available in the current Safaa repository.
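Persisting an artifact as a pickle file can be sketched like this; the file name and the object being pickled are placeholders, not the actual Safaa data:

```python
import os
import pickle
import tempfile

# Placeholder artifact standing in for the real repository data.
data = {"samples": ["copyright 2025 example corp"], "labels": [1]}

# Write the object to a pickle file...
path = os.path.join(tempfile.gettempdir(), "safaa_artifact.pkl")
with open(path, "wb") as fh:
    pickle.dump(data, fh)

# ...and load it back, e.g. in a later pipeline stage.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```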

## Subsequent Steps
* I will proceed with making these adjustments and integrating the new tasks while continuing to work towards the pipeline goals.