`docs/2025/data-pipeline/updates/2025-07-16.md`
---
title: Week 7
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 7
*(July 16, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
* Last week I received many reviews, comments, and corrections on the pull request I opened for the current scripts.
* This week I therefore focused mostly on making adjustments in line with that feedback, while also ensuring code quality.
* Below are the corrections I made:
  - Added individual copyright and license information to all the files I created.
  - Made the output path for fetched copyright content configurable via an argument.
  - Renamed the output files to include the date (DD/MM/YY format) of the fetched content.
  - Re-adjusted initializations to avoid exceptions.
  - Added the ability to fetch contents from the server in batches.
  - Made the fetched-content input limit configurable through arguments.
  - Added checks to validate that the DB environment variables are set.
  - Removed the unconditional call to `fetch_copyright_data()`.
  - Introduced a base path for all scripts used in the `pipeline.yml` file.
  - Combined the preprocess, declutter, and split scripts into a single utility file.
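A minimal sketch of how a few of these corrections could fit together, namely the configurable output path, the date-stamped file names, the batched fetching, and the DB environment-variable checks. The function names, argument names, and environment-variable names here are illustrative, not the actual script's API:

```python
import argparse
import os
from datetime import datetime

def fetch_in_batches(total, batch_size):
    """Yield (offset, limit) pairs so the server is queried in batches."""
    for offset in range(0, total, batch_size):
        yield offset, min(batch_size, total - offset)

def build_output_name(output_dir):
    # File name includes the fetch date, as in the renaming correction
    # (DD-MM-YY used here since slashes are not valid in file names).
    stamp = datetime.now().strftime("%d-%m-%y")
    return f"{output_dir}/copyrights_{stamp}.csv"

def require_db_env():
    """Fail early if any DB environment variable is missing."""
    required = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")
    missing = [v for v in required if v not in os.environ]
    if missing:
        raise RuntimeError(f"missing DB environment variables: {missing}")

parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", default="output",
                    help="where fetched copyright content is written")
parser.add_argument("--limit", type=int, default=1000,
                    help="configurable input limit for fetched content")
parser.add_argument("--batch-size", type=int, default=200)
```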


## Meeting Discussion:
* I walked my mentors through what I did this week, showing how I went about resolving each issue and comment while maintaining code quality.

## Subsequent Steps
* I will proceed with addressing the remaining review comments and other areas that need improvement.
`docs/2025/data-pipeline/updates/2025-07-23.md`
---
title: Week 8
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 8
*(July 23, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
* This week I started by continuing with the corrections from the reviews and comments on our current scripts.
* Below are the adjustments I made:
  - Updated `pipeline.yml` to accept environment variables for our DB.
  - Used workflow secrets to inject the `.env` values into the pipeline.
  - Renamed the pipeline workflow to `safaa-model-retraining` to reflect its functionality.
  - Removed the artifact-upload step, since we aren't passing artifacts between workflows.
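The secrets-based injection described above could look roughly like this in `pipeline.yml` (the secret names and script path are illustrative, not the repository's actual configuration):

```yaml
# Sketch: DB credentials exposed to the job via workflow secrets.
jobs:
  fetch-data:
    runs-on: ubuntu-latest
    env:
      DB_HOST: ${{ secrets.DB_HOST }}
      DB_NAME: ${{ secrets.DB_NAME }}
      DB_USER: ${{ secrets.DB_USER }}
      DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/fetch_copyright_data.py --limit 1000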

* Also, this week I created a new branch named `testing` on my fork to try out the changes we have made so far.
  - The first round of testing was manual, triggering the pipeline by hand through GitHub Actions.
  - The second round was automated by adding `on: push: branches: [testing]`, so the workflow triggers on GitHub Actions whenever the `testing` branch is updated.
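Both trigger styles can coexist in one workflow header, for example:

```yaml
name: safaa-model-retraining
on:
  workflow_dispatch:      # manual trigger, used for the first round of testing
  push:
    branches: [testing]   # automated trigger on every update to the testing branch
```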

* Lastly, since we are looking for a better way to install our dependencies, I introduced a shell script in our pipeline instead of a `requirements.txt` file.
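As a rough sketch, such an install script could look like the following (the script name and package list are illustrative, not the pipeline's actual dependencies):

```shell
#!/bin/sh
# install_deps.sh -- illustrative name; installs pipeline dependencies directly.
set -e  # abort on the first failed install
for pkg in pandas scikit-learn psycopg2-binary; do
    pip install --quiet "$pkg"
done
```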


## Meeting Discussion:
* This week, I discussed with my mentors what I have done so far regarding the review comments. I was asked to make some small improvements:
  - Install dependencies directly in the pipeline script.
  - Remove the `requirements.txt` and shell-script installation approaches.
  - Split each processing stage into its own command using argument flags, so the steps in the process are easy to follow.
  - I was also tasked with introducing global variables for some repetitive paths.
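The flag-per-stage idea above could be sketched as follows. The flag names and stage bodies are placeholders, not the actual utility file's implementation:

```python
import argparse

def preprocess(rows):
    # Placeholder stage: normalize whitespace and case.
    return [row.strip().lower() for row in rows]

def declutter(rows):
    # Placeholder stage: drop empty rows.
    return [row for row in rows if row]

def run(args, rows):
    """Run only the stages selected by flags, in a fixed order."""
    if args.preprocess:
        rows = preprocess(rows)
    if args.declutter:
        rows = declutter(rows)
    return rows

parser = argparse.ArgumentParser()
parser.add_argument("--preprocess", action="store_true",
                    help="run only the preprocessing stage")
parser.add_argument("--declutter", action="store_true",
                    help="run only the decluttering stage")
```

Each pipeline step can then invoke the utility with exactly one flag, which makes the workflow log show one named stage per command.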

* And lastly, I was also asked to introduce a pickle file into the system using the data available in the current Safaa repository.
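Persisting an artifact as a pickle file can be sketched like this; the file name and the object being pickled are placeholders, not the actual Safaa data:

```python
import os
import pickle
import tempfile

# Placeholder artifact standing in for the real repository data.
data = {"samples": ["copyright 2025 example corp"], "labels": [1]}

# Write the object to a pickle file...
path = os.path.join(tempfile.gettempdir(), "safaa_artifact.pkl")
with open(path, "wb") as fh:
    pickle.dump(data, fh)

# ...and load it back, e.g. in a later pipeline stage.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```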

## Subsequent Steps
* I will proceed with making these adjustments and integrating the new tasks while continuing to work towards the pipeline goals.