2 changes: 1 addition & 1 deletion docs/2025/data-pipeline/updates/2025-07-16.md
@@ -14,7 +14,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
*(July 16, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
2 changes: 1 addition & 1 deletion docs/2025/data-pipeline/updates/2025-07-23.md
@@ -14,7 +14,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
*(July 23, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

### Engagements
44 changes: 44 additions & 0 deletions docs/2025/data-pipeline/updates/2025-07-30.md
@@ -0,0 +1,44 @@
---
title: Week 9
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 9
*(July 30, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
- [Kaushlendra Pratap](https://github.com/Kaushl2208)

### Engagements
* Last week, I was tasked with introducing a pickle file into our pipeline using the data available in the current Safaa repository.
* This week, I started by rewriting how some parts of the pipeline, such as dependency installation, are set up.
* Below is a summary of what I rewrote for our pipeline:
- Declared global path variables for the static paths in the repo.
- Split the pipeline into independent steps for better tracking of each process.
- Introduced arguments to run each step in the pipeline sequentially.
- Removed the `requirements.txt` file and introduced direct dependency installation in the pipeline script.
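
The restructuring above can be sketched roughly as follows. This is only a minimal illustration, not the actual pipeline script: the path constants, the `install` and `fetch` step names, and the placeholder step bodies are all assumptions made for the example. Only the idea of global path variables, independent steps, and per-step arguments comes from the summary.

```python
import argparse
from pathlib import Path

# Hypothetical static paths, declared once so every step shares them.
DATA_DIR = Path("data")
MODEL_DIR = Path("model_output")


def install_dependencies():
    # Placeholder: the real step installs packages directly in the pipeline.
    print("installing dependencies (placeholder)")


def fetch_data():
    print(f"fetching data into {DATA_DIR} (placeholder)")


def train_model():
    print(f"training and saving the model under {MODEL_DIR} (placeholder)")


# Insertion order defines the order in which selected steps run.
STEPS = {
    "install": install_dependencies,
    "fetch": fetch_data,
    "train": train_model,
}


def build_parser():
    parser = argparse.ArgumentParser(description="Run selected pipeline steps.")
    for name in STEPS:
        parser.add_argument(f"--{name}", action="store_true",
                            help=f"run the {name} step")
    return parser


def run(argv=None):
    """Run every step whose flag was passed, in pipeline order."""
    args = build_parser().parse_args(argv)
    executed = []
    for name, step in STEPS.items():
        if getattr(args, name):
            step()
            executed.append(name)
    return executed
```

Calling `run(["--install", "--train"])` runs the two selected steps in pipeline order regardless of the order of the flags, which keeps each step independently triggerable from a workflow file.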

* To introduce a pickle file into the pipeline, I added a training step by integrating the existing Safaa agent training script.
* Also, this week I created a new branch named `testing-train` on my end to test the changes we have made so far and to monitor and simulate how training will run in the pipeline.
- I introduced a new argument named `--train` in the retraining steps for the training process.
- Routed our model to be saved to a new path so we can inspect the pickle file if training succeeds and avoid permission errors.

* The image below shows our completed training process
![image](/img/data-pipeline/train.png)
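
As a rough illustration of saving the trained model to a separate, writable path, the sketch below pickles an object into a new directory and loads it back. The `save_model` and `load_model` helpers, the `model.pkl` filename, and the placeholder dictionary standing in for the trained model are assumptions for the example, not the Safaa training code.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, output_dir, filename="model.pkl"):
    """Pickle `model` into `output_dir`, creating the directory if needed."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    with target.open("wb") as fh:
        pickle.dump(model, fh)
    return target


def load_model(path):
    """Load a pickled model back from disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained model; the real object comes from the training step.
placeholder_model = {"name": "placeholder", "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp:
    # Writing under a directory we own avoids permission errors.
    saved = save_model(placeholder_model, Path(tmp) / "trained_model")
    restored = load_model(saved)
```

Writing to a directory the job creates itself is one way to sidestep permission problems in a CI runner, and the resulting file can then be checked for existence after training.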

## Meeting Discussion:
* This week, I discussed with my mentors what I have done so far regarding training and the modifications I made.
* We also had a hands-on session to update our SQL so that, when fetching copyright content, it skips content that is ignored on the server.
* Lastly, I was told to introduce testing of our trained model into the pipeline.

## Subsequent Steps
* I will proceed to implement the testing phase in the pipeline.
40 changes: 40 additions & 0 deletions docs/2025/data-pipeline/updates/2025-08-06.md
@@ -0,0 +1,40 @@
---
title: Week 10
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
-->

# WEEK 10
*(August 6, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Kaushlendra Pratap](https://github.com/Kaushl2208)

### Engagements
* I began this week by introducing the testing process to the pipeline.
* I faced several challenges:
- Permission error when saving to the main model path in the current working directory
![image](/img/data-pipeline/permission.png)
- If we save our model to a new path instead, we encounter an error because no `entity_recognizer` or `declutter_model` path is available in that directory.
![image](/img/data-pipeline/noentpath.png)

* To successfully integrate the testing process into the pipeline, I added the `entity_recognizer` and `declutter_model` folders to the directory where we saved our model during training.
* This adjustment allows us to test the model that training saved to the new path.
* With this, we can test and visualize our testing metrics any time the pipeline runs. This will help us decide, based on the metrics, what to do next with any newly trained model coming from the pipeline.
![image](/img/data-pipeline/test.png)
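
The folder adjustment and the metrics report can be sketched as below. This is a hedged illustration only: the `ensure_model_subdirs` and `classification_metrics` helpers are hypothetical, and the choice of accuracy, precision, and recall is an assumption, since the report does not list which metrics the pipeline prints.

```python
import shutil
from pathlib import Path

# Folder names taken from the error described above.
REQUIRED_SUBDIRS = ("entity_recognizer", "declutter_model")


def ensure_model_subdirs(source_dir, model_dir):
    """Copy the required folders into `model_dir` if they are missing."""
    source_dir, model_dir = Path(source_dir), Path(model_dir)
    copied = []
    for name in REQUIRED_SUBDIRS:
        src, dst = source_dir / name, model_dir / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing source folder: {src}")
        if not dst.exists():
            shutil.copytree(src, dst)
            copied.append(name)
    return copied


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Running `ensure_model_subdirs` before the test step means the model directory written during training has the layout the test step expects, and the metrics dictionary can be printed in the job log on every run.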


## Meeting Discussion:
* This week, I discussed with my mentors what I have done so far regarding the integration of testing into the pipeline.
* I presented the testing and metrics visualization to them through the GitHub Actions branch in my repo.
* We also decided how the newly trained model from the pipeline can be uploaded as an artifact, with the pipeline automatically creating a new repository branch and raising a new pull request.
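
One possible shape for the branch and pull request automation is sketched below. It is not the agreed implementation: it assumes the `git` and GitHub `gh` command-line tools are available in the runner, and the branch name, base branch `main`, model path, and title are all hypothetical. The helper defaults to a dry run that only prints the commands. Uploading the model as an artifact would typically be handled in the workflow file itself and is not shown here.

```python
import shlex
import subprocess


def build_pr_commands(branch, model_path, base="main",
                      title="Add newly trained model",
                      body="Model produced by the pipeline."):
    """Return the git/gh commands that publish a model on a new branch."""
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", model_path],
        ["git", "commit", "-m", title],
        ["git", "push", "--set-upstream", "origin", branch],
        ["gh", "pr", "create", "--base", base, "--head", branch,
         "--title", title, "--body", body],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical branch name and model path, for illustration only.
commands = build_pr_commands("retrained-model", "model_output/model.pkl")
run_commands(commands, dry_run=True)
```

Keeping command construction separate from execution makes the sequence easy to review in the job log before enabling the real run.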

## Subsequent Steps
* I will proceed to the next task: automating the raising of a new pull request from the pipeline.
Binary file added static/img/data-pipeline/noentpath.png
Binary file added static/img/data-pipeline/permission.png
Binary file added static/img/data-pipeline/test.png
Binary file added static/img/data-pipeline/train.png