@Goziee-git
Contributor

@Goziee-git Goziee-git commented Oct 10, 2025

Fixes

Description

The OTL_education_read_csv.py script reads the pre-automation OTL.csv file and copies it to the current quarter's data directory for processing.

The script:
• Reads from pre-automation/education/datasets/OTL.csv
• Copies the data to data/{quarter}/1-fetch/otl_raw_data.csv
• Includes proper error handling and logging
• Uses the --enable-save flag to control whether data is actually saved
• Follows the project's established patterns for fetch phase scripts
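For reviewers, the behaviour listed above can be sketched roughly as follows. This is a minimal illustration and not the script submitted in this PR: the function names copy_csv and parse_arguments and the logger name are hypothetical, and only the --enable-save flag and the file paths mentioned above come from the description.

```python
import argparse
import logging
import shutil
from pathlib import Path

logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger("otl_copy_sketch")  # hypothetical logger name


def parse_arguments(argv=None):
    """Parse the command line; --enable-save defaults to off (dry run)."""
    parser = argparse.ArgumentParser(
        description="Copy a CSV file into a fetch directory"
    )
    parser.add_argument(
        "--enable-save",
        action="store_true",
        help="Actually write the output file (default: dry run)",
    )
    return parser.parse_args(argv)


def copy_csv(source, destination, enable_save):
    """Copy source to destination; return True only if a file was written."""
    source = Path(source)
    destination = Path(destination)
    if not source.is_file():
        # Log and fail loudly rather than silently producing no data
        LOGGER.error("Source file not found: %s", source)
        raise FileNotFoundError(source)
    if not enable_save:
        LOGGER.info("Dry run: would copy %s to %s", source, destination)
        return False
    # Create the destination directory on demand before copying
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    LOGGER.info("Copied %s to %s", source, destination)
    return True
```

A caller would pass parse_arguments().enable_save as the third argument, with the destination built from whatever quarter string the project uses for data/{quarter}/1-fetch/otl_raw_data.csv.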

Technical Note: Script Execution Requirements

Issue: The OTL_education_read_csv.py script fails with ModuleNotFoundError: No module named 'shared' when run directly from the 1-fetch directory, which suggests the import structure needs adjusting. However, if the import is adjusted, the pre-commit checks fail.

Root Cause: The script imports the shared module from the parent scripts directory, but Python cannot locate it without proper path configuration.
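The lookup behaviour behind this can be reproduced in isolation. The sketch below builds a throwaway scripts/1-fetch layout in a temporary directory and checks whether a module in the parent directory can be found when only the script's own directory is on the search path. The module name shared_sketch and the helper names are hypothetical, and this illustrates Python's general import lookup, not how this repository configures its paths.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def module_is_importable(name, search_paths):
    """Return True if `name` can be located using only `search_paths`."""
    original = sys.path[:]
    try:
        # Temporarily restrict the import search path
        sys.path[:] = list(search_paths)
        importlib.invalidate_caches()
        return importlib.util.find_spec(name) is not None
    finally:
        # Always restore the real search path
        sys.path[:] = original


def demo():
    """Compare lookup with and without the parent directory on the path."""
    with tempfile.TemporaryDirectory() as tmp:
        scripts_dir = Path(tmp) / "scripts"
        fetch_dir = scripts_dir / "1-fetch"
        fetch_dir.mkdir(parents=True)
        # Hypothetical stand-in for a module living in the parent directory
        (scripts_dir / "shared_sketch.py").write_text("VALUE = 1\n")
        only_fetch = module_is_importable("shared_sketch", [str(fetch_dir)])
        with_parent = module_is_importable(
            "shared_sketch", [str(fetch_dir), str(scripts_dir)]
        )
        return only_fetch, with_parent
```

With only the 1-fetch directory searched, the module is not found; once the parent scripts directory is also searched, it is.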

Checklist

  • I have read and understood the Developer Certificate of Origin (DCO), below, which covers the contents of this pull request (PR).
  • My pull request doesn't include code or content generated with AI.
  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main or master).
  • My commit messages follow [best practices][best_practices].
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@Goziee-git Goziee-git requested review from a team as code owners October 10, 2025 05:32
@Goziee-git Goziee-git requested review from TimidRobot and possumbilities and removed request for a team October 10, 2025 05:32
@cc-open-source-bot cc-open-source-bot moved this to In review in TimidRobot Oct 10, 2025
Member

@TimidRobot TimidRobot left a comment

Fix PR description

Please correct the Checklist in the pull request (PR) description (use [x] where appropriate).

Relatedly, it isn't appropriate to modify the template. The following change isn't welcome:

- <!-- Replace the [ ] with [x] to check the boxes (there is no space between x and square brackets). -->
+ <!-- Replace the [ ] with [y/n] to check the boxes (there is no space between x and square brackets). -->

Query API

The goal of this project is to regularly query APIs. The pre-automation/education/datasets/OTL.csv file is not a valid fetch source.

Lowercase file names

Please use lowercase file names.

@TimidRobot TimidRobot self-assigned this Oct 10, 2025
@Goziee-git
Contributor Author

Thank you. I'll refocus on the task of fetching data from sources using APIs, since the sources in pre-automation/ are largely not valid data sources currently.

@TimidRobot
Member

TimidRobot commented Oct 10, 2025

Also:

Technical Note: Script Execution Requirements

Issue: The OTL_education_read_csv.py script fails with ModuleNotFoundError: No module named 'shared' when run directly from the 1-fetch directory, which suggests the import structure needs adjusting. However, if the import is adjusted, the pre-commit checks fail.

Root Cause: The script imports the shared module from the parent scripts directory, but Python cannot locate it without proper path configuration.

The scripts should be run from the root of the repository using pipenv run ./SCRIPT_PATH. For example:

pipenv run ./scripts/1-fetch/github_fetch.py -h

When run this way, the shared library (scripts/shared.py) provides easy access to all of the necessary paths, and all of the modules managed by pipenv are available.

Thank you for highlighting this issue. I've addressed the missing documentation by adding the Running the scripts section to the repository README.

@Goziee-git
Contributor Author

Fix PR description

Please correct the Checklist in the pull request (PR) description (use [x] where appropriate).

Relatedly, it isn't appropriate to modify the template. The following change isn't welcome:

- <!-- Replace the [ ] with [x] to check the boxes (there is no space between x and square brackets). -->
+ <!-- Replace the [ ] with [y/n] to check the boxes (there is no space between x and square brackets). -->

Query API

The goal of this project is to regularly query APIs. The pre-automation/education/datasets/OTL.csv file is not a valid fetch source.

Lowercase file names

Please use lowercase file names.

Hello @TimidRobot, I apologise for the delay here. I had to read through your reviews to pin down exactly what the issues were, and I have made the changes requested for this PR. As regards lowercase file names, I have followed that naming pattern in the scripts I have since submitted for your review, other than the otl_fetch.py script, because you mentioned that the OTL.csv data source is not a valid source. I'd like to kindly ask that you please reopen the closed PR so that I have a good chance of contributing to the Quantifying Creative Commons project. I acknowledge my errors in this PR and I'll pay closer attention going forward. Thank you 🙏🏼🙏🏼

@TimidRobot TimidRobot added the ⛔️ status: discarded Will not be worked on label Oct 17, 2025
@TimidRobot TimidRobot changed the title Add OTL education CSV reader script [Discarded] Add OTL education CSV reader script Oct 17, 2025
@TimidRobot TimidRobot closed this Oct 17, 2025
@github-project-automation github-project-automation bot moved this from In review to Done in TimidRobot Oct 17, 2025
@TimidRobot TimidRobot changed the title [Discarded] Add OTL education CSV reader script [DISCARDED] Add OTL education CSV reader script Nov 4, 2025
@Goziee-git Goziee-git deleted the feature/OTL-education branch November 5, 2025 07:44

Labels

⛔️ status: discarded Will not be worked on

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add OTL (Open Textbook Library) Education data source fetch script

2 participants