@Joyakis
Contributor

@Joyakis Joyakis commented Oct 15, 2025

Fixes

Description

Add Europeana API integration for metrics collection

This PR adds a new script, europeana_fetch.py, that fetches and aggregates data from the Europeana Search API.
The script collects high-level statistics about cultural heritage content available through Europeana, focusing on data provider distribution and content types rather than fragile license parsing.

Technical details

  • Script Location: scripts/1-fetch/europeana_fetch.py
  • Data Output: data/2025Q4/1-fetch/europeana_1_count.csv
  • Key Features:
    • Fetches data from Europeana Search API using multiple content queries
    • Aggregates by DATA_PROVIDER and content metadata
    • Includes proper error handling and API rate limiting
    • Integrates with existing project structure and git workflows
  • Environment: Updated env.example with EUROPEANA_API_KEY placeholder
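
For readers unfamiliar with this kind of fetch script, the aggregation and retry ideas in the feature list can be sketched roughly as follows. This is a minimal illustration, not the code in europeana_fetch.py: the row shape (`provider`, `count`) and the helper names `aggregate_by_provider` and `fetch_with_retry` are assumptions made for the example, and no real API call is made.

```python
import time
from collections import Counter


def aggregate_by_provider(rows):
    """Sum item counts per data provider across several query results.

    `rows` is an iterable of dicts shaped like {"provider": str, "count": int}.
    That shape is an assumption for this sketch, not the API response format.
    """
    totals = Counter()
    for row in rows:
        totals[row["provider"]] += int(row["count"])
    return dict(totals)


def fetch_with_retry(fetch, retries=3, delay=0.0):
    """Call `fetch()` and retry on failure, pausing longer after each attempt.

    `fetch` is any zero-argument callable (hypothetical here); the growing
    pause is a simple stand-in for rate limiting.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            time.sleep(delay * (attempt + 1))
    raise RuntimeError("all retries failed") from last_error
```

Passing the fetch step in as a callable keeps the retry logic independent of any particular HTTP client, which also makes it easy to exercise without network access.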

Tests

  1. Set the Europeana API key (EUROPEANA_API_KEY) in your environment variables
  2. Run the script: python scripts/1-fetch/europeana_fetch.py --enable-save
  3. Verify that the CSV output is generated at data/2025Q4/1-fetch/europeana_1_count.csv
  4. Check that the data contains aggregated counts by DATA_PROVIDER
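
The last check could be automated with something like the sketch below. It is only an illustration: the column names `DATA_PROVIDER` and `COUNT` are assumptions (the actual CSV header is not shown in this PR), and `provider_counts` is a hypothetical helper.

```python
import csv
import io


def provider_counts(csv_text, provider_col="DATA_PROVIDER", count_col="COUNT"):
    """Return {provider: total count} from CSV text.

    Column names are assumptions for this sketch; adjust them to match the
    header of the generated file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {provider_col, count_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    totals = {}
    for row in reader:
        provider = row[provider_col]
        totals[provider] = totals.get(provider, 0) + int(row[count_col])
    return totals
```

To use it on the generated file, read the file contents into a string first and pass that string to `provider_counts`; a non-empty result with the expected columns is a quick sanity check.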

Checklist

  • I have read and understood the Developer Certificate of Origin (DCO), below, which covers the contents of this pull request (PR).
  • My pull request doesn't include code or content generated with AI.
  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main or master).
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@Joyakis Joyakis requested review from a team as code owners October 15, 2025 19:48
@Joyakis Joyakis requested review from TimidRobot and possumbilities and removed request for a team October 15, 2025 19:48
@cc-open-source-bot cc-open-source-bot moved this to In review in TimidRobot Oct 15, 2025
@TimidRobot TimidRobot self-assigned this Oct 18, 2025
@TimidRobot TimidRobot changed the title Added Europeana integration Add Europeana integration Oct 20, 2025
@TimidRobot
Member

@Joyakis I resolved the conflicts with the main branch (due to changes from another pull request being merged). Please remember to fetch the changes to your computer.

@Joyakis
Contributor Author

Joyakis commented Oct 21, 2025

@Joyakis I resolved the conflicts with the main branch (due to changes from another pull request being merged). Please remember to fetch the changes to your computer.

@TimidRobot Done!

Member

@TimidRobot TimidRobot left a comment

Please resolve all comments before requesting a new review.

@TimidRobot
Member

Please focus on resolving conversations before you add new features.

Please remove custom logging (print statements) and follow logging conventions in other scripts.
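
As a generic illustration of swapping print statements for logging, a sketch using Python's standard `logging` module might look like the following. It does not reproduce this repository's logging conventions, which live in the other scripts; the logger name and the helpers `build_logger` and `report_progress` are hypothetical.

```python
import logging


def build_logger(name="europeana_fetch"):
    """Return a logger with one stream handler (hypothetical setup)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def report_progress(logger, provider, count):
    """Log progress at INFO level instead of calling print()."""
    logger.info("provider=%s count=%d", provider, count)
```

Log calls can be filtered by level and redirected by handlers, which is not possible with bare print statements.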

@Joyakis
Contributor Author

Joyakis commented Oct 24, 2025

Please focus on resolving conversations before you add new features.

Please remove custom logging (print statements) and follow logging conventions in other scripts.

Hello @TimidRobot, the script now gets all providers and rights.

Thank you for pointing out where the problem was.

I have also removed the unnecessary print statements.

@Joyakis
Contributor Author

Joyakis commented Oct 28, 2025

2. you review the pull request Files changed tab yourself to see if there are any unintended files or other obvious issues

Hello. I reviewed the Files changed tab in my pull request as you advised.
Only the intended updates to europeana_fetch.py are listed.
The other files in the folder (env.example and sources.md) came from merging the latest main branch into my feature branch, and they do not show up in the diff.

@Joyakis
Contributor Author

Joyakis commented Oct 28, 2025

Greetings @TimidRobot,
This is a screenshot of the data without themes:

image

This is the one with themes:

image

@Joyakis
Contributor Author

Joyakis commented Oct 30, 2025

Hello @TimidRobot

I have updated both the env.example and sources.md files.

Joyakis and others added 5 commits October 31, 2025 13:06
Member

@TimidRobot TimidRobot left a comment

Great work on this! Thank you!!

@TimidRobot TimidRobot merged commit 46dcd3f into creativecommons:main Nov 1, 2025
@github-project-automation github-project-automation bot moved this from In review to Done in TimidRobot Nov 1, 2025
@Joyakis
Contributor Author

Joyakis commented Nov 1, 2025

Great work on this! Thank you!!

Happy to have done this! Thank you as well.

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add Europeana as a data source

3 participants