forked from ifgi/optimetaPortal
Working implementation of journal meta data using openalex. Passes te… #168
Merged
19 commits
10df5e1
Updates for adding ISSN information and adding OpenAlex functionality.
1715735
Updated Sync files and added information
66272a8
Update journal entries, corrected data retrival, added api and data m…
4b64c3b
Changes for updated ISSN
1e7a1ee
Working implementation of journal meta data using openalex. Passes te…
783ab4e
Updated test_data and tests, changed journal to source and added addi…
a88e631
Merge branch 'feature/add-journal-data-5' into add-journal-data-5
1396b8f
Merge branch 'main' into feature/add-journal-data-5
BharatVe f16a39d
Fix merge issues and change remaining journal to source changes
2f7ab63
Updated tets and files, some errors still need to be worked on.
f4aa463
Minor updates, testing ongoing
885a61f
Update requirements-dev.txt
BharatVe abd419a
Update urls.py
BharatVe d890cb2
re-add geopy dependency, wrongly removed in other PR
nuest 94b93ca
re-add historic test data
nuest 7490e8f
fix tests around harvesting
nuest cff2171
Merge branch 'main' into feature/add-journal-data-5
nuest f2849cc
fix CI
nuest 240db24
Fix tests
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,14 @@ | ||
| """Publications API URL Configuration.""" | ||
|
|
||
| from rest_framework import routers | ||
|
|
||
| from publications.viewsets import PublicationViewSet, SubscriptionViewset | ||
| from publications.viewsets import ( JournalViewSet, | ||
| PublicationViewSet, | ||
| SubscriptionViewSet, | ||
| ) | ||
|
|
||
| router = routers.DefaultRouter() | ||
| router.register(r"publications", PublicationViewSet) | ||
| router.register(r"subscriptions", SubscriptionViewset, basename='subscription') | ||
| router.register(r"journals", JournalViewSet, basename="journal") | ||
| router.register(r"publications", PublicationViewSet, basename="publication") | ||
| router.register(r"subscriptions", SubscriptionViewSet, basename="subscription") | ||
|
|
||
| urlpatterns = router.urls |
76 changes: 76 additions & 0 deletions
publications/management/commands/update_openalex_journals.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| # publications/management/commands/update_openalex_journals.py | ||
|
|
||
| from django.core.management.base import BaseCommand | ||
| from publications.models import Journal | ||
| import requests | ||
|
|
||
| def fetch_openalex_for_issn(issn: str) -> dict | None: | ||
| """ | ||
| Query OpenAlex for a given ISSN-L and return the JSON dict. | ||
| Follows 302 redirects if necessary. | ||
| """ | ||
| try: | ||
| # Initial request to /sources/issn:<ISSN> | ||
| resp = requests.get(f"https://api.openalex.org/sources/issn:{issn}", timeout=10) | ||
| # If OpenAlex returns a 302 redirect, follow it to the canonical URL | ||
| if resp.status_code == 302 and "Location" in resp.headers: | ||
| resp = requests.get(resp.headers["Location"], timeout=10) | ||
| if resp.status_code == 200: | ||
| return resp.json() | ||
| except requests.RequestException: | ||
| pass | ||
| return None | ||
|
|
||
| class Command(BaseCommand): | ||
| help = "Update Journal metadata (openalex_id, publisher_name, works_count, works_api_url, etc.) from OpenAlex." | ||
|
|
||
| def handle(self, *args, **options): | ||
| journals_qs = Journal.objects.exclude(issn_l__isnull=True) | ||
| total = journals_qs.count() | ||
| self.stdout.write(f"Found {total} journal(s) with ISSN-L.") | ||
|
|
||
| for journal in journals_qs: | ||
| data = fetch_openalex_for_issn(journal.issn_l) | ||
| if not data: | ||
| self.stdout.write(f"Skipped (no data): {journal.name}") | ||
|
||
| continue | ||
|
|
||
| changed = False | ||
|
|
||
| # 1. openalex_id & openalex_url | ||
| new_openalex = data.get("id") # e.g., "https://openalex.org/S137773608" | ||
| if new_openalex and journal.openalex_id != new_openalex: | ||
| journal.openalex_id = new_openalex | ||
|
||
| journal.openalex_url = new_openalex # mirror the same URL | ||
|
||
| changed = True | ||
|
|
||
| # 2. works_count & works_api_url | ||
| new_works_count = data.get("works_count") | ||
| if new_works_count is not None and journal.works_count != new_works_count: | ||
| journal.works_count = new_works_count | ||
| changed = True | ||
|
|
||
| api_url = data.get("works_api_url") | ||
| if api_url and journal.works_api_url != api_url: | ||
| journal.works_api_url = api_url | ||
| changed = True | ||
|
|
||
| # 3. publisher_name: read from "host_organization.display_name" | ||
| host_org = data.get("host_organization", {}) | ||
| new_publisher = None | ||
| if isinstance(host_org, dict): | ||
| new_publisher = host_org.get("display_name") | ||
| # Fallback: if still None, use data["display_name"] as proxy | ||
| if not new_publisher: | ||
| new_publisher = data.get("display_name") | ||
| if new_publisher and journal.publisher_name != new_publisher: | ||
| journal.publisher_name = new_publisher | ||
| changed = True | ||
|
||
|
|
||
| if changed: | ||
| journal.save() | ||
| self.stdout.write(f"Updated: {journal.name} ({journal.issn_l})") | ||
| else: | ||
| self.stdout.write(f"Skipped (unchanged): {journal.name}") | ||
|
|
||
| self.stdout.write("Done updating OpenAlex metadata.") | ||
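
The update loop in this command can also be read as three small steps: fetch, map the payload to field values, apply only what changed. The block below is a minimal sketch of that reading, not the code merged in this PR. `JournalRecord`, `fetch_source`, `extract_fields` and `apply_fields` are hypothetical names, the dataclass stands in for the Django model so the sketch runs without a database, and the extra lookup of `host_organization_name` is an assumption: as far as I can tell, OpenAlex source objects give `host_organization` as an ID string and the publisher name under `host_organization_name`, which should be checked against the API documentation. Unlike the diff, `openalex_url` is compared on its own rather than only when `openalex_id` changes.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Optional

OPENALEX_SOURCE_URL = "https://api.openalex.org/sources/issn:{issn}"


@dataclass
class JournalRecord:
    """Hypothetical stand-in for the Journal model, so the sketch runs without Django."""

    name: str
    issn_l: Optional[str] = None
    openalex_id: Optional[str] = None
    openalex_url: Optional[str] = None
    publisher_name: Optional[str] = None
    works_count: Optional[int] = None
    works_api_url: Optional[str] = None


def fetch_source(issn: str, opener: Callable[..., Any] = urllib.request.urlopen,
                 timeout: float = 10) -> Optional[dict]:
    """Return the OpenAlex source payload for an ISSN, or None on any failure."""
    try:
        # urlopen follows HTTP redirects itself, so no manual 302 handling here.
        with opener(OPENALEX_SOURCE_URL.format(issn=issn), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return None


def extract_fields(data: dict) -> dict:
    """Map an OpenAlex source payload to Journal field values."""
    host_org = data.get("host_organization")
    publisher = host_org.get("display_name") if isinstance(host_org, dict) else None
    # Assumption: the flat "host_organization_name" key carries the publisher name.
    publisher = publisher or data.get("host_organization_name")
    return {
        "openalex_id": data.get("id"),
        "openalex_url": data.get("id"),  # the diff mirrors the same URL
        "works_count": data.get("works_count"),
        "works_api_url": data.get("works_api_url"),
        # Last resort, as in the diff: fall back to the source's own display name.
        "publisher_name": publisher or data.get("display_name"),
    }


def apply_fields(journal: JournalRecord, values: dict) -> list:
    """Set only the fields whose value differs; return the names that changed."""
    changed = []
    for field, new_value in values.items():
        if new_value is None or new_value == "":
            continue  # missing in the payload: keep the stored value (0 is kept as a valid count)
        if getattr(journal, field) != new_value:
            setattr(journal, field, new_value)
            changed.append(field)
    return changed
```

With a real Django model, the returned list could be passed on as `journal.save(update_fields=changed)` when it is non-empty, which keeps the "Updated" / "Skipped (unchanged)" output of the command while writing only the touched columns. The note on redirects applies to `requests` as well: `requests.get` follows redirects by default, so the explicit 302 branch in the diff is probably never reached.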
28 changes: 28 additions & 0 deletions
publications/migrations/0004_journal_alter_publication_source.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Generated by Django 5.1.9 on 2025-06-02 11:00 | ||
|
||
|
|
||
| import django.db.models.deletion | ||
| from django.db import migrations, models | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ('publications', '0003_remove_customuser_deleted_and_more'), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.CreateModel( | ||
| name='Journal', | ||
| fields=[ | ||
| ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')), | ||
| ('name', models.CharField(max_length=255)), | ||
| ('issn_l', models.CharField(blank=True, max_length=9, null=True)), | ||
| ('openalex_id', models.CharField(blank=True, max_length=50, null=True)), | ||
| ], | ||
| ), | ||
| migrations.AlterField( | ||
| model_name='publication', | ||
| name='source', | ||
| field=models.ForeignKey(null=True, on_delete=django.db.models.deletion.SET_NULL, related_name='publications', to='publications.journal'), | ||
| ), | ||
| ] | ||
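
The `issn_l` field above is sized for the printed ISSN form `NNNN-NNNC` (nine characters including the hyphen). Since the command queries OpenAlex with whatever is stored there, a cheap local check before the request might be worth having. The following is a minimal sketch of the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10). `is_valid_issn` is a hypothetical helper and is not part of this PR.

```python
import re

# Printed ISSN form: four digits, a hyphen, three digits, then a digit or X.
ISSN_FORMAT = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_issn(issn: str) -> bool:
    """Check the format and the modulo-11 check digit of an ISSN string."""
    if not ISSN_FORMAT.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected
```

A check like this only says that the string is well formed. Whether the ISSN is actually registered still has to come from a lookup such as the OpenAlex request in the management command.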
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| from django.db import migrations, models | ||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ('publications', '0004_journal_alter_publication_source'), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AddField( | ||
| model_name='journal', | ||
| name='publisher_name', | ||
| field=models.CharField( | ||
| max_length=255, | ||
| null=True, | ||
| blank=True, | ||
| help_text='Name of the publisher as returned by OpenAlex' | ||
| ), | ||
| ), | ||
| migrations.AddField( | ||
| model_name='journal', | ||
| name='works_count', | ||
| field=models.IntegerField( | ||
| null=True, | ||
| blank=True, | ||
| help_text='Total number of works (articles, books, etc.) from this journal' | ||
| ), | ||
| ), | ||
| migrations.AddField( | ||
| model_name='journal', | ||
| name='works_api_url', | ||
| field=models.URLField( | ||
| max_length=512, | ||
| null=True, | ||
| blank=True, | ||
| help_text='API endpoint to list all works from this journal' | ||
| ), | ||
| ), | ||
| migrations.AddField( | ||
| model_name='journal', | ||
| name='openalex_url', | ||
| field=models.URLField( | ||
| max_length=512, | ||
| null=True, | ||
| blank=True, | ||
| help_text='Canonical OpenAlex URL for this journal (source.id)' | ||
| ), | ||
| ), | ||
| ] |
33 changes: 33 additions & 0 deletions
publications/migrations/0006_alter_journal_openalex_url_and_more.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| # Generated by Django 5.1.9 on 2025-06-02 14:39 | ||
|
|
||
| from django.db import migrations, models | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ('publications', '0005_journal_extra_fields'), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AlterField( | ||
| model_name='journal', | ||
| name='openalex_url', | ||
| field=models.URLField(blank=True, max_length=512, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name='journal', | ||
| name='publisher_name', | ||
| field=models.CharField(blank=True, max_length=255, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name='journal', | ||
| name='works_api_url', | ||
| field=models.URLField(blank=True, max_length=512, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name='journal', | ||
| name='works_count', | ||
| field=models.IntegerField(blank=True, null=True), | ||
| ), | ||
| ] |
I understand where you're coming from, but this is actually not what our values are.
If we want to use real journals here because their ISSN actually exists, then we go with diamond open access journals (https://en.wikipedia.org/wiki/Diamond_open_access).
https://github.com/loreabad6/doaj-geo is a good starting point, so let's use some that we might also want to collaborate with. Please add the following journals to the test data:
Changed in the test data; @BharatVe, please double-check.