Skip to content

Working implementation of journal meta data using openalex. Passes te…#168

Merged
nuest merged 19 commits intomainfrom
feature/add-journal-data-5
Jul 8, 2025
Merged

Working implementation of journal meta data using openalex. Passes te…#168
nuest merged 19 commits intomainfrom
feature/add-journal-data-5

Conversation

@BharatVe
Copy link
Collaborator

@BharatVe BharatVe commented Jun 4, 2025

Closes #5

Data Model Changes
Added a new Journal model with fields name, issn_l, openalex_id, openalex_url, publisher_name, works_count, works_api_url and converted Publication.source from a text field to a ForeignKey pointing at this Journal model .

Serializers and ViewSets
Created a JournalSerializer exposing all eight journal fields, then updated PublicationSerializer (a GeoFeatureModelSerializer) to include a nested field source_details = JournalSerializer(source="source"). The PublicationViewSet filters out any publications lacking a valid source or null geometry so that GeoJSON serialization (via DRF‐GIS) never fails .

OpenAlex Sync Command
Wrote a management command update_openalex_journals that, for each Journal with a non-null ISSN-L, fetches metadata from OpenAlex (using requests.get("https://api.openalex.org/sources/issn:")). It populates or updates each journal’s openalex_id, openalex_url, publisher_name, works_count, and works_api_url, saving only when changes occur.

Test Coverage for Journals + Publications

  • Confirm /api/v1/journals/ lists all journals with the correct eight fields.
  • Confirm /api/v1/journals/{pk}/ returns exactly the expected fields.
  • Confirm that /api/v1/publications/ responses include a nested source_details block whose keys match the Journal model.
  • Confirm that filtering by ?journal_id= returns only publications whose source_details["id"] == pk .

TODO:

@BharatVe
Copy link
Collaborator Author

BharatVe commented Jun 4, 2025

@nuest I think the changes I made to add the journal functionality might be causing issues with harvesting in tasks.py. Would it be okay to change some of the previous code a little to fix the conflicts and get everything running smoothly?

@nuest
Copy link
Member

nuest commented Jun 4, 2025

Yes, of course. Please fix the tests. I can nevertheless make a first review even with the current changes. Tomorrow.

Copy link
Member

@nuest nuest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a major update, so let's try to get things as right as possible:

Let's change the name from "journals" to "sources", in line with the entity names in OpenAlex, see https://docs.openalex.org/api-entities/entities-overview

We certainly also want to geolocate preprint articles, and then "journal = preprint server" makes so sense.

"publications" also is not a good name? Right, we'll get to that in #169

"model": "publications.journal",
"pk": 1,
"fields": {
"name": "Nature",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand where you're coming from, but this is actually not what our values are.

If we want to use real journals here because their ISSN actually exists, then we go with diamond open access journals (https://en.wikipedia.org/wiki/Diamond_open_access).

https://github.com/loreabad6/doaj-geo is a good starting point, so let's use some that we might also want to collaborate with. Please add the following journals to the test data:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the test data, @BharatVe double checks.

@BharatVe BharatVe marked this pull request as ready for review June 30, 2025 20:57
@BharatVe
Copy link
Collaborator Author

BharatVe commented Jul 1, 2025

These are the errors I am having the most difficulty resolving

TypeError: cannot unpack non-iterable NoneType object
======================================================================
FAIL: test_parse_missing (tests.test_htmlparser.SimpleTest.test_parse_missing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/optimap/optimap/tests/test_htmlparser.py", line 58, in test_parse_missing
    self.assertEqual(period_start, [None])
AssertionError: Lists differ: [] != [None]
Second list contains 1 additional elements.
First extra element 0:
None
- []
+ [None]

FAIL: test_api_publication_2 (tests.test_harvesting.SimpleTest.test_api_publication_2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/optimap/optimap/tests/test_harvesting.py", line 109, in test_api_publication_2
    self.assertEqual(resp.status_code, 200)
AssertionError: 404 != 200
======================================================================
FAIL: test_no_duplicates_after_initial_harvest (tests.test_harvesting.SimpleTest.test_no_duplicates_after_initial_harvest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/optimap/optimap/tests/test_harvesting.py", line 162, in test_no_duplicates_after_initial_harvest
    self.assertEqual(Publication.objects.count(), 2)
AssertionError: 0 != 2


FAIL: test_api_redirect (tests.test_publications_api.PublicationsApiTest.test_api_redirect)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/optimap/optimap/tests/test_publications_api.py", line 48, in test_api_redirect
    self.assertEqual(response.status_code, 302)
AssertionError: 404 != 302
======================================================================
FAIL: test_publication_includes_source_details (tests.test_source_publication.PublicationAPITest.test_publication_includes_source_details)
GET /api/v1/publications/ should return ≥1 publication,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/optimap/optimap/tests/test_source_publication.py", line 185, in test_publication_includes_source_details
    self.assertEqual(len(pubs_list), 1)
AssertionError: 2 != 1

@BharatVe BharatVe closed this Jul 1, 2025
@BharatVe BharatVe reopened this Jul 1, 2025

if src and getattr(src, "is_preprint", False) and geom.empty:
try:
loc = Nominatim(user_agent="optimap-tasks").geocode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the geocoding come from? Can you please clarify which issue this implements?

)

ps_list, pe_list = extract_timeperiod_from_html(soup)
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never silently catch exceptions.

@nuest
Copy link
Member

nuest commented Jul 1, 2025

This my my local test output:

Found 49 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
....................ss.....................F...ss
======================================================================
FAIL: test_publication_includes_source_details (tests.test_source_publication.PublicationAPITest)
GET /api/v1/publications/ should return ≥1 publication,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/daniel/git/KOMET/optimap/tests/test_source_publication.py", line 203, in test_publication_includes_source_details
    self.assertIn(key, details)
AssertionError: 'openalex_id' not found in {'id': 42, 'name': 'API source', 'issn_l': '1111-2222', 'openalex_url': 'https://openalex.org/S011112222'}

----------------------------------------------------------------------
Ran 49 tests in 20.408s

FAILED (failures=1, skipped=4)
Destroying test database for alias 'default'...

@nuest
Copy link
Member

nuest commented Jul 8, 2025

Failed remaining test obsoleted by 2fa0215

@nuest nuest merged commit 0e1809c into main Jul 8, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Journal as a proper resource in the model

2 participants