Merged
64 commits
0783ca9
Add arXiv data fetching and processing functionality
Goziee-git Oct 11, 2025
6e16e63
Delete scripts/3-report/gcs_report.py
Goziee-git Oct 11, 2025
9ea0071
added fixes: refactor request and license extraction logic
Goziee-git Oct 16, 2025
f8c9774
Refactor HTTP requests and license extraction logic.
Goziee-git Oct 16, 2025
5348e8b
refactord to use url library, enhanced retry and extraction logic
Goziee-git Oct 16, 2025
41ff221
Enhance ArXiv script with category reporting, author bucketing, and i…
Goziee-git Oct 20, 2025
2fd81a1
Delete data/2025Q4/1-fetch/arxiv_2_count_by_language.csv
Goziee-git Oct 18, 2025
9d09d9d
Delete data/2025Q4/1-fetch/arxiv_2_count_by_category.csv
Goziee-git Oct 18, 2025
300cb25
modified regex pattern Creative Commons to Unknown CC legat tool
Goziee-git Oct 22, 2025
8f6a409
Remove HTTP adapter configuration to ensure all API calls use HTTPS
Goziee-git Oct 22, 2025
70d191f
Add User-Agent header and remove HTTP adapter for HTTPS-only requests
Goziee-git Oct 22, 2025
a544cab
Add category converter in /dev called by arxiv_fetch.py to generate u…
Goziee-git Oct 22, 2025
4fb8f30
Fix converter output to use correct filename and location in data/202…
Goziee-git Oct 22, 2025
587e2e0
Fix static analysis issues in arxiv_fetch.py - line length and format…
Goziee-git Oct 22, 2025
7dbf3c0
Fix static analysis issues in arxiv_category_converter.py - formattin…
Goziee-git Oct 22, 2025
03b7c69
Convert provenance output from JSON to YAML and store in /data directory
Goziee-git Oct 22, 2025
4980324
Restore gcs_report.py to upstream version
Goziee-git Oct 22, 2025
a425ee5
Delete arxiv_fetch.py
Goziee-git Oct 22, 2025
5bb4144
Add data files to gitignore to prevent accidental commits
Goziee-git Oct 22, 2025
c531b75
Delete data/2025Q4/1-fetch/arxiv_1_count.csv
Goziee-git Oct 22, 2025
1de32c7
Delete data/2025Q4/1-fetch/arxiv_3_count_by_country.csv
Goziee-git Oct 22, 2025
d8724bb
Delete data/2025Q4/1-fetch/arxiv_3_count_by_year.csv
Goziee-git Oct 22, 2025
386989a
Delete data/2025Q4/1-fetch/arxiv_4_count_by_author_count.csv
Goziee-git Oct 22, 2025
f14e4ce
Delete .gitignore
Goziee-git Oct 22, 2025
f9e5ae7
Merge remote-tracking branch 'upstream/main' into feature/arxiv
Goziee-git Oct 23, 2025
6769a33
Merge branch 'feature/arxiv' of https://github.com/Goziee-git/quantif…
Goziee-git Oct 23, 2025
95df48a
Improve arxiv_fetch.py: add debug logging, organize constants, use sh…
Goziee-git Oct 24, 2025
267105b
Add PyYAML and feedparser dependencies for ArXiv functionality
Goziee-git Oct 24, 2025
0aa919c
Update arxiv_fetch.py
Goziee-git Oct 24, 2025
9b83241
Remove shebang from imported module
Goziee-git Oct 24, 2025
58a9f99
Remove type hints from arxiv_fetch.py
Goziee-git Oct 24, 2025
7defab5
Add logging and fix silent exception handling in arxiv_category_conve…
Goziee-git Oct 25, 2025
076b95a
feat: centralize ArXiv category management in shared.py
Goziee-git Oct 27, 2025
9183088
refactor: use shared module for comprehensive ArXiv categories
Goziee-git Oct 27, 2025
2ef6c6f
refactor: use shared category functions in arxiv_fetch.py
Goziee-git Oct 27, 2025
b2f96f9
Refactor arxiv_fetch.py: move CATEGORIES constant local, reorganize c…
Goziee-git Oct 27, 2025
0798f0d
Delete dev/arxiv_category_converter.py
Goziee-git Oct 27, 2025
ba8bce7
Delete dev/create_arxiv_category_map.py
Goziee-git Oct 27, 2025
69e86be
Delete scripts/shared.py
Goziee-git Oct 27, 2025
37214af
Restore scripts/shared.py - required dependency for arxiv_fetch.py
Goziee-git Oct 27, 2025
9993111
Revert shared.py to pre-category state, make arxiv_fetch.py fully sel…
Goziee-git Oct 27, 2025
cf29f8d
Add .gitignore file
Goziee-git Oct 28, 2025
1880043
Revert "Add .gitignore file"
Goziee-git Oct 28, 2025
df6fe6b
Replace HTTP retry and API constants with literal values
Goziee-git Oct 28, 2025
0414859
Remove PERCENT column and aggregated category report generation
Goziee-git Oct 28, 2025
f304aaa
Move provenance file to quarterly data directory
Goziee-git Oct 28, 2025
704fa28
Move script execution log to main function
Goziee-git Oct 28, 2025
ba74f05
Clarify limit argument help text and add documentation
Goziee-git Oct 28, 2025
d04179b
Replace consecutive calls logging with per-query result summary
Goziee-git Oct 28, 2025
76d2184
Reorganize constants in logical order and fix static analysis issues
Goziee-git Oct 28, 2025
a9d91de
Fix error handling in arxiv_fetch.py to raise QuantifyingException
Goziee-git Oct 28, 2025
c2fcfd8
Remove verbose per-paper logging in arxiv_fetch.py
Goziee-git Oct 28, 2025
32a8c60
Revert .gitignore changes (f14e4ce and 5bb4144)
Goziee-git Oct 28, 2025
5317b77
chore: Fix encoding and newlines in arxiv_fetch.py per issue #217
Goziee-git Oct 29, 2025
5953472
Add arXiv source
Goziee-git Oct 30, 2025
faa2b27
Update arXiv documentation links
Goziee-git Oct 30, 2025
6d50654
Refine author count bucketing to individual buckets for 1-4 authors a…
Goziee-git Oct 31, 2025
f554e91
Merge branch 'creativecommons:main' into feature/arxiv
Goziee-git Oct 31, 2025
6ddde78
Remove redundant None check in bucket_author_count function
Goziee-git Oct 31, 2025
bef203f
Refactor: alphabetize file path constants in arxiv_fetch.py
Goziee-git Oct 31, 2025
7adc610
Merge branch 'main' into feature/arxiv
TimidRobot Nov 1, 2025
8a058eb
order soruces and cleanup formatting and labeling
TimidRobot Nov 1, 2025
631df48
use standard backoff_factor=10
TimidRobot Nov 1, 2025
01f4e01
order/sort data
TimidRobot Nov 1, 2025
2 changes: 2 additions & 0 deletions Pipfile
@@ -4,6 +4,7 @@ verify_ssl = true
name = "pypi"

[packages]
+feedparser = "*"
flickrapi = "*"
GitPython = "*"
google-api-python-client = "*"
@@ -18,6 +19,7 @@ pillow = ">=11.3.0" # Ensure dependency is secure
Pyarrow = "*"
Pygments = "*"
python-dotenv = "*"
+PyYAML = "*"
requests = ">=2.31.0"
seaborn = "*"
urllib3 = ">=2.5.0"