OpenAlex

Updated: Jan 16, 2026

Preprocessing

Install uv if needed and create the openalex uv environment by running uv init inside the openalex directory.
Download the OpenAlex snapshots from this link to a directory of your choosing (say, basedir).
To run the flattening script, first activate the uv openalex environment (if needed) by running source .venv/bin/activate inside the directory, then execute uv run preprocessing/flatten_openalex_files.py.
Open preprocessing/flatten_openalex_files.py and update the following variables: a. BASEDIR to the directory in Step 1. b. MONTH to the month of the snapshot, eg: may-2025
Scroll down to the if __name__ == '__main__': block near the end of the file.
Uncomment the lines one at a time and run the script flatten_<entity> functions to generate the flattened compressed CSV files. a. Start with flatten_merged_entries, then b. Then flatten_funders, flatten_concepts, ...., flatten_topics.
To flatten works, uncomment the block of code for flatten_works_v3. You flatten all JSONs at once, or do it N files at a time by changing the files_to_process variable to either all or an integer N.

Warnings:

flattening authors and works take anywhere between 15 and 30 hours. The code will cache the files, so you should consider running it in batches by setting the files_to_process variable.

Name		Name	Last commit message	Last commit date
Latest commit History 265 Commits
.idea		.idea
_old		_old
data		data
notebooks		notebooks
preprocessing		preprocessing
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
environment.yml		environment.yml
minimal_environment.yml		minimal_environment.yml
pyproject.toml		pyproject.toml
refactor_SciSciNetv2.py		refactor_SciSciNetv2.py
slim-env.yml		slim-env.yml
uv.lock		uv.lock