Updated: Jan 16, 2026
- Install `uv` if needed and create the `openalex` uv environment by running `uv init` inside the `openalex` directory.
- Download the OpenAlex snapshots from this link to a directory of your choosing (say, `basedir`).
- To run the flattening script, first activate the uv `openalex` environment (if needed) by running `source .venv/bin/activate` inside the directory, then execute `uv run preprocessing/flatten_openalex_files.py`.
- Open `preprocessing/flatten_openalex_files.py` and update the following variables:
  a. `BASEDIR` to the directory where you downloaded the snapshots (`basedir` above).
  b. `MONTH` to the month of the snapshot, e.g. `may-2025`.
- Scroll down to the `if __name__ == '__main__':` block near the end of the file.
- Uncomment the `flatten_<entity>` function calls one at a time and run the script to generate the flattened compressed CSV files.
  a. Start with `flatten_merged_entries`.
  b. Then `flatten_funders`, `flatten_concepts`, ..., `flatten_topics`.
- To flatten `works`, uncomment the block of code for `flatten_works_v3`. You can flatten all JSONs at once, or do it `N` files at a time, by setting the `files_to_process` variable to either `all` or an integer `N`.
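The configure-then-uncomment workflow described above might look roughly like the sketch below. It is only an illustration: the function bodies are hypothetical stand-ins, the `(basedir, month)` signatures are assumed, and `run_enabled_steps` is a helper invented here that does not come from the actual script.

```python
from pathlib import Path

# Assumed example values; point these at your own snapshot directory and month.
BASEDIR = Path("basedir")
MONTH = "may-2025"


def flatten_merged_entries(basedir: Path, month: str) -> str:
    # Hypothetical stand-in: the real function writes flattened compressed CSVs.
    return f"merged_entries:{basedir}:{month}"


def flatten_funders(basedir: Path, month: str) -> str:
    # Hypothetical stand-in with an assumed signature.
    return f"funders:{basedir}:{month}"


def run_enabled_steps(basedir: Path, month: str) -> list[str]:
    """Run only the calls that are currently uncommented, in order."""
    results = []
    results.append(flatten_merged_entries(basedir, month))
    # results.append(flatten_funders(basedir, month))  # uncomment for the next run
    return results


if __name__ == "__main__":
    for summary in run_enabled_steps(BASEDIR, MONTH):
        print(summary)
```

Enabling one call per run keeps each long-running step isolated, so a failure in one entity does not force the earlier ones to be redone.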
Warnings:
- Flattening authors and works takes anywhere between 15 and 30 hours. The code caches the files, so consider running it in batches by setting the `files_to_process` variable.
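As a rough illustration of batching, the sketch below picks the next files to flatten from a value shaped like `files_to_process` (`"all"` or an integer `N`). The `select_batch` helper, the file names, and the way finished files are tracked are all assumptions made for this example; the script's own caching logic may work differently.

```python
from typing import Union


def select_batch(
    pending: list[str],
    done: set[str],
    files_to_process: Union[int, str],
) -> list[str]:
    """Return the next files to flatten, skipping those already processed."""
    remaining = [name for name in pending if name not in done]
    if files_to_process == "all":
        return remaining
    if isinstance(files_to_process, int) and files_to_process > 0:
        return remaining[:files_to_process]
    raise ValueError("files_to_process must be 'all' or a positive integer")


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call.
    files = ["part_000.gz", "part_001.gz", "part_002.gz"]
    print(select_batch(files, done={"part_000.gz"}, files_to_process=1))
```

Running a fixed number of files per session lets a 15 to 30 hour job be spread over several sittings, provided finished files are recorded somewhere between runs.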