Create a .env file with your OpenAI API key:
cp .env.template .env
Then fill in your PYATLAS__OPENAI_API_KEY. The key is used to generate cluster labels with gpt-5-mini. The setup script makes about 170 API calls (one per cluster) and uses fewer than 200,000 tokens in total, so the cost is minimal.
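Before running the setup script, it can help to confirm that the key actually made it into the file. The following is a minimal sketch using only the standard library; it assumes .env holds plain KEY=VALUE lines, and the helper names `parse_env` and `has_api_key` are hypothetical, not part of the project. How the project itself loads the variable is not shown here.

```python
from pathlib import Path

# Variable name taken from the README; everything else here is illustrative.
REQUIRED_KEY = "PYATLAS__OPENAI_API_KEY"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(text: str) -> bool:
    """Return True if the required key is present and non-empty."""
    return bool(parse_env(text).get(REQUIRED_KEY))


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    if has_api_key(text):
        print(f"{REQUIRED_KEY} is set")
    else:
        print(f"{REQUIRED_KEY} is missing or empty in .env")
```

This only checks that a value is present; it does not validate the key against the OpenAI API.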
The setup script will:
- Download and process the PyPI dataset.
- Create vector embeddings for the package descriptions.
- Run UMAP and HDBSCAN to compute coordinates and clusters.
- Generate cluster labels using OpenAI.
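The last step, one API call per cluster, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the prompt wording, the `max_examples` cap, and the function names `build_prompt` and `label_clusters` are all hypothetical, and the model call is passed in as a plain callable (`ask`) so the sketch runs without network access.

```python
from typing import Callable


def build_prompt(descriptions: list[str], max_examples: int = 20) -> str:
    """Build one labeling prompt from a sample of package descriptions.

    The prompt wording and the cap of 20 examples are assumptions.
    """
    sample = descriptions[:max_examples]
    bullets = "\n".join(f"- {d}" for d in sample)
    return "Give a short label (2-4 words) for this group of Python packages:\n" + bullets


def label_clusters(
    clusters: dict[int, list[str]],
    ask: Callable[[str], str],
) -> dict[int, str]:
    """Call `ask` once per cluster and collect the returned labels."""
    labels: dict[int, str] = {}
    for cluster_id, descriptions in clusters.items():
        if cluster_id == -1:  # HDBSCAN assigns -1 to noise points
            continue
        labels[cluster_id] = ask(build_prompt(descriptions)).strip()
    return labels


if __name__ == "__main__":
    # Stand-in for a real model call; always returns the same label.
    def fake_ask(prompt: str) -> str:
        return " Web tools \n"

    demo = {0: ["a web framework", "an HTTP client"], -1: ["unclustered package"]}
    print(label_clusters(demo, fake_ask))
```

Keeping the model call behind a callable makes the one-call-per-cluster loop easy to test offline; in real use, `ask` would wrap a request to the OpenAI API.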
Run the setup script with:
uv run python pyatlas/scripts/setup.py
Start the frontend:
cd frontend
npm install
npm run dev
The application will be available at http://localhost:5173.
The dataset for this project is built from the PyPI dataset on Google BigQuery. The SQL query used is in pypi_bigquery.sql.