⚠️ This README and the code are AI-generated
Push verified smart contract labels from Sourcify to the Open Labels Initiative (OLI). Processes 8.8+ million verified contracts efficiently using locally downloaded Parquet files and DuckDB.
- source_code_verified: "sourcify"
- is_contract: true
- code_language: solidity, vyper, etc.
- code_compiler: compiler version used
- deployment_block: deployment block number
- deployment_tx: deployment transaction hash
- deployer_address: deployer address
- contract_name: contract name from compilation metadata
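As a rough illustration of how the tags above could be assembled for one contract, here is a minimal sketch. The function name `build_oli_tags` and the input row keys are assumptions for illustration (the `name` key in particular is hypothetical), and mapping `code_compiler` to a `version` field is a guess; the actual logic lives in the repository's own modules and may differ.

```python
from typing import Any, Dict


def build_oli_tags(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one joined contract row to the OLI tags listed above.

    The row keys are assumptions for this sketch, not the repository's API.
    """
    tags = {
        "source_code_verified": "sourcify",  # constant for every Sourcify-verified contract
        "is_contract": True,
        "code_language": row.get("language"),
        "code_compiler": row.get("version"),  # assumed: the compiler version string
        "deployment_block": row.get("block_number"),
        "deployment_tx": row.get("transaction_hash"),
        "deployer_address": row.get("deployer"),
        "contract_name": row.get("name"),  # hypothetical key
    }
    # Design choice in this sketch: omit tags whose value is missing
    # instead of submitting empty values.
    return {key: value for key, value in tags.items() if value is not None}
```

Dropping missing values keeps each label set limited to what is actually known about the contract; whether the real pipeline does the same is not stated here.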
- Set up environment:
  git clone <repository>
  cd oli-sourcify-labels
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
- Download Sourcify data (~10-15GB):
  python download_parquet_files.py
- Test the pipeline:
  python test_local_processing.py
- Configure OLI and process all contracts:
  export OLI_PRIVATE_KEY="your_private_key_here"
  python main.py
- main.py - Main entry point for processing all contracts
- download_parquet_files.py - Downloads Sourcify Parquet exports locally
- local_data_processor.py - Processes local files with DuckDB for efficient joins
- test_local_processing.py - Comprehensive test suite
- oli_submitter.py - Handles OLI validation and submission
Run comprehensive tests:
python test_local_processing.py
Tests validate data integrity, joins, OLI tag generation, and full pipeline functionality.
Quick processor test:
python local_data_processor.py
Shows data statistics, join results, and OLI tag preview.
Required:
- OLI_PRIVATE_KEY - Your private key (wallet must have ETH on Base)
Optional:
- USE_PRODUCTION="true" - Use Base Mainnet (default: Base Sepolia testnet)
- DEFAULT_BATCH_SIZE="5000" - Contracts per batch (default: 1000)
- SUBMISSION_DELAY="0.5" - Delay between submissions in seconds (default: 1.0)
- SUBMIT_ONCHAIN="true" - Submit onchain (costs gas, default: false)
- MAX_PARALLEL_WORKERS="20" - Parallel workers for offchain submissions (default: 10)
- DATA_DIR="./my_data" - Data directory path (default: ./sourcify_data)
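To make the defaults concrete, the sketch below shows one way these variables could be read into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names for this example, and treating only the string "true" as enabled is an assumption; the project's own configuration code may parse things differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables and defaults listed above."""
    private_key: str
    use_production: bool = False       # False means Base Sepolia testnet
    batch_size: int = 1000
    submission_delay: float = 1.0      # seconds between submissions
    submit_onchain: bool = False
    max_parallel_workers: int = 10
    data_dir: str = "./sourcify_data"


def _as_bool(value: str) -> bool:
    # Assumption: only "true" (case-insensitive) enables a flag.
    return value.strip().lower() == "true"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    private_key = env.get("OLI_PRIVATE_KEY")
    if not private_key:
        raise RuntimeError("OLI_PRIVATE_KEY is required")
    return Settings(
        private_key=private_key,
        use_production=_as_bool(env.get("USE_PRODUCTION", "false")),
        batch_size=int(env.get("DEFAULT_BATCH_SIZE", "1000")),
        submission_delay=float(env.get("SUBMISSION_DELAY", "1.0")),
        submit_onchain=_as_bool(env.get("SUBMIT_ONCHAIN", "false")),
        max_parallel_workers=int(env.get("MAX_PARALLEL_WORKERS", "10")),
        data_dir=env.get("DATA_DIR", "./sourcify_data"),
    )
```

Passing an explicit mapping, for example `load_settings({"OLI_PRIVATE_KEY": "..."})`, makes the parsing easy to test without touching the real environment.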
For production (Base Mainnet) with onchain attestations:
export OLI_PRIVATE_KEY="your_private_key"
export USE_PRODUCTION="true"
export SUBMIT_ONCHAIN="true"
export DEFAULT_BATCH_SIZE="2000"
python main.py
For testnet (Base Sepolia) with free offchain submissions:
export OLI_PRIVATE_KEY="your_private_key"
python main.py
- Downloads all Sourcify Parquet files (~10-15GB) to ./sourcify_data/
- Joins 3 tables locally using DuckDB hash tables:
  - verified_contracts ⟵⟶ contract_deployments
  - verified_contracts ⟵⟶ compiled_contracts
- Processes contracts in batches with OLI tag generation
- Submits to OLI platform (offchain free, onchain requires gas)
Performance: ~1GB RAM, processes millions of contracts efficiently without API rate limits.
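The batch step described above can be sketched roughly as follows. The helper names `batched` and `process_in_batches` and the `submit` callback are hypothetical; they only illustrate how a batch size and a submission delay (as in DEFAULT_BATCH_SIZE and SUBMISSION_DELAY) might interact, and are not the pipeline's actual code.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch
        yield batch


def process_in_batches(
    items: Iterable[T],
    submit: Callable[[List[T]], None],
    batch_size: int = 1000,
    delay: float = 1.0,
) -> int:
    """Submit items batch by batch, pausing after each batch; return the item count."""
    total = 0
    for batch in batched(items, batch_size):
        submit(batch)  # hypothetical callback standing in for the OLI submission
        total += len(batch)
        time.sleep(delay)
    return total
```

Because `batched` consumes any iterable lazily, only one batch needs to be held in memory at a time, which fits the low memory footprint mentioned above.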
- verified_contracts: id, deployment_id, compilation_id, created_at
- contract_deployments: id, chain_id, address, transaction_hash, block_number, deployer
- compiled_contracts: id, language, compiler, version
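Given the columns above, the two joins could be expressed along these lines. This is a sketch under assumptions: the join keys (deployment_id and compilation_id pointing at the other tables' id columns) are inferred from the column names, the Parquet file name patterns are hypothetical, and `build_join_query` and `run_join` are illustrative names. DuckDB's `read_parquet` table function accepts glob patterns, which is what the query relies on.

```python
from pathlib import Path
from typing import Any, List, Tuple


def build_join_query(data_dir: str = "./sourcify_data") -> str:
    """Return a DuckDB SQL query joining the three tables.

    File name patterns are hypothetical; adjust them to the downloaded files.
    """
    base = Path(data_dir)
    vc = (base / "verified_contracts*.parquet").as_posix()
    cd = (base / "contract_deployments*.parquet").as_posix()
    cc = (base / "compiled_contracts*.parquet").as_posix()
    return f"""
        SELECT
            cd.chain_id, cd.address, cd.transaction_hash,
            cd.block_number, cd.deployer,
            cc.language, cc.compiler, cc.version
        FROM read_parquet('{vc}') AS vc
        -- Assumed keys, inferred from the column names above
        JOIN read_parquet('{cd}') AS cd ON vc.deployment_id = cd.id
        JOIN read_parquet('{cc}') AS cc ON vc.compilation_id = cc.id
    """


def run_join(data_dir: str = "./sourcify_data", limit: int = 5) -> List[Tuple[Any, ...]]:
    """Execute the join with an in-memory DuckDB connection and return a few rows."""
    import duckdb  # third-party dependency: pip install duckdb

    connection = duckdb.connect()
    query = build_join_query(data_dir) + f" LIMIT {int(limit)}"
    return connection.execute(query).fetchall()
```

Calling `run_join()` after the download step would return a handful of joined rows for inspection, provided the file patterns match what is on disk.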