This project presents a comprehensive, data-driven analysis of 4.94 million Aadhaar records, uncovering critical systemic patterns in India's digital identity ecosystem. By employing advanced analytical techniques (trivariate analysis, cross-dataset clustering, and predictive demand modeling), we have identified a major shift in the Aadhaar lifecycle from "Acquisition" to "Maintenance."
**Hackathon Track:** Online Hackathon on Data-Driven Innovation for Aadhaar - 2026
| Metric | Value |
|---|---|
| Total Records Analyzed | 4.94 Million |
| Time Period | March - December 2025 |
| Geographic Coverage | 38 States/UTs |
| Granularity | 985 Districts, 19,000+ Pincodes |
Our analysis revealed three ground-breaking insights that drive our strategic recommendations:
The Aadhaar system now processes 22 update transactions for every 1 new registration.
- Insight: The ecosystem has fundamentally transitioned from an acquisition-heavy model to a service-maintenance model.
- Implication: Infrastructure must pivot to prioritize biometric update stations over enrolment kits.
Northern India exhibits 2.7x higher activity intensity per center compared to Southern India.
- Insight: There are significant regional disparities in service utilization and access.
- Implication: Targeted investigations are needed to understand if this gap is due to saturation or access barriers.
29% of potential demand is lost due to weekend closures.
- Insight: There is substantial latent demand among working professionals and school-age children who cannot visit centers on weekdays.
- Implication: Implementing a "Weekend Warrior" operational model could unlock massive service capacity.
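The headline 22:1 ratio can be reproduced with a few lines of pandas. A minimal sketch, assuming hypothetical monthly figures and column names; the real numbers come from the enrolment, biometric, and demographic datasets:

```python
import pandas as pd

# Hypothetical monthly activity counts (placeholders, not actual UIDAI data).
df = pd.DataFrame({
    "month": ["2025-03", "2025-04", "2025-05"],
    "new_enrolments": [10_000, 12_000, 11_000],
    "updates": [220_000, 260_000, 250_000],  # biometric + demographic updates
})

# Update-to-enrolment ratio per month, plus the overall ratio.
df["update_ratio"] = df["updates"] / df["new_enrolments"]
overall = df["updates"].sum() / df["new_enrolments"].sum()
print(f"Overall: {overall:.1f} updates per new enrolment")
```

The same aggregation run over the full datasets yields the maintenance-versus-acquisition split discussed above.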
We utilized a structured, multi-phase workflow to transform raw data into actionable policy insights:
**Phase 1: Data Exploration**

- Initial profiling of the Enrolment, Biometric, and Demographic datasets.
- Verification of data integrity (100% complete data confirmed).
**Phase 2: Data Cleaning**

- The "68-State Problem": solved a critical data-quality issue by standardizing 68 state-name variations to the 38 official names using a semantic cleaning pipeline.
**Phase 3: Exploratory Data Analysis (EDA)**

- Age-gender segmentation.
- Geographic heatmaps and temporal trend analysis.
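The age-gender segmentation reduces to a pivot over transaction counts, which is also the matrix behind a heatmap. A sketch with invented values; the column names are assumptions, not the UIDAI schema:

```python
import pandas as pd

# Illustrative transaction records (counts are placeholders).
records = pd.DataFrame({
    "age_band": ["0-5", "0-5", "6-17", "18-59", "18-59", "60+"],
    "gender":   ["M", "F", "M", "F", "M", "F"],
    "count":    [120, 110, 80, 95, 100, 40],
})

# Age bands as rows, genders as columns, summed counts as cells.
pivot = records.pivot_table(index="age_band", columns="gender",
                            values="count", aggfunc="sum", fill_value=0)
print(pivot)
```

Feeding `pivot` to a heatmap function (e.g. `seaborn.heatmap`) produces the segmentation visuals used in this phase.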
**Phase 4: Pattern Discovery**

- Seasonality detection (April-July peaks).
- Anomaly detection using Z-scores.
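Z-score anomaly flagging can be sketched as follows. The daily volumes are synthetic, and |z| > 3 is a common default cutoff; the exact threshold used in the notebooks may differ:

```python
import numpy as np

# Synthetic 30-day series of daily volumes with one spike on the last day.
daily = np.array([990, 1010] * 14 + [1000, 4200], dtype=float)

# Standardize against the series mean and flag large deviations.
z = (daily - daily.mean()) / daily.std()
anomalies = np.flatnonzero(np.abs(z) > 3)
print(anomalies)  # day index of the 4200 spike -> [29]
```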
**Phase 5: Advanced Analytics & Modeling**

- K-Means clustering: segmented states into "Saturated," "High-Growth," and "Lagging" clusters.
- Demand prediction: forecasted stress zones using pincode density vs. activity volume.
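The state segmentation can be sketched with scikit-learn's `KMeans`. The two features (enrolment growth, update intensity) and their values are illustrative assumptions, not the real per-state figures; the cluster names are assigned afterwards by inspecting the fitted centers:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative per-state features: [enrolment growth, update intensity].
features = np.array([
    [0.5, 9.0],   # saturated: low growth, high update volume
    [0.6, 8.5],
    [4.0, 3.0],   # high-growth: many new enrolments
    [3.8, 2.5],
    [1.0, 1.0],   # lagging: low on both
    [0.8, 1.2],
])

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(features)
print(km.labels_)  # three groups of two states each
```

In practice the features would be standardized first so neither axis dominates the distance metric.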
```
uidai_project/
├── dataset/                  # Dataset files (official UIDAI data)
├── notebooks/                # Jupyter notebooks for analysis
│   ├── 01_data_exploration.ipynb
│   ├── 02_data_cleaning.ipynb
│   ├── 03_eda_visualizations.ipynb
│   ├── 04_pattern_discovery.ipynb
│   ├── 05_advanced_analysis.ipynb
│   └── 06_hackathon_advanced_analysis.ipynb
├── processed_data/           # Cleaned datasets
│   ├── biometric_cleaned.csv
│   ├── demographic_cleaned.csv
│   └── enrolment_cleaned.csv
├── results/                  # Generated plots and findings
│   ├── figures/
│   ├── insights/
│   └── reports/
├── presentation/             # Slides and presentation assets
├── requirements.txt          # Python dependencies
└── README.md                 # Project documentation (this file)
```
North India (8,802 activities/pincode) vs. South India (3,281 activities/pincode).
65% of new enrolments are children (0-5 years); adult saturation is near 100%.
The following districts are identified as CRITICAL for immediate intervention due to overcrowding:
- North East Delhi (Delhi)
- West Delhi (Delhi)
- Mahasamund (Chhattisgarh)
Follow these steps to replicate the analysis or explore the insights.
- Python 3.8 or higher
- Jupyter Notebook
- **Clone the repository**

  ```bash
  git clone https://github.com/askvs/uidai_addhar_hackathon
  cd uidai_project
  ```

- **Create a virtual environment** (optional but recommended)

  ```bash
  python -m venv .venv

  # Windows
  .venv\Scripts\activate

  # macOS/Linux
  source .venv/bin/activate
  ```

- **Install dependencies**

  ```bash
  pip install -r requirements.txt
  ```
Run the Jupyter notebooks in sequential order to reproduce the analysis pipeline:
```bash
jupyter notebook
```

- Start with `notebooks/01_data_exploration.ipynb` to load the data.
- Proceed through the numbered notebooks to replicate the cleaning, EDA, and advanced analysis phases.
- UIDAI for providing the dataset and the opportunity to innovate.
- Open Source Community for the incredible tools and libraries.
Built with ❤️ by Vikash Sharma for the UIDAI Hackathon 2026