Modern ETL pipeline for Brazilian demographic and economic data analysis using IBGE APIs
This project demonstrates a comprehensive ETL (Extract, Transform, Load) pipeline for Brazilian government data, featuring interactive visualizations, economic analysis, and modern data engineering practices.
- Multi-source Data Extraction: Brazilian states, municipalities, and economic indicators
- Advanced Data Transformation: Cleaning, normalization, and feature engineering
- Interactive Visualizations: Plotly-based charts, maps, and dashboards
- Economic Analysis: Clustering, ranking, and correlation analysis
- Cloud-Ready: Google Colab optimized with BigQuery integration support
- International Standards: English documentation and modern coding practices
brazilian-data-etl/
├── notebooks/
│ ├── brazilian_demographics_etl.ipynb # Main ETL pipeline
│ └── brazilian_economic_analysis.ipynb # Economic analysis & clustering
├── README.md # Project documentation
└── requirements.txt # Python dependencies
- Click the "Open in Colab" badge above
- Run all cells in the notebook
- Data will be automatically downloaded and processed
- Interactive visualizations will be generated inline
git clone https://github.com/bellDataSc/Projeto-ETL-com-Python-e-Google-BigQuery
cd
# Install dependencies
pip install -r requirements.txt
# Launch Jupyter Notebook
jupyter notebook
All data is sourced from official Brazilian government APIs:
- IBGE API: Brazilian Institute of Geography and Statistics
  - States and municipalities data
  - Population estimates
  - Geographic boundaries
- SIDRA API: IBGE's Automatic Recovery System
  - Economic indicators
  - Demographic statistics
  - Time series data
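
As a rough illustration of the extraction step, the sketch below pulls the list of states from IBGE's public localities endpoint and flattens each nested record into one row. It uses only the standard library so it runs anywhere; the notebooks may use Requests and different endpoints. The function names, the output column names, and the assumed response shape (`id`, `sigla`, `nome`, `regiao`) are illustrative choices, not taken from the project code.

```python
import json
from urllib.request import urlopen

# Public IBGE localities endpoint listing the federative units (assumed here;
# the notebooks may call other endpoints).
IBGE_STATES_URL = "https://servicodados.ibge.gov.br/api/v1/localidades/estados"


def fetch_states(url=IBGE_STATES_URL, timeout=30):
    """Download the raw list of state records as parsed JSON (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_state(record):
    """Flatten one nested state record into a single-level dict."""
    region = record.get("regiao") or {}
    return {
        "state_id": record["id"],
        "state_code": record["sigla"],
        "state_name": record["nome"],
        "region_code": region.get("sigla"),
        "region_name": region.get("nome"),
    }


# Example record in the shape the endpoint is assumed to return.
sample = {
    "id": 35,
    "sigla": "SP",
    "nome": "São Paulo",
    "regiao": {"id": 3, "sigla": "SE", "nome": "Sudeste"},
}
row = flatten_state(sample)
```

With network access, `[flatten_state(r) for r in fetch_states()]` would give a list of rows ready to load into a DataFrame.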
- Python 3.8+: Main programming language
- Pandas: Data manipulation and analysis
- Plotly: Interactive visualizations
- Requests: API data extraction
- Scikit-learn: Machine learning and clustering
- Google BigQuery (optional): Cloud data warehouse
- Population distribution by state and region
- Geographic data processing
- Name pattern analysis
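
The population breakdown above could be computed along these lines with Pandas. This is a minimal sketch on placeholder figures, not real IBGE estimates; the column names and the `share_pct` field are assumptions made for the example.

```python
import pandas as pd

# Placeholder figures for illustration only; not real IBGE estimates.
states = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D"],
        "region": ["North", "North", "South", "South"],
        "population": [1_000_000, 3_000_000, 2_000_000, 4_000_000],
    }
)

# Total population per region, largest first.
by_region = (
    states.groupby("region", as_index=False)["population"]
    .sum()
    .sort_values("population", ascending=False, ignore_index=True)
)

# Each region's share of the national total, in percent.
total = by_region["population"].sum()
by_region["share_pct"] = (100 * by_region["population"] / total).round(1)
```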
- GDP per capita analysis
- Human Development Index (HDI) correlations
- Economic clustering using K-means
- Comprehensive ranking system
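
One way the economic steps above might fit together is sketched below: standardize the indicators, cluster with scikit-learn's K-means, then derive a simple ranking and a correlation. The data are placeholders, and the composite score (mean of standardized indicators) is an assumption for illustration; the notebook's actual ranking method may differ.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder values for illustration only; not real IBGE or HDI figures.
indicators = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D", "E", "F"],
        "gdp_per_capita": [10_000, 11_000, 12_000, 40_000, 42_000, 45_000],
        "hdi": [0.62, 0.64, 0.65, 0.80, 0.82, 0.83],
    }
)

# Standardize so GDP (large numbers) does not dominate HDI (0 to 1).
features = ["gdp_per_capita", "hdi"]
scaled = StandardScaler().fit_transform(indicators[features])

# Group states into two clusters; random_state makes the run reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
indicators["cluster"] = kmeans.fit_predict(scaled)

# Hypothetical composite score: mean of standardized indicators (rank 1 = best).
indicators["score"] = scaled.mean(axis=1)
indicators["rank"] = indicators["score"].rank(ascending=False).astype(int)

# Pearson correlation between the two indicators.
corr = indicators["gdp_per_capita"].corr(indicators["hdi"])
```

Cluster labels are arbitrary integers, so compare which states share a label and do not read meaning into the label values themselves.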
- Population distribution maps
- Economic performance comparisons
- Correlation matrices
- 3D clustering plots
The analysis reveals:
- Regional Disparities: Significant economic differences between Brazilian regions
- Development Patterns: Clear correlations between GDP, HDI, and economic diversity
- Growth Opportunities: Identification of high-potential development areas
- Professional Portfolio: Demonstrates modern data engineering skills
- Real-World Data: Uses authentic government datasets
- International Appeal: English documentation for global audience
- Cloud-Native: Ready for deployment and scaling
- Best Practices: Clean code, comprehensive documentation, reproducible analysis
Perfect for learning:
- ETL pipeline development
- API data extraction
- Data visualization with Plotly
- Economic data analysis
- Machine learning clustering
- Google Colab development
Contributions are welcome! Areas for improvement:
- Additional data sources (economic sectors, environmental data)
- Advanced predictive models
- Geographic visualizations with maps
- Real-time data updates
- Performance optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
Created by Bel
LinkedIn: http://www.linkedin.com/in/belcruz
Portfolio:
⭐ If you found this project useful, please give it a star! ⭐
Made with ☕ by Isabel Cruz | in Google Colab | in Brazil | Data from IBGE