End-to-end data pipeline for the Open Apparel Registry (OAR). It extracts, cleans, analyzes, and visualizes company and facility data from an OAR-style dataset covering seven target countries.
# Clone the repository
git clone https://github.com/your-username/oar-data-pipeline.git
cd oar-data-pipeline
# Create virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# Mac / Linux
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
python main.py
python scrape_oar.py # Data extraction
python clean_companies.py # Company cleaning
python clean_facilities.py # Facility cleaning
python relational_builder.py # Relational modeling
python analytics_dashboards.py # Visual analytics
python ai_module.py # AI analysis
python export_final.py # Final export
oar-data-pipeline/
├── main.py
├── scrape_oar.py
├── clean_companies.py
├── clean_facilities.py
├── relational_builder.py
├── analytics_dashboards.py
├── ai_module.py
├── export_final.py
├── requirements.txt
├── README.md
└── .gitignore
- Automated data ingestion (API or synthetic fallback)
- Filtering by target countries
- Automatic test data generation if API is unavailable
- Company name normalization
- Country name standardization
- Unique ID generation
- Duplicate removal
- Companies, Facilities, and Link tables
- Referential integrity checks
- Data consistency validation
- Companies per country visualization
- Facilities per company distribution
- Sector-based analysis
- Statistical summaries
- Sustainability keyword detection
- Automatic text summarization
- Sustainability scoring
- CSV, JSON, and Excel formats
- Auto-generated documentation
- Timestamped archives
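As an illustration of the sustainability keyword detection and scoring items above, here is a minimal sketch in plain Python. The keyword list, the function names, and the scoring rule (fraction of keywords found) are assumptions made for this example; they are not taken from `ai_module.py`.

```python
import re

# Hypothetical keyword list; the project's actual vocabulary is not documented here.
SUSTAINABILITY_KEYWORDS = ("organic", "recycled", "renewable", "fair trade", "certified")


def detect_keywords(text: str) -> list[str]:
    """Return the keywords found in the text as whole words or phrases (case-insensitive)."""
    lowered = text.lower()
    return [
        kw for kw in SUSTAINABILITY_KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    ]


def sustainability_score(text: str) -> float:
    """Assumed scoring rule: fraction of the keyword list present in the text (0.0 to 1.0)."""
    return len(detect_keywords(text)) / len(SUSTAINABILITY_KEYWORDS)


description = "Certified mill using recycled fibres and organic cotton."
print(detect_keywords(description))       # ['organic', 'recycled', 'certified']
print(sustainability_score(description))  # 0.6
```

Whole-word matching keeps "inorganic" from counting as "organic". A weighted or model-based score would be a natural refinement, but nothing in this README says which approach the project uses.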
- 🇲🇦 Morocco
- 🇪🇸 Spain
- 🇵🇹 Portugal
- 🇮🇹 Italy
- 🇫🇷 France
- 🇬🇷 Greece
- 🇲🇹 Malta
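Filtering records down to the seven countries listed above could look roughly like the sketch below. The `country` field name and the sample records are hypothetical, and plain Python is used to keep the example self-contained even though the pipeline lists pandas.

```python
# The seven target countries named in this README.
TARGET_COUNTRIES = {"Morocco", "Spain", "Portugal", "Italy", "France", "Greece", "Malta"}


def filter_by_country(records, country_field="country"):
    """Keep records whose country (trimmed, case-insensitive) is a target country."""
    targets = {c.casefold() for c in TARGET_COUNTRIES}
    return [
        r for r in records
        if str(r.get(country_field, "")).strip().casefold() in targets
    ]


# Made-up sample records, for illustration only.
sample = [
    {"name": "Atlas Textiles", "country": "Morocco"},
    {"name": "Nordic Wear", "country": "Sweden"},
    {"name": "Lisboa Knits", "country": " portugal "},
]
kept = filter_by_country(sample)
print([r["name"] for r in kept])  # ['Atlas Textiles', 'Lisboa Knits']
```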
- Python 3.11
- pandas
- requests
- matplotlib
- scikit-learn
- hashlib
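Since `hashlib` appears in the stack, one plausible way to combine company name normalization, unique ID generation, and duplicate removal is sketched below. The suffix list, the `name`/`country`/`company_id` fields, and the 12-character SHA-256 prefix are all assumptions for illustration, not the project's documented behavior.

```python
import hashlib
import re

# Hypothetical legal suffixes to strip from company names.
LEGAL_SUFFIXES = {"sa", "sarl", "sl", "srl", "spa", "lda", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, strip trailing legal suffixes."""
    cleaned = name.lower().replace(".", "")      # "S.A." -> "sa"
    tokens = re.sub(r"[^\w\s]", " ", cleaned).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def make_id(name: str, country: str) -> str:
    """Deterministic ID: SHA-256 of normalized name plus country, truncated (assumed length)."""
    key = f"{normalize_name(name)}|{country.strip().lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def deduplicate(records):
    """Keep the first record seen for each generated ID and attach the ID to it."""
    seen = {}
    for rec in records:
        cid = make_id(rec["name"], rec["country"])
        seen.setdefault(cid, {**rec, "company_id": cid})
    return list(seen.values())


# Made-up rows, for illustration only.
rows = [
    {"name": "Textil Ibérica S.A.", "country": "Spain"},
    {"name": "TEXTIL IBÉRICA, SA", "country": "spain"},
    {"name": "Porto Malhas Lda", "country": "Portugal"},
]
unique = deduplicate(rows)
print(len(unique))  # 2
```

Deriving the ID from the normalized name means spelling variants of the same company collapse to one record. The trade-off is that a truncated hash has a small collision risk, so a longer prefix may be preferable on large datasets.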
- data/outputs/relational_companies.csv
- data/outputs/relational_facilities.csv
- data/outputs/ai_analysis.csv
- companies_by_country.png
- facilities_per_company.png
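A referential integrity check over the relational outputs (companies, facilities, and the link table) might be sketched as follows. The `company_id` and `facility_id` column names and the shape of the report are assumptions, and rows are modeled as plain dictionaries rather than loaded from the CSV files.

```python
def check_referential_integrity(companies, facilities, links):
    """Report link rows pointing at unknown IDs, and facilities that no link references."""
    company_ids = {c["company_id"] for c in companies}
    facility_ids = {f["facility_id"] for f in facilities}
    linked_facilities = {link["facility_id"] for link in links}
    return {
        "links_missing_company": [
            link for link in links if link["company_id"] not in company_ids
        ],
        "links_missing_facility": [
            link for link in links if link["facility_id"] not in facility_ids
        ],
        "facilities_without_link": sorted(facility_ids - linked_facilities),
    }


# Made-up tables, for illustration only.
companies = [{"company_id": "c1"}, {"company_id": "c2"}]
facilities = [{"facility_id": "f1"}, {"facility_id": "f2"}, {"facility_id": "f3"}]
links = [
    {"company_id": "c1", "facility_id": "f1"},
    {"company_id": "c9", "facility_id": "f2"},  # c9 does not exist in companies
]
report = check_referential_integrity(companies, facilities, links)
print(report["links_missing_company"])    # [{'company_id': 'c9', 'facility_id': 'f2'}]
print(report["facilities_without_link"])  # ['f3']
```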
- 10,000+ companies processed
- 15,000+ facilities extracted
- Automated sustainability detection
- Multi-format exports
- Open Apparel Registry
- CommonShare
- Python open-source community
Ayoub Aguezar
Software & Data Engineering Student
MIT License – see LICENSE file for details.


