


📊 Data Analysis Practice

A Python project, complemented by R, for practicing data analysis.




English

🎯 Overview

Data Analysis Practice is a production-grade Python application, complemented by R, that showcases modern software engineering practices, including clean architecture, comprehensive testing, containerized deployment, and CI/CD readiness.

The codebase comprises 375 lines of source code organized across 3 modules, following industry best practices for maintainability, scalability, and code quality.

✨ Key Features

  • 🔄 Data Pipeline: Scalable ETL with parallel processing
  • ✅ Data Validation: Schema validation and quality checks
  • 📊 Monitoring: Pipeline health metrics and alerting
  • 🔧 Configurability: YAML/JSON-based pipeline configuration
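The features above can be sketched together in a few lines. This is a minimal illustration, not the repository's implementation: the configuration keys, the field names (`major`, `median_salary`), and every function name are assumptions made for the example. It uses JSON from the standard library for configuration (YAML would need a third-party parser) and a thread pool to stand in for parallel processing.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical JSON configuration; the keys are illustrative only.
CONFIG_JSON = '{"required_fields": {"major": "str", "median_salary": "int"}, "workers": 2}'

# Maps type names used in the config to Python types.
TYPES = {"str": str, "int": int, "float": float}


def load_config(text):
    """Parse a JSON pipeline configuration and build a field -> type schema."""
    config = json.loads(text)
    config["schema"] = {name: TYPES[t] for name, t in config["required_fields"].items()}
    return config


def validate(record, schema):
    """Return a list of problems found in one record (empty if the record is valid)."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


def transform(record):
    """Example transform step: normalise the major name."""
    return {**record, "major": record["major"].strip().title()}


def run_pipeline(records, config):
    """Validate records, transform the valid ones in parallel, and collect simple metrics."""
    valid, rejected = [], []
    for record in records:
        (rejected if validate(record, config["schema"]) else valid).append(record)
    with ThreadPoolExecutor(max_workers=config["workers"]) as pool:
        transformed = list(pool.map(transform, valid))  # map preserves input order
    metrics = {"received": len(records), "loaded": len(transformed), "rejected": len(rejected)}
    return transformed, metrics


if __name__ == "__main__":
    sample = [
        {"major": " computer science ", "median_salary": 65000},
        {"major": "History"},  # missing median_salary, so it is rejected
    ]
    rows, metrics = run_pipeline(sample, load_config(CONFIG_JSON))
    print(rows)
    print(metrics)
```

Counting received, loaded, and rejected records is the simplest form of pipeline health metric; a real monitoring setup would likely export such counters rather than print them.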

🏗️ Architecture

graph TB
    subgraph Core["🏗️ Core"]
        A[Main Module]
        B[Business Logic]
        C[Data Processing]
    end
    
    subgraph Support["🔧 Support"]
        D[Configuration]
        E[Utilities]
        F[Tests]
    end
    
    A --> B --> C
    D --> A
    E --> B
    F -.-> B
    
    style Core fill:#e1f5fe
    style Support fill:#f3e5f5
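The dependency directions in the diagram (Configuration into the Main Module, Main Module into Business Logic, Business Logic into Data Processing, with Utilities supporting the Business Logic) might translate into code roughly as follows. All names here (`Settings`, `clamp`, `process_data`, `apply_rules`, `main`) are hypothetical and chosen only to mirror the boxes in the diagram; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Configuration (D): passed into the main module."""
    threshold: float = 0.5


def clamp(value, low=0.0, high=1.0):
    """Utilities (E): small helper used by the business logic."""
    return max(low, min(high, value))


def process_data(values, threshold):
    """Data Processing (C): keep values at or above the threshold."""
    return [v for v in values if v >= threshold]


def apply_rules(values, settings):
    """Business Logic (B): clean inputs with a utility, then delegate to processing."""
    cleaned = [clamp(v) for v in values]
    return process_data(cleaned, settings.threshold)


def main(values, settings=Settings()):
    """Main Module (A): receives configuration and calls the business logic."""
    return apply_rules(values, settings)
```

Keeping each layer a plain function that receives its dependencies as arguments means tests can target the business logic directly, which matches the dashed Tests arrow in the diagram.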

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • pip (Python package manager)
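To confirm that the interpreter meets the Python 3.12+ prerequisite before installing, a quick check along these lines can help. The helper name `meets_minimum` is illustrative and not part of the repository.

```python
import sys

MINIMUM = (3, 12)  # taken from the prerequisites listed above


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", found)
    else:
        print("Python 3.12+ is required; found", found)
```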

Installation

# Clone the repository
git clone https://github.com/galafis/Data-analysis-practice.git
cd Data-analysis-practice

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running

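# Run the application
# Note: this path assumes a src/ directory, which the project structure listing
# does not show; adjust it to the script you want to run if src/main.py is absent.
python src/main.py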

📁 Project Structure

Data-analysis-practice/
├── tests/         # Test suite
│   ├── __init__.py
│   └── test_main.py
├── LICENSE
├── README.md
├── analyze_association_fixed.py
├── college_major_analysis.R
└── create_college_dataset.py

🛠️ Tech Stack

| Technology | Description   | Role       |
|------------|---------------|------------|
| Python     | Core language | Primary    |
| R          | 1 file        | Supporting |
🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Gabriel Demetrios Lafis


