6- Data Mining / Data Cleaning, Preparation and Detection of Anomalies (Outlier Detection)
Institution: Pontifical Catholic University of São Paulo (PUC-SP)
School: Faculty of Interdisciplinary Studies
Program: Humanistic AI and Data Science
Semester: 2nd Semester 2025
Professor: Daniel Rodrigues da Silva, Doctor in Mathematics
Important
- Projects and deliverables may be made publicly available whenever possible.
- The course emphasizes practical, hands-on experience with real datasets to simulate professional consulting scenarios in the fields of Data Analysis and Data Mining for partner organizations and institutions affiliated with the university.
- All activities comply with the academic and ethical guidelines of PUC-SP.
- Any content not authorized for public disclosure will remain confidential and securely stored in private repositories.
Tip
This repository is a review of the Statistics course from the undergraduate program Humanistic AI and Data Science at PUC-SP.
Access Data Mining Main Repository
If you’d like to explore the full materials from the 1st year (not only the review), you can visit the complete repository here.
Explore datasets from the University of California, Irvine (UCI) Machine Learning Repository, such as the Balloon, Bank Marketing, and Mammographic Mass datasets, to practice the data pre-processing and mining concepts covered here.
- Introduction
  - Common Problems in Raw Data
  - Garbage In, Garbage Out (GIGO)
- Types of Data
  - Structured, Semi-Structured, Unstructured
  - Data Attributes and Their Types
- Datasets from University of California, Irvine (UCI)
  - Balloon Dataset
  - Bank Marketing Dataset
  - Mammographic Mass Dataset
- Steps of Data Pre-Processing
  - Cleaning
  - Integration
  - Reduction
  - Transformation
  - Discretization
- Data Cleaning Techniques
  - Handling Missing Values
  - Noise Reduction Techniques
  - Handling Inconsistencies
- Data Integration Issues
- Data Reduction Techniques
- Data Standardization & Normalization
- Discretization
- Python Code Examples
- ASCII Diagrams
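
As a rough illustration of the topics listed above, the following minimal sketch chains median imputation for missing values, outlier detection with the interquartile range (Tukey's rule), min-max normalization, z-score standardization, and equal-width discretization. The `ages` column and all function names are hypothetical and are not taken from the course materials or the UCI datasets. The sketch uses only the Python standard library so that it runs without extra packages. In practice, libraries such as pandas or scikit-learn are the more common choice.

```python
import statistics

# Hypothetical toy column: one missing value (None) and one extreme value (120).
ages = [23, 25, None, 31, 29, 27, 120, 26]


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max(values):
    """Rescale values linearly to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


imputed = impute_median(ages)                       # cleaning: missing values
outliers = iqr_outliers(imputed)                    # anomaly detection
cleaned = [v for v in imputed if v not in outliers]
scaled = min_max(cleaned)                           # normalization
standardized = z_scores(cleaned)                    # standardization
bins = equal_width_bins(cleaned, n_bins=3)          # discretization

print("imputed:", imputed)
print("outliers:", outliers)
print("min-max:", scaled)
print("bins:", bins)
```

Dropping the flagged outlier before scaling is a design choice in this sketch: min-max normalization is sensitive to extreme values, so a single anomalous entry would otherwise compress the remaining values into a narrow range. Whether to remove, cap, or keep an outlier depends on the dataset and the analysis goal.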
===================================================== Still shaping this repo ✌️ =====================================================
Copyright 2025 Quantum Software Development. Code released under the MIT License.