Course: IAA
Academic Year: 2025/2026
Degree: Computer Science Engineering - 3rd Year
University of Aveiro
Paper Theme: "Multivariate Exploration: Beyond Pairwise Analysis"
This project was developed as part of the IAA course for writing an academic paper on the theme "Multivariate Exploration: Beyond Pairwise Analysis". The work explores multivariate exploratory analysis techniques applied to the Wine Quality Dataset. The main objective is to demonstrate how multivariate methods reveal global patterns in the data that are not visible through traditional pairwise analysis.
- Compare traditional pairwise analysis vs. multivariate analysis
- Apply PCA (Principal Component Analysis) for dimensionality reduction
- Identify natural clusters in the data using K-means
- Visualize complex multidimensional structures
- Interpret loadings and principal components
Wine Quality Dataset
- Source: UCI Machine Learning Repository
- Samples: 6497 wines (1599 red + 4898 white)
- Variables: 11 physicochemical attributes + quality (target)
- Type: Real data from Portuguese wines (Vinho Verde)
- Fixed acidity
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulphates
- Alcohol
- Quality (score 0-10)
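A minimal sketch of how the two parts of the dataset could be combined into one table. It assumes the UCI distribution, which ships red and white wines as separate semicolon-delimited CSV files; the helper `load_wines` and the `wine_type` column are hypothetical names, not taken from the notebook. The in-memory "files" hold made-up placeholder rows so the example runs offline.

```python
import io

import pandas as pd


def load_wines(red_source, white_source):
    """Read the red and white files and stack them into one DataFrame."""
    red = pd.read_csv(red_source, sep=";")
    white = pd.read_csv(white_source, sep=";")
    # Tag each row so the wine type survives the concatenation
    red["wine_type"] = "red"
    white["wine_type"] = "white"
    return pd.concat([red, white], ignore_index=True)


# Placeholder content (made-up values, three columns only), NOT real measurements
red_csv = io.StringIO('"fixed acidity";"alcohol";"quality"\n7.0;9.5;5\n8.0;10.0;6\n')
white_csv = io.StringIO('"fixed acidity";"alcohol";"quality"\n6.5;11.0;6\n')

wines = load_wines(red_csv, white_csv)
print(wines.shape)
print(wines["wine_type"].value_counts())
```

With the real data, the two `io.StringIO` objects would be replaced by the paths to the downloaded red and white CSV files.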
- Python 3.x
- Jupyter Notebook
- pandas - Data manipulation
- numpy - Numerical operations
- matplotlib & seaborn - Visualization
- scikit-learn - PCA, K-means, StandardScaler
- scipy - Hierarchical clustering
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/multivariate-wine-analysis.git
  cd multivariate-wine-analysis
- Install dependencies:
  pip install pandas numpy matplotlib seaborn scikit-learn scipy jupyter
- Run the notebook:
  jupyter notebook multivariate_exploration.ipynb
- Individual correlations between variables
- 2D scatter plots
- Limitations: does not capture global structure
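The pairwise step can be sketched as a Pearson correlation matrix followed by a ranking of variable pairs. The data below is a synthetic stand-in with three columns named after wine attributes; the values and the built-in alcohol/density relationship are assumptions for illustration, not results from the dataset.

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in columns (NOT real wine measurements)
alcohol = rng.normal(10.5, 1.2, n)
density = 1.0 - 0.002 * alcohol + rng.normal(0, 0.001, n)  # built to depend on alcohol
ph = rng.normal(3.2, 0.15, n)                              # independent of the others
df = pd.DataFrame({"alcohol": alcohol, "density": density, "pH": ph})

# Pairwise view: one correlation coefficient per pair of variables
corr = df.corr(method="pearson")

# List each pair once, strongest absolute correlation first
pairs = sorted(
    ((a, b, corr.loc[a, b]) for a, b in combinations(df.columns, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Each coefficient describes only two variables at a time, which is why this view cannot show structure that involves three or more attributes jointly.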
- PCA: Reduction from 11D → 3-4D while retaining 80-85% of the variance
- Clustering: Identification of 4 natural groups
- Separation: Clear distinction between red and white wines in multivariate space
- Insight: Quality depends on combinations of properties, not isolated metrics
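The multivariate pipeline summarized above can be sketched as standardize, project with PCA, then cluster with K-means. This is a sketch under assumptions, not the notebook's code: the input is synthetic (11 features driven by 3 latent factors), and the 80% variance threshold and the 4 clusters are simply taken from the summary above rather than re-derived.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: 300 samples x 11 features (NOT the wine data)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 11))
X = latent @ mixing + 0.3 * rng.normal(size=(300, 11))

# PCA is scale-sensitive, so give every feature zero mean and unit variance first
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps as many components as needed to reach that variance share
pca = PCA(n_components=0.80)
scores = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("cumulative variance:", pca.explained_variance_ratio_.cumsum())

# Rows of components_ are the loadings: the weight of each original feature in a component
print("loadings shape:", pca.components_.shape)

# Cluster in the reduced space (4 clusters assumed from the summary above)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```

Clustering on the PCA scores rather than on the raw features is a design choice of this sketch; K-means could equally be run on the standardized 11-dimensional data.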
- Dimensionality reduction with PCA
- Interpretation of loadings and biplots
- K-means clustering and elbow method
- Hierarchical analysis (dendrograms)
- Multidimensional data visualization
- Difference between pairwise correlation and multivariate structure
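Two of the concepts above, the elbow method and hierarchical clustering, can be illustrated on a small synthetic example. The three 2D blobs below are an assumption chosen so the expected structure is known in advance; the range of k and the Ward linkage are likewise illustrative choices, not necessarily those used in the notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic stand-in: three well-separated 2D blobs of 50 points each
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Elbow method: record the within-cluster sum of squares (inertia) for each k
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")

# Hierarchical clustering: build the merge tree, then cut it into 3 flat clusters
Z = linkage(X, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
print("hierarchical cluster sizes:", np.bincount(labels)[1:])
```

The "elbow" is the value of k after which inertia stops dropping sharply. The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.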
Computer Science Engineering Student
University of Aveiro
This project was developed for academic purposes as part of the evaluation for the IAA course. The analysis and visualizations presented here support the paper "Multivariate Exploration: Beyond Pairwise Analysis", which compares traditional pairwise exploratory techniques with advanced multivariate methods.
This project is licensed under the MIT License. See the LICENSE file for details.
Academic Use: This work may be freely used, modified, and distributed for educational and research purposes, provided proper attribution is given.