Welcome to my professional repository. This space documents my journey as an Information Technology for Science student at UNAM (ENES Morelia). It showcases a specialized blend of mathematical rigor, statistical inference, and state-of-the-art Deep Learning.
Advanced neural architectures for complex visual tasks and multi-objective optimization.
- Multi-output Regression: A hybrid model predicting age (regression) and multiple categories simultaneously from images using ResNet backbones.
- Image Classification: High-fidelity classifiers for animal and cartoon domains using Transfer Learning and the FastAI/PyTorch ecosystem.
- Core Skills: Computer Vision, Transfer Learning, Multi-task Learning and GPU Acceleration (CUDA).
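The shared-backbone, two-head idea behind the multi-output model can be sketched as follows. This is a minimal illustrative stand-in (a small linear backbone instead of the pretrained ResNet the project uses; the layer sizes and class count are placeholders):

```python
import torch
import torch.nn as nn

class MultiOutputNet(nn.Module):
    """Shared feature extractor with two heads: age regression and class logits."""
    def __init__(self, feat_dim=64, n_classes=5):
        super().__init__()
        # Stand-in backbone; the real project swaps in a pretrained ResNet.
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU()
        )
        self.age_head = nn.Linear(feat_dim, 1)          # regression output
        self.cls_head = nn.Linear(feat_dim, n_classes)  # classification logits

    def forward(self, x):
        feats = self.backbone(x)
        return self.age_head(feats), self.cls_head(feats)

model = MultiOutputNet()
x = torch.randn(4, 3, 32, 32)  # batch of 4 dummy RGB images
age_pred, cls_logits = model(x)
# Joint loss: sum of MSE (age) and cross-entropy (category)
loss = nn.functional.mse_loss(age_pred.squeeze(1), torch.randn(4)) \
     + nn.functional.cross_entropy(cls_logits, torch.randint(0, 5, (4,)))
```

Both heads are trained jointly by backpropagating through the combined loss, so the backbone learns features useful for both tasks.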
Implementation of classical AI algorithms categorized by learning paradigm with a focus on robust evaluation.
- Supervised Learning:
- Sentiment Analysis (NLP) using Naive Bayes.
- Medical Outcome Prediction for Horse Colic using SVM and Logistic Regression.
- Unsupervised Learning:
- Market Basket Analysis of ingredient patterns across world cuisines.
- Topic Modeling (NMF) for unstructured text analysis.
- Core Skills: Feature Engineering, NLP Preprocessing, Dimensionality Reduction, Association Rules, and Hyperparameter Tuning.
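As a flavor of the supervised side, here is a minimal from-scratch multinomial Naive Bayes sentiment classifier with Laplace smoothing (an illustrative sketch with toy data, not the project's actual pipeline):

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (tokens, label). Returns priors, per-class word counts, vocab."""
    labels = Counter(lab for _, lab in docs)
    word_counts = {lab: Counter() for lab in labels}
    for tokens, lab in docs:
        word_counts[lab].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return labels, word_counts, vocab

def predict_nb(tokens, labels, word_counts, vocab):
    total = sum(labels.values())
    best, best_lp = None, float("-inf")
    for lab, n in labels.items():
        lp = math.log(n / total)                            # log prior
        denom = sum(word_counts[lab].values()) + len(vocab) # Laplace smoothing
        for w in tokens:
            lp += math.log((word_counts[lab][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

docs = [("good great fun".split(), "pos"),
        ("bad awful boring".split(), "neg"),
        ("great plot good cast".split(), "pos"),
        ("boring bad script".split(), "neg")]
params = train_nb(docs)
print(predict_nb("good fun cast".split(), *params))  # → pos
```

Working in log space avoids numerical underflow when multiplying many small word probabilities.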
Engineering the backbone of data: from relational foundations to high-volume ETL pipelines.
- SQL Advanced Analytics:
- High-Volume ETL: Normalization and ingestion of 155,000+ Mexican postal records.
- Relational Logic: Advanced implementations of Set Theory (Joins, Unions) and complex subqueries.
- Distributed Banking (WIP): Architecture for multi-node Galera Clusters and ACID-compliant transactions.
- Core Skills: ETL Pipeline Design, Relational Algebra, Database Normalization, and Query Optimization.
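The normalization-plus-join pattern behind the postal ETL can be sketched with Python's stdlib `sqlite3` (an in-memory stand-in for the MariaDB setup; the table names and sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Normalized design: state names live in one lookup table,
    -- postal rows reference them by code instead of repeating text.
    CREATE TABLE states (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE postal (cp TEXT, settlement TEXT, state_code TEXT);
    INSERT INTO states VALUES ('MI', 'Michoacan'), ('CM', 'Ciudad de Mexico');
    INSERT INTO postal VALUES ('58341', 'Centro', 'MI'), ('04510', 'CU', 'CM');
""")
# Inner join re-assembles the denormalized view on demand
rows = cur.execute("""
    SELECT p.cp, p.settlement, s.name
    FROM postal AS p
    JOIN states AS s ON s.code = p.state_code
    ORDER BY p.cp
""").fetchall()
print(rows)
```

Factoring repeated state names into a lookup table is what keeps a 155,000-row ingestion compact and update-safe.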
The core of data engineering, model assessment, and statistical validation.
- Featured Project: Pokémon Statistical Study (Gen 1-8)
- A comprehensive three-stage study on "Power Creep" and franchise balance.
- Stage 1 (R): Inferential statistics, T-Tests, and OLS diagnostics (Normality & Homoscedasticity) to validate design shifts.
- Stage 2 (Python): Advanced visualization and predictive modeling of base stats.
- Stage 3 (Multivariate Analysis): Study of internal correlations, covariance matrices, and the impact of categorical types on power scaling.
- Titanic Wrangling: Data cleaning and heuristic prediction modeling (non-black-box approach).
- Web Scraping: Automated extraction of unstructured news and book data using BeautifulSoup.
- Cross Validation: Implementation of K-Fold metrics for robust model assessment.
- Core Skills: Model Validation (K-Fold Cross-Validation, Precision-Recall), Inferential Statistics, Web Scraping (BeautifulSoup), and Data Wrangling.
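The K-Fold splitting at the heart of the cross-validation work can be sketched in plain Python (an illustrative index-partitioning helper, not the project's actual code, which would typically use scikit-learn's `KFold`):

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs partitioning range(n) into k folds."""
    # Earlier folds absorb the remainder so sizes differ by at most 1
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(kfold_indices(10, 3))
# 3 folds of sizes 4, 3, 3; every index validates exactly once
```

Each sample appears in exactly one validation fold, so the k per-fold scores average into an estimate that uses all the data without train/validation leakage.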
The mathematical engine behind the code, explored through algorithmic implementation.
- Linear Algebra: Matrix-based image manipulation, SVD, and color space transformations.
- Calculus: Interactive Python visualizers for derivatives and local linearity.
- Core Skills: Numerical Computing, Matrix Decomposition, Algorithmic Visualization, and Optimization Theory.
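The SVD-based image work rests on truncated low-rank approximation, which can be sketched with NumPy (a minimal example on a tiny synthetic matrix rather than a real image):

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A (Eckart-Young) via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular components
    return U[:, :k] * s[:k] @ Vt[:k, :]

# A rank-1 "image" is recovered exactly from its first singular component
A = np.outer(np.arange(1, 5, dtype=float), np.arange(1, 4, dtype=float))
A1 = rank_k_approx(A, 1)
print(np.allclose(A, A1))  # → True
```

For a real grayscale image, keeping only the top-k singular values compresses the pixel matrix while preserving most of its visible structure.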
- Languages: Python 3.10+ (Main), R 4.x (Statistical Rigor), SQL (MariaDB/MySQL).
- OS & Tools: Ubuntu Linux, Git, Visual Studio Code, DBeaver, Jupyter Ecosystem, Conda.
- Databases: MariaDB, MySQL, SQL Server (Theory), Galera Cluster.
- Deep Learning: PyTorch, FastAI, Torchvision.
- Machine Learning: Scikit-Learn, SciPy, Statsmodels, Yellowbrick (Visual Analysis).
- Data Manipulation: Pandas, NumPy, Tidyverse (R), Unidecode, SQLAlchemy, Python-dotenv.
- Visualization: Matplotlib, Seaborn, ggplot2 (R).
- Bilingual Approach: Validation of results across different languages (R & Python) to ensure statistical reliability.
- Explainability: Focus on model diagnostics (Residuals, Normality tests) rather than just accuracy.
- Reproducibility: Structured modularity with clear dependency management.
Developed by Francisco Solís Pedraza | Bachelor's in Information Technology for Science | ENES Morelia, UNAM.