Applied Computer Science & Data Science Portfolio

Welcome to my professional repository. This space documents my journey as an Information Technologies for Science student at UNAM (ENES Morelia). It showcases a specialized blend of mathematical rigor, statistical inference, and state-of-the-art Deep Learning.

Portfolio Structure

1. Deep Learning & Computer Vision

Advanced neural architectures for complex visual tasks and multi-objective optimization.

Multi-output Regression: A hybrid model predicting age (regression) and multiple categories simultaneously from images using ResNet backbones.
Image Classification: High-fidelity classifiers for animal and cartoon domains using Transfer Learning and the FastAI/PyTorch ecosystem.
Core Skills: Computer Vision, Transfer Learning, Multi-task Learning and GPU Acceleration (CUDA).
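A multi-output model like the one above is usually trained on one combined objective: a regression loss for age plus a classification loss for each categorical head. The sketch below shows that combination in plain Python so the arithmetic stays visible. It assumes squared error for age, softmax cross-entropy for each category, and hypothetical weights `w_age` and `w_cls`. It is not the project's PyTorch/FastAI code, where the equivalent terms would come from the framework's built-in losses.

```python
import math


def mse(pred: float, target: float) -> float:
    """Squared error for the age (regression) head."""
    return (pred - target) ** 2


def cross_entropy(logits: list[float], label: int) -> float:
    """Softmax cross-entropy for one categorical head (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(age_pred: float, age_true: float,
                   heads: list[tuple[list[float], int]],
                   w_age: float = 1.0, w_cls: float = 1.0) -> float:
    """Weighted sum of the age loss and every classification head's loss.

    heads: one (logits, true_label) pair per categorical target.
    w_age / w_cls are hypothetical weights; in practice they are tuned.
    """
    cls_total = sum(cross_entropy(logits, label) for logits, label in heads)
    return w_age * mse(age_pred, age_true) + w_cls * cls_total


# Usage with made-up numbers: one age prediction and two categorical heads.
loss = multitask_loss(31.0, 29.0, [([2.0, 0.5], 0), ([0.1, 0.2, 3.0], 2)])
```

The weights matter because the two loss types live on different scales. Squared error in years can dwarf cross-entropy unless it is rescaled.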

2. Machine Learning Models

Implementations of classical machine learning algorithms, organized by learning paradigm, with a focus on robust evaluation.

Supervised Learning:
- Sentiment Analysis (NLP) using Naive Bayes.
- Medical Outcome Prediction for Horse Colic using SVM and Logistic Regression.
Unsupervised Learning:
- Market Basket Analysis of ingredient patterns across world cuisines.
- Topic Modeling (NMF) for unstructured text analysis.
Core Skills: Feature Engineering, NLP Preprocessing, Dimensionality Reduction, Association Rules, and Hyperparameter Tuning.
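Market basket analysis scores candidate association rules with three standard metrics: support, confidence, and lift. The plain-Python sketch below computes them for a single rule. The recipe "baskets" are invented for illustration, and the real project may have used a library implementation (for example an Apriori routine) rather than code like this.

```python
def rule_metrics(transactions: list[set[str]],
                 antecedent: set[str], consequent: set[str]) -> tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_ac / n                                # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0   # P(C | A)
    # Lift > 1 means A and C co-occur more often than if they were independent.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical recipes, each treated as a set of ingredients.
recipes = [
    {"garlic", "onion", "tomato"},
    {"garlic", "onion"},
    {"garlic", "soy_sauce", "ginger"},
    {"onion", "tomato", "chili"},
]
support, confidence, lift = rule_metrics(recipes, {"garlic"}, {"onion"})
```

Here the rule garlic -> onion holds in 2 of 4 recipes (support 0.5) and in 2 of the 3 garlic recipes (confidence 2/3). Its lift is slightly below 1, so garlic does not make onion more likely than its base rate.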

3. Distributed Systems & SQL

Engineering the backbone of data: from relational foundations to high-volume ETL pipelines.

SQL Advanced Analytics:
- High-Volume ETL: Normalization and ingestion of 155,000+ Mexican postal records.
- Relational Logic: Advanced implementations of Set Theory (Joins, Unions) and complex subqueries.
Distributed Banking (WIP): Architecture for multi-node Galera Clusters and ACID-compliant transactions.
Core Skills: ETL Pipeline Design, Relational Algebra, Database Normalization, and Query Optimization.
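The normalization step of an ETL load like this one splits repeated values, such as state names, into their own table and references them by key. A JOIN can then rebuild the flat view. The sketch below uses Python's built-in `sqlite3` as a stand-in for MariaDB, with a hypothetical two-table schema and three illustrative rows that are not taken from the real postal dataset. Note that `INSERT OR IGNORE` is SQLite syntax; MariaDB/MySQL spell it `INSERT IGNORE`.

```python
import sqlite3

# Illustrative flat rows: (postal_code, settlement, municipality, state).
raw_rows = [
    ("58000", "Centro", "Morelia", "Michoacán"),
    ("58010", "Chapultepec Norte", "Morelia", "Michoacán"),
    ("44100", "Centro", "Guadalajara", "Jalisco"),
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE settlement (
    settlement_id INTEGER PRIMARY KEY,
    postal_code   TEXT NOT NULL,
    name          TEXT NOT NULL,
    municipality  TEXT NOT NULL,
    state_id      INTEGER NOT NULL REFERENCES state(state_id)
);
""")

for code, name, municipality, state in raw_rows:
    # Each state is stored once; repeated names reuse the existing id.
    cur.execute("INSERT OR IGNORE INTO state (name) VALUES (?)", (state,))
    cur.execute("SELECT state_id FROM state WHERE name = ?", (state,))
    (state_id,) = cur.fetchone()
    cur.execute(
        "INSERT INTO settlement (postal_code, name, municipality, state_id) "
        "VALUES (?, ?, ?, ?)",
        (code, name, municipality, state_id),
    )
conn.commit()

# A JOIN reassembles the data to count settlements per state.
rows = cur.execute("""
    SELECT st.name, COUNT(*)
    FROM settlement AS s
    JOIN state AS st ON st.state_id = s.state_id
    GROUP BY st.name
    ORDER BY st.name
""").fetchall()
```

At 155,000+ rows, batching the inserts (for example with `executemany`) and looking up state ids from an in-memory dict would avoid one `SELECT` per row.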

4. Data Science Fundamentals

The core of data engineering, model assessment, and statistical validation.

Featured Project: Pokemon Statistical Study (Gen 1-8)
- A comprehensive three-stage study on "Power Creep" and franchise balance.
- Stage 1 (R): Inferential statistics, T-Tests, and OLS diagnostics (Normality & Homoscedasticity) to validate design shifts.
- Stage 2 (Python): Advanced visualization and predictive modeling of base stats.
- Stage 3 (Multivariate Analysis): Study of internal correlations, covariance matrices, and the impact of categorical types on power scaling.
Titanic Wrangling: Data cleaning and heuristic (non-black-box) prediction modeling.
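Stage 1 of the Pokemon study compares generations with t-tests. When the two groups may have unequal variances, Welch's version is the usual choice. The sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom with the standard library only. It stops short of the p-value, which would come from the t distribution (for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`). The samples in the usage line are hypothetical, not the study's data.

```python
import math
import statistics


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    se2_a, se2_b = var_a / n_a, var_b / n_b   # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical base-stat totals for two generations.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```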
Web Scraping: Automated extraction of unstructured news and book data using BeautifulSoup.
Cross-Validation: Implementation of K-Fold cross-validation for robust model assessment.
Core Skills: Model Validation (K-Fold Cross-Validation, Precision-Recall), Inferential Statistics, Web Scraping (BeautifulSoup), and Data Wrangling.
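K-Fold cross-validation splits the data into k disjoint folds. It trains on k - 1 of them, scores on the held-out fold, and averages the k scores. The generator below is a minimal sketch of the splitting step only, assuming a seeded shuffle. It sizes folds the way scikit-learn's `KFold` documents it, with the first `n_samples % k` folds getting one extra sample. It is an illustration, not the repository's implementation.

```python
import random
from collections.abc import Iterator


def k_fold_indices(n_samples: int, k: int = 5, shuffle: bool = True,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)

    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, k=3))
```

Each fold's metric (accuracy, precision, recall) is computed on `test_idx` after fitting on `train_idx`. The mean and spread across folds are then reported.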

5. Math Foundations

The mathematical engine driving the code through algorithmic implementation.

Linear Algebra: Matrix-based image manipulation, SVD, and color space transformations.
Calculus: Interactive Python visualizers for derivatives and local linearity.
Core Skills: Numerical Computing, Matrix Decomposition, Algorithmic Visualization, and Optimization Theory.
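Local linearity says that near a point a, a smooth function behaves like its tangent line L(x) = f(a) + f'(a)(x - a). The sketch below estimates f'(a) with a central difference quotient and builds that tangent line. The step size `h` is a hypothetical default, and the code only illustrates the idea behind the visualizers rather than reproducing them.

```python
import math
from typing import Callable


def central_difference(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) with the symmetric difference quotient (error O(h^2))."""
    return (f(x + h) - f(x - h)) / (2 * h)


def tangent_line(f: Callable[[float], float], a: float,
                 h: float = 1e-5) -> Callable[[float], float]:
    """Return L(x) = f(a) + f'(a) * (x - a), the local linear approximation at a."""
    slope = central_difference(f, a, h)
    fa = f(a)
    return lambda x: fa + slope * (x - a)


# Near a = 0, sin(x) is almost indistinguishable from its tangent line y = x.
sin_tangent = tangent_line(math.sin, 0.0)
```

Zooming in shrinks the gap between f and L faster than the distance to a, which is exactly what an interactive zoom on the plot makes visible.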

Expertise & Technical Stack

Programming & Environments

Languages: Python 3.10+ (Main), R 4.x (Statistical Rigor), SQL (MariaDB/MySQL).
OS & Tools: Ubuntu Linux, Git, Visual Studio Code, DBeaver, Jupyter Ecosystem, Conda.

Data Engineering & AI Ecosystem

Databases: MariaDB, MySQL, SQL Server (Theory), Galera Cluster.
Deep Learning: PyTorch, FastAI, Torchvision.
Machine Learning: Scikit-Learn, SciPy, Statsmodels, Yellowbrick (Visual Analysis).
Data Manipulation: Pandas, NumPy, Tidyverse (R), Unidecode, SQLAlchemy, Python-dotenv.
Visualization: Matplotlib, Seaborn, ggplot2 (R).

Engineering Philosophy

Bilingual Approach: Validation of results across different languages (R & Python) to ensure statistical reliability.
Explainability: Focus on model diagnostics (Residuals, Normality tests) rather than just accuracy.
Reproducibility: Structured modularity with clear dependency management.

Developed by Francisco Solís Pedraza | Bachelor's in Information Technology for Science | ENES Morelia, UNAM.

francis4all/Applied-CS-Data-Journey

Folders and files

Latest commit

History

Repository files navigation

Applied Computer Science & Data Science Portfolio

Portfolio Structure

1. Deep Learning & Computer Vision

2. Machine Learning Models

3. Distributed Systems & SQL

4. Data Science Fundamentals

5. Math Foundations

Expertise & Technical Stack

Programming & Environments

Data Engineering & AI Ecosystem

Engineering Philosophy

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages