A comprehensive portfolio of Data Science, Applied Math, and Distributed Systems projects.

francis4all/Applied-CS-Data-Journey

Applied Computer Science & Data Science Portfolio

Welcome to my professional repository. This space documents my journey as an Information Technologies for Science student at UNAM (ENES Morelia). It showcases a specialized blend of mathematical rigor, statistical inference, and state-of-the-art Deep Learning.


Portfolio Structure

Advanced neural architectures for complex visual tasks and multi-objective optimization.

  • Multi-output Regression: A hybrid model predicting age (regression) and multiple categories simultaneously from images using ResNet backbones.
  • Image Classification: High-fidelity classifiers for animal and cartoon domains using Transfer Learning and the FastAI/PyTorch ecosystem.
  • Core Skills: Computer Vision, Transfer Learning, Multi-task Learning and GPU Acceleration (CUDA).

Implementation of classical AI algorithms categorized by learning paradigm with a focus on robust evaluation.

  • Supervised Learning:
    • Sentiment Analysis (NLP) using Naive Bayes.
    • Medical Outcome Prediction for Horse Colic using SVM and Logistic Regression.
  • Unsupervised Learning:
    • Market Basket Analysis of ingredient patterns across the world's cuisines.
    • Topic Modeling (NMF) for unstructured text analysis.
  • Core Skills: Feature Engineering, NLP Preprocessing, Dimensionality Reduction, Association Rules, and Hyperparameter Tuning.
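
As a rough illustration of the association-rule metrics behind a market basket analysis, the sketch below computes support, confidence, and lift for ingredient pairs using only the standard library. The toy `recipes` list and the `association_rules` helper are hypothetical and are not taken from the project's data or code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions: each set is the ingredient list of one recipe.
recipes = [
    {"garlic", "onion", "tomato"},
    {"garlic", "onion", "olive oil"},
    {"garlic", "tomato", "olive oil"},
    {"onion", "tomato"},
    {"garlic", "onion", "tomato", "olive oil"},
]


def association_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for every frequent item pair."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]       # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 means positive association
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for lhs, rhs, support, confidence, lift in association_rules(recipes):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Only pairs are considered here; a full Apriori or FP-Growth implementation would extend the same counts to larger itemsets.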

Engineering the backbone of data: from relational foundations to high-volume ETL pipelines.

  • SQL Advanced Analytics:
    • High-Volume ETL: Normalization and ingestion of 155,000+ Mexican postal records.
    • Relational Logic: Advanced implementations of Set Theory (Joins, Unions) and complex subqueries.
  • Distributed Banking (WIP): Architecture for multi-node Galera Clusters and ACID-compliant transactions.
  • Core Skills: ETL Pipeline Design, Relational Algebra, Database Normalization, and Query Optimization.
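
To make the normalization step concrete, here is a small sketch that loads flat postal rows into three related tables and then queries them with joins. It uses the standard-library `sqlite3` module as a stand-in for MariaDB/MySQL; the table names, column names, and placeholder rows are assumptions for illustration, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical raw rows: (postal_code, settlement, municipality, state).
RAW_ROWS = [
    ("00001", "Settlement A", "Municipality X", "State 1"),
    ("00001", "Settlement A", "Municipality X", "State 1"),  # exact duplicate
    ("00002", "Settlement B", "Municipality X", "State 1"),
    ("00003", "Settlement A", "Municipality Y", "State 2"),
]

SCHEMA = """
CREATE TABLE state (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE municipality (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    state_id INTEGER NOT NULL REFERENCES state(id),
    UNIQUE (name, state_id)
);
CREATE TABLE settlement (
    id INTEGER PRIMARY KEY,
    postal_code TEXT NOT NULL,
    name TEXT NOT NULL,
    municipality_id INTEGER NOT NULL REFERENCES municipality(id),
    UNIQUE (postal_code, name, municipality_id)
);
"""


def get_or_create(conn, select_sql, insert_sql, params):
    """Return the id of the matching row, inserting it first if it is missing."""
    row = conn.execute(select_sql, params).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(insert_sql, params).lastrowid


def load(conn, rows):
    conn.executescript(SCHEMA)
    with conn:  # single transaction: commit on success, roll back on error
        for postal_code, settlement, municipality, state in rows:
            state_id = get_or_create(
                conn,
                "SELECT id FROM state WHERE name = ?",
                "INSERT INTO state (name) VALUES (?)",
                (state,),
            )
            municipality_id = get_or_create(
                conn,
                "SELECT id FROM municipality WHERE name = ? AND state_id = ?",
                "INSERT INTO municipality (name, state_id) VALUES (?, ?)",
                (municipality, state_id),
            )
            # The UNIQUE constraint plus OR IGNORE drops exact duplicates.
            conn.execute(
                "INSERT OR IGNORE INTO settlement (postal_code, name, municipality_id) "
                "VALUES (?, ?, ?)",
                (postal_code, settlement, municipality_id),
            )


def settlements_per_state(conn):
    """Join the normalized tables back together and count settlements by state."""
    return conn.execute(
        """
        SELECT s.name, COUNT(*)
        FROM settlement AS st
        JOIN municipality AS m ON m.id = st.municipality_id
        JOIN state AS s ON s.id = m.state_id
        GROUP BY s.name
        ORDER BY s.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, RAW_ROWS)
print(settlements_per_state(conn))
```

`INSERT OR IGNORE` is SQLite syntax; MariaDB and MySQL spell the same idea `INSERT IGNORE`.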

The core of data engineering, model assessment, and statistical validation.

  • Featured Project: Pokemon Statistical Study (Gen 1-8)
    • A comprehensive three-stage study on "Power Creep" and franchise balance.
    • Stage 1 (R): Inferential statistics, t-tests, and OLS diagnostics (Normality & Homoscedasticity) to validate design shifts.
    • Stage 2 (Python): Advanced visualization and predictive modeling of base stats.
    • Stage 3 (Multivariate Analysis): Study of internal correlations, covariance matrices, and the impact of categorical types on power scaling.
  • Titanic Wrangling: Data cleaning and heuristic prediction modeling (a non-black-box approach).
  • Web Scraping: Automated extraction of unstructured news and book data using BeautifulSoup.
  • Cross Validation: Implementation of K-Fold metrics for robust model assessment.
  • Core Skills: Model Validation (K-Fold Cross-Validation, Precision-Recall), Inferential Statistics, Web Scraping (BeautifulSoup), and Data Wrangling.
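
The K-Fold idea can be sketched without any library: shuffle the sample indices once, cut them into k folds, and let each fold serve as the validation set exactly once. The helper names `k_fold_indices` and `cross_validate`, along with the mean-predictor baseline, are hypothetical and only meant to show the mechanics.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Fit on each training split and score on the held-out fold."""
    scores = []
    for train, validation in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            score(model, [xs[i] for i in validation], [ys[i] for i in validation])
        )
    return scores


def fit_mean(xs, ys):
    """Baseline 'model': always predict the mean of the training targets."""
    return sum(ys) / len(ys)


def mean_absolute_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(10))
ys = [2 * x for x in xs]
scores = cross_validate(xs, ys, fit_mean, mean_absolute_error, k=3)
print(scores, sum(scores) / len(scores))
```

In practice scikit-learn's `KFold` and `cross_val_score` provide the same behavior; the manual version just makes the train/validation split explicit.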

The mathematical engine driving the code through algorithmic implementation.

  • Linear Algebra: Matrix-based image manipulation, SVD, and color space transformations.
  • Calculus: Interactive Python visualizers for derivatives and local linearity.
  • Core Skills: Numerical Computing, Matrix Decomposition, Algorithmic Visualization, and Optimization Theory.
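
Local linearity can be checked numerically: near a point a, a differentiable function is well approximated by its tangent line, and the error shrinks roughly with the square of the distance from a. The sketch below uses a central-difference derivative; the function names and the step size `h` are assumptions for the example, not the visualizer's actual code.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def tangent_line(f, a):
    """Return the linear approximation L(x) = f(a) + f'(a) * (x - a)."""
    slope = derivative(f, a)
    return lambda x: f(a) + slope * (x - a)


def linearization_error(f, a, dx):
    """Absolute gap between f and its tangent line at distance dx from a."""
    return abs(f(a + dx) - tangent_line(f, a)(a + dx))


a = 1.0
for dx in (0.5, 0.1, 0.01):
    print(f"dx={dx}: error={linearization_error(math.sin, a, dx):.2e}")
```

Zooming in by shrinking `dx` is the numerical counterpart of zooming in on a plot until the curve looks straight.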

Expertise & Technical Stack

Programming & Environments

  • Languages: Python 3.10+ (Main), R 4.x (Statistical Rigor), SQL (MariaDB/MySQL).
  • OS & Tools: Ubuntu Linux, Git, Visual Studio Code, DBeaver, Jupyter Ecosystem, Conda.

Data Engineering & AI Ecosystem

  • Databases: MariaDB, MySQL, SQL Server (Theory), Galera Cluster.
  • Deep Learning: PyTorch, FastAI, Torchvision.
  • Machine Learning: Scikit-Learn, SciPy, Statsmodels, Yellowbrick (Visual Analysis).
  • Data Manipulation: Pandas, NumPy, Tidyverse (R), Unidecode, SQLAlchemy, Python-dotenv.
  • Visualization: Matplotlib, Seaborn, ggplot2 (R).

Engineering Philosophy

  • Bilingual Approach: Validation of results across different languages (R & Python) to ensure statistical reliability.
  • Explainability: Focus on model diagnostics (Residuals, Normality tests) rather than just accuracy.
  • Reproducibility: Structured modularity with clear dependency management.

Developed by Francisco Solís Pedraza | Bachelor's in Information Technologies for Science | ENES Morelia, UNAM.