(Image credit: Steve Johnson. Unsplash.com)
This intensive eight-session workshop charts the remarkable evolution of machine learning, guiding students, staff, and postdocs from foundational neural networks to the cutting-edge architectures reshaping the AI landscape. Participants will explore the journey of innovation that has led to models capable of understanding complex patterns, generating novel content, and tackling previously intractable problems (Goodfellow et al., 2016; Vaswani et al., 2017). The series is designed for participants with some foundational understanding of machine learning concepts who wish to dig deeper into the "how" and "why" behind pivotal deep learning models.
Each session dissects a key architectural milestone: beginning with Perceptrons and basic Neural Networks, moving through the ingenious designs of Autoencoders for representation learning, Generative Adversarial Networks (GANs) for synthetic data generation, Convolutional Neural Networks (CNNs) for image processing, and Recurrent Neural Networks (RNNs) for sequential data. The workshop culminates in a two-part exploration of Transformers, the architecture now dominating natural language processing and beyond, and an introduction to Diffusion Models, which are powering the next generation of generative AI (Ho et al., 2020; Rombach et al., 2022).
The workshop will emphasize the conceptual underpinnings, core mechanisms, and significant breakthroughs associated with each model. Interdisciplinary applications will be highlighted, showcasing how these advanced AI techniques are revolutionizing fields such as drug discovery, medical imaging, climate science, robotics, natural language understanding, and creative arts. Participants will also touch upon the societal impact and ethical considerations of these powerful tools, fostering a responsible approach to AI development and deployment. While theoretical understanding is paramount, connections to practical implementation using popular open-source frameworks will be made, encouraging further independent exploration.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. (A foundational textbook in deep learning).
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. (The seminal paper introducing the Transformer architecture).
- Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851. (A key paper on diffusion models).
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695). (The paper introducing Stable Diffusion, a prominent latent diffusion model).
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. (A high-level overview of deep learning by pioneers in the field).
Upon completion of this eight-session workshop series, participants will be able to:
- Trace Key Architectural Innovations: Understand the historical progression and evolutionary relationships between major neural network architectures, from early perceptrons to contemporary models like Transformers and Diffusion Models.
- Explain Core Model Mechanics: Articulate the fundamental principles, mathematical concepts, and operational mechanisms of Perceptrons, Autoencoders, GANs, CNNs, RNNs, Transformers, and Diffusion Models.
- Identify Appropriate Model Applications: Recognize the types of data and tasks for which each discussed architecture is best suited (e.g., images, sequences, generative tasks) and understand their typical application domains.
- Appreciate Implementation Frameworks: Gain awareness of how these models are implemented using industry-standard open-source deep learning libraries (e.g., TensorFlow, PyTorch), facilitating further self-study and practical application.
- Critically Evaluate Advanced AI: Discuss the capabilities, current limitations, potential societal impacts, and ethical considerations associated with sophisticated AI models, particularly Transformers and generative AI like GANs and Diffusion Models.
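As a taste of the "core model mechanics" objective, the gradient-descent idea that underlies training for every architecture in the series can be sketched in a few lines of plain Python. This is an illustrative example only (a one-parameter squared loss, not any specific model from the sessions):

```python
# Gradient descent on a toy squared loss L(w) = (w - 3)^2,
# whose minimum sits at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative dL/dw
    return 2.0 * (w - 3.0)

w = 0.0          # initial parameter guess
lr = 0.1         # learning rate
for _ in range(100):
    w -= lr * grad(w)   # step in the direction that decreases the loss
```

The same loop, with the gradient computed automatically by backpropagation over millions of parameters, is what PyTorch and TensorFlow optimizers perform under the hood.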
Instructors: Brennon Huppenthal, Megh Krishnaswamy
| Date | Topic | Description | Materials | Code | YouTube |
|---|---|---|---|---|---|
| Session 1: Perceptrons and Neural Networks 🧠 | This session will cover the foundational concepts, starting with the Perceptron algorithm, its biological inspiration, and limitations (e.g., the XOR problem). It will then introduce Multi-Layer Perceptrons (MLPs), the role of activation functions, the intuitive idea behind backpropagation for learning, and the structure of a basic neural network (input, hidden, output layers), along with an introduction to loss functions and gradient descent at a conceptual level. | Notes | Code | Video() |
| Session 2: Autoencoders 🗜️ | This session focuses on Autoencoders as a neural network architecture for unsupervised learning, primarily for representation learning and dimensionality reduction. It will explore the encoder-bottleneck-decoder structure, the concept of reconstruction loss, and briefly touch upon variants like Denoising Autoencoders and an introduction to the generative capabilities of Variational Autoencoders (VAEs). Applications like anomaly detection and data compression will be discussed. | Notes | Code | Video() |
| Session 3: GANs (Generative Adversarial Networks) 🎭 | This session will introduce Generative Adversarial Networks (GANs), explaining the innovative adversarial training process involving a generator and a discriminator locked in a minimax game. Key concepts like loss functions for both networks and common challenges such as mode collapse and training instability will be discussed, alongside examples of GANs' ability to generate realistic data, particularly images. | Notes | Code 1 Code 2 | Video() |
| Session 4: CNNs (Convolutional Neural Networks) 🖼️ | This session delves into Convolutional Neural Networks (CNNs), the workhorse for image processing and computer vision tasks. It will cover the core components: convolutional layers (filters, stride, padding), pooling layers, and the role of fully connected layers. The concepts of parameter sharing, local receptive fields, and hierarchical feature learning will be explained, with a brief mention of influential CNN architectures. | Notes | Code | Video() |
| Session 5: RNNs (Recurrent Neural Networks) 🔄 | This session focuses on Recurrent Neural Networks (RNNs), designed to handle sequential data like text and time series. It will explain the concept of a hidden state that allows RNNs to have "memory," the challenges of training RNNs (vanishing/exploding gradients), and how variants like LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) address these issues through gating mechanisms. | Notes | Code | Video() |
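The Perceptron algorithm covered in Session 1 is simple enough to sketch in pure Python. The following illustrative example (not the workshop's own code) learns the linearly separable AND function with the classic perceptron update rule; swapping in XOR targets would fail to converge, which is exactly the limitation the session discusses:

```python
# Minimal Rosenblatt perceptron learning the AND function.
def train_perceptron(data, epochs=20, lr=0.1):
    w = [0.0, 0.0]  # weights for the two inputs
    b = 0.0         # bias (learned threshold)
    for _ in range(epochs):
        for x, target in data:
            # Step activation: output 1 if the weighted sum is positive
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred
            # Perceptron rule: nudge weights by error * input
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
```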
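Session 4's core operation, sliding a small filter across an image, can be sketched directly in NumPy. This is a hypothetical illustration of a "valid" (no-padding) convolution with a hand-made vertical-edge filter, not the session's actual code:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # "Valid" 2D convolution: slide the kernel over the image,
    # taking an elementwise product-and-sum at each position.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Tiny "image": dark left half, bright right half
img = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# 1x2 vertical-edge detector: responds where brightness jumps left-to-right
edge = np.array([[-1, 1]], dtype=float)
response = conv2d(img, edge)
```

The same filter parameters are reused at every spatial position, which is the parameter sharing the session highlights.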
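The hidden-state "memory" at the heart of Session 5 boils down to one recurrence. The following NumPy sketch (illustrative, with arbitrary random weights) shows a single vanilla-RNN step applied across a short sequence:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One vanilla-RNN step: the new hidden state mixes the current
    # input with the previous hidden state through a tanh nonlinearity.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)

h = np.zeros(4)                         # hidden state starts empty
sequence = rng.normal(size=(5, 3))      # 5 time steps, 3 features each
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h carries context forward
```

Because the same `W_hh` multiplies the state at every step, repeated products of it through time are what shrink or blow up gradients, motivating the LSTM and GRU gates the session covers.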
- Perceptrons and Neural Networks
- Autoencoders
- Generative Adversarial Networks
- Convolutional Neural Networks
- Recurrent Neural Networks
Collaborators: Brennon Huppenthal, Megh Krishnaswamy, Enrique Noriega, Carlos Lizárraga.
Created: 06/11/2024 (C. Lizárraga)
Updated: 06/09/2025 (C. Lizárraga)
2025. University of Arizona DataLab, Data Science Institute.
