A curated collection of foundational and influential papers in artificial intelligence, spanning 1943 to 2025 and covering classical theory, deep learning, NLP, computer vision, and modern foundation models.
| Year | Title | Citations | Key Contribution | Authors | Link |
|---|---|---|---|---|---|
| 1943 | A Logical Calculus of the Ideas Immanent in Nervous Activity | Classic (~25k) | First formal model of artificial neurons | Warren McCulloch, Walter Pitts | Link |
| 1950 | Computing Machinery and Intelligence | Classic (~15k) | Turing Test; framing machine intelligence | Alan Turing | Link |
| 1956 | The Logic Theorist | Classic | First AI program; symbolic reasoning as search | Allen Newell, Herbert Simon, Cliff Shaw | Link |
| 1958 | The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain | Classic (~8k) | Early neural network learning rule | Frank Rosenblatt | Link |
| 1959 | Programs with Common Sense (Advice Taker) | Classic | Knowledge-based reasoning, symbolic AI | John McCarthy | Link |
| 1961 | Steps Toward Artificial Intelligence | Classic (~5k) | Search, heuristics, symbolic AI agenda | Marvin Minsky | Link |
| 1969 | Perceptrons | Classic (~10k) | Limits of single-layer perceptrons | Marvin Minsky, Seymour Papert | Link |
| 1975 | A Framework for Representing Knowledge | Classic (~8k) | Frames; structured knowledge representation | Marvin Minsky | Link |
| 1977 | Knowledge Representation and Reasoning | Classic | Formal KR foundations | John McCarthy | Link |
| 1986 | Induction of Decision Trees | ~20k | ID3; decision tree learning | J. Ross Quinlan | Link |
| 1986 | Learning Representations by Back-Propagating Errors | ~60k | Backpropagation for multilayer nets | Rumelhart, Hinton, Williams | Link |
| 1986 | Explanation-Based Learning | Classic (~3k) | Symbolic learning from explanation | Tom Mitchell et al. | Link |
| 1989 | Q-learning | ~40k | Model-free reinforcement learning | Christopher Watkins | Link |
| 1989 | A Tutorial on Hidden Markov Models | ~35k | Sequence modeling, speech recognition | Lawrence Rabiner | Link |
| 1998 | Reinforcement Learning: An Introduction | Very High (~50k+) | Formal RL framework | Richard Sutton, Andrew Barto | Link |
| 1993 | Keeping the neural networks simple by minimizing the description length of the weights | ~3k | MDL principle for neural nets | Geoffrey E. Hinton, Drew van Camp | Link |
| 1995 | Artificial Intelligence: A Modern Approach | Very High (~40k+) | Unified rational-agent view | Stuart Russell, Peter Norvig | Link |
| 1995 | Support-Vector Networks | ~50k | Margin-based learning (SVMs) | Corinna Cortes, Vladimir Vapnik | Link |
| 1997 | Long Short-Term Memory | ~80k | Solved long-term dependency problem | Sepp Hochreiter, Jürgen Schmidhuber | Link |
| 1998 | Boosting the Margin | ~25k | AdaBoost theory | Robert Schapire et al. | Link |
| 1998 | Gradient-Based Learning Applied to Document Recognition | ~52k | Convolutional nets & backprop for vision | Yann LeCun et al. | Link |
| 2001 | Random Forests | ~90k | Ensemble learning | Leo Breiman | Link |
| 2003 | Latent Dirichlet Allocation | ~45k | Probabilistic topic modeling | David Blei, Andrew Ng, Michael Jordan | Link |
| 2004 | A Tutorial Introduction to the Minimum Description Length Principle | ~6k | MDL principle overview | Peter Grünwald | Link |
| 2006 | A Fast Learning Algorithm for Deep Belief Nets | ~35k | Deep unsupervised pretraining | Geoffrey Hinton et al. | Link |
| 2008 | MapReduce: Simplified Data Processing on Large Clusters | ~35k | Distributed data processing paradigm | Jeffrey Dean, Sanjay Ghemawat | Link |
| 2008 | Machine Super Intelligence | ~500 | AIXI and universal intelligence | Shane Legg | Link |
| 2009 | ImageNet: A Large-Scale Hierarchical Image Database | ~45k | Data-driven deep learning era | Jia Deng et al. | Link |
| 2011 | Scikit-Learn: Machine Learning in Python | ~70k | Premier ML library for practitioners | Fabian Pedregosa et al. | Link |
| 2011 | The First Law of Complexodynamics | ~200 | Complexity evolution in systems | Scott Aaronson | Link |
| 2012 | A Few Useful Things to Know About Machine Learning | ~15k | Practical ML principles | Pedro Domingos | Link |
| 2012 | ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) | ~180k | Deep learning breakthrough in vision | Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton | Link |
| 2013 | Kolmogorov Complexity and Algorithmic Randomness | ~800 | Algorithmic information theory | Alexander Shen, Vladimir Uspensky, Nikolay Vereshchagin | Link |
| 2013 | Playing Atari with Deep Reinforcement Learning | ~12k | Deep Q-learning for games | Volodymyr Mnih et al. | Link |
| 2014 | Dropout: A Simple Way to Prevent Neural Networks from Overfitting | ~70k | Regularization for deep nets | Nitish Srivastava et al. | Link |
| 2014 | Adam: A Method for Stochastic Optimization | ~230k | Default deep learning optimizer | Diederik Kingma, Jimmy Ba | Link |
| 2014 | Generative Adversarial Networks | ~130k | Adversarial generative modeling | Ian Goodfellow et al. | Link |
| 2014 | Recurrent Neural Network Regularization | ~7k | Dropout regularization for LSTMs | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals | Link |
| 2014 | Quantifying the Rise and Fall of Complexity in Closed Systems: the Coffee Automaton | ~100 | Complexity dynamics formalization | Scott Aaronson, Sean Carroll, Lauren Ouellette | Link |
| 2014 | Neural Turing Machines | ~3k | Memory-augmented neural nets | Alex Graves, Greg Wayne, Ivo Danihelka | Link |
| 2014 | DeepFace: Closing the Gap to Human-Level Performance in Face Verification | ~12k | Deep learning for face recognition | Yaniv Taigman et al. | Link |
| 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | ~45k | Attention mechanism for NMT | Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio | Link |
| 2014 | Sequence to Sequence Learning with Neural Networks | ~50k | Seq2seq architecture | Ilya Sutskever, Oriol Vinyals, Quoc Le | Link |
| 2014 | Show and Tell: A Neural Image Caption Generator | ~8k | Image captioning with deep learning | Oriol Vinyals et al. | Link |
| 2014 | DeepSpeech: Scaling up end-to-end speech recognition | ~4k | End-to-end speech recognition | Awni Hannun et al. | Link |
| 2015 | Human-Level Control Through Deep Reinforcement Learning (DQN) | ~40k | Deep RL + perception | Volodymyr Mnih et al. | Link |
| 2015 | Deep Residual Learning for Image Recognition (ResNet) | ~150k | Very deep networks via skip connections | Kaiming He et al. | Link |
| 2015 | Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG) | ~95k | Deep convolutional vision models | Karen Simonyan, Andrew Zisserman | Link |
| 2015 | Batch Normalization | ~45k | Faster & stable deep training | Sergey Ioffe, Christian Szegedy | Link |
| 2015 | Deep Learning (Survey) | ~60k | Overview of deep representation learning | Yann LeCun, Yoshua Bengio, Geoffrey Hinton | Link |
| 2015 | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | ~55k | Region proposal for detection | Shaoqing Ren et al. | Link |
| 2015 | The Unreasonable Effectiveness of Recurrent Neural Networks | Blog Post | RNN capabilities and applications | Andrej Karpathy | Link |
| 2015 | Understanding LSTM Networks | Blog Post | LSTM architecture explanation | Christopher Olah | Link |
| 2015 | Pointer Networks | ~3k | Attention-based output mechanism | Oriol Vinyals, Meire Fortunato, Navdeep Jaitly | Link |
| 2015 | Order Matters: Sequence to sequence for sets | ~2k | Set-to-sequence learning | Oriol Vinyals, Samy Bengio, Manjunath Kudlur | Link |
| 2015 | Multi-Scale Context Aggregation by Dilated Convolutions | ~8k | Dilated convolutions | Fisher Yu, Vladlen Koltun | Link |
| 2015 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | ~3k | Multilingual speech recognition | Baidu Research | Link |
| 2015 | A Neural Algorithm of Artistic Style | ~15k | Neural style transfer | Leon Gatys, Alexander Ecker, Matthias Bethge | Link |
| 2015 | Deep Reinforcement Learning with Double Q-learning | ~12k | Improved Q-learning | Hado van Hasselt, Arthur Guez, David Silver | Link |
| 2016 | XGBoost: A Scalable Tree Boosting System | ~35k | Industrial-grade boosting | Tianqi Chen, Carlos Guestrin | Link |
| 2016 | TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems | ~45k | Scalable ML software platform | Martín Abadi et al. | Link |
| 2016 | Identity Mappings in Deep Residual Networks | ~15k | Improved ResNet design | Kaiming He et al. | Link |
| 2016 | WaveNet: A Generative Model for Raw Audio | ~8k | Audio generation with deep learning | Aäron van den Oord et al. | Link |
| 2016 | Neural Architecture Search with Reinforcement Learning | ~6k | Automated neural architecture design | Barret Zoph, Quoc Le | Link |
| 2017 | Attention Is All You Need | ~180k | Transformer architecture | Ashish Vaswani et al. | Link |
| 2017 | Proximal Policy Optimization (PPO) | ~25k | Stable policy-gradient RL | John Schulman et al. | Link |
| 2017 | Neural Message Passing for Quantum Chemistry | ~4k | Graph neural networks for chemistry | Justin Gilmer et al. | Link |
| 2017 | A Simple Neural Network Module for Relational Reasoning | ~3k | Relation networks | Adam Santoro et al. | Link |
| 2017 | Variational Lossy Autoencoder | ~1k | Improved VAE objective | Xi Chen et al. | Link |
| 2017 | Deep Reinforcement Learning: An Overview | ~2k | Comprehensive deep RL survey | Yuxi Li | Link |
| 2017 | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | ~5k | CTR prediction with factorization | Huifeng Guo et al. | Link |
| 2017 | Neural Style Transfer: A Review | ~800 | NST comprehensive survey | Yongcheng Jing et al. | Link |
| 2017 | Deep Reinforcement Learning from Human Preferences | ~3k | RLHF foundation | Paul Christiano et al. | Link |
| 2017 | Deep Learning based Recommender System: A Survey and New Perspectives | ~1.2k | Recommender systems with deep learning | Shuai Zhang, Lina Yao, Aixin Sun, Yi Tay | Link |
| 2017 | Neural Collaborative Filtering | ~5k | Deep learning for collaborative filtering | Xiangnan He et al. | Link |
| 2017 | AlphaGo Zero: Mastering the game of Go without human knowledge | ~15k | Self-play RL without human data | David Silver et al. | Link |
| 2017 | VQ-VAE: Neural Discrete Representation Learning | ~3k | Discrete latent representations | Aäron van den Oord, Oriol Vinyals, Koray Kavukcuoglu | Link |
| 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ~95k | Bidirectional language pretraining | Jacob Devlin et al. | Link |
| 2018 | The Illustrated Transformer | Blog Post | Transformer visualization and explanation | Jay Alammar | Link |
| 2018 | Relational Recurrent Neural Networks | ~1k | Memory and relational reasoning | Adam Santoro et al. | Link |
| 2018 | YOLOv3: An Incremental Improvement | ~40k | Object detection improvements | Joseph Redmon, Ali Farhadi | Link |
| 2019 | GPT-2: Language Models are Unsupervised Multitask Learners | ~25k | Scaling transformers | Alec Radford et al. | Link |
| 2019 | The Bitter Lesson | Essay | Computation and learning vs hand-coding | Rich Sutton | Link |
| 2019 | GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism | ~2k | Model parallelism technique | Yanping Huang et al. | Link |
| 2020 | Language Models are Few-Shot Learners (GPT-3) | ~50k | Emergent in-context learning | Tom Brown et al. | Link |
| 2020 | An Image Is Worth 16×16 Words: Transformers for Image Recognition (ViT) | ~35k | Transformers for vision | Alexey Dosovitskiy et al. | Link |
| 2020 | Scaling Laws for Neural Language Models | ~12k | Predictable scaling behavior | Jared Kaplan et al. | Link |
| 2020 | Denoising Diffusion Probabilistic Models | ~35k | Diffusion-based generation | Jonathan Ho, Ajay Jain, Pieter Abbeel | Link |
| 2020 | Dense Passage Retrieval for Open-Domain Question Answering | ~4k | Dense retrieval for QA | Vladimir Karpukhin et al. | Link |
| 2020 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | ~8k | RAG architecture | Patrick Lewis et al. | Link |
| 2021 | Decision Transformer: Reinforcement Learning via Sequence Modeling | ~6k | RL as sequence modeling | Lili Chen et al. | Link |
| 2021 | Highly Accurate Protein Structure Prediction with AlphaFold | ~35k | Solved protein folding | John Jumper et al. | Link |
| 2021 | Zero-Shot Text-to-Image Generation (DALL-E) | ~5k | DALL-E model | Aditya Ramesh et al. | Link |
| 2022 | Training Language Models to Follow Instructions with Human Feedback (InstructGPT) | ~18k | RLHF alignment | Long Ouyang et al. | Link |
| 2022 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | ~12k | Reasoning via prompting | Jason Wei et al. | Link |
| 2022 | PaLM: Scaling Language Modeling with Pathways | ~8k | Massive-scale LLMs | Aakanksha Chowdhery et al. | Link |
| 2022 | Constitutional AI: Harmlessness from AI Feedback | ~7k | Self-alignment via principles | Yuntao Bai et al. | Link |
| 2022 | Self-Instruct: Aligning language models with self-generated instructions | ~4k | Self-supervised instruction tuning | Yizhong Wang et al. | Link |
| 2022 | Chinchilla: Training Compute-Optimal Large Language Models | ~5k | Optimal scaling laws | Jordan Hoffmann et al. | Link |
| 2022 | Precise Zero-Shot Dense Retrieval Without Relevance Labels | ~1k | Hypothetical document embeddings | Luyu Gao et al. | Link |
| 2023 | Segment Anything | ~8k | Foundation models for vision | Alexander Kirillov et al. | Link |
| 2023 | LLaMA: Open and Efficient Foundation Language Models | ~15k | Open-weight LLM paradigm | Hugo Touvron et al. | Link |
| 2023 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | ~6k | Emergent general capabilities | Sébastien Bubeck et al. | Link |
| 2023 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | ~5k | Simpler alignment than RLHF | Rafael Rafailov et al. | Link |
| 2023 | Understanding Deep Learning | Textbook | Comprehensive deep learning textbook | Simon J.D. Prince | Link |
| 2023 | Zephyr: Direct Distillation of LM Alignment | ~1k | Distilled alignment models | Lewis Tunstall et al. | Link |
| 2023 | Lost in the Middle: How Language Models Use Long Contexts | ~2k | Context window utilization | Nelson F. Liu et al. | Link |
| 2023 | Alpaca: A Strong, Replicable Instruction-Following Model | ~3k | Low-cost instruction tuning | Stanford CRFM | Link |
| 2023 | Llama 2: Open Foundation and Fine-Tuned Chat Models | ~10k | Open commercial LLMs | Hugo Touvron et al. | Link |
| 2023 | LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | ~800 | Extended context training | Yukang Chen et al. | Link |
| 2023 | Are Emergent Abilities of Large Language Models a Mirage? | ~1k | Critique of emergence claims | Rylan Schaeffer et al. | Link |
| 2023 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ~2k | State space models for sequences | Albert Gu, Tri Dao | Link |
| 2023 | QLoRA: Efficient Finetuning of Quantized LLMs | ~3k | Quantized model fine-tuning | Tim Dettmers et al. | Link |
| 2023 | Reflexion: Language Agents with Verbal Reinforcement Learning | ~1k | Self-reflection for agents | Noah Shinn et al. | Link |
| 2023 | Explainability for Large Language Models: A Survey | ~800 | LLM interpretability overview | Haiyan Zhao et al. | Link |
| 2024 | Gemini: A Family of Highly Capable Multimodal Models | Rapidly growing (~5k) | Native multimodal foundation models | Google DeepMind | Link |
| 2024 | AlphaFold 3 | ~3k | Molecular & interaction prediction | DeepMind | Link |
| 2023 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ~2k | Open multimodal LLMs | Deyao Zhu et al. | Link |
| 2023 | Representation Engineering: A Top-Down Approach to AI Transparency | Emerging (~500) | Steering internal representations | Andy Zou et al. | Link |
| 2024 | Better & Faster Large Language Models Via Multi-token Prediction | ~800 | Multi-token prediction training | Fabian Gloeckle et al. | Link |
| 2024 | KAN: Kolmogorov-Arnold Networks | ~1k | Alternative to MLPs | Ziming Liu et al. | Link |
| 2025 | LLMs Will Always Hallucinate, and We Need to Live With This | ~200 | Mathematical inevitability of hallucination | Sourav Banerjee et al. | Link |
| 2025 | DINOv3 | ~200 | Self-supervised vision foundation models | Oriane Siméoni et al. | Link |
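Several of the entries above reduce to one core computation. As a single illustrative sketch (not reference code from any listed paper), the scaled dot-product attention at the heart of "Attention Is All You Need" (2017), softmax(QKᵀ/√d_k)·V, can be written in plain Python:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of vectors (lists of floats); returns one output
    vector per query, each a convex combination of the value vectors.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

Because the weights come from a softmax, each output row is a convex combination of the value vectors; the query that most closely matches a key draws most of its output from that key's value.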
CC0-1.0 License - See LICENSE file for details. Individual papers remain under their respective copyrights.