kaitosoga/ann-ma: What intrinsic limitations in explainability, the field concerned with understanding an AI's internal decision-making, arise from architectural differences from human reasoning?

Beyond Correlation: How the Explainability of Artificial Neural Networks Misaligns with Human Cognition

Kaito Soga
Supervisor: Lesther Zulauf
Subject: Computer Science / Mathematics

Main Topic
What fundamental limitations in explainability, the field concerned with understanding an AI's internal decision-making, arise from architectural differences from human reasoning?

Scope
Neural networks, the AI technology behind applications ranging from protein folding to image generation, have become so complex that, while we know what they can do (sometimes even outperforming humans), we barely understand how they do it. Their internal decision-making processes remain largely opaque, earning them the label "black boxes". A clearer view of these networks could improve both our understanding of AI systems and our trust in them.

In my high school thesis (Maturaarbeit), I will compare feedforward neural networks (FFNs) with identical architectures and visualise how different their internals can become, even when they achieve similar accuracy on the same task. I will investigate why such differences emerge and what makes the internal decision-making of neural networks so hard to explain. My objective is to analyse and explain:

What defines explainability.
Why different instances of FFNs develop distinct decision pathways and internal representations despite solving the same task, and what this means for explainability.
What makes it challenging to explain such decisions in human terms.

I will use optical character recognition (OCR) as the training task, choosing FFNs over convolutional neural networks (CNNs) to avoid the spatial bias built into convolutional layers, whose filters are already comparatively easy to interpret visually. This allows me to focus on how fully connected networks internally represent and process visual information.
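The point about spatial bias can be made concrete: a fully connected layer treats a flattened image as an unordered vector, so shuffling the pixel positions of every input, and shuffling the first-layer weight columns in the same way, leaves the outputs unchanged. The sketch below assumes a hypothetical two-layer FFN (784 inputs for 28x28 MNIST images, 32 ReLU hidden units, one output logit) with random weights and random stand-in inputs; it only illustrates the architectural property, not a trained model from this thesis.

```python
import numpy as np

def ffn_forward(x, w1, b1, w2, b2):
    """Hypothetical two-layer FFN: flattened image -> ReLU hidden layer -> single logit."""
    hidden = np.maximum(0.0, x @ w1.T + b1)
    return hidden @ w2.T + b2

rng = np.random.default_rng(0)
n_pixels, n_hidden = 28 * 28, 32                 # assumed sizes, not taken from the thesis
x = rng.random((5, n_pixels))                     # random stand-ins for 5 flattened images
w1, b1 = rng.normal(0.0, 0.05, (n_hidden, n_pixels)), np.zeros(n_hidden)
w2, b2 = rng.normal(0.0, 0.2, (1, n_hidden)), np.zeros(1)

perm = rng.permutation(n_pixels)                  # one fixed shuffle of pixel positions
out = ffn_forward(x, w1, b1, w2, b2)
out_shuffled = ffn_forward(x[:, perm], w1[:, perm], b1, w2, b2)

# The FFN has no notion of which pixels are neighbours: both outputs match.
print(np.allclose(out, out_shuffled))
```

A CNN, by contrast, shares weights across local neighbourhoods, so the same shuffle would destroy the local structure its filters rely on.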

Research Questions

What problems does the field of explainability concern itself with?
Where do the limitations that contribute to these problems originate?
Why do current neural architectures work as well as they do, despite these limitations?
How could resolving these limitations be valuable?

Approach

Preparation / Planning
a. Study the mathematical foundations and related computer vision techniques
b. Implement all computational experiments in Python, using PyTorch, NumPy, Matplotlib, OpenGL, and other tools for visualisation
c. Determine a suitable FFN architecture and training setup for the experiments (number of layers, layer sizes, batch size)
d. Train and optimise FFNs with identical architectures on binary MNIST classification (e.g., "3" or not), keeping the hyperparameters fixed for further experiments
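Step d can be sketched in miniature. The sketch below assumes synthetic data in place of MNIST (noisy one-hot vectors standing in for images of the digits 0 to 9), a one-hidden-layer ReLU network with a sigmoid output, and plain full-batch gradient descent on binary cross-entropy. The helper names `make_binary_targets` and `train_ffn` and all hyperparameter values are hypothetical; only the random seed differs between the two instances.

```python
import numpy as np

def make_binary_targets(digits, target=3):
    """Binary labels for the "3 or not" task: 1.0 where the digit equals `target`."""
    return (np.asarray(digits) == target).astype(np.float64)

def train_ffn(x, y, seed, hidden=16, lr=0.2, epochs=300):
    """Full-batch gradient descent on a 1-hidden-layer ReLU FFN with a sigmoid output.
    All hyperparameters are fixed; only `seed` (the weight initialisation) varies."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = np.maximum(0.0, x @ w1 + b1)              # hidden activations
        logits = (h @ w2 + b2)[:, 0]
        p = 1.0 / (1.0 + np.exp(-logits))             # predicted P(digit == 3)
        p_safe = np.clip(p, 1e-7, 1 - 1e-7)
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        g_logit = ((p - y) / n)[:, None]              # dLoss/dlogit for sigmoid + BCE
        g_h = (g_logit @ w2.T) * (h > 0)              # backprop through ReLU (uses old w2)
        w2 -= lr * (h.T @ g_logit)
        b2 -= lr * g_logit.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), losses

rng = np.random.default_rng(42)
digits = rng.integers(0, 10, size=500)
x = np.eye(10)[digits] + rng.normal(0.0, 0.1, (500, 10))  # synthetic stand-in for images
y = make_binary_targets(digits)

params_a, losses_a = train_ffn(x, y, seed=1)
params_b, losses_b = train_ffn(x, y, seed=2)
print(losses_a[0], losses_a[-1], losses_b[0], losses_b[-1])
print(np.allclose(params_a[0], params_b[0]))      # do the first-layer weights coincide?
```

In the actual experiments this comparison would run on real MNIST images with the hyperparameters fixed in step c; the sketch only shows the setup in which the seed is the sole difference between instances.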
Map functional neurons and activation patterns
a. Relate neurons to the input using activation maximisation and Layer-wise Relevance Propagation (LRP)
b. Map activation patterns and apply dimensionality reduction
c. Experiment with input perturbation and probe the importance and function of individual neurons and sub-networks
d. Cluster neurons by spatial and functional similarity
e. Compute metrics for the differences and similarities between models
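Step a names Layer-wise Relevance Propagation. One common variant is the LRP-ε rule, which redistributes the output score backwards layer by layer in proportion to each lower neuron's contribution a_j·w_jk to the upper pre-activation z_k, with a small ε added to the denominator for stability. The sketch below is a minimal NumPy version for a hypothetical bias-free ReLU FFN with random weights; `lrp_epsilon` is an illustrative helper, not a library API, and practical LRP setups often combine different rules per layer.

```python
import numpy as np

def lrp_epsilon(x, weights, eps=1e-9):
    """LRP-epsilon for a bias-free FFN (ReLU hidden layers, linear output layer).
    `weights` is a list of (in, out) matrices; returns the output and input relevances."""
    activations = [x]
    for i, w in enumerate(weights):
        z = activations[-1] @ w
        is_output = i == len(weights) - 1
        activations.append(z if is_output else np.maximum(0.0, z))

    relevance = activations[-1].copy()                # start from the output score
    for a, w in zip(reversed(activations[:-1]), reversed(weights)):
        z = a @ w                                     # pre-activations of the upper layer
        z = z + eps * np.where(z >= 0, 1.0, -1.0)     # epsilon stabiliser avoids division by ~0
        relevance = a * (w @ (relevance / z))         # R_j = a_j * sum_k w_jk * R_k / z_k
    return activations[-1], relevance

rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.05, (784, 32)), rng.normal(0.0, 0.2, (32, 1))]
x = rng.random(784)                                   # stand-in for a flattened 28x28 image
score, relevance = lrp_epsilon(x, weights)
heatmap = relevance.reshape(28, 28)                   # relevance per pixel, ready to plot

# Without biases, relevance is (almost) conserved: it sums to the output score.
print(np.isclose(relevance.sum(), score[0]))
```

With bias terms present, part of the relevance is absorbed by the biases, so the sum over pixels no longer matches the output exactly.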
Theoretical Arguments / Experiments
a. Gather theoretical assumptions about the architecture of FFNs
b. Use this theory to argue and predict what might cause the limitations in explainability
c. Test these theoretical hypotheses
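One well-known theoretical property of FFNs that fits step a is hidden-unit permutation symmetry: reordering the neurons of a hidden layer, together with the matching columns and rows of the adjacent weight matrices, yields different weights but exactly the same function. Independently trained instances therefore have no reason to give "neuron 5" the same role, which suggests that comparisons between models should be invariant to such reorderings. The sketch below uses random weights and stand-in inputs to contrast a naive neuron-by-neuron correlation with linear CKA (centered kernel alignment), a permutation-invariant similarity measure chosen here as an illustrative assumption; `linear_cka` is a hypothetical helper.

```python
import numpy as np

def linear_cka(a, b):
    """Linear CKA between activations a (n, p) and b (n, q) recorded on the same n inputs."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    cross = np.linalg.norm(a.T @ b, "fro") ** 2
    return cross / (np.linalg.norm(a.T @ a, "fro") * np.linalg.norm(b.T @ b, "fro"))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 784))                  # stand-in inputs
w1 = rng.normal(0.0, 0.05, (784, 32))
w2 = rng.normal(0.0, 0.2, (32, 1))

perm = rng.permutation(32)                       # relabel the 32 hidden neurons
w1_p, w2_p = w1[:, perm], w2[perm, :]            # columns of w1 and rows of w2 move together

h = np.maximum(0.0, x @ w1)
h_p = np.maximum(0.0, x @ w1_p)

same_function = np.allclose(h @ w2, h_p @ w2_p)  # identical outputs for every input
unit_corr = np.mean([np.corrcoef(h[:, i], h_p[:, i])[0, 1] for i in range(32)])
cka = linear_cka(h, h_p)
print(same_function, round(unit_corr, 2), round(cka, 4))
```

Trained instances are not exact permutations of each other, so their CKA would fall below 1; the sketch only shows why relabelling has to be factored out before differences between models can be attributed to anything else.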
Visualisation / Explanation
a. Visualise results to make them intuitive
b. Find and visualise decision boundaries
c. Find simpler patterns or concepts that determine how strongly a neuron activates
d. Explain local and global decisions with decision graphs or spatial labelling
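For step b, one simple way to locate the decision boundary of a binary classifier in a 784-dimensional input space is to take two inputs that are classified differently and bisect the straight line between them until the logit changes sign. The sketch assumes a hypothetical random-weight FFN whose output bias is set to the negative median logit of the sample inputs, purely so that both classes occur; `boundary_on_segment` is an illustrative helper. It finds one crossing on that segment, not the whole boundary.

```python
import numpy as np

def boundary_on_segment(f, x_a, x_b, tol=1e-6, max_iter=100):
    """Bisect the segment x_a -> x_b for a point where the logit f changes sign.
    Returns the interpolation factor t and the boundary point."""
    f_lo = f(x_a)
    if f_lo * f(x_b) > 0:
        raise ValueError("x_a and x_b must lie on opposite sides of the boundary")
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(x_a + mid * (x_b - x_a))
        if f_lo * f_mid <= 0:        # sign change lies in [lo, mid]
            hi = mid
        else:                        # sign change lies in [mid, hi]
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    t = 0.5 * (lo + hi)
    return t, x_a + t * (x_b - x_a)

rng = np.random.default_rng(0)
w1 = rng.normal(0.0, 0.05, (784, 32))
w2 = rng.normal(0.0, 0.2, (32,))
samples = rng.normal(size=(50, 784))              # stand-in inputs
raw = np.maximum(0.0, samples @ w1) @ w2
bias = -np.median(raw)                            # assumption: centre logits so both classes occur

def logit(x):
    return float(np.maximum(0.0, x @ w1) @ w2 + bias)

x_pos, x_neg = samples[np.argmax(raw)], samples[np.argmin(raw)]
t, x_star = boundary_on_segment(logit, x_pos, x_neg)
print(round(t, 3), abs(logit(x_star)) < 1e-3)
```

Repeating this for many input pairs, or comparing where two model instances place the crossing on the same segment, could give a rough picture of how their boundaries differ.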
Interpretation
a. Extend the experiments to a larger dataset to test scalability
b. Formalise the results for analysis and interpretation (e.g., comparison with baseline methods, identified limitations, suggested improvements in the outlook)
c. Write up the formal findings in LaTeX, using existing research to support them and compare them against

Personal Contribution

I will carry out all coding, experimentation, analysis, and interpretation myself.
I will research existing methods and theory, derive new perspectives, compare them with my experimental results, and formulate new findings.

Sources and Materials
Books / Papers:

Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Pearson.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Colin, J., et al. (2022). What I cannot predict, I do not understand: A human-centered evaluation framework for explainability methods. arXiv preprint arXiv:2208.09725.
Wu, M., et al. (2023). VERIX: Towards verified explainability of deep neural networks. arXiv preprint arXiv:2306.09931.
Bach, J. (2019). Phenomenal experience and the perceptual binding state.
Mohan, D. M., et al. (2016). Effect of subliminal lexical priming on the subjective perception of images: A machine learning approach.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier.
Molnar, C. (2019). Interpretable machine learning: A guide for making black box models explainable.
Erhan, D., Courville, A., & Bengio, Y. (2010). Understanding representations learned in deep architectures.

Code Examples:

Tools:

Overleaf
Google Colab
GitHub
GPU-accelerated Python libraries
