This is a project to develop a neural network which can find preconditioner matrices for solving linear systems of equations via the Preconditioned Conjugate Gradient method (PCG) under certain constraints.
As we progress through the setup/motivation, the problem becomes more and more mathematical. Depending on your maths background, you can feel free to stop whenever things start sounding like nonsense.
High-level summary: We aim to make use of a neural network to help us solve linear systems of equations.
Given a matrix
This is known as a linear system of equations, and these kinds of equations have unfathomably-many uses and applications across virtually any number of industries you can think of; thus, a lot of research has been done into the most efficient ways to solve these problems. In practice, it is usually nicer to try to approximate the solution (i.e we find
A square matrix
Unfortunately, this method can be very slow to converge to a solution if the matrix
Let
Thus, quantifiably,
Suppose we could find a matrix
*(You may have noticed that
This process, whereby we first apply a preconditioner matrix
You came to read about neural networks and cool AI stuff and I just made you sit through a wall of text about linear algebra and iterative methods. I promise, this is only partly because of my agenda and sworn duty as a maths guy to practice a form of mathematical evangelism; mostly, I think the problem we are trying to solve should be clear.
Currently, a neural network cannot outperform classical iterative methods end-to-end. In fact, given a total arbitrary matrix
In summary, if we expend the significant, initial cost of training a neural network to give us suitable preconditioner matrices, we can then call on it as needed prior to performing PCG and gradually save time in the macro sense. This is the objective of the project.
Will this particular neural network ultimately live up to these high expectations? Given my resources, both computationally and intellectually, I think I will take the under. Nonetheless, it will be a great learning experience for someone passionate about scentific computation.
The domain of the PDE from which our matrices arise is a discrete
We will train our model on a sample of 500 such matrices. This was based on some rough back-of-the-envelope calculations based on how much computing time and memory seems appropriate for this project. Furthermore, due to locality properties of this PDE, each training matrix encodes a lot of information, so we shouldn't necessarily need a massive training set to see decent results.
Recall that the diffusion equation models the distribution of a diffusing material in a medium. A canonical example of its use is to model the diffusion of a dye in a liquid. Intuitively, then, we can see that this problem has locality and translation-invariance properties (as mentioned above). Thus, it is natural to use a form of neural network tailored exactly to this kind of problem; namely, a convolutional neural network.
We shall divide this project into several phases.
This phase will consist of training and validating on a sample of 200
This phase will consist of training and validating on a sample of 300
TBD Bu projeye Rodi-Dervis olarak katkıda bulunmaya başladım.