For this struct to compile inside a GPU kernel:
https://github.com/SciML/PSOGPU.jl/blob/7bbd997fb8f00e032d31224619d0e31fe716b812/src/PSOGPU.jl#L9-L15
the type `T1` must be a static array. However, this causes problems for high-dimensional problems (say, more than 100 parameters): operations on static arrays are unrolled at compile time, so performance degrades and compilation may even fail. A possible workaround is to destructure `PSOParticle` and store the fields separately: matrices for `position`, `velocity`, and `best_position`, and vectors for `cost` and `best_cost`. We would then `cudaconvert` these arrays to pass them to the GPU kernel and have each thread update its own view. This would let us work with high-dimensional parameters. The difference is similar to the difference between `EnsembleGPUArray` and `EnsembleGPUKernel`.
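To make the proposed layout concrete, here is a minimal CPU-only sketch in Python/NumPy. It is not PSOGPU.jl code and does not touch CUDA. All names (`init_swarm`, `update_particle`, `pso`, `sphere`) and the PSO hyperparameters (`w`, `c1`, `c2`, bounds) are hypothetical choices for illustration. The particle fields live in separate arrays with one column per particle, and the inner loop over `i` stands in for one GPU thread per particle, where each "thread" only writes through views of its own column. The global best is snapshotted once per iteration, mimicking a single kernel launch per step.

```python
import numpy as np


def sphere(x):
    # Hypothetical test objective: sum of squares, minimum 0 at the origin.
    return float(np.dot(x, x))


def init_swarm(n_particles, dim, lb, ub, f, rng):
    # Struct-of-arrays layout: one column per particle instead of a
    # vector of per-particle structs holding static arrays.
    position = rng.uniform(lb, ub, size=(dim, n_particles))
    velocity = np.zeros((dim, n_particles))
    cost = np.array([f(position[:, i]) for i in range(n_particles)])
    best_position = position.copy()
    best_cost = cost.copy()
    return position, velocity, cost, best_position, best_cost


def update_particle(i, position, velocity, cost, best_position, best_cost,
                    gbest, f, w, c1, c2, lb, ub, rng):
    # Work done by one "thread": it reads gbest and writes only column i.
    x = position[:, i]          # views into the shared arrays, not copies
    v = velocity[:, i]
    pbest = best_position[:, i]
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v[:] = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x[:] = np.clip(x + v, lb, ub)
    cost[i] = f(x)
    if cost[i] < best_cost[i]:
        best_cost[i] = cost[i]
        pbest[:] = x


def pso(f, dim, n_particles=50, iters=200, lb=-5.0, ub=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    position, velocity, cost, best_position, best_cost = init_swarm(
        n_particles, dim, lb, ub, f, rng)
    for _ in range(iters):
        # Snapshot the global best once per step, as a kernel launch would.
        g = int(np.argmin(best_cost))
        gbest = best_position[:, g].copy()
        # On a GPU this loop would be one kernel with one thread per particle.
        for i in range(n_particles):
            update_particle(i, position, velocity, cost, best_position,
                            best_cost, gbest, f, w, c1, c2, lb, ub, rng)
    g = int(np.argmin(best_cost))
    return best_position[:, g].copy(), float(best_cost[g])
```

For example, `pso(sphere, 200, iters=20)` runs unchanged on a 200-dimensional problem, because nothing about the particle's size is baked into a type. In the Julia version, the arrays would presumably be `CuArray`s passed to the kernel, with each thread indexing its own column. That mapping is an assumption and is not exercised here. Julia is column-major, so one column per particle also keeps each particle's coordinates contiguous there. The NumPy sketch uses the same indexing for readability, even though its columns are strided.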
@ChrisRackauckas I believe this approach could work for NN-based optimization in general, while the current implementation would remain the better choice for low-dimensional ODE parameter estimation. Any thoughts?