Writing high-level code for parallel high-performance stencil computations
ParallelStencil.jl empowers domain scientists to write architecture-agnostic, high-level code for parallel high-performance stencil computations on GPUs and CPUs. It can achieve performance similar to CUDA C, which is typically a large improvement over the performance reached with CUDA.jl array programming alone. For example, a 2-D shallow ice solver presented at JuliaCon 2020 [1] ran nearly 20 times faster than a corresponding CUDA.jl array programming implementation; in absolute terms, it reached 70% of the theoretical upper performance […]
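To make "architecture-agnostic, high-level stencil code" concrete, here is a minimal sketch modeled on the 3-D heat diffusion example in the ParallelStencil.jl README. It assumes ParallelStencil.jl is installed and uses the CPU (`Threads`) backend; the kernel name `diffusion3D_step!`, the driver `diffusion3D`, and all physical and numerical parameters are illustrative choices, not taken from the shallow ice solver mentioned above. The kernel is written once with finite-difference macros such as `@inn` and `@d2_xi`, and the backend is selected in `@init_parallel_stencil`.

```julia
# Sketch modeled on the ParallelStencil.jl README diffusion example (assumption:
# ParallelStencil.jl is installed). Flip USE_GPU to target CUDA instead of CPU threads.
const USE_GPU = false
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 3)     # needs CUDA.jl and an NVIDIA GPU
else
    @init_parallel_stencil(Threads, Float64, 3)  # multi-threaded CPU backend
end

# Stencil kernel (hypothetical name): one explicit heat-diffusion step on inner points.
@parallel function diffusion3D_step!(T2, T, lam, dt, dx, dy, dz)
    @inn(T2) = @inn(T) + dt*lam*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2)
    return
end

# Illustrative driver: Gaussian temperature anomaly diffusing in a cubic domain.
function diffusion3D(; nx=32, ny=32, nz=32, nt=50)
    lam        = 1.0                            # thermal conductivity
    lx, ly, lz = 10.0, 10.0, 10.0               # domain lengths
    dx, dy, dz = lx/(nx-1), ly/(ny-1), lz/(nz-1)
    T  = Data.Array([exp(-((ix-1)*dx-lx/2)^2 - ((iy-1)*dy-ly/2)^2 - ((iz-1)*dz-lz/2)^2)
                     for ix=1:nx, iy=1:ny, iz=1:nz])
    T2 = copy(T)                                # copy so boundary values stay fixed
    dt = min(dx^2, dy^2, dz^2)/lam/8.1          # below the explicit stability limit
    for it = 1:nt
        @parallel diffusion3D_step!(T2, T, lam, dt, dx, dy, dz)
        T, T2 = T2, T                           # swap buffers for the next step
    end
    return T
end
```

Calling `diffusion3D()` returns the temperature field after `nt` steps. The same kernel source is meant to run on a GPU by switching the backend, which is the portability the paragraph describes.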