We present scalable iterative solvers and preconditioning strategies for Hybridizable Discontinuous Galerkin (HDG) discretizations of partial differential equations (PDEs) on graphics processing units (GPUs). The HDG method is implemented using GPU-tailored algorithms in which local element degrees of freedom are eliminated in parallel, and the globally condensed system is assembled directly on the device using dense-block operations. The global matrix is stored in a block format that reflects the natural HDG structure, enabling all iterative solver kernels to be executed with strided batched dense matrix-vector multiplications. This implementation avoids sparse data structures, increases arithmetic intensity, and sustains high memory throughput across a range of meshes and polynomial orders. The nonlinear solver combines Newton's method with preconditioned GMRES, integrating scalable preconditioners such as block-Jacobi, additive Schwarz domain decomposition, and polynomial smoothers. All preconditioners are implemented in batched form with architecture-aware optimizations--including dense linear algebra kernels, memory-coalesced vector operations, and shared-memory acceleration--to minimize memory traffic and maximize parallel occupancy. Comprehensive studies are conducted for a variety of PDEs (including Poisson equation, Burgers equation, linear and nonlinear elasticity, Euler equations, Navier-Stokes equations, and Reynolds-Averaged Navier-Stokes equations) using structured and unstructured meshes with different element types and polynomial orders on both NVIDIA and AMD GPU architectures.
翻译:暂无翻译