Non-uniform sampling of the Fourier transform appears in many important applications such as magnetic resonance imaging (MRI), optics, tomography and radio interferometry. Computing the inverse often requires fast application of the non-uniform discrete Fourier transform (NUDFT) and its adjoint operation. Non-Uniform Fast Fourier Transform (NUFFT) methods, such as gridding/regridding, are approximate algorithms which often leverage the highly-optimized Fast Fourier Transform (FFT) and localized interpolations. These approaches require selecting several parameters, such as interpolation and FFT grid sizes, which affect both the accuracy and runtime. In addition, different implementations lie on a spectrum of precomputation levels, which can further speed up repeated computations, with various trade-offs in planning time, execution time and memory usage. Choosing the optimal parameters and implementations is important for performance speed, but difficult to do manually since the performance of NUFFT is not well-understood for modern parallel processors. Inspired by the FFTW library, we demonstrate an empirical auto-tuning approach for the NUFFT on General Purpose Graphics Processors Units (GPGPU). We demonstrate order-of-magnitude speed improvements with auto-tuning compared to typical default choices. Our auto-tuning is implemented in an easy to use proof-of-concept library called gNUFFTW, which leverages existing open-source NUFFT packages, cuFFT and cuSPARSE libraries, as well as our own NUFFT implementations for high performance.