Description
We investigate this idea in the context of a widely-used and widely-studied indexing workload: range queries over 2-dimensional points. We present an algorithm that chooses randomly among logically distinct 2-d range queries. It has constant-time expected performance over uniformly distributed data, and exhibited good performance in experiments over a variety of real and synthetic data sets.
We observe nonuniformities in the way randomly chosen logical 2-d range queries are distributed over a variety of spatial properties. This raises questions about the quality of the workloads generated from such queries. To explore this further, we contrast our approach of choosing random logical range queries with previous work that generates workloads of random spatial ranges. We highlight pros and cons of the alternate approaches, and sketch directions for future work on the robust generation of workloads for studying index performance.