Description
A main challenge in this work is minimizing the computational expense of dynamically computing synopses on intermediate tables during query optimization. We show that one only needs to construct a very small subset of all possible synopses to get the full estimation benefits of multi-dimensional modeling. We also show that choosing this subset unwisely can have unacceptable performance overheads. This leads to an "estimation planning" problem that has been ignored to date in literature: how to choose the most efficiently-computable collection of intermediate synopses to construct, so that all the required cardinality estimates can be computed as accurately as possible from the base-table synopses?
In this paper, we identify and formalize the estimation planning problem in the context of a System R-style optimizer extended with complex synopsis techniques such as multi-dimensional histograms. We propose algorithms that find very good estimation plans efficiently, and discuss properties of synopsis structures that make them amenable to efficient use in an optimizer.